This library uses Python and NLTK to detect the aspects mentioned in reviews of a given product and the sentiment of those reviews.
These instructions will get a copy of the project up and running on your local machine.
Before installing this project, make sure you have NLTK installed:
> pip install nltk
Next, install the required NLTK corpora. First, open a Python interpreter:
> python
Once the Python interpreter is open, use the following commands to open the NLTK downloader:
>> import nltk
>> nltk.download()
Once the downloader window pops up, install the following corpora:
- averaged_perceptron_tagger - Averaged Perceptron Tagger
- brown - Brown Corpus
- punkt - Punkt Tokenizer Models
- stopwords - Stopwords Corpus
- treebank - Penn Treebank Sample
- universal_tagset - Mappings to the Universal Part-of-Speech Tagset
- wordnet - WordNet
- words - Word List
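If you would rather skip the GUI (for example on a headless server), the same data can be fetched by passing each package identifier to nltk.download(). The helper below is a minimal sketch that assumes the identifiers listed above; the name download_required_corpora is hypothetical and not part of this project.

```python
# Hypothetical helper: download the NLTK data listed above without the GUI.
REQUIRED_CORPORA = (
    "averaged_perceptron_tagger",
    "brown",
    "punkt",
    "stopwords",
    "treebank",
    "universal_tagset",
    "wordnet",
    "words",
)


def download_required_corpora(quiet=True):
    """Download each required NLTK package and return the identifiers that failed."""
    import nltk  # imported here so REQUIRED_CORPORA can be inspected without NLTK installed

    failed = []
    for package_id in REQUIRED_CORPORA:
        # nltk.download() returns True on success and False on failure
        if not nltk.download(package_id, quiet=quiet):
            failed.append(package_id)
    return failed
```

Calling download_required_corpora() from a Python session returns an empty list when every package was installed. If a later NLTK call raises a LookupError naming a different resource, download that identifier the same way.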
You will also need the SciPy and scikit-learn libraries:
> pip install scipy
> pip install scikit-learn
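Before going further, it can help to confirm that the Python dependencies are importable. The short check below is a sketch using only the standard library; note that scikit-learn is installed under the pip name scikit-learn but imported as sklearn. The name missing_modules is illustrative.

```python
# Quick sanity check that the Python dependencies listed above are importable.
import importlib.util

# Import names (not pip names): scikit-learn is imported as "sklearn".
REQUIRED_MODULES = ("nltk", "scipy", "sklearn")


def missing_modules(modules=REQUIRED_MODULES):
    """Return the import names from `modules` that cannot be found."""
    # find_spec() locates a top-level module without importing it
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    print("Missing: " + ", ".join(missing) if missing else "All dependencies found.")
```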
Next, install Stanford CoreNLP, which provides the dependency parser used by this project.
Navigate to CoreNLP's download page or use the direct link to download the required models and unzip them to a convenient location.
Before starting the server, change to the CoreNLP directory in a terminal:
> cd /<path-to-corenlp>/
Then start the CoreNLP server, which hosts the models, with the following command:
> java -mx4g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer -preload tokenize,ssplit,pos,lemma,ner,parse,depparse -status_port 9000 -port 9000 -timeout 15000 &
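Once the server is up, you can check that it answers dependency-parse requests before running the project. The sketch below posts text to the server's HTTP endpoint using only the standard library. It assumes the server is listening on localhost port 9000, as in the command above; build_request_url and dependency_triples are illustrative names, not functions from this project. (NLTK's nltk.parse.corenlp.CoreNLPDependencyParser is a client for the same server.)

```python
import json
import urllib.parse
import urllib.request

# Assumes the server was started with -port 9000 on this machine.
CORENLP_URL = "http://localhost:9000"


def build_request_url(base_url=CORENLP_URL,
                      annotators=("tokenize", "ssplit", "pos", "depparse")):
    """Encode the CoreNLP annotation properties into the request URL."""
    properties = {"annotators": ",".join(annotators), "outputFormat": "json"}
    return base_url + "/?properties=" + urllib.parse.quote(json.dumps(properties))


def dependency_triples(text, base_url=CORENLP_URL, timeout=30):
    """POST `text` to the server and return (governor, relation, dependent) triples."""
    request = urllib.request.Request(
        build_request_url(base_url), data=text.encode("utf-8"), method="POST"
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        document = json.loads(response.read().decode("utf-8"))
    # With the depparse annotator, each sentence carries a "basicDependencies" list
    return [
        (dep["governorGloss"], dep["dep"], dep["dependentGloss"])
        for sentence in document["sentences"]
        for dep in sentence["basicDependencies"]
    ]
```

The client timeout here is in seconds, whereas the server's -timeout 15000 is in milliseconds. A URLError (connection refused) from dependency_triples() means the server is not reachable.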
The CoreNLP server must be running whenever you use this project; without it, the project cannot run.
To cache data so that subsequent runs on the same data are faster, create a data folder in the project's root directory:
> cd /<path-to-project>/
> mkdir data
The following files will be generated in the data folder:
- model.dat - Stores the LinearSVC model used in sentiment analysis
- potentialAspects.dat - Stores the initial potential aspects from the AspectDetector
- vectorizer.dat - Stores the CountVectorizer used in sentiment analysis
These files can be quite large (over 10 MB).
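The opt-in cache described above (files are written only when a data folder exists) can be sketched as follows. This assumes the .dat files are pickled Python objects, which the project does not state; load_or_build and its build argument are hypothetical names. A cache like this is only valid for the same input data and library versions, so delete the cached files after changing either, and only unpickle files you trust.

```python
import os
import pickle


def load_or_build(path, build):
    """Return the object cached at `path`, or call `build()` and cache the result.

    The result is written only if the directory containing `path` already
    exists, mirroring the opt-in data folder described above.
    """
    if os.path.exists(path):
        with open(path, "rb") as f:
            return pickle.load(f)
    obj = build()
    directory = os.path.dirname(path) or "."
    if os.path.isdir(directory):
        with open(path, "wb") as f:
            pickle.dump(obj, f)
    return obj
```

For example, load_or_build(os.path.join("data", "vectorizer.dat"), build_vectorizer) would reuse a cached vectorizer on later runs, where build_vectorizer is a hypothetical function that fits a new one.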
This project is built with:
- NLTK - Library for natural language processing
- Stanford CoreNLP - NLP tools developed by Stanford
- scikit-learn - Machine learning framework for Python
- SciPy - Library for scientific and technical computing