Aspect Based Sentiment Analysis

This library is built using Python and the NLTK library to detect aspects and sentiment of reviews for a certain product

Getting Started

These instructions will get a copy of the project up and running on your local machine

NLTK Prerequisites

Before installing this project you'll want to make sure you have NLTK downloaded

> pip install nltk

Next you'll need to install the required NLTK corpra by first opening the Python terminal

> python

Once the Python terminal is open use the following commands to open the NLTK downloader

>> import nltk
>> nltk.download()

Once the downloader window pops up, install the following corpra -

averaged_perceptron_tagger - Averaged Perceptron Tagger
brown - Brown Corpus
punkt - Punkt Tokenizer Models
stopwords - Stopwords Corpus
treebank - Penn Treebank Sample
universal_tagset - Mappings to the Universal Part-of-Speech Tagset
wordnet - WordNet
words - Word List

Additional Prerequisites

You will need the SciPy and scikit-learn libraries

> pip install scipy
> pip install sklearn

CoreNLP

Next you will have to install CoreNLP for using the CoreNLP Dependency Parser

Navigate to CoreNLP's download page or use the direct link to download the required models and unzip them to a convenient location.

In terminal before hosting the models, change directory to the CoreNLP folder

> cd /<path-to-corenlp>/

To host the models use the following command

> java -mx4g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer -preload tokenize,ssplit,pos,lemma,ner,parse,depparse -status_port 9000 -port 9000 -timeout 15000 &

The models will need to be hosted or the project will not be able to run

Caching Data

If you want to cache data to make subsequent runs on the same data faster then create a data folder in the root directory of this project

> cd /<path-to-project>/
> mkdir data

The files that will be generated into the data folder are the following

model.dat - Stores the LinearSVC model used in sentiment analysis
potentialAspects.dat - Stores the initial potential aspects from the AspectDetector
vectorizer.dat - Stores the CountVectorizer used in sentiment analysis

These files will be quite large ( > 10mb )

Built With

NLTK - Library for Natural Language Processing
Standford CoreNLP - NLP tools developed by Standford
Scikit-Learn - Machine Learning framework for Python
SciPy - Library used for scientific and technical computing

Name		Name	Last commit message	Last commit date
Latest commit History 43 Commits
_samplereview3		_samplereview3
reviews		reviews
training_data		training_data
.gitignore		.gitignore
AspectDetector.py		AspectDetector.py
CorpusReader_TFIDF.py		CorpusReader_TFIDF.py
MyCorpusReader.py		MyCorpusReader.py
MyViterbi.py		MyViterbi.py
MyWordNetSimilarity.py		MyWordNetSimilarity.py
README.md		README.md
SentimentAnalyzer.py		SentimentAnalyzer.py
__init__.py		__init__.py
agg		agg
out.csv		out.csv
presentation.pptx		presentation.pptx
report.docx		report.docx
runner.py		runner.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Aspect Based Sentiment Analysis

Getting Started

NLTK Prerequisites

Additional Prerequisites

CoreNLP

Caching Data

Built With

Authors

About

Releases

Packages

Languages

walker76-school/sentiment-analysis

Folders and files

Latest commit

History

Repository files navigation

Aspect Based Sentiment Analysis

Getting Started

NLTK Prerequisites

Additional Prerequisites

CoreNLP

Caching Data

Built With

Authors

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages