- A web application that classifies tweets about US airlines as negative, positive, or neutral.
- Uses NLP techniques (tokenization, n-grams, stemming, stopword removal) and machine learning algorithms (Bayesian networks, neural networks, SVM, lexical bag of words) in R and Python.
- Compares the prediction accuracy of the supervised classifiers used.
- Python
- R
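The preprocessing steps named above (tokenization, stopword removal, n-grams) can be sketched in plain Python. This is only an illustration, not the project's actual code: the function names and the tiny stopword list are assumptions, and stemming is left out because it normally relies on an external library.

```python
import re

# Tiny illustrative stopword list (an assumption; real lists are much longer).
STOPWORDS = {"a", "an", "the", "is", "was", "my", "to", "and", "of", "on"}


def tokenize(text):
    """Lowercase the text and split it into word tokens, keeping @mentions and #hashtags."""
    return re.findall(r"[@#]?\w+", text.lower())


def remove_stopwords(tokens):
    """Drop tokens that appear in the stopword list."""
    return [t for t in tokens if t not in STOPWORDS]


def ngrams(tokens, n):
    """Return all contiguous n-token sequences as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
```

For example, `ngrams(remove_stopwords(tokenize(tweet)), 2)` yields the bigrams of a tweet after cleaning, which could then serve as features for a classifier.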
Python
Install the following modules:
- tweepy - a Python wrapper for the Twitter API. See the tweepy documentation and its streaming how-to guide.
- textblob - library for processing textual data. It provides a simple API for diving into common natural language processing (NLP) tasks such as part-of-speech tagging, noun phrase extraction, sentiment analysis, classification, translation, and more.
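As a rough illustration of how textblob could map a tweet to one of the three classes, the sketch below turns its polarity score (a float between -1.0 and 1.0) into a label. The helper names and the `threshold` value of 0.1 are hypothetical choices for this example, and this lexicon-based score is not necessarily how the project's trained classifiers work.

```python
def label_from_polarity(polarity, threshold=0.1):
    """Map a polarity score in [-1.0, 1.0] to a sentiment class.

    The neutral band of +/- threshold is an assumption made for this sketch.
    """
    if polarity > threshold:
        return "positive"
    if polarity < -threshold:
        return "negative"
    return "neutral"


def classify_tweet(text, threshold=0.1):
    """Classify a tweet using textblob's built-in sentiment polarity."""
    # Imported here so the pure helper above works without textblob installed.
    from textblob import TextBlob

    polarity = TextBlob(text).sentiment.polarity
    return label_from_polarity(polarity, threshold)
```

Calling `classify_tweet("@airline thanks for the great service!")` returns one of `"positive"`, `"negative"`, or `"neutral"`. Widening `threshold` sends more borderline tweets to the neutral class.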
R
Packages:
- tm: a framework for text mining applications within R. It handles text cleaning (stemming, stopword removal, etc.) and transforms text into a document-term matrix (DTM).
- wordcloud