This project uses Natural Language Processing (NLP) in conjunction with a Support Vector Machine (SVM) to classify the sentiment of tweets and determine whether that sentiment correlates with Bitcoin prices.
Note: This is not a complete project.
Preprocessing uses NLP techniques from the Natural Language Toolkit (NLTK) to normalize the text. The normalized text is then converted into a Vector Space Model (VSM) using Term Frequency-Inverse Document Frequency (TF-IDF) weighting. An SVM then performs binary classification, labeling each tweet as positive or non-positive in sentiment. Tweets can be acquired using TweetStreamer.
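To make the TF-IDF step concrete, here is a minimal plain-Python sketch of turning tokenized tweets into weighted vectors. It is not the project's code: the function name `tfidf_vectors`, the sample tweets, and the unsmoothed weighting `tf * log(N / df)` are assumptions chosen for illustration. Library implementations such as scikit-learn's `TfidfVectorizer` use a smoothed variant by default, so their numbers will differ.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return (vocabulary, vectors) for a list of tokenized documents.

    Hypothetical helper: weights each term as tf * log(N / df), with no
    smoothing and no normalization.
    """
    n_docs = len(docs)
    # Document frequency: how many documents contain each term at least once.
    df = Counter(term for doc in docs for term in set(doc))
    vocab = sorted(df)
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc) or 1  # guard against empty documents
        vectors.append(
            [(counts[t] / total) * math.log(n_docs / df[t]) for t in vocab]
        )
    return vocab, vectors


# Made-up sample tweets, already lowercased and split into tokens.
tweets = [
    "bitcoin price is rising".split(),
    "bitcoin price is falling".split(),
    "great day for bitcoin".split(),
]
vocab, vectors = tfidf_vectors(tweets)
```

A term that appears in every tweet (here `bitcoin`) gets weight 0, while a term unique to one tweet (such as `rising`) gets the largest weight. Each row of `vectors` is the kind of feature vector an SVM would be trained on.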
- Revised Stop Word Removal
- Negation Handling
- Emoji Support and Scoring
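
How the items listed above are implemented is not described here, so the following is only a rough sketch of one common way to combine them. Everything in it is an assumption: the tiny word lists, the emoji scores, the `NOT_` prefix convention, and the function name `preprocess`. The "revised" stop word idea shown is simply keeping negation words out of the stop list (NLTK's English stop word list contains words such as "not" and "no", which carry sentiment).

```python
import re

# Hypothetical, deliberately tiny lists for illustration only.
STOP_WORDS = {"a", "an", "the", "is", "it", "this", "to", "of", "for"}
NEGATIONS = {"not", "no", "never"}
EMOJI_SCORES = {"🚀": 1, "😀": 1, "😡": -1, "📉": -1}
PUNCTUATION = set(".,!?;")

# Words, clause-ending punctuation, or any other single symbol (e.g. an emoji).
TOKEN_RE = re.compile(r"[a-z0-9']+|[.,!?;]|\S")


def preprocess(text):
    """Return (tokens, emoji_score) for one tweet."""
    words, emoji_score, negated = [], 0, False
    for tok in TOKEN_RE.findall(text.lower()):
        if tok in EMOJI_SCORES:
            emoji_score += EMOJI_SCORES[tok]  # score the emoji, drop the symbol
        elif tok in PUNCTUATION:
            negated = False  # a negation's scope ends at punctuation
        elif tok in NEGATIONS or tok.endswith("n't"):
            negated = True  # checked before stop words so negations survive
        elif tok in STOP_WORDS or not tok[0].isalnum():
            continue  # skip stop words and stray symbols
        else:
            words.append("NOT_" + tok if negated else tok)
    return words, emoji_score


words, score = preprocess("This is not a good day for Bitcoin. Price is falling 📉")
```

For the sample tweet, the words after "not" are marked with `NOT_` until the full stop, and the chart emoji contributes a negative score. The marked tokens could then be fed to the TF-IDF step so that "good" and "NOT_good" count as different features.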