StockTwits Sentiment Classifier

An Application of Random Forest!

Project Description

Introduction

Objective: Project for my internship at the Research Center VERA, Ca' Foscari University of Venice.
Abstract: 2,045,322 cryptocurrency-related Tweets (~287 MB) were retrieved using the StockTwits API. The messages were posted between 28/11/2014 and 25/07/2020. Nearly half of them are labeled with a sentiment (Bullish or Bearish). A Random Forest model was then trained on the labeled subset to classify the sentiment of Tweets about cryptocurrencies, reaching 74.75% prediction accuracy on the test set.
Status: Completed.

Methods Used

Text processing, inspired by Renault (2017) and Chen et al. (2019).
TF-IDF (for text vectorization).
Truncated SVD (for dimensionality reduction).
Random Forest.
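
The way these methods chain together can be sketched with scikit-learn. This is only an illustration under stated assumptions, not the project's actual code: the toy messages, the tiny `n_components=2`, and the small forest are made up so the example runs instantly, and the text-processing step is skipped.

```python
from sklearn.decomposition import TruncatedSVD
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline

# Made-up toy messages (assumption); the real project uses labeled StockTwits data.
texts = [
    "btc to the moon buy now",
    "bullish breakout on eth buy",
    "strong rally buy the dip",
    "btc crash incoming sell",
    "bearish dump sell everything",
    "weak chart sell the rally",
]
labels = ["Bullish", "Bullish", "Bullish", "Bearish", "Bearish", "Bearish"]

# TF-IDF turns text into a sparse matrix, Truncated SVD compresses it to a
# few dense components, and the Random Forest classifies those components.
pipeline = make_pipeline(
    TfidfVectorizer(),
    TruncatedSVD(n_components=2, random_state=0),
    RandomForestClassifier(n_estimators=50, random_state=0),
)
pipeline.fit(texts, labels)

predictions = pipeline.predict(["buy btc", "sell eth"])
print(list(predictions))
```

Truncated SVD is a natural fit after TF-IDF because it works directly on sparse matrices without densifying them first.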

Dependencies

Python 3
numpy==1.18.5
pandas==1.0.5
scikit-learn==0.23.2
requests==2.24.0

Getting Started

How to Run

Clone this repo: git clone https://github.com/dang-trung/stocktwits-sentiment-classifier
Create your environment (virtualenv):
virtualenv -p python3 venv
source venv/bin/activate (bash) or venv\Scripts\activate (Windows)
(venv) cd stocktwits-sentiment-classifier
(venv) pip install -e .

Or (conda):
conda env create -f environment.yml
conda activate stocktwits-sentiment-classifier
Run in terminal:
python -m sentiment_classifier
Note: because of StockTwits API rate limits, downloading all 2M+ cryptocurrency-related Tweets posted from 2014 to 2020 takes several days.
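
Why a rate limit stretches the download over days can be sketched as a paced, paginated loop. This is a minimal illustration, not the project's downloader: `download_all`, `fetch_page`, the `id`-based cursor, and the figure of 200 requests per hour are all assumptions made for the example. In a real run `fetch_page` would wrap an HTTP call made with `requests`; here a fake one is used so the sketch runs offline.

```python
import time


def download_all(fetch_page, requests_per_hour=200, sleep=time.sleep):
    """Yield messages page by page, pausing so the hourly quota is not exceeded.

    fetch_page(cursor) is a hypothetical callable returning a list of
    message dicts older than `cursor` (or the newest ones if cursor is None).
    """
    delay = 3600.0 / requests_per_hour  # seconds to wait between requests
    cursor = None
    while True:
        messages = fetch_page(cursor)
        if not messages:
            return  # nothing older is left
        yield from messages
        cursor = min(m["id"] for m in messages)  # ask for older messages next
        sleep(delay)


def fake_fetch(cursor):
    """Stand-in for a real API call: serves ids 5..1, two per page."""
    ids = [5, 4, 3, 2, 1]
    older = [i for i in ids if cursor is None or i < cursor]
    return [{"id": i} for i in older[:2]]


waits = []  # record the pauses instead of actually sleeping
collected = list(download_all(fake_fetch, sleep=waits.append))
print([m["id"] for m in collected])  # [5, 4, 3, 2, 1]
print(waits)                         # [18.0, 18.0, 18.0]
```

Passing `sleep` as a parameter keeps the pacing logic testable without waiting in real time.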

Data Storage

Downloaded messages are stored in data/01_raw.
Processed messages (reduced to the information relevant to sentiment) are stored in data/02_processed.
Vectorized text messages are stored in data/03_vectorized. This file is small compared to the files generated in steps 1 and 2, so it is already included in the repo.
External files (crypto symbols and text-processing rules) are stored in data/04_external.

Results

Model parameters: ntree=500, max_depth=20, max_samples=0.75
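
For readers using scikit-learn, these settings would plausibly map onto `RandomForestClassifier` as shown here. Reading `ntree` as `n_estimators` is an assumption, and the project's own code may configure the model differently.

```python
from sklearn.ensemble import RandomForestClassifier

model = RandomForestClassifier(
    n_estimators=500,  # assumed equivalent of "ntree": number of trees
    max_depth=20,      # maximum depth of each tree
    max_samples=0.75,  # each tree is fit on a bootstrap sample of 75% of the rows
)
print(model.get_params()["n_estimators"])
```

`max_samples` only takes effect when `bootstrap=True`, which is the default.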
Confusion matrix on the training set

                     Actual Bearish   Actual Bullish
Predicted Bearish            82,208            8,426
Predicted Bullish             5,269           85,365

Confusion matrix on the test set (~74.75% accuracy)

                     Actual Bearish   Actual Bullish
Predicted Bearish            59,888           30,747
Predicted Bullish           175,937          551,880
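
The accuracy figures can be checked directly from the two confusion matrices above: accuracy is the sum of the diagonal (correct predictions) divided by the total count. The short sketch below does that arithmetic; the helper name `accuracy` is made up for the example. The result does not depend on whether rows are read as predicted or actual classes.

```python
def accuracy(matrix):
    """Share of correct predictions: diagonal sum over total count."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


# Counts copied from the tables above.
train = [[82_208, 8_426], [5_269, 85_365]]
test = [[59_888, 30_747], [175_937, 551_880]]

train_accuracy = accuracy(train)
test_accuracy = accuracy(test)
print(f"{train_accuracy:.2%}")  # 92.44%
print(f"{test_accuracy:.2%}")   # 74.75%
```

The gap between roughly 92% on the training set and roughly 75% on the test set shows that the model fits the training data noticeably better than unseen data.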
