StockTwits Sentiment Classifier

Project Description

Introduction

Objective: Project for my internship at Research Center VERA, Ca' Foscari University of Venice.
Abstract: 2,045,322 cryptocurrency-related Tweets (~287MB) are retrieved using the StockTwits API. The messages were posted between 28/11/2014 and 25/07/2020. Nearly half of them are labeled with a sentiment (Bullish or Bearish). Based on the labeled dataset, a Random Forest model is trained to classify the sentiment of cryptocurrency-related Tweets, reaching 74.75% accuracy on the test set.
Status: Completed.

Methods Used

Text-processing, inspired by Renault (2017) and Chen et al. (2019).
TF-IDF (for text-vectorization).
Truncated SVD (for dimension reduction).
Random Forest.
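The exact text-processing rules live in data/04_external and are not reproduced here. As a rough illustration of this kind of normalization, in the spirit of Renault (2017), the minimal sketch below lowercases a message and replaces URLs, cashtags, user mentions and numbers with placeholder tokens. The regular expressions, the function name normalize_message and the token names (url_token, cashtag_token, user_token, number_token) are my own assumptions, not the repo's actual rules.

```python
import re

# Hypothetical rules and placeholder tokens, for illustration only;
# the repo's real rules are stored in data/04_external.
# Order matters: URLs are replaced first so their digits are not turned into numbers.
_RULES = [
    (re.compile(r"https?://\S+|www\.\S+"), " url_token "),
    (re.compile(r"\$[a-z]+(?:\.[a-z]+)?"), " cashtag_token "),  # e.g. $btc.x after lowercasing
    (re.compile(r"@\w+"), " user_token "),
    (re.compile(r"\d+(?:[.,]\d+)*%?"), " number_token "),
]


def normalize_message(text: str) -> str:
    """Lowercase a message and swap URLs, cashtags, mentions and numbers for placeholder tokens."""
    text = text.lower()
    for pattern, token in _RULES:
        text = pattern.sub(token, text)
    # Collapse the extra whitespace introduced by the substitutions.
    return " ".join(text.split())


print(normalize_message("$BTC.X to the moon! https://t.co/abc @alice 10%"))
# cashtag_token to the moon! url_token user_token number_token
```

Replacing such items with generic tokens keeps their presence as a signal while stopping every distinct URL, user name or price from becoming its own rare TF-IDF feature.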

Dependencies

Python 3
numpy==1.18.5
pandas==1.0.5
scikit-learn==0.23.2
requests==2.24.0

Getting Started

How to Run

Clone this repo: git clone https://github.com/dang-trung/stocktwits-sentiment-classifier
Create your environment (virtualenv):
virtualenv -p python3 venv
source venv/bin/activate (bash) or venv\Scripts\activate (Windows)
(venv) cd stocktwits-sentiment-classifier
(venv) pip install -e .

Or (conda):
conda env create -f environment.yml
conda activate stocktwits-sentiment-classifier
Run in terminal:
python -m sentiment_classifier
Note that, due to API rate limits, fully downloading all 2M+ cryptocurrency-related Tweets posted on StockTwits from 2014 to 2020 will take several days.
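The download is slow because every API call returns only one page of messages, so the full history has to be walked backwards page by page while staying under the rate limit. A minimal sketch of such a paging loop follows. It assumes the legacy StockTwits API v2 endpoint streams/symbol/{symbol}.json with a `max` query parameter, a JSON response containing a `messages` list (each with an `id`) and a `cursor.more` flag, and a limit of roughly 200 unauthenticated requests per hour. All of these, as well as the helper names fetch_page and iter_messages, are assumptions to check against the StockTwits documentation, not the repo's actual code.

```python
import json
import time
import urllib.request
from typing import Callable, Iterator, Optional

# Assumption: legacy StockTwits API v2 symbol stream; verify before relying on it.
STREAM_URL = "https://api.stocktwits.com/api/2/streams/symbol/{symbol}.json"


def fetch_page(symbol: str, max_id: Optional[int] = None) -> dict:
    """Hypothetical helper: fetch one page of messages with IDs up to `max_id`."""
    url = STREAM_URL.format(symbol=symbol)
    if max_id is not None:
        url += f"?max={max_id}"
    with urllib.request.urlopen(url, timeout=30) as response:
        return json.load(response)


def iter_messages(
    symbol: str,
    fetch: Callable[[str, Optional[int]], dict] = fetch_page,
    requests_per_hour: int = 200,  # assumed unauthenticated rate limit
) -> Iterator[dict]:
    """Yield messages newest-first, paging backwards and pausing between requests."""
    delay = 3600.0 / requests_per_hour
    max_id: Optional[int] = None
    while True:
        page = fetch(symbol, max_id)
        messages = page.get("messages", [])
        if not messages:
            break
        yield from messages
        if not page.get("cursor", {}).get("more", False):
            break
        # Ask for strictly older messages next, so no message is returned twice.
        max_id = min(m["id"] for m in messages) - 1
        time.sleep(delay)
```

Iterating iter_messages(symbol) for each symbol listed in data/04_external would cover the whole download. Because the fetch function is a parameter, the loop can also be exercised offline with a fake fetcher.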

Data Storage

Downloaded messages will be stored in data/01_raw.
Processed messages (keeping only the information relevant to sentiment) will be stored in data/02_processed.
Vectorized text messages are stored in data/03_vectorized. Since this file is small compared to the files generated in steps 1 and 2, it is already included in the repo.
External files (crypto symbols and text-processing rules) are stored in data/04_external.
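Processing keeps only the fields that matter for sentiment classification. The minimal sketch below assumes the StockTwits v2 message layout, in which a user-tagged label sits under entities.sentiment.basic and unlabeled messages carry a null sentiment. That layout, the selected fields and the helper names extract_record and labeled_only are hypothetical, not taken from the repo.

```python
from typing import Iterable, List


def extract_record(message: dict) -> dict:
    """Keep only sentiment-relevant fields (assumed StockTwits v2 message layout)."""
    # Assumption: unlabeled messages have "sentiment": null, hence the `or {}` fallbacks.
    sentiment = (message.get("entities") or {}).get("sentiment") or {}
    return {
        "id": message["id"],
        "created_at": message.get("created_at"),
        "body": message.get("body", ""),
        "sentiment": sentiment.get("basic"),  # "Bullish", "Bearish", or None
    }


def labeled_only(records: Iterable[dict]) -> List[dict]:
    """Keep the records that carry a Bullish/Bearish label (the training data)."""
    return [r for r in records if r["sentiment"] in ("Bullish", "Bearish")]
```

Unlabeled messages are still kept by extract_record, so they remain available for later use, while labeled_only gives the subset the classifier is trained and tested on.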

Results

Model parameters: ntree=500, max_depth=20, max_samples=0.75
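The three methods (TF-IDF, Truncated SVD, Random Forest) chain naturally into a scikit-learn Pipeline. The minimal sketch below plugs in the parameters reported above (ntree corresponds to n_estimators in scikit-learn). The number of SVD components (100), the default TfidfVectorizer settings, the random seeds and the helper name build_classifier are hypothetical choices, since they are not stated here.

```python
from sklearn.decomposition import TruncatedSVD
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import Pipeline


def build_classifier(n_components: int = 100, n_estimators: int = 500, random_state: int = 0) -> Pipeline:
    """Hypothetical TF-IDF -> truncated SVD -> random forest pipeline."""
    return Pipeline([
        ("tfidf", TfidfVectorizer()),  # default settings: an assumption
        ("svd", TruncatedSVD(n_components=n_components, random_state=random_state)),
        ("forest", RandomForestClassifier(
            n_estimators=n_estimators,  # "ntree" above
            max_depth=20,
            max_samples=0.75,  # each tree is fit on a bootstrap sample of 75% of the rows (bootstrap=True is the default)
            n_jobs=-1,
            random_state=random_state,
        )),
    ])
```

Calling build_classifier().fit(texts, labels) on the processed messages and their Bullish/Bearish labels, then predict on held-out messages, yields predictions that can be tabulated into confusion matrices like the ones below.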
Confusion matrix of the training set (~92.4% accuracy)

                    Actual Bearish   Actual Bullish
Predicted Bearish           82,208            8,426
Predicted Bullish            5,269           85,365

Confusion matrix of the test set (~74.75% accuracy)

                    Actual Bearish   Actual Bullish
Predicted Bearish           59,888           30,747
Predicted Bullish          175,937          551,880
