Leveraging LLMs for Informed Bitcoin Trading Decisions: Prompting with Social and News Data Reveals Promising Predictive Abilities
The work was carried out by:
This thesis investigates the potential of leveraging Large Language Models (LLMs) to support Bitcoin traders. Specifically, we analyze the correlation between Bitcoin price movements and the sentiment expressed in news headlines, posts, and comments on social media. We build a novel, large-scale dataset that aggregates various features related to Bitcoin and its price over time, spanning from 2016 to 2024, and includes data from news outlets, social media posts, and comments. Using this dataset, we evaluate the effectiveness of LLMs and Deep Learning models through standard classification tasks on real data, as well as through backtesting and demo trading accounts with different investment strategies. We build interactive interfaces to annotate real-time data via LLMs, perform custom backtesting, and visualize demo trading account performance. Our approach leverages the extended context capabilities of recent LLMs through simple prompting to generate outputs such as textual reasoning, sentiment, recommended trading actions, and confidence scores. Our findings reveal that LLMs represent a powerful tool for assisting trading decisions, opening up promising avenues for future research.
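For illustration only, a single LLM annotation of the kind described above (reasoning, sentiment, trading action, confidence) could be represented as follows; the field names and values are hypothetical and not the actual schema used in the project:

# Hypothetical shape of one LLM annotation: textual reasoning, a sentiment label,
# a recommended trading action, and a confidence score (illustrative, not the real schema).
example_annotation = {
    "reasoning": "News reports ETF inflows and social sentiment is broadly positive.",
    "sentiment": "positive",   # e.g. negative / neutral / positive
    "action": "buy",           # e.g. buy / hold / sell
    "confidence": 0.72,        # score in [0, 1]
}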
+- root
   +- backtest
   +- data_annotation
   +- data_exploratory_analysis
   +- data_mining
   +- data_predictions
   +- demo
   +- hf_data
   +- models
   +- secrets
   +- shared
   +- utils
   +- config.py
   +- requirements.py
Where:
backtest: Contains the scripts needed to backtest the strategies using the decisions made by the LLMs.
data_annotation: Contains the procedures to annotate the data using the LLMs.
data_exploratory_analysis: Contains the scripts that allow visualizing the collected data and the decisions made by the LLMs.
data_mining: Contains the procedures to retrieve all the data needed for the research and to generate a single dataset.
data_predictions: Contains the scripts that allow deep learning models to be used to make predictions based on the collected data.
demo: Contains the files needed to view real-time data annotation and the performance of demo trading accounts.
hf_data: Contains all datasets collected during the retrieval, annotation, and visualization process.
models: Contains the definition of the deep learning models used during the prediction process.
secrets: Contains the secrets used within the project, such as API keys and account credentials.
shared: Contains variables and constants shared by most of the files in the project.
utils: Contains methods that are shared by most of the files in the project.
config.py: Contains the configuration variables shared by most of the files in the project.
requirements.py: Contains the requirements to be installed before running the project.
We use Python 3.12.4, which is the latest Python version supported by PyTorch.
python3 -m venv .venv
.venv\scripts\activate
pip install -r requirements.py
You can download the needed data from this Hugging Face Repository.
Put the downloaded folders into the hf_data directory.
The annotated folder contains the original dataset with the annotations of the respective LLMs.
The merged folder contains the raw dataset without LLM annotations (price data, blockchain data, and sentiment indices).
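If you prefer to fetch the data programmatically, a minimal sketch using huggingface_hub is shown below; the repo_id is a placeholder, not the actual repository id:

# Sketch: download the dataset folders into hf_data via huggingface_hub.
# "<user>/<dataset_name>" is a placeholder for the real Hugging Face repository.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="<user>/<dataset_name>",
    repo_type="dataset",
    local_dir="hf_data",
)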
Download and install Ollama
Set up the following LLMs:
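Once Ollama is installed and the models are pulled, you can verify that the local server responds with a quick request to its HTTP API; this is only a sanity-check sketch, and "<model_name>" is a placeholder for one of the LLMs above:

# Sketch: query the local Ollama server (default port 11434) to confirm it is running.
import requests

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "<model_name>", "prompt": "Reply with OK.", "stream": False},
)
print(resp.json()["response"])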
Create a Gemini API key.
Create a gemini.json file in the secrets directory and add:
{
"GOOGLE_API_KEY_1": "<api_key>",
}
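As an illustration of how this file could be consumed (an assumption, not necessarily how the project loads it), the google-generativeai package can be configured directly from it; the model name below is illustrative:

# Sketch: load the key from secrets/gemini.json and configure the Gemini client.
import json
import google.generativeai as genai

with open("secrets/gemini.json") as f:
    genai.configure(api_key=json.load(f)["GOOGLE_API_KEY_1"])

model = genai.GenerativeModel("gemini-1.5-flash")  # illustrative model name
print(model.generate_content("Hello").text)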
Create a Reddit API key.
Create a reddit.json file in the secrets directory and add:
{
"client_id": "<client_id>",
"client_secret": "<client_secret>",
"user_agent": "<user_agent>"
}
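For reference, these are the credentials a Reddit client such as PRAW expects; whether the project's data_mining scripts use PRAW is an assumption, and the subreddit below is only an example:

# Sketch: authenticate against the Reddit API with the credentials from secrets/reddit.json.
import json
import praw

with open("secrets/reddit.json") as f:
    creds = json.load(f)

reddit = praw.Reddit(**creds)  # client_id, client_secret, user_agent
for submission in reddit.subreddit("Bitcoin").hot(limit=5):
    print(submission.title)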
Download and install HTTP Toolkit
Configure a custom proxy on your PC:
IP: 127.0.0.1
Port: 8080
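If the Python scripts themselves need to route traffic through HTTP Toolkit (an assumption about this setup), the standard proxy environment variables can be set before any requests are made; whether they are honored depends on the HTTP client the project uses:

# Sketch: point outgoing HTTP(S) traffic at the HTTP Toolkit proxy.
import os

os.environ["HTTP_PROXY"] = "http://127.0.0.1:8080"
os.environ["HTTPS_PROXY"] = "http://127.0.0.1:8080"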
To run the demo, execute:
python -m demo.demo
To run the backtest, execute:
python -m backtest.backtest