The Crypto News Sentiment Analyzer is a tool designed to provide sentiment analysis of cryptocurrency-related news headlines. By scraping headlines from various news sources via RSS feeds, analyzing them using BERT (Bidirectional Encoder Representations from Transformers), and storing the results in a PostgreSQL database and/or a CSV file, this tool helps users understand the overall mood of the cryptocurrency market as reflected in the news.
- RSS Feed Parsing: The tool pulls headlines and article content from multiple cryptocurrency news sources using RSS feeds.
- Sentiment Analysis: Each headline is analyzed using the BERT model, which assigns a sentiment label (ranging from 1 to 5 stars) and a confidence score based on the content of the entire article.
- CSV and Database Output: The analyzed data is saved in a CSV file and stored in a PostgreSQL database, including the publication date, headline, link, source, sentiment label, and sentiment score.
- Data Aggregation: The script calculates the average sentiment score across all headlines to provide an overall sentiment snapshot.
- Duplicate Handling: The tool is designed to avoid counting the same story from the same source multiple times.
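The aggregation and duplicate-handling features above can be sketched in a few lines. This is only an illustration, not the code in BlockScent.py: the row layout (dicts with `source`, `link`, and `score` keys) and the function names `deduplicate` and `average_score` are assumptions made for the example.

```python
def deduplicate(rows):
    """Keep the first occurrence of each (source, link) pair."""
    seen = set()
    unique = []
    for row in rows:
        key = (row["source"], row["link"])  # assumed identity of a story
        if key in seen:
            continue
        seen.add(key)
        unique.append(row)
    return unique


def average_score(rows):
    """Return the mean sentiment score, or None when there are no rows."""
    if not rows:
        return None
    return sum(row["score"] for row in rows) / len(rows)


# Hypothetical sample data: the first two rows are the same story.
rows = [
    {"source": "ExampleFeed", "link": "https://example.com/a", "score": 0.75},
    {"source": "ExampleFeed", "link": "https://example.com/a", "score": 0.75},
    {"source": "OtherFeed", "link": "https://example.com/b", "score": 0.25},
]
unique_rows = deduplicate(rows)
overall = average_score(unique_rows)
print(len(unique_rows), overall)
```

Keying on the source and link together means the same story published by two different sources still counts twice, which matches the "same story from the same source" wording above.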
git clone https://github.com/boilerrat/crypto-news-sentiment-analyzer.git
cd crypto-news-sentiment-analyzer
It is recommended to use a virtual environment for managing dependencies:
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
Create a .env file in the project root directory based on the provided .env.sample:
cp .env.sample .env
Edit the .env file to include your database connection details and any other environment variables:
DB_HOST=your_database_host
DB_PORT=your_database_port
DB_NAME=your_database_name
DB_USER=your_database_username
DB_PASSWORD=your_database_password
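As a rough sketch of how a script might consume these variables, the snippet below collects them into a dictionary and fails early if one is missing. The helper name `load_db_settings` is hypothetical, and the assumption that the values are already in the process environment (for example, loaded from .env by a helper library) is mine. The output keys are chosen to match the keyword arguments accepted by `psycopg2.connect`, but this README does not confirm which driver the project uses.

```python
import os

REQUIRED = ["DB_HOST", "DB_PORT", "DB_NAME", "DB_USER", "DB_PASSWORD"]


def load_db_settings(env=os.environ):
    """Collect the DB_* variables into a dict, failing early if any is missing."""
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise KeyError("Missing environment variables: " + ", ".join(missing))
    return {
        "host": env["DB_HOST"],
        "port": int(env["DB_PORT"]),  # ports are stored as text in .env
        "dbname": env["DB_NAME"],
        "user": env["DB_USER"],
        "password": env["DB_PASSWORD"],
    }


# Placeholder values for illustration only.
example_env = {
    "DB_HOST": "localhost",
    "DB_PORT": "5432",
    "DB_NAME": "blockscent_db",
    "DB_USER": "database_usr",
    "DB_PASSWORD": "your_password_here",
}
settings = load_db_settings(example_env)
print(settings["host"], settings["port"])
```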
If you haven't already set up your PostgreSQL database, follow these steps:
sudo -i -u postgres
psql
CREATE DATABASE blockscent_db;
CREATE USER database_usr WITH PASSWORD 'your_password_here';
GRANT ALL PRIVILEGES ON DATABASE blockscent_db TO database_usr;
Exit the PostgreSQL prompt:
\q
Make sure that your PostgreSQL service is running and that you can connect to the database using the credentials provided in your .env file.
Once everything is set up, you can run the script:
python BlockScent.py
This script will:
- Parse the RSS feeds specified in the sources.json file.
- Analyze the sentiment of each headline.
- Save the results to both a CSV file and the PostgreSQL database.
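To make the parsing step concrete, here is a minimal sketch that extracts the headline, link, and publication date from an RSS 2.0 document using only the standard library. It is not the script's actual implementation (the project may well use a dedicated feed-parsing library). The sample feed, the source name, and the `parse_feed` function are all invented for the example.

```python
import xml.etree.ElementTree as ET

# Invented sample feed; a real run would download the XML from each source URL.
SAMPLE_RSS = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example Crypto News</title>
    <item>
      <title>Example headline one</title>
      <link>https://example.com/one</link>
      <pubDate>Mon, 01 Jan 2024 12:00:00 GMT</pubDate>
    </item>
    <item>
      <title>Example headline two</title>
      <link>https://example.com/two</link>
    </item>
  </channel>
</rss>"""


def parse_feed(xml_text, source):
    """Return one dict per <item>, using "Unknown" when the date is absent."""
    root = ET.fromstring(xml_text)
    entries = []
    for item in root.iter("item"):
        entries.append({
            "date": item.findtext("pubDate", default="Unknown"),
            "headline": item.findtext("title", default="").strip(),
            "link": item.findtext("link", default="").strip(),
            "source": source,
        })
    return entries


entries = parse_feed(SAMPLE_RSS, source="Example Crypto News")
for entry in entries:
    print(entry["date"], "|", entry["headline"])
```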
To continuously update the sentiment analysis data, you can schedule the script to run periodically using cron on Linux or Task Scheduler on Windows.
Simply run the script using the command above. The script will parse the RSS feeds, analyze the sentiment of each headline, and save the results in both a CSV file (crypto_news_sentiment2.csv) and a PostgreSQL database.
You can access the data directly from the PostgreSQL database using any SQL client, such as DBeaver or pgAdmin. Alternatively, you can review the data in the CSV file generated by the script.
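For a quick look at the CSV without opening a spreadsheet, something like the following sketch could be used. The column headers (`Date`, `Headline`, `Sentiment`, `Stars`, `Score`, `Link`) are an assumption based on the field descriptions in this README and may differ from the headers the script actually writes. The sample rows are invented.

```python
import csv
import io

# Invented sample content standing in for the generated CSV file.
SAMPLE_CSV = """Date,Headline,Sentiment,Stars,Score,Link
2024-01-01,Example headline one,Positive,4 stars,0.75,https://example.com/one
2024-01-02,Example headline two,Negative,2 stars,0.25,https://example.com/two
"""


def summarize(csv_file):
    """Count rows per Sentiment value and average the Score column."""
    rows = list(csv.DictReader(csv_file))
    scores = [float(row["Score"]) for row in rows]
    counts = {}
    for row in rows:
        counts[row["Sentiment"]] = counts.get(row["Sentiment"], 0) + 1
    return {
        "rows": len(rows),
        "average_score": sum(scores) / len(scores) if scores else None,
        "by_sentiment": counts,
    }


summary = summarize(io.StringIO(SAMPLE_CSV))
print(summary)
```

To run it against the real output, pass an open file instead of the in-memory sample, for example `with open("crypto_news_sentiment2.csv", newline="") as f: summarize(f)`.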
Here is a sample .env.sample file:
# Database connection details
DB_HOST=localhost
DB_PORT=5432
DB_NAME=blockscent_db
DB_USER=boilerrat
DB_PASSWORD=your_password_here
This file should be included in your repository, but make sure to exclude the actual .env file by listing it in your .gitignore.
BERT (Bidirectional Encoder Representations from Transformers) is a powerful model developed by Google for NLP tasks. In this project, we use a pre-trained BERT model fine-tuned for sentiment analysis, which outputs a sentiment score and label for each article.
- Sentiment Label (Stars): The BERT model outputs a sentiment label ranging from 1 to 5 stars:
  - 1 Star: Very Negative
  - 2 Stars: Negative
  - 3 Stars: Neutral
  - 4 Stars: Positive
  - 5 Stars: Very Positive
- Sentiment Score: The sentiment score is a confidence score between 0 and 1, representing how strongly the model feels about its assigned sentiment label.
  - A score closer to 1 indicates high confidence in the sentiment label.
  - A score closer to 0.5 indicates less confidence, meaning the sentiment could be more ambiguous.
To provide a more nuanced understanding of the sentiment:
- Positive/Negative Label: This label is determined by whether the sentiment score is greater than or less than 0.5.
  - Positive: A sentiment score greater than 0.5.
  - Negative: A sentiment score less than or equal to 0.5.
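The interpretation rules above can be expressed as a small helper. This is a sketch under assumptions: it takes a dict with `label` and `score` keys, the shape returned by a Hugging Face `transformers` text-classification pipeline, and assumes the label text looks like "4 stars" or "1 star". The README does not name the exact model, so the label format and the function name `interpret` are hypothetical.

```python
STAR_MEANINGS = {
    1: "Very Negative",
    2: "Negative",
    3: "Neutral",
    4: "Positive",
    5: "Very Positive",
}


def interpret(result):
    """Turn a {"label": "4 stars", "score": 0.63} style result into report fields."""
    stars = int(result["label"].split()[0])  # assumes labels such as "1 star", "4 stars"
    score = float(result["score"])
    return {
        "stars": stars,
        "meaning": STAR_MEANINGS[stars],
        "score": score,
        # Rule as described above: strictly greater than 0.5 is Positive.
        "sentiment": "Positive" if score > 0.5 else "Negative",
    }


confident = interpret({"label": "4 stars", "score": 0.63})
borderline = interpret({"label": "1 star", "score": 0.5})
print(confident["meaning"], confident["sentiment"])
print(borderline["meaning"], borderline["sentiment"])
```

Note that, as the rule is stated, the Positive/Negative label depends only on the score and not on the star rating.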
- Date: The publication date of the article, extracted from the RSS feed. If the date is unavailable, it is listed as "Unknown."
- Headline: The title of the article as provided by the RSS feed.
- Sentiment: A label indicating whether the overall sentiment is Positive or Negative, based on the sentiment score.
- Stars: The sentiment label assigned by the BERT model, represented as a rating from 1 to 5 stars.
- Score: The sentiment confidence score, ranging from 0 to 1, with higher values indicating stronger confidence in the sentiment label.
- Link: The URL to the original article.
- Expand Data Sources:
  - Add more RSS feeds from additional cryptocurrency news websites to improve the breadth of sentiment analysis.
  - Remove Junk
- Enhanced Sentiment Scoring:
  - Implement more nuanced sentiment analysis by considering contextual word meanings and additional NLP techniques.
- Duplicate Handling:
  - Improve functionality to ensure that the same story from the same source is not counted multiple times.
- Scheduled Runs:
  - Set up the script to run on a schedule (e.g., daily) to continuously update the sentiment analysis data.
- Sentiment Trends & Visualizations:
  - Create visualizations of sentiment trends over time to better understand the market's mood.
- Web Interface:
  - Develop a simple web interface where users can view sentiment trends over time, filter by date or source, and export data as needed.
- Real-Time Sentiment Analysis:
  - Implement real-time sentiment analysis to provide up-to-the-minute sentiment insights.
- VPS Deployment:
  - Deploy the entire system on a Virtual Private Server (VPS) to ensure it's always running and accessible.