This project performs sentiment analysis on financial news headlines for major tech stocks.
News is scraped from FinViz, parsed using BeautifulSoup, processed with Pandas, analyzed using NLTK VADER, and visualized using Matplotlib.
The project was developed in a Kaggle Notebook.
Addendum: ➡️ Indian Stock News Sentiment
- Python 3
- Pandas
- NumPy
- BeautifulSoup (bs4)
- urllib.request
- NLTK VADER Sentiment Analyzer
- NRC Emotion Lexicon
- Matplotlib
- Plotly Express
- FinViz.com (News Source)
- Fetches live financial news headlines from FinViz.
- Uses a custom user-agent to avoid access restrictions.
- Parses each row of the news table for:
- Ticker
- Date
- Time
- News Headline
- Computes VADER compound sentiment score for each headline.
- Classifies headlines into positive, negative, or neutral sentiment.
- Groups sentiment by ticker and date.
- Plots bar charts comparing company sentiment trends.
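The classification step in the list above can be sketched as a small threshold function. The cutoffs of ±0.05 are the convention commonly used with VADER and are an assumption here; the project may use different values, and `classify_compound` is a hypothetical helper name.

```python
def classify_compound(compound, threshold=0.05):
    """Map a VADER compound score in [-1, 1] to a sentiment label.

    The default threshold of 0.05 is an assumed, conventional cutoff.
    """
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


# Illustrative scores only, not results from the project.
labels = [classify_compound(score) for score in (0.62, -0.41, 0.0)]
print(labels)
```

Raising the threshold widens the neutral band, which can be useful when headlines are short and scores cluster near zero.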
I created FinViz URLs for each ticker and fetched the HTML using urllib.request with a custom user-agent.
Then, using BeautifulSoup, I located the news-table containing all news rows.
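A minimal sketch of the fetch step, assuming the quote-page URL pattern `https://finviz.com/quote.ashx?t=<TICKER>` and a generic browser-like user-agent string. The ticker list, `build_request`, and `fetch_html` are placeholders for illustration, not the project's actual names.

```python
import urllib.request

# Assumed URL pattern for a FinViz quote page; adjust if the site changes.
FINVIZ_QUOTE_URL = "https://finviz.com/quote.ashx?t="


def build_request(ticker, user_agent="Mozilla/5.0"):
    """Create a request that carries a custom User-Agent header."""
    url = FINVIZ_QUOTE_URL + ticker
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch_html(ticker):
    """Download the page for one ticker and return it as text."""
    with urllib.request.urlopen(build_request(ticker), timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


# Placeholder tickers; building the requests does not touch the network.
requests_by_ticker = {t: build_request(t) for t in ("AAPL", "MSFT", "GOOG")}
```

Calling `fetch_html("AAPL")` performs the actual download, so it is kept separate from request construction.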
For each news row, I extracted:
- Headline text
- Date and time
- Corresponding ticker
All entries were stored in a Pandas DataFrame.
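The row extraction described above might look like the following sketch. It assumes each row holds a timestamp cell followed by a link with the headline, and that the date appears only on the first row of each day. The sample HTML and its headlines are invented for illustration, and `parse_news_table` is a hypothetical helper.

```python
from bs4 import BeautifulSoup
import pandas as pd

# Invented sample that mimics the assumed layout of the news table.
SAMPLE_HTML = """
<table id="news-table">
  <tr><td>Jan-05-24 09:30AM</td><td><a href="#">Example headline one</a></td></tr>
  <tr><td>08:15AM</td><td><a href="#">Example headline two</a></td></tr>
</table>
"""

COLUMNS = ["ticker", "date", "time", "headline"]


def parse_news_table(html, ticker):
    """Turn the rows of the news table into a DataFrame."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find(id="news-table")
    if table is None:
        return pd.DataFrame(columns=COLUMNS)

    rows = []
    current_date = None
    for tr in table.find_all("tr"):
        cell, link = tr.find("td"), tr.find("a")
        if cell is None or link is None:
            continue  # skip spacer or ad rows
        stamp = cell.get_text(strip=True).split()
        if len(stamp) == 2:      # "date time" starts a new day
            current_date, time_text = stamp
        elif len(stamp) == 1:    # time only: reuse the last seen date
            time_text = stamp[0]
        else:
            continue
        rows.append([ticker, current_date, time_text, link.get_text(strip=True)])
    return pd.DataFrame(rows, columns=COLUMNS)


news_df = parse_news_table(SAMPLE_HTML, "AAPL")
print(news_df)
```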
Using VADER's polarity_scores(), I computed the compound score for each headline and appended it to the DataFrame.
To analyze trends, I grouped sentiment by ticker and date and plotted the results using Matplotlib.
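The grouping and plotting step could be sketched as follows. The column names and the scores are made up for illustration; the pattern is a `groupby` on date and ticker followed by `unstack`, which gives one column per ticker that Matplotlib draws as grouped bars.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Invented scores; in the project these come from the scored headlines.
scored = pd.DataFrame({
    "ticker": ["AAPL", "AAPL", "AAPL", "MSFT", "MSFT"],
    "date": ["2024-01-04", "2024-01-04", "2024-01-05", "2024-01-04", "2024-01-05"],
    "compound": [0.5, 0.1, -0.2, 0.0, 0.4],
})

# Mean compound score per day and ticker, pivoted so tickers become columns.
mean_by_day = scored.groupby(["date", "ticker"])["compound"].mean().unstack("ticker")

ax = mean_by_day.plot(kind="bar", figsize=(8, 4))
ax.set_xlabel("Date")
ax.set_ylabel("Mean compound score")
ax.set_title("Average headline sentiment per ticker and day")
plt.tight_layout()
# In a notebook the figure renders inline; call plt.show() in a script.
```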
These graphs provide the core analytical insights of the Stock News Sentiment Project.
- Normalized Sentiment Bar Chart: Shows each stock's average sentiment score, normalized using a Z-score, with error bars reflecting sentiment volatility.
- Sentiment Heatmap Over Time: Displays how sentiment changes day-by-day for each stock, helping identify trends, spikes, and market reactions.
- Treemap of Stock Sentiment Strength: A size-based map where larger blocks represent stronger sentiment magnitude (positive or negative), providing a fast visual ranking of which stocks dominate sentiment.
These three graphs form the primary backbone of sentiment understanding: overall sentiment → volatility → time trends → relative magnitude.
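One way the Z-score normalization behind the first chart might be computed is sketched below. It assumes the Z-score is taken across the per-ticker means and that the per-ticker standard deviation serves as the volatility shown by the error bars; the project's exact formula is not stated, and the scores are invented.

```python
import pandas as pd

# Invented compound scores for three placeholder tickers.
scored = pd.DataFrame({
    "ticker": ["AAPL", "AAPL", "MSFT", "MSFT", "GOOG", "GOOG"],
    "compound": [0.6, 0.2, -0.1, 0.1, -0.5, -0.3],
})

# Mean sentiment and spread (volatility) for each ticker.
per_ticker = scored.groupby("ticker")["compound"].agg(mean="mean", std="std")

# Z-score of each ticker's mean relative to all tickers.
overall_mean = per_ticker["mean"].mean()
overall_std = per_ticker["mean"].std(ddof=0)
per_ticker["z_score"] = (per_ticker["mean"] - overall_mean) / overall_std

print(per_ticker)
```

The `z_score` and `std` columns can then be passed to a Matplotlib bar chart as the bar heights and the `yerr` values.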
These graphs provide deeper NLP-based emotional and linguistic breakdowns:
Word Clouds
- Positive Sentiment Word Cloud: Highlights the most frequent optimistic words investors/media use.
- Negative Sentiment Word Cloud: Shows common negative or fear-driven words.
The project uses the NRC Emotion Lexicon (developed by the National Research Council of Canada) to classify the emotional tone of stock-related news headlines.
The lexicon contains 14,000+ English words, each labeled with one or more of the following 10 categories (eight emotions plus positive and negative sentiment): anger, anticipation, disgust, fear, joy, negative, positive, sadness, surprise, trust. By mapping words in news headlines to these categories, the project generates:
Emotion Analysis Visuals
- Emotion Radar Chart (per stock): Shows emotional distribution (anger, anticipation, trust, etc.) for a single ticker.
- Emotion Comparison Line Graph: Plots each emotion across all tickers to identify which stock is highest in which emotion category.
- Emotion Distribution Bar Chart: Side-by-side comparison of emotion counts for each ticker.
These graphs deepen understanding of how the market is talking about the stock, not just whether sentiment is positive or negative.
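The word-to-emotion mapping can be illustrated with a tiny stand-in lexicon. The entries in `TOY_LEXICON` are invented and are not taken from the NRC Emotion Lexicon, which has to be obtained separately; `emotion_counts` is a hypothetical helper, and the sketch does no stemming or lemmatization.

```python
import re
from collections import Counter

# Hypothetical stand-in entries, NOT real NRC Emotion Lexicon data.
TOY_LEXICON = {
    "rally": {"joy", "positive"},
    "fears": {"fear", "negative"},
    "lawsuit": {"anger", "negative"},
}


def emotion_counts(headlines, lexicon):
    """Count how often each emotion label is triggered by the headlines."""
    counts = Counter()
    for headline in headlines:
        for word in re.findall(r"[a-z']+", headline.lower()):
            counts.update(lexicon.get(word, ()))  # unknown words add nothing
    return counts


# Invented headlines for illustration.
counts = emotion_counts(
    ["Shares rally as fears fade", "Lawsuit fears grow"],
    TOY_LEXICON,
)
print(dict(counts))
```

Running the function once per ticker gives the per-stock counts that a radar or bar chart needs.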
How to navigate HTML structures and extract specific tables/rows. Understanding how FinViz organizes its data.
Handling missing values, converting text dates, and structuring scraped data. Using groupby() and .unstack() for pivot-style analysis.
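A sketch of the cleaning step, assuming scraped dates arrive as text such as `Jan-05-24`. That format and the column names are assumptions, and the rows are invented. Unparseable dates are coerced to missing values and then dropped together with rows that lack a headline.

```python
import pandas as pd

# Invented raw rows, including one bad date and one missing headline.
raw = pd.DataFrame({
    "ticker": ["AAPL", "AAPL", "MSFT", "MSFT"],
    "date": ["Jan-05-24", "not-a-date", "Jan-04-24", "Jan-04-24"],
    "headline": ["Example one", "Example two", None, "Example four"],
})

clean = raw.copy()

# Convert text dates; anything that does not match the format becomes NaT.
clean["date"] = pd.to_datetime(clean["date"], format="%b-%d-%y", errors="coerce")

# Drop rows with a missing date or headline and renumber the index.
clean = clean.dropna(subset=["date", "headline"]).reset_index(drop=True)
print(clean)
```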
Understanding VADER's scoring system (neg, neu, pos, compound). How sentiment scores correlate with financial news headlines.
Creating grouped bar charts to compare sentiment over time. Understanding how aggregated sentiment differs per ticker.
Importance of user-agent headers. Handling potentially blocked requests.
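One possible way to handle blocked requests is to retry with a growing delay when the server answers 403 or 429. This is a sketch under assumptions, not the project's code: `fetch_with_retry`, its parameters, and the choice of status codes are all hypothetical.

```python
import time
import urllib.error
import urllib.request


def fetch_with_retry(url, opener=urllib.request.urlopen, retries=3,
                     delay=2.0, user_agent="Mozilla/5.0"):
    """Fetch a URL with a custom User-Agent, retrying on 403/429 responses.

    `opener` is injectable so the function can be exercised without network.
    """
    if retries < 1:
        raise ValueError("retries must be at least 1")
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    for attempt in range(1, retries + 1):
        try:
            with opener(request) as response:
                return response.read()
        except urllib.error.HTTPError as err:
            # Give up on other errors, or when the attempts are used up.
            if err.code not in (403, 429) or attempt == retries:
                raise
            time.sleep(delay * attempt)  # wait longer after each failure
```

Keeping the delay generous also reduces load on the site being scraped.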
- Add real-time updates using scheduled scrapers (cron / Airflow).
- Expand from FinViz to APIs like NewsAPI, Reddit, Twitter, etc.
- Build a dashboard using Streamlit or Dash.
- Add machine learning models to predict future price movement from sentiment.
- Integrate word clouds or topic modeling (LDA).
- Apply custom lexicons for finance-specific sentiment.
- Store data in SQL/NoSQL for historical tracking.
Here are the features I plan to add in future updates:
- Normalized Sentiment Score: Convert raw VADER compound values into a scaled or standardized metric for easier comparison.
- Sentiment & Emotion Graphs: Add multi-line graphs showing positive/negative/neutral sentiment trends and emotion breakdowns.
- Dataset-Based Analysis: Instead of only live scraping, add sentiment analysis using uploaded or external datasets for larger sample sizes.
- Indian Stock Market Support: Extend scraping and sentiment processing to NSE/BSE tickers. However, this is a project in and of itself and deserves a repository of its own: ➡️ Indian Stock News Sentiment
Figure 1: Average sentiment score per stock per day.
Figure 2: Treemap of sentiments per ticker.
Figure 3: Normalized average sentiment score per stock.
Figure 4: Distribution of emotions per stock.
Figure 5: Overlapping line graph of emotion strength of stocks.
Figure 6: Sentiment Heatmap per day per stock.
Figure 7: Average sentiment score per stock per day.
Figure 8: Average sentiment score per stock per day.
Figure 9: Average sentiment score per stock per day.
Sentiment analysis was also performed on a dataset instead of scraped data. The dataset is by Pratyush Puri on Kaggle: https://www.kaggle.com/datasets/pratyushpuri/financial-news-market-events-dataset-2025
Figure 1: Average normalized sentiment score per stock per day.
