This project predicts companies' ESG (Environmental, Social, Governance) scores from stock prices and news sentiment and, conversely, predicts stock prices from ESG ratings and news sentiment. The goal is to explore how financial performance and public perception interact through machine learning.
News data is collected using a Google News scraper built with Node.js and Puppeteer. Articles are gathered for multiple companies based on search queries and stored in JSON format.
Environmental, Social, Governance and total ESG scores are extracted from Yahoo Finance using custom scripts.
Historical stock prices for each company are downloaded from Yahoo Finance and consolidated into a unified dataset.
The news pipeline consists of several stages:
- News Reader – Converts raw article JSON files into structured DataFrames
- Title Classification – Classifies news headlines into four categories: government, social, environment and neutral
- Sentiment Analysis – Applies multiple models to label headlines as positive or negative
These outputs are stored and reused in later stages.
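The News Reader stage can be sketched as follows. This is a minimal illustration and not the project's actual code: the field names `title` and `date`, the one-file-per-company layout and the function names `articles_to_frame` and `read_news` are all assumptions.

```python
import json
from pathlib import Path

import pandas as pd


def articles_to_frame(articles, company):
    """Turn a list of raw article dicts into a structured DataFrame.

    Assumes (hypothetically) that each article has 'title' and 'date' keys.
    """
    frame = pd.DataFrame(articles, columns=["title", "date"])
    frame["company"] = company
    # Unparseable dates become NaT instead of raising.
    frame["date"] = pd.to_datetime(frame["date"], errors="coerce")
    # Headlines are required by the later classification stages.
    return frame.dropna(subset=["title"]).reset_index(drop=True)


def read_news(json_dir):
    """Load every *.json file in json_dir, assuming one file per company."""
    frames = []
    for path in sorted(Path(json_dir).glob("*.json")):
        with open(path, encoding="utf-8") as fh:
            articles = json.load(fh)  # assumed: a JSON list of article dicts
        frames.append(articles_to_frame(articles, company=path.stem))
    if not frames:
        return pd.DataFrame(columns=["title", "date", "company"])
    return pd.concat(frames, ignore_index=True)
```

Keeping the per-file conversion separate from the directory walk makes the parsing step easy to check on a handful of in-memory records before running it over the full scrape.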
Multiple datasets are merged to form a comprehensive final dataset:
- News metadata, topic classification and sentiment scores
- Daily stock prices per company
- Individual E, S, G ratings and total ESG scores
The result is a per-news, per-company dataset containing sentiment features, financial data and ESG indicators, which serves as input for modeling.
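A merge of this kind might look like the sketch below, using pandas. The column names (`company`, `date`, `close`, `total_esg` and so on) and the function name `build_final_dataset` are assumptions chosen for illustration; the project's real schema may differ.

```python
import pandas as pd


def build_final_dataset(news, prices, esg):
    """Join news rows with same-day prices and per-company ESG scores.

    Assumed inputs (hypothetical schema):
      news   : one row per article, with 'company' and 'date'
      prices : one row per company and trading day, with 'close'
      esg    : one row per company, with the E, S, G and total scores
    """
    # Normalise both date columns to midnight so they compare equal.
    news = news.assign(date=pd.to_datetime(news["date"]).dt.normalize())
    prices = prices.assign(date=pd.to_datetime(prices["date"]).dt.normalize())

    # Left joins keep every news row; 'validate' raises if the right-hand
    # table has duplicate keys, which would silently multiply news rows.
    merged = news.merge(
        prices, on=["company", "date"], how="left", validate="many_to_one"
    )
    merged = merged.merge(esg, on="company", how="left", validate="many_to_one")
    return merged
```

With a left join, articles published on non-trading days keep a missing price rather than being dropped, so how to fill or filter those rows remains an explicit modeling decision.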
Several XGBoost models are developed for different prediction tasks.
For ESG prediction, the models estimate Environmental, Social, Governance and total ESG scores using:
- Stock prices
- Percentages of positive and negative news across government, environment and social categories
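The percentage features could be derived along these lines. This sketch assumes the percentage is taken within each category (positive and negative shares of a company's headlines in that category sum to 100); the source does not state the denominator, so that choice, the column names and the `sentiment_percentages` helper are all hypothetical.

```python
import pandas as pd

CATEGORIES = ["government", "environment", "social"]  # neutral is excluded
SENTIMENTS = ["positive", "negative"]


def sentiment_percentages(news):
    """Per company: % of positive and negative headlines in each category.

    Expects (hypothetical) columns 'company', 'category' and 'sentiment'.
    """
    kept = news[news["category"].isin(CATEGORIES)]
    counts = (
        kept.groupby(["company", "category", "sentiment"])
        .size()
        .unstack("sentiment", fill_value=0)
        .reindex(columns=SENTIMENTS, fill_value=0)
    )
    # Share of each sentiment within one (company, category) group.
    pct = counts.div(counts.sum(axis=1), axis=0) * 100
    # One row per company, one column per (sentiment, category) pair.
    wide = pct.unstack("category")
    wide.columns = [f"pct_{sent}_{cat}" for sent, cat in wide.columns]
    return wide.reset_index()
```

A company with no headlines in a category gets missing values for that category's columns. XGBoost handles missing values natively, though filling them with zero is another reasonable option.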
For stock price prediction, two approaches are explored:
- Predicting stock price using previous-day stock values, ESG ratings and news sentiment
- Predicting stock price without previous-day stock values, relying only on ESG and news-based features, to reduce potential overfitting
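The difference between the two approaches comes down to the feature list passed to the model. The sketch below assumes a daily frame with one row per company and date, and uses hypothetical column names (`close`, `prev_close`, `pct_*`, the ESG score columns) and hyperparameters; it is an illustration, not the project's configuration.

```python
import pandas as pd

# Hypothetical ESG column names.
ESG_COLS = ["environment_score", "social_score", "governance_score", "total_esg"]


def add_previous_close(daily):
    """Add each company's previous-day closing price as 'prev_close'."""
    daily = daily.sort_values(["company", "date"]).copy()
    # Shift within each company so prices never leak across companies.
    daily["prev_close"] = daily.groupby("company")["close"].shift(1)
    # The first day of each company has no previous price.
    return daily.dropna(subset=["prev_close"])


def feature_sets(daily):
    """Return the feature lists for the two stock-price models."""
    news_cols = [c for c in daily.columns if c.startswith("pct_")]
    base = ESG_COLS + news_cols
    return {
        "with_prev_close": ["prev_close"] + base,
        "esg_news_only": base,
    }


def fit_price_model(daily, features, target="close"):
    """Fit an XGBoost regressor on the chosen features (illustrative settings)."""
    from xgboost import XGBRegressor  # imported lazily; requires xgboost

    model = XGBRegressor(n_estimators=200, max_depth=4, learning_rate=0.05)
    model.fit(daily[features], daily[target])
    return model
```

Because consecutive daily prices are highly correlated, a model given `prev_close` can score well by mostly copying it. Comparing both feature sets on held-out dates is one way to see how much signal the ESG and news features carry on their own.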
This project demonstrates an end-to-end pipeline combining web scraping, NLP-based news analysis, feature engineering and XGBoost modeling to study the relationship between ESG ratings, news sentiment and stock performance. It provides a framework for analyzing how sustainability indicators and public narratives can influence financial markets.