This project analyzes political bias in mainstream media by classifying news articles from various outlets as right-leaning, centrist, or left-leaning. The goal is to assess potential bias in these news sources and provide insight into their political orientation. The implementation uses a transformer for sequence classification, specifically BERT fine-tuned with Low-Rank Adaptation (LoRA). Post-training quantization (PTQ) was then applied to reduce the model's size and inference cost.
- Classifies news articles as leaning left, right, or center
- Aims to help readers make informed choices by identifying bias in news sources
- Monitors bias trends on specific issues, helping ensure balanced reporting
- Useful for detecting potential misinformation or slanted perspectives
- Useful for researchers studying media bias, journalism, or political influence in the news
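To make the LoRA and PTQ ideas from the overview concrete, here is a minimal, dependency-free sketch. It is not the project's training code: the rank of 8, the 768 x 768 layer shape (the hidden size of BERT-base), and the symmetric per-tensor int8 scheme are all assumptions chosen for illustration.

```python
def lora_trainable_params(d_out, d_in, rank):
    """Count parameters in the LoRA factors B (d_out x rank) and A (rank x d_in)."""
    return rank * (d_out + d_in)


def quantize_int8(weights):
    """Symmetric per-tensor int8 quantization, so that w is roughly q * scale."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Map int8 codes back to approximate float weights."""
    return [v * scale for v in q]


# LoRA: compare a full 768 x 768 weight update with a rank-8 update (assumed rank).
full_params = 768 * 768
lora_params = lora_trainable_params(768, 768, rank=8)
print(full_params, lora_params)  # 589824 12288

# PTQ: quantize a few made-up weights and measure the round-trip error.
weights = [0.42, -1.27, 0.03, 0.9]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(q, max_error <= scale / 2 + 1e-12)
```

With these assumed numbers, the low-rank factors hold about 2% of the parameters of the full matrix, and each quantized weight is off by at most half a quantization step.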
Released on 15 July 2020, the POLUSA dataset contains 0.9 million political news articles, carefully balanced across different periods and news outlet popularity. It provides a valuable resource for analyzing political trends and biases in media. The dataset is available for download on Zenodo.org.
Link: https://zenodo.org/records/3946057/files/polusa_balanced.zip?download=1
Follow these steps to set up the project:
git clone https://github.com/iampujan/political_leaning_news_detection_backend.git
cd political_leaning_news_detection_backend
Install uv on macOS or Linux:
curl -LsSf https://astral.sh/uv/install.sh | sh
Install uv on Windows (PowerShell):
powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"
uv sync
Run the notebook sequentially in a Jupyter Notebook environment or a similar setup:
- Step 1: Download data using gdown or any alternative method.
- Step 2: Preprocess text data with tokenization, stopword removal, and bigram extraction.
- Step 3: Fine-tune a transformers model using the provided pipeline.
- Step 4: Evaluate model performance and generate classification reports.
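The preprocessing in Step 2 can be sketched with the standard library alone. This is an illustrative assumption about what tokenization, stopword removal, and bigram extraction might look like, not the notebook's actual code; the regex, the tiny stopword set, and the function names are all hypothetical.

```python
import re

# A deliberately small, illustrative stopword set; a real pipeline would
# likely use a fuller list from an NLP library.
STOPWORDS = {"a", "an", "and", "the", "of", "to", "in", "on", "is", "are", "for"}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def remove_stopwords(tokens):
    """Drop tokens that appear in the stopword set."""
    return [t for t in tokens if t not in STOPWORDS]


def bigrams(tokens):
    """Return adjacent token pairs in order."""
    return list(zip(tokens, tokens[1:]))


text = "The Senate passed the budget bill on Tuesday."
tokens = remove_stopwords(tokenize(text))
print(tokens)
# ['senate', 'passed', 'budget', 'bill', 'tuesday']
print(bigrams(tokens))
# [('senate', 'passed'), ('passed', 'budget'), ('budget', 'bill'), ('bill', 'tuesday')]
```

Because the bigrams here are built after stopword removal, a pair such as `('bill', 'tuesday')` spans a removed word. Extracting bigrams before filtering is an equally valid choice.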
To reproduce the results:
- Ensure the dependencies are installed as described in the Setup section.
- Follow the cell execution sequence in the notebook:
  - Data download and exploration.
  - Data preprocessing.
  - Model training and evaluation.
  - Save results and logs using mlflow.
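As a rough illustration of what the evaluation step reports, the sketch below computes per-class precision, recall, and F1 for a three-way classifier. The label strings, the sample predictions, and the function name `per_class_report` are made up for this example; the notebook may rely on a library routine instead.

```python
from collections import Counter

LABELS = ["left", "center", "right"]  # assumed label names


def per_class_report(y_true, y_pred, labels=LABELS):
    """Compute precision, recall, F1, and support for each label."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted label gets a false positive
            fn[truth] += 1  # true label gets a false negative
    report = {}
    for label in labels:
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        total = precision + recall
        f1 = 2 * precision * recall / total if total else 0.0
        report[label] = {
            "precision": precision,
            "recall": recall,
            "f1": f1,
            "support": actual,
        }
    return report


# Hypothetical ground truth and predictions for six articles.
y_true = ["left", "left", "center", "right", "right", "center"]
y_pred = ["left", "center", "center", "right", "left", "center"]

report = per_class_report(y_true, y_pred)
for label in LABELS:
    row = report[label]
    print(f"{label:>6}  p={row['precision']:.2f}  r={row['recall']:.2f}  "
          f"f1={row['f1']:.2f}  n={row['support']}")
```

Reading the metrics per class rather than relying on overall accuracy shows whether one leaning is systematically confused with another.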
uvicorn app.main:app --host 0.0.0.0 --port 8080
The server should start on http://localhost:8080
npm start
The server should start on http://localhost:3000
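To confirm that both servers are accepting connections, a small check like the following can help. It is a hedged sketch using only the standard library; the helper name `is_listening` is hypothetical, and it only tests that a TCP port is open, not that the application behind it is healthy.

```python
import socket


def is_listening(host, port, timeout=1.0):
    """Return True if a TCP connection to (host, port) succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Ports taken from the commands above.
for port in (8080, 3000):
    state = "up" if is_listening("localhost", port) else "not reachable"
    print(f"http://localhost:{port} is {state}")
```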
For any questions or clarifications, please contact Raza Mehar (raza.mehar@gmail.com), Pujan Thapa (iampujan@outlook.com), or Syed Najam Mehdi (najam.electrical.ned@gmail.com).