News Outlet Freedom Detection

There are many news organizations around the world, and they play a vital role in relaying important news from home and abroad to their readers. News stories often cover incidents that are positive, negative, or neutral. Moreover, some stories can be read as speaking for a country or against it.

Freedom of speech is the right to express one's ideas and opinions without censorship, restraint, or fear of retribution.

A news outlet is free if it can report news in an unbiased manner, free from censorship. In this project, we aim to detect whether a local news organization is free. To do so, we compare the sentiment and stance of its reporting with those of the international news agencies Reuters and Associated Press. A news outlet whose reporting correlates well with these international organizations is deemed to have freedom of the press, meaning it is free from censorship.

Table of Contents

Problem Statement
Data Collection
BERTopic
Sentiment and stance analysis with LLaMa-2
Case Studies
Acknowledgements

Problem Statement

This study investigates freedom of speech in local news across countries, examining topic-specific distinctions by comparing sentiment and stance scores with international sources to reveal correlations and assess agreement levels.

(back to top)

Data Collection

Source	Canada	Russia	China
Local	CBC and Global News	The Moscow Times	China Daily
International	Reuters and Associated Press

Used Selenium to search for and collect article URLs.
Employed news-please to fetch article data from the collected URLs.
Processed the raw data to get it ready for the text mining steps.
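
The fetch and clean-up steps above can be sketched as follows. This is a minimal illustration, not the project's actual code: `NewsPlease.from_url` is the news-please entry point for a single URL, while the record layout and the clean-up rules (whitespace collapsing, dropping empty bodies and duplicate URLs) are assumptions chosen for the example.

```python
import re
from typing import Dict, Iterable, List, Optional


def fetch_article(url: str) -> Dict[str, object]:
    """Download and parse one article with news-please (imported lazily)."""
    from newsplease import NewsPlease  # third-party: pip install news-please

    article = NewsPlease.from_url(url)
    if article is None:  # the download or the extraction failed
        return {"url": url, "title": None, "content": None, "published": None}
    return {
        "url": url,
        "title": article.title,
        "content": article.maintext,
        "published": article.date_publish,
    }


def clean_text(text: Optional[str]) -> str:
    """Collapse runs of whitespace; treat a missing field as empty."""
    return re.sub(r"\s+", " ", text or "").strip()


def prepare_articles(raw_articles: Iterable[dict]) -> List[dict]:
    """Drop duplicate URLs and articles without body text (hypothetical rules)."""
    seen_urls = set()
    prepared = []
    for raw in raw_articles:
        content = clean_text(raw.get("content"))
        if not content or raw["url"] in seen_urls:
            continue
        seen_urls.add(raw["url"])
        prepared.append(
            {**raw, "title": clean_text(raw.get("title")), "content": content}
        )
    return prepared
```

`prepare_articles` only needs dictionaries with `url`, `title`, and `content` keys, so it can be tried on hand-written records without any network access.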

(back to top)

BERTopic

I used BERTopic to perform topic modeling. BERTopic has 4 distinct phases, which I followed with a topic-naming step:

It uses a Sentence Transformer model to convert sentences into vector representations, which often have more than 256 dimensions.
To reduce the dimensionality of these vectors, BERTopic employs UMAP, which retains both the global and the local structure of the data.
Afterwards, the vectors are clustered using HDBSCAN, a hierarchical density-based algorithm.
Then, c-TF-IDF is used to get topic representations for each cluster.
Finally, I fed the top 10 representation words for each topic into ChatGPT to get a one-word custom topic name.
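
To make the c-TF-IDF step concrete, here is a small pure-Python sketch of class-based TF-IDF. It treats each cluster as one concatenated document and scores a word as its share of the cluster's tokens times log(1 + A / f), where f is the word's count over all clusters and A is the average number of tokens per cluster. The whitespace tokenizer and the function name `c_tf_idf` are simplifications for illustration; BERTopic's own implementation works on a vectorized bag-of-words matrix.

```python
import math
from collections import Counter
from typing import Dict, List


def c_tf_idf(docs_by_topic: Dict[str, List[str]], top_n: int = 10) -> Dict[str, List[str]]:
    """Return the top_n highest-weighted words for every topic."""
    # Treat every topic as one concatenated document and count its tokens.
    counts = {
        topic: Counter(" ".join(docs).lower().split())
        for topic, docs in docs_by_topic.items()
    }
    # f(x): how often each word occurs across all topics.
    total_per_word = Counter()
    for topic_counts in counts.values():
        total_per_word.update(topic_counts)
    # A: average number of tokens per topic.
    avg_tokens = sum(total_per_word.values()) / len(counts)

    top_words = {}
    for topic, topic_counts in counts.items():
        n_tokens = sum(topic_counts.values())
        scores = {
            word: (count / n_tokens) * math.log(1 + avg_tokens / total_per_word[word])
            for word, count in topic_counts.items()
        }
        # Highest score first; ties are broken alphabetically for stable output.
        ranked = sorted(scores, key=lambda w: (-scores[w], w))
        top_words[topic] = ranked[:top_n]
    return top_words
```

Words that are frequent inside one cluster but rare elsewhere rise to the top, which is why the top 10 words per topic are a reasonable input for the naming step.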

(back to top)

Sentiment and stance analysis with LLaMa-2

I used LLaMa-2, the open-source LLM from Meta, for sentiment and stance analysis. To do so, I first had to finetune the base version of a 6 billion parameter LLaMa-2 model.

Finetuning LLaMa-2

Engineered a prompt to get the best possible answer from an LLM. The prompt was tuned with Prompt Perfect.

 As a neutral news analyst, assess the sentiment and stance of the news article excerpt and assign a score between -1.0 (completely negative/against-{country}) and 1.0 (completely positive/pro-{country}) for both sentiment and stance. Provide a single short sentence to justify your scores, drawing on the article's language, tone, and presentation to support your analysis.

 Article Excerpt:
 - Title: "{title}"
 - Content: "{content}{dot}"

 Output format: 
 1. Sentiment: [Positive/Neutral/Negative]
     * Score: [Your Score]
     * Reason: [Your Reason] 
 2. Stance: [Pro-{country}/Impartial/Against-{country}]
     * Score: [Your Score]
     * Reason: [Your Reason]

Select 300 samples from the dataset to finetune the LLaMa-2 model. Fit each example into the prompt and feed it to ChatGPT. Save the answers from ChatGPT as the finetuning dataset.
Utilize Hugging Face's autotrain package to finetune LLaMa-2.
- Used QLoRA to enable training on a single GPU on Google Colab.
- Used PEFT (Parameter-Efficient Fine-Tuning) to reduce training time.
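
The prompt-filling and dataset-building steps could look roughly like the sketch below. Several details are assumptions made for illustration: the template is abbreviated to the part that carries the placeholders, `{dot}` is taken to be a full stop appended when the excerpt lacks end punctuation, `max_chars` is a hypothetical truncation limit, `ask_teacher` stands in for the ChatGPT call, and storing each example in a single `text` field is a guess at what the trainer expects.

```python
import random
from typing import Callable, Dict, List

# Abbreviated version of the prompt above; only the placeholders matter here.
PROMPT_TEMPLATE = (
    "As a neutral news analyst, assess the sentiment and stance of the news "
    "article excerpt and assign a score between -1.0 (completely "
    "negative/against-{country}) and 1.0 (completely positive/pro-{country}) "
    "for both sentiment and stance.\n\n"
    "Article Excerpt:\n"
    '- Title: "{title}"\n'
    '- Content: "{content}{dot}"\n'
)


def build_prompt(title: str, content: str, country: str, max_chars: int = 1000) -> str:
    """Fill the template for one article (max_chars is a hypothetical limit)."""
    excerpt = content.strip()[:max_chars].rstrip()
    # Assumption: {dot} closes an excerpt that does not end a sentence.
    dot = "" if excerpt.endswith((".", "!", "?")) else "."
    return PROMPT_TEMPLATE.format(
        country=country, title=title.strip(), content=excerpt, dot=dot
    )


def build_finetuning_set(
    articles: List[Dict[str, str]],
    country: str,
    ask_teacher: Callable[[str], str],
    n_samples: int = 300,
    seed: int = 42,
) -> List[Dict[str, str]]:
    """Sample articles and pair each filled prompt with the teacher's answer."""
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    sample = rng.sample(articles, min(n_samples, len(articles)))
    rows = []
    for article in sample:
        prompt = build_prompt(article["title"], article["content"], country)
        rows.append({"text": prompt + "\n" + ask_teacher(prompt)})
    return rows
```

Passing the teacher model as a callable keeps the sketch free of any API client, so the sampling and formatting logic can be checked with a stub before spending API calls.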

Sentiment and Stance Analysis

Use the finetuned model to run inference on the collected data.
Parse the responses to get the sentiment and stance classes and scores for each article.
Perform hypothesis testing to arrive at conclusions.
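
The parsing step can be illustrated with a regular expression that follows the output format requested in the prompt. This is a sketch under the assumption that the model sticks to that format, with or without the square brackets; the function name `parse_response` and the clamping of scores to [-1.0, 1.0] are choices made for the example.

```python
import re
from typing import Dict, Union

# One section looks like:
#   1. Sentiment: Negative
#       * Score: -0.6
_SECTION = re.compile(
    r"(?P<name>Sentiment|Stance):\s*\[?(?P<label>[^\]\n]+?)\]?\s*\n"
    r"\s*\*\s*Score:\s*\[?(?P<score>[-+]?\d*\.?\d+)\]?",
    re.IGNORECASE,
)


def parse_response(text: str) -> Dict[str, Dict[str, Union[str, float]]]:
    """Extract the class label and score for sentiment and stance."""
    parsed = {}
    for match in _SECTION.finditer(text):
        score = float(match["score"])
        parsed[match["name"].lower()] = {
            "label": match["label"].strip(),
            "score": max(-1.0, min(1.0, score)),  # clamp to the prompt's range
        }
    return parsed
```

A response that does not follow the format yields an empty dictionary, which makes malformed generations easy to count and filter out before testing.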

Test Name	Parameter of Interest	Null Hypothesis
Welch Test	Mean	Both sources on average report news with the same score.
Wilcoxon Test	Median	Both sources on average report news with the same score.
F-test	Variance	News from sources have similar variance across sentiment and/or stance.
Pearson’s Test	Linear Correlation	Sentiment and/or stance of reported news from sources aren’t correlated.
Spearman’s Test	Monotonic Relationship	Sentiment and/or stance of reported news from sources aren’t correlated.
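
For reference, the statistics behind four of these tests can be computed with the standard library alone, as sketched below. The sketch stops at the test statistics: p-values and the Wilcoxon test would normally come from a statistics package such as SciPy. How local and international scores are paired for the correlation tests is not stated here, so the paired inputs to `pearson_r` and `spearman_rho` (for example, per-topic mean scores) are an assumption.

```python
import math
from statistics import mean, variance
from typing import List, Sequence, Tuple


def welch_t(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float]:
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    va, vb = variance(a) / len(a), variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


def f_ratio(a: Sequence[float], b: Sequence[float]) -> float:
    """Ratio of sample variances, the statistic of the two-sample F-test."""
    return variance(a) / variance(b)


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson's linear correlation coefficient for paired observations."""
    mx, my = mean(x), mean(y)
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sx = math.sqrt(sum((xi - mx) ** 2 for xi in x))
    sy = math.sqrt(sum((yi - my) ** 2 for yi in y))
    return cov / (sx * sy)


def ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rank correlation: Pearson's r computed on the ranks."""
    return pearson_r(ranks(x), ranks(y))
```

Welch's variant is used for the means because it does not assume that local and international scores share the same variance, which is itself one of the hypotheses under test.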

(back to top)

Case Studies

Detailed case studies about China, Russia, and Canada can be found here.

Acknowledgements

As a graduate student at the University of Rochester, I am greatly indebted to my teachers for equipping me with the knowledge required for the analytical and technical aspects of this project. In particular,

I would like to express my gratitude to Professor Jiebo Luo for his invaluable guidance throughout the Data Mining course. The knowledge and insights I gained from this course have been instrumental in processing the accumulated news corpus and performing topic modelling using BERTopic. I am thrilled to see how the techniques I learned from the course can be applied in real-world scenarios.
I would like to extend my sincere appreciation to Professor Anson Kahng for his invaluable guidance throughout the Computational Introduction to Statistics course. The coursework provided me with the necessary tools to design and carry out hypothesis tests to find statistically significant distinctions between local and international news. I am grateful for the opportunity to apply the knowledge I gained from the course in real-world scenarios.
I would like to extend my sincere appreciation to Professor Hangfeng He for his invaluable guidance throughout the Natural Language Processing course. The course provided me with a comprehensive understanding of the world of LLMs and armed me with the knowledge required to utilize LLaMa-2 for this project. I am grateful for the opportunity to apply the knowledge I gained from the course in real-world scenarios.

(back to top)

Shakleen/News-Outlet-Freedom-Detection
