TruthGuard

TruthGuard is a Python NLP project that classifies COVID-19 news articles, separating evidence-based reporting from conspiracy theories. It’s built to help combat misinformation and promote reliable information during the pandemic.

Demo

Generating a Prediction using Article URL

Generating a Prediction from Article Text

Source for text: Reuters

Tools Used:

Word2Vec model: for generating meaningful word embeddings.
scikit-learn: for training various machine learning models.
spaCy: for advanced text processing.
Pandas & Matplotlib: for data manipulation and visualization.
Chart.js: for visualizing prediction data.
Regular expressions: for cleaning and preparing the textual data.
Beautiful Soup: for parsing scraped web pages.
Newspaper3k: for extracting complete news articles.

Methodology

I started by using MediaBiasFactCheck to identify websites labeled as pro-science or conspiracy-themed. To gather data, I built a custom scraper with Beautiful Soup that pulled metadata from the latest COVID-19 articles on these sites.
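The metadata-scraping step can be sketched roughly as follows. This is a minimal illustration, not the project's actual scraper: the sample markup, the `extract_article_metadata` helper, the `example.com` URLs, and the choice of fields (title, URL, date, label) are all assumptions, and each real site would need its own selectors.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Hypothetical listing-page markup; real sites differ and need their own selectors.
SAMPLE_HTML = """
<html><body>
  <article>
    <h2><a href="/news/vaccine-trial-results">Vaccine trial results published</a></h2>
    <time datetime="2021-03-01">March 1, 2021</time>
  </article>
  <article>
    <h2><a href="https://example.com/news/mask-study">New mask study released</a></h2>
    <time datetime="2021-03-02">March 2, 2021</time>
  </article>
  <article><h2>Teaser without a link</h2></article>
</body></html>
"""


def extract_article_metadata(html, base_url, label):
    """Return one dict of title, URL, date, and class label per linked article."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for article in soup.find_all("article"):
        link = article.find("a", href=True)
        if link is None:
            continue  # skip entries that do not link to a full article
        time_tag = article.find("time")
        rows.append({
            "title": link.get_text(strip=True),
            "url": urljoin(base_url, link["href"]),  # resolve relative links
            "published": time_tag.get("datetime") if time_tag else None,
            "label": label,  # site-level label assigned to every article
        })
    return rows


metadata = extract_article_metadata(SAMPLE_HTML, "https://example.com", "pro-science")
```

Attaching the site-level label to each row at scrape time keeps the class information next to the article URL, so the later full-text retrieval does not need a separate lookup.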

Using Newspaper3k, I retrieved the full text of relevant articles. The data was then cleaned with spaCy and regular expressions: dates, links, and stop words were removed, and the remaining words were lemmatized to produce a more analyzable dataset.
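The regular-expression part of the cleaning step might look something like the sketch below. The patterns, the `clean_text` helper, and the date formats it recognizes are assumptions made for illustration. The tiny stop-word set is only a stand-in so the example runs with the standard library alone; the project relies on spaCy for stop words and lemmatization, and lemmatization is not reproduced here.

```python
import re

URL_RE = re.compile(r"(?:https?://|www\.)\S+")
# Matches numeric dates such as 2021-03-01, 3/1/2021, or 01.03.21 (an assumed set of formats).
DATE_RE = re.compile(r"\b\d{1,4}[-/.]\d{1,2}[-/.]\d{1,4}\b")
NON_LETTER_RE = re.compile(r"[^a-z\s]")

# Tiny stand-in stop-word list; the project relies on spaCy for this step.
STOP_WORDS = {"a", "an", "and", "at", "in", "of", "on", "the", "to", "was", "were"}


def clean_text(text):
    """Lowercase, drop links and numeric dates, strip non-letters, remove stop words."""
    text = text.lower()
    text = URL_RE.sub(" ", text)   # remove links first, since URLs may contain digits
    text = DATE_RE.sub(" ", text)  # then remove numeric dates
    text = NON_LETTER_RE.sub(" ", text)  # drop punctuation and leftover digits
    return [token for token in text.split() if token not in STOP_WORDS]


tokens = clean_text(
    "On 2021-03-01 the study was published at https://example.com/study, and cases fell."
)
# tokens == ["study", "published", "cases", "fell"]
```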

To capture the semantic meaning of each article, I applied the pre-trained Word2Vec Google News (300d) model, generating embeddings that reflected the nuanced language of news content.
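One common way to turn per-word Word2Vec vectors into a single article embedding is to average them. Whether TruthGuard averages or pools the vectors some other way is not stated here, so treat the following as a sketch under that assumption. The `document_vector` helper and the toy 4-dimensional vectors are hypothetical stand-ins for the 300-dimensional Google News model.

```python
import numpy as np

# Toy 4-dimensional vectors standing in for the 300-dimensional Google News model.
# With gensim, real vectors could be loaded via
# gensim.downloader.load("word2vec-google-news-300") (a large download).
TOY_VECTORS = {
    "vaccine": np.array([1.0, 0.0, 0.0, 0.0]),
    "trial": np.array([0.0, 1.0, 0.0, 0.0]),
    "hoax": np.array([0.0, 0.0, 1.0, 0.0]),
}


def document_vector(tokens, vectors, dim):
    """Average the vectors of in-vocabulary tokens; return zeros if none are known."""
    known = [vectors[token] for token in tokens if token in vectors]
    if not known:
        return np.zeros(dim)
    return np.mean(known, axis=0)


# "unseenword" is out of vocabulary and is skipped rather than raising an error.
embedding = document_vector(["vaccine", "trial", "unseenword"], TOY_VECTORS, dim=4)
```

Averaging gives every article a fixed-length vector regardless of its word count, which is what classical classifiers expect as input. The trade-off is that word order is discarded.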

Finally, I split the dataset into training and test sets and trained multiple scikit-learn models: Logistic Regression, Support Vector Machine, Linear Discriminant Analysis, Naive Bayes, and Decision Tree Classifier. Each model was evaluated to identify the most effective approach for accurately classifying articles.

TruthGuard stands as a testament to the power of combining advanced NLP techniques and machine learning to illuminate the truth in a world overwhelmed with misinformation.
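The model comparison described above can be sketched as follows. The data is synthetic (two Gaussian clusters standing in for article embeddings). The 80/20 split, the use of accuracy as the metric, the Gaussian Naive Bayes variant, the label encoding, and all hyperparameters are assumptions for illustration rather than the project's actual settings.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for article embeddings: two Gaussian clusters, one per class.
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(-1.0, 1.0, size=(100, 10)),
    rng.normal(1.0, 1.0, size=(100, 10)),
])
y = np.array([0] * 100 + [1] * 100)  # 0 = pro-science, 1 = conspiracy (assumed encoding)

# Hold out 20% of the data, keeping the class balance the same in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Support Vector Machine": SVC(),
    "Linear Discriminant Analysis": LinearDiscriminantAnalysis(),
    "Naive Bayes": GaussianNB(),  # Gaussian variant assumed; embeddings are real-valued
    "Decision Tree": DecisionTreeClassifier(random_state=0),
}

# Fit each model on the training split and score it on the held-out split.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = accuracy_score(y_test, model.predict(X_test))

best_model = max(scores, key=scores.get)
```

On real article embeddings, accuracy alone may hide class imbalance, so precision, recall, or a confusion matrix would be reasonable additions to the comparison.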

