Fake News Real News Analysis

I performed Fake news analysis, Real news analysis and built classification model using MultinomialNB, Logistic Regression, Random Forest and LSTM with the TF-IDF vectorizer for fake and real news. I have used two datasets for this work, Fake News and Real News. This data are from October to December 2016. Here, I analyzed both news types and built the classification model by combining both.

Built a web app by deploying the LSTM model to AWS using dash and flask: http://dash-env-dev.us-east-1.elasticbeanstalk.com/

Fake News Analysis

There are different fake news types(bias, conspiracy, hate, satire, etc.) in which bias, conspiracy, and hate are the most frequent
Most of the news articles belong to US country
In content, most frequently used words in the single word list are Trump, Clinton, People, and Election
In the title, most frequently used words in single word list are Trump, Hillary, Clinton, Election, Russia, and Obama
Check news types(bias, conspiracy, hate, satire, etc.) for words, Trump, Clinton, Obama, Election, Putin, and Russia
- Found that Trump and Clinton words appear most of the time in bias, conspiracy, and hate news types, however, Trump word appears a lot in bias news type around 138 articles and Clinton word appears a lot in conspiracy news type, around 66 articles
- Obama, Election, Putin, Russia words appear in the bias, conspiracy, and hate news types, but in a few articles
Most frequent bigram words: hillary clinton, donald trump, onion america, america finest, clinton campaign, etc
Most frequent trigram words: onion america finest, dakota access pipeline, syrian report november, etc
Top publications for fake news are presstv, greanvillepost, scott, and many others

Real News Analysis

In content, most frequently used words are trump, people, clinton, and president
In the title, most frequently seen words are trump, clinton, election, obama, and russia
Most frequent bigram words: donald trump, hillary clinton, hurricane matthew, white house, and north carolina
Most frequent trigram words: atlanic politics policy, politics policy daily, and dakota access pipeline
Top publications: Reuters, NPR, Washington Post, Guardian, CNN, and New York Times

Feature Engineering

Added a few columns such as title and content length, number of capital letters of title and content, number of punctuation in title and content

Text Pre-processing

Removed special characters and numbers from text
Lowering the text
Removed stopwords
Tokenization: convert sentences to words
Lemmatization: An approach to remove inflection by determining the part of speech and utilizing detailed database of the language

Classification

Compared models for classification: MultinominialNB, Logistic Regression, Random Forests and LSTM. LSTM performs well, it has 99% accuracy rate for training set and 93% accuracy rate for testing set.

Name		Name	Last commit message	Last commit date
Latest commit History 28 Commits
Images		Images
dash_flask_web_app		dash_flask_web_app
README.md		README.md
data_cleaning.ipynb		data_cleaning.ipynb
fake_news_analysis.ipynb		fake_news_analysis.ipynb
fake_news_classification_using_LSTM.ipynb		fake_news_classification_using_LSTM.ipynb
fake_real_news_classification.ipynb		fake_real_news_classification.ipynb
kmeans_clustering_with_standardize.ipynb		kmeans_clustering_with_standardize.ipynb
real_news_analysis.ipynb		real_news_analysis.ipynb

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Fake News Real News Analysis

Fake News Analysis

Real News Analysis

Feature Engineering

Text Pre-processing

Classification

About

Releases

Packages

Languages

SonalSavaliya/Fake-News-Real-News-classification

Folders and files

Latest commit

History

Repository files navigation

Fake News Real News Analysis

Fake News Analysis

Real News Analysis

Feature Engineering

Text Pre-processing

Classification

About

Topics

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages