NewsLies

Arabic Fake News Detection using LSTM and AraBERT

This repository contains a project aimed at detecting fake news in Arabic using advanced Natural Language Processing (NLP) techniques. The project leverages the Arabic Fake News Dataset (AFND) and builds a deep learning model using Long Short-Term Memory (LSTM) networks and AraBERT for text classification.
This model is still under development.

Dataset

The dataset used in this project is the Arabic Fake News Dataset (AFND) from Kaggle. It contains over 600,000 public Arabic news articles scraped from 134 different Arabic news websites. The articles are classified into three categories: credible, not credible, and undecided.

Dataset Structure

sources.json: Contains 134 lines corresponding to 134 public Arabic news websites. The URLs of the websites are anonymized as "source_1", "source_2", etc.
Dataset Directory: Contains 134 sub-directories named after the anonymous sources. Each sub-directory has a scraped_articles.json file, which stores the title, text, and publication date of the articles from that source.
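A minimal sketch of reading this layout into (source, text, label) rows, assuming the dataset sits under data/ as in the setup steps. The JSON schemas are assumptions that this README does not confirm: sources.json is taken to map each anonymized source name (e.g. "source_1") to its credibility label, and each scraped_articles.json is taken to hold an "articles" list whose items carry "title" and "text" keys. The function name load_afnd is hypothetical; adjust the keys to match the files you actually download.

```python
import json
from pathlib import Path
from typing import Iterator, Tuple


def load_afnd(root: str = "data") -> Iterator[Tuple[str, str, str]]:
    """Yield (source_name, article_text, label) for every article.

    Assumed layout (hypothetical schema, verify against the real files):
      <root>/sources.json                            -> {"source_1": "credible", ...}
      <root>/Dataset/<source>/scraped_articles.json  -> {"articles": [{"title": ..., "text": ...}, ...]}
    """
    root_path = Path(root)
    with open(root_path / "sources.json", encoding="utf-8") as f:
        labels = json.load(f)  # assumed: dict of source name -> label

    for source, label in labels.items():
        article_file = root_path / "Dataset" / source / "scraped_articles.json"
        if not article_file.exists():
            continue  # skip sources that have no scraped_articles.json file
        with open(article_file, encoding="utf-8") as f:
            articles = json.load(f).get("articles", [])
        for article in articles:
            # Join title and body; either field may be missing in scraped data
            text = f"{article.get('title', '')} {article.get('text', '')}".strip()
            yield source, text, label
```

Because it is a generator, `for source, text, label in load_afnd("data"): ...` streams the 600,000+ articles without first building one large list in memory.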

Creators of the Dataset:

Ashwaq Khalil
Moath Jarrah
Monther Aldwairi

Project Structure

The entire project, including data preprocessing, model building, training, and evaluation, is contained within a single Jupyter notebook:

NewsLies.ipynb: This notebook includes:
- Data Preprocessing: Advanced text preprocessing including text normalization, stopword removal, and stemming using Farasa and ISRIStemmer.
- Model Definition: LSTM-based model architecture with AraBERT embeddings and an attention mechanism for improved classification.
- Model Training: Training the model on the Arabic Fake News Dataset, along with evaluation of its performance.
- Inference: Running the trained model on new Arabic news articles to classify them.
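The normalization and stopword-removal steps listed above can be sketched in plain Python. This is an illustrative approximation, not the notebook's code: the Unicode ranges, the alef / ta marbuta / alef maqsura mappings, and the five-word STOPWORDS set are assumptions chosen for the example, and the Farasa/ISRIStemmer stemming step is left out so the sketch has no dependencies.

```python
import re
from typing import List

# Tashkeel: fathatan..sukun (U+064B-U+0652) plus superscript alef (U+0670)
DIACRITICS = re.compile(r"[\u064B-\u0652\u0670]")
TATWEEL = "\u0640"  # kashida used for elongation; carries no meaning
# Anything outside the basic Arabic letter block or whitespace is treated as "special"
NON_ARABIC = re.compile(r"[^\u0621-\u064A\s]")

# Hypothetical, deliberately tiny stopword list for illustration only
STOPWORDS = {"في", "من", "على", "إلى", "عن"}


def normalize_arabic(text: str) -> str:
    text = DIACRITICS.sub("", text)
    text = text.replace(TATWEEL, "")
    text = re.sub(r"[\u0622\u0623\u0625]", "\u0627", text)  # alef variants -> bare alef
    text = text.replace("\u0649", "\u064A")  # alef maqsura -> ya
    text = text.replace("\u0629", "\u0647")  # ta marbuta -> ha
    text = NON_ARABIC.sub(" ", text)  # punctuation, digits, Latin -> space
    return " ".join(text.split())  # collapse runs of whitespace


def preprocess(text: str) -> List[str]:
    # Normalize the stopwords the same way, so the listed "إلى" matches its normalized form "الي"
    stop = {normalize_arabic(w) for w in STOPWORDS}
    return [tok for tok in normalize_arabic(text).split() if tok not in stop]
```

In the full pipeline each remaining token would then go through a stemmer, for example NLTK's `ISRIStemmer().stem(token)` or Farasa.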

Setup and Installation

Clone the repository:

```
git clone https://github.com/Assem-ElQersh/NewsLies.git
cd NewsLies
```

Install the required packages:
```
pip install -r requirements.txt
```
Download the dataset from Kaggle and place it in the data/ directory.
Open the Jupyter Notebook:
```
jupyter notebook NewsLies.ipynb
```
Run the cells in the notebook sequentially to preprocess the data, train the model, and perform inference.

Model Details

Preprocessing: The text is normalized, diacritics and special characters are removed, stopwords are filtered out, and stemming is performed using Farasa and ISRIStemmer.
Embedding Layer: AraBERT is used to generate contextual embeddings for Arabic text.
LSTM Layers: The model contains multiple LSTM layers to capture sequential dependencies between tokens in the text.
Attention Mechanism: An attention layer is added to focus on the most important parts of the text.
Output Layer: A softmax layer outputs a probability for each of the three classes (credible, not credible, undecided).
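The attention and output layers can be illustrated with a small, framework-free sketch. It assumes simple dot-product attention pooling: each LSTM hidden state is scored against a learned context vector, the scores are softmax-normalized into weights, and the weighted sum of hidden states is projected to three class logits and passed through a softmax. The exact attention variant used in the notebook is not stated here, and the names attention_pool and classify, as well as any toy weights, are hypothetical.

```python
import math
from typing import List, Sequence

Vector = List[float]


def softmax(xs: Sequence[float]) -> Vector:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(hidden: Sequence[Vector], context: Vector) -> Vector:
    """Weight each timestep's LSTM output by its dot product with a learned context vector."""
    scores = [sum(h_i * c_i for h_i, c_i in zip(h, context)) for h in hidden]
    weights = softmax(scores)  # one weight per timestep, summing to 1
    dim = len(hidden[0])
    return [sum(w * h[d] for w, h in zip(weights, hidden)) for d in range(dim)]


def classify(hidden: Sequence[Vector], context: Vector,
             out_weights: Sequence[Vector], out_bias: Vector) -> Vector:
    """Attention pooling, then a dense layer and a softmax over the classes."""
    pooled = attention_pool(hidden, context)
    logits = [sum(w_d * p_d for w_d, p_d in zip(row, pooled)) + b
              for row, b in zip(out_weights, out_bias)]
    return softmax(logits)
```

With three rows in out_weights, `classify(...)[i]` is the probability of class i, for example in the order (credible, not credible, undecided) if the labels are encoded that way. In a real model the context vector and output weights are trained jointly with the LSTM rather than fixed by hand.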

Evaluation

The model is evaluated on the test set using accuracy, precision, recall, and F1-score. A confusion matrix is also provided for a detailed view of the model's performance.
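For reference, all of these metrics follow from the confusion matrix, as in the plain-Python sketch below. The function names are hypothetical, and this README does not say how the notebook computes them; in practice scikit-learn's `confusion_matrix` and `classification_report` are the usual choice.

```python
from typing import Dict, List, Sequence

LABELS = ["credible", "not credible", "undecided"]


def confusion_matrix(y_true: Sequence[str], y_pred: Sequence[str],
                     labels: Sequence[str] = LABELS) -> List[List[int]]:
    """Rows are true classes, columns are predicted classes."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


def per_class_metrics(matrix: List[List[int]],
                      labels: Sequence[str] = LABELS) -> Dict[str, Dict[str, float]]:
    results = {}
    for i, label in enumerate(labels):
        tp = matrix[i][i]
        fp = sum(matrix[r][i] for r in range(len(labels))) - tp  # predicted i, truly another class
        fn = sum(matrix[i]) - tp  # truly i, predicted another class
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        results[label] = {"precision": precision, "recall": recall, "f1": f1}
    return results


def accuracy(matrix: List[List[int]]) -> float:
    total = sum(sum(row) for row in matrix)
    return sum(matrix[i][i] for i in range(len(matrix))) / total if total else 0.0
```

Averaging the per-class F1 values gives the macro F1, which weights the three classes equally regardless of how many articles each contains.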

Results

Accuracy: The model reaches high accuracy in detecting fake news articles; the exact figures, together with precision, recall, and F1-score, are reported in the notebook's evaluation output.
Confusion Matrix: Visualizes the performance across all three classes (credible, not credible, undecided).

License

This project is licensed under the MIT License; see the LICENSE file for details.
The dataset used in this project does not specify a license. Please review the usage policies set by the dataset creators on Kaggle.

Acknowledgements

Special thanks to the dataset creators, Ashwaq Khalil, Moath Jarrah, and Monther Aldwairi, for making the Arabic Fake News Dataset available for research and development.

Contributions

Contributions are welcome! Feel free to open an issue or submit a pull request.
