Arabic Sentiment Analysis

LLMs know everything, but don't understand anything.

- Omar Yasser

This project performs sentiment analysis on company reviews written in various Arabic dialects.

Preprocessing

Data Cleansing: Removed null entries and duplicate reviews.
Text Normalization: Stripped punctuation, digits, and special characters so that only the words themselves remain.
Diacritic Handling: Removed diacritics and normalized Arabic character variants to reduce variability in how the same word is written.
Language Homogenization: Translated the few non-Arabic words into Arabic to keep the text in a single language.
Emoji Mapping: Mapped emojis, which often convey strong sentiment, to their textual meanings.
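
The steps above can be sketched with the Python standard library alone. This is a minimal illustration under stated assumptions, not the project's actual code: the function names, the two-entry emoji table, and the choice of which character variants to unify (alef forms and alef maqsura) are all hypothetical. The translation of non-Arabic words is not shown; in this sketch they are simply dropped with the other non-Arabic characters.

```python
import re

# Hypothetical emoji-to-text table (assumption); a real mapping would be much larger.
EMOJI_TO_TEXT = {
    "😍": "حب",
    "😡": "غضب",
}

# Tashkeel marks (U+064B..U+0652), superscript alef (U+0670), and tatweel (U+0640).
DIACRITICS = re.compile("[\u064B-\u0652\u0670\u0640]")

# Assumed normalization choices: unify alef variants and alef maqsura.
CHAR_MAP = str.maketrans({
    "\u0623": "\u0627",  # alef with hamza above -> bare alef
    "\u0625": "\u0627",  # alef with hamza below -> bare alef
    "\u0622": "\u0627",  # alef with madda -> bare alef
    "\u0649": "\u064A",  # alef maqsura -> yeh
})

# Anything that is not an Arabic letter (U+0621..U+064A) or whitespace.
NON_ARABIC = re.compile("[^\u0621-\u064A\\s]")


def normalize(text):
    """Apply emoji mapping, diacritic removal, and character normalization."""
    # Map emojis first, otherwise the character filter below would delete them.
    for emoji, word in EMOJI_TO_TEXT.items():
        text = text.replace(emoji, f" {word} ")
    text = DIACRITICS.sub("", text)
    text = text.translate(CHAR_MAP)
    # Drop punctuation, digits, and any other non-Arabic characters.
    text = NON_ARABIC.sub(" ", text)
    return " ".join(text.split())


def clean_dataset(reviews):
    """Drop nulls and exact duplicates, then normalize what remains."""
    unique = dict.fromkeys(r for r in reviews if r)  # keeps first-seen order
    normalized = (normalize(r) for r in unique)
    return [r for r in normalized if r]
```

The order matters: emojis are replaced before the non-Arabic filter runs, since the filter would otherwise remove them along with punctuation and digits.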

Models

Four models were implemented:

Fine-tuned AraBERT: A pretrained AraBERT model fine-tuned on our dataset.
Transformer from Scratch: A Transformer built from the ground up to better understand its architecture.
LSTM
Bidirectional LSTM: An LSTM that reads each sequence in both the forward and backward directions.
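
To make the bidirectional idea concrete, here is a toy sketch in plain Python. It deliberately swaps the LSTM cell for a one-unit tanh recurrent cell so that it runs without any deep learning library; the function names and weight values are illustrative assumptions, not taken from the project's notebooks. It only shows how a forward pass and a backward pass are combined per position.

```python
import math


def run_rnn(inputs, w_in, w_rec):
    """Run a one-unit tanh recurrent cell (a stand-in for an LSTM cell).

    Returns the hidden state after each input in the sequence.
    """
    state = 0.0
    states = []
    for x in inputs:
        state = math.tanh(w_in * x + w_rec * state)
        states.append(state)
    return states


def run_bidirectional(inputs):
    """Pair each position's forward state with its backward state."""
    # The forward pass sees the sequence left to right.
    forward = run_rnn(inputs, w_in=0.5, w_rec=0.8)
    # The backward pass has its own weights and sees the sequence right to
    # left; reversing its output re-aligns the states with input positions.
    backward = run_rnn(inputs[::-1], w_in=0.4, w_rec=0.7)[::-1]
    return list(zip(forward, backward))
```

At each position the forward state summarizes everything up to that token and the backward state summarizes everything after it, so a classifier reading the pair has context from both sides.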

Results

Our team won a university-wide Kaggle competition on Arabic sentiment analysis, out of more than 100 teams. Our model achieved 87.5% accuracy, outperforming the second-best team by 2%.


Omar-Yasser/arabic-sentiment-analysis
