
The aim of this Bachelor project is to develop a new approach to Arabic (Egyptian-dialect) Sentiment Analysis, Forecasting, and Topic Modeling using Machine Learning, Deep Learning, and Transformers!


omar-sherif9992/Dialect-LLM-Bachelor-Project


Hi, I am Eng. Omar Sherif Ali

Computer Science & Engineering

"Language is the bridge that connects minds, and NLP is the compass guiding us to understand and unlock its infinite potential."

Welcome to my Bachelor Thesis


Egyptian Tweet Sentiment Analysis, Forecasting, and Topic Modeling


📄 View the Thesis »
· Presentation · Demo Video · Report Bug · Be a Contributor

Model architecture

💡 Description

In recent years, social media platforms have become an increasingly popular place for people to express their thoughts and opinions on various topics and situations. Monitoring sentiment and understanding how topics evolve is crucial for governments, which need to identify negative sentiment and respond promptly.

In this study, we developed a sentiment analysis ensemble architecture consisting of four Transformers: MARBERT, CAMeLBERT-DA, CAMeLBERT-Mix, and Alanzi. The ensemble is followed by a final classification layer, a three-layer feed-forward neural network. A formula is applied to the logits produced by each transformer, and the resulting logits are summed before the softmax activation function is applied. On the sentiment analysis test dataset, the model achieved a test accuracy of 83%.

To analyze temporal trends in sentiment, we applied an LSTM model with a sliding window to time-stamped English-language tweets about news events that generated significant discussion on social media. These tweets were translated into Arabic by a translation model. We then plotted the sentiment arc and compared our results with the original results.

We also explored the effectiveness of BERTopic, a topic modeling technique, in comparison to LDA and NMF. Using various pre-trained Arabic language models as embeddings, we conducted topic modeling and aspect-based analysis. BERTopic and NMF achieved comparable, competitive results, showing that both capture meaningful topics effectively, whereas LDA performed poorly at generating coherent and informative topics. These findings indicate that BERTopic and NMF outperform LDA on topic modeling tasks.
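The fusion step described above (each transformer's logits are transformed, summed, then passed through softmax) can be sketched in NumPy. This is a minimal illustration under stated assumptions, not the project's code: the per-model formula is not given here, so a hypothetical scalar weight per model stands in for it, the three-layer feed-forward head is omitted, and three sentiment classes are assumed.

```python
import numpy as np


def softmax(z: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over the last axis."""
    shifted = z - z.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


def fuse_logits(per_model_logits, weights):
    """Transform each model's logits, sum them, then apply softmax.

    per_model_logits: list of arrays, each of shape (batch, num_classes).
    weights: one scalar per model. HYPOTHETICAL stand-in for the
    unspecified per-model formula; it is not the thesis's formula.
    """
    if len(per_model_logits) != len(weights):
        raise ValueError("need one weight per model")
    transformed = [w * logits for w, logits in zip(weights, per_model_logits)]
    summed = np.sum(transformed, axis=0)  # element-wise sum across models
    return softmax(summed)


# Usage: 4 models, a batch of 2 tweets, 3 assumed sentiment classes.
rng = np.random.default_rng(0)
logits = [rng.normal(size=(2, 3)) for _ in range(4)]
probs = fuse_logits(logits, weights=[1.0, 1.0, 1.0, 1.0])
print(probs.argmax(axis=-1))  # predicted class index per tweet
```

Summing logits before a single softmax (rather than averaging four separate probability vectors) lets a confident model dominate the vote; a per-model weight or formula is where that influence can be tuned.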


Pipeline

  • Tweet Collecting & Merging & Pre-processing & Analysis & Correctness Investigation Open In Colab

  • Machine Learning Sentiment Analysis Open In Colab

  • Deep Learning Sentiment Analysis Open In Colab

  • Pure Transformers Sentiment Analysis Open In Colab

  • Customized Transformers & Ensemble Model Sentiment Analysis & Website Open In Colab

  • Sentiment Forecasting Open In Colab

  • Aspect-Based & Topic Modeling Open In Colab

  • Zero Shot Classification Open In Colab
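The Sentiment Forecasting step relies on an LSTM fed through a sliding window over time-stamped sentiment. The windowing itself can be sketched as below. This is a minimal NumPy illustration, not the notebook's code: it assumes one aggregated sentiment score per time step (the sample values are hypothetical), a one-step horizon by default, and leaves the LSTM model out. The inputs are returned in the (samples, timesteps, features) layout that, for example, Keras `LSTM` layers expect.

```python
import numpy as np


def make_windows(series, window: int, horizon: int = 1):
    """Turn a 1-D series into (X, y) pairs for a sequence model.

    X[i] holds `window` consecutive values; y[i] is the value
    `horizon` steps after the last value in that window.
    """
    series = np.asarray(series, dtype=float)
    n = len(series) - window - horizon + 1
    if n <= 0:
        raise ValueError("series too short for this window/horizon")
    X = np.stack([series[i:i + window] for i in range(n)])
    y = np.array([series[i + window + horizon - 1] for i in range(n)])
    # Add a trailing feature axis: (samples, timesteps, features)
    return X[..., np.newaxis], y


# Usage with hypothetical daily mean sentiment scores.
daily_sentiment = [0.1, 0.3, -0.2, 0.0, 0.4, 0.5, -0.1]
X, y = make_windows(daily_sentiment, window=3)
print(X.shape, y.shape)  # (4, 3, 1) (4,)
```

Each window slides forward by one step, so consecutive samples overlap; the model learns to predict the next sentiment value from the most recent `window` values only.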

Sentiment Analysis Website


💻️ Languages & Libraries Used


⚠️ Disclaimer

Users of this data should use it only for practice, not for commercial purposes!


Author: Omar Sherif Ali - OSA


Connect with me


Made with ❤️ by Omar Sherif Ali - OSA.

© OSA - 2022


Daily Progress

First Week

| Date | Day | Progress | Resources |
|---|---|---|---|
| 2023-03-01 | Saturday | Read the O'Reilly Scikit-Learn book, revised Python, and searched for extra resources | Book |
| 2023-03-01 | Sunday | Read the O'Reilly Scikit-Learn book and searched for extra resources | Book |
| 2023-03-01 | Monday | Learned NLP basics: tokenization, stemming, lemmatization, and count vectorizers | Udemy course on Python |
| 2023-03-01 | Tuesday | Discovered Pandas, completed 5 practice notebooks on Kaggle, and finished the Kaggle Data Cleaning course | Kaggle |
| 2023-03-01 | Wednesday | Learned TF-IDF, completed notebook exercises, and finished the Machine Learning Beginner course | Kaggle course |
| 2023-03-02 | Thursday | Dug deep into models and their hyperparameters and participated in a Kaggle competition | Kaggle competition on house prices |
| 2023-03-03 | Friday | Studied feature engineering, feature importance, and imputers; worked on the project proposal | Medium article on best practices for ML models |
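Count vectors and TF-IDF, covered in the first week, are commonly used as features for classical ML sentiment baselines and as input to NMF topic modeling. A minimal pure-Python sketch, not code from the project's notebooks: it assumes whitespace tokenization (real Egyptian-Arabic text needs proper normalization and tokenization) and uses the smoothed IDF formula that scikit-learn's `TfidfVectorizer` applies by default, followed by L2 normalization.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return (vocab, rows): each row is an L2-normalised TF-IDF vector.

    TF is the raw term count; IDF is the smoothed
    ln((1 + n) / (1 + df)) + 1. Tokenisation is a plain whitespace
    split, which is an assumption made for illustration only.
    """
    tokenised = [doc.split() for doc in docs]
    vocab = sorted({tok for toks in tokenised for tok in toks})
    n = len(docs)
    # Document frequency: in how many documents each term appears
    df = Counter(tok for toks in tokenised for tok in set(toks))
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1 for t in vocab}
    rows = []
    for toks in tokenised:
        counts = Counter(toks)  # the "count vectorizer" step
        vec = [counts[t] * idf[t] for t in vocab]
        norm = math.sqrt(sum(v * v for v in vec)) or 1.0
        rows.append([v / norm for v in vec])
    return vocab, rows


# Usage on three tiny hypothetical documents.
vocab, rows = tfidf(["good service", "bad service", "good good food"])
print(vocab)  # ['bad', 'food', 'good', 'service']
```

Terms that appear in many documents get a lower IDF, so a word that is frequent in one tweet but rare across the corpus carries the most weight in that tweet's vector.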

Second Week

| Date | Day | Progress | Resources |
|---|---|---|---|
| 2023-03-01 | Saturday | Studied more models: SVMs, KNN, and ensemble models (XGBoost) | Coursera course on Python for Data Science |
| 2023-03-01 | Sunday | Finished the Kaggle Intermediate ML course, studied categorical encoding, and searched for more resources | Kaggle |
| 2023-03-01 | Monday | ROC, AUC, confusion matrix, covariance matrix | |
| 2023-03-01 | Tuesday | Learned about standardization and completed a project using all previously learned models | Coursera course on Python for Data Science |
| 2023-03-01 | Wednesday | Activation functions and sentiment analysis | |
| 2023-03-02 | Thursday | Model interpretation (model-agnostic methods), text summarization, random walk | |
| 2023-03-03 | Friday | | |
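The evaluation tools from the second week fit together simply: accuracy, such as the 83% test accuracy reported in the description, is the share of predictions on the confusion matrix's diagonal, and binary AUC equals the probability that a randomly chosen positive example is scored above a randomly chosen negative one. A small pure-Python sketch, not the project's evaluation code; the three-class label encoding in the usage lines is hypothetical.

```python
def confusion_matrix(y_true, y_pred, num_classes: int):
    """Rows are true classes, columns are predicted classes."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix


def accuracy_from_matrix(matrix):
    """Correct predictions sit on the diagonal."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total if total else 0.0


def binary_auc(labels, scores):
    """AUC as P(score of random positive > score of random negative).

    Ties count as 1/2. O(pos * neg) pairwise version, for clarity.
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Usage with hypothetical labels: 0=negative, 1=neutral, 2=positive.
y_true = [0, 1, 2, 2, 1, 0]
y_pred = [0, 2, 2, 2, 1, 1]
cm = confusion_matrix(y_true, y_pred, num_classes=3)
print(cm)  # [[1, 1, 0], [0, 1, 1], [0, 0, 2]]
print(round(accuracy_from_matrix(cm), 3))  # 0.667
print(binary_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

The off-diagonal cells show which classes get confused (here, neutral predicted as positive), which a single accuracy number hides.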
