Supervised by: Professor Mervat Mustafa Fahmy Abuelkheir
"Language is the bridge that connects minds, and NLP is the compass guiding us to understand and unlock its infinite potential."
The aim of this Bachelor project is to develop a new approach to Arabic (Egyptian-dialect) sentiment analysis, forecasting, and topic modeling using Machine Learning, Deep Learning, and Transformers!
📄 View the Thesis » · Presentation · Demo Video · Report Bug · Be a Contributor
In recent years, social media platforms have become an increasingly popular way for individuals to express their thoughts and opinions on various topics and situations. Monitoring sentiment and understanding how topics evolve is crucial for governments to identify negative sentiment and respond promptly.

In this study, we developed a sentiment analysis ensemble architecture consisting of four Transformers: MARBERT, CAMeLBERT-DA, CAMeLBERT-Mix, and Alanzi. The ensemble was followed by a final classification layer, a three-layer feed-forward neural network. A formula was applied to the logits output by each transformer, and the resulting logits were summed before the softmax activation function was applied. Evaluated on the sentiment analysis test dataset, the model achieved a test accuracy of 83%.

To analyze temporal trends in sentiment, we applied an LSTM model with a sliding window to time-stamped tweets about English-language news that generated significant discussion on social media; these tweets were translated into Arabic by a translation model. We then plotted the sentiment arc and compared our results with the original results.

We also explored the effectiveness of BERTopic, a topic modeling technique, in comparison with LDA and NMF. Using various pre-trained Arabic language models as embeddings, we conducted topic modeling and aspect-based analysis. The results show that BERTopic and NMF achieved comparable, competitive results, demonstrating their ability to capture meaningful topics. LDA, by contrast, performed poorly at generating coherent and informative topics. These findings indicate that BERTopic and NMF outperform LDA on topic modeling tasks.
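The abstract describes two mechanics that are easy to misread in prose: combining four transformers by summing their (transformed) logits before a softmax, and feeding sliding windows of a sentiment time series to an LSTM. The NumPy sketch below illustrates only those two steps, under stated assumptions. The exact per-model formula is not given here, so a hypothetical scalar weight per model stands in for it; three sentiment classes are assumed; the transformers, the three-layer feed-forward head, and the LSTM itself are omitted. `ensemble_probs` and `make_windows` are illustrative names, not the project's code.

```python
import numpy as np


def softmax(z: np.ndarray) -> np.ndarray:
    """Row-wise softmax; subtracting the row max keeps exp() from overflowing."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def ensemble_probs(logits_per_model, weights=None):
    """Sum per-model logits (each scaled by a weight), then apply softmax.

    logits_per_model: list of arrays, each shaped (n_samples, n_classes).
    weights: HYPOTHETICAL per-model scalars standing in for the project's
             unspecified per-model formula; defaults to a plain sum.
    """
    if weights is None:
        weights = [1.0] * len(logits_per_model)
    if len(weights) != len(logits_per_model):
        raise ValueError("need exactly one weight per model")
    summed = sum(w * np.asarray(lg, dtype=float)
                 for w, lg in zip(weights, logits_per_model))
    return softmax(summed)


def make_windows(series, window):
    """Build sliding-window (X, y) pairs from a 1-D series.

    Each row of X holds `window` consecutive values; the matching y entry is
    the value immediately after them (the next-step forecasting target).
    """
    series = np.asarray(series, dtype=float)
    if window < 1 or len(series) <= window:
        raise ValueError("need 1 <= window < len(series)")
    X = np.stack([series[i:i + window] for i in range(len(series) - window)])
    y = series[window:]
    return X, y


if __name__ == "__main__":
    # Four models (standing in for MARBERT, CAMeLBERT-DA, CAMeLBERT-Mix, Alanzi),
    # two tweets, three ASSUMED classes: negative, neutral, positive.
    rng = np.random.default_rng(0)
    logits = [rng.normal(size=(2, 3)) for _ in range(4)]
    probs = ensemble_probs(logits)
    print(probs.argmax(axis=1))  # predicted class index per tweet

    # Hypothetical daily mean-sentiment scores with a 3-day window.
    X, y = make_windows([0.1, 0.3, -0.2, 0.4, 0.0], window=3)
    print(X.tolist())  # [[0.1, 0.3, -0.2], [0.3, -0.2, 0.4]]
    print(y.tolist())  # [0.4, 0.0]
```

A plain sum treats every model equally; per-model weights let stronger models count for more, which is one plausible reading of "a formula applied to each model's logits", not a confirmed one.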
- Tweet collecting, merging, pre-processing, analysis, and correctness investigation
- Customized Transformers, ensemble-model sentiment analysis, and website
- Python
- Data Analysis and Visualization
- Machine Learning
- Deep Learning
- Transformers
- Topic Modeling
Users of this data should use it only for practice, not for commercial purposes!
Made with ❤️ by Omar Sherif Ali - OSA.
© OSA - 2022
Date | Day | Progress | Resources |
---|---|---|---|
2023-03-01 | Saturday | Read the O'Reilly Scikit-Learn book, revised Python, and searched for extra resources | book |
2023-03-01 | Sunday | Read the O'Reilly Scikit-Learn book and searched for extra resources | book |
2023-03-01 | Monday | Learned NLP basics: tokenization, stemming, lemmatization, and count vectorizer | Udemy course on Python |
2023-03-01 | Tuesday | Discovered Pandas, completed 5 practice notebooks on Kaggle, and finished the Kaggle Data Cleaning course | Kaggle |
2023-03-01 | Wednesday | Learned TF-IDF with notebook exercises and finished the Kaggle beginner Machine Learning course | |
2023-03-02 | Thursday | Dug deep into models and their hyperparameters and participated in a Kaggle competition | Kaggle competition on house prices |
2023-03-03 | Friday | Studied feature engineering, its importance, and imputers; worked on the project proposal | Medium article on best practices for ML models |
Date | Day | Progress | Resources |
---|---|---|---|
2023-03-01 | Saturday | Studied more models: SVMs, KNN, and ensemble models such as XGBoost | Coursera course on Python for Data Science |
2023-03-01 | Sunday | Finished the Kaggle Intermediate ML course, learned categorical encoding, and searched for more resources | Kaggle |
2023-03-01 | Monday | ROC, AUC, confusion matrix, covariance matrix | |
2023-03-01 | Tuesday | Learned about standardization and completed a project using all previously learned models | Coursera course on Python for Data Science |
2023-03-01 | Wednesday | Activation functions and sentiment analysis | |
2023-03-02 | Thursday | Model interpretation (model-agnostic methods), text summarization, random walk | |
2023-03-03 | Friday | | |