Machine Learning Sentiments on Urdu Text

📌 About the Project

This project applies machine learning and deep learning models to sentiment analysis of Urdu text. Although Urdu has over 300 million speakers worldwide, sentiment analysis for the language remains underexplored because annotated resources and datasets are scarce. This research aims to bridge that gap by applying several NLP techniques and classification models to classify sentiment in Urdu movie reviews.

🔍 Problem Statement

Sentiment analysis research has focused primarily on English and other Western languages, and methods developed for those languages often perform poorly on Urdu. The key challenges include:

- Right-to-left script processing
- Lack of annotated datasets
- Complex grammar and morphological structure
- Tokenization and word segmentation issues
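
To illustrate the tokenization challenge, here is a minimal sketch that uses only the Python standard library to normalize Urdu text, strip punctuation, and split on whitespace. The function name `tokenize_urdu`, the tiny stop-word set, and the punctuation list are illustrative assumptions, not the project's actual preprocessing. Whitespace splitting also does not solve word segmentation in cases where spaces are omitted or inserted inside words.

```python
import re
import unicodedata

# Hypothetical, deliberately tiny stop-word set for illustration only.
STOP_WORDS = {"ہے", "کی", "کا", "کے", "یہ", "اور"}

# Urdu and Latin punctuation marks to replace with spaces (illustrative list).
PUNCTUATION = re.compile(r"[۔،؟؛!?.,:;()\[\]]")


def tokenize_urdu(text):
    """Normalize, strip punctuation, split on whitespace, drop stop words."""
    # NFC normalization makes composed and decomposed forms compare equal.
    text = unicodedata.normalize("NFC", text)
    text = PUNCTUATION.sub(" ", text)
    return [token for token in text.split() if token not in STOP_WORDS]


tokens = tokenize_urdu("یہ فلم بہت اچھی ہے۔")
print(tokens)
```

A dedicated Urdu NLP library would normally replace the hand-written stop-word set and punctuation list used here.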

🎯 Approach - Project Planning & Aims Grid

- Dataset Preparation: Used a dataset of 50,000 Urdu movie reviews with positive and negative labels.
- Preprocessing: Implemented stop-word removal, lemmatization, and tokenization.
- Feature Extraction: Applied TF-IDF, Bag of Words (BoW), and Word2Vec.
- Classification Models: Used Support Vector Machine (SVM), Decision Tree, Logistic Regression, and Long Short-Term Memory (LSTM).
- Evaluation Metrics: Compared models using accuracy, precision, recall, F1-score, confusion matrix, and ROC curve.
- Optimization: Fine-tuned hyperparameters and performed cross-validation to improve accuracy.
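
The feature extraction, classification, and cross-validation steps listed above can be sketched with scikit-learn. This is a minimal sketch under stated assumptions, not the project's notebook code: the toy reviews, the whitespace `token_pattern`, and the default hyperparameters are illustrative, and `LinearSVC` stands in for whichever SVM variant the project used.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC
from sklearn.tree import DecisionTreeClassifier

# Toy stand-in data (assumption): 1 = positive, 0 = negative.
positive = ["فلم بہت اچھی تھی", "کہانی شاندار تھی",
            "اداکاری بہترین تھی", "مجھے فلم پسند آئی"]
negative = ["فلم بہت بری تھی", "کہانی بورنگ تھی",
            "اداکاری خراب تھی", "مجھے فلم پسند نہیں آئی"]
texts = positive + negative
labels = [1] * len(positive) + [0] * len(negative)

# Split on whitespace instead of using the default pattern,
# which drops single-character tokens.
token_pattern = r"\S+"

candidates = {
    "BoW + Logistic Regression": make_pipeline(
        CountVectorizer(token_pattern=token_pattern),
        LogisticRegression(max_iter=1000),
    ),
    "TF-IDF + SVM": make_pipeline(
        TfidfVectorizer(token_pattern=token_pattern),
        LinearSVC(),
    ),
    "TF-IDF + Decision Tree": make_pipeline(
        TfidfVectorizer(token_pattern=token_pattern),
        DecisionTreeClassifier(random_state=0),
    ),
}

results = {}
for name, model in candidates.items():
    # 2-fold cross-validation only because the toy data set is tiny.
    scores = cross_val_score(model, texts, labels, cv=2, scoring="accuracy")
    results[name] = scores.mean()
    print(f"{name}: mean accuracy = {results[name]:.2f}")
```

Wrapping the vectorizer and classifier in one pipeline means the vocabulary is fitted on the training folds only, which avoids leaking information from the held-out fold into the features.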

🛠 Technologies Used

Programming Language: Python
Libraries & Frameworks:
- TensorFlow & Keras (Deep Learning)
- Scikit-learn (ML Models)
- NLTK & UrduHack (NLP Processing)
- Gensim (Word2Vec)
- Matplotlib & Seaborn (Visualization)
Dataset Source: GitHub, Kaggle

🚀 Setup Process

Prerequisites

Ensure you have Python 3.x installed. Install the required dependencies using:

```
pip install -r requirements.txt
```

Running the Project

Clone the repository:

```
git clone https://github.com/yourusername/urdu-sentiment-analysis.git
cd urdu-sentiment-analysis
```

Prepare the dataset and place it in the data/ directory.
Run the preprocessing script:
```
python preprocessing.py
```
Train and evaluate machine learning models:
```
python train_models.py
```
Train and evaluate deep learning models:
```
python train_lstm.py
```
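
As a rough illustration of the dataset preparation step, the sketch below reads review and label pairs from a gzip-compressed CSV file. The file layout is an assumption: the column names `review` and `sentiment`, the helper `load_reviews`, and the sample file name are hypothetical and may not match the project's actual data files.

```python
import csv
import gzip
import tempfile
from pathlib import Path


def load_reviews(path, text_column="review", label_column="sentiment"):
    """Read (text, label) pairs from a gzip-compressed CSV file.

    The column names are assumptions; adjust them to the real file.
    """
    with gzip.open(path, mode="rt", encoding="utf-8", newline="") as handle:
        reader = csv.DictReader(handle)
        return [(row[text_column], row[label_column]) for row in reader]


# Demonstration with a throwaway file so the example runs without real data.
with tempfile.TemporaryDirectory() as tmp:
    sample_path = Path(tmp) / "sample_reviews.csv.gz"
    with gzip.open(sample_path, mode="wt", encoding="utf-8", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["review", "sentiment"])
        writer.writerow(["فلم بہت اچھی تھی", "positive"])
        writer.writerow(["فلم بہت بری تھی", "negative"])
    reviews = load_reviews(sample_path)

print(len(reviews), "reviews loaded")
```

Opening the file in text mode with an explicit UTF-8 encoding keeps the Urdu characters intact regardless of the platform's default encoding.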

📊 Results & Findings

- LSTM with Word2Vec achieved the highest accuracy of all models tested: 87.94%.
- SVM with TF-IDF achieved the second-best accuracy: 80.92%.
- The deep learning model (LSTM) required more computational resources than the classical machine learning models but achieved higher accuracy.
- The choice of feature extraction technique significantly affected model performance.
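
For reference, accuracy figures such as those above, along with the precision, recall, and F1-score used to compare models, all derive from the counts in a binary confusion matrix. The sketch below computes them from scratch. The function name `binary_metrics` and the example labels are illustrative and are not the project's results.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Compute confusion-matrix counts and derived metrics for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0

    return {
        "confusion_matrix": {"tp": tp, "fp": fp, "fn": fn, "tn": tn},
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Made-up labels for demonstration: 1 = positive, 0 = negative.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 1, 0]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```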

📜 Legal & Ethical Considerations

- This project ensures compliance with data privacy regulations.
- The dataset was sourced from publicly available repositories.
- No personally identifiable information (PII) is included in the dataset.

🤝 Contributing

Contributions are welcome! Feel free to fork this repository, create a branch, and submit a pull request.

📧 Contact

For any inquiries, reach out via email: Emaazsiddiq@gmail.com

🏆 Acknowledgments

- Supervisor: Dr. Na Helian
- University: University of Hertfordshire
- Open-source contributors for NLP libraries

This project is open-source and licensed under the MIT License.

