ML-urdu-classification

Classification of Urdu News Articles

This project applies supervised machine learning to classify Urdu-language news articles into five predefined categories: entertainment, business, sports, science-technology, and international. A dataset of 2,750 articles was scraped from prominent Urdu news websites and then preprocessed with text normalization, lemmatization, and tokenization.
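To make the preprocessing steps more concrete, here is a minimal sketch of normalization and tokenization for Urdu text using only the Python standard library. It is not taken from the project's notebooks: the character mappings, the diacritic range, and the function names `normalize` and `tokenize` are assumptions chosen for illustration, and lemmatization is left out because it needs a language-specific resource.

```python
import re
import unicodedata

# Assumed mappings: Arabic code points that are commonly folded into their
# Urdu equivalents. The project's actual normalization rules are not shown.
CHAR_MAP = str.maketrans({
    "\u064A": "\u06CC",  # Arabic yeh  -> Urdu (Farsi) yeh
    "\u0643": "\u06A9",  # Arabic kaf  -> Urdu kaf (keheh)
})

# Arabic short-vowel marks (fathatan .. sukun) plus superscript alef.
DIACRITICS = re.compile(r"[\u064B-\u0652\u0670]")


def normalize(text):
    """Return text in a canonical form: NFC, folded letters, no diacritics."""
    text = unicodedata.normalize("NFC", text)
    text = text.translate(CHAR_MAP)
    text = DIACRITICS.sub("", text)
    # Collapse runs of whitespace into single spaces.
    return re.sub(r"\s+", " ", text).strip()


def tokenize(text):
    """Split normalized text into word tokens, dropping punctuation."""
    # \w matches Unicode letters and digits, so the Urdu full stop and
    # comma are excluded from the tokens.
    return re.findall(r"\w+", normalize(text))


# Hypothetical example sentence ("Pakistan won the match.").
tokens = tokenize("پاکستان نے میچ جیت لیا۔")
print(tokens)
```

Stripping diacritics before tokenizing matters here because combining marks are not word characters, so leaving them in would split a word at each mark.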

Three models were implemented and evaluated: Multinomial Naive Bayes (MNB), Logistic Regression, and a Neural Network. MNB provided a simple and effective baseline with 96.55% accuracy, and Logistic Regression reached 95.27%. The Neural Network, a sequential model with dropout layers to reduce overfitting, outperformed both with 97.45% accuracy.
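As an illustration of how the MNB baseline works, the following is a small from-scratch sketch of Multinomial Naive Bayes with Laplace smoothing over token counts. It is an assumption-laden toy, not the implementation in `NaiveBayes.ipynb`: the class name `ToyMultinomialNB`, the `alpha` parameter, and the four training documents are all hypothetical.

```python
import math
from collections import Counter


class ToyMultinomialNB:
    """Multinomial Naive Bayes over token lists with additive smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # Laplace smoothing strength

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.vocab = {tok for doc in docs for tok in doc}
        self.n_docs = len(docs)
        self.class_counts = Counter(labels)
        # Per-class token frequencies.
        self.token_counts = {c: Counter() for c in self.classes}
        for doc, label in zip(docs, labels):
            self.token_counts[label].update(doc)
        self.total_tokens = {
            c: sum(self.token_counts[c].values()) for c in self.classes
        }
        return self

    def log_scores(self, doc):
        """Return log P(class) + sum of log P(token | class) per class."""
        scores = {}
        for c in self.classes:
            score = math.log(self.class_counts[c] / self.n_docs)
            denom = self.total_tokens[c] + self.alpha * len(self.vocab)
            for tok in doc:
                if tok in self.vocab:  # ignore tokens never seen in training
                    count = self.token_counts[c][tok]
                    score += math.log((count + self.alpha) / denom)
            scores[c] = score
        return scores

    def predict(self, doc):
        scores = self.log_scores(doc)
        return max(scores, key=scores.get)


# Hypothetical, already-tokenized training documents.
train_docs = [
    ["میچ", "ٹیم", "جیت"],        # match, team, win
    ["ٹیم", "کھلاڑی", "میچ"],     # team, player, match
    ["بینک", "قیمت", "منڈی"],     # bank, price, market
    ["قیمت", "بینک", "سرمایہ"],   # price, bank, capital
]
train_labels = ["sports", "sports", "business", "business"]

model = ToyMultinomialNB().fit(train_docs, train_labels)
prediction = model.predict(["ٹیم", "میچ", "قیمت"])
print(prediction)
```

Working in log space avoids numeric underflow when many small token probabilities are multiplied, which is the usual reason this formulation is preferred for long documents.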

Performance was assessed using accuracy, precision, recall, and F1 score, with confusion matrices showing which categories were confused with one another. The project highlights the potential of machine learning for natural language processing in underrepresented languages such as Urdu, and it identifies limitations, including reliance on traditional models and the lack of contextual semantic understanding.
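The relationship between a confusion matrix and the reported metrics can be sketched as follows. This is a standard-library illustration under stated assumptions: the function names and the eight toy predictions are made up for the example and are not the project's results.

```python
from collections import Counter


def confusion_matrix(y_true, y_pred, labels):
    """Rows are true labels, columns are predicted labels."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(t, p)] for p in labels] for t in labels]


def accuracy(matrix):
    """Fraction of samples on the diagonal (correct predictions)."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total


def per_class_metrics(matrix, labels):
    """Precision, recall, and F1 for each label, derived from the matrix."""
    metrics = {}
    for i, label in enumerate(labels):
        tp = matrix[i][i]
        fp = sum(row[i] for row in matrix) - tp  # column total minus hits
        fn = sum(matrix[i]) - tp                 # row total minus hits
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        if precision + recall:
            f1 = 2 * precision * recall / (precision + recall)
        else:
            f1 = 0.0
        metrics[label] = {"precision": precision, "recall": recall, "f1": f1}
    return metrics


# Hypothetical labels and predictions for eight articles.
labels = ["business", "sports", "international"]
y_true = ["business", "business", "sports", "sports", "sports",
          "international", "international", "international"]
y_pred = ["business", "international", "sports", "sports", "sports",
          "international", "business", "international"]

matrix = confusion_matrix(y_true, y_pred, labels)
metrics = per_class_metrics(matrix, labels)
print(matrix)
print(accuracy(matrix))
```

In this toy data the off-diagonal cells show business and international articles being mistaken for each other, which is the kind of pattern a confusion matrix exposes and a single accuracy figure hides.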

DataAnalysis.ipynb
LogisticRegression.ipynb
NaiveBayes.ipynb
NeuralNetwork.ipynb
Preprocessing.ipynb
README.md
Report.pdf
Scraping.ipynb
raw_data.csv
scraped_data.csv

samiemirza/ML-urdu-classification