Arabic Dialect Classification

Arabic is spoken in many countries, but each country has its own dialect. This project aims to build a model that predicts the dialect of a given text.

Overview

In this project, we explored several machine learning models, including Support Vector Machine (SVM), XGBoost, and Multinomial Naive Bayes (MultinomialNB). Of these, MultinomialNB achieved the highest accuracy, at 79%.
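To illustrate the idea behind the Multinomial Naive Bayes baseline, here is a minimal pure-Python sketch. It is not the project's training code: the character trigram features, the `TinyMultinomialNB` class, the `EG`/`LEV` labels, and the four toy sentences are all assumptions made for illustration, and the README does not state which features or library were actually used.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(text, n=3):
    """Return overlapping character n-grams of a space-padded text."""
    padded = f" {text.strip()} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


class TinyMultinomialNB:
    """Hypothetical toy classifier: multinomial NB with Laplace smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # additive smoothing constant

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)          # documents per class
        self.feature_counts = defaultdict(Counter)   # n-gram counts per class
        for text, label in zip(texts, labels):
            self.feature_counts[label].update(char_ngrams(text))
        self.vocab = {g for c in self.feature_counts.values() for g in c}
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, doc_count in self.class_counts.items():
            counts = self.feature_counts[label]
            denom = sum(counts.values()) + self.alpha * len(self.vocab)
            # log prior + sum of log likelihoods of the observed n-grams
            score = math.log(doc_count / total_docs)
            for gram in char_ngrams(text):
                if gram in self.vocab:  # ignore n-grams never seen in training
                    score += math.log((counts[gram] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy data for illustration only (not taken from the project's dataset).
train_texts = ["ازيك عامل ايه", "انا عايز اروح دلوقتي", "كيفك شو اخبارك", "بدي روح هلق"]
train_labels = ["EG", "EG", "LEV", "LEV"]
model = TinyMultinomialNB().fit(train_texts, train_labels)
print(model.predict("عامل ايه دلوقتي"))
```

In practice, a library implementation trained on the full cleaned dataset would replace this toy class; the sketch only shows how per-class n-gram counts turn into a prediction.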

Additionally, we used AraBERT, a BERT-based model available on Hugging Face, to further improve prediction accuracy. With AraBERT, we achieved an accuracy of 82%.
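As a rough sketch of how an AraBERT checkpoint can be loaded for dialect classification with the Hugging Face `transformers` library, consider the following. The checkpoint name `aubmindlab/bert-base-arabertv02`, the label list, and the helper functions are assumptions for illustration; the README does not say which AraBERT variant, label set, or training setup the project used. The classification head loaded here is untrained until it is fine-tuned on labeled dialect data.

```python
# Hypothetical label set; the README does not list the dialects covered.
DIALECT_LABELS = ["EG", "LB", "MA"]
# Assumed checkpoint; the project's actual AraBERT variant is not stated.
CHECKPOINT = "aubmindlab/bert-base-arabertv02"


def build_label_maps(labels):
    """Build the id<->label mappings stored in the model config."""
    id2label = dict(enumerate(labels))
    label2id = {label: i for i, label in id2label.items()}
    return id2label, label2id


def load_classifier(checkpoint=CHECKPOINT, labels=DIALECT_LABELS):
    """Load the tokenizer and a sequence-classification model (downloads weights)."""
    # Imported lazily so the helpers above work without transformers installed.
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    id2label, label2id = build_label_maps(labels)
    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    model = AutoModelForSequenceClassification.from_pretrained(
        checkpoint,
        num_labels=len(labels),
        id2label=id2label,
        label2id=label2id,
    )
    return tokenizer, model


def predict_dialect(text, tokenizer, model):
    """Return the label with the highest logit for a single text."""
    import torch

    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=128)
    with torch.no_grad():
        logits = model(**inputs).logits
    return model.config.id2label[int(logits.argmax(dim=-1))]
```

After fine-tuning, passing the path of a saved model directory as `checkpoint` would load the trained weights instead of the base checkpoint.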

Project Structure

data/: Contains the database and the cleaned data used for training and evaluation, as well as the data fetching script (fetch_data.py) for easy access to the data.
Models/: Contains saved model parameters.
Notebooks/: Jupyter notebooks used for data exploration, model training, and evaluation.
Preprocessing/: Jupyter notebooks for the data cleaning process.
Web App/: Contains the web app script for deployment.

WebApp

A Streamlit web application serves the AraBERT-based model: WebApp Video

adelelwan24/Arabic-Dialect-Classification
