Arabic-Dialect-Identification

This repo contains six main parts:

Data fetching
Data pre-processing
ML Models Training
DL Models Training
Deployment with Flask
Demo GIF

1- Data Fetching

Uses the requests library to fetch the tweet data with POST requests, keyed by the provided id column
Saves the fetched data with its dialect labels as tweetsWithLabels.csv
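
The fetching step could be sketched roughly as follows. This is a minimal illustration, not the notebook's code: the endpoint URL, the batch size, the response shape (a JSON object mapping each id to its text), and the CSV column names are all assumptions, and fetch_texts and save_with_labels are hypothetical helper names.

```python
import csv


def fetch_texts(ids, url, batch_size=1000, post=None):
    """POST the ids in batches and return a dict mapping id -> text.

    Assumption: the endpoint accepts a JSON list of ids and answers with
    a JSON object {id: text}. Adjust to the real API contract.
    """
    if post is None:
        import requests  # imported lazily so the helper can be exercised offline
        post = requests.post
    texts = {}
    for start in range(0, len(ids), batch_size):
        batch = ids[start:start + batch_size]
        response = post(url, json=batch, timeout=30)
        response.raise_for_status()  # fail loudly on HTTP errors
        texts.update(response.json())
    return texts


def save_with_labels(rows, texts, path="tweetsWithLabels.csv"):
    """Write id, dialect, text for every (id, dialect) row whose text was fetched."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["id", "dialect", "text"])  # assumed column names
        for tweet_id, dialect in rows:
            if tweet_id in texts:
                writer.writerow([tweet_id, dialect, texts[tweet_id]])
```

Passing the post callable as a parameter keeps the batching logic separate from the network call, so it can be checked with a stub before hitting the real endpoint.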

2- Data pre-processing

This notebook consists of:

remove_emoji(text) function
Two approaches to pre-processing
Exploring the most common words in each country
Preparing the QADI test set
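
As an illustration of the first and third items, here is a minimal sketch of an emoji-stripping function and a per-country word count. The Unicode ranges are an assumption about what counts as an emoji and may differ from the notebook's remove_emoji; most_common_words is a hypothetical helper name.

```python
import re
from collections import Counter

# Assumed set of emoji-related Unicode blocks; extend as needed.
EMOJI_PATTERN = re.compile(
    "["
    "\U0001F600-\U0001F64F"  # emoticons
    "\U0001F300-\U0001F5FF"  # symbols and pictographs
    "\U0001F680-\U0001F6FF"  # transport and map symbols
    "\U0001F900-\U0001F9FF"  # supplemental symbols and pictographs
    "\U0001F1E6-\U0001F1FF"  # regional indicator (flag) letters
    "\u2600-\u26FF"          # miscellaneous symbols
    "\u2700-\u27BF"          # dingbats
    "\uFE0F"                 # emoji variation selector
    "]+"
)


def remove_emoji(text):
    """Return text with characters from the emoji blocks above removed."""
    return EMOJI_PATTERN.sub("", text)


def most_common_words(texts, labels, top_n=10):
    """Count whitespace-separated tokens per label and return the top_n for each."""
    counters = {}
    for text, label in zip(texts, labels):
        counters.setdefault(label, Counter()).update(remove_emoji(text).split())
    return {label: counter.most_common(top_n) for label, counter in counters.items()}
```

None of these ranges overlap the Arabic block (U+0600 to U+06FF), so Arabic text passes through unchanged.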

3- ML Models Training

This notebook consists of:

Load cleaned data-set
CountVectorizer
TFIDF
Mazajak
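
The two bag-of-words feature extractors can be compared with a sketch like the one below, using scikit-learn. The classifier (LogisticRegression) and the four toy sentences are assumptions made for illustration only; the notebook's actual classifiers and data are not shown here, and the Mazajak embeddings are not covered.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy, made-up examples standing in for the cleaned dataset.
texts = [
    "ازيك عامل ايه",
    "انا عايز اروح دلوقتي",
    "كيفك شو عم تعمل",
    "بدي روح هلق",
]
labels = ["EG", "EG", "LB", "LB"]


def build_model(vectorizer):
    """Chain a vectorizer with a linear classifier (assumed choice)."""
    return make_pipeline(vectorizer, LogisticRegression(max_iter=1000))


# Same classifier, two feature representations.
count_model = build_model(CountVectorizer())   # raw term counts
tfidf_model = build_model(TfidfVectorizer())   # counts reweighted by inverse document frequency

count_model.fit(texts, labels)
tfidf_model.fit(texts, labels)

print(count_model.predict(["شو عم تعمل"]))
print(tfidf_model.predict(["شو عم تعمل"]))
```

Wrapping the vectorizer and classifier in one pipeline ensures the vocabulary is learned only from the training split when the model is later evaluated on held-out data such as the QADI test set.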

4- DL Models Training

This notebook consists of:

Load cleaned data-set
AraBERTv2-base with ktrain (best results)
AraBERTv2-base with PyTorch
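
A fine-tuning run with ktrain might look roughly like the sketch below. It follows ktrain's general Transformer workflow; the checkpoint name, maxlen, batch size, learning rate, and epoch count are assumptions rather than the notebook's settings, and train_arabert is a hypothetical function name. Labels are assumed to be integer indices into class_names.

```python
# Assumed Hugging Face checkpoint for AraBERTv2-base.
MODEL_NAME = "aubmindlab/bert-base-arabertv2"


def train_arabert(x_train, y_train, x_val, y_val, class_names,
                  maxlen=64, batch_size=32, lr=5e-5, epochs=3):
    """Fine-tune the checkpoint on dialect labels and return a ktrain predictor.

    All hyperparameter defaults here are illustrative assumptions.
    """
    # Imported inside the function because ktrain pulls in TensorFlow on import.
    import ktrain
    from ktrain import text

    t = text.Transformer(MODEL_NAME, maxlen=maxlen, class_names=class_names)
    trn = t.preprocess_train(x_train, y_train)   # tokenize the training split
    val = t.preprocess_test(x_val, y_val)        # tokenize the validation split
    model = t.get_classifier()
    learner = ktrain.get_learner(model, train_data=trn, val_data=val,
                                 batch_size=batch_size)
    learner.fit_onecycle(lr, epochs)             # one-cycle learning-rate schedule
    return ktrain.get_predictor(learner.model, preproc=t)
```

The returned predictor bundles the model with its tokenizer settings, so it can be saved with predictor.save(...) and reloaded later for serving.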

5- Deployment with Flask

In this folder you will find:

ktrain with flask.py, which loads the pretrained ktrain model and serves it with Flask
AraBERTpreprocess.py for pre-processing
templates/prediction.html to get inputs
templates/Result.html to display the post-processing and prediction result
static/base.css
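
How these files fit together can be sketched as a small Flask app factory. This is an illustration under assumptions, not the contents of ktrain with flask.py: the route paths, the form field name text, and the template variables cleaned and prediction are all hypothetical, as is the create_app name.

```python
from flask import Flask, render_template, request


def create_app(predict, preprocess=lambda text: text, render=render_template):
    """Build the app around a predict(text) callable.

    predict    -- e.g. the predict method of a loaded ktrain predictor
    preprocess -- text-cleaning step (identity by default)
    render     -- template renderer, injectable for testing
    """
    app = Flask(__name__)

    @app.route("/", methods=["GET"])
    def form():
        # Show the input form.
        return render("prediction.html")

    @app.route("/predict", methods=["POST"])
    def result():
        raw = request.form.get("text", "")  # assumed form field name
        cleaned = preprocess(raw)
        return render("Result.html", cleaned=cleaned, prediction=predict(cleaned))

    return app
```

In a real deployment, predict could come from a predictor restored with ktrain.load_predictor, and preprocess from the cleaning code in AraBERTpreprocess.py. Loading the model once at startup, rather than per request, keeps response times reasonable.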

6- Demo