Samahmaamri/Hate-Speech-Detection-in-Arabic

Hate-Speech-Detection-in-Arabic

Hate Speech Detection in Arabic Using NLP

Overview:

This project detects hate speech in Arabic text using Natural Language Processing (NLP) techniques. The objective is to classify Arabic tweets or other texts as either hate speech or non-hate speech. Because Arabic's rich morphology and grammar make it harder to process than many other languages, the project combines Arabic-specific preprocessing with a machine learning model.



Dataset:

We use a publicly available dataset of Arabic tweets, each labeled as "Hate Speech" or "Not Hate Speech".
Dataset Source: https://github.com/rewire-online/multilingual-hatecheck

Preprocessing:

Arabic text presents unique challenges in NLP due to its complex morphology and diacritics. This project uses the following preprocessing steps:

==> Text Normalization: Convert the text to a standard form so that spelling variants of the same word are treated alike.
==> Tokenization: Split the text into tokens using the Farasa Segmenter.
==> Stopword Removal: Remove common Arabic stopwords that carry little meaning on their own.
==> TF-IDF Vectorization: Convert the cleaned text into numerical feature vectors using TF-IDF.
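
The first three steps above can be sketched roughly as follows. This is a minimal illustration, not the repository's actual code. The specific normalization rules (stripping diacritics and tatweel, unifying alef and alef maqsura forms), the tiny stopword list, and the function names `normalize`, `tokenize`, and `preprocess` are all assumptions made for the example. Whitespace splitting stands in for the Farasa Segmenter so that the sketch runs without extra dependencies.

```python
import re

# Arabic diacritics (tashkeel), U+064B..U+0652, plus superscript alef U+0670.
_DIACRITICS = re.compile("[\u064B-\u0652\u0670]")
_TATWEEL = "\u0640"  # elongation character


def normalize(text: str) -> str:
    """Assumed normalization rules; the README only says 'standard form'."""
    text = _DIACRITICS.sub("", text)                        # drop diacritics
    text = text.replace(_TATWEEL, "")                       # drop tatweel
    text = re.sub("[\u0622\u0623\u0625]", "\u0627", text)   # alef variants -> bare alef
    text = text.replace("\u0649", "\u064A")                 # alef maqsura -> ya
    return re.sub(r"\s+", " ", text).strip()                # collapse whitespace


def tokenize(text: str) -> list[str]:
    # Stand-in for the Farasa Segmenter: plain whitespace splitting.
    return text.split()


# Illustrative subset only; a real run would use a fuller stopword list.
_RAW_STOPWORDS = ["في", "من", "على", "إلى", "عن", "هذا", "هذه"]
# Normalize the stopwords too, so they match the normalized tokens.
STOPWORDS = {normalize(word) for word in _RAW_STOPWORDS}


def preprocess(text: str) -> list[str]:
    """Normalize, tokenize, then drop stopwords."""
    return [tok for tok in tokenize(normalize(text)) if tok not in STOPWORDS]
```

Calling `preprocess` on a raw tweet returns a list of cleaned tokens, which can be joined back into a string before TF-IDF vectorization. Normalizing the stopword list with the same function matters: otherwise a stopword containing an alef variant would no longer match its normalized token.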


Model Training:

We use a Support Vector Machine (SVM) to classify the Arabic text as hate speech or non-hate speech.
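
As a rough sketch of how TF-IDF features and an SVM fit together, the example below uses scikit-learn. It assumes scikit-learn is installed. The README does not state the kernel or hyperparameters, so the linear SVM (`LinearSVC`) with default settings is an assumption, and the four training sentences are invented toy data rather than samples from the dataset.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Invented toy examples for illustration only (1 = hate speech, 0 = not).
texts = [
    "أكره هؤلاء الناس جميعا",
    "أحب قراءة الكتب في المساء",
    "هؤلاء الناس لا يستحقون الاحترام",
    "الطقس جميل اليوم",
]
labels = [1, 0, 1, 0]

# TF-IDF turns each text into a sparse weighted term vector;
# the linear SVM then learns a separating hyperplane over those vectors.
model = make_pipeline(TfidfVectorizer(), LinearSVC())
model.fit(texts, labels)

# Classify a new, unseen text.
predictions = model.predict(["أحب هذا الكتاب"])
```

Wrapping both steps in one pipeline keeps the vocabulary and IDF weights learned during training tied to the classifier, so new text is vectorized the same way at prediction time. On real data, the model would be fitted on a training split and evaluated on a held-out split.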