The project aims to develop a machine learning model that classifies tweets as hateful or non-hateful.
The model is developed with a supervised learning approach. Initially, we are working with a dataset available online that contains around 30k tweets, each labelled as hateful or not.
As in a real-world scenario, this data is imbalanced: the 'non-hateful' category has far more tweets than the 'hateful' category, which makes this an interesting project.
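
To illustrate why the imbalance matters, here is a minimal sketch in plain Python. The labels are made up for illustration (a 90/10 split, not the real dataset's ratio), and `precision_recall_f1` is a hypothetical helper, not code from the notebook. It shows that a baseline which always predicts 'non-hateful' scores high accuracy while finding no hateful tweets at all.

```python
from collections import Counter


def precision_recall_f1(y_true, y_pred, positive=1):
    """Precision, recall and F1 for the `positive` class (hypothetical helper)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Illustrative labels only (assumed ratio): 1 = hateful, 0 = non-hateful.
y_true = [0] * 90 + [1] * 10

# Baseline that ignores the tweet text and always predicts the majority class.
y_pred = [0] * len(y_true)

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
precision, recall, f1 = precision_recall_f1(y_true, y_pred)

print(Counter(y_true))
print(f"accuracy={accuracy:.2f} recall={recall:.2f} f1={f1:.2f}")
```

For this reason, metrics computed on the 'hateful' class (precision, recall, F1) are usually more informative than plain accuracy for data like this.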
The data is split into training and test sets, and the model's performance is evaluated on the held-out test set, where the results are satisfactory.
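
One way to split imbalanced data is a stratified split, which keeps the hateful/non-hateful ratio the same in both sets. The sketch below is an assumption about how this could be done, not necessarily what the notebook does. `stratified_split`, its parameters, and the toy tweets are all hypothetical. Libraries such as scikit-learn offer an equivalent through the `stratify` argument of `train_test_split`.

```python
import random
from collections import defaultdict


def stratified_split(texts, labels, test_fraction=0.2, seed=0):
    """Split so that each label keeps roughly the same share in train and test."""
    rng = random.Random(seed)

    # Group example indices by their label.
    by_label = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[label].append(idx)

    train_idx, test_idx = [], []
    for indices in by_label.values():
        rng.shuffle(indices)
        # Take the same fraction from every class (at least one example).
        n_test = max(1, round(len(indices) * test_fraction))
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])

    # Shuffle again so the classes are not grouped together.
    rng.shuffle(train_idx)
    rng.shuffle(test_idx)

    train = ([texts[i] for i in train_idx], [labels[i] for i in train_idx])
    test = ([texts[i] for i in test_idx], [labels[i] for i in test_idx])
    return train, test


# Toy data for illustration only: 1 = hateful, 0 = non-hateful.
texts = [f"tweet {i}" for i in range(100)]
labels = [0] * 90 + [1] * 10

(train_texts, train_labels), (test_texts, test_labels) = stratified_split(texts, labels)

print(len(train_texts), sum(train_labels))  # training size, hateful count
print(len(test_texts), sum(test_labels))    # test size, hateful count
```

With a plain random split, a small test set could end up with very few hateful tweets, which would make the reported performance on that class unreliable.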
Future work includes improving the model's performance, increasing the number of tweet categories, and making the model work in real time.
The complete code is in the my_twitter.ipynb notebook.
The dataset was taken from the internet.