Natural Language Processing with Disaster Tweets

My entry for the Disaster Tweets Kaggle competition, submitted in March 2021, which received a public score of 0.78945. Competition participants build a machine learning model that predicts which tweets are about real disasters and which are not.

General info

Both the train and test datasets were manually cleaned using pandas and regular expressions: contractions were expanded, and accents, special characters, and stopwords were removed. A variety of features were then extracted, for example the number of hashtags, URLs, word tokens, and tweet tokens in each entry.
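As a rough illustration of the cleaning and feature extraction described above, here is a minimal sketch using only the standard library. The function names, the tiny `CONTRACTIONS` and `STOPWORDS` sets, the lowercasing step, and the choice to drop URLs during cleaning are all assumptions made for the example; the notebook's actual lists and steps may differ.

```python
import re
import unicodedata

# Small illustrative subsets (assumed); the notebook's real lists are not shown here.
CONTRACTIONS = {"can't": "cannot", "won't": "will not", "it's": "it is", "aren't": "are not"}
STOPWORDS = {"a", "an", "the", "is", "are", "it", "in", "on", "of", "to", "and"}

URL_RE = re.compile(r"https?://\S+")
HASHTAG_RE = re.compile(r"#\w+")


def extract_counts(text):
    """Count hashtags, URLs, and whitespace-separated tokens in a raw tweet."""
    return {
        "hashtag_count": len(HASHTAG_RE.findall(text)),
        "url_count": len(URL_RE.findall(text)),
        "token_count": len(text.split()),
    }


def clean_tweet(text):
    """Expand contractions and strip URLs, accents, special characters, and stopwords."""
    # Lowercasing and URL removal are assumptions of this sketch.
    text = URL_RE.sub(" ", text.lower())
    for short, full in CONTRACTIONS.items():
        text = text.replace(short, full)
    # Decompose accented characters, then drop the combining marks.
    text = unicodedata.normalize("NFKD", text)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    # Replace anything that is not a letter, digit, or whitespace.
    text = re.sub(r"[^a-z0-9\s]", " ", text)
    return " ".join(tok for tok in text.split() if tok not in STOPWORDS)
```

With pandas, functions like these could be applied to a text column via `Series.apply`, with the counts taken from the raw tweet before cleaning removes the hashtags and URLs.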

After the engineered features had been visualised and inspected, they were prepared for model training. This involved applying a TF-IDF vectoriser to the cleaned text and combining the result with the selected features in a sparse matrix. In the end, four features were transformed and fit to the cleaned dataset.
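One way to combine TF-IDF output with engineered features in a single sparse matrix is sketched below, assuming scikit-learn's `TfidfVectorizer` and SciPy's `hstack`. The sample tweets and the two feature columns are made up for illustration and are not the four features selected in the notebook.

```python
import numpy as np
from scipy.sparse import csr_matrix, hstack
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical cleaned tweets.
cleaned = [
    "forest fire near town",
    "love sunny weather today",
    "flood warning issued town",
]
# Hypothetical engineered features: hashtag count and URL count per tweet.
extra = np.array([[1, 1], [0, 0], [2, 1]], dtype=float)

vectoriser = TfidfVectorizer()
tfidf = vectoriser.fit_transform(cleaned)  # sparse, shape (n_tweets, n_terms)

# Append the engineered feature columns to the right of the TF-IDF columns.
X = hstack([tfidf, csr_matrix(extra)]).tocsr()
```

For the test dataset, the already fitted vectoriser would be reused with `vectoriser.transform(...)` so that both matrices share the same columns.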

Three different models (a Naive Bayes classifier, a logistic regression, and a support vector classifier) were trained and optimised according to their F-scores. The logistic regression model was found to be the best fit for the data.
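A comparison of this kind can be sketched with scikit-learn as follows. The toy tweets and labels, the use of `MultinomialNB` and a linear-kernel `SVC`, the default hyperparameters, and the 2-fold cross-validation are assumptions chosen to keep the example small; they are not the notebook's settings, and scores on toy data say nothing about the real result.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Toy data (made up): 1 = real disaster, 0 = not a disaster.
texts = [
    "forest fire spreading near town",
    "love this sunny weather",
    "flood warning issued for river",
    "my new phone is fire",
    "earthquake damages buildings downtown",
    "great concert last night",
    "wildfire evacuation ordered for county",
    "this exam was a disaster lol",
]
labels = [1, 0, 1, 0, 1, 0, 1, 0]

models = {
    "naive_bayes": MultinomialNB(),
    "logistic_regression": LogisticRegression(),
    "svc": SVC(kernel="linear"),
}

scores = {}
for name, model in models.items():
    # Fitting the vectoriser inside the pipeline keeps each fold's
    # vocabulary limited to that fold's training split.
    pipeline = make_pipeline(TfidfVectorizer(), model)
    f1 = cross_val_score(pipeline, texts, labels, cv=2, scoring="f1")
    scores[name] = f1.mean()

# Name of the model with the highest mean F1 score.
best = max(scores, key=scores.get)
```

Hyperparameters could be tuned against the same metric with `GridSearchCV(..., scoring="f1")`.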

The following files are included in this repo:

disaster_tweets.ipynb: the code
disaster_tweets_submission.csv: predictions for test dataset
train.csv: train dataset
test.csv: test dataset

Technologies

Python 3.7.6

katkaypettitt/kaggle-disaster-tweets

Folders and files

Latest commit

History

Repository files navigation

Natural Language Processing with Disaster Tweets

General info

Technologies

About

Topics

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages