A machine learning project that detects disaster-related tweets to aid disaster response and relief efforts. It applies Natural Language Processing (NLP) techniques and classification models to tweet text and labels each tweet as disaster or non-disaster.
- Project Motivation
- Dataset Description
- Key Features
- Implementation Details
- Model Evaluation
- Setup and Usage
- Technologies Used
- Future Scope
- Contributions
Disasters disrupt lives and require immediate attention. Social media platforms like Twitter have become crucial tools for sharing information during such events. By analyzing tweet text, this project aims to:
- Automate the classification of tweets as disaster-related or not.
- Support quicker decision-making for disaster response teams.
The dataset used for this project is sourced from Kaggle. It includes labeled tweets, with each tweet tagged as:
- 1: Related to a disaster.
- 0: Not related to a disaster.
- Training Set: 7,000 tweets
- Testing Set: 3,000 tweets
- id: Unique tweet identifier
- text: Content of the tweet
- target: Label (0 or 1) indicating if the tweet is disaster-related
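As a rough illustration of the three columns described above, the sketch below loads a tiny in-memory sample with pandas and checks that the expected columns are present. The sample rows are invented for the example, and the real file name on Kaggle is not stated here, so the in-memory CSV stands in for it.

```python
import io

import pandas as pd

# Invented sample rows that mimic the id/text/target layout described above.
# In the real project this would be replaced by the path to the Kaggle CSV.
SAMPLE_CSV = """id,text,target
1,Forest fire near La Ronge Sask. Canada,1
2,I love fruits,0
3,Flood warning issued for the river valley,1
"""

df = pd.read_csv(io.StringIO(SAMPLE_CSV))

# Fail early if any of the expected columns is missing.
expected_columns = {"id", "text", "target"}
missing = expected_columns - set(df.columns)
if missing:
    raise ValueError(f"Missing columns: {sorted(missing)}")

# Count how many tweets fall into each class (1 = disaster, 0 = not).
class_counts = df["target"].value_counts().to_dict()
print(class_counts)
```

Checking the class balance early is useful because it affects how accuracy should be read against precision and recall.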
- Cleaned and preprocessed tweet data for consistent analysis.
- NLP-based feature extraction methods, including:
  - Tokenization
  - Lemmatization
  - Stop-word removal
- Implementation of multiple machine learning models:
  - Logistic Regression
  - Support Vector Machine (SVM)
  - Naive Bayes
- Comparative analysis of model performance.
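The cleaning, tokenization, and stop-word removal steps listed above could look roughly like the following. This is a minimal sketch using only the standard library, not the project's actual code: the function name `clean_tweet` and the tiny `STOP_WORDS` set are hypothetical, and lemmatization is left out because it needs an external library such as NLTK or spaCy.

```python
import re

# Hypothetical, deliberately tiny stop-word list; a real project would use a fuller one.
STOP_WORDS = {"a", "an", "the", "is", "in", "on", "at", "of", "and", "to"}

URL_PATTERN = re.compile(r"https?://\S+|www\.\S+")
NON_LETTER_PATTERN = re.compile(r"[^a-z\s]")


def clean_tweet(text):
    """Lowercase, strip URLs and special characters, tokenize, drop stop words."""
    text = text.lower()
    text = URL_PATTERN.sub(" ", text)         # remove links
    text = NON_LETTER_PATTERN.sub(" ", text)  # remove digits, punctuation, symbols
    tokens = text.split()                     # whitespace tokenization
    return [tok for tok in tokens if tok not in STOP_WORDS]


print(clean_tweet("Forest fire near La Ronge! http://t.co/abc123 #wildfire"))
```

Removing URLs before stripping special characters matters: done the other way round, the link would be broken into letter fragments that survive as noise tokens.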
- Data Cleaning:
  - Removed URLs, special characters, and stop words.
- Feature Extraction:
  - Used a TF-IDF vectorizer to extract meaningful features from the text data.
- Model Training:
  - Trained Logistic Regression, SVM, and Naive Bayes models.
  - Used an 80:20 split for training and testing.
- Evaluation Metrics:
  - Accuracy
  - Precision
  - Recall
  - F1 Score
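The steps above can be sketched end to end with scikit-learn. This is an illustration under stated assumptions rather than the project's notebook: the toy `texts` and `labels` are invented, `LinearSVC` and `MultinomialNB` are assumed variants of the SVM and Naive Bayes models (the exact classes and hyperparameters are not given here), and the scores on ten toy tweets mean nothing beyond showing the workflow.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Invented toy data: 1 = disaster, 0 = not a disaster.
texts = [
    "forest fire spreading near the town",
    "earthquake destroys several buildings",
    "flood waters rising evacuate now",
    "hurricane makes landfall tonight",
    "wildfire smoke forces evacuation",
    "i love this sunny weather",
    "great pizza for dinner tonight",
    "watching a movie with friends",
    "my new phone is on fire sale",
    "the concert was a blast",
]
labels = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]

# 80:20 split, stratified so both classes appear in the test set.
X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.2, random_state=42, stratify=labels
)

# Assumed model variants; the project's exact settings are not stated.
models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "SVM": LinearSVC(),
    "Naive Bayes": MultinomialNB(),
}

results = {}
for name, clf in models.items():
    # Fit TF-IDF on the training split only, then train the classifier.
    pipeline = make_pipeline(TfidfVectorizer(), clf)
    pipeline.fit(X_train, y_train)
    preds = pipeline.predict(X_test)
    results[name] = {
        "accuracy": accuracy_score(y_test, preds),
        "precision": precision_score(y_test, preds, zero_division=0),
        "recall": recall_score(y_test, preds, zero_division=0),
        "f1": f1_score(y_test, preds, zero_division=0),
    }

for name, scores in results.items():
    print(name, scores)
```

Wrapping the vectorizer and classifier in one pipeline keeps the TF-IDF vocabulary from being fitted on test tweets, which would otherwise leak information into the evaluation.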
| Model | Accuracy | Precision | Recall | F1 Score |
|---|---|---|---|---|
| Logistic Regression | 77.48% | 76.75% | 67.64% | 71.91% |
| SVM | 78.69% | 80.41% | 65.79% | 72.37% |
| Naive Bayes | 80.18% | 82.67% | 67.64% | 74.41% |
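The F1 score is the harmonic mean of precision and recall, so the F1 column can be cross-checked from the other two. The short sketch below does that for the reported figures; the helper name `f1_from_precision_recall` is hypothetical, and small differences are expected because the inputs are already rounded to two decimals.

```python
def f1_from_precision_recall(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) taken from the table, as fractions.
reported = {
    "Logistic Regression": (0.7675, 0.6764, 0.7191),
    "SVM": (0.8041, 0.6579, 0.7237),
    "Naive Bayes": (0.8267, 0.6764, 0.7441),
}

differences = {}
for model, (precision, recall, reported_f1) in reported.items():
    computed = f1_from_precision_recall(precision, recall)
    differences[model] = abs(computed - reported_f1)
    print(f"{model}: computed F1 = {computed:.4f}, reported F1 = {reported_f1:.4f}")
```

All three recomputed values agree with the table to within rounding. Recall sits well below precision for every model, which means a noticeable share of genuine disaster tweets is still being missed.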
- Python 3.8 or above
- Jupyter Notebook or any Python IDE
- Clone the repository and change into its directory:
  `git clone https://github.com/SrujanBhirud/Disaster-Detection-using-Tweets.git`
  `cd Disaster-Detection-using-Tweets`
- Install dependencies:
  `pip install -r requirements.txt`
- Run the Jupyter Notebook:
  `jupyter notebook`
- Python
- Pandas, NumPy (Data Manipulation)
- Scikit-learn (Machine Learning)
- Matplotlib, Seaborn (Visualization)
- Integration with real-time Twitter data using the Twitter API.
- Implementation of deep learning models like LSTMs for enhanced accuracy.
- Expansion to multilingual disaster detection.
Contributions are always welcome!
Feel free to fork the repository and submit pull requests with improvements, fixes, or new features.
⭐ If you found this project helpful, please give it a star!