Disaster Alerting System

Apache Spark streaming application using Twitter4j and developer account (provides OAuth keys) to stream live tweets and classify them. The model predicts whether a particular tweet is informing about a disaster or not. This way, we can create an efficient Disaster Alerting System, which takes information straight from witnesses.

Using NLP techniques to process tweets and classify them.

Flair framework built on PyTorch has been used to train the text classifier. GloVe Twitter embeddings stacked with Flair embeddings ('news-forward' and 'news-backward'). The model itself is a GRU (a certain kind of RNN). The model has been trained with default hyperparameters, so they are not presented here. My goal wasn't to train the best model, but to build a working infrastructure.

The dataset for training has been taken from Kaggle.

Visit Google Colab to see the training code

Model results on a test dataset (a 1/10 of train.csv)

              precision    recall  f1-score   support

    disaster     0.8513    0.6735    0.7521       340
not disaster     0.7744    0.9050    0.8346       421

   micro avg     0.8016    0.8016    0.8016       761
   macro avg     0.8128    0.7893    0.7933       761
weighted avg     0.8088    0.8016    0.7977       761
 samples avg     0.8016    0.8016    0.8016       761

After tuning hyperparameters, paying more attention to text cleaning, maybe changing GRU to BiLSTM, and possible feature engineering, you may get quite better results.

Screenshots from server log:

Green color means the model's prediction is "not disaster", and red - "disaster"

TODO

Docker for the server
GUI

⚙️ Requirements and building

git clone https://github.com/Goader/disaster-alerting-spark.git

🐍 Python

Flask 2.0.1
Flair 0.8.0.post1
Colorama

To run the server - simply run server.py script passing a path to the model .pt file as an argument

🛢️ Scala

Scala 2.12.10
Apache Spark 3.1.1
Apache Bahir (Spark Streaming Twitter) 2.4.0
Play JSON Parser 2.9.2
Twitter4j 4.0.6

To run the application - use SBT.

There is 1 App object:

Application - main application object, which runs the program (make sure the server is running by this time)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

README.md

README.md

Disaster Alerting System

TODO

⚙️ Requirements and building

🐍 Python

🛢️ Scala

⌨️ Created by @Goader

Files

README.md

Latest commit

History

README.md

File metadata and controls

Disaster Alerting System

TODO

⚙️ Requirements and building

🐍 Python

🛢️ Scala

⌨️ Created by @Goader