Skip to content

A demonstration application for LSTM based time series forecasting with NOAA temperature data. Dataset organized with MongoDB, model performance analyzed with SQL (MySQL), REST API created with FastAPI, and application containerized with Docker

Notifications You must be signed in to change notification settings

d-f/weather-prediction-app

Repository files navigation

weather-prediction-app

The NOAA has hosted hourly weather data including air temperature measurements for multiple different stations around the world. The different stations can be downloaded at: https://www.ncei.noaa.gov/data/global-hourly/access/2023/?C=S;O=D

For this demonstration application, the 99999927516.csv was used containing data from a station in Alaska.

git clone https://github.com/d-f/weather-prediction-app.git C:\\ml_projects\\weather_prediction_app\\code\\
cd C:\\ml_projects\\weather_prediction_app\\code\\weather-prediction-app\\

This csv was uploaded to a local MongoDB using PyMongo and the following:

python upload_csv.py -host_name "localhost:27017/" -db_name noaa_global_hourly -col_name wp_app -csv_dir "C:\\personal_ML\\weather_prediction_app\\code\\noaa_data\\"

Each row in the CSV file contains data for roughly every 5 minutes, and the temperature reading has various quality control checks.

PyMongo was used in order to filter for missing temperature values or temperature values with undesirable quality control flags:

python create_datasets.py -host_name "localhost:27017/" -db_name "noaa_global_hourly" -col_name "wp_app" -save_path "./raw_dataset.json"

The above code will create a JSON file dictated by -save_path where the keys are timestamps and the values are temperature values.

These raw data are normalized, windowed, partitioned and made seasonally stable with:

python process_dataset.py -raw_data_filepath "./raw_dataset.json" -n_save_filepath "./normalized_dataset.json" -val_prop 0.1 -input_width 12 -output_width 24 -data_prep_json "./data_prep.json"

seasonality

Figure 1: Seasonality decomposition results

In order to train models:

python train_lstm.py -output_size 24 -input_size 12 -hidden_size 1024 -dataset_json "./normalized_dataset.json" -batch_size 16 -lr 1e-6 num_layers 32 -num_epochs 256 -model_save_name model_1.pth.tar -result_dir "./results/" -patience 5

To describe performance of models with MySQL installed:

mysql --local_infile=1 -u root -p

Enter password

SET GLOBAL local_infile=ON;
CREATE DATABASE weather_prediction;
USE weather_prediction;
source C:/path/to/load_combined.sql
source C:/path/to/describe_performance.sql

Output:

batch_size learning_rate number_of_LSTM_layers hidden_size test_loss difference_from_min
8 0.0008 4 1024 0.00298405 0
8 0.0007 32 512 0.00465322 0.0016691710334271193
16 0.00001 32 512 0.00529043 0.002306379145011306
16 0.001 64 1024 0.0055018 0.0025177544448524714
16 0.001 32 1024 0.00568289 0.002698844065889716
32 0.001 32 512 0.00594506 0.002961012301966548
16 0.001 32 512 0.0069524 0.003968354547396302
16 0.0005 32 256 0.00760935 0.004625295987352729
8 0.0001 8 128 0.00999573 0.007011685287579894
16 0.0001 32 512 0.0151662 0.01218219450674951

Table 1: Results from the first query in describe_performance.sql

number_of_LSTM_layers test_loss difference_from_min
4 0.00298405 0
32 0.00465322 0.0016691710334271193
32 0.00529043 0.002306379145011306
64 0.0055018 0.0025177544448524714
32 0.00568289 0.002698844065889716
32 0.00594506 0.002961012301966548
32 0.0069524 0.003968354547396302
32 0.00760935 0.004625295987352729
8 0.00999573 0.007011685287579894
32 0.0151662 0.01218219450674951

Table 2: Results from the second query in describe_performance.sql

model_save_name batch_size learning_rate number_of_LSTM_layers hidden_size input_size output_size test_loss
stable_model_10.pth.tar 8 0.0008 4 1024 12 24 0.00298405

Table 3: Results from the third query in describe_performance.sql

To dockerize the application:

docker build -t weather_prediction_app .

In order to run the application to send POST requests (port 80 used for this example):

docker run -p 80:80 weather_prediction_app

About

A demonstration application for LSTM based time series forecasting with NOAA temperature data. Dataset organized with MongoDB, model performance analyzed with SQL (MySQL), REST API created with FastAPI, and application containerized with Docker

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published