A serverless analytical ML system that predicts surf (wave) heights at Lahinch Beach, Ireland:
- Live Predictions of Surf Height at Lahinch
- PyData London talk on CJSurf as a Serverless ML Platform
- Predicting surf wave heights (ICML 2005)
- Hopsworks: Features, models, and assets are stored on https://app.hopsworks.ai
- GitHub Actions: Two feature pipelines and a batch prediction pipeline run a total of five times per day on GitHub Actions.
- GitHub Pages: The latest predictions are published on the project's GitHub Pages site.
The model training notebook was run manually in Colab. It can be rerun at any time to include the training data collected since the last training run.
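A minimal sketch of the training step and of retraining on newly collected data, assuming scikit-learn's KNeighborsRegressor with standardised features. The feature columns (swell height, period, direction), the values, and the helper name train_surf_model are hypothetical. The real notebook reads its training data from the lahinch_surf feature view rather than from in-memory arrays.

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def train_surf_model(X, y, k=3):
    """Fit a k-nearest-neighbour regressor on standardised features (hypothetical helper)."""
    # Standardise so that metres, seconds and degrees contribute comparably to the
    # Euclidean distance used by k-NN. Note: direction is circular, and treating it
    # as a linear feature is a simplification in this sketch.
    model = make_pipeline(StandardScaler(), KNeighborsRegressor(n_neighbors=k))
    return model.fit(X, y)


# Hypothetical features per row: swell height (m), swell period (s), swell direction (deg).
X_old = np.array([[1.5, 10.0, 270.0], [2.0, 12.0, 280.0], [3.0, 14.0, 260.0], [0.8, 8.0, 300.0]])
y_old = np.array([1.2, 1.8, 2.9, 0.5])  # hypothetical observed surf heights (m)

# Hypothetical rows collected by the feature pipelines since the last training run.
X_new = np.array([[2.5, 13.0, 275.0], [1.0, 9.0, 290.0]])
y_new = np.array([2.3, 0.7])

# Retrain on everything collected so far.
model = train_surf_model(np.vstack([X_old, X_new]), np.concatenate([y_old, y_new]))
prediction = model.predict(np.array([[2.2, 12.5, 278.0]]))
print(prediction)
```

With uniform weights, each prediction is the mean surf height of the k closest training rows, so it always stays within the range of observed heights.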
CJSurf is written entirely in Python.
Requirements: Create accounts on app.hopsworks.ai, github.com, and streamlit.io.
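Besides the accounts, the pipelines and the Streamlit app need a Hopsworks API key, created in app.hopsworks.ai. A minimal sketch, assuming the key is passed through the HOPSWORKS_API_KEY environment variable as described for the Streamlit UI; the helper names are hypothetical, and only hopsworks.login(api_key_value=...) is part of the Hopsworks SDK.

```python
import os


def get_hopsworks_api_key(env_var: str = "HOPSWORKS_API_KEY") -> str:
    """Read the Hopsworks API key from the environment (hypothetical helper)."""
    key = os.environ.get(env_var)
    if not key:
        # Fail early with a clear message instead of an opaque authentication error.
        raise RuntimeError(f"Set the {env_var} environment variable to your Hopsworks API key.")
    return key


def login_to_hopsworks():
    """Log in to Hopsworks and return the project handle (hypothetical helper)."""
    # Imported lazily so the key helper above works even without the SDK installed.
    import hopsworks

    return hopsworks.login(api_key_value=get_hopsworks_api_key())
```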
Files:
- GitHub Actions files: .github/workflows/*.yml. These workflows run the notebooks below on 6-hour and 24-hour schedules using bash scripts.
- Streamlit UI: streamlit-image.py is a Python program that downloads the prediction image from Hopsworks and displays it. You need to set the HOPSWORKS_API_KEY environment variable in your Streamlit application. You create the HOPSWORKS_API_KEY in app.hopsworks.ai.
- Notebooks:
  - surf-report-feature-pipeline.ipynb: Downloads the latest surf report for today and writes it to the lahinch feature group. Run it manually first with 'BACKFILL=True' to fill the feature group with surf reports from 2004 from a CSV file.
  - swell-predictions-feature-pipeline.ipynb: Downloads the latest swell predictions and writes them to the swells_exploded feature group. Run it manually first with 'BACKFILL=True' to fill the feature group with swell predictions from 2004 from a CSV file.
  - training-pipeline.ipynb: Trains a k-nearest neighbor model using scikit-learn. Creates training data using the lahinch_surf feature view, which is created by performing a point-in-time correct join of features from the lahinch and swells_exploded feature groups.
  - batch-prediction-pipeline.ipynb: Gets the latest feature values from the lahinch_surf feature view and predicts the surf height every 2 hours for the next 238 hours. It writes the predictions to the wave_predictions feature group and generates a PNG image of the predictions that is uploaded to Hopsworks. Streamlit downloads and shows this PNG as the surf predictions.
- Scripts: These are run by the GitHub Actions workflows. They use nbconvert to convert the notebooks to Python programs, which are then executed.
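The point-in-time correct join behind the lahinch_surf feature view can be illustrated with pandas: each surf observation is paired with the most recent swell prediction at or before its timestamp, so no later swell data leaks into a training row. This is a minimal sketch of the semantics only, assuming hypothetical column names and values; Hopsworks performs the join itself when it creates training data from the feature view.

```python
import pandas as pd

# Hypothetical rows standing in for the lahinch (labels) and swells_exploded (features) groups.
surf = pd.DataFrame({
    "timestamp": pd.to_datetime(["2004-01-01 09:00", "2004-01-01 15:00", "2004-01-02 09:00"]),
    "height": [1.5, 2.0, 1.0],
})
swells = pd.DataFrame({
    "timestamp": pd.to_datetime(
        ["2004-01-01 06:00", "2004-01-01 12:00", "2004-01-01 18:00", "2004-01-02 12:00"]
    ),
    "swell_height": [2.1, 2.4, 1.9, 1.6],
})


def point_in_time_join(labels: pd.DataFrame, features: pd.DataFrame, on: str = "timestamp") -> pd.DataFrame:
    """Attach to each label row the latest feature row whose timestamp is <= the label's."""
    # merge_asof requires both frames to be sorted on the join key.
    labels = labels.sort_values(on)
    features = features.sort_values(on)
    # direction="backward" only looks into the past, which prevents future leakage.
    return pd.merge_asof(labels, features, on=on, direction="backward")


joined = point_in_time_join(surf, swells)
print(joined)
```

The observation on 2004-01-02 at 09:00 is matched with the 18:00 prediction from the previous day, not with the 12:00 prediction later that morning.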
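The batch prediction horizon is simple arithmetic: predicting every 2 hours over 238 hours gives 238 / 2 + 1 = 120 timestamps if the start hour itself is included (119 if it is not). The sketch below builds those timestamps under the inclusive assumption; the helper name prediction_times and the start time are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def prediction_times(start: datetime, horizon_hours: int = 238, step_hours: int = 2) -> list[datetime]:
    """Return timestamps from start to start + horizon_hours, step_hours apart (hypothetical helper)."""
    # range() excludes its end value, so add 1 to include the final hour of the horizon.
    return [start + timedelta(hours=h) for h in range(0, horizon_hours + 1, step_hours)]


# Assumption: the first prediction is for the start hour itself (offset 0).
start = datetime(2022, 7, 10, 0, tzinfo=timezone.utc)
times = prediction_times(start)
print(len(times), times[0], times[-1])
```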
Buoy for Predictions
- ftp://ftpprd.ncep.noaa.gov/pub/data/nccf/com/gfs/v16.2/gfs.20220710/00/wave/station/bulls.t00z/gfswave.62081.spec
- https://polar.ncep.noaa.gov/waves/WEB/gfswave.latest_run/plots/gfswave.62081.bull
- https://polar.ncep.noaa.gov/waves/product_table.shtml?-latest-gfswave-tp_sw1-NE_atlantic-
Surf Height Observations at Lahinch Beach