A platform for sharing interactive versions of trained machine learning models for chemistry.
This repo acts as a template for creating an interactive webapp for one trained model. The SupraShare homepage hosts several webapps created using this template, all of which require completely different Python environments. We use Docker to keep these environments separated and Caddy as a webserver with a reverse proxy to make each webapp accessible.
While the app in this template will start if run locally as is, it does not contain any working ML models. The reason for this is so we could include more features that you may wish to use in your app, not all of which are compatible with the cage prediction model.
For a working example that closely resembles the model in this template see the cage prediction webapp repo.
GitHub makes it easy to create a repository from a template.
We recommend naming the repository <model-name>-webapp
and including only the default
branch (develop
). A new main
branch can be created afterwards for deployment purposes
(see below).
The code in the website
directory uses Plotly Dash and is
separated into several submodules. The terminology is well-explained in the Dash documentation.
It also uses Dash Bootstrap Components for styling,
layout and interactive components.
Of the below files, Layout, Model and Callbacks are designed to be adapted to suit the needs of the webapp. The others - App, Components and Drawer - are intended to mostly be imported as-is (just like any other Python package).
- Components: Each component used on the webpage and imported in Layout is defined here. Generally these are functions that return a dash core or dash bootstrap component. These components are designed to be flexible and reusable and contain detail on how to use them in docstrings.
- Drawer: The components that are unique to the JSME Drawer - also imported in Layout. This is in a separate module due to dependency restrictions with older versions of scikit-learn.
- Layout: The content for the webpage is defined here, based on imported components.
- Model: Functions that load, prepare and run the actual ML model(s).
- Callbacks: Functions that are automatically called whenever an input component's property changes. These functions make the app interactive and are how users can interact with the underlying model.
- App: The main python file to run the webapp. Brings together the Layout and Callback components and sets various important variables such as APP_NAME and URL_PATH.
Note: some callbacks are defined in Callbacks and others in Components and Drawer. Those in Callbacks are app-specific and where we recommend adding your own callbacks, the others are there to make the custom components interactive.
graph TD;
Components-->Layout;
Model-->Callbacks;
Callbacks-->App;
Layout-->App;
The trained ML models themselves should be put here, usually in a pickled format or similar.
Unit tests that check your model(s) give expected outputs for given inputs go here. You could also include more advanced tests to ensure the webapp behaves as expected.
As well as the usual files such as LICENCE
, .gitignore
etc., we have:
.github/workflows/ci.yml
,.flake8
and.pre-commit-config.yaml
used for continuous integration.Dockerfile
anddocker-compose.yml
contain settings used by Docker.
Docker is used to set up a container within which the webapp
runs. The Dockerfile
is used by Docker to build the image that defines the
container automatically. In our case, the steps are to pull a miniconda3 parent image
then set up a conda environment.
When a container based on the image starts running the webapp server is started and the
correct port is exposed. More information on the Dockerfile
can be found here.
Note: The continuumio/miniconda3 parent image is quite large (although smaller than an anaconda equivalent) and still uses anaconda so can be slow to build complex environments. You may consider using micromamba to speed things up a bit.
The docker-compose
command is used to orchestrate
multi-container Docker applications and the YAML file configures this. For development,
the YAML file in this repository is very simple as there is only one container (for the
webapp) which is named web
in the YAML file.
The command docker-compose up --build
is used to build the image(s) then start the
containers.
docker-compose
is more useful (and the config more complicated) in the deployment
environemnt where multiple webapps containing different models are being run.
Github Actions are used to run code quality checks on push and pull events. The pre-commit configuration is used for this. As the name suggests, these hook scripts can be used to identify problems before committing code. To set this up on your local machine run:
pip install pre-commit
pre-commit install
pre-commit run --all-files # optionally to run once on everything
For easy deployment, a Docker image is automatically generated according to the ci.yml config when changes are merged into the main branch.
The webserver setup consists of only a few configuration files:
- docker-compose.yml configures the multi-container docker setup as described above and
- docker-compose.override.yml handles the automatic pulling of new images using watchtower.
- Caddyfile configures the Caddy webserver, mainly handling the reverse proxy that directs client requests to the correct webapp.
- app_list.yml is used internally by each app to provide links to other apps in the navbar.
All of these files need updating (apart from docker-compose.override.yml
) when a new
webapp is added.
The webserver at is configured to periodically check the package repositories specified
in docker-compose.yml
for new images. Assuming this is called new-webapp
(and the repository
address is imperialcollegelondon/new-webapp) a new service needs to be specified in
docker-compose.yml`, e.g.:
services:
cage-prediction-webapp:
image: ghcr.io/imperialcollegelondon/cage-prediction-webapp:main
environment:
- APP_NAME=Is My Cage Porous?
- URL_PREFIX=/is-my-cage-porous
restart: unless-stopped
volumes:
- ./app_list.yaml:/usr/src/app/app_list.yaml
+ new-webapp:
+ image: ghcr.io/imperialcollegelondon/new-webapp:main
+ environment:
+ - APP_NAME=New App Name
+ - URL_PREFIX=/new-webapp
+ restart: unless-stopped
+ volumes:
+ - ./app_list.yaml:/usr/src/app/app_list.yaml
The Caddyfile
must be updated to direct requests to the new app:
https://suprashare.rcs.ic.ac.uk {
tls /srv/cert /srv/key
reverse_proxy /is-my-cage-porous/* cage-prediction-webapp:8050
+ reverse_proxy /new-webapp/* new-webapp:8050
file_server {
root /srv/www/root/
}
The app-list.yaml
should include a new mapping of the APP_NAME to the URL path:
Home: /
Is My Cage Porous?: /is-my-cage-porous/
+ New App Name: /new-webapp/
Finally, you may at this point want to add a link to your new app in the front page app.
SupraShare is jointly developed by the Jelfs Research Group and the Imperial College Research Software Engineering Team.
Please use the GitHub Issue Tracker to ask questions, report bugs and request features.
Each newly created app based on this template should include in the README:
- Details of any pulication and archived data relating to the model, including DOIs
- Instructions on how to use the app (allowed input format(s) and expected output(s))
- Instructions for running unit tests