Code reused from: https://gitlab.com/mlpds_mit/askcosv2/forward_predictor/augmented_transformer. The Dockerfile has been edited to create a standalone image that can be run using:
docker run -d -p 9510-9512:9510-9512 <image-name>
The Docker image can be found at: https://hub.docker.com/r/ai4greeneln/product_prediction
Benchmarking and serving modules for reaction outcome prediction with Augmented Transformer, reimplemented using OpenNMT based on the manuscript (https://www.nature.com/articles/s41467-020-19266-y). Models are released under the same license as the source code (MIT license).
First, set up the URL of the remote registry:
export ASKCOS_REGISTRY=registry.gitlab.com/mlpds_mit/askcosv2/askcos2_core
Then follow the instructions below to use either Docker, or Singularity (if Docker or root privilege is not available). Building either the CPU or the GPU image suffices; if GPUs are not available, just go with the CPU image.
- Docker (only option: build from local)
(CPU) docker build -f Dockerfile_cpu -t ${ASKCOS_REGISTRY}/forward_predictor/augmented_transformer:1.0-cpu .
(GPU) docker build -f Dockerfile_gpu -t ${ASKCOS_REGISTRY}/forward_predictor/augmented_transformer:1.0-gpu .
- Singularity (only option: build from local)
(CPU) singularity build -f augmented_transformer_cpu.sif singularity_cpu.def
(GPU) singularity build -f augmented_transformer_gpu.sif singularity_gpu.def
Download the trained models:
sh scripts/download_trained_models.sh
Then start the service in Docker:
(CPU) sh scripts/serve_cpu_in_docker.sh
(GPU) sh scripts/serve_gpu_in_docker.sh
The GPU-based container requires a CUDA-enabled GPU and the NVIDIA Container Toolkit (formerly nvidia-docker). By default, the first GPU will be used.
To serve in Singularity instead:
(CPU) sh scripts/serve_cpu_in_singularity.sh
(GPU) sh scripts/serve_gpu_in_singularity.sh
Error messages related to torchserve logging can be safely ignored. Note that these scripts start the service in the background (i.e., in detached mode), so it needs to be stopped explicitly when no longer in use:
(Docker) docker stop forward_augmented_transformer
(Singularity) singularity instance stop forward_augmented_transformer
- Sample query
curl http://0.0.0.0:9510/predictions/pistachio_23Q3 \
--header "Content-Type: application/json" \
--request POST \
--data '{"smiles": ["[CH2:23]1[O:24][CH2:25][CH2:26][CH2:27]1.[F:1][c:2]1[c:3]([N+:10](=[O:11])[O-:12])[cH:4][c:5]([F:9])[c:6]([F:8])[cH:7]1.[H-:22].[NH2:13][c:14]1[s:15][cH:16][cH:17][c:18]1[C:19]#[N:20].[Na+:21]"]}'
- Sample response
List of
{
    "products": List[str],
    "scores": List[float]
}
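The same query can be issued programmatically; below is a minimal sketch using the requests library, assuming the service is running locally with the pistachio_23Q3 model as in the curl example above:

```python
import requests

# Query the served forward predictor (assumes local serving on port 9510).
url = "http://0.0.0.0:9510/predictions/pistachio_23Q3"
payload = {
    "smiles": [
        "[CH2:23]1[O:24][CH2:25][CH2:26][CH2:27]1.[F:1][c:2]1[c:3]([N+:10](=[O:11])[O-:12])[cH:4][c:5]([F:9])[c:6]([F:8])[cH:7]1.[H-:22].[NH2:13][c:14]1[s:15][cH:16][cH:17][c:18]1[C:19]#[N:20].[Na+:21]"
    ]
}

response = requests.post(url, json=payload)
response.raise_for_status()

# One entry per input SMILES, each with ranked products and their scores.
for entry in response.json():
    for product, score in zip(entry["products"], entry["scores"]):
        print(f"{score:.4f}\t{product}")
```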
Requirement: requests and pytest libraries (pip installable)
With the service started, run
pytest
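For a quick liveness check before running the full suite, a minimal standalone test could look like the sketch below, assuming torchserve's inference API is exposed on port 9510 as in the serve scripts:

```python
import requests

def test_service_is_up():
    # torchserve exposes a /ping health endpoint on its inference port;
    # the port (9510) is an assumption based on this repo's serve scripts.
    response = requests.get("http://0.0.0.0:9510/ping", timeout=10)
    assert response.status_code == 200
    assert response.json().get("status") == "Healthy"
```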
Follow the Docker build instructions in the Serving section above to build the GPU image. It should be named ${ASKCOS_REGISTRY}/forward_predictor/augmented_transformer:1.0-gpu
Note: the Docker image needs to be rebuilt whenever there is any change in the code.
Prepare the raw .csv files for train, validation and test. The required columns are "id" and "rxn_smiles", where "rxn_smiles" contains the reaction SMILES, optionally with reagents (atom mapping is not required); a minimal sketch of the expected layout follows.
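For illustration only, the two reactions below are made-up examples, not taken from any released dataset:

```python
import csv

# Toy raw_train.csv with the two required columns ("id", "rxn_smiles").
# The rows are illustrative placeholders; real files should hold your dataset.
rows = [
    {"id": "rxn_0", "rxn_smiles": "CC(=O)O.OCC>>CC(=O)OCC"},
    {"id": "rxn_1", "rxn_smiles": "Brc1ccccc1.OB(O)c1ccccc1>>c1ccc(-c2ccccc2)cc1"},
]

with open("raw_train.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["id", "rxn_smiles"])
    writer.writeheader()
    writer.writerows(rows)
```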
Configure the environment variables in ./scripts/benchmark_in_docker.sh, especially the paths, to point to the absolute paths of the raw files and the desired output locations:
export DATA_NAME="USPTO_480k_mix"
export TRAIN_FILE=$PWD/data/USPTO_480k_mix/raw/raw_train.csv
export VAL_FILE=$PWD/data/USPTO_480k_mix/raw/raw_val.csv
export TEST_FILE=$PWD/data/USPTO_480k_mix/raw/raw_test.csv
...
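As an optional pre-flight check (a hypothetical helper, not part of the repository), the configured paths and columns can be verified before launching the long-running benchmark:

```python
import os
import pandas as pd

# Verify that the raw files configured in scripts/benchmark_in_docker.sh
# exist and contain the required columns; run in the same shell after
# exporting the environment variables above.
for var in ["TRAIN_FILE", "VAL_FILE", "TEST_FILE"]:
    path = os.environ[var]
    assert os.path.isfile(path), f"{var} does not point to a file: {path}"
    columns = set(pd.read_csv(path, nrows=5).columns)
    missing = {"id", "rxn_smiles"} - columns
    assert not missing, f"{path} is missing columns: {missing}"
    print(f"{var}: OK ({path})")
```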
Run the benchmarking on a machine with a GPU using
bash scripts/benchmark_in_docker.sh
This will run preprocessing, training and testing for the Augmented Transformer, with top-n accuracies up to n = 20 as the final outputs. Progress and result logs will be saved under ./logs.
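For clarity, top-n accuracy counts a test reaction as correct if the ground-truth product appears among the model's first n predictions. A minimal sketch (plain string comparison on canonicalized SMILES; the repository's own evaluation may differ in details):

```python
def top_n_accuracy(predictions, ground_truths, n=20):
    """Fraction of reactions whose true product is in the top-n predictions.

    predictions: one ranked list of product SMILES per reaction.
    ground_truths: one product SMILES per reaction, pre-canonicalized.
    """
    hits = sum(truth in preds[:n] for preds, truth in zip(predictions, ground_truths))
    return hits / len(ground_truths)
```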
The estimated running times for benchmarking the USPTO_480k dataset on a 32-core machine with one RTX 3090 GPU are
- Preprocessing: ~1 hr
- Training: ~40 hrs
- Testing: ~30 mins
If you want to create servable model archives from your own checkpoints (e.g., trained on different datasets), refer to the archiving script (scripts/archive_in_docker.sh) and change its arguments accordingly before running. It is mostly bookkeeping, replacing the data name and/or checkpoint paths, and the script should be self-explanatory. Then execute it with
sh scripts/archive_in_docker.sh
The servable model archive (.mar) will be generated under ./mars. Serving newly archived models is straightforward: simply replace the --models args in scripts/serve_{cpu,gpu}_in_{docker,singularity}.sh with the new model name and .mar file. The --models flag of torchserve can also take multiple arguments to serve multiple model archives concurrently.
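To confirm which archives a running instance is actually serving, torchserve's management API can be queried; this sketch assumes the management port is the 9511 mapped in this repo's docker run command:

```python
import requests

# List the models registered with the running torchserve instance.
response = requests.get("http://0.0.0.0:9511/models")
response.raise_for_status()
print(response.json())  # e.g. {"models": [{"modelName": ..., "modelUrl": ...}]}
```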