Dockerfile produce functioning container for both Julia and CmdStan
The Dockerfile now creates a separate conda environment in which all packages are installed. Furthermore, the precompiled Stan model is deleted, so it has to be recompiled inside the container. This was required for the model to work.

The Python dependencies are installed to match the exact versions of the environment used to produce the results in the article. These dependencies are described in article/requirements.txt.

The error propagation notebook now starts with a block that detects whether it is run inside the Docker container; if so, it sets the CmdStan path.
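
A minimal sketch of such a detection block (illustrative only; the actual notebook cell may differ — the `/.dockerenv` check and the conda-installed CmdStan location are assumptions):

```python
import os
import cmdstanpy


def running_in_docker() -> bool:
    # Docker creates /.dockerenv at the root of every container.
    return os.path.exists("/.dockerenv")


if running_in_docker():
    # Assumes the conda-forge cmdstan package is installed under $CONDA_PREFIX/bin/cmdstan.
    conda_prefix = os.environ.get("CONDA_PREFIX", "")
    cmdstanpy.set_cmdstan_path(os.path.join(conda_prefix, "bin", "cmdstan"))
```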

The Julia setup has been shrunk significantly: it no longer updates the packages, but uses the versions specified in the project.toml file. Unfortunately, the Julia packages are not installed during the image build; this is instead done in the julia_run_all.jl file.
viktorht committed May 27, 2024
1 parent 3e370ae commit 3cdecab
Showing 8 changed files with 1,931 additions and 1,567 deletions.
61 changes: 52 additions & 9 deletions Dockerfile
@@ -1,22 +1,65 @@
# Start from a core stack version
# This Dockerfile is used to build the image in which the results
# of the paper can be reproduced. The image is based on the
# Jupyter Data Science Notebook image from the Jupyter Docker Stacks and
# extends the image with CmdStan and the Julia environment.
# Furthermore, the image uses the exact versions of the libraries
# that were used to produce the results in the paper.
# We want to thank the Jupyter Docker Stacks recipes, as we got a
# lot of inspiration from the following recipe:
# https://jupyter-docker-stacks.readthedocs.io/en/latest/using/recipes.html#add-a-custom-conda-environment-and-jupyter-kernel

# Start from a core stack version; the version is pinned to ensure reproducibility
FROM quay.io/jupyter/datascience-notebook:2024-05-20

ENV CMDSTAN_VERSION=2.31.0
# Set versions of core software
ENV CMDSTAN_VERSION=2.33.0
ENV CMDSTANPY_VERSION=1.1.0
# Name your environment and choose the Python version
ARG env_name=venv_pseudobatch
ARG py_ver=3.10.8

# Copy all files from directory to docker image
COPY --chown=${NB_UID}:${NB_GID} . .

# Install in the default python3 environment
# Install cmdstanpy
RUN pip install --no-cache-dir "cmdstanpy==1.0.4"
# Remove the precompiled files for pseudobatch error_propagation
RUN rm -f \
pseudobatch/error_propagation/stan/error_propagation \
pseudobatch/error_propagation/stan/error_propagation.exe \
pseudobatch/error_propagation/stan/error_propagation.hpp

# You can add additional libraries required for notebooks here
RUN mamba create --yes -p "${CONDA_DIR}/envs/${env_name}" \
python=${py_ver} \
'ipykernel' \
'jupyterlab' && \
mamba clean --all -f -y

# Install cmdstan
RUN python -m cmdstanpy.install_cmdstan --version "${CMDSTAN_VERSION}" --cores 2
# Install cmdstanpy and cmdstan inside the environment
RUN mamba run -p "${CONDA_DIR}/envs/${env_name}" mamba install --yes -c conda-forge cmdstanpy=="${CMDSTANPY_VERSION}" cmdstan=="${CMDSTAN_VERSION}" && \
mamba clean --all -f -y && \
fix-permissions "${CONDA_DIR}" && \
fix-permissions "/home/${NB_USER}"

RUN pip install --no-cache-dir -e ".[error_propagation]" && \
# Create Python kernel and link it to jupyter
RUN "${CONDA_DIR}/envs/${env_name}/bin/python" -m ipykernel install --user --name="${env_name}" && \
fix-permissions "${CONDA_DIR}" && \
fix-permissions "/home/${NB_USER}"

RUN julia --project=julia-env -e 'using Pkg; pkg"activate"; pkg"precompile"' && \
# Install the pseudobatch package from local source using pip
RUN "${CONDA_DIR}/envs/${env_name}/bin/pip" install --no-cache-dir -e \
'.[error_propagation]'

# Install specific requirements to match versions used when producing the paper
RUN "${CONDA_DIR}/envs/${env_name}/bin/pip" install --no-cache-dir -r article/requirements.txt

# This changes the custom Python kernel so that the custom environment will
# be activated for the respective Jupyter Notebook and Jupyter Console
# hadolint ignore=DL3059
RUN /opt/setup-scripts/activate_notebook_custom_env.py "${env_name}"

USER ${NB_UID}

# Setup Julia environment for simulations
RUN julia --project=julia-env -e 'using Pkg; Pkg.activate(); Pkg.instantiate()' && \
chmod -R go+rx "${CONDA_DIR}/share/jupyter" && \
fix-permissions "${JULIA_PKGDIR}" "${CONDA_DIR}/share/jupyter"
12 changes: 9 additions & 3 deletions article/README.md
@@ -7,6 +7,8 @@ This folder contains all information that is required to reproduce the results o
- **notebooks** - Jupyter notebooks that reproduce the results and plots described in the paper.
- **simulation_scripts** - Julia scripts used to generate the simulated data.
- **data** - contains both the simulated and the real-world data used in the article
- **requirements.txt** - specifies the exact versions of all Python packages required to reproduce the results of the paper. It is mainly used when building the Docker image.
- **run_all.sh** - Shell script to reproduce the simulated data. This script should be executed from within the Docker container.


## How to reproduce results
@@ -19,11 +21,15 @@ See how to install Docker on your machine at the [Docker website](https://docs.d

### 2. Start the docker container
- Ensure that the image is correctly loaded by running `docker image ls`. An image named *pseudobatch:1.0* should be found on that list.
- Run `docker run --rm -it -p 8888:8888 "pseudobatch:1.0"`
- Run `docker run --rm -it -p 8888:8888 "pseudobatch:1.2"`
- Open the container using the instructions printed in the terminal

### 3. Rerun the simulations
The simulated data that is used to test and showcase the pseudo-batch transformation can be recreated by running a series of Julia scripts. To simplify the process, they can all be run at once by opening a terminal and executing `./run_all.sh` (this should be done INSIDE the docker container).
The simulated data that is used to test and showcase the pseudo-batch transformation can be recreated by running a series of Julia scripts. To simplify the process, they can all be run at once using the shell script `article/run_all.sh`. Please use the following commands:

1. Open a terminal inside the Docker container
2. Navigate into the `article` folder, using `cd article`
3. Run the shell script using the command `./run_all.sh`

### 4. Rerun analysis
To rerun the analysis simply open the notebooks and run them.
To rerun the analysis, simply open the notebooks, CHANGE THE KERNEL to `venv_pseudobatch` (do this in the upper right corner), and run all cells. The error propagation analysis will take a while because it first has to compile the Stan model.
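
For context, the slow first run comes from cmdstanpy compiling the Stan program when the model object is created; a hedged sketch (the `.stan` path is inferred from the precompiled files removed in the Dockerfile, not confirmed):

```python
from cmdstanpy import CmdStanModel

# Instantiating the model compiles the Stan program to C++ and then to a binary;
# later runs reuse the cached executable, so only the first run pays this cost.
model = CmdStanModel(
    stan_file="pseudobatch/error_propagation/stan/error_propagation.stan"
)
```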
Binary file modified article/figures/marginal_dist_of_fitted_parameters.png
