Dockerfile produce functioning container for both Julia and CmdStan
The Dockerfile now creates a separate conda environment in which all packages are installed. Furthermore, the precompiled Stan model is deleted, so it has to be recompiled inside the container. This was required for the model to work.

The Python dependencies are installed to match the exact versions of the environment used to produce the results in the article. These dependencies are described in article/requirements.txt.

The error propagation notebook now starts with a block that detects whether it is run inside the Docker container; if so, it sets the CmdStan path.
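
A minimal sketch of such a detection block (illustrative only; the actual notebook cell may differ — the `/.dockerenv` check and the conda-installed CmdStan location are assumptions):

```python
import os
import cmdstanpy


def running_in_docker() -> bool:
    # Docker creates /.dockerenv at the root of every container.
    return os.path.exists("/.dockerenv")


if running_in_docker():
    # Assumes the conda-forge cmdstan package is installed under $CONDA_PREFIX/bin/cmdstan.
    conda_prefix = os.environ.get("CONDA_PREFIX", "")
    cmdstanpy.set_cmdstan_path(os.path.join(conda_prefix, "bin", "cmdstan"))
```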

The Julia setup has been shrunk significantly: it no longer updates the packages, but uses the versions specified in the project.toml file. Unfortunately, the Julia packages are not installed during the image build; this is instead done in the julia_run_all.jl file.
viktorht committed May 27, 2024
1 parent 3e370ae commit 3cdecab
Showing 8 changed files with 1,931 additions and 1,567 deletions.
61 changes: 52 additions & 9 deletions Dockerfile
@@ -1,22 +1,65 @@
# Start from a core stack version
# This Dockerfile is used to build the image in which the results
# of the paper can be reproduced. The image is based on the
# Jupyter Data Science Notebook image from the Jupyter Docker Stacks and
# extends the image with CmdStan and the Julia environment.
# Furthermore, the image uses the exact versions of the libraries
# that were used to produce the results in the paper.
# We want to thank the Jupyter Docker Stacks recipes, as we got a
# lot of inspiration from the following recipe:
# https://jupyter-docker-stacks.readthedocs.io/en/latest/using/recipes.html#add-a-custom-conda-environment-and-jupyter-kernel

# Start from a core stack version; the version is pinned to ensure reproducibility
FROM quay.io/jupyter/datascience-notebook:2024-05-20

ENV CMDSTAN_VERSION=2.31.0
# Set versions of core software
ENV CMDSTAN_VERSION=2.33.0
ENV CMDSTANPY_VERSION=1.1.0
# Name your environment and choose the Python version
ARG env_name=venv_pseudobatch
ARG py_ver=3.10.8

# Copy all files from directory to docker image
COPY --chown=${NB_UID}:${NB_GID} . .

# Install in the default python3 environment
# Install cmdstanpy
RUN pip install --no-cache-dir "cmdstanpy==1.0.4"
# Remove the precompiled files for pseudobatch error_propagation
RUN rm -f \
pseudobatch/error_propagation/stan/error_propagation \
pseudobatch/error_propagation/stan/error_propagation.exe \
pseudobatch/error_propagation/stan/error_propagation.hpp

# You can add additional libraries required for notebooks here
RUN mamba create --yes -p "${CONDA_DIR}/envs/${env_name}" \
python=${py_ver} \
'ipykernel' \
'jupyterlab' && \
mamba clean --all -f -y

# Install cmdstan
RUN python -m cmdstanpy.install_cmdstan --version "${CMDSTAN_VERSION}" --cores 2
# Install cmdstanpy and cmdstan inside the environment
RUN mamba run -p "${CONDA_DIR}/envs/${env_name}" mamba install --yes -c conda-forge cmdstanpy=="${CMDSTANPY_VERSION}" cmdstan=="${CMDSTAN_VERSION}" && \
mamba clean --all -f -y && \
fix-permissions "${CONDA_DIR}" && \
fix-permissions "/home/${NB_USER}"

RUN pip install --no-cache-dir -e ".[error_propagation]" && \
# Create Python kernel and link it to jupyter
RUN "${CONDA_DIR}/envs/${env_name}/bin/python" -m ipykernel install --user --name="${env_name}" && \
fix-permissions "${CONDA_DIR}" && \
fix-permissions "/home/${NB_USER}"

RUN julia --project=julia-env -e 'using Pkg; pkg"activate"; pkg"precompile"' && \
# Install the pseudobatch package from local source using pip
RUN "${CONDA_DIR}/envs/${env_name}/bin/pip" install --no-cache-dir -e \
'.[error_propagation]'

# Install specific requirements to match versions used when producing the paper
RUN "${CONDA_DIR}/envs/${env_name}/bin/pip" install --no-cache-dir -r article/requirements.txt

# This changes the custom Python kernel so that the custom environment will
# be activated for the respective Jupyter Notebook and Jupyter Console
# hadolint ignore=DL3059
RUN /opt/setup-scripts/activate_notebook_custom_env.py "${env_name}"

USER ${NB_UID}

# Setup Julia environment for simulations
RUN julia --project=julia-env -e 'using Pkg; Pkg.activate(); Pkg.instantiate()' && \
chmod -R go+rx "${CONDA_DIR}/share/jupyter" && \
fix-permissions "${JULIA_PKGDIR}" "${CONDA_DIR}/share/jupyter"
12 changes: 9 additions & 3 deletions article/README.md
@@ -7,6 +7,8 @@ This folder contains all information that is required to reproduce the results o
- **notebooks** - Jupyter notebooks that reproduce the results and plots described in the paper.
- **simulation_scripts** - Julia scripts used to generate the simulated data.
- **data** - contains both the simulated and the real-world data used in the article
- **requirements.txt** - specifies the exact versions of all Python packages required to reproduce the results of the paper. It is mainly used when building the Docker image.
- **run_all.sh** - Shell script to reproduce the simulated data. This script should be executed from within the Docker container.


## How to reproduce results
@@ -19,11 +21,15 @@ See how to install Docker on your machine at the [Docker website](https://docs.d

### 2. Start the docker container
- Ensure that the image is correctly loaded by running `docker image ls`. An image named *pseudobatch:1.0* should be found on that list.
- Run `docker run --rm -it -p 8888:8888 "pseudobatch:1.0"`
- Run `docker run --rm -it -p 8888:8888 "pseudobatch:1.2"`
- Open the container using the instructions printed in the terminal

### 3. Rerun the simulations
The simulated data that is used to test and showcase the pseudo-batch transformation can be recreated by running a series of Julia scripts. To simplify the process, they can all be run at once by opening a terminal and executing `./run_all.sh` (this should be done INSIDE the docker container).
The simulated data that is used to test and showcase the pseudo-batch transformation can be recreated by running a series of Julia scripts. To simplify the process, they can all be run at once using the shell script `article/run_all.sh`. Please use the following commands:

1. Open a terminal inside the Docker container
2. Navigate into the `article` folder, using `cd article`
3. Run the shell script using the command `./run_all.sh`

### 4. Rerun analysis
To rerun the analysis simply open the notebooks and run them.
To rerun the analysis, simply open the notebooks, CHANGE THE KERNEL to `venv_pseudobatch` (do this in the upper right corner), and run all cells. The error propagation analysis will take a while because it first has to compile the Stan model.
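
For context, the slow first run comes from cmdstanpy compiling the Stan program when the model object is created; a hedged sketch (the `.stan` path is inferred from the precompiled files removed in the Dockerfile, not confirmed):

```python
from cmdstanpy import CmdStanModel

# Instantiating the model compiles the Stan program to C++ and then to a binary;
# later runs reuse the cached executable, so only the first run pays this cost.
model = CmdStanModel(
    stan_file="pseudobatch/error_propagation/stan/error_propagation.stan"
)
```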
Binary file modified article/figures/marginal_dist_of_fitted_parameters.png
