diff --git a/ten-simple-rules-dockerfiles.tex b/ten-simple-rules-dockerfiles.tex index 6063595..b334f45 100644 --- a/ten-simple-rules-dockerfiles.tex +++ b/ten-simple-rules-dockerfiles.tex @@ -567,11 +567,9 @@ \section{Docker \& Dockerfiles}\label{docker-dockerfiles}} \newpage -\hypertarget{consider-tools-to-assist-with-dockerfile-generation}{% -\section*{1. Consider tools to assist with Dockerfile -generation}\label{consider-tools-to-assist-with-dockerfile-generation}} -\addcontentsline{toc}{section}{1. Consider tools to assist with -Dockerfile generation} +\hypertarget{use-available-tools}{% +\section*{1. Use available tools}\label{use-available-tools}} +\addcontentsline{toc}{section}{1. Use available tools} \ztitlerefsetup{title=1} \zlabel{rule:tools} \label{rule:tools} \zrefused{rule:tools} @@ -618,7 +616,7 @@ \section*{1. Consider tools to assist with Dockerfile \subsection{Tools for container generation}\label{tools-for-container-generation}} -Repo2docker {[}25{]} is a tool maintained by +\texttt{repo2docker} {[}25{]} is a tool maintained by \href{https://jupyter.org/}{Project Jupyter} that can help to transform a source code or data repository, e.g., GitHub, GitLab, or Zenodo, into a container. The tool relies on common configuration files for defining @@ -644,15 +642,15 @@ \subsection{Tools for container The resulting container image installs the dependencies listed in the requirements file, and it provides an entrypoint to run a notebook server to interact with any existing workflows in the repository. Since -repo2docker is used within \href{https://mybinder.org/}{MyBinder.org}, -if you make sure your workflow is ``Binder-ready'', you and others can -also obtain an online workspace with a single click. However, one -precaution to consider is that the default command above will create a -home for the current user, meaning that the container itself would not -be ideal to share; instead, any researcher interested in interacting -with the code inside should run repo2docker themselves and create their -own container. Because repo2docker is deterministic, the environments -are the same +\texttt{repo2docker} is used within +\href{https://mybinder.org/}{MyBinder.org}, if you make sure your +workflow is ``Binder-ready'', you and others can also obtain an online +workspace with a single click. However, one precaution to consider is +that the default command above will create a home for the current user, +meaning that the container itself would not be ideal to share; instead, +any researcher interested in interacting with the code inside should run +\texttt{repo2docker} themselves and create their own container. Because +\texttt{repo2docker} is deterministic, the environments are the same (see~\hyperref[{rule:pinning}]{Rule~\ztitleref{rule:pinning}} for ensuring the same software versions). @@ -666,9 +664,37 @@ \subsection{Tools for container supports multiple programming languages and configurations files, just as \texttt{repo2docker} does, but it attempts to create a readable \texttt{Dockerfile} compatible with plain Docker and to improve user -experience by cleverly adjusting instructions to reduce build time. For -any tool that you use, be sure to look at documentation for usage and -configuration options, and look for options to add metadata (e.g., +experience by cleverly adjusting instructions to reduce build time. 
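+
+As a concrete illustration of such tools, here is a minimal sketch of
+building and naming an image with \texttt{repo2docker} without starting
+a notebook server; the repository URL and image name are placeholders:
+
+\begin{verbatim}
+# install the command-line tool
+pip install jupyter-repo2docker
+
+# build an image from a repository, give it a versioned name,
+# and skip launching a container afterwards
+repo2docker --no-run --image-name myproject:1.0.0 \
+    https://github.com/user/myproject
+\end{verbatim}
+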
+While perhaps more useful for fine-tuning, linters can also be helpful
+when writing Dockerfiles by catching errors or non-recommended
+formulations (see \hyperref[{rule:usage}]{Rule~\ztitleref{rule:usage}}).
+
+\hypertarget{tools-for-templating}{%
+\subsection{Tools for templating}\label{tools-for-templating}}
+
+Over time, you will likely work on projects and develop images that are
+similar in nature to each other. To avoid constantly repeating yourself,
+you should consider adopting a standard workflow that gives you a clean
+slate for a new project. As an example, cookie cutter templates {[}33{]}
+or community templates (e.g., {[}34{]}) can provide the required
+structure and files (e.g., for documentation, CI, and licenses) for
+getting started. If you decide to build your own cookie cutter template,
+consider collaborating with your community during development of the
+standard to ensure it will be useful to others.
+
+Part of your project template should be a protocol for publishing the
+\texttt{Dockerfile} and even exporting the image to a suitable location,
+e.g., a container registry or data repository, taking into consideration
+how your workflow can receive a DOI for citation. A template is
+preferable to your own set of base images because of the maintenance
+effort that base images require. Therefore, instead of building your own
+independent solution, consider contributing to existing suites of images
+(see \hyperref[{rule:base}]{Rule~\ztitleref{rule:base}}) and improving
+these for your needs.
+
+For any tool that you use, be sure to look at documentation for usage
+and configuration options, and look for options to add metadata (e.g.,
 labels see~\hyperref[{rule:document}]{Rule~\ztitleref{rule:document}}).
 
 \begin{center}\rule{0.5\linewidth}{0.5pt}\end{center}
@@ -683,7 +709,7 @@ \section*{2. Use versioned images}\label{use-versioned-images}}
 work is crucial, as the image and tag that you choose has important
 implications for your container. It is good practice to use
 \textbf{base images} that are maintained by the Docker library, so called
-\emph{``official images''} {[}33{]}, which benefit from a review for
+\emph{``official images''} {[}35{]}, which benefit from a review for
 best practices and vulnerability scanning {[}13{]}. You can identify
 these images by the missing user portion of the image name, which comes
 before the \texttt{/}, e.g., \texttt{r-base} or \texttt{python}.
@@ -711,7 +737,7 @@ \section*{2. Use versioned images}\label{use-versioned-images}}
 a moving target for your computing environment that can break your
 workflow. Note that a version tag means that the tagged software is
 frozen, but it does not mean that the image will not change, as
-backwards compatible fixes (cf.~semantic versioning, {[}34{]}), e.g.,
+backwards compatible fixes (cf.~semantic versioning, {[}36{]}), e.g.,
 version \texttt{1.2.3} that fixes a security problem in version
 \texttt{1.2.2} or updates to an underlying system library, would be
 published to the parent tag \texttt{1.2}.
@@ -754,7 +780,7 @@ \section*{2. Use versioned images}\label{use-versioned-images}}
 images} for bioinformatics with R
 \item
   \href{https://hub.docker.com/_/neurodebian}{NeuroDebian images} for
-  neuroscience {[}35{]}
+  neuroscience {[}37{]}
 \item
   \href{https://jupyter-docker-stacks.readthedocs.io/en/latest/index.html}{Jupyter
   Docker Stacks} for Notebook-based computing
@@ -764,7 +790,7 @@ \section*{2. 
Use versioned images}\label{use-versioned-images}} \end{itemize} For example, here is how we would use a base image \texttt{verse}, which -provides the popular Tidyverse suite of packages {[}36{]}, with R +provides the popular Tidyverse suite of packages {[}38{]}, with R version \texttt{3.5.2} from the \texttt{rocker} organisation on Docker Hub (\texttt{docker.io}, which is the default and can be omitted). @@ -821,7 +847,7 @@ \section{3. Format for clarity}\label{format-for-clarity}} requirements, and (b) transparency and inspectability outweigh storage concerns in data science. If you really need to reduce the size, you may look into using multiple containers (cf.~{[}12{]}) or multi-stage builds -{[}37{]}. +{[}39{]}. Depending on the programming language used, your project may already contain files to manage dependencies, and you may use a package manager @@ -838,7 +864,7 @@ \section{3. Format for clarity}\label{format-for-clarity}} well-documented recipe for the user as well as a machine. Each instruction will result in a new layer, and reasonably grouped changes increase readability of the \texttt{Dockerfile} and facilitate -inspection of the image, e.g., with tools like dive {[}38{]}. Convoluted +inspection of the image, e.g., with tools like dive {[}40{]}. Convoluted \texttt{RUN} instructions can be acceptable to reduce the number of layers, but careful layout and consistent formatting should be applied. @@ -943,7 +969,7 @@ \subsection{Add metadata as labels}\label{add-metadata-as-labels}} \hyperref[{rule:usage}]{Rule~\ztitleref{rule:usage}}). The OCI Image Format Specification provides some common label keys (see -the ``Annotations'' section in {[}39{]}) to help standardise field names +the ``Annotations'' section in {[}41{]}) to help standardise field names across container tools, as shown below. Some keys hold specific content, e.g., \texttt{org.opencontainers.image.documentation} is a URL as character string pointing to documentation on the image, and @@ -1061,60 +1087,10 @@ \subsection{Include usage is a demonstration of your careful work habits and good intentions for transparency and computational reproducibility. -\hypertarget{order-instructions}{% -\section{5. Order instructions}\label{order-instructions}} - -\ztitlerefsetup{title=5} \zlabel{rule:order} \label{rule:order} \zrefused{rule:order} - -You will regularly build an image during development of your workflow. -You can take advantage of \emph{build caching} to avoid execution of -time-consuming instructions, e.g., install from a remote resource or a -copying a file that gets cached. Therefore, you should keep instructions -\emph{in order} of least likely to change to most likely to change. -Docker will execute the instructions in the order that they appear in -the \texttt{Dockerfile}; when one instruction is completed, the result -is cached, and the build moves to the next one. If you change something -in the Dockerfile and rebuild the container, each instruction is -inspected in turn. If it has not changed, the cached layer is used and -the build progresses. Conversely, if the line has changed, that build -step is executed afresh, and then every subsequent instruction will have -to be executed in case the changed line influences a later instruction. -You should regularly re-build the image using the \texttt{-\/-no-cache} -option to learn about broken instructions as soon as possible -(cf.~\hyperref[{rule:usage}]{Rule~\ztitleref{rule:usage}}). 
Such a
-re-build is also a good occasion to revisit the order of instructions,
-e.g., if you appended an instruction at the end to save time while
-iteratively developing the \texttt{Dockerfile}, and the formatting. You
-can add a version tag to the image before the re-build to make sure to
-keep a working environment at hand. A recommended ordering based on
-these considerations is as follows, and you can use comments to visually
-separate these sections in your file (cf.~Listing~1):
-
-\begin{enumerate}
-\def\labelenumi{\arabic{enumi}.}
-\tightlist
-\item
-  System libraries
-\item
-  Language-specific libraries or modules
-\item
-  from repositories (i.e., binaries)
-\item
-  from source (e.g., GitHub)
-\item
-  Installation of your own software and scripts (if not mounted)
-\item
-  Copying data and configuration (if not mounted)
-\item
-  Labels
-\item
-  Entrypoint and default command
-\end{enumerate}
-
 \hypertarget{specify-software-versions}{%
-\section*{6. Specify software
+\section*{5. Specify software
 versions}\label{specify-software-versions}}
-\addcontentsline{toc}{section}{6. Specify software versions}
+\addcontentsline{toc}{section}{5. Specify software versions}
 
-\ztitlerefsetup{title=6} \zlabel{rule:pinning} \label{rule:pinning} \zrefused{rule:pinning}
+\ztitlerefsetup{title=5} \zlabel{rule:pinning} \label{rule:pinning} \zrefused{rule:pinning}
@@ -1185,15 +1161,15 @@ \subsection{Extension packages and programming language
 
 \begin{itemize}
 \tightlist
 \item
-  Python: \texttt{requirements.txt} (pip tool, {[}40{]}),
-  \texttt{environment.yml} (Conda, {[}41{]})
+  Python: \texttt{requirements.txt} (pip tool, {[}42{]}),
+  \texttt{environment.yml} (Conda, {[}43{]})
 \item
-  R: \texttt{DESCRIPTION} file format {[}42{]} and \texttt{r} (``little
-  R'', {[}43{]})
+  R: \texttt{DESCRIPTION} file format {[}44{]} and \texttt{r} (``little
+  R'', {[}45{]})
 \item
-  JavaScript: \texttt{package.json} of \texttt{npm} {[}44{]}
+  JavaScript: \texttt{package.json} of \texttt{npm} {[}46{]}
 \item
-  Julia: \texttt{Project.toml} and \texttt{Manifest.toml} {[}45{]}
+  Julia: \texttt{Project.toml} and \texttt{Manifest.toml} {[}47{]}
 \end{itemize}
 
 In some cases (e.g., Conda) the package manager is also able to make
@@ -1264,6 +1240,48 @@ \subsection{Extension packages and programming language
 ensuring that the image works without specifying any users, and, if
 your image deviates from that, we suggest you document it precisely.
 
+\hypertarget{use-version-control}{%
+\section*{6. Use version control}\label{use-version-control}}
+\addcontentsline{toc}{section}{6. Use version control}
+
+\ztitlerefsetup{title=6} \zlabel{rule:publish} \label{rule:publish} \zrefused{rule:publish}
+
+As plain text files, \texttt{Dockerfile}s are well suited for use with
+version control systems. Including a \texttt{Dockerfile} alongside your
+code and data is an effective way to consistently build your software,
+to show visitors to the repository how it is built and used, to solicit
+feedback and collaborate with your peers, and to increase the impact and
+sustainability of your work (cf.~{[}48{]}).
+
+Online collaboration platforms (e.g., GitHub, GitLab) also make it easy
+to use CI services to test building and executing your image in an
+independent environment. Continuous integration increases stability and
+trust, and it allows for images to be published automatically.
+Automation strategies exist to build and test images for multiple
+platforms and software versions, even with CI. Such approaches are often
+used when developing popular software packages for a broad user base
+operating across a wide range of target platforms and environments, and
+they can be leveraged if you expect your workflow to fall into this
+category. Furthermore, the commit messages in your version-controlled
+repository preserve a record of all changes to the \texttt{Dockerfile},
+and you can use the same versions in tags for both the container image
+and the git repository.
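+
+As a sketch, such a CI job could rebuild the image and run a short smoke
+test; the image name and test script below are hypothetical
+placeholders:
+
+\begin{verbatim}
+# rebuild the image from the Dockerfile in the repository root
+docker build --tag myproject:latest .
+
+# run a quick check inside the freshly built image;
+# a non-zero exit code makes the CI job fail
+docker run --rm myproject:latest Rscript tests/smoke-test.R
+\end{verbatim}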
+
+Importantly, you should publish \emph{all} files \texttt{COPY}ied into
+the container, e.g., test data, custom scripts, or files for software
+installation from source
+(see~\hyperref[{rule:mount}]{Rule~\ztitleref{rule:mount}}), in the same
+public repository as the \texttt{Dockerfile}, e.g., in a research
+compendium. If you prefer to edit your scripts more interactively in a
+running container (e.g., using Jupyter), then it may be more convenient
+to bind mount their directory from the host at run time, provided all
+changes are committed before sharing.
+
 \hypertarget{mount-datasets-at-run-time}{%
 \section*{7. Mount datasets at run
 time}\label{mount-datasets-at-run-time}}
@@ -1328,7 +1346,7 @@ \section*{7. Mount datasets at run
 You can use the \texttt{-v}/\texttt{-\/-volume} or preferably
 \texttt{-\/-mount} flags to \texttt{docker\ run} to configure bind
-mounts of directories or files {[}46{]}, including options, as shown in
+mounts of directories or files {[}49{]}, including options, as shown in
 the following examples. If the target path exists within the image, the
 bind mount will replace it for the started container. (Note,
 \texttt{\$HOME} is an environment variable in UNIX systems representing
@@ -1362,11 +1380,10 @@ \section*{7. Mount datasets at run
 user \texttt{root} on your host, because that is the default user
 within the container.
 
-\hypertarget{enable-interactive-usage-and-one-click-execution}{%
-\section*{8. Enable interactive usage and one-click
-execution}\label{enable-interactive-usage-and-one-click-execution}}
-\addcontentsline{toc}{section}{8. Enable interactive usage and one-click
-execution}
+\hypertarget{make-it-one-click-runnable}{%
+\section*{8. Make it one-click
+runnable}\label{make-it-one-click-runnable}}
+\addcontentsline{toc}{section}{8. Make it one-click runnable}
 
 \ztitlerefsetup{title=8} \zlabel{rule:interactive} \label{rule:interactive} \zrefused{rule:interactive}
@@ -1374,7 +1391,7 @@ \section*{8. Enable interactive usage and one-click
 also \hyperref[{rule:usage}]{Rule~\ztitleref{rule:usage}}), because they
 support common interactive environments for data science and software
 development. But they are also useful for a ``headless'' execution of
-full workflows. For example, {[}47{]} demonstrates a container for
+full workflows. For example, {[}50{]} demonstrates a container for
 running an agent-based model with video files as outputs, and this
 article's \href{https://rmarkdown.rstudio.com/}{R Markdown} source,
 which included cells with analysis code, is
@@ -1398,7 +1415,7 @@ \section*{8. Enable interactive usage and one-click
 write clear instructions for how to properly interact with the
 container, both for yourself and others. A possible weakness with using
 containers is that they can only provide one default entrypoint and
-command. 
However, tools, e.g., The Scientific Filesystem {[}48{]}, have +command. However, tools, e.g., The Scientific Filesystem {[}51{]}, have been developed to expose multiple entrypoints, environments, help messages, labels, and even install sequences. With plain Docker, you can override the defaults as part of the \texttt{docker\ run} command or in @@ -1411,10 +1428,10 @@ \section*{8. Enable interactive usage and one-click instructions for its usage (see \hyperref[{rule:document}]{Rule~\ztitleref{rule:document}}). To support advanced custom configuration, it is helpful to expose settings via a -configuration file, which can be bind mounted from the host {[}47{]}, +configuration file, which can be bind mounted from the host {[}50{]}, via environment variables (see -\hyperref[{rule:pinning}]{Rule~\ztitleref{rule:pinning}} and {[}49{]}), -or via wrappers using Docker, such as Kliko {[}50{]}. +\hyperref[{rule:pinning}]{Rule~\ztitleref{rule:pinning}} and {[}52{]}), +or via wrappers using Docker, such as Kliko {[}53{]}. \scriptsize @@ -1502,7 +1519,7 @@ \section*{8. Enable interactive usage and one-click automated builds, e.g., by using a small toy example and checking the output, by checking successful responses from HTTP endpoints provided by the container, such as with an HTTP response code of \texttt{200}, or by -using a controller such as Selenium {[}51{]}. +using a controller such as Selenium {[}54{]}. The following example runs a simple R command counting the lines in this article's source file. The file path is passed as an environment @@ -1536,100 +1553,69 @@ \section*{8. Enable interactive usage and one-click If there is only a regular desktop application, the host's window manager can be connected to the container. Although this raises notable security issues, they can be addressed by using the ``X11 forwarding'' -natively supported by Singularity {[}52{]}, which can execute Docker +natively supported by Singularity {[}55{]}, which can execute Docker containers, or by leveraging supporting tools such as \texttt{x11docker} -{[}53{]}. Other alternatives include bridge containers {[}54{]} and +{[}56{]}. Other alternatives include bridge containers {[}57{]} and exposing a regular desktop via the browser (e.g., for Jupyter Hub -{[}55{]}). This variety of approaches renders seemingly more convenient +{[}58{]}). This variety of approaches renders seemingly more convenient uncontainerised environments unnecessary. Just using one's local machine is only slightly more comfortable but much less reproducible and portable. -\hypertarget{use-one-dockerfile-per-project-and-publish-it-with-a-version-control-system}{% -\section*{9. Use one Dockerfile per project and publish it with a -version control -system}\label{use-one-dockerfile-per-project-and-publish-it-with-a-version-control-system}} -\addcontentsline{toc}{section}{9. Use one Dockerfile per project and -publish it with a version control system} - -\ztitlerefsetup{title=9} \zlabel{rule:publish} \label{rule:publish} \zrefused{rule:publish} - -As plain text files, \texttt{Dockerfile}s are well suited for use with -version control systems. Including a \texttt{Dockerfile} alongside your -code and data is an effective way to consistently build your software, -to show visitors to the repository how it is built and used, to solicit -feedback and collaborate with your peers, and to increase the impact and -sustainability of your work (cf.~{[}56{]}). 
Most importantly, you should
-publish \emph{all} files \texttt{COPY}ied into the image, e.g., test
-data or files for software installation from source
-(see~\hyperref[{rule:mount}]{Rule~\ztitleref{rule:mount}}), in the same
-public repository as the \texttt{Dockerfile}, e.g., in a research
-compendium.
-
-Online collaboration platforms (e.g., GitHub, GitLab) also make it easy
-to use CI services to test building and executing your image in an
-independent environment. Continuous integration increases stability and
-trust, and it allows for images to be published automatically.
-Automation strategies exist to build and test images for multiple
-platforms and software versions, even with CI. Such approaches are often
-used when developing popular software packages for a broad user base
-operating across a wide range of target platforms and environments, and
-they can be leveraged if you expect your workflow to fall into this
-category. Furthermore, the commit messages in your version-controlled
-repository preserve a record of all changes to the \texttt{Dockerfile},
-and you can use the same versions in tags for both the container image
-and the git repository.
+\hypertarget{order-the-instructions}{%
+\section{9. Order the instructions}\label{order-the-instructions}}
 
-While there are exceptions to the rule (cf.~{[}57{]}), it is generally
-feasible to provide one \texttt{Dockerfile} per project. Alternatively,
-you could write multiple \texttt{Dockerfile}s starting \texttt{FROM} one
-another, i.e., write your own base images. However, because multiple
-files scatter information across multiple places, we recommend avoiding
-this at the cost of a longer \texttt{Dockerfile}, which can be mitigated
-with formatting
-(see~\hyperref[{rule:formatting}]{Rule~\ztitleref{rule:formatting}}).
-
-Importantly, you should publish \emph{all} files \texttt{COPY}ied into
-the container, e.g., test data, custom scripts or files for software
-installation from source
-(see~\hyperref[{rule:mount}]{Rule~\ztitleref{rule:mount}}) in the same
-public repository as the \texttt{Dockerfile}, e.g., in a research
-compendium. If you prefer to edit your scripts more interactively in a
-running container (e.g., using Jupyter) then it may be more convenient
-to bind mount their directory from the host at run time, provided all
-changes are commited before sharing.
+\ztitlerefsetup{title=9} \zlabel{rule:order} \label{rule:order} \zrefused{rule:order}
 
-It is likely going to be the case that over time you will work on
-projects and develop images that are similar in nature to each other.
-When developing or working on projects with containers, you can switch
-between isolated project environments by stopping the container and
-restarting it when you are ready to work again, even on another machine
-or in a cloud environment. You can even run projects in parallel that do
-not share ports without interference. To avoid constantly repeating
-yourself, you should consider adopting a standard workflow that will
-give you a clean slate for a new project. As an example, cookie cutter
-templates {[}58{]} or community templates (e.g., {[}59{]}) can provide
-the required structure and files, e.g., for documentation, CI, and
-licenses, for getting started. If you decide to build your own cookie
-cutter template, consider collaborating with your community during
-development of the standard to ensure it will be useful to others.
+
+You will regularly build an image during development of your workflow.
+You can take advantage of \emph{build caching} to avoid execution of
+time-consuming instructions, e.g., installing from a remote resource or
+copying a file that gets cached. Therefore, you should keep instructions
+\emph{in order} of least likely to change to most likely to change.
+Docker will execute the instructions in the order that they appear in
+the \texttt{Dockerfile}; when one instruction is completed, the result
+is cached, and the build moves to the next one. If you change something
+in the \texttt{Dockerfile} and rebuild the image, each instruction is
+inspected in turn. If it has not changed, the cached layer is used and
+the build progresses. Conversely, if the line has changed, that build
+step is executed afresh, and then every subsequent instruction will have
+to be executed in case the changed line influences a later instruction.
+You should regularly re-build the image using the \texttt{-\/-no-cache}
+option to learn about broken instructions as soon as possible
+(cf.~\hyperref[{rule:usage}]{Rule~\ztitleref{rule:usage}}). Such a
+re-build is also a good occasion to revisit the order of instructions
+and the formatting, e.g., if you appended an instruction at the end to
+save time while iteratively developing the \texttt{Dockerfile}. You can
+add a version tag to the image before the re-build to make sure to keep
+a working environment at hand. A recommended ordering based on these
+considerations is as follows (see the schematic sketch after the list),
+and you can use comments to visually separate these sections in your
+file (cf.~Listing~1):
+
-Part of your project template should be a protocol for publishing the
-\texttt{Dockerfile} and even exporting the image to a suitable location,
-e.g., a container registry or data repository, taking into consideration
-how your workflow can receive a DOI for citation. A template is
-preferable to your own set of base images because of the maintenance
-efforts the base images require. Therefore, instead of building your own
-independent solution, consider contributing to existing suites of images
-(see \hyperref[{rule:base}]{Rule~\ztitleref{rule:base}}) and improving
-these for your needs.
-
+\begin{enumerate}
+\def\labelenumi{\arabic{enumi}.}
+\tightlist
+\item
+  System libraries
+\item
+  Language-specific libraries or modules
+
+  \begin{enumerate}
+  \def\labelenumii{\alph{enumii}.}
+  \tightlist
+  \item
+    from repositories (i.e., binaries)
+  \item
+    from source (e.g., GitHub)
+  \end{enumerate}
+\item
+  Installation of your own software and scripts (if not mounted)
+\item
+  Copying data and configuration files (if not mounted)
+\item
+  Labels
+\item
+  Entrypoint and default command
+\end{enumerate}
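+
+The following schematic \texttt{Dockerfile} sketches this ordering; the
+base image is taken from the example above, while package, script, and
+file names are illustrative only (\texttt{install2.r} and
+\texttt{installGithub.r} are littler helper scripts available in the
+rocker images):
+
+\begin{verbatim}
+FROM rocker/verse:3.5.2
+
+# 1. system libraries
+RUN apt-get update \
+  && apt-get install -y --no-install-recommends libxml2-dev \
+  && rm -rf /var/lib/apt/lists/*
+
+# 2a. language-specific libraries from repositories (binaries)
+RUN install2.r --error fortunes
+
+# 2b. language-specific libraries from source (GitHub)
+RUN installGithub.r user/examplepkg
+
+# 3. own software and scripts
+COPY myscript.R /work/myscript.R
+
+# 4. data and configuration
+COPY config.yml /work/config.yml
+
+# 5. labels
+LABEL maintainer="author@example.org"
+
+# 6. entrypoint and default command
+WORKDIR /work
+CMD ["Rscript", "myscript.R"]
+\end{verbatim}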
 
-\hypertarget{use-the-container-daily-rebuild-the-image-weekly-clean-up-and-preserve-if-need-be}{%
-\section*{10. Use the container daily, rebuild the image weekly, clean
-up and preserve if need
-be}\label{use-the-container-daily-rebuild-the-image-weekly-clean-up-and-preserve-if-need-be}}
-\addcontentsline{toc}{section}{10. Use the container daily, rebuild the
-image weekly, clean up and preserve if need be}
+\hypertarget{regularly-use-and-rebuild-containers}{%
+\section*{10. Regularly use and rebuild
+containers}\label{regularly-use-and-rebuild-containers}}
+\addcontentsline{toc}{section}{10. Regularly use and rebuild containers}
 
 \ztitlerefsetup{title=10} \zlabel{rule:usage} \label{rule:usage} \zrefused{rule:usage}
@@ -1646,7 +1632,7 @@ \section*{10. Use the container daily, rebuild the image weekly, clean
 First, it is a good habit to use your container every time you work on
 a project and not just as a final step during publication. If the
 container is the only platform you use, you can be highly confident that
-you have properly documented the computing environment {[}60{]}. You
+you have properly documented the computing environment {[}59{]}. You
 should prioritise this usage over others, e.g., non-interactive
 execution of a full workflow, because it gives you personally the
 highest value and does not limit your use or others' use of your data
@@ -1681,14 +1667,7 @@ \section*{10. Use the container daily, rebuild the image weekly, clean
 \href{https://github.com/projectatomic/dockerfile_lint}{\texttt{dockerfile-lint}},
 which you can integrate with your \texttt{Makefile}.
 
-Third, from time to time you can reduce the system resources occupied by
-Docker images and their layers or unused containers, volumes and
-networks by running \texttt{docker\ system\ prune\ -\/-all}. After a
-prune is performed, it follows naturally to rebuild a container for
-local usage or to pull it again from a newly built registry image. This
-habit can be automated with a cron job {[}61{]}.
-
-Fourth, you can export the image to file and deposit it in a public data
+Third, you can export the image to a file and deposit it in a public data
 repository, where it not only becomes citable but also provides a
 snapshot of the \emph{actual} environment you used at a specific point
 in time. You should include instructions for how to import and run the
@@ -1716,7 +1695,7 @@ \section*{10. Use the container daily, rebuild the image weekly, clean
 submitting your reproducible workflow for peer-review, so you are well
 positioned for the future of scholarly communication and open science,
 where these may be standard practices required for publication
-{[}21,62--64{]}.
+{[}21,60--62{]}.
 
 \hypertarget{example-dockerfiles}{%
 \section{Example Dockerfiles}\label{example-dockerfiles}}
@@ -1741,7 +1720,7 @@ \section*{Conclusion}\label{conclusion}}
 project. We encourage researchers to follow these steps taken by their
 peers to use \texttt{Dockerfile}s to practice reproducible research, and
 we encourage them to change the way they communicate towards
-``preproducibility'' {[}65{]}, which values openness, transparency and
+``preproducibility'' {[}63{]}, which values openness, transparency and
 honesty to find fascinating problems and advance science. So, we ask
 researchers, with their best efforts and with their current knowledge,
 to strive to write readable \texttt{Dockerfile}s for functional
@@ -1767,7 +1746,7 @@ \section*{Acknowledgements}\label{acknowledgements}}
 1632/17-1. DN and SJE are supported by a Mozilla mini science grant.
 The funders had no role in study design, data collection and analysis,
 decision to publish, or preparation of the manuscript. We thank Dav
-Clark who provided feedback on the preprint {[}66{]} of this paper.
+Clark who provided feedback on the preprint {[}64{]} of this paper.
 
 \hypertarget{contributions}{%
 \section*{Author contributions}\label{contributions}}
@@ -1956,171 +1935,160 @@ \section*{References}\label{references}}
 32. Stencila. Stencila/dockta {[}Internet{]}. Stencila; 2019. Available:
 \url{https://github.com/stencila/dockta}
 
+\leavevmode\hypertarget{ref-cookiecutter_contributors_cookiecutter_2019}{}%
+33. \{Cookiecutter contributors\}. Cookiecutter/cookiecutter
+{[}Internet{]}. cookiecutter; 2019. Available:
+\url{https://github.com/cookiecutter/cookiecutter}
+
+\leavevmode\hypertarget{ref-marwick_rrtools_2019}{}%
+34. Marwick B. Benmarwick/rrtools {[}Internet{]}. 2019. 
Available: +\url{https://github.com/benmarwick/rrtools} + \leavevmode\hypertarget{ref-docker_inc_official_2020}{}% -33. Docker Inc. Official Images on Docker Hub {[}Internet{]}. Docker +35. Docker Inc. Official Images on Docker Hub {[}Internet{]}. Docker Documentation. 2020. Available: \url{https://docs.docker.com/docker-hub/official_images/} \leavevmode\hypertarget{ref-preston-werner_semantic_2013}{}% -34. Preston-Werner T. Semantic Versioning 2.0.0 {[}Internet{]}. Semantic +36. Preston-Werner T. Semantic Versioning 2.0.0 {[}Internet{]}. Semantic Versioning. 2013. Available: \url{https://semver.org/} \leavevmode\hypertarget{ref-halchenko_open_2012}{}% -35. Halchenko YO, Hanke M. Open is Not Enough. Let's Take the Next Step: +37. Halchenko YO, Hanke M. Open is Not Enough. Let's Take the Next Step: An Integrated, Community-Driven Computing Platform for Neuroscience. Frontiers in Neuroinformatics. 2012;6. doi:\href{https://doi.org/10.3389/fninf.2012.00022}{10.3389/fninf.2012.00022} \leavevmode\hypertarget{ref-Wickham2019}{}% -36. Wickham H, Averick M, Bryan J, Chang W, McGowan L, François R, et +38. Wickham H, Averick M, Bryan J, Chang W, McGowan L, François R, et al. Welcome to the tidyverse. Journal of Open Source Software. The Open Journal; 2019;4: 1686. doi:\href{https://doi.org/10.21105/joss.01686}{10.21105/joss.01686} \leavevmode\hypertarget{ref-docker_multi-stage_2020}{}% -37. Docker Inc. Use multi-stage builds {[}Internet{]}. Docker +39. Docker Inc. Use multi-stage builds {[}Internet{]}. Docker Documentation. 2020. Available: \url{https://docs.docker.com/develop/develop-images/multistage-build/} \leavevmode\hypertarget{ref-goodman_dive_2019}{}% -38. Goodman A. Wagoodman/dive {[}Internet{]}. 2019. Available: +40. Goodman A. Wagoodman/dive {[}Internet{]}. 2019. Available: \url{https://github.com/wagoodman/dive} \leavevmode\hypertarget{ref-opencontainers_image-spec_2017}{}% -39. Opencontainers. Opencontainers/image-spec v1.0.1 - Annotations +41. Opencontainers. Opencontainers/image-spec v1.0.1 - Annotations {[}Internet{]}. GitHub. 2017. Available: \url{https://github.com/opencontainers/image-spec/blob/v1.0.1/annotations.md} \leavevmode\hypertarget{ref-the_python_software_foundation_requirements_2019}{}% -40. The Python Software Foundation. Requirements Files --- pip User +42. The Python Software Foundation. Requirements Files --- pip User Guide {[}Internet{]}. 2019. Available: \url{https://pip.pypa.io/en/stable/user_guide/\#requirements-files} \leavevmode\hypertarget{ref-continuum_analytics_managing_2017}{}% -41. Continuum Analytics. Managing environments --- conda documentation +43. Continuum Analytics. Managing environments --- conda documentation {[}Internet{]}. 2017. Available: \url{https://docs.conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html} \leavevmode\hypertarget{ref-r_core_team_description_1999}{}% -42. R Core Team. The DESCRIPTION file in "writing r extensions" +44. R Core Team. The DESCRIPTION file in "writing r extensions" {[}Internet{]}. 1999. Available: \url{https://cran.r-project.org/doc/manuals/r-release/R-exts.html\#The-DESCRIPTION-file} \leavevmode\hypertarget{ref-eddelbuettel_littler_2019}{}% -43. Eddelbuettel D, Horner J. Littler: R at the command-line via 'r' +45. Eddelbuettel D, Horner J. Littler: R at the command-line via 'r' {[}Internet{]}. 2019. Available: \url{https://CRAN.R-project.org/package=littler} \leavevmode\hypertarget{ref-npm_creating_2019}{}% -44. npm. Creating a package.json file npm Documentation {[}Internet{]}. +46. npm. 
Creating a package.json file. npm Documentation {[}Internet{]}.
 2019. Available:
 \url{https://docs.npmjs.com/creating-a-package-json-file}
 
 \leavevmode\hypertarget{ref-julia_tomls_2019}{}%
-45. The Julia Language Contributors. 10. Project.Toml and Manifest.Toml
-· Pkg.Jl {[}Internet{]}. 2019. Available:
+47. The Julia Language Contributors. Project.toml and Manifest.toml ·
+Pkg.jl {[}Internet{]}. 2019. Available:
 \url{https://julialang.github.io/Pkg.jl/v1/toml-files/}
 
+\leavevmode\hypertarget{ref-emsley_framework_2018}{}%
+48. Emsley I, De Roure D. A Framework for the Preservation of a Docker
+Container. International Journal of Digital Curation. 2018;12.
+doi:\href{https://doi.org/10.2218/ijdc.v12i2.509}{10.2218/ijdc.v12i2.509}
+
 \leavevmode\hypertarget{ref-docker_use_2019}{}%
-46. Docker Inc. Use bind mounts {[}Internet{]}. Docker Documentation.
+49. Docker Inc. Use bind mounts {[}Internet{]}. Docker Documentation.
 2019. Available: \url{https://docs.docker.com/storage/bind-mounts/}
 
 \leavevmode\hypertarget{ref-verstegen_pluc_mozambique_2019}{}%
-47. Verstegen JA. JudithVerstegen/PLUC\_Mozambique: First release of
+50. Verstegen JA. JudithVerstegen/PLUC\_Mozambique: First release of
 PLUC for Mozambique {[}Internet{]}. Zenodo; 2019.
 doi:\href{https://doi.org/10.5281/zenodo.3519987}{10.5281/zenodo.3519987}
 
 \leavevmode\hypertarget{ref-sochat_scientific_2018}{}%
-48. Sochat V. The Scientific Filesystem. GigaScience. 2018;7.
+51. Sochat V. The Scientific Filesystem. GigaScience. 2018;7.
 doi:\href{https://doi.org/10.1093/gigascience/giy023}{10.1093/gigascience/giy023}
 
 \leavevmode\hypertarget{ref-knoth_reproducibility_2017}{}%
-49. Knoth C, Nüst D. Reproducibility and Practical Adoption of GEOBIA
+52. Knoth C, Nüst D. Reproducibility and Practical Adoption of GEOBIA
 with Open-Source Software in Docker Containers. Remote Sensing. 2017;9:
 290. doi:\href{https://doi.org/10.3390/rs9030290}{10.3390/rs9030290}
 
 \leavevmode\hypertarget{ref-molenaar_klikoscientific_2018}{}%
-50. Molenaar G, Makhathini S, Girard JN, Smirnov O. Kliko---The
+53. Molenaar G, Makhathini S, Girard JN, Smirnov O. Kliko---The
 scientific compute container format. Astronomy and Computing. 2018;25:
 1--9.
 doi:\href{https://doi.org/10.1016/j.ascom.2018.08.003}{10.1016/j.ascom.2018.08.003}
 
 \leavevmode\hypertarget{ref-selenium_2019}{}%
-51. Selenium contributors. SeleniumHQ/selenium {[}Internet{]}. Selenium;
+54. Selenium contributors. SeleniumHQ/selenium {[}Internet{]}. Selenium;
 2019. Available: \url{https://github.com/SeleniumHQ/selenium}
 
 \leavevmode\hypertarget{ref-singularity_frequently_2019}{}%
-52. Singularity. Frequently Asked Questions Singularity {[}Internet{]}.
+55. Singularity. Frequently Asked Questions Singularity {[}Internet{]}.
 2019. Available:
 \url{http://singularity.lbl.gov/archive/docs/v2-2/faq\#can-i-run-x11-apps-through-singularity}
 
 \leavevmode\hypertarget{ref-viereck_x11docker_2019}{}%
-53. Viereck M. X11docker: Run GUI applications in Docker containers.
+56. Viereck M. X11docker: Run GUI applications in Docker containers.
 Journal of Open Source Software. 2019;4: 1349.
 doi:\href{https://doi.org/10.21105/joss.01349}{10.21105/joss.01349}
 
 \leavevmode\hypertarget{ref-yaremenko_docker-x11-bridge_2019}{}%
-54. Yaremenko E. JAremko/docker-x11-bridge {[}Internet{]}. 2019.
+57. Yaremenko E. JAremko/docker-x11-bridge {[}Internet{]}. 2019.
 Available: \url{https://github.com/JAremko/docker-x11-bridge}
 
 \leavevmode\hypertarget{ref-yuvipanda_jupyter-desktop-server_2019}{}%
-55. Panda Y. 
Yuvipanda/jupyter-desktop-server {[}Internet{]}. 2019. +58. Panda Y. Yuvipanda/jupyter-desktop-server {[}Internet{]}. 2019. Available: \url{https://github.com/yuvipanda/jupyter-desktop-server} -\leavevmode\hypertarget{ref-emsley_framework_2018}{}% -56. Emsley I, De Roure D. A Framework for the Preservation of a Docker -Container International Journal of Digital Curation. International -Journal of Digital Curation. 2018;12. -doi:\href{https://doi.org/10.2218/ijdc.v12i2.509}{10.2218/ijdc.v12i2.509} - -\leavevmode\hypertarget{ref-kim_bio-docklets_2017}{}% -57. Kim B, Ali TA, Lijeron C, Afgan E, Krampis K. Bio-Docklets: -Virtualization Containers for Single-Step Execution of NGS Pipelines. -bioRxiv. 2017; 116962. -doi:\href{https://doi.org/10.1101/116962}{10.1101/116962} - -\leavevmode\hypertarget{ref-cookiecutter_contributors_cookiecutter_2019}{}% -58. \{Cookiecutter contributors\}. Cookiecutter/cookiecutter -{[}Internet{]}. cookiecutter; 2019. Available: -\url{https://github.com/cookiecutter/cookiecutter} - -\leavevmode\hypertarget{ref-marwick_rrtools_2019}{}% -59. Marwick B. Benmarwick/rrtools {[}Internet{]}. 2019. Available: -\url{https://github.com/benmarwick/rrtools} - \leavevmode\hypertarget{ref-marwick_readme_2015}{}% -60. Marwick B. README of 1989-excavation-report-Madjebebe. 2015; +59. Marwick B. README of 1989-excavation-report-Madjebebe. 2015; doi:\href{https://doi.org/10.6084/m9.figshare.1297059}{10.6084/m9.figshare.1297059} -\leavevmode\hypertarget{ref-wikipedia_contributors_cron_2019}{}% -61. Wikipedia contributors. Cron {[}Internet{]}. Wikipedia. 2019. -Available: -\url{https://en.wikipedia.org/w/index.php?title=Cron\&oldid=929379536} - \leavevmode\hypertarget{ref-eglen_codecheck_2019}{}% -62. Eglen S, Nüst D. CODECHECK: An open-science initiative to facilitate +60. Eglen S, Nüst D. CODECHECK: An open-science initiative to facilitate sharing of computer programs and results presented in scientific publications. Septentrio Conference Series. 2019; doi:\href{https://doi.org/10.7557/5.4910}{10.7557/5.4910} \leavevmode\hypertarget{ref-schonbrodt_training_2019}{}% -63. Schönbrodt F. Training students for the Open Science future. Nature +61. Schönbrodt F. Training students for the Open Science future. Nature Human Behaviour. 2019;3: 1031--1031. doi:\href{https://doi.org/10.1038/s41562-019-0726-z}{10.1038/s41562-019-0726-z} \leavevmode\hypertarget{ref-eglen_recent_2018}{}% -64. Eglen SJ, Mounce R, Gatto L, Currie AM, Nobis Y. Recent developments +62. Eglen SJ, Mounce R, Gatto L, Currie AM, Nobis Y. Recent developments in scholarly publishing to improve research practices in the life sciences. Emerging Topics in Life Sciences. 2018;2: 775--778. doi:\href{https://doi.org/10.1042/ETLS20180172}{10.1042/ETLS20180172} \leavevmode\hypertarget{ref-stark_before_2018}{}% -65. Stark PB. Before reproducibility must come preproducibility +63. Stark PB. Before reproducibility must come preproducibility {[}Internet{]}. Nature. 2018. doi:\href{https://doi.org/10.1038/d41586-018-05256-0}{10.1038/d41586-018-05256-0} \leavevmode\hypertarget{ref-nust_ten_2020}{}% -66. Nüst D, Sochat V, Marwick B, Eglen S, Head T, Hirst T. Ten Simple +64. Nüst D, Sochat V, Marwick B, Eglen S, Head T, Hirst T. Ten Simple Rules for Writing Dockerfiles for Reproducible Data Science {[}Internet{]}. Open Science Framework; 2020 Apr. doi:\href{https://doi.org/10.31219/osf.io/fsd7t}{10.31219/osf.io/fsd7t}