Managing multiple projects

Stephany Orjuela edited this page Feb 11, 2020 · 9 revisions

Here, we outline different ways to manage data and software if you want to run the ARMOR workflow on more than one project or data set:

Keep a copy of the ARMOR repository together with the data of each project in a single directory. This way, the software (the Snakefile, the scripts, the Rmd files and the config.yaml) and the data for each project are contained in one place. The workflow configuration is physically separate for each project, which makes results easy to reproduce. However, ARMOR will exist in multiple physical locations, so the installed software will be duplicated if you use the --use-conda option, because Snakemake creates the conda environments inside each copy's directory (e.g., ARMOR/.snakemake/conda/7a4f9e69).
Clone the ARMOR repository only once and keep a separate directory for each project. This way, the same ARMOR directory can be reused across many projects. This is useful if you do not want to recreate the conda environments for each project and will use the same Snakefile and scripts for every project. In this case, you need a different config.yaml file for each project (either in the ARMOR directory or in each project directory), and you must specify the path to the relevant config.yaml file every time you run the workflow (e.g., snakemake --configfile projectX/config.yaml).
Do not update ARMOR in the middle of an analysis. This not only ensures reproducibility but also avoids dependency clashes and the mixing of incompatible versions.
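
For the second setup (a single ARMOR clone shared across projects), the per-project invocation could be sketched as follows. The project names projectX and projectY and the helper armor_cmd are hypothetical. The sketch assumes that each project keeps its configuration at <project>/config.yaml and that the commands are run from the ARMOR directory. It only prints the commands rather than running them.

```shell
# Build (but do not run) the snakemake command for one project.
# Assumption: each project's config lives at <project>/config.yaml.
armor_cmd() {
    project_dir="$1"
    printf 'snakemake --use-conda --configfile %s/config.yaml\n' "$project_dir"
}

# Print the command for each (hypothetical) project.
# Intended to be run from inside the single shared ARMOR directory.
for project in projectX projectY; do
    armor_cmd "$project"
done
```

Because every project is run from the same ARMOR directory, the conda environments under .snakemake/conda should be created once and reused, and only the config file changes between runs.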

Further details can be found on the Running the analysis page.