Commit bf6e291

Merge pull request #1 from scheiblr/dev
Version 1.0.0beta
2 parents 3bbea35 + 06bd3de commit bf6e291

13 files changed

+1180
-2
lines changed

.gitignore

Lines changed: 9 additions & 0 deletions
@@ -0,0 +1,9 @@
1+
# intellij IDE
2+
.idea
3+
4+
# tools
5+
tools/annovar/*
6+
!tools/annovar/.gitkeep
7+
8+
tools/gatk/*
9+
!tools/gatk/.gitkeep

LICENSE

Lines changed: 661 additions & 0 deletions
Large diffs are not rendered by default.

README.md

Lines changed: 153 additions & 2 deletions
@@ -1,4 +1,155 @@
11
# MIRACUM-Pipe-docker
2-
application using the dockerized version of MIRACUM-Pipe
32

4-
Currently under heavy development
3+
This repo offers a framework to easily work with the dockerized version of [MIRACUM-Pipe](https://github.com/AG-Boerries/MIRACUM-Pipe)
4+
5+
## Setup and installation
6+
7+
To run the MIRACUM pipeline, you need to set up tools and databases that we are not allowed to ship for licensing reasons.
8+
We prepared this project in a way which allows you to easily add the specific components into the pipeline.
9+
Before running the setup script, some components need to be installed manually:
10+
11+
- tools
12+
- [annovar](http://download.openbioinformatics.org/annovar_download_form.php)
13+
- required additional database for annovar
14+
- create a database for the latest COSMIC release (according to the [annovar manual](http://annovar.openbioinformatics.org/en/latest/user-guide/filter/#cosmic-annotations))
15+
- Download [prepare_annovar_user.pl](http://www.openbioinformatics.org/annovar/download/prepare_annovar_user.pl) and add to annovar folder
16+
- Register at [COSMIC](https://cancer.sanger.ac.uk/cosmic)
17+
- Download the latest release for GRCh37 (as of October 2019 the latest release is v90):
18+
- VCF/CosmicCodingMuts.vcf.gz
19+
- VCF/CosmicNonCodingVariants.vcf.gz
20+
- CosmicMutantExport.tsv.gz
21+
- CosmicNCV.tsv.gz
22+
- unzip all archives
23+
- commands to build the annovar database
24+
25+
```bash
26+
prepare_annovar_user.pl -dbtype cosmic CosmicMutantExport.tsv -vcf CosmicCodingMuts.vcf > hg19_cosmic_coding.txt
27+
prepare_annovar_user.pl -dbtype cosmic CosmicNCV.tsv -vcf CosmicNonCodingVariants.vcf > hg19_cosmic_noncoding.txt
28+
```
29+
30+
- Move both created files to the annovar/humandb folder.
31+
32+
- databases
33+
- [hallmarks of cancer](http://software.broadinstitute.org/gsea/msigdb/)
34+
- h.all.v7.0.entrez.gmt
35+
- [condel score](http://bbglab.irbbarcelona.org)
36+
- fannsdb.tsv.gz
37+
- fannsdb.tsv.gz.tbi
38+
39+
annovar requires a personal download link. Follow the URL above and fill out the form; the link is then sent to you by email.
40+
While `setup.sh` is running you'll be asked to enter this download link. Alternatively you could also install annovar by manually extracting it into the folder `tools`.
41+
To install the databases, follow the links, register, and download the listed files. Place them in the `databases` folder of your cloned project.
42+
43+
Next, run the setup script. We recommend installing everything, which does **not** include the example and reference data. There are also options to install and set up individual parts:
44+
45+
```bash
46+
./setup.sh -t all
47+
```
48+
49+
See `setup.sh -h` for the available options. By default, neither the reference gene nor our example is installed. To install them, run:
50+
51+
```bash
52+
# download and setup reference gene
53+
./setup.sh -t ref
54+
55+
# download and setup example data
56+
./setup.sh -t example
57+
```
58+
59+
## How to configure and run it
60+
61+
The project structure is as follows:
62+
63+
```shell
64+
.
65+
├── conf
66+
│ └── custom.yaml
67+
├── databases
68+
├── input
69+
├── output
70+
├── references
71+
├── tools
72+
├── LICENSE
73+
├── miracum_pipe.sh
74+
├── README.md
75+
└── setup.sh
76+
```
77+
78+
There are three levels of configuration:
79+
80+
- the Docker image ships with [default.yaml](https://github.com/AG-Boerries/MIRACUM-Pipe/blob/master/conf/default.yaml), which holds the default configuration parameters
81+
- `conf/custom.yaml` contains settings for the entire runtime environment and overwrites `default.yaml`'s values
82+
- in each patient directory, a `patient.yaml` can be created in which every setting of the other two configs can be overwritten
83+
84+
### Setting up a patient
85+
86+
Create a folder in `input` for each patient, containing a `patient.yaml`. We recommend defining at least the following parameters in it:
87+
88+
```yaml
89+
sex: XX # or XY
90+
annotation:
91+
germline: yes # default is no
92+
```
93+
94+
Place the germline files (R1 and R2) and the tumor files (R1 and R2) into the folder. Either name them `germline_R{1/2}.fastq.gz` and `tumor_R{1/2}.fastq.gz` or adjust your `patient.yaml` accordingly:
95+
96+
```yaml
97+
[..]
98+
common:
99+
files:
100+
tumor: tumor_R
101+
germline: germline_R
102+
```
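
Before starting a run, it can help to confirm that a patient folder actually contains the four expected files. The following is only a sketch: the function name `check_patient_files` is hypothetical and not part of MIRACUM-Pipe, and it assumes the default prefixes `tumor_R` and `germline_R` with a `.fastq.gz` suffix.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of MIRACUM-Pipe): report missing FASTQ files.
# Assumes the default prefixes tumor_R / germline_R and the suffix .fastq.gz.
check_patient_files() {
  local dir="$1" missing=0 prefix read_no file
  for prefix in tumor_R germline_R; do
    for read_no in 1 2; do
      file="${dir}/${prefix}${read_no}.fastq.gz"
      if [[ ! -f "${file}" ]]; then
        echo "missing: ${file}" >&2
        missing=1
      fi
    done
  done
  # Exit status 0 only if all four files were found.
  return "${missing}"
}
```

For example, `check_patient_files input/patient_01 && echo "ready"` prints `ready` only when all four files are present (`patient_01` is a made-up folder name). If you changed the prefixes in `patient.yaml`, the loop would need the same prefixes.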
103+
104+
### Run the pipeline
105+
106+
There are multiple possibilities to run the pipeline:
107+
108+
- run complete pipeline on one patient
109+
110+
```bash
111+
./run-pipeline -d rel_patient_folder
112+
```
113+
114+
- run a specific task on a given patient
115+
116+
```bash
117+
./run-pipeline -d rel_patient_folder -t task
118+
```
119+
120+
- run all unprocessed (no .processed file in the dir) patients
121+
122+
```bash
123+
./run-pipeline
124+
```
125+
126+
For more information, see the command's help:
127+
128+
```bash
129+
./run-pipeline -h
130+
```
131+
132+
### Parallel computation
133+
134+
MIRACUM-Pipe consists of five major steps (tasks), several of which can be computed in parallel:
135+
136+
- `td` and `gd`
137+
- `vc` and `cnv`
138+
- `report`, which is the last task and builds on the results of the four prior tasks
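
The grouping above can be pictured as three stages, where the tasks inside a stage run concurrently. The sketch below illustrates that idea with plain bash job control; it is not the pipeline's implementation. `run_task` is a stand-in for the real task invocation, and running the `td`/`gd` stage before the `vc`/`cnv` stage is an assumption based on the order of the list.

```bash
#!/usr/bin/env bash
# Sketch only: run_task is a placeholder for the real task invocation.
run_task() {
  echo "start $1"
  sleep 0.1  # placeholder for the actual work
  echo "done $1"
}

# Start every task of a stage in the background, then wait for all of them.
run_stage() {
  local pids=() task pid status=0
  for task in "$@"; do
    run_task "${task}" &
    pids+=("$!")
  done
  for pid in "${pids[@]}"; do
    wait "${pid}" || status=1
  done
  return "${status}"
}

# Assumed stage order: td+gd, then vc+cnv, then report on its own.
run_all() {
  run_stage td gd && run_stage vc cnv && run_task report
}
```

Calling `run_all` starts `td` and `gd` together, continues with `vc` and `cnv` once both have finished, and runs `report` last. A failing task stops the chain before the next stage.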
139+
140+
After the pipeline finishes successfully, it creates the file `.processed` in the patient's directory. By default, processed patients are skipped.
141+
The flag `-f` forces a recomputation and ignores that file. Sometimes a single task needs to be rerun; use the flag `-t` for that.
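
The skip rule can be summarised in a few lines of bash. This is a hypothetical sketch of the decision, not the pipeline's actual code; the function name `needs_processing` and its arguments are made up for illustration.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the skip logic; not the pipeline's actual code.
# Returns 0 if the patient directory should be (re)computed.
needs_processing() {
  local dir="$1" forced="${2:-false}"
  # Forced mode (like -f): ignore the marker file.
  if [[ "${forced}" == true ]]; then
    return 0
  fi
  # Otherwise run only if no .processed marker exists.
  [[ ! -e "${dir}/.processed" ]]
}
```

With this rule, a directory containing `.processed` is skipped unless the second argument is `true`, which mirrors the effect described for `-f`.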
142+
143+
## Logging
144+
145+
MIRACUM-Pipe writes its log files to `output/<patient_name>/log`. A separate log file is created for each task in the pipeline. These log files let you monitor the current status of the pipeline.
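
To glance at the state of every task at once, a small helper can print the last lines of each file in the log folder. This is a sketch under assumptions: `show_logs` is a hypothetical name, the log file names are not specified here, so the function simply iterates over whatever files exist, and the base folder defaults to `output`.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the last lines of every log file of a patient.
# Arguments: patient name, number of lines (default 5), base folder (default output).
show_logs() {
  local patient="$1" lines="${2:-5}" base="${3:-output}" file
  for file in "${base}/${patient}/log/"*; do
    [[ -f "${file}" ]] || continue   # skip if the folder is empty or missing
    echo "== ${file}"
    tail -n "${lines}" "${file}"
  done
}
```

For example, `show_logs patient_01 10` prints the last ten lines of each log file of the (made-up) patient `patient_01`.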
146+
147+
## Parallel & sequential computing
148+
149+
In `conf/custom.yaml` you can set resource parameters such as CPU cores and memory. Unless the pipeline is explicitly run single-threaded (sequentially), several tasks compute in parallel. The resources are divided among them, so you can enter the full amount of resources you want to make available to the pipeline as a whole. Single-threaded mode is intended for limited hardware resources or very large input files.
150+
151+
**BEWARE**: if you set tmp to be a tmpfs (in RAM), take this into account when deciding on the process resources.
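
As a rough illustration of what dividing the resources means, the arithmetic below splits a total core count evenly between the tasks that run at the same time. How MIRACUM-Pipe divides resources internally is not described here, so the even split and the function name `cores_per_task` are assumptions made for this sketch.

```bash
#!/usr/bin/env bash
# Illustrative arithmetic only; the even split is an assumption.
# Arguments: total cores offered to the pipeline, number of parallel tasks.
cores_per_task() {
  local total="$1" parallel="$2"
  local share=$(( total / parallel ))
  # Never hand out less than one core per task.
  if (( share < 1 )); then
    share=1
  fi
  echo "${share}"
}
```

Under this assumption, `cores_per_task 12 2` gives each of two parallel tasks 6 cores. If tmp lives in RAM, the memory it occupies would have to be subtracted from the memory budget in the same way before dividing it.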
152+
153+
## License
154+
155+
This work is licensed under [GNU Affero General Public License version 3](https://opensource.org/licenses/AGPL-3.0).

assets/input/.gitkeep

Whitespace-only changes.

assets/output/.gitkeep

Whitespace-only changes.

assets/references/.gitkeep

Whitespace-only changes.

assets/references/sequencing/.gitkeep

Whitespace-only changes.

databases/.gitkeep

Whitespace-only changes.

databases/dbSNP/.gitkeep

Whitespace-only changes.

miracum_pipe.sh

Lines changed: 83 additions & 0 deletions
@@ -0,0 +1,83 @@
1+
#!/usr/bin/env bash
2+
3+
readonly DIR_MIRACUM="/opt/MIRACUM-Pipe"
4+
5+
function join_by { local IFS="$1"; shift; echo "$*"; }
6+
7+
function usage() {
8+
docker run -it --name run-miracum-pipeline --rm "$1:$2" "${DIR_MIRACUM}"/miracum_pipe.sh -h
9+
echo ""
10+
echo "additional optional flags:"
11+
echo " -r set temporary folder into RAM"
12+
echo " -n docker repo name (default is agboerries/miracum_pipe)"
13+
echo " -v version specify version (default is \"latest\")"
14+
exit 1
15+
}
16+
17+
PARAM_DOCKER_REPO_NAME="agboerries/miracum_pipe"
18+
19+
while getopts :t:d:v:n:fsrh option; do
20+
case "${option}" in
21+
t) readonly PARAM_TASK=$OPTARG;;
22+
f) readonly PARAM_FORCED=true;;
23+
d) readonly PARAM_DIR_PATIENT=$OPTARG;;
24+
v) PIPELINE_VERSION=$OPTARG;;
25+
r) readonly PARAM_RAM=true;;
26+
s) readonly PARAM_SEQ=true;;
27+
n) PARAM_DOCKER_REPO_NAME=$OPTARG;;
28+
h) readonly SHOW_USAGE=true;;
29+
\?)
30+
echo "Unknown option: -$OPTARG" >&2
31+
exit 1
32+
;;
33+
:)
34+
echo "Missing option argument for -$OPTARG" >&2
35+
exit 1
36+
;;
37+
*)
38+
echo "Unimplemented option: -$OPTARG" >&2
39+
exit 1
40+
;;
41+
esac
42+
done
43+
44+
[[ -z "${PIPELINE_VERSION}" ]] && PIPELINE_VERSION='latest'
45+
[[ "${SHOW_USAGE}" ]] && usage "${PARAM_DOCKER_REPO_NAME}" "${PIPELINE_VERSION}"
46+
47+
# conf as volume
48+
if [[ -f "$(pwd)/conf/custom.yaml" ]]; then
49+
readonly VOLUME_CONF="-v $(pwd)/conf/custom.yaml:${DIR_MIRACUM}/conf/custom.yaml"
50+
fi
51+
52+
# call script
53+
if [[ "${PARAM_FORCED}" ]]; then
54+
opt_args='-f'
55+
fi
56+
57+
if [[ "${PARAM_TASK}" ]]; then
58+
opt_args="${opt_args} -t ${PARAM_TASK}"
59+
fi
60+
61+
if [[ "${PARAM_SEQ}" ]]; then
62+
opt_args="${opt_args} -s"
63+
fi
64+
65+
if [[ "${PARAM_DIR_PATIENT}" ]]; then
66+
opt_args="${opt_args} -d ${PARAM_DIR_PATIENT}"
67+
fi
68+
69+
# tmp in ram
70+
if [[ "${PARAM_RAM}" ]]; then
71+
readonly TMP_RAM="--tmpfs /tmp:exec"
72+
fi
73+
74+
echo "running \"${DIR_MIRACUM}/miracum_pipe.sh ${opt_args}\" of docker image ${PARAM_DOCKER_REPO_NAME}:${PIPELINE_VERSION}"
75+
echo "---"
76+
docker run -it --name run-miracum-pipeline --rm ${TMP_RAM} ${VOLUME_CONF} \
77+
-u "$(id -u "${USER}")" \
78+
-v "$(pwd)/assets/input:${DIR_MIRACUM}/assets/input" \
79+
-v "$(pwd)/assets/output:${DIR_MIRACUM}/assets/output" \
80+
-v "$(pwd)/assets/references:${DIR_MIRACUM}/assets/references" \
81+
-v "$(pwd)/tools/annovar:${DIR_MIRACUM}/tools/annovar" \
82+
-v "$(pwd)/tools/gatk:${DIR_MIRACUM}/tools/gatk" \
83+
-v "$(pwd)/databases:${DIR_MIRACUM}/databases" ${PARAM_DOCKER_REPO_NAME}:"${PIPELINE_VERSION}" "${DIR_MIRACUM}/miracum_pipe.sh" ${opt_args}
