This ChIP-Seq pipeline is based off the ENCODE (phase-3) transcription factor and histone ChIP-seq pipeline specifications.

### Features

* **Portability**: The pipeline can be run on different cloud platforms (Google Cloud, AWS and DNAnexus) as well as on cluster engines (SLURM, SGE and PBS).
* **User-friendly HTML report**: In addition to the standard outputs, the pipeline generates an HTML report with a tabular representation of quality metrics, including alignment/peak statistics and FRiP, along with many useful plots (IDR/cross-correlation measures). See an example [HTML report](https://storage.googleapis.com/encode-pipeline-test-samples/encode-chip-seq-pipeline/ENCSR000DYI/example_output/qc.html) and the [JSON file](https://storage.googleapis.com/encode-pipeline-test-samples/encode-chip-seq-pipeline/ENCSR000DYI/example_output/qc.json) used to generate it.
* **Supported genomes**: The pipeline needs genome-specific data such as aligner indices, a chromosome sizes file and a blacklist. We provide a genome database downloader/builder for hg38, hg19, mm10 and mm9. You can also use this [builder](docs/build_genome_database.md) to build a genome database from FASTA for your custom genome.
## Installation

1) [Install Caper](https://github.com/ENCODE-DCC/caper#installation). Caper is a python wrapper for [Cromwell](https://github.com/broadinstitute/cromwell).

> **IMPORTANT**: Make sure that you have python3 (> 3.4.1) installed on your system.

```bash
$ pip install caper  # use pip3 if it doesn't work
```

2) Follow [Caper's README](https://github.com/ENCODE-DCC/caper) carefully and find the instructions for your platform.

> **IMPORTANT**: Configure your Caper configuration file `~/.caper/default.conf` correctly for your platform.
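
As a rough illustration, a minimal `~/.caper/default.conf` for running on a local machine might look like the sketch below. The key names mirror Caper's command-line flags (`--out-dir`, `--tmp-dir`), but this is an assumption for illustration; treat the values as placeholders and follow Caper's README for your actual platform.

```bash
$ cat > ~/.caper/default.conf << 'EOF'
# Minimal sketch for a local backend; see Caper's README for platform-specific keys.
backend=local
out-dir=/abs/path/to/pipeline/outputs
tmp-dir=/abs/path/to/pipeline/tmp
EOF
```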

3) Git clone this pipeline (see the example below).

> **IMPORTANT**: Use `~/chip-seq-pipeline2/chip.wdl` as `[WDL]` in Caper's documentation.
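
For example, cloning into your home directory so that the `[WDL]` path above resolves (the repository URL is assumed from this pipeline's GitHub organization):

```bash
$ cd ~ && git clone https://github.com/ENCODE-DCC/chip-seq-pipeline2
```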

4) Install the pipeline's [Conda environment](docs/install_conda.md) if you want to use Conda instead of Docker/Singularity. Conda is recommended on a local computer and on HPCs (e.g. Stanford Sherlock/SCG).

> **IMPORTANT**: Use `encode-chip-seq-pipeline` as `[PIPELINE_CONDA_ENV]` in Caper's documentation.
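
If you take the Conda route, activate the pipeline's environment before running Caper:

```bash
$ # source activate encode-chip-seq-pipeline  # for Conda < 4.6
$ conda activate encode-chip-seq-pipeline  # for Conda >= 4.6
```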

## Test input JSON file
Use `https://storage.googleapis.com/encode-pipeline-test-samples/encode-chip-seq-pipeline/ENCSR000DYI_subsampled_chr19_only_caper.json` as `[INPUT_JSON]` in Caper's documentation.
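
For example, a typical run with Singularity looks like this (replace `--singularity` with `--docker` to use Docker, or drop it and activate the Conda environment instead):

```bash
$ caper run chip.wdl -i https://storage.googleapis.com/encode-pipeline-test-samples/encode-chip-seq-pipeline/ENCSR000DYI_subsampled_chr19_only_caper.json --singularity
```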

## Input JSON file
An input JSON file specifies all the input parameters and files that are necessary to run this pipeline successfully, including the paths to the genome reference files and the raw FASTQ files. Please make sure to specify absolute paths rather than relative paths in your input JSON files.
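
As a rough sketch, a minimal input JSON for a single-ended TF ChIP-seq run with one replicate might look like the following. The `chip.*` field names follow the pattern used in this pipeline's examples, but the paths are hypothetical and the [specification](docs/input.md) linked below is authoritative:

```bash
$ cat > /abs/path/to/input.json << 'EOF'
{
    "chip.title": "Example TF ChIP-seq run",
    "chip.pipeline_type": "tf",
    "chip.genome_tsv": "/abs/path/to/genome/hg38.tsv",
    "chip.paired_end": false,
    "chip.fastqs_rep1_R1": ["/abs/path/to/rep1.fastq.gz"],
    "chip.ctl_fastqs_rep1_R1": ["/abs/path/to/ctl1.fastq.gz"]
}
EOF
```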

[Input JSON file specification](docs/input.md)

## Running a pipeline without Caper

> **WARNING**: This method has been deprecated. There are many unfixed known bugs. We no longer support it.

Caper uses the Cromwell workflow execution engine to run the workflow on the platform you specify. While we recommend that you use Caper, if you want to run Cromwell directly without it, you can learn how [here](docs/deprecated/OLD_METHOD.md).

## Running a pipeline on DNAnexus

You can also run this pipeline on DNAnexus without using Caper or Cromwell. There are two ways to build a workflow on DNAnexus based on our WDL (see the sketch after the list below):

1) [dxWDL CLI](docs/tutorial_dx_cli.md)
2) [DNAnexus Web UI](docs/tutorial_dx_web.md)
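
For the dxWDL CLI route, compiling the WDL into a DNAnexus workflow generally looks like the sketch below. This is a hedged example: the jar filename, project ID and destination folder are placeholders, and exact flags can differ between dxWDL releases, so follow the linked tutorial:

```bash
$ java -jar dxWDL.jar compile chip.wdl -project project-xxxx -folder /chip-seq
```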
## How to organize outputs

Install [Croo](https://github.com/ENCODE-DCC/croo#installation). Make sure that you have python3 (> 3.4.1) installed on your system. Find a `metadata.json` in Caper's output directory.

```bash
$ pip install croo
$ croo [METADATA_JSON_FILE]
```
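
Croo also lets you choose where the organized outputs go (e.g. an `--out-dir`-style option in recent versions; verify with `croo --help`, as this flag is an assumption here rather than something documented above).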

There is another [useful tool](utils/qc_jsons_to_tsv/README.md) for making a spreadsheet of QC metrics from multiple workflows. It recursively finds and parses all `qc.json` files (the pipeline's [final output](docs/example_output/v1.1.5/qc.json)) under a specified root directory and generates a TSV file with all quality metrics tabulated in rows for each experiment and replicate. It also estimates the overall quality of a sample using [a criteria definition JSON file](utils/qc_jsons_to_tsv/criteria.default.json), which can serve as a good guideline for QC'ing experiments.