README.md
+32 -38 (32 additions & 38 deletions)
@@ -21,18 +21,18 @@ ATACProc is a pipeline to analyze ATAC-seq data. Currently datasets involving on
21
21
22
22
5) Irreproducible Discovery Rate (IDR) analysis (https://github.com/nboley/idr) between a set of peak calls or even a set of input alignment (BAM) files (in which case, peaks are estimated first) corresponding to a set of biological or technical ATAC-seq replicates.
23
23
24
-
6)**New in version 2.0** Support discarding reads falling in blacklisted genomic regions
24
+
6)**New in version 2.0:** Support discarding reads falling in blacklisted genomic regions
25
25
26
-
7)*New in version 2.0* Support extracting nucleosome free reads (NFR), one or more nucleosome containing regions (denoted as +1M), for TF footprinting analysis.
26
+
7)**New in version 2.0:** Support extracting nucleosome free reads (NFR), one or more nucleosome containing regions (denoted as +1M), for TF footprinting analysis.
27
27
28
-
8)*New in version 2.0* Compatibility to the package ATAQV (https://github.com/ParkerLab/ataqv) for generating summary statistics across a set of samples.
28
+
8)**New in version 2.0:** Compatibility to the package ATAQV (https://github.com/ParkerLab/ataqv) for generating summary statistics across a set of samples.
29
29
30
30
#######################
31
31
32
32
Release notes
33
33
-----------------
34
34
35
-
*Version 2.0 - November 2019:*
35
+
**Version 2.0 - November 2019**
36
36
37
37
1) Included TF footprinting, optional discarding of blacklisted genomic regions, motif analysis
38
38
@@ -144,9 +144,9 @@ Following packages / libraries should be installed before running this pipeline:
144
144
python setupLogoData.py --all
145
145
146
146
147
-
*User should include the PATH of above mentioned libraries / packages inside their SYSTEM PATH variable. Alternatively, installation PATHS for some of these packages are to be mentioned in a separate configuration file (described below)*
147
+
**User should include the PATH of above mentioned libraries / packages inside their SYSTEM PATH variable. Alternatively, installation PATHS for some of these packages are to be mentioned in a separate configuration file (described below)**
148
148
149
-
*Following packages / libraries are to be installed for executing IDR code*
149
+
**Following packages / libraries are to be installed for executing IDR code**
150
150
151
151
9) sambamba (we have used version 0.6.7) <http://lomereiter.github.io/sambamba/>
152
152
@@ -168,60 +168,54 @@ Options:
168
168
Mandatory parameters:
169
169
170
170
-C ConfigFile
171
-
Configuration file to be separately provided. Mandatory parameter. Current package includes four sample configuration files named "configfile_*" corresponding to the reference genomes hg19, hg38, mm9 and mm10. Detailed description of the entries in this configuration file are mentioned later.
171
+
Configuration file to be separately provided. Mandatory parameter. Current package includes four sample configuration files named "configfile_*" corresponding to the reference genomes hg19, hg38, mm9 and mm10. Detailed description of the entries in this configuration file are mentioned later.
172
172
173
173
-f FASTQ1
174
-
Read 1 (or forward strand) of paired-end sequencing data [.fq|.gz|.bz2].
175
-
Or, even an aligned genome (.bam file; single or paired end alignment) can be provided.
174
+
Read 1 (or forward strand) of paired-end sequencing data [.fq|.gz|.bz2]. Or, even an aligned genome (.bam file; single or paired end alignment) can be provided.
176
175
177
176
-r FASTQ2
178
-
R2 of pair-end sequencing data [.fq|.gz|.bz2]. If not provided, and the -f parameter
179
-
is not a BAM file, the input is assumed to be single ended.
177
+
R2 of pair-end sequencing data [.fq|.gz|.bz2]. If not provided, and the -f parameter is not a BAM file, the input is assumed to be single ended.
180
178
181
179
-n PREFIX
182
-
Prefix string of output files. For example, -n "TEST" means that the
183
-
output filenames start with the string "TEST". Generally, sample names with run ID, lane information, etc. can be used as a prefix string.
180
+
Prefix string of output files. For example, -n "TEST" means that the output filenames start with the string "TEST". Generally, sample names with run ID, lane information, etc. can be used as a prefix string.
184
181
185
182
-g BOWTIE2_GENOME
186
-
Bowtie2 indexed reference genome. Basically, the folder containing bwt2 indices (corresponding to the reference genome) are to be provided.
187
-
Mandatory parameter if the user provides fastq files as input (-f and -r options).
188
-
If user provides .bam files as an input (-f option) then this field is optional.
183
+
Bowtie2 indexed reference genome. Basically, the folder containing bwt2 indices (corresponding to the reference genome) are to be provided. Mandatory parameter if the user provides fastq files as input (-f and -r options). If user provides .bam files as an input (-f option) then this field is optional.
189
184
190
185
-d OutDir
191
-
Output directory to store the results for the current sample.
186
+
Output directory to store the results for the current sample.
192
187
193
188
-c CONTROLBAM
194
-
Control file(s) used for peak calling using MACS2. One or more alignment files can be provided to be used as a control. It may not be specified at all, in which case MACS2 operates without any control. Control file can be either in *BAM* or in *tagalign.gz* format (the standalone script *bin/TagAlign.sh* in this repository converts BAM file to tagalign.gz format). For multiple control files, they all are required to be of the same format (i.e. either all BAM or all tagalign.gz). Example: -c control1.bam -c control2.bam puts two control files for using in MACS2.
189
+
Control file(s) used for peak calling using MACS2. One or more alignment files can be provided to be used as a control. It may not be specified at all, in which case MACS2 operates without any control. Control file can be either in *BAM* or in *tagalign.gz* format (the standalone script *bin/TagAlign.sh* in this repository converts BAM file to tagalign.gz format). For multiple control files, they all are required to be of the same format (i.e. either all BAM or all tagalign.gz). Example: -c control1.bam -c control2.bam puts two control files for using in MACS2.
195
190
196
191
-w BigWigGenome
197
-
Reference genome as a string. Allowed values are hg19 (default), hg38, mm9 and mm10. If -g option is enabled (i.e. the Bowtie2 index genome is provided), this field is optional. Otherwise, mandatory parameter.
192
+
Reference genome as a string. Allowed values are hg19 (default), hg38, mm9 and mm10. If -g option is enabled (i.e. the Bowtie2 index genome is provided), this field is optional. Otherwise, mandatory parameter.
198
193
199
194
-D DEBUG_TXT
200
-
Binary variable. If 1 (recommended), dumps QC statistics. For a set of samples, those QC statistics can be used later to profile QC variation among different samples.
195
+
Binary variable. If 1 (recommended), dumps QC statistics. For a set of samples, those QC statistics can be used later to profile QC variation among different samples.
201
196
202
197
-q MAPQ_THR
203
-
Mapping quality threshold for bowtie2 alignment. Aligned reads with quality below this threshold are discarded. Default = 30.
198
+
Mapping quality threshold for bowtie2 alignment. Aligned reads with quality below this threshold are discarded. Default = 30.
Binary variable. If 1, overwrites the existing files (if any). Default = 0.
206
+
Binary variable. If 1, overwrites the existing files (if any). Default = 0.
212
207
213
208
-t NUMTHREADS
214
-
Number of sorting, Bowtie2 mapping THREADS [Default = 1]. If multiprocessing core is available, user should specify values > 1 such as 4 or 8, for faster execution of Bowtie2.
209
+
Number of sorting, Bowtie2 mapping THREADS [Default = 1]. If multiprocessing core is available, user should specify values > 1 such as 4 or 8, for faster execution of Bowtie2.
215
210
216
211
-m MAX_MEM
217
-
Set max memory used for PICARD duplication removal [Default = 8G].
212
+
Set max memory used for PICARD duplication removal [Default = 8G].
218
213
219
214
-a ALIGNVALIDMAX
220
-
Set the number of (max) valid alignments which will be searched [Default = 4]
221
-
for Bowtie2.
215
+
Set the number of (max) valid alignments which will be searched [Default = 4] for Bowtie2.
222
216
223
217
-l MAXFRAGLEN
224
-
Set the maximum fragment length to be used for Bowtie2 alignment [Default = 2000]
218
+
Set the maximum fragment length to be used for Bowtie2 alignment [Default = 2000]
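The dependencies between the options above (-g is mandatory for FASTQ input, -w is mandatory when -g is absent, all -c control files must share one format) can be checked before launching a run. The following is only a minimal sketch under stated assumptions: the helper names `build_args` and `control_format`, the file names in the usage example, and the `.tagalign.gz` suffix test are all hypothetical and are not part of ATACProc. The sketch returns just the option list and does not assume the name of the pipeline script.

```python
# Hypothetical helper: validates the option rules described in this README
# and returns the command-line arguments as a list. Not part of ATACProc.
ALLOWED_GENOMES = {"hg19", "hg38", "mm9", "mm10"}


def control_format(path):
    """Classify a control file by suffix (assumed naming convention)."""
    name = str(path).lower()
    if name.endswith(".bam"):
        return "bam"
    if name.endswith(".tagalign.gz"):
        return "tagalign.gz"
    raise ValueError(f"unsupported control file: {path}")


def build_args(config, reads1, prefix, outdir, reads2=None,
               bowtie2_genome=None, genome=None, controls=(),
               threads=1, mapq=30):
    """Check the documented option dependencies and build the argument list."""
    input_is_bam = str(reads1).lower().endswith(".bam")
    # -g is mandatory when FASTQ files are given as input.
    if not input_is_bam and bowtie2_genome is None:
        raise ValueError("-g is mandatory for FASTQ input")
    # -w is mandatory when no Bowtie2 index is provided.
    if bowtie2_genome is None and genome is None:
        raise ValueError("-w is mandatory when -g is not given")
    if genome is not None and genome not in ALLOWED_GENOMES:
        raise ValueError(f"-w must be one of {sorted(ALLOWED_GENOMES)}")
    # Multiple control files must all be BAM or all tagalign.gz.
    if len({control_format(c) for c in controls}) > 1:
        raise ValueError("control files must all be BAM or all tagalign.gz")

    args = ["-C", str(config), "-f", str(reads1), "-n", prefix, "-d", str(outdir)]
    if reads2 is not None:
        args += ["-r", str(reads2)]
    if bowtie2_genome is not None:
        args += ["-g", str(bowtie2_genome)]
    if genome is not None:
        args += ["-w", genome]
    for control in controls:
        args += ["-c", str(control)]  # one -c flag per control file
    args += ["-t", str(threads), "-q", str(mapq)]
    return args
```

For example, `build_args("configfile_hg19", "R1.fq.gz", "TEST", "out", reads2="R2.fq.gz", bowtie2_genome="/idx/hg19", controls=["control1.bam", "control2.bam"], threads=4)` yields an option list that can be appended to the pipeline invocation.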
225
219
226
220
227
221
Entries in the configuration file (first parameter)
@@ -338,19 +332,19 @@ Within the folder *OutDir* (specified by the configuration option -d) following
*New in version 2.0* De-duplicated reads with shifted forward (+4bp) and reverse strands (-5bp) by Tn5 transposase. Used to extract the nucleosome free and nucleosome containing regions.
335
+
**New in version 2.0:** De-duplicated reads with shifted forward (+4bp) and reverse strands (-5bp) by Tn5 transposase. Used to extract the nucleosome free and nucleosome containing regions.
*New in version 2.0* Bed converted f7, used for MACS2 peak calling.
337
+
**New in version 2.0:** Bed converted f7, used for MACS2 peak calling.
344
338
f1-10: NucleosomeFree.bam
345
-
*New in version 2.0* Alignment with nucleosome free regions (NFR)
339
+
**New in version 2.0:** Alignment with nucleosome free regions (NFR)
346
340
f1-11: mononucleosome.bam
347
-
*New in version 2.0* Alignment with mononucleosome fragments
341
+
**New in version 2.0:** Alignment with mononucleosome fragments
348
342
f1-12: dinucleosome.bam
349
-
*New in version 2.0* Alignment with dinucleosome fragments
343
+
**New in version 2.0:** Alignment with dinucleosome fragments
350
344
f1-13: trinucleosome.bam
351
-
*New in version 2.0* Alignment with trinucleosome fragments
345
+
**New in version 2.0:** Alignment with trinucleosome fragments
352
346
f1-14: Merged_nucleosome.bam
353
-
*New in version 2.0* File containing fragments of nucleosome free and one or more nucleosomes (denoted as NFR +1M, in the HINT-ATAC genome biology paper). Generated by merging files f1-10 to f1-13.
347
+
**New in version 2.0:** File containing fragments of nucleosome free and one or more nucleosomes (denoted as NFR +1M, in the HINT-ATAC genome biology paper). Generated by merging files f1-10 to f1-13.
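The Tn5 shift mentioned for the de-duplicated reads above (+4 bp on the forward strand, -5 bp on the reverse strand) can be illustrated on a single read interval. This is a sketch only: the function name `tn5_shift` is hypothetical, coordinates are assumed to be 0-based and half-open, and it assumes that only the 5' end of each read (the cut site) is moved. The pipeline's own implementation may shift reads differently.

```python
def tn5_shift(start, end, strand):
    """Shift the 5' end of a read interval to account for the Tn5 offset.

    Assumes 0-based, half-open coordinates. On the forward strand the 5' end
    is `start` (moved by +4 bp); on the reverse strand the 5' end is `end`
    (moved by -5 bp).
    """
    if strand == "+":
        return start + 4, end
    if strand == "-":
        return start, end - 5
    raise ValueError(f"strand must be '+' or '-', got {strand!r}")
```

For a read spanning 100 to 150, the forward-strand call returns `(104, 150)` and the reverse-strand call returns `(100, 145)`.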
354
348
355
349
F2: Out_BigWig
356
350
f2-1: ${PREFIX}.bw
@@ -394,10 +388,10 @@ Within the folder *OutDir* (specified by the configuration option -d) following
394
388
Read count statistics.
395
389
396
390
F10: QC_ataqv_ParkerLab_Test
397
-
*New in version 2.0* Folder containing the summary .json files generated by the package ATAQV, which for different samples, can be combined to put a summary statistic and displayed in a Web browser.
391
+
**New in version 2.0:** Folder containing the summary .json files generated by the package ATAQV, which for different samples, can be combined to put a summary statistic and displayed in a Web browser.
398
392
399
393
F11: TSS_Enrichment_Peaks
400
-
*New in version 2.0* Processes the narrow peaks from the folder F4, and computes the TSS enrichment of these peaks. The underlying file structure is:
394
+
**New in version 2.0:** Processes the narrow peaks from the folder F4, and computes the TSS enrichment of these peaks. The underlying file structure is:
*New in version 2.0* TF footprinting analysis corresponding to the ChIP-seq peaks stored in F4. Here, ${CONTROLSTR} is either "*_No_Control" or "*_With_Control", depending on the use of control BAM file in inferring the peaks. ${FDRTHR} is either 0.01 or 0.05.
406
+
**New in version 2.0:** TF footprinting analysis corresponding to the ChIP-seq peaks stored in F4. Here, ${CONTROLSTR} is either "*_No_Control" or "*_With_Control", depending on the use of control BAM file in inferring the peaks. ${FDRTHR} is either 0.01 or 0.05.
413
407
414
408
The principle is to extract the peak summits and surroundings (by some bp, defined as an offset) and compute the TF footprinting regions and underlying motifs within these regions.
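The summit-plus-offset step described above can be sketched as follows. This is an illustration under stated assumptions, not the pipeline's code: the function name `summit_windows` is hypothetical, the input is assumed to be in the standard narrowPeak layout (tab-separated, column 10 holding the summit offset relative to the peak start, or -1 when no summit was called), and falling back to the interval midpoint for a missing summit is a choice made here for illustration.

```python
def summit_windows(narrowpeak_lines, offset):
    """Yield (chrom, start, end) windows of +/- `offset` bp around peak summits.

    Assumes narrowPeak columns: chrom, start, end, ..., and column 10 as the
    summit offset from `start` (-1 if no summit was called).
    """
    for line in narrowpeak_lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        peak = int(fields[9])
        # Assumption: use the interval midpoint when no summit is reported.
        summit = start + peak if peak >= 0 else (start + end) // 2
        # Clamp at 0 so windows near a chromosome start stay valid.
        yield chrom, max(0, summit - offset), summit + offset
```

The resulting windows can then be passed to the footprinting and motif steps; the appropriate `offset` depends on the analysis and is not fixed by this sketch.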