Commit e949db9

updated README
1 parent 520fb05 commit e949db9

1 file changed: +32 -38 lines changed

README.md

Lines changed: 32 additions & 38 deletions
@@ -21,18 +21,18 @@ ATACProc is a pipeline to analyze ATAC-seq data. Currently datasets involving on

 5) Irreproducible Discovery Rate (IDR) analysis (https://github.com/nboley/idr) between a set of peak calls or even a set of input alignment (BAM) files (in which case, peaks are estimated first) corresponding to a set of biological or technical ATAC-seq replicates.

-6) **New in version 2.0** Support discarding reads falling in blacklisted genomic regions
+6) **New in version 2.0:** Support discarding reads falling in blacklisted genomic regions

-7) *New in version 2.0* Support extracting nucleosome free reads (NFR), one or more nucleosome containing regions (denoted as +1M), for TF footprinting analysis.
+7) **New in version 2.0:** Support extracting nucleosome free reads (NFR), one or more nucleosome containing regions (denoted as +1M), for TF footprinting analysis.

-8) *New in version 2.0* Compatibility to the package ATAQV (https://github.com/ParkerLab/ataqv) for generating summary statistics across a set of samples.
+8) **New in version 2.0:** Compatibility to the package ATAQV (https://github.com/ParkerLab/ataqv) for generating summary statistics across a set of samples.

 #######################

 Release notes
 -----------------

-*Version 2.0 - November 2019:*
+**Version 2.0 - November 2019**

 1) Included TF footprinting, optional discarding of blacklisted genomic regions, motif analysis

@@ -144,9 +144,9 @@ Following packages / libraries should be installed before running this pipeline:
 python setupLogoData.py --all


-*User should include the PATH of above mentioned libraries / packages inside their SYSTEM PATH variable. Alternatively, installation PATHS for some of these packages are to be mentioned in a separate configuration file (described below)*
+**User should include the PATH of above mentioned libraries / packages inside their SYSTEM PATH variable. Alternatively, installation PATHS for some of these packages are to be mentioned in a separate configuration file (described below)**

-*Following packages / libraries are to be installed for executing IDR code*
+**Following packages / libraries are to be installed for executing IDR code**

 9) sambamba (we have used version 0.6.7) <http://lomereiter.github.io/sambamba/>

@@ -168,60 +168,54 @@ Options:
 Mandatory parameters:

 -C ConfigFile
-Configuration file to be separately provided. Mandatory parameter. Current package includes four sample configuration files named "configfile_*" corresponding to the reference genomes hg19, hg38, mm9 and mm10. Detailed description of the entries in this configuration file are mentioned later.
+Configuration file to be separately provided. Mandatory parameter. Current package includes four sample configuration files named "configfile_*" corresponding to the reference genomes hg19, hg38, mm9 and mm10. Detailed description of the entries in this configuration file are mentioned later.

 -f FASTQ1
-Read 1 (or forward strand) of paired-end sequencing data [.fq|.gz|.bz2].
-Or, even an aligned genome (.bam file; single or paired end alignment) can be provided.
+Read 1 (or forward strand) of paired-end sequencing data [.fq|.gz|.bz2]. Or, even an aligned genome (.bam file; single or paired end alignment) can be provided.

 -r FASTQ2
-R2 of pair-end sequencing data [.fq|.gz|.bz2]. If not provided, and the -f parameter
-is not a BAM file, the input is assumed to be single ended.
+R2 of pair-end sequencing data [.fq|.gz|.bz2]. If not provided, and the -f parameter is not a BAM file, the input is assumed to be single ended.

 -n PREFIX
-Prefix string of output files. For example, -n "TEST" means that the
-output filenames start with the string "TEST". Generally, sample names with run ID, lane information, etc. can be used as a prefix string.
+Prefix string of output files. For example, -n "TEST" means that the output filenames start with the string "TEST". Generally, sample names with run ID, lane information, etc. can be used as a prefix string.

 -g BOWTIE2_GENOME
-Bowtie2 indexed reference genome. Basically, the folder containing bwt2 indices (corresponding to the reference genome) are to be provided.
-Mandatory parameter if the user provides fastq files as input (-f and -r options).
-If user provides .bam files as an input (-f option) then this field is optional.
+Bowtie2 indexed reference genome. Basically, the folder containing bwt2 indices (corresponding to the reference genome) are to be provided. Mandatory parameter if the user provides fastq files as input (-f and -r options). If user provides .bam files as an input (-f option) then this field is optional.

 -d OutDir
-Output directory to store the results for the current sample.
+Output directory to store the results for the current sample.

 -c CONTROLBAM
-Control file(s) used for peak calling using MACS2. One or more alignment files can be provided to be used as a control. It may not be specified at all, in which case MACS2 operates without any control. Control file can be either in *BAM* or in *tagalign.gz* format (the standalone script *bin/TagAlign.sh* in this repository converts BAM file to tagalign.gz format). For multiple control files, they all are required to be of the same format (i.e. either all BAM or all tagalign.gz). Example: -c control1.bam -c control2.bam puts two control files for using in MACS2.
+Control file(s) used for peak calling using MACS2. One or more alignment files can be provided to be used as a control. It may not be specified at all, in which case MACS2 operates without any control. Control file can be either in *BAM* or in *tagalign.gz* format (the standalone script *bin/TagAlign.sh* in this repository converts BAM file to tagalign.gz format). For multiple control files, they all are required to be of the same format (i.e. either all BAM or all tagalign.gz). Example: -c control1.bam -c control2.bam puts two control files for using in MACS2.

 -w BigWigGenome
-Reference genome as a string. Allowed values are hg19 (default), hg38, mm9 and mm10. If -g option is enabled (i.e. the Bowtie2 index genome is provided), this field is optional. Otherwise, mandatory parameter.
+Reference genome as a string. Allowed values are hg19 (default), hg38, mm9 and mm10. If -g option is enabled (i.e. the Bowtie2 index genome is provided), this field is optional. Otherwise, mandatory parameter.

 -D DEBUG_TXT
-Binary variable. If 1 (recommended), dumps QC statistics. For a set of samples, those QC statistics can be used later to profile QC variation among different samples.
+Binary variable. If 1 (recommended), dumps QC statistics. For a set of samples, those QC statistics can be used later to profile QC variation among different samples.

 -q MAPQ_THR
-Mapping quality threshold for bowtie2 alignment. Aligned reads with quality below this threshold are discarded. Default = 30.
+Mapping quality threshold for bowtie2 alignment. Aligned reads with quality below this threshold are discarded. Default = 30.

 -p PEAKCALLGENOMESIZE
-genome size parameter for MACS2 peak calling ("hs", "mm", "ce", "dm": default "hs")
+genome size parameter for MACS2 peak calling ("hs", "mm", "ce", "dm": default "hs")

 Optional parameters:

 -O Overwrite
-Binary variable. If 1, overwrites the existing files (if any). Default = 0.
+Binary variable. If 1, overwrites the existing files (if any). Default = 0.

 -t NUMTHREADS
-Number of sorting, Bowtie2 mapping THREADS [Default = 1]. If multiprocessing core is available, user should specify values > 1 such as 4 or 8, for faster execution of Bowtie2.
+Number of sorting, Bowtie2 mapping THREADS [Default = 1]. If multiprocessing core is available, user should specify values > 1 such as 4 or 8, for faster execution of Bowtie2.

 -m MAX_MEM
-Set max memory used for PICARD duplication removal [Default = 8G].
+Set max memory used for PICARD duplication removal [Default = 8G].

 -a ALIGNVALIDMAX
-Set the number of (max) valid alignments which will be searched [Default = 4]
-for Bowtie2.
+Set the number of (max) valid alignments which will be searched [Default = 4] for Bowtie2.

 -l MAXFRAGLEN
-Set the maximum fragment length to be used for Bowtie2 alignment [Default = 2000]
+Set the maximum fragment length to be used for Bowtie2 alignment [Default = 2000]


 Entries in the configuration file (first parameter)
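
As an illustration of how the options in the hunk above fit together, here is a minimal Python sketch that assembles an argument list from them. The script name `PIPELINE_SCRIPT`, the function `build_pipeline_args`, and every file path are hypothetical placeholders, not names taken from the repository. Only the flags and the rule that -g is mandatory for FASTQ input come from the README text.

```python
import shlex


def build_pipeline_args(script, config, reads1, prefix, outdir,
                        reads2=None, bowtie2_index=None, controls=(),
                        genome="hg19", mapq=30, threads=1):
    """Assemble a command line from the README options (illustrative only)."""
    # README: -g is mandatory for FASTQ input and optional for BAM input.
    if not reads1.endswith(".bam") and bowtie2_index is None:
        raise ValueError("-g (Bowtie2 index) is mandatory when the input is FASTQ")
    args = [script, "-C", config, "-f", reads1]
    if reads2 is not None:
        # Without -r, non-BAM input is treated as single-ended.
        args += ["-r", reads2]
    args += ["-n", prefix, "-d", outdir]
    if bowtie2_index is not None:
        args += ["-g", bowtie2_index]
    for control in controls:
        # README: repeat -c once per control file.
        args += ["-c", control]
    args += ["-w", genome, "-q", str(mapq), "-t", str(threads)]
    return args


# Hypothetical paths and script name, used only to show the shape of a call.
example = build_pipeline_args(
    "PIPELINE_SCRIPT", "configfile_hg19", "sample_R1.fq.gz", "TEST", "out_dir",
    reads2="sample_R2.fq.gz", bowtie2_index="/path/to/bowtie2_index",
    controls=["control1.bam", "control2.bam"], threads=4,
)
print(shlex.join(example))
```

Building the list first and quoting it with `shlex.join` only for display keeps paths with spaces intact if the list is later passed to `subprocess.run`.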
@@ -338,19 +332,19 @@ Within the folder *OutDir* (specified by the configuration option -d) following
 f1-7: ${PREFIX}.align.sort.MAPQ${MAPQ_THR}.picard_metrics.txt
 PICARD metrics log file corresponding to the duplicate removal operation.
 f1-8: ${PREFIX}.align.sort.MAPQ${MAPQ_THR}_TN5_Shift.bam
-*New in version 2.0* De-duplicated reads with shifted forward (+4bp) and reverse strands (-5bp) by Tn5 transposase. Used to extract the nucleosome free and nucleosome containing regions.
+**New in version 2.0:** De-duplicated reads with shifted forward (+4bp) and reverse strands (-5bp) by Tn5 transposase. Used to extract the nucleosome free and nucleosome containing regions.
 f1-9: ${PREFIX}.align.sort.MAPQ${MAPQ_THR}_TN5_Shift.bed
-*New in version 2.0* Bed converted f7, used for MACS2 peak calling.
+**New in version 2.0:** Bed converted f7, used for MACS2 peak calling.
 f1-10: NucleosomeFree.bam
-*New in version 2.0* Alignment with nucleosome free regions (NFR)
+**New in version 2.0:** Alignment with nucleosome free regions (NFR)
 f1-11: mononucleosome.bam
-*New in version 2.0* Alignment with mononucleosome fragments
+**New in version 2.0:** Alignment with mononucleosome fragments
 f1-12: dinucleosome.bam
-*New in version 2.0* Alignment with dinucleosome fragments
+**New in version 2.0:** Alignment with dinucleosome fragments
 f1-13: trinucleosome.bam
-*New in version 2.0* Alignment with trinucleosome fragments
+**New in version 2.0:** Alignment with trinucleosome fragments
 f1-14: Merged_nucleosome.bam
-*New in version 2.0* File containing fragments of nucleosome free and one or more nucleosomes (denoted as NFR +1M, in the HINT-ATAC genome biology paper). Generated by merging files f1-10 to f1-13.
+**New in version 2.0:** File containing fragments of nucleosome free and one or more nucleosomes (denoted as NFR +1M, in the HINT-ATAC genome biology paper). Generated by merging files f1-10 to f1-13.

 F2: Out_BigWig
 f2-1: ${PREFIX}.bw
394388
Read count statistics.
395389

396390
F10: QC_ataqv_ParkerLab_Test
397-
*New in version 2.0* Folder containing the summary .json files generated by the package ATAQV, which for diferent samples, can be combined to put a summary statistic and displayed in a Web browser.
391+
**New in version 2.0:** Folder containing the summary .json files generated by the package ATAQV, which for diferent samples, can be combined to put a summary statistic and displayed in a Web browser.
398392

399393
F11: TSS_Enrichment_Peaks
400-
*New in version 2.0* Processes the narrow peaks from the folder F4, and computes the TSS enrichment of these peaks. The underlying file structure is:
394+
**New in version 2.0:** Processes the narrow peaks from the folder F4, and computes the TSS enrichment of these peaks. The underlying file structure is:
401395

402396
MACS2_Ext_*${CONTROLSTR}/macs2_narrowPeak_Q${FDRTHR}filt_Offset_${OFFSETVAL}/${PEAKTYPE}/*.pdf
403397

@@ -409,7 +403,7 @@ Within the folder *OutDir* (specified by the configuration option -d) following


 F12: Motif_MACS2_Ext_*${CONTROLSTR}_narrowPeak_Q${FDRTHR}filt
-*New in version 2.0* TF footprinting analysis corresponding to the ChIP-seq peaks stored in F4. Here, ${CONTROLSTR} is either "*_No_Control" or "*_With_Control", depending on the use of control BAM file in inferring the peaks. ${FDRTHR} is either 0.01 or 0.05.
+**New in version 2.0:** TF footprinting analysis corresponding to the ChIP-seq peaks stored in F4. Here, ${CONTROLSTR} is either "*_No_Control" or "*_With_Control", depending on the use of control BAM file in inferring the peaks. ${FDRTHR} is either 0.01 or 0.05.

 The principle is to extract the peak summits and surroundings (by some bp, defined as an offset) and compute the TF footprinting regions and underlying motifs within these regions.
