.TH chromap 1 "25 Jan 2024" "chromap-0.2.6 (r490)" "Bioinformatics tools"
.SH NAME
.PP
chromap - fast alignment and preprocessing of chromatin profiles
.SH SYNOPSIS
* Indexing the reference genome:
.RS 4
chromap
.B -i
.RB [ -k
.IR kmer ]
.RB [ -w
.IR miniWinSize ]
.B -r
.I ref.fa
.B -o
.I ref.index
.RE
* Mapping (sc)ATAC-seq reads:
.RS 4
chromap
.B --preset
.I atac
.B -r
.I ref.fa
.B -x
.I ref.index
.B -1
.I read1.fq
.B -2
.I read2.fq
.B -o
.I aln.bed
.RB [ -b
.IR barcode.fq.gz ]
.RB [ --barcode-whitelist
.IR whitelist.txt ]
.RE
* Mapping ChIP-seq reads:
.RS 4
chromap
.B --preset
.I chip
.B -r
.I ref.fa
.B -x
.I ref.index
.B -1
.I read1.fq
.B -2
.I read2.fq
.B -o
.I aln.bed
.RE
* Mapping Hi-C reads:
.RS 4
chromap
.B --preset
.I hic
.B -r
.I ref.fa
.B -x
.I ref.index
.B -1
.I read1.fq
.B -2
.I read2.fq
.B -o
.I aln.pairs
.br
chromap
.B --preset
.I hic
.B -r
.I ref.fa
.B -x
.I ref.index
.B -1
.I read1.fq
.B -2
.I read2.fq
.B --SAM
.B -o
.I aln.sam
.RE
.SH DESCRIPTION
.PP
Chromap is an ultrafast method for aligning and preprocessing high-throughput
chromatin profiles. Typical use cases include: (1) trimming sequencing adapters,
mapping bulk ATAC-seq or ChIP-seq genomic reads to the human genome, and removing
duplicates; (2) trimming sequencing adapters, mapping single-cell ATAC-seq
genomic reads to the human genome, correcting barcodes, removing duplicates, and
performing Tn5 shift; (3) split alignment of Hi-C reads against a reference
genome. In all three cases, Chromap is 10-20 times faster than existing
workflows while remaining accurate.
.SH OPTIONS
.SS Indexing options
.TP 10
.BI -k \ INT
Minimizer k-mer length [17].
.TP
.BI -w \ INT
Minimizer window size [7]. A minimizer is the smallest k-mer
in a window of w consecutive k-mers.
.TP
.BI --min-frag-length \ INT
Min fragment length for choosing k and w automatically [30]. Increase this
value when the shortest fragments of interest are long, which can speed up
mapping. The default value, 30, is also the shortest fragment length that
chromap can map.
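.PP
The minimizer definition given for
.B -w
can be illustrated with a minimal Python sketch. It is not chromap's
implementation: the function name is hypothetical, and it assumes k-mers are
compared lexicographically with ties going to the leftmost k-mer, whereas
real mappers typically order k-mers by a hash value.
.PP
.nf
```python
def minimizers(seq, k=17, w=7):
    """Return the sorted (position, k-mer) minimizers of seq.

    Assumption: k-mers are ordered lexicographically and ties go to the
    leftmost k-mer. This ordering is illustrative only.
    """
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    picked = set()
    # Visit every window of w consecutive k-mers and keep its smallest one.
    for start in range(len(kmers) - w + 1):
        window = range(start, start + w)
        best = min(window, key=lambda i: (kmers[i], i))
        picked.add((best, kmers[best]))
    return sorted(picked)
```
.fi
.PP
For instance, minimizers("ACGTACGT", k=3, w=2) selects one k-mer per window
of two consecutive 3-mers, and adjacent windows often share the same choice,
which is why the selected set is smaller than the full set of k-mers.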
.SS Mapping options
.TP 10
.BI --split-alignment
Allow split alignments. This option should be set only when mapping Hi-C reads.
.TP
.BI -e \ INT
Max edit distance allowed to map a read [8].
.TP
.BI -s \ INT
Min number of minimizers required to map a read [2].
.TP
.BI -f \ INT1 [, INT2 ]
Ignore minimizers occurring more than
.I INT1
[500] times.
.I INT2
[1000] is the threshold for a second round of seeding.
.TP
.BI -l \ INT
Max insert size, only for paired-end read mapping [1000].
.TP
.BI -q \ INT
Min MAPQ in range [0, 60] for mappings to be output [30].
.TP
.BI --min-read-length \ INT
Skip reads shorter than
.I INT
[30]. Note that this differs from the indexing option
.BR --min-frag-length ,
which sets
.B -k
and
.B -w
for indexing the genome.
.TP
.BI --trim-adapters
Try to trim adapters at the 3' end. This only works for paired-end reads. When
the fragment length indicated by the read pair is less than the read length,
the two mates overlap each other, and the regions outside the overlap are
treated as adapters and trimmed.
.TP
.BI --remove-pcr-duplicates
Remove PCR duplicates.
.TP
.BI --remove-pcr-duplicates-at-bulk-level
Remove PCR duplicates at bulk level for single cell data.
.TP
.BI --remove-pcr-duplicates-at-cell-level
Remove PCR duplicates at cell level for single cell data.
.TP
.BI --Tn5-shift
Perform Tn5 shift. When this option is turned on, the forward mapping start
positions are increased by 4bp and the reverse mapping end positions are
decreased by 5bp. Note that this works only when
.BR --SAM
is NOT set.
.TP
.BI --low-mem
Use low-memory mode. When this option is set, chromap may write multiple
temporary mapping files to disk and merge them at the end of processing, which
reduces memory usage. When it is NOT set, all mapping results are kept in
memory until they are written to disk, which is more efficient for datasets
that are not too large.
.TP
.BI --bc-error-threshold \ INT
Max Hamming distance allowed to correct a barcode [1]. Note that the max
supported threshold is 2.
.TP
.BI --bc-probability-threshold \ FLT
Min probability to correct a barcode [0.9]. When multiple whitelisted barcodes
have the same Hamming distance to the barcode being corrected, chromap uses
the base qualities of the mismatched bases to compute the probability that
the correction is right.
.TP
.BI -t \ INT
The number of threads for mapping [1].
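.PP
The coordinate arithmetic described for
.B --Tn5-shift
can be sketched in Python as follows. The function name and the
(start, end, strand) representation are assumptions made for illustration;
how chromap applies the shift internally is not shown here.
.PP
.nf
```python
def tn5_shift(start, end, strand):
    """Apply the Tn5 shift to one mapping.

    Forward ("+") mappings have their start increased by 4 bp;
    reverse ("-") mappings have their end decreased by 5 bp.
    The tuple layout is a hypothetical stand-in for a BED-like record.
    """
    if strand == "+":
        return start + 4, end
    if strand == "-":
        return start, end - 5
    raise ValueError("strand must be '+' or '-'")
```
.fi
.PP
Under these assumptions, a forward mapping spanning 100 to 150 becomes 104 to
150, and a reverse mapping spanning 100 to 150 becomes 100 to 145.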
.SS Input options
.TP 10
.BI -r \ FILE
Reference file.
.TP
.BI -x \ FILE
Index file.
.TP
.BI -1 \ FILE
Single-end read files or paired-end read files 1. Chromap accepts multiple
input files joined with ",". For example, setting this option to
"Library1_R1.fastq.gz,Library2_R1.fastq.gz,Library3_R1.fastq.gz" uses
all three files as input and maps them in this order. Similarly,
.BR -2
and
.BR -b
also accept multiple input files. The order of the input files must match
across all three options.
.TP
.BI -2 \ FILE
Paired-end read files 2.
.TP
.BI -b \ FILE
Cell barcode files.
.TP
.BI --barcode-whitelist \ FILE
Cell barcode whitelist file. This is supposed to be a txt file where each line
is a whitelisted barcode.
.TP
.BI --read-format \ STR
Format for read files and barcode files ["r1:0:-1,bc:0:-1"]. The default
corresponds to the 10x Genomics single-end format.
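.PP
The interplay between
.B --barcode-whitelist
and
.B --bc-error-threshold
can be illustrated with a short Python sketch that looks up whitelisted
barcodes within a given Hamming distance. The function names are
hypothetical, and the quality-based probability used with
.B --bc-probability-threshold
to break ties is deliberately left out; this is not chromap's implementation.
.PP
.nf
```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("barcodes must have the same length")
    return sum(x != y for x, y in zip(a, b))


def correction_candidates(barcode, whitelist, max_dist=1):
    """Return the whitelisted barcodes closest to barcode.

    Only candidates within max_dist mismatches are considered. More than
    one result means a tie that would need a further criterion to resolve.
    """
    if barcode in whitelist:
        return [barcode]
    best, hits = max_dist + 1, []
    for cand in whitelist:
        if len(cand) != len(barcode):
            continue
        dist = hamming(barcode, cand)
        if dist < best:
            best, hits = dist, [cand]
        elif dist == best and dist <= max_dist:
            hits.append(cand)
    return sorted(hits)
```
.fi
.PP
An empty result means no whitelisted barcode lies within the threshold, so
the barcode would stay uncorrected in this sketch.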
.SS Output options
.TP 10
.BI -o \ FILE
Output file.
.TP
.BR --output-mappings-not-in-whitelist
Output mappings with barcode not in the whitelist.
.TP
.BI --chr-order \ FILE
Custom chromosome order file. If not specified, the order of reference sequences will be used.
.TP
.BR --BED
Output mappings in BED/BEDPE format. Note that only one of the formats should be
set.
.TP
.BR --TagAlign
Output mappings in TagAlign/PairedTagAlign format.
.TP
.BR --SAM
Output mappings in SAM format.
.TP
.BR --pairs
Output mappings in pairs format (defined by 4DN for Hi-C data).
.TP
.BI --pairs-natural-chr-order \ FILE
Custom chromosome order file for pairs flipping. If not specified, the custom chromosome order will be used.
.TP
.BI --barcode-translate \ FILE
Convert input barcodes to another set of barcodes in the output.
.TP
.BI --summary \ FILE
Summarize the mapping statistics at bulk or barcode level.
.TP
.B -v
Print version number to stdout.
.SS Preset options
.TP 10
.BI --preset \ STR
Preset []. This option sets several options at once. Specify it before other
options, because options given later override the
values set by
.BR --preset .
Available
.I STR
are:
.RS
.TP 10
.B chip
Mapping ChIP-seq reads
.RB ( -l
.I 2000
.B --remove-pcr-duplicates --low-mem
.BR --BED ).
.TP
.B atac
Mapping ATAC-seq/scATAC-seq reads
.RB ( -l
.I 2000
.B --remove-pcr-duplicates --low-mem --trim-adapters --Tn5-shift
.B --remove-pcr-duplicates-at-cell-level
.BR --BED ).
.TP
.B hic
Mapping Hi-C reads
.RB ( -e
.I 4
.B -q
.I 1
.B --low-mem --split-alignment
.BR --pairs ).
.RE