ChangeLog
2011-02-28 Tao Liu <taoliu@jimmy.harvard.edu>
Small fixes
* Replaced with the newest WigTrackI class and fixed the wignorm script.
2011-02-21 Tao Liu <taoliu@jimmy.harvard.edu>
Version 1.4.0rc2 (Valentine)
* --single-wig option is renamed to --single-profile
* BedGraph output with --bdg or -B option.
The BedGraph output provides 1bp resolution fragment pileup
profile. File size is smaller than wig file. This option can be
combined with --single-profile option to produce a bedgraph file
for the whole genome. This option also makes --space and
--call-subpeaks invalid.
* Fix the description of --shiftsize to correctly state that the
value is 1/2 d (fragment size).
* Fix a bug in the call to __filter_w_control_tags when control is
not available.
* Fix a bug on --to-small option. Now it works as expected.
* Fix a bug in counting the tags in a candidate peak region, where an
extra tag could be included. (Thanks to Jake Biesinger!)
* Fix the bug for the peaks extended outside of chromosome
start. If the minus strand tag goes outside of chromosome start
after extension of d, it will be thrown out.
* Post-process script for a combined wig file:
The "wignorm" command can be called after a full run of MACS14 as
a postprocess. wignorm can calculate the local background from the
control wig file from MACS14, then use either foldchange,
-10*log10(pvalue) from Poisson test, or difference after asinh
transformation as the score to build a single wig track to
represent the binding strength. This script will take a
significantly long time to process.
* --wigextend has been obsoleted.
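The three wignorm scores named above can be sketched roughly as
follows. This is only an illustration, not the wignorm code: the
function names and the upper-tail convention P(X >= k) for the Poisson
p-value are assumptions, and `lam` stands for the local background
estimated from the control wig.

```python
import math

def poisson_upper_tail(k, lam):
    """P(X >= k) for X ~ Poisson(lam), lam > 0, summing the tail directly."""
    if k <= 0:
        return 1.0
    # P(X = k), computed in log space to avoid overflow for large k.
    term = math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))
    total = 0.0
    i = k
    while term > 0.0 and term > total * 1e-16:
        total += term
        i += 1
        term *= lam / i  # P(X = i) from P(X = i - 1)
    return min(total, 1.0)

def fold_change(pileup, lam):
    """Treatment pileup divided by the local background."""
    return pileup / lam

def pvalue_score(pileup, lam):
    """-10*log10(pvalue) from a Poisson test against lam."""
    p = poisson_upper_tail(int(pileup), lam)
    if p <= 0.0:
        return float("inf")  # p-value underflowed
    return -10.0 * math.log10(p)

def asinh_difference(pileup, control):
    """Difference after asinh transformation."""
    return math.asinh(pileup) - math.asinh(control)
```

Summing the tail term by term, rather than computing 1 - CDF, keeps
very small p-values from being rounded away.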
2010-09-21 Tao Liu <taoliu@jimmy.harvard.edu>
Version 1.4.0rc1 (Starry Sky)
* Duplicate reads option
--keep-dup behavior is changed. Now the user can specify how many
reads to keep at the same genomic location. Use 'auto' to let MACS
decide the number based on binomial distribution, or 'all' to let
MACS keep all reads.
* pvalue and FDR fixes (Thanks to Prof. Zhiping Weng)
By default, MACS will now scale the smaller dataset to the bigger
dataset. For instance, if IP has 10 million reads, and Input has 5
million, MACS will double the lambda value calculated from Input
reads while calling BOTH the positive peaks and negative
peaks. This will address the issue caused by unbalanced numbers of
reads from IP and Input. If --to-small is turned on, MACS will
scale the larger dataset to the smaller one. So from now on, if d
is fixed, then the peaks from a MACS call for A vs B should be
identical to the negative peaks from a B vs A.
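The scaling rule above can be illustrated with a small sketch. The
function name and the returned (treatment, control) pair are
hypothetical; the example only reproduces the arithmetic in the 10
million vs 5 million case, not the code in MACS.

```python
def scaling_factors(n_treat, n_control, to_small=False):
    """Return (treat_scale, control_scale) to balance the two datasets.

    By default the smaller dataset is scaled up to the bigger one;
    with to_small=True the larger dataset is scaled down instead.
    """
    if to_small:
        target = min(n_treat, n_control)
    else:
        target = max(n_treat, n_control)
    return float(target) / n_treat, float(target) / n_control

# IP has 10 million reads, Input has 5 million: the lambda from
# Input is doubled, the IP side is left alone.
treat_scale, control_scale = scaling_factors(10000000, 5000000)
```

Because the same pair of factors is used whichever dataset plays the
treatment role, swapping A and B gives the symmetric result described
above.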
2010-09-01 Tao Liu <taoliu@jimmy.harvard.edu>
Version 1.4.0beta (summer wishes)
* New features
** Model building
The default behavior in the model building step is slightly
changed. When MACS can't find enough pairs to build model
(implemented in alpha version) or the modeled fragment length is
less than 2 times of tag length (implemented in beta version),
MACS will use 2 times of --shiftsize value as fragment length in
the later analysis. --off-auto can turn off this default behavior.
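The fallback rule in this paragraph can be sketched as below. The
function and argument names are hypothetical, and raising an error
when --off-auto is set is an assumption: the text only says the
default behavior is turned off.

```python
def choose_fragment_length(modeled_d, tsize, shiftsize=100, off_auto=False):
    """Pick the fragment length used in the later analysis.

    modeled_d is None when too few pairs were found to build a model.
    """
    model_ok = modeled_d is not None and modeled_d >= 2 * tsize
    if model_ok:
        return modeled_d
    if off_auto:
        # Assumption: with --off-auto there is no fallback at all.
        raise RuntimeError("model building failed and --off-auto is set")
    # Fall back to 2 times the --shiftsize value.
    return 2 * shiftsize
```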
** Redundant tag filtering
The IO module is rewritten. The redundant tag filtering process
becomes simpler and works as promised. The maximum allowed number
of tags at the exact same location is calculated from the
sequencing depth and genome size using a binomial distribution,
for both TREATMENT and CONTROL separately. ( previously only
TREATMENT was considered ) The exact same location means the same
coordinate and the same strand. Then MACS will only keep at most
this number of tags at the exact same location in the following
analysis. An option --keep-dup can let MACS skip the filtering and
keep all the tags. However this may bring in a lot of sequencing
bias, so you may get many false positive peaks.
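A rough sketch of the idea, for illustration only. The 1e-5 cutoff,
the function names and the (chrom, position, strand) tuple layout are
assumptions; the text above only states that the limit comes from a
binomial distribution over sequencing depth and genome size.

```python
import math

def max_dup_tags(n_tags, genome_size, cutoff=1e-5):
    """Smallest k with P(X <= k) >= 1 - cutoff, X ~ Binomial(n_tags, 1/genome_size)."""
    p = 1.0 / genome_size
    pmf = math.exp(n_tags * math.log1p(-p))  # P(X = 0)
    cdf = pmf
    k = 0
    while cdf < 1.0 - cutoff and k < n_tags:
        # P(X = k + 1) from P(X = k)
        pmf *= float(n_tags - k) / (k + 1) * p / (1.0 - p)
        k += 1
        cdf += pmf
    return max(k, 1)

def filter_duplicates(tags, max_dup):
    """Keep at most max_dup tags per (chrom, position, strand)."""
    seen = {}
    kept = []
    for tag in tags:
        seen[tag] = seen.get(tag, 0) + 1
        if seen[tag] <= max_dup:
            kept.append(tag)
    return kept
```

The limit would be computed once for TREATMENT and once for CONTROL,
since the two can differ in sequencing depth.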
** Single wiggle mode
First thing to mention, this is not the score track that I
described before. By default, MACS generates wiggle files for
fragment pileup for every chromosome separately. When you use
--single-wig option, MACS will generate a single wiggle file for
all the chromosomes so you will get a wig.gz for TREATMENT and
another wig.gz for CONTROL if available.
** Sniff -- automatic format detection
Now, by default or "-f AUTO", MACS will decide the input file
format automatically. Technically, it will try to read at most
1000 records for the first 10 non-comment lines. If it succeeds,
the format is decided. I recommend not to use AUTO and specify the
right format for your input files, unless you combine different
formats in a single MACS run.
* Options changes
--single-wig and --keep-dup are added. Check previous section in
ChangeLog for detail.
-f (--format) AUTO is now the default option.
--slocal default: 1000
--llocal default: 10000
* Bug fixed
Setup script will stop the installation if python version is not
python2.6 or python2.7.
Local lambda calculation has been changed back. MACS will check
peak_region, slocal( default 1K) and llocal (default 10K) for the
local bias. The previous 200bps default caused MACS to miss
some peaks where the input bias is very sharp.
sam2bed.py script is corrected.
Relative pos in xls output is fixed.
Parser for ELAND_export is fixed to pass some of the no match
lines. And elandexport2bed.py is fixed too. ( however I can't
guarantee that it works on any eland_export files. )
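For the local lambda change listed among the bug fixes above, a
minimal sketch of checking several window sizes might look like this.
Taking the maximum over the windows and the genome-wide background is
an assumption here, and the function and argument names are
hypothetical.

```python
def local_lambda(peak_len, control_counts, genome_bg_lambda):
    """Estimate the local lambda for one candidate peak.

    control_counts maps a window size in bp (peak region, slocal,
    llocal) to the number of control tags found in that window
    around the peak.
    """
    # Rescale each window's tag count to the width of the peak.
    rates = [count * float(peak_len) / window
             for window, count in control_counts.items()]
    return max([genome_bg_lambda] + rates)

# A 200 bp peak checked against peak_region, 1K and 10K windows.
lam = local_lambda(200, {200: 4, 1000: 10, 10000: 50}, 1.0)
```

A sharp input bias shows up in the narrow window but is diluted in the
wide ones, which is why the narrow windows matter here.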
2010-06-04 Tao Liu <taoliu@jimmy.harvard.edu>
Version 1.4.0alpha2 (be smarter)
* Options changes
--gsize now provides shortcuts for common genomes, including
human, mouse, C. elegans and fruitfly.
--llocal now will be 5000 bps if there is no input file, so that
local lambda doesn't overkill enriched binding sites.
2010-06-02 Tao Liu <taoliu@jimmy.harvard.edu>
Version 1.4alpha (be smarter)
* Options changes
--tsize option is redesigned. MACS will use the first 10 lines of
the input to decide the tag size. If user specifies --tsize, it
will override the auto decided tsize.
--lambdaset is replaced by --slocal and --llocal which mean the
small local region and large local region.
--bw has no effect on the scan-window size now. It only affects the
paired-peaks model process.
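The --tsize auto-detection described above can be sketched for BED
input as follows. The function name is hypothetical, and taking the
most common size among the first 10 records is an assumption; the text
only says the first 10 lines are used.

```python
from collections import Counter

def guess_tsize(lines, user_tsize=None, n=10):
    """Guess the tag size from the first n BED records.

    A tsize given by the user always overrides the guess.
    """
    if user_tsize is not None:
        return user_tsize
    sizes = []
    for line in lines:
        if not line.strip() or line.startswith(("track", "browser", "#")):
            continue  # skip blank and comment lines
        fields = line.split()
        sizes.append(int(fields[2]) - int(fields[1]))
        if len(sizes) == n:
            break
    if not sizes:
        raise ValueError("no records found to decide the tag size")
    return Counter(sizes).most_common(1)[0][0]
```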
* Model building
During the model building, MACS will pick out the enriched regions
which are not too high and not too low to build the paired-peak
model. By default the region is from fold 10 to fold 30. If MACS
fails to build the model, by default it will use the nomodel
settings, like shiftsize=100bps, to shift and extend each
tag. This behavior can be turned off by '--off-auto'.
* Output files
An extra file including all the summit positions is saved as the
*_summits.bed file. An option '--call-subpeaks' will invoke
PeakSplitter developed by Mali Salmon to split wide peaks into
smaller subpeaks.
* Sniff ( will be in beta )
Automatically recognize the input file format, so users can combine
different formats in one MACS run.
Not implemented features/TODO:
* Algorithms ( in near future? )
MACS will try to refine the peak boundaries by calculating the
scores for every point in the candidate peak regions. The score
will be the -10*log(10,pvalue) on a local poisson distribution. A
cutoff specified by users (--pvalue) will be applied to find the
precise sub-peaks in the original candidate peak region. Peak
boundaries and peak summit positions will be saved in separate BED
files.
* Single wiggle track ( in near future? )
A single wiggle track will be generated to save the scores within
candidate peak regions in the 10bps resolution. The wiggle file
is in fixedStep format.
2009-10-16 Tao Liu <taoliu@jimmy.harvard.edu>
Version 1.3.7.1 (Oktoberfest, bug fixed #1)
* bin/Constants.py
Fixed typo. FCSTEP -> FESTEP
* lib/PeakDetect.py
The 'femax' attribute bug is fixed
2009-10-02 Tao Liu <taoliu@jimmy.harvard.edu>
Version 1.3.7 (Oktoberfest)
* bin/macs, lib/PeakDetect.py, lib/IO/__init__.py, lib/OptValidator.py
Enhancements by Peter Chines:
1. gzip files are supported.
2. when --diag is on, user can set the increment and endpoint for
fold enrichment analysis by setting --fe-step and --fe-max.
Enhancements by Davide Cittaro:
1. BAM and SAM formats are supported.
2. small changes in the header lines of wiggle output.
Enhancements by Me:
1. I added --fe-min option;
2. Bowtie ascii output with suffix ".map" is supported.
Bug fixed:
1. --nolambda bug is fixed. ( reported by Martin in JHU )
2. --diag bug is fixed. ( reported by Bogdan Tanasa )
3. Function to remove suffix '.fa' is fixed. ( reported by Jeff Johnston )
4. Some "fold change" have been changed to "fold enrichment".
2009-06-10 Tao Liu <taoliu@jimmy.harvard.edu>
Version 1.3.6.1 (default parameter change)
* bin/macs, lib/PeakDetect.py
"--oldfdr" is removed. The 'oldfdr' behaviour becomes
default. "--futurefdr" is added which can turn on the 'new' method
introduced in 1.3.6. By default it's off.
* lib/PeakDetect.py
Fixed a bug. p-value is corrected a little bit.
2009-05-11 Tao Liu <taoliu@jimmy.harvard.edu>
Version 1.3.6 (Birthday cake)
* bin/macs
"track name" is added to the header of BED output file.
Now the default peak detection method is to consider 5k and 10k
nearby regions in treatment data and peak location, 1k, 5k, and
10k regions in control data to calculate local bias. The old
method can be called through '--old' option.
Information about how many total/unique tags in treatment or
control will be saved in final .xls output.
* lib/IO/__init__.py
".fa" will be removed from input tag alignment so only the
chromosome names are kept.
WigTrackI class is added for Wiggle like data structure. (not used
now)
The parser for ELAND multi PET files has been fixed. Now the 5'
tag position for a pair will be kept, whereas in the previous
version, the middle points are kept.
* lib/IO/BinKeeper.py
BinKeeperI class is inspired by Jim Kent's library for UCSC genome
browser, which can quickly access certain region for values in a
large wiggle like data file. (not used now)
* lib/OptValidator.py
typo fixed.
* lib/PeakDetect.py
Now the default peak detection method is to consider 5k and 10k
nearby regions in treatment data and peak location, 1k, 5k, and
10k regions in control data to calculate local bias. The old
method can be called through '--old' option.
Two columns have been added to BED output file. 4th column: peak
name; 5th column: peak score using -10log(10,pvalue) as score.
* setup.py
Add support to build a Mac App through 'setup.py py2app', or a
Windows executable through 'setup.py py2exe'. You need to install
py2app or py2exe package in order to use these functions.
2009-02-12 Tao Liu <taoliu@jimmy.harvard.edu>
Version 1.3.5 (local lambda fixed, typo fixed, model figure improved)
* PeakDetect.py
Now, besides 1k, 5k, 10k, MACS will also consider peak size region
in control data to calculate local lambda for each peak. Peak
calling results will be slightly different from the previous version,
beware!
* OptValidator.py
Typo fixed, ELANDParser -> ELANDResultParser
* OutputWriter.py
Now, modeled d value will be shown on the model figure.
2009-01-06 Tao Liu <taoliu@jimmy.harvard.edu>
Version 1.3.4 (Happy New Year Version, bug fixed, ELAND multi/PET support)
* macs, IO/__init__.py, PeakDetect.py
Add support for ELAND multi format. Add support for Pair-End
experiment, in this case, 5'end and 3'end ELAND multi format files
are required for treatment or control data. See 00README file for
detail.
Add wigextend option.
Add petdist option for Pair-End Tag experiment, which is the best
distance between 5' and 3' tags.
* PeakDetect.py
Fixed a bug which caused the end position of every peak region
to be incorrectly increased by 1 bp. ( Thanks Mali Salmon!)
* OutputWriter.py
Fix bugs while generating wiggle files. The start position of
wiggle file is set to 1 instead of 0.
Fix a bug that every 10M bps, signals in the first 'd' range are
lower than actual. ( Thanks Mali Salmon!)
2008-12-03 Tao Liu <taoliu@jimmy.harvard.edu>
Version 1.3.3 (wiggle bugs fixed)
* OutputWriter.py
Fix bugs while generating wiggle files. 1. 'span=' is added to
'variableStep' line; 2. previously, every 10M bps, the coordinates
were wrongly shifted to the right for 'd' basepairs.
* macs, PeakDetect.py
Add an option to save wiggle files on different resolution.
2008-10-02 Tao Liu <taoliu@jimmy.harvard.edu>
Version 1.3.2 (tiny bugs fixed)
* IO/__init__.py
Fix 65536 -> 65535. ( Thanks Joon )
* Prob.py
Improved for binomial function with extra large number. Imported
from Cistrome project.
* PeakDetect.py
If treatment channel misses reads in some chromosome included in
control channel, or vice versa, MACS will not exit. (Thanks Shaun
Mahony)
Instead, MACS will fake a tag at position -1 when calling
treatment peaks vs control, but will ignore the chromosome while
calling negative peaks.
2008-09-04 Tao Liu <taoliu@jimmy.harvard.edu>
Version 1.3.1 (tiny bugs fixed version)
* Prob.py
Hyunjin Gene Shin contributed some code to Prob.py. Now the
binomial functions can tolerate large and small numbers.
* IO/__init__.py
Parsers now split lines in BED/ELAND files using any
whitespace. 'track' or 'browser' lines will be regarded as
comment lines. A bug fixed when throwing StrandFormatError. The
maximum redundant tag number at a single position can be no less
than 65536.
2008-07-15 Tao Liu <taoliu@jimmy.harvard.edu>
Version 1.3 (naming clarification version)
* Naming clarification changes according to our manuscript:
'frag_len' is changed to 'd'.
'fold_change' is changed to 'fold_enrichment'.
Suggest '--bw' parameter to be determined by users from the real
sonication size.
Maximum FDR is 100% in the output file.
And other clarifications in 00README file and the documents on the
website.
* IO/__init__.py
If the redundant tag number at a single position is over 32767,
just remember 32767, instead of raising an overflow exception.
* setup.py
fixed a typo.
* PeakDetect.py
Bug fixed for diagnosis report.
2008-07-10 Tao Liu <taoliu@jimmy.harvard.edu>
Version 1.2.2gamma
* Serious bug fixes:
Poisson distribution CDF and inverse CDF functions are
corrected. They can produce right results even for huge lambda
now, so the p-value and FDR values in the final excel sheet
are corrected.
IO package now can tolerate some rare cases; ELANDParser in IO
package is fixed. (Thanks Bogdan)
* Improvement:
Reverse paired peaks in model are rejected. So there will be no
negative 'frag_len'. (Thanks Bogdan)
* Features added:
Diagnosis function is completed. It can output a table file for
users to estimate their sequencing depth.
2008-06-30 Tao Liu <taoliu@jimmy.harvard.edu>
Version 1.2
* Prob.py is added!
GSL is totally removed from MACS. Instead, I have implemented the
CDF and inverse CDF for poisson and binomial distribution purely
in python.
* Constants.py is added!
Organize constants used in MACS in the Constants.py file.
* All other files are modified!
Foldchange calculation is modified. Now the foldchange is only
calculated at the peak summit position instead of the whole peak
region. The values will be higher and more robust than before.
Features added:
1. MACS can save wiggle format files containing the tag number at
every 10 bp along the genome. Tags are shifted according to our
model before they are calculated.
2. Model building and local lambda calculation can be skipped with
certain options.
3. A diagnosis report can be generated through '--diag'
option. This report can help you estimate the
sequencing saturation. This function is only in beta stage.
4. FDR calculation speed is highly improved.
2008-05-28 Tao Liu <taoliu@jimmy.harvard.edu>
Version 1.1
* TabIO, PeakModel.py ...
Bug fixed to let MACS tolerate some cases where there is no tag on
either plus strand or minus strand.
* setup.py
Check the version of python. If the version is lower than 2.4,
refuse to install with warning.