Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MACS2 doesn't detect SE/PE automatically (at least for our bam files) #71

Open
iromeo opened this issue Nov 12, 2018 · 5 comments
Open
Assignees

Comments

@iromeo
Copy link
Contributor

iromeo commented Nov 12, 2018

Although MACS2 uses AUTO mode by default, our bam files with PE reads from GSM3074495 (part of GSE11622) it handles in SE mode.

So results in /mnt/stripe/bio/raw-data/geo-samples/GSE112622/chipseq/fastq_bams_macs2_broad_0.1/ are questionable.

Green peaks - PE, Yellow peaks - SE
image

SE peaks are over peak called compared to PE peaks:
image

============ Track (1 / 2): 'all on all mm genome  [BAMPE]' ============
Bed Format: (bed6+3, '\t')
Track source: PE.broadPeak
Source size: 340,7 kb
Peaks Statistics:
  Peaks number: 3 446
  Peaks summary length : 1 173 773 bp
  Genome (mm10 [all chromosomes]) coverage: 0.00%
  Lengths:
    Min: 135 bp
    Mean: 340 bp
    Max: 3 594 bp
    5%: 139 bp
    50%: 245 bp
    95%: 757 bp


============ Track (2 / 2): 'all on all mm genome  [AUTO->BAM]' ============
Bed Format: (bed6+3, '\t')
Track source: SE.broadPeak
Source size: 4,3 mb
Peaks Statistics:
  Peaks number: 46 879
  Peaks summary length : 17 742 828 bp
  Genome (mm10 [all chromosomes]) coverage: 0.01%
  Lengths:
    Min: 123 bp
    Mean: 378 bp
    Max: 3 619 bp
    5%: 123 bp
    50%: 327 bp
    95%: 850 bp
@iromeo
Copy link
Contributor Author

iromeo commented Nov 13, 2018

To be honest I found SE peaks here better because they visually cover more peaks in bam coverage. May be our BAM files miss some valuable information about PE, need to compare with Chip-Atlas, I hope they have this files processed (bam and peak)

@iromeo
Copy link
Contributor Author

iromeo commented Nov 13, 2018

I decided to compare our peak calling in auto-mode (SE) and PE with Chip Atlas to figure out what mode could the use and whether PE mode is works for ULI-ChipSeq data
GSE11622 isn't there, so we can take GSE63523
(related files are in /mnt/stripe/rcherniatchik/GSE63523_check)

I took:

  • long mark: H3K27me3 (GSM1551530)
  • short mark: H3K4me3 (GSM1551533)

Chip-Atlas has only narrow peaks with Q-value thresholds (-q 1e-05, 1e-10, or 1e-20), so I took only 1e-05, 1e-10 thresholds and did macs2 peakcalling using this threshold in PE and SE modes.

General observations

  • So seems PE mode isn't bad for ULI data and is quite usable for pair-end reads, not clear why we observe a huge difference in GSE11622 dataset.
  • ChipAtals includes our peaks but has more peaks although we use same qvalue settings & narrow mode (I used cmdline like macs2 callpeak -g mm -t ../GSM1551533_TT2_H3K4me3_100k_mm10.bam -n GSM1551533_05 -q 1e-05)

Not clear why the difference with Chip-Atlas is observed? Is our BAM processing correct?

Details:

Chromsome 1 full size:
image

H3K4me3

  • broadPeaks
    According to overlap PE and SE doesn't differ much, although SE mode contains more peaks (73% of SE peaks are in PE)
  • narrowPeaks q-value 1e-05, 1e-10
    PE includes SE + additional peaks, Chip-Atlas has many unique peaks.

image

Overlap for GSM1551533 tracks with different settings (broad SE, broad PE, ours & chip-atlas (narrow 1e-05, narrow 1e-10)),
image

H3K27me3

  • broadPeaks
    PE intersects 78% SE, 47% PE peaks intersects SE peaks. So more unique PE peaks
  • narrowPeaks q-value 1e-05, 1e-10
    PE and SE are more or less consistent (>70% in each direction), Chip-Atlas has much more peaks

image

Overlap for GSM1551530 tracks with different settings (broad SE, broad PE, ours & chip-atlas (narrow 1e-05, narrow 1e-10)),
image

image

@iromeo
Copy link
Contributor Author

iromeo commented Nov 13, 2018

Not sure whether it is related or not, but BIGWIG provides but us and GSE11622 are slightly different. Maybe we are using different bam files for Macs2 input compared to other results.

Difference visible in JBR and IGV

More or less similar on long range
image

Some difference at large zoom
image

P.S: We could also compare Chip-Atlas and ours bigwigs for GSE63523

@iromeo
Copy link
Contributor Author

iromeo commented Nov 13, 2018

According to MACS2 sources

[macs2_sources]/MACS2/IO/Parser.pyx
Note: BAMPE and BEDPE can't be automatically detected.

So "AUTO" mode will not help and we need to select BAMPE format manually

@iromeo
Copy link
Contributor Author

iromeo commented Nov 13, 2018

Also seems MACS2 treats sorted file (by coordinate or name) differently, see https://groups.google.com/forum/#!topic/macs-announcement/Vh916L3tOOE.

Our files are not sorted by name and read pairs aren't going one after another:
image

In our case sorting fixes read order but doesn't affect MACS2 peak calling results. We get absolutely same results for AUTO (SE) and PE modes

samtools sort -n in.bam -o > out.bam

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants