This is mouse brain tissue (11.5 days of development) where genomic DNA was extracted and nucleosome organisation was reconstituted. There were four libraries:
- two reduced libraries with all 5fC converted to 5hmC (A012 and A013)
- two normal libraries with normal 5fC (A004 and A006)
Sequencing quality was explored using fastqc
and looked acceptable for further analysis. Illumina sequencing adaptors were removed from the paired end sequencing data of mouse hindbrain and heart using cutadapt
:
cutadapt -m 10 -a AGATCGGAAGAGC -A AGATCGGAAGAGC -o R1_trimmed.fastq.gz -p R2_trimmed.fastq.gz R1.fastq.gz R2.fastq.gz > output_statistics.txt
where R1.fastq.gz
and R2.fastq.gz
are the R1 and R2 paired end raw sequencing files with the _trimmed
counterparts obtained as output. Alignments to the mouse reference genome mm9
were generated using bwa mem
:
bwa mem -M -t 8 $ref $fq1 $fq2
where $ref
is the fasta reference genome file, and $fq1
and $fq2
correspond to the trimmed R1 and R2 sequencing files respectively.
Unmapped, not primary aligned and supplementary reads were filtered from the resulting alignment .bam
files using samtools view
. Reads aligning with low mapping quality or aligning to blacklisted genomic regions of the mm9
reference genome were also filtered out.
samtools view -@8 -S -u -F2820 -q 5 -L mm9-whitelist.bed input.bam | samtools sort -@8 - output.bam
where mm9-whitelist.bed
contains not blacklisted genomic regions and input.bam
is the alignment file obtained with bwa mem
. Resulting filtered alignment files were sorted by coordinate using samtools sort
. Overlapping read pairs were clipped using BamUtil
tools so they do not overlap:
bam clipOverlap --in input.bam --out output.bam --stats --storeOrig XC
In principle we do not need to clip overlapping read pairs because we are not interested in single nucleotide resolution at this stage, however clipping does not affect the downstream analyses anyway. Sequencing duplicates were marked using picard
:
java -Xmx3G -jar picard.jar MarkDuplicates I=input.bam O=output.bam M=log.markdup.txt
where input.bam
is the merged and sorted alignment file, with duplicates marked as output.bam
and processing log details written to log.markdup.txt
. Resulting files were indexed using samtools index
.
The code used, along with its usage instructions, is described here