Skip to content

Latest commit

 

History

History
57 lines (35 loc) · 2.96 KB

File metadata and controls

57 lines (35 loc) · 2.96 KB

Software

Processing

This is mouse brain tissue (11.5 days of development) where genomic DNA was extracted and nucleosome organisation was reconstituted. There were four libraries:

  • two reduced libraries with all 5fC converted to 5hmC (A012 and A013)
  • two normal libraries with normal 5fC (A004 and A006)

Sequencing quality was explored using fastqc and looked acceptable for further analysis. Illumina sequencing adaptors were removed from the paired end sequencing data of mouse hindbrain and heart using cutadapt:

cutadapt -m 10 -a AGATCGGAAGAGC -A AGATCGGAAGAGC -o R1_trimmed.fastq.gz -p R2_trimmed.fastq.gz R1.fastq.gz R2.fastq.gz > output_statistics.txt

where R1.fastq.gz and R2.fastq.gz are the R1 and R2 paired end raw sequencing files with the _trimmed counterparts obtained as output. Alignments to the mouse reference genome mm9 were generated using bwa mem:

bwa mem -M -t 8 $ref $fq1 $fq2

where $ref is the fasta reference genome file, and $fq1 and $fq2 correspond to the trimmed R1 and R2 sequencing files respectively.

Unmapped, not primary aligned and supplementary reads were filtered from the resulting alignment .bam files using samtools view. Reads aligning with low mapping quality or aligning to blacklisted genomic regions of the mm9 reference genome were also filtered out.

samtools view -@8 -S -u -F2820 -q 5 -L mm9-whitelist.bed input.bam | samtools sort -@8 - output.bam

where mm9-whitelist.bed contains not blacklisted genomic regions and input.bam is the alignment file obtained with bwa mem. Resulting filtered alignment files were sorted by coordinate using samtools sort. Overlapping read pairs were clipped using BamUtil tools so they do not overlap:

bam clipOverlap --in input.bam --out output.bam --stats --storeOrig XC

In principle we do not need to clip overlapping read pairs because we are not interested in single nucleotide resolution at this stage, however clipping does not affect the downstream analyses anyway. Sequencing duplicates were marked using picard:

java -Xmx3G -jar picard.jar MarkDuplicates I=input.bam O=output.bam M=log.markdup.txt

where input.bam is the merged and sorted alignment file, with duplicates marked as output.bam and processing log details written to log.markdup.txt. Resulting files were indexed using samtools index.

Computational prediction of nucleosome occupancy

The code used, along with its usage instructions, is described here