Double and single stranded detection of 5-methylcytosine and 5-hydroxymethylcytosine with nanopore sequencing

This repository contains the scripts and notebooks used to analyse nanopore sequence data for the publication: Double and single stranded detection of 5-methylcytosine and 5-hydroxymethylcytosine with nanopore sequencing (in press).

New nanopore sequence data, in raw fast5 format and as bam files aligned to mm39, have been made available on the NCBI SRA under BioProject PRJNA1144670. BedMethyl format data have also been uploaded to the NCBI GEO archive.

How to use this repository

This repository is intended to aid reproduction of the figures and statistics in the associated publication. We recommend downloading the extracted modified base information from the NCBI GEO repository, where it is archived as GSE279860. The file paths used in these scripts must then be replaced to match your filesystem. Each script is named after the statistic, figure, or supplementary figure that it produces or relates to.

With the exception of large genomic references, for which links are provided, reference and data files are provided in the feature_references and data folders. Where this is not possible due to size, instructions are provided within the scripts.
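Because the scripts refer to local file paths, one way to keep those edits in a single place is a small path helper. The following is a minimal sketch, not part of this repository: the environment variable NANOPORE_DATA_ROOT, the function data_path, and the example file name are all hypothetical and chosen for illustration only.

```python
import os
from pathlib import Path
from typing import Optional

# Hypothetical environment variable; it is NOT defined by this repository.
DATA_ROOT_ENV = "NANOPORE_DATA_ROOT"


def data_path(*parts: str, root: Optional[str] = None) -> Path:
    """Build a path under one user-configurable data root.

    Resolution order: explicit `root` argument, then the environment
    variable, then a local "data" folder as a fallback.
    """
    base = Path(root or os.environ.get(DATA_ROOT_ENV, "data")).expanduser()
    return base.joinpath(*parts)


if __name__ == "__main__":
    # "example.bed" is a placeholder name, not a file shipped with the repository.
    print(data_path("example.bed", root="/path/to/downloads"))
```

With a helper like this, moving the downloaded GSE279860 files to a different disk would only require changing the root, rather than every path in every script.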

Source data

Source data can be downloaded for most plots and figures from Figshare using this link:

To reproduce this analysis from scratch, follow the README.md file in the data/ directory, which provides links to, and information about, all raw datasets.

Comparison with public datasets

This repository refers to public datasets, including oxBS-seq and TAB-seq data produced by Ma et al. (2017). These data were downloaded from the Beijing Genome Sequence Archive (GSA), where they are stored under the experiment accession numbers CRX008031 (https://ngdc.cncb.ac.cn/gsa/browse/CRA000145/CRR008807) and CRX008030 (https://ngdc.cncb.ac.cn/gsa/browse/CRA000145/CRR008808). These datasets were processed using the following pipeline:

1. Raw data were downloaded in fastq.gz format.
2. Raw data were trimmed using Trim Galore! (see docs: https://www.bioinformatics.babraham.ac.uk/projects/trim_galore/).
3. Trimmed fastq files were aligned to a bisulphite-converted reference genome (mm39) using bismark (see docs: https://www.bioinformatics.babraham.ac.uk/projects/bismark/).
   - The bisulphite-converted reference genome was prepared using bismark_genome_preparation.
4. Duplicate reads were removed using deduplicate_bismark.
5. Modified bases were extracted at CpG positions using bismark_methylation_extractor --merge_non_CpG --bedGraph --zero_based.
6. Sites overlapping repeat sequences (RepeatMasker) were removed using bedtools intersect -v against mm39_repeatMasker.bed.
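The steps above can be sketched as a list of command lines assembled in Python. This is an illustration under stated assumptions rather than the exact commands that were run. It assumes single-end reads, assumes the caller supplies every intermediate file name (the tools choose their own output names, which are not reproduced here), and assumes the -a/-b arrangement for bedtools. Only the flags named in the pipeline description are passed to bismark_methylation_extractor. The function name and all parameter names are hypothetical.

```python
import shlex
from typing import List


def build_bisulphite_commands(
    fastq: str,
    trimmed_fastq: str,
    genome_dir: str,
    aligned_bam: str,
    deduplicated_bam: str,
    cpg_bed: str,
    repeat_bed: str = "mm39_repeatMasker.bed",
) -> List[List[str]]:
    """Return the pipeline as argv lists, in execution order.

    Nothing is executed here. All file names are supplied by the caller
    and are assumptions; check each tool's actual output names.
    """
    return [
        # Prepare the bisulphite-converted reference before aligning.
        ["bismark_genome_preparation", genome_dir],
        # Adapter and quality trimming (single-end reads assumed).
        ["trim_galore", fastq],
        # Align the trimmed reads to the converted genome.
        ["bismark", "--genome", genome_dir, trimmed_fastq],
        # Remove duplicate alignments.
        ["deduplicate_bismark", aligned_bam],
        # Extract methylation calls with the flags listed in the pipeline.
        [
            "bismark_methylation_extractor",
            "--merge_non_CpG",
            "--bedGraph",
            "--zero_based",
            deduplicated_bam,
        ],
        # Keep only sites with no overlap in the repeat annotation.
        # bedtools writes to stdout, so the caller must redirect the output.
        ["bedtools", "intersect", "-v", "-a", cpg_bed, "-b", repeat_bed],
    ]


if __name__ == "__main__":
    # Placeholder file names, for display only.
    for argv in build_bisulphite_commands(
        "sample.fastq.gz", "sample_trimmed.fq.gz", "mm39_genome/",
        "sample.bam", "sample.deduplicated.bam", "sample_cpg.bed",
    ):
        print(shlex.join(argv))
```

Building the commands as lists keeps each argument separate, so they can be printed for review or passed to subprocess.run without shell quoting problems.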

Additionally, we make use of hMeDIP-seq data obtained from NCBI's Gene Expression Omnibus under accession GSE25398. These datasets were processed using a pipeline composed of Trim Galore!, Picard, and MACS2, which is available here: https://github.com/DominicOH/ChIP2MACS2.

