Exploring the Differential Effects of Sequencing Resolution on Semi-Automated Genome Annotations

A Summer 2018 project that examined how data from the newer ChIP methods ChIP-exo and ChIP-nexus, which offer a higher signal-to-noise ratio and near single-base-pair resolution, affect Segway's ability to generate annotations.

Pipeline

Data Cleaning (data_preprocessing/)
- Download raw data from ENCODE project and NCBI SRA (getSRR.sh)
- Convert to bedgraph and sort data based on a threshold cutoff (fq_to_bam.sh)
- QC with PhantomPeakQualTools + peak calling with MACS2 to generate bedgraphs (MACS2/ & PhantomPeakQualTools/)
- Store data in genomedata archive (bedgraph_to_genomedata.sh)
Run Segway (segway/)
- Training then identification rounds (trainsegway.sh & annotate.sh)
- Use minibatch training: 10 rounds on 1% of the genome
- Try 5 different resolutions: 1bp, 2bp, 30bp, 50bp, 100bp
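
The training settings above could be expressed roughly as follows. This is only a sketch, not the contents of trainsegway.sh: the flag names (`--resolution`, `--minibatch-fraction`, `--max-train-rounds`, `--num-labels`) follow Segway's documented command-line options as best understood here, the archive and directory names are hypothetical placeholders, and the label count of 10 is an assumption inferred from the 10 colours used for recolouring. Option placement relative to the `train` task may differ between Segway versions.

```python
import shlex

# Resolutions (in bp) listed in this README.
RESOLUTIONS = [1, 2, 30, 50, 100]


def build_train_command(resolution, genomedata="data.genomedata",
                        traindir_prefix="traindir"):
    """Return the argv list for one `segway train` run at a given resolution.

    The archive name, directory prefix, and label count are hypothetical
    placeholders, not values taken from trainsegway.sh.
    """
    return [
        "segway", "train",
        f"--resolution={resolution}",
        "--minibatch-fraction=0.01",  # each round trains on 1% of the genome
        "--max-train-rounds=10",      # stop after 10 training rounds
        "--num-labels=10",            # assumed from the 10 recolouring colours
        genomedata,
        f"{traindir_prefix}_{resolution}bp",
    ]


if __name__ == "__main__":
    for res in RESOLUTIONS:
        # Print rather than execute, so the sketch runs without Segway installed.
        print(shlex.join(build_train_command(res)))
```

Printing the commands instead of running them keeps the sketch usable for checking parameters before submitting jobs to a cluster.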
Analyze Results
- Recolour Segway annotations with 10 different colours for visualization in a genome browser (segway/)
- Run stable marriage Hungarian algorithm on annotations (stablemarriage/)
- Graph heatmaps and bipartite graphs in R. Account for negative/NaNs by adding pseudocount (LOD/2) to all data points prior to normalizing (pseudocount/)
- All graph-generating functions are in datavisualization.R; some graphs were generated in an R notebook (see the lab notebook)
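
The pseudocount step can be illustrated with a minimal sketch. It is not the code in pseudocount/, and it rests on assumptions: the LOD (limit of detection) is taken to be the smallest positive value observed, the inputs are non-negative, and the normalization is a plain log2 transform, which is where zeros would otherwise produce negative infinities or NaNs. The function names are hypothetical.

```python
import math


def add_pseudocount(values):
    """Add LOD/2 to every value.

    Assumption: the LOD is the smallest positive value in `values`.
    """
    positives = [v for v in values if v > 0]
    if not positives:
        raise ValueError("need at least one positive value to estimate the LOD")
    pseudocount = min(positives) / 2
    return [v + pseudocount for v in values]


def log2_transform(values):
    """Assumed normalization step: log2 of each (now strictly positive) value."""
    return [math.log2(v) for v in values]


if __name__ == "__main__":
    raw = [0.0, 0.5, 2.0, 8.0]          # the zero would give -inf under log2
    shifted = add_pseudocount(raw)       # LOD = 0.5, so 0.25 is added everywhere
    print(shifted)
    print(log2_transform(shifted))
```

Adding the same pseudocount to every data point, rather than only to the zeros, preserves the ordering of the values and avoids introducing a discontinuity at the LOD.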
Miscellaneous
- Script to clean up on the cluster (segway/seg_cleanup.sh)
- Alternate attempt at finding the LOD via the genomedata archives that were already generated (segway/runthroughcoords.py)
- Get average counts from bedgraph file (data_preprocessing/getavg.sh)
- Optional conversion from bigwig to wiggle format (data_preprocessing/bigwig_to_wiggle.sh)
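
For the average-count helper listed above, one plausible approach is a length-weighted mean over bedgraph intervals. The sketch below is an assumption about what such a calculation could look like, not a transcription of getavg.sh, which may compute an unweighted mean or something else entirely. It averages only over bases covered by an interval, and the function name is hypothetical.

```python
def bedgraph_mean(lines):
    """Length-weighted mean signal over the intervals of a bedgraph.

    Each data line is expected to hold: chrom, start, end, value.
    """
    total_signal = 0.0
    total_bases = 0
    for line in lines:
        line = line.strip()
        # Skip blank lines and track/browser/comment header lines.
        if not line or line.startswith(("track", "browser", "#")):
            continue
        _chrom, start, end, value = line.split()[:4]
        width = int(end) - int(start)
        total_signal += float(value) * width
        total_bases += width
    if total_bases == 0:
        raise ValueError("no intervals found")
    return total_signal / total_bases


if __name__ == "__main__":
    example = [
        "track type=bedGraph",
        "chr1\t0\t10\t1.0",
        "chr1\t10\t30\t4.0",
    ]
    print(bedgraph_mean(example))
```

Weighting by interval width matters because bedgraph intervals are of variable length, so a plain mean of the value column would over-represent short intervals.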

Links:

- Lab Notebook
- Final Presentation

### Acknowledgements

The student researcher would like to thank Dr. Hoffman for the opportunity and resources, and Francis Nguyen for mentorship. Special thanks to Coby, Sam, and Davide of the Hoffman Lab as well.
