Pipeline for calling 45S methylation on the per unit level given an input set of ONT reads

steven-solar/45S_methyl_caller

45S_methyl_caller

Call as:

bash run_pipeline.sh $SCRIPT_DIR $alns.bam $chrs.txt $KY_ref.fa $out_dir $genome.bed $18S.fa $45S_on_KY.bed $TR_unit.fa

Arguments:

  • $SCRIPT_DIR: directory of this code
  • $alns.bam: ONT reads aligned to chm13
  • $chrs.txt: newline-separated list of chromosomes of interest (i.e. the acrocentric chromosomes)
  • $KY_ref.fa: path to the KY_ROT reference
  • $out_dir: path to the directory where you want results
  • $genome.bed: bed file for calculating genome coverage
  • $18S.fa: reference of the 18S rRNA gene, for estimating copy number
  • $45S_on_KY.bed: bed file of the 45S gene on the KY_rot reference, for getting methylation of just the gene
  • $TR_unit.fa: reference sequence of a canonical TR unit, for estimating copy number in rDNA units
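
Because the script takes nine positional arguments, it is easy to pass them in the wrong order. The sketch below builds the command from named parameters and reports any input paths that do not exist. It is only an illustration: the helper names `build_command` and `missing_inputs` are hypothetical, and it assumes `run_pipeline.sh` sits inside `$SCRIPT_DIR`.

```python
import os


def build_command(script_dir, alns_bam, chrs_txt, ky_ref_fa, out_dir,
                  genome_bed, rrna_18s_fa, gene_45s_on_ky_bed, tr_unit_fa):
    """Return the argv list for run_pipeline.sh in the documented order."""
    # Assumption: run_pipeline.sh is located inside script_dir.
    script = os.path.join(script_dir, "run_pipeline.sh")
    return [
        "bash", script,
        script_dir, alns_bam, chrs_txt, ky_ref_fa, out_dir,
        genome_bed, rrna_18s_fa, gene_45s_on_ky_bed, tr_unit_fa,
    ]


def missing_inputs(paths):
    """Return the subset of paths that do not exist on disk."""
    return [p for p in paths if not os.path.exists(p)]
```

A caller could check `missing_inputs(...)` on every argument except the output directory and, if the list is empty, launch the pipeline with `subprocess.run(cmd, check=True)`.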

General pipeline:

  1. Makes the file structure for the outputs.
  2. Maps ONT reads to the 45S reference (or any reference you give it), keeps alignments with at least a 90% alignment block and 90% identity, and filters out suspected chimeric reads (identified as containing inverted 45S units).
  3. Splits up the alignments so that each file holds one alignment.
  4. Calls modkit on these split alignments to get per-read methylation information (modkit is also called on all reads per category).
  5. Summarizes the data at the per-group level and the per-unit level.
  6. Ordering analysis: are neighboring units methylated more similarly than random units?
  7. Generates a per-category violin plot.
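
The filtering in step 2 can be sketched on PAF records as follows. This is an illustration, not the pipeline's own code. It assumes that "90% alignment block" means the alignment spans at least 90% of the reference unit, that identity is matching bases divided by alignment block length (PAF columns 10 and 11), and that a read is treated as chimeric when its passing units map to both strands. The function names are hypothetical.

```python
from collections import defaultdict

MIN_FRACTION = 0.90  # threshold for both coverage and identity


def parse_paf_line(line):
    """Pull the fields needed for filtering out of one PAF record."""
    f = line.rstrip("\n").split("\t")
    return {
        "read": f[0],
        "strand": f[4],
        "target_len": int(f[6]),
        "target_start": int(f[7]),
        "target_end": int(f[8]),
        "matches": int(f[9]),
        "block_len": int(f[10]),
    }


def passes_thresholds(rec, min_fraction=MIN_FRACTION):
    """Assumed definitions: reference coverage and matches / block length."""
    coverage = (rec["target_end"] - rec["target_start"]) / rec["target_len"]
    identity = rec["matches"] / rec["block_len"]
    return coverage >= min_fraction and identity >= min_fraction


def filter_paf(lines, min_fraction=MIN_FRACTION):
    """Group passing alignments by read and drop reads with mixed strands."""
    by_read = defaultdict(list)
    for line in lines:
        if not line.strip():
            continue
        rec = parse_paf_line(line)
        if passes_thresholds(rec, min_fraction):
            by_read[rec["read"]].append(rec)
    # A read whose units align in both orientations is assumed chimeric.
    return {
        read: recs
        for read, recs in by_read.items()
        if len({r["strand"] for r in recs}) == 1
    }
```

Each value in the returned dictionary is the list of 45S unit alignments kept for that read, which is the natural input for the splitting in step 3.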

The outputs: Everything is contained within the folder specified as $out_dir.

$out_dir/alignment contains outputs from the alignment and splitting stages.

  • The top level contains bams, beds, and pafs for each group
  • read_breakdown contains a subdirectory for each group, and within it one for each readname
  • Each readname folder contains a sam and a bam per 45S unit

$out_dir/get_methylation contains outputs from running modkit to get methylation information from alignments.

  • modkit_beds contains the group level beds generated by modkit, as well as a logs folder
  • read_breakdown is organized as in the alignment folder, with each readname folder containing one bed file per unit, as well as a logs folder
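
As a rough illustration of how a bed file here could be reduced to a single %methylation value, the sketch below sums modified and valid calls across rows. It assumes the files follow modkit's pileup bedMethyl layout, in which column 4 is the modification code, column 10 the valid coverage, and column 12 the number of modified calls. Whether the pipeline computes its summaries this way is not stated in this README, and the function name is hypothetical.

```python
def percent_methylation(bed_lines, mod_code="m"):
    """Return 100 * modified / valid calls over all rows with mod_code.

    Returns None when no valid calls are found.
    """
    n_mod = 0
    n_valid = 0
    for line in bed_lines:
        # split() handles both tab- and space-separated columns.
        fields = line.split()
        if len(fields) < 12 or fields[3] != mod_code:
            continue
        n_valid += int(fields[9])   # column 10: valid coverage (assumed layout)
        n_mod += int(fields[11])    # column 12: modified calls (assumed layout)
    if n_valid == 0:
        return None
    return 100.0 * n_mod / n_valid
```

Applied to a per-unit bed file this gives one value per unit on one read; applied to a group-level bed it gives one value per group.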

$out_dir/methylation_analysis contains final summary data generated by the pipeline.

  • group_summary.txt contains the group-level summary of %methylation
  • read_summary.txt contains the per-unit level data of %methylation (each row represents one unit on one read)
  • output.png contains a violin plot of group-level %methylation
  • ordering_analysis.txt contains the results of ordering analysis
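
The question behind the ordering analysis, whether neighboring units are methylated more similarly than random units, can be framed as a permutation test. The sketch below is one possible formulation and not necessarily what the pipeline computes: it takes the per-unit methylation values of each read in their order along the read, measures the mean absolute difference between adjacent units, and compares it with the same statistic after shuffling the unit order within each read. The function names, the choice of statistic, and the within-read shuffling are all assumptions.

```python
import random


def mean_adjacent_diff(reads):
    """Mean absolute methylation difference between neighboring units."""
    diffs = [abs(a - b) for units in reads for a, b in zip(units, units[1:])]
    return sum(diffs) / len(diffs)


def ordering_permutation_test(reads, n_perm=1000, seed=0):
    """Return (observed statistic, one-sided permutation p-value).

    reads: list of lists, each holding the methylation values of the
    units on one read, in their order along the read.
    """
    rng = random.Random(seed)
    # Reads with a single unit have no neighbors and carry no signal.
    reads = [list(units) for units in reads if len(units) >= 2]
    if not reads:
        raise ValueError("need at least one read with two or more units")
    observed = mean_adjacent_diff(reads)
    hits = 0
    for _ in range(n_perm):
        shuffled = [rng.sample(units, len(units)) for units in reads]
        # Count shuffles whose neighbors are at least as similar as observed.
        if mean_adjacent_diff(shuffled) <= observed:
            hits += 1
    # Add-one correction keeps the p-value strictly above zero.
    return observed, (hits + 1) / (n_perm + 1)
```

A small p-value would indicate that adjacent units are more alike than expected if unit order within a read did not matter.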
