Plasma WGS Fragmentomics

sprokopec edited this page Oct 6, 2023 · 4 revisions

Overview

How to run on H4H:

module load perl

perl /path/to/pughlab_fragmentomics_pipeline.pl \
-t /path/to/fragmentomics_pipeline_config.yaml \
-d /path/to/fragmentomics_data_config.yaml \
-o /path/to/output/directory \
-c slurm

Optional flags (append to the command above):

--dry-run    if this is a dry run
--no-wait    if this is not a dry run and you don't want to wait around for it to finish
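One way to use the two optional flags is to check a dry run first and then submit for real with --no-wait. The sketch below only prints the command line for a given mode; it does not run anything. The function name fragmentomics_cmd and its mode argument ("dry", "nowait", or nothing) are hypothetical, and the paths are the same placeholders used in the command above. Whether the two flags can be combined is not stated here, so the sketch adds at most one.

```shell
# Hypothetical helper: prints the pipeline command for a given mode.
# The mode names are illustrative only; they are not pipeline options.
fragmentomics_cmd() {
    mode=${1:-}
    case "$mode" in
        dry)    extra=" --dry-run" ;;   # dry run
        nowait) extra=" --no-wait" ;;   # real run, do not wait for completion
        *)      extra="" ;;             # real run, wait for completion
    esac
    printf '%s\n' "perl /path/to/pughlab_fragmentomics_pipeline.pl -t /path/to/fragmentomics_pipeline_config.yaml -d /path/to/fragmentomics_data_config.yaml -o /path/to/output/directory -c slurm${extra}"
}
```

Calling fragmentomics_cmd dry prints the command ending in --dry-run, and fragmentomics_cmd nowait prints the same command ending in --no-wait. The printed line can be pasted into the shell once the placeholder paths are filled in.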

Directory Structure

PROJECT
├── logs
├── PATIENT1
│   ├── SAMPLE1
│   │   ├── breakpoints
│   │   ├── dinucleotide
│   │   ├── downsample
│   │   ├── end_motifs
│   │   ├── fragment_ratio
│   │   ├── fragment_score
│   │   ├── griffin
│   │   ├── insert_size
│   │   └── nucleosome_peaks
│   └── SAMPLE2
│       ├── breakpoints
│       ├── dinucleotide
│       ├── downsample
│       ├── end_motifs
│       ├── fragment_ratio
│       ├── fragment_score
│       ├── griffin
│       ├── insert_size
│       └── nucleosome_peaks
└── PATIENT2
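After a run, it can be useful to confirm that a sample directory contains the subdirectories shown in the tree above. The following is a minimal sketch, assuming the PROJECT/PATIENT/SAMPLE layout from the tree; the function name check_sample_dirs is hypothetical and is not part of the pipeline.

```shell
# Hypothetical helper: reports expected per-sample subdirectories that are missing.
# Prints one "missing: PATH" line per absent directory and returns non-zero if any are absent.
check_sample_dirs() {
    sample_dir=$1
    missing=0
    for sub in breakpoints dinucleotide downsample end_motifs \
               fragment_ratio fragment_score griffin insert_size nucleosome_peaks; do
        if [ ! -d "$sample_dir/$sub" ]; then
            printf 'missing: %s\n' "$sample_dir/$sub"
            missing=1
        fi
    done
    return "$missing"
}
```

For example, check_sample_dirs /path/to/output/directory/PATIENT1/SAMPLE1 prints nothing and returns 0 when all nine subdirectories are present.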

Final outputs

For each SAMPLE:

  • downsample
    • downsampled BAM + BAI
    • BEDPE files for all reads or for q30 reads
  • breakpoints
    • per-nucleotide counts/frequency/ratio for each position (breakpoint +/- 15 bases)
  • dinucleotide
    • per-dinucleotide counts (raw) and frequency (contexts) for each position along the read
  • end_motifs
    • per-motif counts and frequencies
  • fragment_ratio
    • coverage-adjusted, GC-corrected ratio of short (90-150bp) to normal-length (151-220bp) reads in each 5Mb bin
  • fragment_score
    • per-sample fragment score
  • insert_size
    • Picard insert-size metrics
  • nucleosome_peaks
    • per-chromosome peak distances (peak +/- 1000 bases)
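The fragment_ratio output compares short (90-150bp) with normal (151-220bp) fragments. To make the size classes concrete, the sketch below computes only the raw short-to-normal ratio from a stream of fragment lengths, one per line. It does not do the per-5Mb binning, coverage adjustment or GC correction that the pipeline output includes. The function name, the input format and the inclusive range boundaries are assumptions for illustration.

```shell
# Hypothetical helper: reads fragment lengths (one integer per line) on stdin
# and prints the raw ratio of short (90-150) to normal (151-220) fragments.
# Prints "NA" when there are no normal-length fragments.
short_to_normal_ratio() {
    awk '
        $1 >= 90  && $1 <= 150 { n_short++ }
        $1 >= 151 && $1 <= 220 { n_normal++ }
        END {
            if (n_normal > 0) printf "%.4f\n", n_short / n_normal
            else print "NA"
        }
    '
}
```

For example, printf '100\n120\n160\n200\n180\n300\n' | short_to_normal_ratio counts 2 short and 3 normal fragments (300 falls in neither class) and prints 0.6667.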

Combined:

  • date_PROJECT_breakpoint_frequencies.tsv
  • date_PROJECT_dinucleotide_frequencies.tsv
  • date_PROJECT_endmotif_frequencies.tsv
  • date_PROJECT_fragment_scores.tsv
  • date_PROJECT_insert_size_summary.tsv
  • date_PROJECT_nucleosome_peak_distances.tsv
  • date_PROJECT_per5Mb_fragment_ratios.tsv
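The combined files share the date_PROJECT_ prefix, so they can be collected with a glob. The sketch below lists whichever of the seven combined files exist in a directory for a given project name. This page does not say where the combined files are written or how the date is formatted, so the directory is passed in as an argument and the date is matched with a wildcard; the function name list_combined_outputs is hypothetical.

```shell
# Hypothetical helper: lists the combined output files found in a directory.
# $1 = directory to search (assumption: the location is not documented here)
# $2 = project name as it appears in the file names
list_combined_outputs() {
    dir=$1
    project=$2
    for suffix in breakpoint_frequencies dinucleotide_frequencies endmotif_frequencies \
                  fragment_scores insert_size_summary nucleosome_peak_distances \
                  per5Mb_fragment_ratios; do
        # The leading * stands in for the date, whose format is not assumed.
        for f in "$dir"/*_"${project}_${suffix}.tsv"; do
            if [ -e "$f" ]; then
                printf '%s\n' "$f"
            fi
        done
    done
    return 0
}
```

Because the date is matched with a wildcard, a project name that is the tail of another project's name would also match, so it is worth checking the listed paths before loading them.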