ont_analysis_pipeline

Overview

A lightweight SLURM-based pipeline of shell scripts for Oxford Nanopore Technologies (ONT) data processing. The repository contains a set of sbatch scripts and helper shell scripts to perform basecalling, demultiplexing, alignment, merging, bigWig generation, structural variant calling, and cleanup across one or many samples.

Key features

  • Modular steps implemented as numbered sbatch scripts (1..9) so individual stages can be run or re-run independently.
  • Designed to run on HPC clusters using SLURM and common ONT tools (Dorado, minimap2, samtools, etc.).
  • Simple submit wrappers to run the full pipeline or only per-sample stages.
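
For example, a single stage can be resubmitted on its own once its inputs exist (a minimal sketch; check the header of each script for any variables or arguments it expects):

    # Re-run only the bigWig step for a run whose merged BAMs are already in place
    sbatch 6_create_bigwig.sbatch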

Prerequisites

  • A SLURM scheduler and an HPC environment.
  • ONT tools: Dorado (or your preferred basecaller) and its demultiplexing toolchain (e.g. dorado demux, or guppy where appropriate).
  • Standard bioinformatics tools: samtools, minimap2, bedtools, deeptools (for bigWig), and any structural variant callers used in your environment.
  • Bash (>=4), basic GNU coreutils, and any site-specific modules loaded on the cluster.
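
A quick way to confirm the core tools are visible before submitting jobs (a sketch; adjust the tool list, and any module load lines, to your site):

    # Load site modules first if your cluster uses them, e.g. module load samtools minimap2
    for tool in dorado samtools minimap2 bedtools bamCoverage; do
        command -v "$tool" >/dev/null || echo "missing: $tool"
    done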

Repository layout

  • 1_copy_from_grid.sbatch - step for copying raw data from grid storage
  • 2_dorado_basecall.sbatch - basecalling (Dorado)
  • 3_dorado_demux.sbatch - demultiplexing
  • 4_samtools_merge_barcode.sbatch - merge per-barcode BAMs
  • 5_dorado_align.sbatch - alignment (dorado aligner, which uses minimap2)
  • 6_create_bigwig.sbatch - create bigWig coverage tracks
  • 6_1_create_bigwig_no_targets.sbatch - variant of bigWig creation for runs without an adaptive sampling targets BED
  • 7_cleanup.sbatch - cleanup temporary files
  • 8_build_registry.sbatch - build summary/registry of runs
  • 9_structural_variant_calling.sbatch - structural variant calling step
  • submit.sh, submit_first_steps.sh, submit_per_sample_only.sh - convenience wrappers
  • print_links.sh, dm_test.sbatch, error.sbatch - helpers and test scripts
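
The submit wrappers run the numbered stages in order as SLURM jobs. The sketch below shows the general job-chaining pattern with SLURM dependencies; it is illustrative only, not the wrappers' actual contents:

    # Submit basecalling, then hold demultiplexing until basecalling finishes successfully
    basecall_id=$(sbatch --parsable 2_dorado_basecall.sbatch)
    sbatch --dependency=afterok:"$basecall_id" 3_dorado_demux.sbatch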

Configuration and inputs

  • The scripts assume a directory layout and data locations set by environment variables or by editing the top of each sbatch script.
  • Inputs are typically raw ONT output (POD5 files), a reference FASTA and an adaptive sampling BED file, and a sample sheet or barcode map for demultiplexing.
  • Edit resource requests (time, memory, CPUs, partitions) in each sbatch script to match the cluster policy.
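
Resource requests live in the #SBATCH header at the top of each script. A representative header is shown below; the values are placeholders to adapt, not the scripts' actual defaults:

    #!/bin/bash
    #SBATCH --job-name=dorado_basecall
    #SBATCH --partition=gpu           # use your cluster's partition names
    #SBATCH --gres=gpu:1              # Dorado basecalling is typically run on a GPU
    #SBATCH --cpus-per-task=8
    #SBATCH --mem=64G
    #SBATCH --time=24:00:00
    #SBATCH --output=%x_%j.log        # %x = job name, %j = job ID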

Basic usage

  1. Prepare data on the cluster and update any path variables in the step scripts.

  2. Create a file called variables.conf which contains the following paths and parameters:

    # Directory that will contain raw POD5 data and references
    # Data is stored in <year>/<run_id>/ format
    RAW_DATA_DIR=/path/to/raw_data
    # Directory that contains processed data outputs
    # Data is stored in <year>/<run_id>/ format
    PROCESSED_DATA=/path/to/processed_data_directory
    # Directory of symlinks pointing to the reference genome directories (which also hold the adaptive sampling BED files)
    # Each reference genome directory is stored alongside the raw data in <year>/<run_id>/ format
    REFERENCE_LINKS=/path/to/reference_links
    # Directory to store dorado basecalling models
    MODELS_DIR=/path/to/models_dir
    # Per-species annotations, such as RepeatMasker and NCBI gene tracks
    ANNOTATIONS_DIR=/path/to/annotations_dir
  3. Run the pipeline from the repository root:

    • To submit the whole pipeline in order, use ./submit.sh (a fully worked example with placeholder values follows this list):
     ./submit.sh \
           <remote_data_dir> \
           <run_id> \
           <species> \
           -r <reference_directory> <reference_link_name> <adaptive_sampling_bed> \
           -o <barcode_id> <sample_id> <sample_name> <reference_fasta> \
           -o <barcode_id> <sample_id> <sample_name> <reference_fasta>
    
    • remote_data_dir: Path to the raw data directory on the sequencer (e.g. /data/raw_data)
    • run_id: date of the run or another unique identifier (yyyymmdd followed by a letter, e.g. yyyymmdda)
    • species: used to fetch the correct annotation from the ANNOTATIONS_DIR
    • -r reference information
      • reference_directory: directory name listed in the remote_data_dir
      • reference_link_name: symlink name in REFERENCE_LINKS pointing to the reference genome directory
      • adaptive_sampling_bed: bed file for adaptive sampling regions
    • -o sample specific details (repeatable for multiple samples)
      • barcode_id: barcode identifier (e.g. 01)
      • sample_id: unique sample identifier (ONT00001)
      • sample_name: human-readable sample name
      • reference_fasta: path to the reference FASTA for alignment
    • To run initial steps only: ./submit_first_steps.sh
    • To submit per-sample processing: ./submit_per_sample_only.sh
  4. Monitor SLURM jobs with squeue and check produced logs in the working directories.
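
A fully worked invocation with placeholder values (the paths, run ID, reference names, and sample details below are hypothetical and only illustrate how the positional arguments and repeatable -o blocks fit together):

    ./submit.sh \
          /data/raw_data \
          20240101a \
          human \
          -r 20240101_run_refs hg38_ref AS_targets.bed \
          -o 01 ONT00001 sample_A hg38.fa \
          -o 02 ONT00002 sample_B hg38.fa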

Outputs

  • Per-sample and merged BAMs, coverage bigWig files, structural variant VCFs/beds, and run-level registry/summary files.
  • Logs for each sbatch job are written to the working directories configured in the scripts.
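
A quick sanity check of a run's outputs (assuming the <year>/<run_id> layout used for PROCESSED_DATA; the run ID below is hypothetical and the exact subdirectory layout depends on the scripts' configuration):

    run=2024/20240101a                                    # <year>/<run_id>
    ls "$PROCESSED_DATA/$run"                             # BAMs, bigWigs, VCFs, registry files
    samtools quickcheck "$PROCESSED_DATA/$run"/*.bam && echo "BAM files pass quickcheck"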

Extending or customizing

  • To add new processing steps, follow the numeric naming convention (e.g., 10_new_step.sbatch) and update the submit wrapper if needed.
  • Replace tools or change arguments inside the sbatch scripts; keep resource requests and IO patterns consistent with cluster norms.
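
A skeleton for a new step that follows the existing conventions (a sketch: the #SBATCH values are placeholders, and it assumes shared paths are read from variables.conf as described in the configuration section):

    #!/bin/bash
    #SBATCH --job-name=10_new_step
    #SBATCH --cpus-per-task=4
    #SBATCH --mem=16G
    #SBATCH --time=04:00:00
    set -euo pipefail

    # Shared paths (RAW_DATA_DIR, PROCESSED_DATA, REFERENCE_LINKS, ...) from the config file
    source variables.conf

    # ... new processing commands go here ...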

Troubleshooting

  • Check SLURM job logs and stdout/stderr files for errors.
  • Ensure required tools are available in your PATH or loaded as modules in the sbatch script header.
  • For permission or IO errors, verify cluster storage mounts and user access.
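
Typical commands for checking job state and locating failures (standard SLURM and GNU tooling; log file names and locations depend on each script's #SBATCH --output setting):

    squeue -u "$USER"                                                # jobs still queued or running
    sacct -u "$USER" --format=JobID,JobName,State,Elapsed,ExitCode   # recent jobs and exit codes
    grep -ril "error" .                                              # scan logs in the working directory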

Contact / Further help

For questions about this pipeline, open an issue on the repository. Adapt the scripts to local conventions before production use.
