rnabam2cov - Library-type aware generation of strand-specific per-base coverage tracks from RNA-seq BAM files

A convenient wrapper around bedtools genomecov to compute library-type aware, strand-specific per-base coverage from RNA-seq BAM files.

Installation

Prerequisites

Conda or Mamba

Installation Instructions

# Clone the repository
git clone https://github.com/SamBryce-Smith/rnabam2cov.git
cd rnabam2cov

For regular use, use the rnabam2cov.yaml environment file:

# Create and activate the environment using mamba
mamba env create -f rnabam2cov.yaml
mamba activate rnabam2cov

# Verify installation
rnabam2cov --help

For development, please use the rnabam2cov_dev.yaml file (for editable install and development dependencies):

# Create and activate the development environment
mamba env create -f rnabam2cov_dev.yaml
mamba activate rnabam2cov_dev

It is also possible to install via pip install / uv pip install directly from github, although this is not recommended because it will not install bedtools:

# for installing at specific commit/release tag, suffix the URL with @<commit hash/tag>
pip install git+https://github.com/SamBryce-Smith/rnabam2cov
# verify bedtools is installed
python -c "import rnabam2cov, pybedtools; a = pybedtools.example_bedtool('a.bed'); b = pybedtools.example_bedtool('b.bed'); print(a.intersect(b))"
chr1    155     200     feature2        0       +
chr1    155     200     feature3        0       -
chr1    900     901     feature4        0       +

Usage

Command Line Usage

To generate strand-specific RNA-seq coverage files (in BedGraph format) from a BAM file:

rnabam2cov -i tests/data/example.forward.bam --libtype forward -o output/coverage

This will produce two BedGraph files (one per strand) - output/coverage.plus.bedgraph and output/coverage.minus.bedgraph

To generate coverage for only for a single specific strand, set the --strand argument with + or -. For example, to just generate coverage on the plus strand:

rnabam2cov -i tests/data/example.forward.bam --libtype forward -o output/coverage --strand +

This will produce only one file: output/coverage.plus.bedgraph

the --libtype argument allows you to specify between the two strand-specific library types used for strand-specific RNA-seq:

forward (FR): The aligned strand of the first mate (leftmost) in a read pair corresponds to the transcribed strand of origin. Common with ligation-based methods
reverse (RF): The aligned strand of the second mate (rightmost) in a read pair corresponds to the transcribed strand of origin. Common with dUTP-based methods (most Illumina RNA-seq kits)

This tool does not check that the provided libtype is corroborated in the input BAM file. If you don't know the library type, there are tools to infer it from FASTQs (e.g. how_are_we_stranded_here) or BAM files (e.g. RSeQC's infer_experiment.py). For more details, I recommend the Griffith lab's guide.

By default, the following bedtools genomecov options are enabled to match preferences for RNA-seq data:

--split: Read the CIGAR string to compute splice/gap-aware coverage
--du: Change strand of mate read so both reads contribute to coverage on the same strand
--bg: Report coverage in BedGraph format (non-zero positions only)

These and all other bedtools genomecov arguments (as of v2.31.0) can be toggled at the command line (see the help message):

# any of the following options work
rnabam2cov --help
rnabam2cov -h
rnabam2cov

Python API Usage

You can also use rnabam2cov directly in your Python code:

from rnabam2cov.cli import rnabam2cov

# Generate coverage files for both strands (default behavior)
output_files = rnabam2cov(
    bam_path="tests/data/example.forward.bam",
    libtype="forward",
    output_prefix="output/coverage"
)

# Print the paths to the generated files
print(output_files)
# ['output/coverage.plus.bedgraph', 'output/coverage.minus.bedgraph']

# Generate coverage for only the plus strand
plus_only = rnabam2cov(
    bam_path="tests/data/example.forward.bam",
    libtype="forward",
    output_prefix="output/coverage_plus_only",
    strands=["+"]
)

print(plus_only)
# ['output/coverage_plus_only.plus.bedgraph']

Notes

It appears that bedtools genomecov includes secondary alignments in the computed coverage (GitHub issue 1061), although it's unclear exactly how the counting is done. If you want to compute coverage of primary alignments only, you will need to prefilter the BAM file.
The -pc and -split flag are currently incompatible - cigar strings (i.e. splicing) is ignored when the -pc flag is passed (I reproduced this, but initially reported in GitHub issue 516). Although I want to double-check, I expect this means that positions in fragments that are covered by both mates will be double-counted (i.e. coverage is per-read, not per-fragment). I do not see an obvious way to get around this without a non-bedtools implementation.
This package calls bedtools genomecov via pybedtools - this will mean use of temporary files (by default under /tmp). pybedtools provides a way to set the temporary directory for the session, but I haven't exposed this to the CLI yet.

Name		Name	Last commit message	Last commit date
Latest commit History 19 Commits
.github/workflows		.github/workflows
src/rnabam2cov		src/rnabam2cov
tests		tests
.gitignore		.gitignore
.python-version		.python-version
LICENSE		LICENSE
README.md		README.md
pyproject.toml		pyproject.toml
rnabam2cov.yaml		rnabam2cov.yaml
rnabam2cov_dev.yaml		rnabam2cov_dev.yaml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

rnabam2cov - Library-type aware generation of strand-specific per-base coverage tracks from RNA-seq BAM files

Contents

Installation

Prerequisites

Installation Instructions

Usage

Command Line Usage

Python API Usage

Notes

About

Uh oh!

Releases 1

Packages

Uh oh!

Languages

License

SamBryce-Smith/rnabam2cov

Folders and files

Latest commit

History

Repository files navigation

rnabam2cov - Library-type aware generation of strand-specific per-base coverage tracks from RNA-seq BAM files

Contents

Installation

Prerequisites

Installation Instructions

Usage

Command Line Usage

Python API Usage

Notes

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases 1

Packages 0

Uh oh!

Languages

Packages