Skip to content

Application to find local anomalies in read coverage and to predict putative SV events

License

Notifications You must be signed in to change notification settings

rki-mf1/CoverageAnomalyScanner

Repository files navigation

CAS - The Coverage Anomaly Scanner

C/C++ CI Version install with bioconda

Tool to find, interpret and convert unexpected changes in read coverage in genomic sequence data

Table of Contents

  1. Installation
    1.1 Dependencies
    1.2 Install with bioconda
    1.3 Build from source
  2. Program execution
    2.1 Running the CoverageAnomalyScanner
    2.2 Output
    2.3 Visualizing CAS' metadata
    2.4 Help

Installation

Dependencies

cas was implemented with a 64-bit Linux environment in mind. Using other operating systems might break its functionality and are not maintained. The easiest way to install cas is with bioconda. If you like to build cas from source then follow the description in the Build from source section. The cas repository comes with git submodules by default, i.e. there is no need to install direct dependencies manually. However, in order to build htslib from source make sure you have libbz2 and liblzma available on your system, e.g. with root privileges on ubuntu linux use sudo apt install libbz2-dev lzma. To build the cas binary from source you need a C++ compiler that supports the C++17 standard (tested with gcc v9.4.0, but should require only gcc >=v8.3.0).

Install with bioconda

conda install -c bioconda coverageanomalyscanner

Build from source

git clone https://github.com/rki-mf1/CoverageAnomalyScanner.git --recursive
cd CoverageAnomalyScanner 
make -C htslib
mkdir build
make

If the steps above compile successfully there should be a binary cas in your working directory.

[Advanced build] If you already have a clone of one or multiple submodules available and don't want to download them again then you can omit the --recursive flag when cloning the github repository. In this case make sure to adjust the paths in the CXXFLAGS and LDLIBS variables within the Makefile. E.g. for htslib you have to manually set the path to its archive (libhts.a).

Program execution

Running the CoverageAnomalyScanner

The cas program has two intended ways of starting a scan (see examples below). The first example uses a BAM file and three separate input parameters to specify a window of the genome to be scanned. The --chr parameter is the zero-based chromosome index from the mandatory BAM file. If the index of the desired chromosome is unclear use samtools view -H on the BAM file to determine the index from the @SQ fields.

cas --bam FILE --chr 0 --start 400 --end 900

The second way of starting the program uses a BAM file and the input format of a SAMTOOLS-like genomic range. Here, the chromosome name itself can be used together with a range.

cas --bam FILE --range chr1:400-900

Output

A run of cas prints a report of the predicted positions with a read coverage anomaly inside the given genomic window. Moreover, a VCF file predictedEvents.vcf is generated for all pairs of predicted positions where the increase and decrease in read coverage follows the (nested) parenthesis rules. I.e. an increase-decrease pair is interpreted as duplication event and a decrese-increase pair is interpreted as deletion event (deletions are WIP).

Visualizing CAS' metadata

The GitHub repository provides a subfolder util with a R script to visualize the metadata that cas is generating to determine coverage anomalies. The metadata itself is only written to disc if the program is (re)compiled with

make print

instead of only make as described in the Build from source section. When using the printing program binary of cas a program execution generates a coverage.csv (CSV) in the current working directory. The R script inside the util subfolder reads the CSV file and generates plots which visualize the metadata. The most convenient method to generate the plots is to use R from the command line

chmod +x <path/to/util>/covplot.R
Rscript <path/to/util>/covplot.R <path/to>/coverage.csv

The commands above generate a Rplots.pdf file with plots. If there is no Rscript command line application available in your working environment the plots can be manually generated in R via

coverage <- read.delim("coverage.csv", header=FALSE, row.names=1)

and an execution of the VARIABLES and PLOTS sections of the covplot.R.

Help

Usage: cas [-h] --bam FILE [--chr INT] [--start INT] [--end INT] [--range VAR] [--threshold FLOAT]

CAS - The Coverage Anomaly Scanner

Optional arguments:
  -h, --help            shows help message and exits 
  -v, --version         prints version information and exits 
  -b, --bam FILE        Read alignment in BAM file format. [required]
  -c, --chr INT         A 0-based chromosome index from the BAM file. 
  -s, --start INT       Start position in the chromosome. 
  -e, --end INT         End position in the chromosome. 
  -r, --range           Genomic range in SAMTOOLS style, e.g. chr:beg-end 
  --threshold FLOAT     Constant threshold for the anomaly detection. Overwrites the internal default formula. 

by T. Krannich (2022)

BACK TO TOP

About

Application to find local anomalies in read coverage and to predict putative SV events

Resources

License

Stars

Watchers

Forks

Packages

No packages published