
Processing of raw Strand-seq data as part of the SV calling pipeline


Hufsah-Ashraf/mosaicatcher

 
 


Processing Strand-seq data

This software is part of a larger pipeline to call structural variants in single-cell Strand-seq data.

For optimal integration with pipeline version 1.0, please use version 0.3.1-dev.

Installation

Mosaicatcher can be built with CMake (v3.0) on Linux and macOS.

It relies on two external dependencies:

  • Boost libraries >= 1.50. These need to be installed on your system.
  • HTSlib >= 1.3.1. CMake should be able to install this for you.

git clone https://github.com/friendsofstrandseq/mosaicatcher.git
cd mosaicatcher
mkdir build
cd build
cmake ../src
make
./mosaic --version

Strand-seq read counting and plotting

Mosaicatcher counts Strand-seq reads and classifies strand states of each chromosome in each cell using a Hidden Markov Model.
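
To give a feel for what a "strand state" is, here is a deliberately naive sketch that labels a region as WW, WC, or CC from its Watson and Crick read counts. This is not the Hidden Markov Model used by Mosaicatcher: it is a simple threshold on the Watson fraction, and the function name `naive_strand_state` and the cutoffs `low` and `high` are assumptions chosen for illustration only.

```python
def naive_strand_state(watson, crick, low=0.2, high=0.8):
    """Label a region by its fraction of Watson reads.

    Illustrative threshold rule only; the cutoffs are hypothetical
    and this is not the HMM that Mosaicatcher implements.
    """
    total = watson + crick
    if total == 0:
        return None  # no reads, no call
    watson_fraction = watson / total
    if watson_fraction >= high:
        return "WW"  # almost all reads on the Watson strand
    if watson_fraction <= low:
        return "CC"  # almost all reads on the Crick strand
    return "WC"      # a mixture of both strands


for w, c in [(95, 5), (52, 48), (3, 97)]:
    print(w, c, naive_strand_state(w, c))
```

A per-bin threshold like this ignores neighbouring bins; an HMM instead models the sequence of bins along the chromosome jointly.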

Choose between bins of fixed width (-w) or predefined bins (-b). Here is an example for bins with a fixed width of 200 kb:

./build/mosaic count \
    -o counts.txt.gz \
    -i counts.info \
    -x data/exclude/GRCh38_full_analysis_set_plus_decoy_hla.exclude \
    -w 200000 \
    cell1.bam cell2.bam [...]
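
As a rough picture of what fixed-width binning means, the sketch below tiles a chromosome into half-open intervals of a given width. It is an illustration only: the function names are hypothetical, and truncating the last bin at the chromosome end is an assumption, since the text above does not say how Mosaicatcher treats the final partial bin.

```python
def fixed_width_bins(chrom_length, width):
    """Return half-open (start, end) intervals that tile [0, chrom_length).

    Assumption: the last bin is cut off at the chromosome end.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    return [(start, min(start + width, chrom_length))
            for start in range(0, chrom_length, width)]


def bin_index(position, width):
    """0-based index of the fixed-width bin holding a 0-based position."""
    return position // width


bins = fixed_width_bins(1_000_000, 200_000)
print(len(bins), bins[0], bins[-1])  # prints: 5 (0, 200000) (800000, 1000000)
```

With a width of 200000, as in the command above, a read starting at 0-based position 399999 falls into bin 1 and a read at position 400000 into bin 2.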

To generate QC plots from the count table (counts.txt.gz) and the info file (counts.info), run:

Rscript R/qc.R \
    counts.txt.gz \
    counts.info \
    counts.pdf

Data input

  • Sequencing reads should be supplied in exactly one BAM file per cell.
  • Each BAM file must contain a single read group (@RG). Cells that share the same SM tag are grouped into one sample.
  • BAM files must be sorted and indexed.
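
The requirements above can be checked before running the counting step. The following is a minimal sketch that inspects a BAM header supplied as text (for example the output of `samtools view -H cell1.bam`). The function names are hypothetical, and the check assumes that "sorted" means coordinate-sorted, i.e. `SO:coordinate` in the @HD line. It does not test for the presence of an index file.

```python
def parse_tags(header_line):
    """Turn a tab-separated SAM header line into a {tag: value} dict."""
    fields = header_line.split("\t")[1:]
    return dict(f.split(":", 1) for f in fields if ":" in f)


def check_sam_header(header_text):
    """Return a list of problems found in a SAM/BAM header given as text."""
    problems = []
    lines = header_text.splitlines()

    # Exactly one read group, carrying a sample name (SM).
    rg_lines = [line for line in lines if line.startswith("@RG")]
    if len(rg_lines) != 1:
        problems.append(f"expected exactly one @RG line, found {len(rg_lines)}")
    for line in rg_lines:
        if "SM" not in parse_tags(line):
            problems.append("@RG line has no SM tag")

    # Assumption: 'sorted' means coordinate-sorted.
    hd_lines = [line for line in lines if line.startswith("@HD")]
    sort_order = parse_tags(hd_lines[0]).get("SO") if hd_lines else None
    if sort_order != "coordinate":
        problems.append(f"sort order is {sort_order!r}, expected 'coordinate'")
    return problems


header = (
    "@HD\tVN:1.6\tSO:coordinate\n"
    "@SQ\tSN:chr1\tLN:248956422\n"
    "@RG\tID:cell1\tSM:sampleA\n"
)
print(check_sam_header(header))  # prints: []
```

An empty list means the header passes these checks; each string in a non-empty list names one problem to fix before counting.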

Strand-seq simulations

Simulate Strand-seq data and SVs at the level of binned counts. You need to specify an SV config file, such as the example in data/simulation/example.txt.

Then run:

./build/mosaic simulate \
    -o counts.txt.gz \
    svconfig.txt
Rscript R/qc.R counts.txt.gz counts.pdf

References

For information on Strand-seq, see:

Falconer E et al., 2012 (doi: 10.1038/nmeth.2206)
