Strand-seq pipeline

Ongoing work

Preliminary SV calling using Strand-seq data - summarized in a Snakemake pipeline.

Update: We just switched to a re-implementation of the SV classification, which is still in the test phase.

Bioconda environment

To install the correct environment, you can use Bioconda.

* Install Miniconda: if you do not have Conda yet, it is easiest to install Miniconda.
* Create the environment:

```
conda env create -n strandseqnation -f conda-environment.yml
source activate strandseqnation
```

That's it, you are ready to go.

How to use it

Install required software:

* Install [mosaicatcher](https://github.com/friendsofstrandseq/mosaicatcher) (*currently you will need the `develop` branch*)
* Get the R-scripts from [strandsequtils](https://github.com/friendsofstrandseq/strandsequtils)
* Install BSgenome.Hsapiens.UCSC.hg38 (can be skipped if you use the Bioconda environment, see above):
  ```
  source("https://bioconductor.org/biocLite.R")
  biocLite('BSgenome.Hsapiens.UCSC.hg38')
  ```
* [StrandPhaseR](https://github.com/daewoooo/StrandPhaseR) is installed automatically

Set up the configuration of the Snakemake pipeline

* Open `Snake.config.json` and specify the paths to the executables
  (such as Mosaicatcher) and to the R scripts.
* Create a subdirectory `bam/` and, inside it, one subdirectory per sample (e.g.
  `bam/NA12878/`). **Multiple samples can be run together now**.
  Then copy (or soft-link) the Strand-seq single-cell libraries (one BAM
  file per cell) into the sample subdirectory. Note that BAM files need to be
  sorted and indexed, must contain a read group, and should have duplicates marked.
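
Before starting a run, it can help to check the input layout described above. The following is a minimal sketch, not part of the pipeline: the helper name `find_unindexed_bams` is hypothetical, and it assumes that each BAM index sits next to its BAM file as `cell.bam.bai` or `cell.bai`. It only reports missing index files; it does not verify sorting, read groups, or duplicate marking.

```python
from pathlib import Path


def find_unindexed_bams(bam_root="bam"):
    """Return {sample: [BAM file names lacking an index]} for bam/<sample>/*.bam.

    Hypothetical helper, not part of the pipeline. It only looks for an
    index file next to each BAM (cell.bam.bai or cell.bai); it does not
    verify sorting, read groups, or duplicate marking.
    """
    missing = {}
    root = Path(bam_root)
    if not root.is_dir():
        return missing
    # One subdirectory per sample, e.g. bam/NA12878/
    for sample_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        unindexed = []
        for bam in sorted(sample_dir.glob("*.bam")):
            has_index = (
                Path(str(bam) + ".bai").exists()
                or bam.with_suffix(".bai").exists()
            )
            if not has_index:
                unindexed.append(bam.name)
        missing[sample_dir.name] = unindexed
    return missing


if __name__ == "__main__":
    for sample, bams in find_unindexed_bams().items():
        status = "ok" if not bams else "missing index: " + ", ".join(bams)
        print(f"{sample}: {status}")
```

Run from the pipeline directory, the script prints one line per sample subdirectory, listing any single-cell libraries that still need to be indexed.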

Run Snakemake

* Run `snakemake` to compute all tasks locally.
* Alternatively, you can ask Snakemake to submit your jobs to an HPC cluster. To this end, edit the `Snake.cluster.json` file according to your available HPC environment and call:

  ```
  snakemake -j 100 \
    --cluster-config Snake.cluster.json \
    --cluster "???"
  ```

SNV calls

The pipeline will run simple SNV calling using samtools and bcftools. If you already have SNV calls, you can skip this step by providing your VCF files to the pipeline. To do so, make sure the files are tabix-indexed and specify them inside the `Snake.config.json` file:

    "snv_calls" : {
        "NA12878" : "path/to/snp/calls.vcf.gz"
    },
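
To catch a missing file or index before the pipeline starts, the `snv_calls` entries can be checked with a short script. This is a minimal sketch under stated assumptions, not part of the pipeline: the helper name `check_snv_calls` is hypothetical, it assumes the tabix index sits next to each VCF as `<file>.tbi` (or `<file>.csi`), and it resolves relative paths against the current working directory.

```python
import json
from pathlib import Path


def check_snv_calls(config_path="Snake.config.json"):
    """Return a list of problems found in the "snv_calls" entries.

    Hypothetical helper, not part of the pipeline. Assumes the tabix
    index sits next to the VCF as <file>.tbi (or <file>.csi).
    """
    with open(config_path) as handle:
        config = json.load(handle)
    problems = []
    # "snv_calls" maps sample names to VCF paths; it may be absent.
    for sample, vcf in config.get("snv_calls", {}).items():
        if not Path(vcf).is_file():
            problems.append(f"{sample}: VCF not found: {vcf}")
            continue
        if not any(Path(vcf + ext).is_file() for ext in (".tbi", ".csi")):
            problems.append(f"{sample}: no tabix index next to {vcf}")
    return problems


if __name__ == "__main__":
    if Path("Snake.config.json").is_file():
        for problem in check_snv_calls():
            print(problem)
```

An empty result means every listed VCF exists and has an index file beside it; it does not confirm that the index is up to date.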
