An Integrative Pipeline for Splicing Analyses (IPSA) written in the Nextflow DSL.
The pipeline allows to perform the following splicing analyses:
- Quantification of splice junctions and splice sites
- Calculation of exon-centric and intron-centric splicing metrics
- Identification of micro-exons
Install nextflow with the following command:
curl -fsSL get.nextflow.io | bash
Pull the docker image:
docker pull guigolab/ipsa-nf@sha256:29750072f2b42ee8ea094a331f53bc5906183591a71ed26619ea76b12b6be3ed
Launch the test pipeline with the following command:
./nextflow run guigolab/ipsa-nf
Launching the pipeline with the --help
parameter shows the help message:
nextflow run ipsa-nf --help
N E X T F L O W ~ version 0.24.4
Launching `guigolab/ipsa-nf` [dreamy_aryabhata] - revision: v4.0
I P S A ~ Integrative Pipeline for Splicing Analyses
----------------------------------------------------
Run IPSA on a set of data.
Usage:
ipsa-nf [options]
Options:
--index INDEX_FILE the index file in TSV format
--genome GENOME_FILE the genome file in FASTA format
--annot ANNOTATION_FILE the annotation file in gtf format
--merge MERGE prefix for merged output files (default: all)
--dir DIRECTORY the output directory
--sjcount-params PARAMS additional `sjcount` paramters
--margin MARGIN margin for aggregate (default: 5)
--mincount MIN_COUNT minimum number of counts for denominators when calculationg fractions (default: 10)
--deltaSS DELTA distance threshold for splice sites (default: 10)
--entropy ENTROPY entropy lower threshold (default: 1.5)
--status STATUS annotation status lower threshold (default: 0)
--microexons include microexons, default=false
IPSA-nf takes as input a tab separated file containing information about the input data. The file must be specified using the --index
parameter. The format of the index file is as follows:
- sample identifier
- path to the bam file to be processed
- library type:
Single-End
Paired-End
- strandnesss of the data:
NONE
for unstrandedSENSE
/ANTISENSE
forSingle-End
strandedMATE1_SENSE
/MATE2_SENSE
forPaired-End
stranded
Here is an index file example:
E14_rep1 E14AlnRep1.sub.bam Paired-End MATE2_SENSE
E14_rep2 E14AlnRep2.sub.bam Paired-End MATE2_SENSE
E18_rep1 E18AlnRep1.sub.bam Paired-End MATE2_SENSE
E18_rep2 E18AlnRep2.sub.bam Paired-End MATE2_SENSE
Analyses results are saved into the folder specified with the --dir
parameter. By default it is the data
directory within the current working folder.
Output files are organinzed into folders corresponding to the different pipeline endpoints:
A01
- splice junctions and splice sites countsA02
- aggregated splice junctions and splice sites countsA03
- aggregated junctions with annotation status and splice site nucleotidesA04
- aggregated junctions with selected strand and constrained splice sitesA06
- aggregated counts filtered by annotation status and entropyA07
- splicing indicesE06
- BED files with splicing indices from A06
And if --microexons
is used:
D01
- aggregated 2-splitsD02
- aggregated constrained 2-splitsD06
- extracted microexons from constrained 2-splits
IPSA-nf is configured to run using the Docker container engine by default. See the included Dockerfile for the configuration details.
In order to run the pipeline with Doecker the following dependencies have to be met:
The pipeline can also be used without Docker by installing the following software components on your system (natively or by using Environemnt Modules):