Single Cell RNA Intron-Exon Counting
scrinvex
counts intronic, exonic, and junction-spanning reads for each unique barcode encountered in the input bam.
Each mapped read is checked against the input gtf to determine if the read lies entirely on introns, exons, or crosses at least one intron/exon junction.
Reads with the same UMI are only checked against any given gene once. Subsequent reads with the same UMI will not be checked against any gene that the first read intersected.
git clone --recursive git@github.com:getzlab/scrinvex.git
- If you do not use
--recursive
you will be missing dependencies - Fix with
git submodule update --init --recursive
- If you do not use
make
scR-Invex is available via gcr.io/broad-cga-aarong-gtex/scrinvex
scrinvex {gtf} {bam} [-h] [-b {barcode file}] [-q {mapping quality}] [-o {output filename}] [-s [{summary filename}]]
scR-Invex requires that the input GTF be collapsed in such a way that each gene possesses only one transcript, and that there are no overlapping features on the same strand. You can collapse existing GTFs using the GTEx collapse annotation script
scR-Invex can read SAM, BAM, and, if the genome exists in your htscache, CRAM formats. The input sequence file does not need to be indexed, but scR-Invex requires that the file be sorted.
scR-Invex can read a barcodes.tsv
file, which is produced by cellranger by default.
The file should be a list of barcodes with one barcode on each line.
Only reads with a barcode listed in the file will be considered.
scrinvex
produces 1 file, which defaults to {sample name}.scrinvex.tsv
This file contains 7 columns:
gene_id
barcode
- Counts of intron, junction, or exon reads for that gene id - barcode combination, respectively
- Counts of sense and antisense reads for that gene id - barcode combination, respectively