This is a set of bash scripts I wrote as a wrapper for STARsolo, to simplify its setup and to run it with a set of parameters that try to replicate the output of 10x Genomics Cell Ranger.
The scripts are located in the directory ./bin/starsolo (see documentation):
starsolo-setup-linux-x86_64.sh
: downloads the STAR executable and the barcode whitelists from 10x Genomics.
starsolo-gen-idx-danio-rerio.sh
: downloads the GRCz11 reference genome for the species Danio rerio from Ensembl and generates the STAR genome index.
run-starsolo.sh
: runs STARsolo with a set of preset parameters.
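As a rough sketch of what a Cell Ranger-like preset can look like (this is not the actual content of run-starsolo.sh), the bash function below assembles a STARsolo command using the options that the STARsolo documentation lists for matching recent Cell Ranger versions. Assumptions not taken from these scripts: 10x Chromium v3 chemistry (16 bp cell barcode plus 12 bp UMI in read 1), the hypothetical function name build_starsolo_cmd, the placeholder paths, and the whitelist file name 3M-february-2018.txt. The function only prints the command, so it can be inspected before running it; some of these options require a recent STAR release.

```bash
#!/usr/bin/env bash
# Sketch only: prints (does not run) a STARsolo command with Cell Ranger-like
# settings from the STARsolo documentation. Assumes 10x Chromium v3 chemistry
# (16 bp barcode + 12 bp UMI in read 1). All paths are hypothetical placeholders.
set -euo pipefail

# build_starsolo_cmd GENOME_DIR WHITELIST CDNA_FASTQ BARCODE_FASTQ [THREADS]
build_starsolo_cmd() {
  local genome_dir="$1" whitelist="$2" cdna="$3" barcode="$4" threads="${5:-8}"
  # STARsolo expects the cDNA read first and the barcode+UMI read second.
  # The last block of options is the Cell Ranger-matching set from the STARsolo docs.
  local -a cmd=(
    STAR --runThreadN "$threads" --genomeDir "$genome_dir"
    --readFilesIn "$cdna" "$barcode"
    --soloType CB_UMI_Simple --soloCBwhitelist "$whitelist"
    --soloCBstart 1 --soloCBlen 16 --soloUMIstart 17 --soloUMIlen 12
    --soloFeatures Gene
    --clipAdapterType CellRanger4 --outFilterScoreMin 30
    --soloCBmatchWLtype 1MM_multi_Nbase_pseudocounts
    --soloUMIfiltering MultiGeneUMI_CR --soloUMIdedup 1MM_CR
    --soloCellFilter EmptyDrops_CR
  )
  # Gzipped FASTQ files have to be decompressed on the fly.
  if [[ "$cdna" == *.gz ]]; then
    cmd+=(--readFilesCommand zcat)
  fi
  # Print the command with shell-safe quoting.
  printf '%q ' "${cmd[@]}"
  printf '\n'
}

build_starsolo_cmd star_index 3M-february-2018.txt \
  sample_S1_L001_R2_001.fastq.gz sample_S1_L001_R1_001.fastq.gz 8
```

Printing instead of executing keeps the sketch safe to try without STAR installed; piping the printed line to bash would run it.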
Because I needed some experimental data to test the STARsolo scripts, I also wrote a couple of scripts to obtain FASTQ files from the SRA repository. These download the raw data files from the SRA repository and extract the original FASTQ files using the fastq-dump tool from the SRA Toolkit.
The obtained FASTQ files should be equivalent to the FASTQ files that cellranger count takes as input.
The scripts are located in the directory ./bin/sra (see documentation):
get-fastq-dump.sh
: downloads the fastq-dump tool from the SRA Toolkit.
sra-to-cellranger-count.sh
: downloads the raw data from the SRA repository for the given ID(s) and extracts the original FASTQ files.
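A hedged sketch of the extraction step, for illustration only (it is not the code of sra-to-cellranger-count.sh): the function below prints the fastq-dump call for one run and the renames to the <sample>_S1_L001_<read>_001.fastq.gz naming pattern that cellranger count looks for. The function name sra_to_10x_cmds, the accession SRR0000000 and the sample name are hypothetical, and the mapping assumes read 1 holds the barcode+UMI and read 2 the cDNA, which depends on how each dataset was deposited.

```bash
#!/usr/bin/env bash
# Sketch only: prints (does not run) the commands that turn one SRA run into
# Cell Ranger-style FASTQ names. Assumes read 1 = barcode+UMI (R1) and
# read 2 = cDNA (R2); verify this for each dataset before renaming.
set -euo pipefail

# sra_to_10x_cmds ACCESSION OUTDIR [SAMPLE_NAME]
sra_to_10x_cmds() {
  local acc="$1" outdir="$2" sample="${3:-$1}"
  local i n rtype
  local -a reads=(R1 R2)
  # --split-files writes ACC_1.fastq, ACC_2.fastq, ...; --gzip compresses them.
  printf 'fastq-dump --split-files --gzip --outdir %q %q\n' "$outdir" "$acc"
  # Rename to <sample>_S1_L001_<read>_001.fastq.gz (sample S1, lane 001).
  for i in 0 1; do
    n=$((i + 1))
    rtype="${reads[$i]}"
    printf 'mv %q %q\n' \
      "$outdir/${acc}_${n}.fastq.gz" \
      "$outdir/${sample}_S1_L001_${rtype}_001.fastq.gz"
  done
}

sra_to_10x_cmds SRR0000000 fastq my_sample
```

If a submission flags the barcode read as technical, fastq-dump skips it unless --include-technical is added, so the read numbering should be checked before applying the renames.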
Finally, I provide a workflow example (workflow_example.sh) that uses all these scripts to:
- Download and set up all the tools.
- Download some example data from the SRA repositories.
- Run STARsolo to obtain a cell-feature count matrix.