Viper is a Snakemake workflow that performs the RNA-seq analysis of the paper 'Causes and Consequences of A Glutamine Induced Normoxic HIF1 Activity for the Tumor Metabolism', Kappler et al. (2019), in a reproducible, automated, and partially contained manner. It is implemented such that alternative or similar analyses can be added or removed.
Viper consists of a Snakefile (workflow/HIF_version_1.0/Snakefile), conda environment files (envs/*.yaml), a configuration file (workflow/HIF_version_1.0/config.yaml), a set of R functions (R/*.R), and a set of R scripts (scripts/*.R) to perform quality control, preprocessing, differential expression analysis, and functional annotation of RNA-seq data.
By default, the pipeline performs all the steps shown in the diagram below. Advanced users can modify the Snakefile and the config.yaml and/or add "custom rules" to enable additional functions. Currently, either transcript quantification with Salmon at the read level or gene quantification with featureCounts can be activated.
This workflow performs differential expression analysis on paired-end RNA-seq data.
After adapter removal with Cutadapt and quality filtering with sickle, reads are mapped with STAR to the human genome (GRCh38.82), and transcript counts are quantified with Salmon.
These transcript counts are summarized to gene counts with tximport.
Integrated normalization and differential expression analysis are conducted with edgeR.
Further, we use the Database for Annotation, Visualization and Integrated Discovery (DAVID v6.8) for functional annotation of the differentially expressed genes.
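As a rough orientation, the command-line part of these steps might look like the sketch below for a single sample. It only prints the commands and does not run them. The adapter sequence, thread count, file naming scheme, and the index and output paths are assumptions for illustration and are not taken from the Viper rules; the tximport, edgeR, and DAVID steps run in R and are not shown.

```shell
#!/bin/sh
# Dry-run sketch: print (do not execute) per-sample commands for trimming,
# filtering, mapping, and quantification. All values below are assumptions.
print_sample_commands() (
    sample="$1"
    adapter="AGATCGGAAGAGC"            # assumed adapter sequence
    raw="data/${sample}"               # assumed naming: <sample>_1/_2.fastq.gz
    trim="results/trim/${sample}"      # hypothetical output prefixes
    filt="results/filter/${sample}"

    # 1. adapter removal on both mates
    echo "cutadapt -a ${adapter} -A ${adapter} -o ${trim}_1.fastq.gz -p ${trim}_2.fastq.gz ${raw}_1.fastq.gz ${raw}_2.fastq.gz"
    # 2. paired-end quality filtering with gzipped output
    echo "sickle pe -g -t sanger -f ${trim}_1.fastq.gz -r ${trim}_2.fastq.gz -o ${filt}_1.fastq.gz -p ${filt}_2.fastq.gz -s ${filt}_single.fastq.gz"
    # 3. mapping to the genome
    echo "STAR --runThreadN 4 --genomeDir references/hg38/star_index --readFilesIn ${filt}_1.fastq.gz ${filt}_2.fastq.gz --readFilesCommand zcat --outSAMtype BAM SortedByCoordinate --outFileNamePrefix results/star/${sample}_"
    # 4. read-level transcript quantification
    echo "salmon quant -i references/hg38/salmon_index -l A -1 ${filt}_1.fastq.gz -2 ${filt}_2.fastq.gz -o results/salmon/${sample}"
)

print_sample_commands "sampleA"
```

In the actual workflow these steps are driven by the Snakemake rules and the settings in config.yaml, so the parameters used there take precedence over anything in this sketch.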
Assuming that Snakemake and conda are installed (and your system has the libraries needed to compile R packages), you can run the following commands on a test dataset. Clone the repository into a folder named viper, which is the folder name the commands below expect:
git clone https://github.com/GrosseLab/ViperWF.git viper
Here is the basic suggested skeleton for your project folder:
.
├── data
│   ├── qPCR            # qPCR raw data
│   └── *.fastq.gz      # all 'fastq.gz'-files from !...!
│
├── references
│   └── hg38            # all data from Homo_sapiens.GRCh38.82
│       ├── Homo_sapiens.GRCh38.82.gtf                      # annotation
│       ├── Homo_sapiens.GRCh38.dna.primary_assembly.fa     # genome sequence
│       └── Homo_sapiens.GRCh38.82.EXON.fa                  # exon sequences of all transcripts in the GTF
│
├── logs
├── report
│
├── viper               # GitHub repository
│   ├── report          # Snakemake report definition
│   ├── wrapper         # Snakemake wrapper
│   ├── rules           # Snakemake rules
│   ├── scripts         # Snakemake scripts
│   ├── workflow        # Snakemake final workflows
│   │   └── HIF_version_1.0
│   ├── R               # R functions needed to run the analysis
│   └── man             # R functions manual
│
├── Snakefile           # file from ./viper/workflow/HIF_version_1.0
├── config.yaml         # file from ./viper/workflow/HIF_version_1.0
├── units.tsv           # file from ./viper/workflow/HIF_version_1.0
└── samples.tsv         # file from ./viper/workflow/HIF_version_1.0
Create the folders and copy the files from viper/workflow/HIF_version_1.0:
mkdir data
mkdir data/qPCR
mkdir references
mkdir logs
mkdir report
cp ./viper/workflow/HIF_version_1.0/Snakefile .
cp ./viper/workflow/HIF_version_1.0/config.yaml .
cp ./viper/workflow/HIF_version_1.0/units.tsv .
cp ./viper/workflow/HIF_version_1.0/samples.tsv .
cp ./viper/workflow/HIF_version_1.0/copy.csv ./data/qPCR/
cp ./viper/workflow/HIF_version_1.0/qPCR_data.csv ./data/qPCR/
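Before going further, it can help to verify that the project folder matches the skeleton. The following is a minimal sketch, not part of Viper; the function name check_layout is hypothetical, and the list of paths is taken from the skeleton and the copy commands above.

```shell
#!/bin/sh
# check_layout DIR: report which expected folders and files are missing under DIR.
# Exit status is 0 only when nothing is missing.
check_layout() (
    root="${1:-.}"
    missing=0
    for path in data data/qPCR references logs report \
                Snakefile config.yaml units.tsv samples.tsv; do
        if [ ! -e "${root}/${path}" ]; then
            echo "missing: ${path}"
            missing=$((missing + 1))
        fi
    done
    [ "${missing}" -eq 0 ]
)

# Check the current directory and print a hint if something is absent.
check_layout . || echo "project folder is incomplete"
```

Note that folder names are case-sensitive on most Linux file systems, so data/qpcr and data/qPCR are different folders.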
Download the data from the Gene Expression Omnibus (GEO) project GSExxx using the NCBI SRA Toolkit:
Download the sra files using the 'SRA Run Selector' or the SRA Toolkit from https://www.ncbi.nlm.nih.gov/geo/query/XXXX
Convert the *.sra files to *.fastq.gz files using fastq-dump from the SRA Toolkit
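The conversion step could be scripted roughly as follows. This is a sketch under assumptions: the function name convert_sra and the DRY_RUN switch are hypothetical, the sra files are assumed to sit in the data folder, and --split-files is used because the workflow expects paired-end reads.

```shell
#!/bin/sh
# convert_sra SRA_DIR OUT_DIR: convert every *.sra file in SRA_DIR into
# gzipped FASTQ files (one file per mate) in OUT_DIR.
# With DRY_RUN=1 the fastq-dump commands are printed instead of executed.
convert_sra() (
    sra_dir="${1:-data}"
    out_dir="${2:-data}"
    for sra in "${sra_dir}"/*.sra; do
        [ -e "${sra}" ] || continue    # the glob matched nothing
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "fastq-dump --split-files --gzip --outdir ${out_dir} ${sra}"
        else
            fastq-dump --split-files --gzip --outdir "${out_dir}" "${sra}"
        fi
    done
)

# Preview the commands for all sra files in ./data without running them.
DRY_RUN=1 convert_sra data data
```

Drop DRY_RUN=1 to run the conversion; this requires fastq-dump from the SRA Toolkit on the PATH.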
snakemake -kn
snakemake --create-envs-only --use-conda
snakemake -k -p --use-conda -j 20
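The three commands above are a dry run (-n, with -k to keep going on independent jobs), the creation of the conda environments, and the actual run (-p prints the shell commands, -j sets the number of parallel jobs). A small wrapper that stops at the first failing stage might look like the sketch below; the function name run_viper and the SNAKEMAKE and JOBS override variables are hypothetical. Newer Snakemake releases spell the environment flag --conda-create-envs-only, so adjust it to your installed version.

```shell
#!/bin/sh
# run_viper: run the dry run, the environment creation, and the real run
# in order; a later stage only starts if the previous one succeeded.
# SNAKEMAKE (executable name) and JOBS (parallel jobs) are optional overrides.
run_viper() {
    "${SNAKEMAKE:-snakemake}" -kn &&
    "${SNAKEMAKE:-snakemake}" --create-envs-only --use-conda &&
    "${SNAKEMAKE:-snakemake}" -k -p --use-conda -j "${JOBS:-20}"
}

# Only start when snakemake is available and a Snakefile is present.
if command -v snakemake >/dev/null 2>&1 && [ -f Snakefile ]; then
    run_viper
fi
```

Running the dry run first surfaces missing input files or configuration errors before any environment is built or any job is started.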
The run creates a new folder, results:
.
├── data
├── references
├── report
├── viper # Github repository
│
├── logs # log files of the Snakemake rules
├── results # new folder for the results of the snakemake rules
│
├── Snakefile
├── config.yaml
├── units.tsv
└── samples.tsv