Sequencing Facility RNA-Seq Downstream Analysis pipeline

Pipeline will intake raw counts file and run differential gene expression analysis using three different tools (Deseq2, EdgeR, and Limma_Voom) and perform Gene Set Enrichment Analysis using Deseq2 output.

Analysis Tools:

Deseq2: From Deseq2, users can expect a comprehensive analysis of differential gene expression using DESeq2. The script takes as input raw gene counts, sample metadata, and contrasts between experimental conditions. It normalizes the gene counts, performs differential expression analysis, and outputs normalized counts, lists of differentially expressed genes (DEGs), and corresponding MA plots. The DEGs are filtered based on a log fold change threshold of 2 for upregulation and 0.5 for downregulation, with a significance threshold of p-value < 0.05. Additionally, heatmaps for the top 30 differentially expressed genes are generated to visually represent expression changes across the relevant samples. The code ensures that all outputs, including tables and plots, are saved to a specified directory for easy retrieval and downstream analysis.

EdgeR: The EdgeR code performs differential gene expression analysis by normalizing raw RNA-seq counts using TMM, accounting for library size differences, and estimating gene-specific dispersions. Users can expect the code to detect significantly differentially expressed genes between experimental groups using an exact test. It filters results based on an FDR threshold of 0.05 and log fold change thresholds of ≥ 2 or ≤ 0.5 (corresponding to a log2 fold change of ±1). The code generates visualizations such as MDS plots, MA plots, and heatmaps to help users explore gene expression patterns across samples.

Limma Voom: The Limma-voom code applies the voom transformation to RNA-seq data, stabilizing variance across gene expression levels and making the data suitable for linear modeling. Users can expect the code to fit linear models to the transformed data and perform empirical Bayes moderation to improve statistical accuracy. Contrasts between experimental groups are analyzed for differential expression, with results filtered based on an adjusted p-value threshold of 0.05 and a log fold change threshold of log2 fold change of ±1. The code generates MA plots and heatmaps to visualize the top differentially expressed genes, providing a detailed summary of expression changes.

Venn Diagram: This Venn diagram script takes filtered gene lists as input from multiple differential expression analyses, specifically from DESeq2, edgeR, and limma-voom. These gene lists are used to generate a Venn diagram, which visually represents the overlap of differentially expressed genes identified by each method. The script processes the input files containing filtered genes, ensures that the required GeneID column is present, and creates the diagram. The resulting Venn diagram is saved as a PNG file, providing a clear visualization of shared and unique genes between the analyses.

Gene Set Enrichment Anaysis: Gene Set Enrichment Analysis (GSEA) is a computational method that determines whether a pre-defined set of genes (ex: those beloging to a specific GO term or KEGG pathway) shows statistically significant, concordant differences between two biological states.

Usage

Step 1: Obtain a copy of this workflow

Clone the Repository: Clone the new repository to your local machine, choosing the directory where you want to perform data analysis. Instructions for cloning can be found here.

Step 2: Configure workflow

Have input RawCountFile_rsemgenes.txt available in test folder.

-Example of input RawCountFile_rsemgenes.txt

gene_id	Sample1	Sample2	Sample3	Sample4	Sample5	Sample6
ENSG00000000003.14_TSPAN6	1500	1600	1450	1300	1250	1400
ENSG00000000005.6_TNMD	10	12	9	15	14	18
ENSG00000000419.12_DPM1	3000	3100	2900	3200	3300	3400
ENSG00000000457.14_SCYL3	500	520	480	600	590	610
...	...	...	...	...	...	...
...	...	...	...	...	...	...
ENSG00000005955.13_ACTN2	500	530	460	550	540	570
ENSG00000006062.15_RPS6	600	650	580	700	680	720

Modify the contrasts.txt file in config folder.

Add sample condition contrast pairs. Example:

parent	control

Modify the sampleinfo.txt file in config folder.

Add sample details keeping the same format such as:

samplename	condition	batch
Sample1	parent	b1
Sample2	parent	b2
Sample3	parent	b2
Sample4	control	b1
Sample5	control	b2
Sample6	control	b2

In config.yaml:

Give full path to raw_counts data

Give full path to results directory

Give full path to contrasts file

Select the reference genome of choice. For example: “mouse”

Select the organism_db accordingly. For example: "org.Mm.eg.db" for mouse

Select the kegg_organism. For example: "mmu" for mouse

In Snakefile:

Give full path to configfile

Give full path to Snakemake rule files

In the common.smk:

Give full path to configfile

Ensure full paths to the Rscripts in snakemake (.smk) files under 'shell' or "params' for the following:

deseq2.smk

pathway.smk

edger.smk

venn.smk

rmarkdown.smk

limma_voom.smk

Ensure full path to R libraries as .libPaths in pathway.R

Step 3: Load the snakemake version 8 or above

module load snakemake/8.4.8

Step 4: Create a conda environment

$NAME=my_environment_name

conda create -n $NAME

Step 5: Execute the Workflow

Activate the Conda Environment:

conda activate $NAME

Install mamba

conda install -c conda-forge mamba

Perform a dry-run to validate your setup:

snakemake --use-conda -np

Local Execution: Execute the workflow on your local machine using $N cores:

snakemake --use-conda --cores $N --latency-wait 55 --force

Here, $N represents the number of cores you wish to allocate for the workflow.

Step 6: Investigate results

After successful execution, output analysis files along with rmarkdown html report will be generated in results folder.

Move the results folder to a different path and the workflow will be ready to run another dataset.

conda deactivate

For any questions, contact

CCRSF_IFX@nih.gov

Name		Name	Last commit message	Last commit date
Latest commit History 5 Commits
config		config
resources		resources
results		results
test		test
workflow		workflow
.editorconfig		.editorconfig
.gitattributes		.gitattributes
LICENSE		LICENSE
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Sequencing Facility RNA-Seq Downstream Analysis pipeline

Analysis Tools:

Usage

Step 1: Obtain a copy of this workflow

Step 2: Configure workflow

Have input RawCountFile_rsemgenes.txt available in test folder.

Modify the contrasts.txt file in config folder.

Modify the sampleinfo.txt file in config folder.

In config.yaml:

In Snakefile:

In the common.smk:

Ensure full paths to the Rscripts in snakemake (.smk) files under 'shell' or "params' for the following:

Ensure full path to R libraries as .libPaths in pathway.R

Step 3: Load the snakemake version 8 or above

Step 4: Create a conda environment

Step 5: Execute the Workflow

Step 6: Investigate results

For any questions, contact

About

Releases

Packages

Languages

License

CCRSF-IFX/SF_Downstream_RNA-seq

Folders and files

Latest commit

History

Repository files navigation

Sequencing Facility RNA-Seq Downstream Analysis pipeline

Analysis Tools:

Usage

Step 1: Obtain a copy of this workflow

Step 2: Configure workflow

Have input RawCountFile_rsemgenes.txt available in test folder.

Modify the contrasts.txt file in config folder.

Modify the sampleinfo.txt file in config folder.

In config.yaml:

In Snakefile:

In the common.smk:

Ensure full paths to the Rscripts in snakemake (.smk) files under 'shell' or "params' for the following:

Ensure full path to R libraries as .libPaths in pathway.R

Step 3: Load the snakemake version 8 or above

Step 4: Create a conda environment

Step 5: Execute the Workflow

Step 6: Investigate results

For any questions, contact

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages