Tools to conduct functional enrichment analysis and to parse and plot the output of gene ontology (GO) analysis (web) tools
- Any version of R
- CRAN packages gprofiler2 (for local g:Profiler analysis), plotrix and, optionally, parallel
A suitable background is often not the whole genome. It should represent the set of genes that could have been found in the analysis, e.g. because they contain alternative splicing events or are expressed in the control sample. Use a unique, stable gene identifier such as the Ensembl gene ID rather than the gene name, because gene names change.
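A minimal sketch of preparing the two gene lists in R before running an analysis. The identifiers are placeholder strings in Ensembl format, made up for illustration and not taken from the input folder. The sketch removes duplicates and drops foreground genes that are absent from the background, since by construction those could not have been detected.

```r
# Placeholder identifiers in Ensembl format (for illustration only)
background <- c("ENSMUSG00000000001", "ENSMUSG00000000002",
                "ENSMUSG00000000003", "ENSMUSG00000000004",
                "ENSMUSG00000000002")
foreground <- c("ENSMUSG00000000002", "ENSMUSG00000000004",
                "ENSMUSG00000000099")

# Remove duplicate identifiers
background <- unique(background)
foreground <- unique(foreground)

# Report and drop foreground genes that are not part of the background
missing <- setdiff(foreground, background)
if (length(missing) > 0) {
  message(length(missing), " foreground gene(s) not in background were dropped")
}
foreground <- intersect(foreground, background)
```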
g:Profiler, FuncAssociate and DAVID are all available as web tools.
If using the web interface of g:Profiler, add a column log2Enr with the log2 enrichment.
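One possible way to compute such a column, sketched under assumptions. The column names intersection_size, query_size, term_size and effective_domain_size are assumed to match the exported g:Profiler table. Enrichment is taken here as the fraction of foreground genes in a term divided by the fraction of background genes in that term. Check GOplotTools.R to confirm that this is the definition the plotting function expects. The data frame is made up.

```r
# Made-up rows mimicking a g:Profiler result table (column names are assumed)
res <- data.frame(
  term_name             = c("term A", "term B"),
  intersection_size     = c(20, 5),
  query_size            = c(100, 100),
  term_size             = c(200, 400),
  effective_domain_size = c(8000, 8000)
)

# Fraction of foreground genes annotated with the term
foreFrac <- res$intersection_size / res$query_size
# Fraction of background genes annotated with the term
backFrac <- res$term_size / res$effective_domain_size

# log2 enrichment of foreground over background (assumed definition)
res$log2Enr <- log2(foreFrac / backFrac)
```

With these made-up numbers, term A is 8-fold enriched (log2Enr of 3) and term B is not enriched (log2Enr of 0).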
If using FuncAssociate, save the results table as well as the 'Attribute/Entity List'.
Examples can be found in the input folder.
Alternatively, run the analysis from R using runGprofiler(), which uses the gprofiler2 package (Kolberg et al., F1000Research 2020). See its arguments to change species, data sources, etc.
For examples of 'lollipop' plots based on g:Profiler and FuncAssociate, see the output folder.
g:Profiler: Huge categories will be removed and only 'highlighted' driver categories will be shown by default. See options. Remaining categories will be plotted such that log2-enrichment is on the x-axis, dot size represents the number of genes from the category that were in the foreground, and color reflects p-value. Sources are indicated by text color.
See arguments to tweak filtering behaviour etc.
source("GOplotTools.R")
foreground <- read.delim("input/Input_genesWithChangingExons.txt", header=FALSE)[,1]
background <- read.delim("input/Input_background.txt", header=FALSE)[,1]
enriched <- runGprofiler(fore = foreground, back = background, species = "mmusculus", outBase = NA)
plotGprofilerDots(over    = enriched$result,
                  main    = "Genes enriched in input over background",
                  outName = "output/gProfiler_plotGprofilerDots.pdf",
                  wid = 7, hei = 3.5)
FuncAssociate: Huge categories will be removed by default. If categories overlap more than a threshold, only the more significant one will be kept. See options. Remaining categories will be plotted such that the LOD is on the x-axis, dot size represents the number of genes from the category that were in the foreground, and color reflects p-value:
source("GOplotTools.R")
plotFuncAssDots(file        = "input/FuncAssociate_results.tsv",
                outName     = "output/FuncAssociate_plotFuncAssDots.pdf",
                inputGenes  = "input/Input_genesWithChangingExons.txt",
                attrEntList = "input/FuncAssociate_attrEntList.xls",
                main        = "Genes enriched in input over background",
                wid = 9, hei = 4.5)
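The overlap filter described for FuncAssociate can be pictured as a greedy pass over categories sorted by p-value. The following is only a sketch of that idea, not the code in GOplotTools.R. The function name filterOverlap, the overlap measure (shared genes relative to the smaller category) and the 0.5 threshold are all assumptions made for illustration.

```r
# Hypothetical helper: keep a category only if it does not overlap too much
# with any more significant category that has already been kept.
# Assumes geneSets is a named list of non-empty character vectors and
# pvals is in the same order as geneSets.
filterOverlap <- function(geneSets, pvals, maxOverlap = 0.5) {
  ord  <- order(pvals)          # most significant first
  kept <- character(0)
  for (name in names(geneSets)[ord]) {
    redundant <- FALSE
    for (k in kept) {
      shared <- length(intersect(geneSets[[name]], geneSets[[k]]))
      # Overlap as a fraction of the smaller of the two categories (assumption)
      frac <- shared / min(length(geneSets[[name]]), length(geneSets[[k]]))
      if (frac > maxOverlap) {
        redundant <- TRUE
        break
      }
    }
    if (!redundant) kept <- c(kept, name)
  }
  kept
}

# Made-up categories and p-values
sets <- list(catA = c("g1", "g2", "g3", "g4"),
             catB = c("g1", "g2", "g3"),
             catC = c("g7", "g8"))
p    <- c(catA = 0.001, catB = 0.01, catC = 0.02)

keptCats <- filterOverlap(sets, p)
```

In this toy input, catB is fully contained in the more significant catA and is dropped, while catC shares no genes with catA and is kept.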
Input checking is imperfect. If it doesn't work, check that you are submitting the correct files.