Geneset

Overview

The omics era has produced an enormous amount of gene data, and uncovering its biological significance remains a challenge. One effective approach is gene enrichment analysis.

Every gene enrichment method depends on access to gene sets, whether it is traditional over-representation analysis (ORA) or a functional class scoring (FCS) method such as Gene Set Enrichment Analysis (GSEA).
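To make the role of a gene set concrete, here is a minimal sketch of the test behind ORA, using a hypergeometric distribution in base R. The gene names, the background universe and the gene set are all made up for illustration; this is not how any particular enrichment package implements it.

```r
# Toy ORA: is one gene set over-represented among the genes of interest?
# All identifiers below are hypothetical.
universe <- paste0("gene", 1:100)          # background genes
gene_set <- paste0("gene", 1:10)           # one hypothetical gene set
hits     <- paste0("gene", c(1:5, 50:54))  # genes of interest (e.g. DE genes)

k <- length(intersect(hits, gene_set))  # genes of interest inside the set
m <- length(gene_set)                   # gene set size
n <- length(universe) - m               # background genes outside the set
N <- length(hits)                       # number of genes of interest

# P(X >= k) under the hypergeometric distribution
p_value <- phyper(k - 1, m, n, N, lower.tail = FALSE)
print(p_value)
```

Without the gene set (here `gene_set`) there is nothing to test against, which is why both ORA and FCS methods need a reliable source of gene sets.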

Currently, many enrichment analysis tools provide built-in gene sets for only a few model species, or require users to download them online. For non-model species, users must therefore collect gene sets from various public databases themselves. For instance, the enrichGO() and gseGO() functions of the clusterProfiler package use organism-level annotation packages, which exist for approximately 20 species. If the research target is not among these organisms, users must create an annotation through AnnotationHub or download it from biomaRt or Blast2GO, which can be time-consuming and challenging for biologists without programming skills.

To address this issue, I developed an R package called "geneset", which aims to provide quick access to up-to-date gene sets. The package includes GO (BP, CC and MF), KEGG (pathway, module, enzyme, network, drug and disease), WikiPathways, MSigDB, EnrichrDB, Reactome, MeSH, DisGeNET, Disease Ontology (DO), Network of Cancer Genes (NCG) (versions 6 and 7) and COVID-19. It supports both model and non-model species.

Supported organisms

For more details, please refer to this site. The backend data is updated monthly to provide a better user experience.

GO supports 143 species
KEGG supports 8213 species
MeSH supports 71 species
MSigDB supports 20 species
WikiPathways supports 16 species
Reactome supports 11 species
EnrichrDB supports 5 species
Disease-related gene sets (DO, NCG, DisGeNET and COVID-19) support human only

🛠 Installation

Install stable version from CRAN:

install.packages("geneset")

Install development version from GitHub:

remotes::install_github("GangLiLab/geneset")

Install development version from Gitee (for users in mainland China):

remotes::install_git("https://gitee.com/genekitr/pacakge_geneset")

📚 Usage

For more details, please refer to the genekitr book.

The package mainly includes 8 functions: getGO(), getKEGG(), getMesh(), getMsigdb(), getWiki(), getReactome(), getEnrichrdb() and getHgDisease().

All functions take org (organism) as input. Several functions have additional arguments of their own, such as ont (ontology) in getGO().

Take human GO MF gene sets as an example:

library(geneset)
x <- getGO(org = "human", ont = "mf")

str(x)
# List of 4
# $ geneset     :'data.frame':	280115 obs. of  2 variables:
#   ..$ mf  : chr [1:280115] "GO:0000009" "GO:0000009" "GO:0000010" "GO:0000010" ...
#   ..$ gene: chr [1:280115] "PIGV" "ALG12" "PDSS1" "PDSS2" ...
# $ geneset_name:'data.frame':	4878 obs. of  2 variables:
#   ..$ go_id: chr [1:4878] "GO:0000009" "GO:0000010" "GO:0000014" "GO:0000016" ...
#   ..$ Term : chr [1:4878] "alpha-1,6-mannosyltransferase activity" "trans-hexaprenyltranstransferase activity" "single-stranded DNA endodeoxyribonuclease activity" "lactase activity" ...
# $ organism    : chr "hsapiens"
# $ type        : chr "mf"

head(x$geneset)
# mf  gene
# GO:0000009  PIGV
# GO:0000009 ALG12
# GO:0000010 PDSS1
# GO:0000010 PDSS2
# GO:0000014 ENDOG
# GO:0000014 ERCC1

head(x$geneset_name)
# go_id                                               Term
# GO:0000009             alpha-1,6-mannosyltransferase activity
# GO:0000010          trans-hexaprenyltranstransferase activity
# GO:0000014 single-stranded DNA endodeoxyribonuclease activity
# GO:0000016                                   lactase activity
# GO:0000026             alpha-1,2-mannosyltransferase activity
# GO:0000030                       mannosyltransferase activity
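The two data frames share the term ID, so they can be joined to put a readable term name next to each gene. The sketch below rebuilds a tiny object with the same layout as the output above (a few rows copied from it) so that it runs without downloading anything; `x_toy` and `annotated` are illustrative names, not part of the package.

```r
# Toy object mimicking the structure of the getGO() result shown above
x_toy <- list(
  geneset = data.frame(
    mf   = c("GO:0000009", "GO:0000009", "GO:0000010", "GO:0000010"),
    gene = c("PIGV", "ALG12", "PDSS1", "PDSS2"),
    stringsAsFactors = FALSE
  ),
  geneset_name = data.frame(
    go_id = c("GO:0000009", "GO:0000010"),
    Term  = c("alpha-1,6-mannosyltransferase activity",
              "trans-hexaprenyltranstransferase activity"),
    stringsAsFactors = FALSE
  )
)

# Join each term ID to its description (key columns have different names)
annotated <- merge(x_toy$geneset, x_toy$geneset_name,
                   by.x = "mf", by.y = "go_id")
print(annotated)
```

With a real result, the same `merge()` call should work on `x$geneset` and `x$geneset_name`, assuming the column names match those printed by `str(x)`.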

How many terms/pathways are in a specific gene set?

Take human KEGG Pathway as an example:

gs <- geneset::getKEGG("hsa", "pathway")
gs_df <- gs$geneset
# count distinct pathway IDs (base R only, so no pipe operator has to be loaded)
length(table(gs_df$id))
# 347
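The same `table()` call also gives the number of genes per pathway, which is handy for dropping very small sets before an analysis. This is a sketch on a made-up data frame with the `id` and `gene` columns used above; the pathway names and the size cutoff of 2 are arbitrary illustrations.

```r
# Toy gene set table with the same two columns as gs$geneset
gs_df_toy <- data.frame(
  id   = c("path_A", "path_A", "path_A", "path_B", "path_C", "path_C"),
  gene = c("g1", "g2", "g3", "g1", "g4", "g5"),
  stringsAsFactors = FALSE
)

# Number of genes in each pathway
set_sizes <- table(gs_df_toy$id)
print(set_sizes)

# Keep only pathways with at least 2 genes (arbitrary cutoff)
keep_ids <- names(set_sizes)[set_sizes >= 2]
gs_df_filtered <- gs_df_toy[gs_df_toy$id %in% keep_ids, ]
print(gs_df_filtered)
```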

Pass gene set to GSVA/ssGSEA

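library(GSVA)
# First, turn the gene set data frame into a named list (one element per pathway)
gs_list <- split(gs_df$gene, gs_df$id)

# Second, pass your expression matrix ("express_data") to gsva()
# Note: this is the legacy call style; newer GSVA releases take a parameter
# object instead (e.g. ssgseaParam()), so check your installed version
ssgsea_mat <- gsva(expr = express_data,
                   method = "ssgsea", # or "gsva" (default), "zscore", "plage"
                   gset.idx.list = gs_list,
                   verbose = FALSE,
                   parallel.sz = 4)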
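Before scoring, it can help to check that the gene IDs in the gene sets actually match the row names of the expression matrix, since sets with no matching genes cannot be scored. The sketch below uses a random toy matrix and hypothetical pathway names; `express_toy` and `gs_list_toy` stand in for your own `express_data` and `gs_list`.

```r
set.seed(1)
# Toy expression matrix: 5 genes (rows) x 4 samples (columns)
express_toy <- matrix(rnorm(20), nrow = 5,
                      dimnames = list(paste0("g", 1:5), paste0("s", 1:4)))

# Toy gene set list in the same shape as split(gs_df$gene, gs_df$id)
gs_list_toy <- list(
  path_A = c("g1", "g2", "g9"),
  path_B = c("g8", "g9"),
  path_C = c("g4", "g5")
)

# Keep only genes present in the expression matrix
gs_list_matched <- lapply(gs_list_toy, intersect, rownames(express_toy))

# Drop gene sets that ended up empty
gs_list_matched <- gs_list_matched[lengths(gs_list_matched) > 0]
print(gs_list_matched)
```

If most sets come out empty, the gene sets and the expression matrix probably use different ID types (for example symbols versus Entrez IDs).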

Pass gene set to ORA/GSEA

hg_gs <- geneset::getGO(org = "human", ont = "mf")
# ORA (input_id: your vector of gene IDs of interest)
go_ent <- genekitr::genORA(input_id, geneset = hg_gs)
# GSEA (geneList: your pre-ranked gene list with logFC values)
gse <- genekitr::genGSEA(genelist = geneList, geneset = hg_gs)
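The pre-ranked input for GSEA is described above only as a gene list with logFC values. One common way to build such an object is a named numeric vector sorted in decreasing order, sketched here on made-up values; whether genGSEA() expects exactly this format is an assumption, so check the genekitr documentation before relying on it.

```r
# Toy differential expression result (logFC values are invented)
de_toy <- data.frame(
  gene  = c("PIGV", "ALG12", "PDSS1", "ENDOG"),
  logFC = c(0.4, -1.2, 2.1, -0.3),
  stringsAsFactors = FALSE
)

# Named numeric vector: names are gene IDs, values are logFC
gene_list_toy <- setNames(de_toy$logFC, de_toy$gene)

# Rank from most up-regulated to most down-regulated
gene_list_toy <- sort(gene_list_toy, decreasing = TRUE)
print(gene_list_toy)
```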

✍️ Author

Yunze Liu
