The msigdbr R package provides Molecular Signatures Database (MSigDB) gene sets typically used with the Gene Set Enrichment Analysis (GSEA) software:
- in an R-friendly "tidy" format with one row per gene set and gene pair
- for multiple frequently studied model organisms, such as mouse, rat, pig, zebrafish, fly, and yeast, in addition to the original human genes
- as gene symbols as well as NCBI Entrez and Ensembl IDs
- without accessing external resources or requiring an active internet connection
The package can be installed from CRAN.
install.packages("msigdbr")
Releases that are not available on CRAN can be installed from GitHub, optionally pinning a specific release or version with the ref argument:
remotes::install_github("igordot/msigdbr", ref = "v2022.1.1")
The package data can be accessed using the msigdbr() function, which returns a data frame of gene sets and their member genes. For example, you can retrieve mouse genes from the C2 (curated) CGP (chemical and genetic perturbations) gene sets:
library(msigdbr)
# Mouse genes for the C2 (curated) collection, CGP subcollection
genesets <- msigdbr(species = "mouse", category = "C2", subcategory = "CGP")
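Many downstream enrichment tools expect gene sets as a named list rather than a data frame. As a minimal sketch, the tidy table can be reshaped with base R's split(). The toy data frame below is a hypothetical stand-in for msigdbr() output (the set names and genes are made up for illustration), and it assumes the gs_name and gene_symbol columns; check the column names in your installed version before relying on them.

```r
# Hypothetical stand-in for msigdbr() output: one row per gene set and gene pair.
# The real data frame has more columns; only the two used here are shown.
genesets_toy <- data.frame(
  gs_name = c("SET_A", "SET_A", "SET_B"),
  gene_symbol = c("Trp53", "Myc", "Myc"),
  stringsAsFactors = FALSE
)

# Group the gene symbols by gene set name to get a named list of character vectors.
geneset_list <- split(genesets_toy$gene_symbol, genesets_toy$gs_name)
```

With real data, the same call would presumably be split(genesets$gene_symbol, genesets$gs_name), swapping in an Entrez or Ensembl ID column if the downstream tool needs those identifiers instead.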
Check the documentation website for more information.