Which human genes are implicated in tumor development?
geneOncoX is an R package that addresses this question by integrating a number of resources on the functional roles of cancer genes, as well as on their representation in commercially available targeted sequencing assays (gene panels). The integrated resources include the following:
- IntOGen - compendium of mutational cancer driver genes
- Network of Cancer Genes - collection of curated cancer genes
- CancerMine - text-mined predictions of tumor suppressor genes, proto-oncogenes and cancer drivers
- Cancer Gene Census - manually curated resource on cancer genes (somatic and germline)
- DNA repair genes - collection of genes involved in DNA repair
- Genomics England PanelApp - collections of cancer gene panels used in clinical diagnostics
- TSO500 targets - cancer genes targeted by Illumina's TruSight Oncology 500 (TSO500) gene panel
- F1CDx targets - cancer genes targeted by Foundation Medicine's FoundationOne CDx (F1CDx) gene panel
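To illustrate what "integrating" several gene resources can mean in practice, here is a minimal base-R sketch that combines per-resource gene sets into a single gene-by-resource indicator table. It is not how geneOncoX builds its datasets internally; the resource names are shorthand and the gene symbols (GENE_A, ...) are hypothetical placeholders, not real annotations.

```r
# Hypothetical per-resource gene sets (placeholder symbols, not real annotations)
resource_genes <- list(
  intogen    = c("GENE_A", "GENE_B"),
  ncg        = c("GENE_B", "GENE_C"),
  cancermine = c("GENE_A", "GENE_C", "GENE_D")
)

# Build a gene-by-resource indicator table from named character vectors
integrate_resources <- function(resources) {
  genes <- sort(unique(unlist(resources, use.names = FALSE)))
  tbl <- data.frame(symbol = genes, stringsAsFactors = FALSE)
  for (res in names(resources)) {
    # TRUE if the gene is listed by this resource
    tbl[[res]] <- genes %in% resources[[res]]
  }
  # Count in how many resources each gene appears
  tbl$n_resources <- rowSums(tbl[, names(resources), drop = FALSE])
  tbl
}

annotations <- integrate_resources(resource_genes)
print(annotations)
```

A wide logical layout like this makes it easy to filter genes supported by several independent resources, e.g. `subset(annotations, n_resources >= 2)`.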
The package offers a few pre-processed datasets, along with metadata, that users can retrieve and use in their own projects and analysis setups. It uses the googledrive R package to download the pre-processed and documented datasets to a local cache directory specified by the user.
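The local caching described above follows a common "download once, reuse afterwards" pattern. The sketch below shows that pattern with base R only: `get_cached_dataset()` and `fake_fetch()` are hypothetical helpers, and `fake_fetch()` stands in for the real download step (which geneOncoX performs via googledrive). It assumes datasets are cached as `.rds` files, which may differ from the package's actual file layout.

```r
# Hypothetical cache-if-absent helper; 'fetch_fn' stands in for the real
# download step (geneOncoX itself uses the googledrive package for this)
get_cached_dataset <- function(name, cache_dir, fetch_fn) {
  if (!dir.exists(cache_dir)) {
    dir.create(cache_dir, recursive = TRUE)
  }
  cache_file <- file.path(cache_dir, paste0(name, ".rds"))
  if (file.exists(cache_file)) {
    # Reuse the locally cached copy instead of downloading again
    return(readRDS(cache_file))
  }
  dataset <- fetch_fn()  # "download" only when no cached copy exists
  saveRDS(dataset, cache_file)
  dataset
}

# Usage: count how often the (hypothetical) fetch step is invoked
n_fetch <- 0
fake_fetch <- function() {
  n_fetch <<- n_fetch + 1
  list(metadata = data.frame(source = "example", stringsAsFactors = FALSE),
       records  = data.frame(symbol = "GENE_A", stringsAsFactors = FALSE))
}
cache_dir <- file.path(tempdir(), "geneOncoX_cache_demo")
unlink(cache_dir, recursive = TRUE)  # start from an empty demo cache
d1 <- get_cached_dataset("basic", cache_dir, fake_fetch)
d2 <- get_cached_dataset("basic", cache_dir, fake_fetch)  # served from cache
```

The second call reads the `.rds` file rather than calling `fake_fetch()` again, which is the point of caching large annotation datasets locally.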
The package can be installed from GitHub with the remotes package:
remotes::install_github('sigven/geneOncoX')
The package currently offers five functions, each of which retrieves a specific dataset that is useful for gene annotation:
- get_basic() - retrieves basic, non-transcript-specific gene annotations. Includes tumor suppressor gene/oncogene/driver annotations from multiple resources, NCBI gene summary descriptions, and multiple predictions/scores of gene indispensability and loss-of-function tolerance
- get_gencode() - retrieves two datasets (grch37 and grch38) with human gene transcripts from GENCODE, including cross-references to RefSeq, UniProt, APPRIS, and MANE
- get_alias() - retrieves a list of gene synonyms, indicating which synonyms are ambiguous or non-ambiguous (with respect to primary gene symbols)
- get_predisposition() - retrieves a list of genes relevant for cancer predisposition, drawing on multiple resources, including Cancer Gene Census, Genomics England PanelApp, TCGA's PanCancer study, and others
- get_panels() - retrieves a collection of more than 40 different panels for various cancer conditions, as found in Genomics England PanelApp
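A typical use of a synonym list such as the one from get_alias() is mapping user-supplied gene identifiers to primary symbols while refusing to guess for ambiguous synonyms. The sketch below assumes a records table with columns named `alias`, `symbol` and `ambiguous`; these column names, the helper `resolve_symbols()` and all gene identifiers are hypothetical and not taken from the package.

```r
# Hypothetical alias table; column names 'alias', 'symbol' and 'ambiguous'
# are assumptions, not the documented get_alias() output
alias_records <- data.frame(
  alias     = c("ALIAS1", "ALIAS2", "ALIAS2", "ALIAS3"),
  symbol    = c("GENE_A", "GENE_B", "GENE_C", "GENE_D"),
  ambiguous = c(FALSE, TRUE, TRUE, FALSE),
  stringsAsFactors = FALSE
)

# Map query identifiers to primary symbols, skipping ambiguous synonyms
resolve_symbols <- function(queries, aliases) {
  unambiguous <- aliases[!aliases$ambiguous, ]
  primary <- unique(aliases$symbol)
  vapply(queries, function(q) {
    if (q %in% primary) return(q)                 # already a primary symbol
    hit <- unambiguous$symbol[unambiguous$alias == q]
    if (length(hit) == 1) hit else NA_character_  # unknown or ambiguous
  }, character(1))
}

resolved <- resolve_symbols(c("GENE_A", "ALIAS1", "ALIAS2", "UNKNOWN"), alias_records)
print(resolved)
```

Returning `NA` for ambiguous synonyms (here ALIAS2, which points to two genes) keeps a silent mis-mapping from propagating into downstream annotation.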
Technically, each dataset comes as a list object in R with
- metadata - a data frame that lists URLs, citations, and versions of the underlying resources
- records - a data frame that contains the actual gene/transcript annotations
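The two-element layout can be explored with ordinary data frame operations. The sketch below builds a small stand-in object by hand so it runs without downloading anything; with the real package the object would come from one of the functions listed above (e.g. get_basic()). All column names (`source`, `version`, `url`, `tumor_suppressor`, `oncogene`) and values are hypothetical placeholders, not the actual columns of the geneOncoX datasets.

```r
# Hypothetical stand-in for a geneOncoX dataset: a list with a 'metadata'
# and a 'records' data frame (column names and values are assumptions)
dataset <- list(
  metadata = data.frame(
    source  = c("ResourceX", "ResourceY"),
    version = c("v1", "v2"),
    url     = c("https://example.org/x", "https://example.org/y"),
    stringsAsFactors = FALSE
  ),
  records = data.frame(
    symbol           = c("GENE_A", "GENE_B", "GENE_C"),
    tumor_suppressor = c(TRUE, FALSE, FALSE),
    oncogene         = c(FALSE, TRUE, FALSE),
    stringsAsFactors = FALSE
  )
)

# Inspect the provenance (source and version) of the underlying resources
print(dataset$metadata[, c("source", "version")])

# Filter the annotation records, e.g. genes flagged as oncogenes
oncogenes <- subset(dataset$records, oncogene)$symbol
print(oncogenes)
```

Keeping provenance in `metadata` alongside `records` means the versions needed for citation travel with the data itself.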
If you use the datasets provided with geneOncoX, make sure to properly cite the original publications of the integrated resources and to comply with their licensing terms:
- IntOGen - Martínez-Jiménez et al., Nat Rev Cancer, 2020 - CC0 1.0
- CancerMine - Lever et al., Nat Methods, 2019 - CC0 1.0
- Network of Cancer Genes - Repana et al., Genome Biol, 2019 - Open Access
- Cancer Gene Census - Sondka et al., Nat Rev Cancer, 2018 - Free for non-commercial, academic use - for commercial usage see https://cancer.sanger.ac.uk/cosmic/license
- DNA repair genes database - Woods et al., Science, 2001 - Open Access
- dbNSFP - Liu et al., Genome Med, 2020 - Open Access
- Genomics England PanelApp - Martin et al., Nat Genet, 2019 - Commercial use requires separate agreement with GEL, see licensing terms
- GENCODE - Frankish et al., Nucleic Acids Res, 2021 - Open Access
Contact: sigven AT ifi.uio.no