Skip to content

Latest commit

 

History

History
81 lines (58 loc) · 3.83 KB

README.md

File metadata and controls

81 lines (58 loc) · 3.83 KB

DrugMineR

DrugMineR is a package to retrieve information for a given drug, based on information found on KEGG, PubChem, and DrugBank, making use of REST APIs. Some of these data are already pre-processed and provided as part of the package.

Installation

You need the devtools package to install drugminer. In R, do:

install.packages("devtools") # <-- not needed if you already have it
devtools::install_github("diogocamacho/drugminer")
library(drugminer)

Data pre-processing

Included with the package are data objects extracted from KEGG and DrugBank. These can be easily updated with the provided functions kegg_processing and parse_drugbank, though this is not recommended (a few tweaks internally need to happen.) For completeness, the process for extracting these data is described below.

Processing KEGG data

The KEGG data processing is done with the kegg_processing function, which queries the PUG REST service of KEGG using the httr package. Briefly, this can be done as:

kegg_data <- kegg_processing()

The kegg_data object contains two lists, each containing diverse information from KEGG, such as drug/compound name, pathways, and so on. Please refer to the data documentation in the package by doing:

?KEGG

or

?DRUG

Parsing DrugBank

DrugMineR also comes with data from DrugBank, which has been processed using the dbparser package. From DrugMineR, you can parse the DrugBank data by doing:

drugbank_data <- parse_drugbank(xml_file)

where xml_file is the full data base download from DrugBank (you need this file before running the parser). The output is a set of tibbles that extract relevant information from the XML file. See what information is captured by doing:

?DRUGBANK

Running DrugMineR

The best way to use the DrugMineR package is to make use of its compound_query wrapper. As an example, if we want to extract all of the known information on aspirin, we can just call the wrapper as:

compound_query("aspirin")

which will return six tibbles:

  • general, for general information about a drug. Tibble includes:
    • drug name
    • KEGG ID
    • PubChem CID
    • DrugBank ID
    • CAS registration number
    • Chemical formula
    • Molecular weight
    • ChEBI ID
    • Canonical SMILES
    • InChIKey
  • targets, for known targets of a given drug;
  • pathways, for known pathways that are affected by the drug;
  • diseases, for diseases/syndromes that the drug can be used against (not a comprehensive list)
  • drug_uses, for indications for a given drug (e.g., analgesic, anti-neoplastic)
  • drug_groups, for KEGG mappings for drugs into larger classes (e.g., calcium blocker or SSRI)

The compound_query function will call the kegg_query function first, to map a given compound string to information present in KEGG (looking in the compound, drug, and synonyms tables.) When a compound in not found in KEGG, it will then be searched directly on PubChem, using the name2cid and the property_extractor functions, in sequence. What these functions do is to 1) translate the compound string into a PubChem CID (in name2cid) and 2) get all of the information on the compound by querying PubChem through its PUG REST API. Of note, we also query the Chemical Translation Service at UC Davis to extract the CAS number for a given PubChem CID through their REST services. In the event that no PubChem ID is found for a given query, then NAs are returned.

Future developments

I am always open to adding features and functionalities to DrugMineR and would be excited about possible collaborations to develop the package further. Please contact me if you want to chat.