An R package designed to link metabolites' IDs from different Metabolome databases with eachother in a step to resolve ambiguity and standardize metabolites representation and annotation. Currently the package supports resolving IDs for the following databases:
- Human Metabolome Database (HMDB)
- Chemical Entities of Biological Interest (ChEBI)
- PubChem
- Kyoto Encyclopedia of Genes and Genomes (KEGG)
- Lipidomics Gateway (LipidMaps)
For any questions or issues please use the Issues in github or contact Rajmund Casombordi or Sara Younes.
Install the dump files script first!
- Install postgressql on your system
- Install devtools in R
- In R write
library(devtools)
devtools::load_all()
install_github("komorowskilab/metafetcher")
We recommend downloading manually the database dump files. You can find them here:
https://hmdb.ca/system/downloads/current/hmdb_metabolites.zip
ftp://ftp.ebi.ac.uk/pub/databases/chebi/SDF/ChEBI_complete.sdf.gz
https://www.lipidmaps.org/data/structure/download.php
Please uncompress them all, and put them in a directory tmp.
1- This step is optional there is already a default config file however if you want to set your own configuration call function write_config() to set the configuration of the local database cache
write_config(host,port,db_name,user,password,path_of_tmp_folder)
2- call function install_database() for creating the tables and inserting the data from the tmp folder preferably put it in your R project directory
install_database()
discovery.csv:
hmdb_id,chebi_id,kegg_id,pubchem_id,lipidmaps_id,names,inchi,inchikey,smiles,formula,mass,monoisotopic_mass
HMDB0006112,,,,,,,,,,,
,8337,,,,,,,,,,
HMDB0001005,,,,,,,,,,,
HMDB0001008,,,,,,,,,,,
example1.R:
df.res <- read.csv("discovery.csv", stringsAsFactors=FALSE)
resp <- resolve_metabolites(df.res)
df.out <- resp$df
output: The resulting df.out is a dataframe where the missing cell values have been extended by the program. For example, in each row the hmdb_id value(s) refer to the same metabolite as the value(s) in pubchem_id, and so on...
resp <- resolve_single_id('HMDB', 'HMDB0001005')
df.out <- resp$df
note 1)running install_database() will save the progress of installation when the package is done with creating and inserting data in each table Rerunning install_database() afterwards will continue at the last step.
note 2) If you want to restart the whole process, simply delete the install.RDS file in the project directory.
note 3) After the install script has succeeded, re-running it will wipe your existing database!
note 3) For Pubchem and KEGG the script will not download their dump files. Instead it accesses their API endpoints.
For how to use the package please visit (https://komorowskilab.github.io/MetaFetcheR/)
- Rajmund Casombordi @oboforty
- Sara Yones sara.younes@icm.uu.se @SaraYones
- Klev Diamanti @klevdiamanti