Code to build the Target Central Resource Database (TCRD).

KeithKelleher/TCRD

 
 


TCRD is the central resource behind the Illuminating the Druggable Genome Knowledge Management Center (IDG-KMC). TCRD contains information about human targets, with special emphasis on three families of targets that are central to the NIH IDG initiative: GPCRs, kinases, and ion channels. Olfactory GPCRs (oGPCRs) are treated as a separate family. The official public portal for IDG-KMC is Pharos.

The code in this repository is for people wanting to rebuild a version of TCRD from scratch. If you just want to install TCRD locally, MySQL dumps of recent versions are available for download here.

Overview of the Build Process

There are 70+ datasets in TCRD and, generally, one loader script in the loaders directory to load each. Targets in TCRD correspond to reviewed human entries in UniProt and so UniProt data is loaded first.

Depending on the dataset, data is loaded via web APIs and/or files in various formats. For file-based datasets, some loaders download the file(s) they need; others require the user to download or obtain the file(s) manually before running. In all cases, the directories and/or files needed are set as all-caps variables at the top of the loader script. Have a look at each loader script before running it and make sure the necessary directories and/or files are in place.
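As a rough illustration of that pre-run check, here is a minimal sketch that reports which configured paths are missing. The variable names `DOWNLOAD_DIR` and `INFILE` and their values are hypothetical placeholders, not taken from any actual loader; substitute the all-caps settings from the loader you are about to run.

```python
import os

# Hypothetical all-caps settings, in the style used at the top of the loader
# scripts. The real names and paths differ from loader to loader.
DOWNLOAD_DIR = "../data/ExampleSource/"
INFILE = DOWNLOAD_DIR + "example_dataset.tsv"


def missing_paths(paths):
    """Return the paths from the given list that do not exist on disk."""
    return [p for p in paths if not os.path.exists(p)]


if __name__ == "__main__":
    missing = missing_paths([DOWNLOAD_DIR, INFILE])
    if missing:
        print("Missing before load: " + ", ".join(missing))
    else:
        print("All input paths are in place.")
```

Running a check like this before a loader that takes hours or days is cheaper than discovering a missing input file partway through.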

Some datasets require pre-processing steps (by code in the R/ or python/ directories) to be run before loading. A few also require manual steps to be performed after the load is completed.

doc/TCRD_Build_Notes.txt

This file has information for each dataset on the steps required and also an estimate of the time required. Some loaders run in a few minutes, others require days.

doc/README_v*.txt and doc/TCRDv*_Fixes.txt

These files contain all the command lines, and most of their output, run to build and fix the corresponding version of TCRD. There are notes in these files that should help with the pre- and post-processing required for some of the datasets.

Loading Order

Some of the loaders need to be run before others. Importantly, the steps through TDLs should be run in the order they are listed in doc/TCRD_Build_Notes.txt. After that, loaders can be run in whatever order you like.
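One way to keep that constraint straight is to write the run order down explicitly. This is only a sketch: the script names are hypothetical placeholders, and the authoritative ordered list is the one in doc/TCRD_Build_Notes.txt.

```python
def build_plan(ordered_steps, all_loaders):
    """Return a run order: the ordered steps first, then everything else.

    ordered_steps must be run exactly in the order given; the remaining
    loaders have no required order, so they are simply sorted by name.
    """
    unknown = [s for s in ordered_steps if s not in all_loaders]
    if unknown:
        raise ValueError("Ordered steps not found: " + ", ".join(unknown))
    fixed = set(ordered_steps)
    rest = sorted(name for name in all_loaders if name not in fixed)
    return list(ordered_steps) + rest


if __name__ == "__main__":
    # Hypothetical script names, for illustration only.
    ordered = ["load-first-example.py", "load-tdls-example.py"]
    everything = ["load-dataset-b.py", "load-tdls-example.py",
                  "load-dataset-a.py", "load-first-example.py"]
    for script in build_plan(ordered, everything):
        print(script)
```

Sorting the unordered remainder is an arbitrary choice made here for reproducibility; any order is acceptable for those loaders.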

System Requirements

You will need a Linux or OS X system. You might be able to get things to work on Windows, but it would require a lot of fiddling and is not recommended. I would recommend at least 4 cores and 64GB of RAM.
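To compare a machine against those recommendations, something like the following sketch could be used. It assumes a POSIX system where `os.sysconf` exposes `SC_PHYS_PAGES` and `SC_PAGE_SIZE` (true on Linux; not guaranteed elsewhere), and it returns `None` for RAM when those values are unavailable.

```python
import multiprocessing
import os

RECOMMENDED_CORES = 4
RECOMMENDED_RAM_GB = 64


def total_ram_gb():
    """Return physical RAM in GB, or None if it cannot be determined."""
    try:
        pages = os.sysconf("SC_PHYS_PAGES")
        page_size = os.sysconf("SC_PAGE_SIZE")
    except (AttributeError, ValueError, OSError):
        return None
    if pages <= 0 or page_size <= 0:
        return None
    return pages * page_size / float(1024 ** 3)


def shortfalls(cores, ram_gb):
    """Return a list of messages for each recommendation that is not met."""
    problems = []
    if cores < RECOMMENDED_CORES:
        problems.append("only %d cores (recommended: %d)"
                        % (cores, RECOMMENDED_CORES))
    if ram_gb is not None and ram_gb < RECOMMENDED_RAM_GB:
        problems.append("only %.1f GB RAM (recommended: %d)"
                        % (ram_gb, RECOMMENDED_RAM_GB))
    return problems


if __name__ == "__main__":
    for line in shortfalls(multiprocessing.cpu_count(), total_ram_gb()):
        print(line)
```

These are recommendations rather than hard limits, so the sketch only reports shortfalls and does not abort.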

Software Requirements

MySQL server

I am using MySQL Community Server 5.6.24, but any version from 5.5 onward should be fine.
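A small sketch of checking a server's version string against that 5.5 minimum is shown below. It assumes the string looks like what `SELECT VERSION()` returns (for example `5.6.24-log`); the function names are illustrative and not part of this repository.

```python
import re

MIN_VERSION = (5, 5)


def parse_mysql_version(text):
    """Parse the leading 'major.minor[.patch]' part into a tuple of ints."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", text.strip())
    if match is None:
        raise ValueError("Unrecognized version string: %r" % text)
    return tuple(int(part) for part in match.groups() if part is not None)


def meets_minimum(text, minimum=MIN_VERSION):
    """Return True if the major.minor version is at least the minimum."""
    return parse_mysql_version(text)[:2] >= minimum


if __name__ == "__main__":
    for version in ["5.6.24", "5.1.73-log"]:
        print(version, meets_minimum(version))
```

Comparing tuples of integers avoids the pitfall of comparing version strings lexically, where "5.10" would sort before "5.5".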

Python

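Python 2.7 and a number of modules that are not included in the standard library: BioPython, BeautifulSoup, docopt, goatools, httplib2, progressbar, KEGG_Graph, MySQLdb, networkx, numpy, and requests. The code also uses the standard-library modules urllib, urllib2, cPickle, cStringIO, csv, and shelve, which ship with Python 2.7 and need no separate installation.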
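Before starting a build, it can help to confirm which of the third-party modules in that list are importable. The sketch below does this; note that the import names are assumptions (for instance, BioPython is imported as `Bio`), BeautifulSoup is left out because its import name depends on the installed version, and KEGG_Graph is left out because its source is not stated here.

```python
import importlib

# Assumed import names for the third-party packages; adjust as needed.
THIRD_PARTY = [
    "Bio",          # BioPython
    "docopt",
    "goatools",
    "httplib2",
    "progressbar",
    "MySQLdb",
    "networkx",
    "numpy",
    "requests",
]


def missing_modules(names):
    """Return the module names from the list that cannot be imported."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing
```

Calling `missing_modules(THIRD_PARTY)` in the Python 2.7 environment used for the build returns the names that still need to be installed; an empty list means all of them imported.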

Lars Jensen's Tagger

This is available here.

R

R and the R packages dplyr, stringr, tidyr, data.table, and Hmisc.


Languages

  • Python 71.2%
  • Jupyter Notebook 27.2%
  • R 1.3%
  • Other 0.3%