GitHub - KeithKelleher/TCRD: Code to build the Target Central Resource Database (TCRD).

TCRD is the central resource behind the Illuminating the Druggable Genome Knowledge Management Center (IDG-KMC). TCRD contains information about human targets, with special emphasis on the three target families central to the NIH IDG initiative: GPCRs, kinases, and ion channels. Olfactory GPCRs (oGPCRs) are treated as a separate family. The official public portal for IDG-KMC is Pharos.

The code in this repository is for people wanting to rebuild a version of TCRD from scratch. If you just want to install TCRD locally, MySQL dumps of recent versions are available for download here.

Overview of the Build Process

There are 70+ datasets in TCRD and, generally, one loader script in the loaders directory to load each. Targets in TCRD correspond to reviewed human entries in UniProt and so UniProt data is loaded first.

Depending on the dataset, data is loaded via web APIs, files in various formats, or both. Some file-based loaders download the file(s) they need; others require you to download or obtain the file(s) manually before running. In all cases, the required directories and files are set as all-caps variables at the top of the loader script. Check each loader script before running it and make sure the necessary directories and files are in place.
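A quick pre-flight check can catch missing inputs before a long load starts. The sketch below is not part of TCRD. It assumes the all-caps settings are plain string literals assigned one per line at the top level of a loader, and it skips values that look like URLs. The helper names `find_path_settings` and `report_missing` are hypothetical, as is the example setting in the comment.

```python
import os
import re

# Matches top-level lines such as:  DOWNLOAD_DIR = '../data/Example/'
# (hypothetical name and path; assumes plain string literals, one per line)
SETTING_RE = re.compile(r"""^([A-Z][A-Z0-9_]*)\s*=\s*(['"])(.*?)\2""")


def find_path_settings(source):
    """Return {NAME: value} for all-caps string assignments in loader source."""
    settings = {}
    for line in source.splitlines():
        match = SETTING_RE.match(line)
        if match:
            settings[match.group(1)] = match.group(3)
    return settings


def report_missing(settings, base_dir="."):
    """Return the setting names whose values do not exist on disk.

    Relative values are resolved against base_dir. Values containing
    '://' are treated as URLs and skipped.
    """
    missing = []
    for name, value in sorted(settings.items()):
        if "://" in value:
            continue
        if not os.path.exists(os.path.join(base_dir, value)):
            missing.append(name)
    return missing
```

To use it, read a loader script's text, pass it to `find_path_settings`, and call `report_missing` with the directory you run the loader from. Settings built from expressions rather than literals are not detected, so treat the result as a hint rather than a guarantee.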

Some datasets require pre-processing steps (run by code in the R/ or python/ directories) before loading. A few also require manual steps to be performed after the load is completed.

doc/TCRD_Build_Notes.txt

This file has information for each dataset on the steps required and also an estimate of the time required. Some loaders run in a few minutes, others require days.

doc/README_v*.txt and doc/TCRDv*_Fixes.txt

These files contain all command lines, and most of their output, run to build and fix the corresponding version of TCRD. The notes in these files should help with the pre- and post-processing required for some of the datasets.

Loading Order

Some of the loaders need to be run before others. Importantly, the steps through TDLs should be run in the order they are listed in doc/TCRD_Build_Notes.txt. After that, loaders can be run in whatever order you like.
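Because the early steps are order-sensitive, it can help to run them from a small driver that stops at the first failure instead of carrying on with a half-built database. The following is a minimal sketch, not something shipped with TCRD: `run_in_order` is a hypothetical helper, and the commands shown are placeholders rather than real loader names.

```python
import subprocess
import sys


def run_in_order(commands):
    """Run each command in sequence and stop at the first non-zero exit.

    Returns the list of commands that completed successfully.
    """
    completed = []
    for command in commands:
        if subprocess.call(command) != 0:
            break
        completed.append(command)
    return completed


if __name__ == "__main__":
    # Hypothetical placeholders: take the real script names and their
    # order from doc/TCRD_Build_Notes.txt.
    ordered_steps = [
        [sys.executable, "-c", "print('step 1')"],
        [sys.executable, "-c", "print('step 2')"],
    ]
    done = run_in_order(ordered_steps)
    print("%d of %d steps completed" % (len(done), len(ordered_steps)))
```

Some datasets need manual pre- or post-processing, so a driver like this only suits the stretches of the build that are fully scripted.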

System Requirements

You will need a Linux or OSX system. You might be able to get things to work on Windows, but it would require a lot of fiddling and is not recommended. I would recommend at least 4 cores and 64GB of RAM.

Software Requirements

MySQL server

I am using MySQL Community Server 5.6.24, but any version from 5.5 onward should be fine.
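If you want to check a server against that minimum from a script, the version string returned by `SELECT VERSION()` can be compared as a tuple of integers. This is a small sketch under that assumption; `mysql_version_tuple` and `meets_minimum` are hypothetical helper names, and the check covers only the version number, not any other compatibility concern.

```python
import re


def mysql_version_tuple(version_string):
    """Turn a string such as '5.6.24-log' into (5, 6, 24)."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version_string)
    if match is None:
        raise ValueError("unrecognized version: %r" % version_string)
    # A missing patch component is treated as 0.
    return tuple(int(part or 0) for part in match.groups())


def meets_minimum(version_string, minimum=(5, 5)):
    """Return True if the server version is at least `minimum`."""
    return mysql_version_tuple(version_string)[:len(minimum)] >= minimum
```

For example, `meets_minimum("5.6.24")` is true and `meets_minimum("5.1.73")` is false.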

Python

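Python 2.7 and the following Python modules, which are not included in the standard library: BioPython, BeautifulSoup, docopt, goatools, httplib2, progressbar, KEGG_Graph, MySQLdb, networkx, numpy, and requests. The loaders also use urllib, urllib2, cPickle, cStringIO, csv, and shelve, which ship with the Python 2.7 standard library and need no separate installation.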
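Before starting a build, you can confirm that the modules are importable in the interpreter you plan to use. The sketch below is an illustration, not part of TCRD: `missing_modules` is a hypothetical helper, and the import names in the example list are assumptions, since a package's install name and import name can differ (BioPython, for instance, is imported as `Bio`). BeautifulSoup and KEGG_Graph are left out of the list because their import names depend on which version or copy you have.

```python
import importlib


def missing_modules(module_names):
    """Return the names from module_names that cannot be imported."""
    missing = []
    for name in module_names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Import names below are assumptions; adjust them to your installation.
    candidates = ["Bio", "docopt", "goatools", "httplib2", "progressbar",
                  "MySQLdb", "networkx", "numpy", "requests"]
    for name in missing_modules(candidates):
        print("missing: " + name)
```

Run it with the same Python 2.7 interpreter that will run the loaders, because a module installed for one interpreter is not necessarily visible to another.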

Lars Jensen's Tagger

This is available here.

R

R and the R packages dplyr, stringr, tidyr, data.table, and Hmisc.

