
Optimal query for literature references linking targets and ligands #94

Open
eric-czech opened this issue Jun 13, 2023 · 4 comments


@eric-czech

I would like to know what publications associate targets and ligands such that the publications explicitly note some interaction/relationship between the pair (not just the target or just the ligand). The query in #93 seemed like a reasonable place to start. Is there a better way to do this?

I would also like to run this query infrequently (monthly or quarterly at most) and with no filter, i.e. I'd like to capture all ligand <-> target relationships with citations.

Any suggestions on the best way to accomplish this would be appreciated. Thanks!

@KeithKelleher
Collaborator

KeithKelleher commented Jun 13, 2023

That query looks good for fetching the publications that we have for reporting each known target ligand interaction. There are a couple of things to add.

  1. Add a field alias for drugs. There's an issue open to fix this, but unless you tell the API that you want drugs AND ligands (i.e. approved and unapproved compounds), it will give you back only the ligands.

  2. Add a ligandCounts field as a sanity check that the numbers of drugs and ligands you're getting back are consistent.

    ligandCounts {
      name
      value
    }
    ligands(isdrug: false) {
      ligid
      name
      description
      isdrug
      activities {
        pubs {
          pmid
          title
          year
        }
      }
    }
    drugs: ligands(isdrug: true) {
      ligid
      name
      description
      isdrug
      activities {
        pubs {
          pmid
          title
          year
        }
      }
    }

If you want to run this query for all targets, you'll probably have to paginate the results; otherwise the query will be slow and the response very large. It seems you're doing that already, so that's good.
One optimization would be to filter your target list to Tchem and Tclin targets, since having a known chemical interaction is the main criterion for a target to no longer be considered Tdark or Tbio.

    "filter": {
      "facets": [
        {
          "facet": "Target Development Level",
          "values": ["Tclin", "Tchem"]
        }
      ]
    }
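As a rough illustration, the filter fragment above could be combined with the paging values into one set of GraphQL variables per request. This is only a sketch: `build_variables` is a hypothetical helper name, and the `offset` and `limit` names are illustrative. How a `filter` variable is declared and passed to `targets` in the query itself is not shown in this thread and should be checked against the Pharos schema.

```python
import json


def build_variables(offset, limit, levels=("Tclin", "Tchem")):
    """Build the variables for one page of targets (hypothetical helper).

    The "filter" structure mirrors the JSON fragment quoted in the thread;
    "offset" and "limit" are illustrative names for the paging values.
    """
    return {
        "offset": offset,
        "limit": limit,
        "filter": {
            "facets": [
                {
                    "facet": "Target Development Level",
                    "values": list(levels),
                }
            ]
        },
    }


def variables_as_json(offset, limit):
    """Serialize one page's variables for inspection or logging."""
    return json.dumps(build_variables(offset, limit), indent=2)
```

Calling `variables_as_json(0, 100)` gives a payload that can be inspected before it is sent with the query.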

The other thing to consider is that the data in TCRD (and subsequently Pharos) is a subset of ligand activities that come primarily from DrugCentral and ChEMBL; activities weaker than a threshold are not included.
Here is the blurb on Pharos about the criteria to be included:

Activity Thresholds: Activity values from DrugCentral and ChEMBL must be standardizable to -Log Molar units AND meet the following target-family-specific cutoffs:
GPCRs: <= 100nM
Kinases: <= 30nM
Ion Channels: <= 10μM
Non-IDG Family Targets: <= 1μM

If you want data outside those criteria, you'd probably want to get data straight from ChEMBL and DrugCentral.
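To make the cutoffs concrete, here is a small sketch that converts them to -log molar units and checks whether a given activity would pass. The family labels used as dictionary keys and the function names are illustrative, not identifiers from Pharos or TCRD; only the numeric cutoffs come from the blurb quoted above.

```python
import math

# Cutoffs quoted above, expressed in nanomolar.
# The family labels are illustrative keys, not official identifiers.
CUTOFFS_NM = {
    "GPCR": 100.0,
    "Kinase": 30.0,
    "Ion Channel": 10_000.0,  # 10 uM
}
DEFAULT_CUTOFF_NM = 1_000.0  # non-IDG family targets: 1 uM


def cutoff_nm(family):
    """Return the cutoff in nM for a family, falling back to the default."""
    return CUTOFFS_NM.get(family, DEFAULT_CUTOFF_NM)


def min_neg_log_molar(family):
    """Smallest -log10(molar) value that still meets the family's cutoff."""
    # 1 nM = 1e-9 M, so -log10(x nM in molar) = 9 - log10(x).
    return 9.0 - math.log10(cutoff_nm(family))


def passes_threshold(family, activity_nm):
    """True if an activity (in nM) is at or below the family's cutoff."""
    return activity_nm <= cutoff_nm(family)
```

For example, a GPCR activity needs a -log molar value of at least 7 (100 nM), while a target outside the listed families needs at least 6 (1 μM).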

@eric-czech
Author

Thanks again @KeithKelleher, that's extremely helpful! We'll try those improvements and you can close this if you'd like, otherwise I'll leave it open and report back for the sake of posterity (or if any other questions come up).

@KeithKelleher
Collaborator

Glad to help. Yes, let us know how it goes, and if there's anything else.

@Rahkovsky

Rahkovsky commented Jun 26, 2023

@KeithKelleher, thank you very much for your advice. We ran the following query, looping over the offset and limit values.

query

    query ($offset: Int!, $limit: Int!) {
      targets {
        targets(skip: $offset, top: $limit) {
          name
          sym
          uniprot
          facetValues(facetName: "Target Development Level")
          ligandCounts {
            name
            value
          }
          nonDrugLigands: ligands(isdrug: false, top: 10000) {
            ligid
            name
            description
            isdrug
            activities {
              pubs {
                pmid
                title
                year
              }
            }
          }
          DrugLigands: ligands(isdrug: true, top: 10000) {
            ligid
            name
            description
            isdrug
            activities {
              pubs {
                pmid
                title
                year
              }
            }
          }
        }
      }
    }
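The loop over offset and limit mentioned above might look roughly like the following. This is a sketch, not the code that was actually run: `iter_targets` and `make_fetcher` are hypothetical names, the endpoint URL is left as a parameter because it is not given in this thread, and the `data.targets.targets` path is inferred from the shape of the query.

```python
import json
import urllib.request


def make_fetcher(url, query):
    """Return a function that fetches one page of targets over HTTP.

    `url` is the GraphQL endpoint (an assumption, supplied by the caller);
    `query` is the query text with $offset and $limit variables.
    """
    def fetch_page(offset, limit):
        body = json.dumps(
            {"query": query, "variables": {"offset": offset, "limit": limit}}
        ).encode("utf-8")
        request = urllib.request.Request(
            url, data=body, headers={"Content-Type": "application/json"}
        )
        with urllib.request.urlopen(request) as response:
            payload = json.load(response)
        # Path inferred from the query: targets { targets(...) { ... } }
        return payload["data"]["targets"]["targets"]

    return fetch_page


def iter_targets(fetch_page, limit=100):
    """Yield targets page by page until a short or empty page is returned."""
    offset = 0
    while True:
        page = fetch_page(offset, limit)
        if not page:
            break
        yield from page
        if len(page) < limit:
            break  # last page reached
        offset += limit
```

Passing the fetch function in as an argument keeps the paging logic testable without network access; a stub that slices a local list can stand in for the real fetcher.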

We found that, by default, the API returns at most 10 ligands per protein. To override this, we need to add a top parameter with a sufficiently large value:

DrugLigands: ligands(isdrug: true, top:10000)
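With the per-target limit lifted, each target in the response can be flattened into (uniprot, ligid, pmid) rows, which is the target-ligand-publication association the issue asks about. The sketch below assumes the response uses the aliases and field names from the query above; the function names are illustrative, and treating missing or null `activities` and `pubs` as empty is a defensive assumption rather than documented API behaviour.

```python
def iter_citations(target):
    """Yield (uniprot, ligid, pmid) for every publication on every activity."""
    for alias in ("nonDrugLigands", "DrugLigands"):
        for ligand in target.get(alias) or []:
            for activity in ligand.get("activities") or []:
                for pub in activity.get("pubs") or []:
                    yield (target["uniprot"], ligand["ligid"], pub["pmid"])


def unique_citations(targets):
    """Collect de-duplicated citation rows across many targets, sorted."""
    return sorted({row for target in targets for row in iter_citations(target)})
```

De-duplicating with a set matters because the same publication can appear under more than one activity of the same ligand.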

The counts of unique proteins and unique protein-ligid combinations are almost identical. Curiously, we extract slightly more records from the DrugLigands + nonDrugLigands queries than the validation counts report:
[Screenshot 2023-06-26 at 6 58 51 PM]
Do you know what might be the reason for this?
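One way to narrow down a mismatch like this is to compare, per target, the number of ligand records returned, the number of distinct ligid values, and whatever ligandCounts reports. The sketch below uses only fields from the query above; `summarize_target` is a hypothetical helper, and it does not assume any particular names inside ligandCounts, it just passes them through for side-by-side inspection.

```python
def summarize_target(target):
    """Summarize returned ligand records against the reported counts."""
    records = (target.get("nonDrugLigands") or []) + (target.get("DrugLigands") or [])
    ligids = [ligand["ligid"] for ligand in records]
    # Pass ligandCounts through as a name -> value mapping without
    # interpreting the names, since they are not listed in this thread.
    reported = {c["name"]: c["value"] for c in target.get("ligandCounts") or []}
    return {
        "uniprot": target.get("uniprot"),
        "records": len(ligids),
        "unique_ligids": len(set(ligids)),
        "reported": reported,
    }


def targets_with_repeats(targets):
    """Return summaries for targets where some ligid appears more than once."""
    summaries = (summarize_target(t) for t in targets)
    return [s for s in summaries if s["records"] != s["unique_ligids"]]
```

If `targets_with_repeats` comes back empty, repeated ligid values are not the explanation and the difference has to come from somewhere else.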
