fix: change definition of negative l2g evidence #255

Merged (29 commits) on Nov 27, 2023


@ireneisdoomed (Contributor) commented Nov 16, 2023

This PR fixes issues 2 and 3 identified in #3157:

  • Thorough refactor of the logic that processes the raw gold standard curations for L2G:
    • The logic has been modularised into separate functions.
    • OpenTargetsL2GGoldStandard contains the logic to parse the curation and build the negative set, as this is inherent to our curation.
    • L2GGoldStandard contains the logic to remove false negatives and redundant associations, as this is something we would like to do on any gold standard.
  • Tests for each function. Writing them led me to rewrite the core pieces of logic and fix other bugs.
  • Redefinition of how negative evidence is built: the negative set consists of all genes that are within 5 Mb of the lead variant and are not part of the positive set (@Daniel-Considine). After this initial negative set is defined, false negatives are refined in remove_false_negatives.
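A minimal, in-memory sketch of the negative-set definition in the last bullet, written in plain Python rather than the PySpark code of this PR. The `Gene` and `CuratedLocus` types and the `build_gold_standard` helper are hypothetical, and measuring distance to a single coordinate per gene (for example its TSS) is an assumption. The later `remove_false_negatives` refinement is not reproduced here.

```python
from dataclasses import dataclass

# Assumption: "within 5 Mb" means |gene position - lead position| <= 5,000,000 bp.
WINDOW_BP = 5_000_000


@dataclass(frozen=True)
class Gene:
    """Hypothetical gene record with one coordinate per gene (e.g. its TSS)."""

    gene_id: str
    chromosome: str
    position: int


@dataclass(frozen=True)
class CuratedLocus:
    """Hypothetical curated studyLocus with its lead variant and positive genes."""

    study_locus_id: int
    chromosome: str
    lead_position: int
    positive_genes: frozenset[str]


def build_gold_standard(loci, genes, window_bp=WINDOW_BP):
    """Return (study_locus_id, gene_id, label) rows for every curated locus.

    Positives come straight from the curation. Negatives are all other genes on
    the same chromosome whose position is within window_bp of the lead variant.
    """
    rows = []
    for locus in loci:
        for gene_id in sorted(locus.positive_genes):
            rows.append((locus.study_locus_id, gene_id, "positive"))
        for gene in genes:
            in_window = (
                gene.chromosome == locus.chromosome
                and abs(gene.position - locus.lead_position) <= window_bp
            )
            if in_window and gene.gene_id not in locus.positive_genes:
                rows.append((locus.study_locus_id, gene.gene_id, "negative"))
    return rows


# Toy usage: G1 is curated as positive, G2 is 2 Mb away (negative),
# G3 is 7 Mb away and G4 is on another chromosome (both excluded).
genes = [
    Gene("G1", "1", 1_000_000),
    Gene("G2", "1", 4_000_000),
    Gene("G3", "1", 9_000_000),
    Gene("G4", "2", 1_500_000),
]
loci = [CuratedLocus(1, "1", 2_000_000, frozenset({"G1"}))]
print(build_gold_standard(loci, genes))
# [(1, 'G1', 'positive'), (1, 'G2', 'negative')]
```

A gene exactly 5 Mb away counts as inside the window here (`<=`). The PR does not state whether the real implementation uses an inclusive bound.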

QC

  • Number of initial studyLocus in the raw gold standard (high or medium quality): 1201
  • Number of parsed studyLocus in L2GGoldStandard: 1201
  • Distribution of gold standard labels (before removing redundant associations and false negatives):
+---------------+-----+                                                         
|goldStandardSet|count|
+---------------+-----+
|       positive| 1225|
|       negative|15282|
+---------------+-----+
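As a quick reading of the table above, the labels are heavily imbalanced before the false-negative and redundancy filters run. The snippet only reuses the two counts shown; no other data is assumed.

```python
# Counts copied from the QC table (before removing redundant
# associations and false negatives).
label_counts = {"positive": 1225, "negative": 15282}

total = sum(label_counts.values())
positive_share = label_counts["positive"] / total
negatives_per_positive = label_counts["negative"] / label_counts["positive"]

print(f"total={total}")
print(f"positive share={positive_share:.2%}")
print(f"negatives per positive={negatives_per_positive:.1f}")
# total=16507
# positive share=7.42%
# negatives per positive=12.5
```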

@codecov-commenter commented Nov 17, 2023

Codecov Report

Merging #255 (5de5b77) into main (024d084) will increase coverage by 0.09%.
The diff coverage is 88.23%.

Additional details and impacted files


@@            Coverage Diff             @@
##             main     #255      +/-   ##
==========================================
+ Coverage   86.12%   86.21%   +0.09%     
==========================================
  Files          85       85              
  Lines        1931     1952      +21     
==========================================
+ Hits         1663     1683      +20     
- Misses        268      269       +1     
Files                                                  Coverage Δ
src/otg/dataset/l2g_prediction.py                      90.47% <100.00%> (ø)
src/otg/dataset/study_locus_overlap.py                 100.00% <100.00%> (ø)
...c/otg/datasource/open_targets/l2g_gold_standard.py  100.00% <100.00%> (ø)
src/otg/dataset/l2g_gold_standard.py                   90.62% <90.00%> (+4.91%) ⬆️
src/otg/l2g.py                                         56.66% <33.33%> (ø)
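The headline numbers in the coverage diff are consistent with line coverage computed as hits / lines (the report lists no partial lines). A quick check, assuming Codecov truncates rather than rounds to two decimals: rounding 1683/1952 would give 86.22%, not the reported 86.21%. The `coverage_pct` helper is hypothetical.

```python
from decimal import ROUND_DOWN, Decimal


def coverage_pct(hits: int, lines: int) -> Decimal:
    """Line coverage as a percentage, truncated (not rounded) to 2 decimals."""
    pct = Decimal(hits) / Decimal(lines) * 100
    return pct.quantize(Decimal("0.01"), rounding=ROUND_DOWN)


# (hits, lines) taken from the coverage diff above.
base_hits, base_lines = 1663, 1931  # main
head_hits, head_lines = 1683, 1952  # #255

base = coverage_pct(base_hits, base_lines)
head = coverage_pct(head_hits, head_lines)

print(f"main: {base}% ({base_lines - base_hits} misses)")
print(f"#255: {head}% ({head_lines - head_hits} misses)")
print(f"change: +{head - base}%")
# main: 86.12% (268 misses)
# #255: 86.21% (269 misses)
# change: +0.09%
```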

@ireneisdoomed marked this pull request as ready for review on November 21, 2023 08:44

@Daniel-Considine left a comment

Gold standard negative expansion logic looks good, replicating what is currently in production.

@@ -8,7 +8,7 @@ wandb_run_name: null
perform_cross_validation: false
model_path: ${datasets.l2g_model}
predictions_path: ${datasets.l2g_predictions}
-study_locus_path: ${datasets.study_locus}
+study_locus_path: ${datasets.credible_set}
Collaborator


Suggested change:
-study_locus_path: ${datasets.credible_set}
+credible_set_path: ${datasets.credible_set}

interactions_df = cls.process_gene_interactions(interactions)

return (
OpenTargetsL2GGoldStandard.as_l2g_gold_standard(gold_standard_curation, v2g)
Collaborator


Consider moving this to datasource.

@ireneisdoomed merged commit cd3c325 into main on Nov 27, 2023
1 check passed
@ireneisdoomed deleted the il-l2g-negative-gs branch on November 27, 2023 09:52

4 participants