fix: change definition of negative l2g evidence #255

ireneisdoomed · 2023-11-16T17:34:42Z

This PR fixes issues 2 and 3 identified in #3157:

- Thorough refactor of the logic that processes the raw gold standard curations for L2G.
  - The logic has been modularised into separate functions.
  - `OpenTargetsL2GGoldStandard` contains the logic to parse the curation and build the negative set, as this is inherent to our curation.
  - `L2GGoldStandard` contains the logic to remove false negatives and redundant associations, as this is something we would want to do on any gold standard.
- Tests for each function, which led me to rewrite the core pieces of logic and fix other bugs.
- Redefinition of how negative evidence is built: the negative set consists of all genes that are within a 5 Mb distance of the lead variant and are not part of the positive set (@Daniel-Considine). After this initial set of negative evidence is defined, false negatives are filtered out in `remove_false_negatives`.
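A minimal sketch of the negative-set definition described above, in plain Python rather than the Spark code in this PR. The `Gene` class, `build_negative_set`, the use of a gene's TSS to measure distance, and the per-locus treatment of positives are all assumptions made for illustration; only the 5 Mb window and the "not in the positive set" rule come from the description.

```python
from dataclasses import dataclass

WINDOW_BP = 5_000_000  # 5 Mb, as stated in the PR description


@dataclass(frozen=True)
class Gene:
    gene_id: str
    chromosome: str
    tss: int  # assumed reference point for the distance to the lead variant


def build_negative_set(lead_variants, positives, genes, window=WINDOW_BP):
    """Return (study_locus_id, gene_id) pairs labelled negative.

    lead_variants: dict of study_locus_id -> (chromosome, position)
    positives: set of curated (study_locus_id, gene_id) pairs
    genes: iterable of Gene (hypothetical gene index)
    """
    negatives = set()
    for locus_id, (chrom, pos) in lead_variants.items():
        for gene in genes:
            nearby = gene.chromosome == chrom and abs(gene.tss - pos) <= window
            # Assumption: a gene is excluded only if it is positive for this locus.
            if nearby and (locus_id, gene.gene_id) not in positives:
                negatives.add((locus_id, gene.gene_id))
    return negatives


# Toy data, not taken from the real gold standard.
lead_variants = {"locus1": ("1", 10_000_000)}
positives = {("locus1", "geneA")}
genes = [
    Gene("geneA", "1", 10_200_000),  # curated positive, never a negative
    Gene("geneB", "1", 12_000_000),  # within 5 Mb, becomes a negative
    Gene("geneC", "1", 20_000_000),  # more than 5 Mb away
    Gene("geneD", "2", 10_000_000),  # different chromosome
]
negatives = build_negative_set(lead_variants, positives, genes)
```

The later refinement in `remove_false_negatives` is not modelled here, since the description does not spell out its rule.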

QC

Number of initial studyLocus in the raw gold standard (high or medium quality): 1201
Number of parsed studyLocus in L2GGoldStandard: 1201
Distribution of gold standard labels (before removing redundant associations and false negatives):

+---------------+-----+                                                         
|goldStandardSet|count|
+---------------+-----+
|       positive| 1225|
|       negative|15282|
+---------------+-----+
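The QC above can be reproduced with a check along these lines. This is a hedged sketch on toy rows, not the PR's Spark code: `label_distribution` and `count_study_loci` are hypothetical helpers, and the column names other than `goldStandardSet` are assumed.

```python
from collections import Counter

# Toy rows standing in for the gold standard:
# (studyLocusId, geneId, goldStandardSet). Not real data.
rows = [
    ("sl1", "g1", "positive"),
    ("sl1", "g2", "negative"),
    ("sl1", "g3", "negative"),
    ("sl2", "g4", "positive"),
]


def label_distribution(rows):
    """Count rows per gold standard label."""
    return Counter(label for _, _, label in rows)


def count_study_loci(rows):
    """Count distinct study loci, to confirm none were lost while parsing."""
    return len({locus_id for locus_id, _, _ in rows})


distribution = label_distribution(rows)
n_loci = count_study_loci(rows)
```

Comparing `n_loci` before and after parsing is the same check as the 1201 versus 1201 comparison reported above.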

codecov-commenter · 2023-11-17T13:07:58Z

Codecov Report

Merging #255 (5de5b77) into main (024d084) will increase coverage by 0.09%.
The diff coverage is 88.23%.

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #255      +/-   ##
==========================================
+ Coverage   86.12%   86.21%   +0.09%     
==========================================
  Files          85       85              
  Lines        1931     1952      +21     
==========================================
+ Hits         1663     1683      +20     
- Misses        268      269       +1

| Files | Coverage Δ |
|---|---|
| src/otg/dataset/l2g_prediction.py | `90.47% <100.00%> (ø)` |
| src/otg/dataset/study_locus_overlap.py | `100.00% <100.00%> (ø)` |
| ...c/otg/datasource/open_targets/l2g_gold_standard.py | `100.00% <100.00%> (ø)` |
| src/otg/dataset/l2g_gold_standard.py | `90.62% <90.00%> (+4.91%)` ⬆️ |
| src/otg/l2g.py | `56.66% <33.33%> (ø)` |

Daniel-Considine

Gold standard negative expansion logic looks good, replicating what is currently in production.

d0choa · 2023-11-23T15:58:05Z

config/step/locus_to_gene.yaml

@@ -8,7 +8,7 @@ wandb_run_name: null
 perform_cross_validation: false
 model_path: ${datasets.l2g_model}
 predictions_path: ${datasets.l2g_predictions}
-study_locus_path: ${datasets.study_locus}
+study_locus_path: ${datasets.credible_set}


Suggested change

-study_locus_path: ${datasets.credible_set}
+credible_set_path: ${datasets.credible_set}

d0choa · 2023-11-23T16:05:16Z

src/otg/dataset/l2g_gold_standard.py

+        interactions_df = cls.process_gene_interactions(interactions)
+
+        return (
+            OpenTargetsL2GGoldStandard.as_l2g_gold_standard(gold_standard_curation, v2g)


Consider moving this to `datasource`.

ireneisdoomed and others added 14 commits November 10, 2023 14:39

chore: changes in config

cf2be5b

Merge branch 'main' of https://github.com/opentargets/genetics_etl_python into il-run-l2g

4690e15

Merge branch 'main' of https://github.com/opentargets/genetics_etl_python into il-run-l2g

e2f4d9a

Merge branch 'main' of https://github.com/opentargets/genetics_etl_python into il-run-l2g

ee461d4

Merge branch 'dependabot/pip/wandb-0.16.0' of https://github.com/opentargets/genetics_etl_python into il-run-l2g

8759f43

fix: change definition of negative l2g evidence

44766d2

Merge branch 'main' into il-l2g-negative-gs

e6b20f1

refactor: modularise logic for gold standards

e17df5b

refactor: move hardcoded values to constants

f7eba79

refactor: turn OpenTargetsL2GGoldStandard into class methods

65be470

refactor(gold_standard): move logic to refine gold standards to `L2GGoldStandard`

1518156

Merge branch 'il-l2g-negative-gs' of https://github.com/opentargets/genetics_etl_python into il-l2g-negative-gs

a33e797

test: add test_parse_positive_curation

ab29c9a

test: fix and test logic in expand_gold_standard_with_negatives

dd95d9c

ireneisdoomed added 12 commits November 17, 2023 15:09

test: add test_expand_gold_standard_with_negatives_same_positives

8347b2f

test: testing for process_gene_interactions

ca94412

chore: add variantId to gold standards schema

6a33976

chore: change sources in gold standards schema to a nullable

c75a663

test: add test_filter_unique_associations

8007726

feat(overlaps): add and test method to transform the overlaps as a square matrix

9c0a042

chore(overlaps): chromosome and statistics are not mandatory fields in the schema

dc7c423

feat(l2g_gold_standard): change filter_unique_associations logic

28031b8

test(l2g_gold_standard): add test_remove_false_negatives

aa4246c

fix(l2g_gold_standard): fix logic in remove_false_negatives

0a1ffa0

chore(gold_standards): define gs labels as L2GGoldStandard attributes

2f13b3b

Merge branch 'main' of https://github.com/opentargets/genetics_etl_python into il-l2g-negative-gs

186d2b3

ireneisdoomed marked this pull request as ready for review November 21, 2023 08:44

ireneisdoomed requested review from d0choa and DSuveges November 21, 2023 08:47

ireneisdoomed requested a review from Daniel-Considine November 21, 2023 08:47

Merge branch 'main' of https://github.com/opentargets/genetics_etl_python into il-l2g-negative-gs

1c6040b

Daniel-Considine approved these changes Nov 22, 2023

d0choa reviewed Nov 23, 2023

d0choa approved these changes Nov 23, 2023

ireneisdoomed added 2 commits November 27, 2023 10:46

chore: rename study_locus to credible_set for l2g

4e4e4f5

Merge branch 'main' of https://github.com/opentargets/genetics_etl_python into il-l2g-negative-gs

5de5b77

ireneisdoomed force-pushed the il-l2g-negative-gs branch from 8298ad5 to 5de5b77 Compare November 27, 2023 09:47

ireneisdoomed merged commit cd3c325 into main Nov 27, 2023
1 check passed

ireneisdoomed deleted the il-l2g-negative-gs branch November 27, 2023 09:52

ireneisdoomed mentioned this pull request Nov 29, 2023

QC locus to gene gold standards processing opentargets/issues#3157

Closed
