refactor: remove gene_index step #946

vivienho · 2024-12-12T16:50:03Z

✨ Context

Gentropy uses a gene_index that is generated from the target index in the platform etl via the gene_index step. The gene_index step is redundant, and we now want to use the target index from the platform etl directly.

This PR is also connected to these PRs in the orchestration and platform-etl-backend repos.

Note: After this PR is merged, gentropy pipelines will depend on a target_index dataset generated by the platform etl that includes a tss column, so the gentropy pipeline can no longer be run on its own. In case the gentropy pipelines do need to be run independently, I have generated a patched target_index dataset from gs://open-targets-pre-data-releases/24.12-uo_test-3/output/etl/parquet/targets with the tss column added. The patched target_index can be found here: gs://ot-team/vivien/gentropy_patched_datasets/target_index_with_tss_column
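
A quick column check before starting a run can catch an unpatched target_index early. The snippet below is only a minimal sketch of that idea, not code from this PR: `missing_columns` is a hypothetical helper, and the column lists are made-up stand-ins for whatever the loaded dataset reports (for example `df.columns` in Spark). Only the `tss` requirement comes from the note above.

```python
from typing import Iterable


def missing_columns(
    available: Iterable[str], required: Iterable[str] = ("tss",)
) -> list[str]:
    """Return the required columns absent from `available`, in the order given."""
    available_set = set(available)
    return [col for col in required if col not in available_set]


# Hypothetical column lists standing in for the columns of a loaded dataset.
patched = ["id", "approvedSymbol", "tss"]
unpatched = ["id", "approvedSymbol"]

print(missing_columns(patched))    # nothing missing
print(missing_columns(unpatched))  # reports the absent tss column
```

A non-empty result would indicate that the dataset still needs the tss patch before gentropy can use it.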

🛠 What does this PR implement

The PR removes all files/code related to the gene_index step.
All files/code with gene_index are renamed to target_index.
Although the gene_index is essentially a subset of columns from the target_index, some field names differ, so the PR renames fields to be compatible with the target_index where necessary.
The gene_index json schema has been replaced with the target_index schema derived from a target_index dataset generated by the platform etl.
The gene_index step was used as an example in the docs here and here. The gwas_catalog_sumstat_preprocess step is now used as the example instead, as it has inputs of similar complexity.
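
To illustrate the kind of field renaming described above, here is a minimal sketch in plain Python. It is an assumption-laden illustration, not the code in this PR: `rename_fields` is a hypothetical helper, and the `geneId` to `id` mapping and the sample row are placeholders. The actual renamed fields are the ones in the diff.

```python
def rename_fields(record: dict, mapping: dict[str, str]) -> dict:
    """Return a copy of `record` with keys renamed per `mapping`.

    Keys not listed in `mapping` pass through unchanged.
    """
    return {mapping.get(key, key): value for key, value in record.items()}


# Illustrative mapping only (hypothetical field names, not taken from the PR).
GENE_TO_TARGET = {"geneId": "id"}

old_row = {"geneId": "ENSG00000000001", "tss": 12345}
new_row = rename_fields(old_row, GENE_TO_TARGET)
print(new_row)
```

Keeping the mapping in one dictionary makes it easy to see at a glance which fields differ between the two schemas.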

🙈 Missing

🚦 Before submitting

Do these changes cover one single feature (one change at a time)?
Did you read the contributor guideline?
Did you make sure to update the documentation with your changes?
Did you make sure there is no commented out code in this PR?
Did you follow conventional commits standards in PR title and commit messages?
Did you make sure the branch is up-to-date with the dev branch?
Did you write any new necessary tests?
Did you make sure the changes pass local tests (make test)?
Did you make sure the changes pass pre-commit rules (e.g poetry run pre-commit run --all-files)?

DSuveges

Very thorough refactoring of this huge codebase! The gene index is phased out in favour of the target index, which is now shared between the platform and gentropy. Luckily there was not much logic refactoring.

DSuveges · 2025-01-22T14:05:34Z

docs/howto/command_line/run_step_in_cli.md

@@ -28,16 +27,16 @@ Available options:
 Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.
 ```

-As indicated, you can run a step by specifying the step's name with the `step` argument. For example, to run the `gene_index` step, you can run:
+As indicated, you can run a step by specifying the step's name with the `step` argument. For example, to run the `gwas_catalog_sumstat_preprocess` step, you can run:


Thanks for updating the documentation! I keep forgetting about it.

DSuveges · 2025-01-22T14:18:37Z

tests/gentropy/conftest.py

+def mock_target_index(spark: SparkSession) -> TargetIndex:
+    """Mock target index dataset."""
+    ti_schema = TargetIndex.get_schema()


Theoretical question: wouldn't it make more sense to use an actual data sample from the target index?

Probably? There is a sample_target_index right above it in the code, but it doesn't seem to be used.

vivienho added 3 commits December 5, 2024 14:20

refactor: rename all gene_index related files to target_index

5e1a475

refactor: rename gene_index to target_index in various files

26abb1b

refactor: rename gene_index to target_index in tests

19bb7c9

github-actions bot added documentation Improvements or additions to documentation size-M Refactor Dataset Step Datasource labels Dec 12, 2024

vivienho added 4 commits January 13, 2025 17:30

Merge branch 'dev' into vh-gene-index-to-target-index

98bc6bc

refactor: rename fields to be compatible with new target_index

d26ddfd

refactor: replace examples that use gene_index

307b7cc

refactor: delete gene_index step files

306de99

DSuveges mentioned this pull request Jan 15, 2025

feat(qtls): flagging trans QTL credible sets #973

Merged

vivienho added 8 commits January 15, 2025 12:16

refactor: remove gene_index step from config

a8c6d07

feat: replace the gene_index schema with the target_index schema

1d3f975

refactor: modify mock_target_index

fc8df50

fix: remove mock_target_index from test_validate_schema_missing_field

8331efc

refactor: delete target.md

92e6da6

fix: fix study index validation tests

b5f4492

fix: fix l2g feature tests

59dd931

Merge branch 'dev' into vh-gene-index-to-target-index

5444ea5

github-actions bot added size-XL and removed size-M labels Jan 20, 2025

pre-commit-ci bot and others added 3 commits January 20, 2025 14:48

chore: pre-commit auto fixes [...]

91796f2

revert: revert column name after merging from dev

20fc538

Merge branch 'vh-gene-index-to-target-index' of https://github.com/op…

3582b34

…entargets/gentropy into vh-gene-index-to-target-index

vivienho marked this pull request as ready for review January 20, 2025 16:12

vivienho requested a review from DSuveges January 20, 2025 16:12

Merge branch 'dev' into vh-gene-index-to-target-index

024975c

DSuveges approved these changes Jan 22, 2025

DSuveges merged commit c847921 into dev Jan 22, 2025
5 checks passed

DSuveges deleted the vh-gene-index-to-target-index branch January 22, 2025 14:20
