Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(l2g): implement variant consequence features from VEP #805

Merged
merged 18 commits into from
Oct 3, 2024

Conversation

ireneisdoomed
Copy link
Contributor

@ireneisdoomed ireneisdoomed commented Oct 1, 2024

✨ Context

Closes opentargets/issues#3554

🛠 What does this PR implement

🙈 Missing

🚦 Before submitting

  • Do these changes cover one single feature (one change at a time)?
  • Did you read the contributor guideline?
  • Did you make sure to update the documentation with your changes?
  • Did you make sure there is no commented out code in this PR?
  • Did you follow conventional commits standards in PR title and commit messages?
  • Did you make sure the branch is up-to-date with the dev branch?
  • Did you write any new necessary tests?
  • Did you make sure the changes pass local tests (make test)?
  • Did you make sure the changes pass pre-commit rules (e.g poetry run pre-commit run --all-files)?

@github-actions github-actions bot added documentation Improvements or additions to documentation size-L Method Dataset Step Feature labels Oct 1, 2024
)
.join(consequences_dataset, "variantId")
)
elif isinstance(study_loci_to_annotate, L2GGoldStandard):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we still have L2GGoldStandard? Isn't it going to be removed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As agreed offline, we'll do that once we tackle opentargets/issues#3525

f.when(
f.col("biotype") == "protein_coding", f.col(local_feature_name)
)
).over(Window.partitionBy("studyLocusId")),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How do we assign the genes for this studyLocusId?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Genes come from VEP. It's the genes of the transcript that the variant overlaps with in a 500kb window

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For variantId - yes, but we have many varintIds in one studyLocusId. So just to be sure - we use the union across al varaintIds? And if varintId is not assigned to geneId we assume its VEP score is 0?

@ireneisdoomed ireneisdoomed changed the base branch from dev to il-csq_to_score_property October 3, 2024 10:50
@ireneisdoomed
Copy link
Contributor Author

@addramir As discussed, the maximum score is fetched from the variant index directly. This branch now has #811 as a base. I'd suggest merging that one first.

@addramir
Copy link
Contributor

addramir commented Oct 3, 2024

Approved

Base automatically changed from il-csq_to_score_property to dev October 3, 2024 13:44
@ireneisdoomed ireneisdoomed merged commit 8876fc1 into dev Oct 3, 2024
5 checks passed
@ireneisdoomed ireneisdoomed deleted the il-3554 branch October 3, 2024 16:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Dataset documentation Improvements or additions to documentation Feature Method size-L Step
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Implement variant consequence features from VEP into L2G
2 participants