Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Require both overlap and breakend proximity for depth-only SV clustering #8962

Merged
merged 2 commits into from
Sep 11, 2024

Conversation

mwalker174
Copy link
Contributor

SV clustering uses both reciprocal overlap and a maximum endpoint distance criteria. Currently, calls with only depth evidence are clustered differently from other classes in that only one of these two is required. However, this is confusing, unnecessary, and problematic in certain cases (it was convenient originally for the first modules ported in GATK-SV).

This PR changes depth-only calls to be treated as other classes, requiring both criteria to be met.

Note that increasing the endpoint window to a very high number (10Mbp) effectively retains the original behavior and is now the default value.

@cwhelan
Copy link
Member

cwhelan commented Sep 10, 2024

Two questions:

getMaxClusterableStartingPositionWithParams in CanonicalSVLinkage uses the window to determine the max clusterable position. Will setting the value to 10MB make everything look clusterable to this method, potentially bogging down the algorithm for large callsets?

Is there a reason to keep the keep the old code around if this is the intended way to disable the proximity check (setting the window very large)? Seems like an opportunity to simplify if you don't want to support that special case anymore.

@mwalker174
Copy link
Contributor Author

Two questions:

getMaxClusterableStartingPositionWithParams in CanonicalSVLinkage uses the window to determine the max clusterable position. Will setting the value to 10MB make everything look clusterable to this method, potentially bogging down the algorithm for large callsets?

getMaxClusterableStartingPositionWithParams should be smart enough to use enough both the window and reciprocal overlap to calculate the max position, taking the min of the two if both are required. Effectively we're just disabling the window requirement so the RO will always be used I think.

Is there a reason to keep the keep the old code around if this is the intended way to disable the proximity check (setting the window very large)? Seems like an opportunity to simplify if you don't want to support that special case anymore.

I was thinking of doing that, but we're probably also going to start doing some experimenting with new clustering strategies in the near future so I wanted to keep it in just in case. I'll add a comment noting that the AND vs OR functionality is not used anymore.

Copy link
Member

@cwhelan cwhelan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds reasonable, looks good to me.

@mwalker174 mwalker174 merged commit 6bb5217 into master Sep 11, 2024
20 checks passed
@mwalker174 mwalker174 deleted the mw_svcluster_depth_window branch September 11, 2024 16:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants