Skip to content

pull in new libbdsg with ref overlay fix#4821

Open
glennhickey wants to merge 2 commits intomasterfrom
decon-mem-regression
Open

pull in new libbdsg with ref overlay fix#4821
glennhickey wants to merge 2 commits intomasterfrom
decon-mem-regression

Conversation

@glennhickey
Copy link
Contributor

Changelog Entry

To be copied to the draft changelog by merger:

  • Fix (unreleased) memory usage regression in vg call and vg deconstruct.

Description

I've been breaking my head for days now trying to get #4804 tested to the point where I can merge it. But every time I'd run sanity checks on the hprc v2 gbz it would run out of memory, and I was sure it was due to some new indexes not scaling to whole-genome.

I was comparing to the latest release binary where everything was working fine, but it turns out the issue is in master by way of, I think #4795, and so it took forever for me to clue in... Since vg deconstruct and vg call, the two tools I'm testing, happen to use the ReferencePathOverlay, and the overlay was apparently changed to index every single path, including haplotypes, in the GBZ, these two tools became unusable at scale regardless of if you have nested paths or not...

This PR updates libdsg to explicitly select for reference paths when creating the overlay to revert back to the original logic (vgteam/libbdsg#236)

@glennhickey
Copy link
Contributor Author

@adamnovak Do you mind taking a quick look at this when you have a chance? I'm worried we may be working at cross purposes here... Thanks!

@adamnovak
Copy link
Member

I think I must have made a mistake implementing the non-hiding of paths, if it resulted in the ReferencePathOverlay indexing everything for fast position query. I don't think I'm trying to do something opposed to what you're trying to have happening.

We want the methods that iterate over paths to show you all paths now, not just reference paths. But we don't always want to guarantee that all paths have O(1) position lookups; we only want that for reference paths. The ReferencePathOverlay is meant to do that indexing just for reference paths.

Copy link
Member

@adamnovak adamnovak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like it should work. I did indeed make the reference path overlay start indexing everything, because I forgot to add exactly the path sense query that this is adding.

This probably explains why my Giraffe startup time has been weirdly slow lately.

@glennhickey
Copy link
Contributor Author

This looks like it should work. I did indeed make the reference path overlay start indexing everything, because I forgot to add exactly the path sense query that this is adding.

This probably explains why my Giraffe startup time has been weirdly slow lately.

Great, thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants