ANN_BENCH: CAGRA-HNSW build in managed memory #1058
Draft
achirkin wants to merge 10 commits into branch-25.08 from
Conversation
Contributor (Author)
/ok to test
This PR replaces the standard configured_raft_resources handle with a customized handle for the CAGRA-HNSW benchmark. The new resource handle uses a single managed memory resource for everything: the RMM default memory resource, the RAFT workspace resource, and the RAFT large workspace resource. For the RAFT workspace resource, a pool is layered on top as usual to speed up frequent allocations.
The rationale behind this change is to allow using all available GPU memory through all stages of CAGRA build.
Before this change, the default setup uses a regular device memory pool for everything except large allocations; the large_memory_resource uses managed memory. The problem with this behavior is that the pool grows during the internal IVF-PQ build/search (the whole IVF-PQ index is stored in it) but doesn't shrink back during the graph optimization stage. As a result, the large allocations during the optimization stage severely oversubscribe UVM and grind performance to a halt.
With the new change, the RMM default memory resource is not part of the pool. Hence the pool stays relatively small (limited by the workspace resource adapter), and even the small pool that remains can be paged out by UVM when it's not actively in use.
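A minimal sketch of such a handle, assuming recent RMM/RAFT C++ APIs. This is illustrative, not the PR's actual diff: the class name, the initial pool size, and the exact RAFT setter may differ between versions (the code requires a CUDA device and the RMM/RAFT headers to build).

```cpp
#include <memory>

#include <rmm/mr/device/managed_memory_resource.hpp>
#include <rmm/mr/device/per_device_resource.hpp>
#include <rmm/mr/device/pool_memory_resource.hpp>

#include <raft/core/resource/device_memory_resource.hpp>
#include <raft/core/resources.hpp>

// Sketch: one managed memory resource backs the RMM default resource,
// the RAFT workspace resource, and the RAFT large workspace resource,
// so UVM can page allocations in and out across the CAGRA build stages.
class managed_raft_resources {
 public:
  managed_raft_resources()
    : workspace_pool_{std::make_shared<
        rmm::mr::pool_memory_resource<rmm::mr::managed_memory_resource>>(
        &managed_mr_, 1024ull * 1024 * 1024 /* illustrative 1 GiB initial size */)}
  {
    // RMM default resource: plain managed memory, no pool. Large
    // allocations (e.g. the internal IVF-PQ index) are pageable by UVM
    // and returned to the driver as soon as they are freed.
    rmm::mr::set_current_device_resource(&managed_mr_);

    // RAFT workspace: a pool on top of the same managed memory keeps
    // frequent small allocations fast; being managed memory, even the
    // pool can be paged out when it is not actively in use.
    raft::resource::set_workspace_resource(res_, workspace_pool_);
  }

  raft::resources const& handle() const { return res_; }

 private:
  rmm::mr::managed_memory_resource managed_mr_;
  std::shared_ptr<rmm::mr::pool_memory_resource<rmm::mr::managed_memory_resource>>
    workspace_pool_;
  raft::resources res_;
};
```

The key design choice is that the pool wraps only the workspace resource, not the default resource, so the pool's footprint stays bounded while large one-off allocations go straight to pageable managed memory.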