Replace hardcoded buffer sizes with cuMemGetAllocationGranularity #82

pavanbalaji · 2025-12-08T23:40:04Z

Summary:
Previously, the code used SFINAE templates to detect and use
c10::CachingAllocator::kLargeBuffer and kSmallBuffer, falling back to
hardcoded values of 20MB and 2MB respectively when they were not found.
Because different PyTorch versions define these buffer sizes in
different locations, the detection was unreliable across versions.
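
For readers unfamiliar with the pattern being removed, here is a rough sketch of the general detect-or-fall-back idiom. It is not the removed code: the struct names are hypothetical stand-ins, the constant is modeled as a class static member for simplicity, and the 16MB value is made up so that the two paths give different results.

```cpp
#include <cstddef>
#include <type_traits>

// Hypothetical stand-ins for two library versions: one exposes the
// constant, the other does not. The 16MB value is illustrative only.
struct AllocatorWithConstant {
  static constexpr std::size_t kLargeBuffer = 16 * 1024 * 1024;
};
struct AllocatorWithoutConstant {};

// Primary template: chosen when T::kLargeBuffer does not exist.
// Uses the hardcoded 20MB fallback mentioned in the summary.
template <typename T, typename = void>
struct LargeBufferSize {
  static constexpr std::size_t value = 20 * 1024 * 1024;
};

// Partial specialization: chosen via SFINAE when T::kLargeBuffer is a
// valid expression.
template <typename T>
struct LargeBufferSize<T, std::void_t<decltype(T::kLargeBuffer)>> {
  static constexpr std::size_t value = T::kLargeBuffer;
};
```

The weakness is visible even in this toy version: if the constant moves or is renamed, the primary template is selected silently and the fallback value is used without any diagnostic.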

This diff simplifies the implementation by directly querying the
allocation granularity from the CUDA driver using
cuMemGetAllocationGranularity. This ensures we use the correct,
device-specific granularity for memory registration chunks, making the
code more robust and portable across different GPU architectures and
PyTorch versions.
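
A minimal sketch of what such a query might look like, assuming the driver has already been initialized with cuInit(0). The helper names (queryAllocationGranularity, roundUpToGranularity, chunkCount) are hypothetical and not taken from this diff. The choice of CU_MEM_ALLOC_GRANULARITY_MINIMUM over CU_MEM_ALLOC_GRANULARITY_RECOMMENDED is also an assumption. The CUDA part is guarded so the snippet still compiles on a machine without cuda.h.

```cpp
#include <cstddef>
#include <stdexcept>

#if __has_include(<cuda.h>)
#include <cuda.h>

// Hypothetical helper: ask the CUDA driver for the minimum allocation
// granularity of pinned device memory on `device`.
// Assumes cuInit(0) has already succeeded.
inline std::size_t queryAllocationGranularity(int device) {
  CUmemAllocationProp prop = {};
  prop.type = CU_MEM_ALLOCATION_TYPE_PINNED;
  prop.location.type = CU_MEM_LOCATION_TYPE_DEVICE;
  prop.location.id = device;

  std::size_t granularity = 0;
  CUresult rc = cuMemGetAllocationGranularity(
      &granularity, &prop, CU_MEM_ALLOC_GRANULARITY_MINIMUM);
  if (rc != CUDA_SUCCESS || granularity == 0) {
    throw std::runtime_error("cuMemGetAllocationGranularity failed");
  }
  return granularity;
}
#endif

// Round `bytes` up to the next multiple of `granularity` (must be nonzero).
constexpr std::size_t roundUpToGranularity(std::size_t bytes,
                                           std::size_t granularity) {
  return ((bytes + granularity - 1) / granularity) * granularity;
}

// Number of granularity-sized chunks needed to cover `bytes`.
constexpr std::size_t chunkCount(std::size_t bytes, std::size_t granularity) {
  return (bytes + granularity - 1) / granularity;
}
```

With a granularity of 2MB, for example, a 5MB region would be covered by chunkCount(5 * 1024 * 1024, 2097152) == 3 chunks. The actual value depends on the device and driver, which is the reason for querying it rather than hardcoding it.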

Differential Revision: D88688463

meta-codesync · 2025-12-08T23:40:23Z

@pavanbalaji has exported this pull request. If you are a Meta employee, you can view the originating Diff in D88688463.

meta-cla bot added the CLA Signed label Dec 8, 2025

meta-codesync bot added the fb-exported and meta-exported labels Dec 8, 2025
