
Conversation

@pavanbalaji
Contributor

Summary:
Previously, the code used SFINAE templates to detect and use
c10::CachingAllocator::kLargeBuffer and kSmallBuffer, with hardcoded
fallbacks of 20MB and 2MB respectively. This was unreliable because
different PyTorch versions define these buffer sizes in different
locations.
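
For reference, a minimal sketch of the kind of SFINAE detection being removed. The type name, trait name, and fallback value below are illustrative only, not the PR's actual code:

```cpp
#include <cstddef>
#include <type_traits>

// Primary template: used when T has no static kLargeBuffer member,
// falling back to a hardcoded size (20 MiB here, purely illustrative).
template <typename T, typename = void>
struct LargeBufferSize {
  static constexpr size_t value = 20ull << 20;
};

// Specialization selected via SFINAE when T::kLargeBuffer exists,
// e.g. when the PyTorch version in use defines it on this type.
template <typename T>
struct LargeBufferSize<T, std::void_t<decltype(T::kLargeBuffer)>> {
  static constexpr size_t value = T::kLargeBuffer;
};
```

Because the constant moved between locations across PyTorch releases, a detection trait like this can silently pick the fallback instead of the real value, which is the fragility the PR eliminates.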

This diff simplifies the implementation by directly querying the
allocation granularity from the CUDA driver using
cuMemGetAllocationGranularity. This ensures we use the correct,
device-specific granularity for memory registration chunks, making the
code more robust and portable across different GPU architectures and
PyTorch versions.
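
A minimal sketch of the driver-query approach (the helper name, error handling, and choice of granularity flag are assumptions, not taken from this PR):

```cpp
#include <cuda.h>
#include <cstring>

// Hypothetical helper: query the allocation granularity for a given device
// directly from the CUDA driver. Assumes the driver has already been
// initialized via cuInit(0).
size_t queryAllocationGranularity(int deviceOrdinal) {
  CUmemAllocationProp prop;
  std::memset(&prop, 0, sizeof(prop));
  prop.type = CU_MEM_ALLOCATION_TYPE_PINNED;        // driver-backed physical memory
  prop.location.type = CU_MEM_LOCATION_TYPE_DEVICE; // granularity for this device
  prop.location.id = deviceOrdinal;

  size_t granularity = 0;
  // CU_MEM_ALLOC_GRANULARITY_MINIMUM is the other available flag; which one
  // the actual code uses is not stated in this PR.
  CUresult res = cuMemGetAllocationGranularity(
      &granularity, &prop, CU_MEM_ALLOC_GRANULARITY_RECOMMENDED);
  if (res != CUDA_SUCCESS) {
    return 0; // caller decides how to handle failure; illustrative only
  }
  return granularity;
}
```

Since the driver reports the granularity per device, the returned chunk size automatically tracks the GPU architecture and needs no per-PyTorch-version detection.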

Differential Revision: D88688463

meta-cla bot added the CLA Signed label on Dec 8, 2025
meta-codesync bot commented Dec 8, 2025

@pavanbalaji has exported this pull request. If you are a Meta employee, you can view the originating Diff in D88688463.


Labels

CLA Signed, fb-exported, meta-exported
