[PyTorch] Add record_stream and untyped_storage func op in QuantizedTensor #2144
Description
In the FP8 dataflow path (dispatch → expert fc1), under 1F1B overlap, when a QuantizedTensor (FP8 payload plus scale_inv metadata) is passed from dispatch to expert fc1 we need to switch to the work stream and safely release memory still tracked by the previous stream. This reduces peak HBM usage and avoids holding memory longer than necessary.
Type of change
Changes
__torch_dispatch__ handler for aten.record_stream on QuantizedTensor:
We record every relevant CUDA buffer held by the quantized tensor — _rowwise_data/_columnwise_data and their _rowwise_scale_inv/_columnwise_scale_inv — onto the provided stream via record_stream(stream). This does not change tensor values; it only updates storage-lifetime metadata so the caching allocator will not reuse or free the memory before the stream finishes its pending asynchronous work.
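The forwarding behavior described above can be sketched with a small, CUDA-free model. The buffer attribute names follow the PR description; the classes below are simplified stand-ins (not the actual QuantizedTensor implementation), and record_stream here merely logs the stream instead of talking to the CUDA allocator:

```python
class FakeBuffer:
    """Stand-in for a CUDA tensor; remembers which streams were recorded."""
    def __init__(self, name):
        self.name = name
        self.recorded_streams = []

    def record_stream(self, stream):
        # A real CUDA tensor marks its storage as "in use by `stream`",
        # so the caching allocator defers reuse until that stream is done.
        self.recorded_streams.append(stream)


class QuantizedTensorSketch:
    """Simplified model of a QuantizedTensor holding four CUDA buffers."""
    def __init__(self):
        self._rowwise_data = FakeBuffer("rowwise_data")
        self._columnwise_data = FakeBuffer("columnwise_data")
        self._rowwise_scale_inv = FakeBuffer("rowwise_scale_inv")
        self._columnwise_scale_inv = FakeBuffer("columnwise_scale_inv")

    def record_stream(self, stream):
        # What the aten.record_stream dispatch handler does in spirit:
        # forward to every live buffer. Values are untouched; only
        # storage-lifetime metadata changes.
        for buf in (self._rowwise_data, self._columnwise_data,
                    self._rowwise_scale_inv, self._columnwise_scale_inv):
            if buf is not None:
                buf.record_stream(stream)


qt = QuantizedTensorSketch()
work_stream = object()  # placeholder for a torch.cuda.Stream
qt.record_stream(work_stream)
assert all(work_stream in b.recorded_streams
           for b in (qt._rowwise_data, qt._columnwise_data,
                     qt._rowwise_scale_inv, qt._columnwise_scale_inv))
```

The key point is that a single record_stream call on the wrapper must reach all four underlying buffers; missing the scale_inv tensors would let the allocator recycle them while the work stream is still reading.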
Expose QuantizedTensor.untyped_storage():
Returns the payload’s underlying UntypedStorage. Callers can then call resize_(0) on it to immediately shrink the storage capacity to zero and return the memory to the caching allocator (on CUDA).
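A minimal CPU-side demonstration of the release pattern this exposes, using the standard PyTorch UntypedStorage API (the FP8 payload is stood in for by a plain float32 tensor; on CUDA, resize_(0) should only follow a record_stream on every stream with pending reads of the buffer):

```python
import torch

# Stand-in for the quantized payload tensor.
t = torch.empty(1024, dtype=torch.float32)

storage = t.untyped_storage()
print(storage.nbytes())  # 4096 — current storage capacity in bytes

# Shrink the storage to zero capacity, returning the memory to the
# allocator. The tensor's metadata (shape, dtype) survives, but its
# data must not be accessed afterwards.
storage.resize_(0)
print(storage.nbytes())  # 0
```

This is the same idiom used elsewhere in PyTorch for activation offloading: keep the tensor object (and its metadata) alive while eagerly handing the backing memory back to the allocator.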
Checklist: