[Core] Fix inconsistent logic in C++ tensor class #2330

timmoon10 · 2025-11-01T00:36:36Z

Description

This PR fixes some hacky logic in the C++ Tensor class:

Construct uninitialized tensors with shape=[0]. Previously we constructed them as 0-D tensors, which should have one entry (see [PyTorch][MOE] Support NVFP4 Grouped Linear #2215 (comment)).
Make Tensor::has_data and Tensor::has_columnwise_data have a consistent meaning: "is it safe to touch the data pointer?".
Fix code that use Tensor::has_data/Tensor::has_columnwise_data inappropriately (Tensor::shape, swizzling). These are cases where the data is initialized but the pointer is not safe to touch, e.g. when the tensor has a zero dim. Note that I've only fixed places that were causing problems, and I haven't fully gone through the entire code base.

Type of change

Documentation change (change only to the documentation, either a fix or a new content)
Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to not work as expected)
Infra/Build change
Code refactoring

Changes

Construct uninitialized tensors with zero entries.
Fix logic in Tensor::has_data and Tensor::has_columnwise_data.
Remove inappropriate usage of Tensor::has_data/Tensor::has_columnwise_data from Tensor::shape and swizzling.

Checklist:

I have read and followed the contributing guidelines
The functionality is complete
I have commented my code, particularly in hard-to-understand areas
I have made corresponding changes to the documentation
My changes generate no new warnings
I have added tests that prove my fix is effective or that my feature works
New and existing unit tests pass locally with my changes

Signed-off-by: Tim Moon <tmoon@nvidia.com>

Still seeing correctness issues. Signed-off-by: Tim Moon <tmoon@nvidia.com>

Signed-off-by: Tim Moon <tmoon@nvidia.com>

timmoon10 · 2025-11-01T00:36:52Z

/te-ci

for more information, see https://pre-commit.ci

Signed-off-by: Tim Moon <tmoon@nvidia.com>

for more information, see https://pre-commit.ci

timmoon10 · 2025-11-03T22:38:49Z

/te-ci L1

greptile-apps

Greptile Overview

Greptile Summary

Refactored C++ Tensor class to fix inconsistent initialization and data checking logic by using shape=[0] for uninitialized tensors instead of empty shapes.

Key Changes:

Updated SimpleTensor and Tensor default constructors to initialize with shape=[0] instead of shape=[], fixing semantic confusion between 0-D tensors (which have 1 entry) and uninitialized tensors
Refactored has_data() and has_columnwise_data() to consistently check both pointer validity and non-zero element count, making them reliable indicators of "is it safe to touch the data pointer?"
Updated shape() and dim() methods to use explicit is_shape_trivial() checks instead of relying on has_data()/has_columnwise_data(), fixing cases where data is initialized but pointer shouldn't be accessed (e.g., zero-dim tensors)
Propagated changes throughout normalization, swizzle, transpose, and PyTorch integration code to use numel() != 0 checks instead of shape.empty()
Added defensive null checks in GEMM code to prevent out-of-bounds access with empty tensors
Fixed workspace size queries to return {1} instead of {0} to distinguish from uninitialized state

Issues Found:

Two error messages in swizzle.cu incorrectly reference input->scale_inv.shape instead of input->columnwise_scale_inv.shape (lines 378 and 392)

Confidence Score: 4/5

This PR is safe to merge after fixing the two error message bugs in swizzle.cu
The refactoring improves code correctness by fixing semantic confusion between 0-D and uninitialized tensors. The changes are well-tested and systematic. Score is 4 instead of 5 due to: (1) two copy-paste errors in error messages that would display incorrect variable names during debugging, and (2) the extensive nature of the refactoring across 11 files increases risk of edge cases
Pay close attention to transformer_engine/common/swizzle/swizzle.cu which has two error message bugs on lines 378 and 392

Important Files Changed

File Analysis

Filename	Score	Overview
transformer_engine/common/common.h	4/5	Updated Tensor class to use shape=[0] for uninitialized tensors instead of shape=[]; refactored has_data() and has_columnwise_data() to check both pointer and numel(); improved shape() method logic with consistent is_shape_trivial checks
transformer_engine/common/include/transformer_engine/transformer_engine.h	5/5	Updated TensorWrapper to return shape=[0] for null tensors instead of shape=[]; added emptyShape constant; improved constructor logic to set appropriate shapes for null pointers
transformer_engine/common/transformer_engine.cpp	5/5	Updated CheckScaleTensorShape to distinguish between FP8 and high-precision tensors; changed nvte_get_tensor_param to return shape=[0] for null tensors
transformer_engine/common/swizzle/swizzle.cu	3/5	Refactored swizzle_scaling_factors to check pointer and numel() instead of has_data(); improved logic with explicit has_rowwise_scale_inv/has_columnwise_scale_inv checks. Found two error messages that incorrectly reference scale_inv.shape instead of columnwise_scale_inv.shape

Sequence Diagram

sequenceDiagram
    participant App as Application Code
    participant Tensor as Tensor Class
    participant SimpleTensor as SimpleTensor
    participant Wrapper as TensorWrapper
    
    Note over App,Wrapper: Uninitialized Tensor Creation
    App->>Tensor: Tensor()
    Tensor->>SimpleTensor: SimpleTensor()
    SimpleTensor->>SimpleTensor: Set shape=[0], dptr=nullptr
    SimpleTensor-->>Tensor: Uninitialized SimpleTensor
    Note over Tensor: All scale tensors initialized with shape=[0]
    Tensor-->>App: Uninitialized Tensor
    
    Note over App,Wrapper: Checking Tensor State
    App->>Tensor: has_data()
    Tensor->>Tensor: Check: dptr != nullptr && numel() != 0
    Tensor-->>App: false (uninitialized)
    
    Note over App,Wrapper: After Data Assignment
    App->>Tensor: Set data.dptr, data.shape
    App->>Tensor: has_data()
    Tensor->>SimpleTensor: numel()
    SimpleTensor->>SimpleTensor: Calculate product of shape dims
    SimpleTensor-->>Tensor: numel > 0
    Tensor->>Tensor: Check: dptr != nullptr && numel() != 0
    Tensor-->>App: true (has valid data)
    
    Note over App,Wrapper: TensorWrapper Integration
    App->>Wrapper: TensorWrapper(dptr, shape, dtype)
    Wrapper->>Wrapper: Check if amax_dptr == nullptr
    alt amax_dptr is null
        Wrapper->>Wrapper: Use emptyShape {0}
    else amax_dptr is valid
        Wrapper->>Wrapper: Use defaultShape {1}
    end
    Wrapper-->>App: Configured TensorWrapper

_{11 files reviewed, 2 comments}

_{Edit Code Review Agent Settings | Greptile}

transformer_engine/common/swizzle/swizzle.cu

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> Signed-off-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>

for more information, see https://pre-commit.ci

greptile-apps

Greptile Overview

Greptile Summary

This PR fixes inconsistent logic in the C++ Tensor class by standardizing how uninitialized and empty tensors are handled.

Key Changes:

Uninitialized tensors now constructed with shape=[0] instead of shape=[] (0-D tensors should have 1 entry)
Tensor::has_data() and Tensor::has_columnwise_data() now consistently check both pointer validity AND non-zero element count
Swizzle functions updated to explicitly check pointer and size instead of relying on has_data()/has_columnwise_data(), which allows proper handling of edge cases where tensors have valid pointers but zero dimensions

Impact:

Fixes potential segfaults when accessing data pointers of empty tensors
Ensures consistent behavior across the codebase when checking if tensor data can be safely accessed
Previous comment noted error messages referenced wrong variable (scale_inv.shape instead of columnwise_scale_inv.shape), which was already corrected in lines 378 and 393

Confidence Score: 5/5

This PR is safe to merge - it's a well-structured refactoring that fixes edge case handling
The changes are defensive and fix potential bugs with empty tensor handling. The refactoring replaces implicit checks with explicit pointer and size validation, making the code more robust and easier to reason about. The logic is consistent throughout and follows a clear pattern.
No files require special attention

Important Files Changed

File Analysis

Filename	Score	Overview
transformer_engine/common/swizzle/swizzle.cu	5/5	Replaces inappropriate uses of `has_data()`/`has_columnwise_data()` with explicit pointer and size checks to handle empty tensors correctly

Sequence Diagram

sequenceDiagram
    participant Caller
    participant swizzle_scaling_factors
    participant Tensor
    participant CUDA_Kernel
    
    Caller->>swizzle_scaling_factors: input, output, stream
    swizzle_scaling_factors->>swizzle_scaling_factors: Check scaling mode
    swizzle_scaling_factors->>Tensor: CheckInputTensor(input)
    swizzle_scaling_factors->>Tensor: CheckInputTensor(output)
    
    Note over swizzle_scaling_factors: Check if scaling factors exist
    swizzle_scaling_factors->>Tensor: scale_inv.dptr != nullptr?
    swizzle_scaling_factors->>Tensor: scale_inv.numel() != 0?
    swizzle_scaling_factors->>Tensor: columnwise_scale_inv.dptr != nullptr?
    swizzle_scaling_factors->>Tensor: columnwise_scale_inv.numel() != 0?
    
    alt No scaling factors
        swizzle_scaling_factors-->>Caller: return early
    end
    
    Note over swizzle_scaling_factors: Deduce tensor dimensions from shape
    swizzle_scaling_factors->>Tensor: Get scale_inv.shape or columnwise_scale_inv.shape
    
    Note over swizzle_scaling_factors: Validate dimensions
    swizzle_scaling_factors->>swizzle_scaling_factors: Check m % SF_TILE_DIM_M == 0
    swizzle_scaling_factors->>swizzle_scaling_factors: Check k % SF_TILE_DIM_K == 0
    
    Note over swizzle_scaling_factors: Choose swizzle implementation
    alt Rowwise swizzle needed
        swizzle_scaling_factors->>CUDA_Kernel: swizzle_row_scaling_kernel
        CUDA_Kernel-->>swizzle_scaling_factors: Complete
    end
    
    alt Columnwise swizzle needed
        swizzle_scaling_factors->>CUDA_Kernel: swizzle_col_scaling_kernel
        CUDA_Kernel-->>swizzle_scaling_factors: Complete
    end
    
    swizzle_scaling_factors-->>Caller: return

_{1 file reviewed, no comments}

_{Edit Code Review Agent Settings | Greptile}

timmoon10 added 5 commits October 31, 2025 00:36

Initialize empty tensors with shape=[0] instead of shape=[].

5ce67ab

Signed-off-by: Tim Moon <tmoon@nvidia.com>

Fix runtime crash in LayerNorm

8360331

Still seeing correctness issues. Signed-off-by: Tim Moon <tmoon@nvidia.com>

Make sure norm workspace sizes are not zero

b9c9deb

Signed-off-by: Tim Moon <tmoon@nvidia.com>

Remove assumption in swizzle kernel that data is available.

f6293f9

Signed-off-by: Tim Moon <tmoon@nvidia.com>

Remove assumption in multi-swizzle kernel that data is available.

83707ca

Signed-off-by: Tim Moon <tmoon@nvidia.com>

pre-commit-ci bot and others added 5 commits November 1, 2025 00:37

[pre-commit.ci] auto fixes from pre-commit.com hooks

30b837d

for more information, see https://pre-commit.ci

Remove unnecessary explicit call to default constructor

b9ec6e9

Signed-off-by: Tim Moon <tmoon@nvidia.com>

Avoid accessing tensor data pointer if tensor has no entries

c5f22a3

Signed-off-by: Tim Moon <tmoon@nvidia.com>

[pre-commit.ci] auto fixes from pre-commit.com hooks

d05bb93

for more information, see https://pre-commit.ci

Merge branch 'main' into debug-empty-tensor

5dc3d15

timmoon10 marked this pull request as ready for review November 3, 2025 22:35

timmoon10 requested review from Oleg-Goncharov, phu0ngng and ptrendx November 3, 2025 22:36

greptile-apps bot reviewed Nov 3, 2025

View reviewed changes

transformer_engine/common/swizzle/swizzle.cu Outdated Show resolved Hide resolved

transformer_engine/common/swizzle/swizzle.cu Outdated Show resolved Hide resolved

timmoon10 and others added 2 commits November 3, 2025 14:53

Apply suggestions from code review

bd0ddc4

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> Signed-off-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>

[pre-commit.ci] auto fixes from pre-commit.com hooks

957172c

for more information, see https://pre-commit.ci

greptile-apps bot reviewed Nov 3, 2025

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[Core] Fix inconsistent logic in C++ tensor class #2330

[Core] Fix inconsistent logic in C++ tensor class #2330

Uh oh!

timmoon10 commented Nov 1, 2025 •

edited

Loading

Uh oh!

timmoon10 commented Nov 1, 2025

Uh oh!

timmoon10 commented Nov 3, 2025

Uh oh!

greptile-apps bot left a comment

Uh oh!

Uh oh!

Uh oh!

greptile-apps bot left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

[Core] Fix inconsistent logic in C++ tensor class #2330

Are you sure you want to change the base?

[Core] Fix inconsistent logic in C++ tensor class #2330

Uh oh!

Conversation

timmoon10 commented Nov 1, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Type of change

Changes

Checklist:

Uh oh!

timmoon10 commented Nov 1, 2025

Uh oh!

timmoon10 commented Nov 3, 2025

Uh oh!

greptile-apps bot left a comment

Choose a reason for hiding this comment

Greptile Overview

Greptile Summary

Confidence Score: 4/5

Important Files Changed

Sequence Diagram

Uh oh!

Uh oh!

Uh oh!

greptile-apps bot left a comment

Choose a reason for hiding this comment

Greptile Overview

Greptile Summary

Confidence Score: 5/5

Important Files Changed

Sequence Diagram

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

timmoon10 commented Nov 1, 2025 •

edited

Loading