Skip to content

Conversation

@timmoon10
Copy link
Collaborator

@timmoon10 timmoon10 commented Nov 1, 2025

Description

This PR fixes some hacky logic in the C++ Tensor class:

  • Construct uninitialized tensors with shape=[0]. Previously we constructed them as 0-D tensors, which should have one entry (see [PyTorch][MOE] Support NVFP4 Grouped Linear #2215 (comment)).
  • Make Tensor::has_data and Tensor::has_columnwise_data have a consistent meaning: "is it safe to touch the data pointer?".
  • Fix code that use Tensor::has_data/Tensor::has_columnwise_data inappropriately (Tensor::shape, swizzling). These are cases where the data is initialized but the pointer is not safe to touch, e.g. when the tensor has a zero dim. Note that I've only fixed places that were causing problems, and I haven't fully gone through the entire code base.

Type of change

  • Documentation change (change only to the documentation, either a fix or a new content)
  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Infra/Build change
  • Code refactoring

Changes

  • Construct uninitialized tensors with zero entries.
  • Fix logic in Tensor::has_data and Tensor::has_columnwise_data.
  • Remove inappropriate usage of Tensor::has_data/Tensor::has_columnwise_data from Tensor::shape and swizzling.

Checklist:

  • I have read and followed the contributing guidelines
  • The functionality is complete
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

Signed-off-by: Tim Moon <tmoon@nvidia.com>
Still seeing correctness issues.

Signed-off-by: Tim Moon <tmoon@nvidia.com>
Signed-off-by: Tim Moon <tmoon@nvidia.com>
Signed-off-by: Tim Moon <tmoon@nvidia.com>
Signed-off-by: Tim Moon <tmoon@nvidia.com>
@timmoon10
Copy link
Collaborator Author

/te-ci

@timmoon10 timmoon10 marked this pull request as ready for review November 3, 2025 22:35
@timmoon10
Copy link
Collaborator Author

/te-ci L1

Copy link
Contributor

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Greptile Overview

Greptile Summary

Refactored C++ Tensor class to fix inconsistent initialization and data checking logic by using shape=[0] for uninitialized tensors instead of empty shapes.

Key Changes:

  • Updated SimpleTensor and Tensor default constructors to initialize with shape=[0] instead of shape=[], fixing semantic confusion between 0-D tensors (which have 1 entry) and uninitialized tensors
  • Refactored has_data() and has_columnwise_data() to consistently check both pointer validity and non-zero element count, making them reliable indicators of "is it safe to touch the data pointer?"
  • Updated shape() and dim() methods to use explicit is_shape_trivial() checks instead of relying on has_data()/has_columnwise_data(), fixing cases where data is initialized but pointer shouldn't be accessed (e.g., zero-dim tensors)
  • Propagated changes throughout normalization, swizzle, transpose, and PyTorch integration code to use numel() != 0 checks instead of shape.empty()
  • Added defensive null checks in GEMM code to prevent out-of-bounds access with empty tensors
  • Fixed workspace size queries to return {1} instead of {0} to distinguish from uninitialized state

Issues Found:

  • Two error messages in swizzle.cu incorrectly reference input->scale_inv.shape instead of input->columnwise_scale_inv.shape (lines 378 and 392)

Confidence Score: 4/5

  • This PR is safe to merge after fixing the two error message bugs in swizzle.cu
  • The refactoring improves code correctness by fixing semantic confusion between 0-D and uninitialized tensors. The changes are well-tested and systematic. Score is 4 instead of 5 due to: (1) two copy-paste errors in error messages that would display incorrect variable names during debugging, and (2) the extensive nature of the refactoring across 11 files increases risk of edge cases
  • Pay close attention to transformer_engine/common/swizzle/swizzle.cu which has two error message bugs on lines 378 and 392

Important Files Changed

File Analysis

Filename Score Overview
transformer_engine/common/common.h 4/5 Updated Tensor class to use shape=[0] for uninitialized tensors instead of shape=[]; refactored has_data() and has_columnwise_data() to check both pointer and numel(); improved shape() method logic with consistent is_shape_trivial checks
transformer_engine/common/include/transformer_engine/transformer_engine.h 5/5 Updated TensorWrapper to return shape=[0] for null tensors instead of shape=[]; added emptyShape constant; improved constructor logic to set appropriate shapes for null pointers
transformer_engine/common/transformer_engine.cpp 5/5 Updated CheckScaleTensorShape to distinguish between FP8 and high-precision tensors; changed nvte_get_tensor_param to return shape=[0] for null tensors
transformer_engine/common/swizzle/swizzle.cu 3/5 Refactored swizzle_scaling_factors to check pointer and numel() instead of has_data(); improved logic with explicit has_rowwise_scale_inv/has_columnwise_scale_inv checks. Found two error messages that incorrectly reference scale_inv.shape instead of columnwise_scale_inv.shape

Sequence Diagram

sequenceDiagram
    participant App as Application Code
    participant Tensor as Tensor Class
    participant SimpleTensor as SimpleTensor
    participant Wrapper as TensorWrapper
    
    Note over App,Wrapper: Uninitialized Tensor Creation
    App->>Tensor: Tensor()
    Tensor->>SimpleTensor: SimpleTensor()
    SimpleTensor->>SimpleTensor: Set shape=[0], dptr=nullptr
    SimpleTensor-->>Tensor: Uninitialized SimpleTensor
    Note over Tensor: All scale tensors initialized with shape=[0]
    Tensor-->>App: Uninitialized Tensor
    
    Note over App,Wrapper: Checking Tensor State
    App->>Tensor: has_data()
    Tensor->>Tensor: Check: dptr != nullptr && numel() != 0
    Tensor-->>App: false (uninitialized)
    
    Note over App,Wrapper: After Data Assignment
    App->>Tensor: Set data.dptr, data.shape
    App->>Tensor: has_data()
    Tensor->>SimpleTensor: numel()
    SimpleTensor->>SimpleTensor: Calculate product of shape dims
    SimpleTensor-->>Tensor: numel > 0
    Tensor->>Tensor: Check: dptr != nullptr && numel() != 0
    Tensor-->>App: true (has valid data)
    
    Note over App,Wrapper: TensorWrapper Integration
    App->>Wrapper: TensorWrapper(dptr, shape, dtype)
    Wrapper->>Wrapper: Check if amax_dptr == nullptr
    alt amax_dptr is null
        Wrapper->>Wrapper: Use emptyShape {0}
    else amax_dptr is valid
        Wrapper->>Wrapper: Use defaultShape {1}
    end
    Wrapper-->>App: Configured TensorWrapper
Loading

11 files reviewed, 2 comments

Edit Code Review Agent Settings | Greptile

timmoon10 and others added 2 commits November 3, 2025 14:53
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
Signed-off-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>
Copy link
Contributor

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Greptile Overview

Greptile Summary

This PR fixes inconsistent logic in the C++ Tensor class by standardizing how uninitialized and empty tensors are handled.

Key Changes:

  • Uninitialized tensors now constructed with shape=[0] instead of shape=[] (0-D tensors should have 1 entry)
  • Tensor::has_data() and Tensor::has_columnwise_data() now consistently check both pointer validity AND non-zero element count
  • Swizzle functions updated to explicitly check pointer and size instead of relying on has_data()/has_columnwise_data(), which allows proper handling of edge cases where tensors have valid pointers but zero dimensions

Impact:

  • Fixes potential segfaults when accessing data pointers of empty tensors
  • Ensures consistent behavior across the codebase when checking if tensor data can be safely accessed
  • Previous comment noted error messages referenced wrong variable (scale_inv.shape instead of columnwise_scale_inv.shape), which was already corrected in lines 378 and 393

Confidence Score: 5/5

  • This PR is safe to merge - it's a well-structured refactoring that fixes edge case handling
  • The changes are defensive and fix potential bugs with empty tensor handling. The refactoring replaces implicit checks with explicit pointer and size validation, making the code more robust and easier to reason about. The logic is consistent throughout and follows a clear pattern.
  • No files require special attention

Important Files Changed

File Analysis

Filename Score Overview
transformer_engine/common/swizzle/swizzle.cu 5/5 Replaces inappropriate uses of has_data()/has_columnwise_data() with explicit pointer and size checks to handle empty tensors correctly

Sequence Diagram

sequenceDiagram
    participant Caller
    participant swizzle_scaling_factors
    participant Tensor
    participant CUDA_Kernel
    
    Caller->>swizzle_scaling_factors: input, output, stream
    swizzle_scaling_factors->>swizzle_scaling_factors: Check scaling mode
    swizzle_scaling_factors->>Tensor: CheckInputTensor(input)
    swizzle_scaling_factors->>Tensor: CheckInputTensor(output)
    
    Note over swizzle_scaling_factors: Check if scaling factors exist
    swizzle_scaling_factors->>Tensor: scale_inv.dptr != nullptr?
    swizzle_scaling_factors->>Tensor: scale_inv.numel() != 0?
    swizzle_scaling_factors->>Tensor: columnwise_scale_inv.dptr != nullptr?
    swizzle_scaling_factors->>Tensor: columnwise_scale_inv.numel() != 0?
    
    alt No scaling factors
        swizzle_scaling_factors-->>Caller: return early
    end
    
    Note over swizzle_scaling_factors: Deduce tensor dimensions from shape
    swizzle_scaling_factors->>Tensor: Get scale_inv.shape or columnwise_scale_inv.shape
    
    Note over swizzle_scaling_factors: Validate dimensions
    swizzle_scaling_factors->>swizzle_scaling_factors: Check m % SF_TILE_DIM_M == 0
    swizzle_scaling_factors->>swizzle_scaling_factors: Check k % SF_TILE_DIM_K == 0
    
    Note over swizzle_scaling_factors: Choose swizzle implementation
    alt Rowwise swizzle needed
        swizzle_scaling_factors->>CUDA_Kernel: swizzle_row_scaling_kernel
        CUDA_Kernel-->>swizzle_scaling_factors: Complete
    end
    
    alt Columnwise swizzle needed
        swizzle_scaling_factors->>CUDA_Kernel: swizzle_col_scaling_kernel
        CUDA_Kernel-->>swizzle_scaling_factors: Complete
    end
    
    swizzle_scaling_factors-->>Caller: return
Loading

1 file reviewed, no comments

Edit Code Review Agent Settings | Greptile

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant