@phu0ngng phu0ngng commented Nov 24, 2025

Description

MCore wants to use moe_router_dtype=fp64, which makes the resulting permuted_token_probs an FP64 tensor.
This PR adds partial FP64 support, mainly for use in the fused router and padding APIs.

Type of change

  • Documentation change (change only to the documentation, either a fix or a new content)
  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Infra/Build change
  • Code refactoring

Checklist:

  • I have read and followed the contributing guidelines
  • The functionality is complete
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

Signed-off-by: Phuong Nguyen <phuonguyen@nvidia.com>
timmoon10 previously approved these changes Nov 25, 2025
@timmoon10 timmoon10 left a comment

LGTM, pending CI

@phu0ngng phu0ngng marked this pull request as ready for review November 25, 2025 02:13
@phu0ngng

/te-ci L0

greptile-apps bot commented Nov 25, 2025

Greptile Overview

Greptile Summary

Adds partial FP64 (double-precision) support to TransformerEngine, enabling MCore to use moe_router_dtype=fp64 for router operations. The implementation adds the kFloat64 enum value (11) to both dtype enums and keeps the type mappings consistent across the common library, the PyTorch extension, and the Python bindings.

Key changes:

  • Added DType::kFloat64 and NVTEDType::kNVTEFloat64 enum values with consistent numbering (11)
  • Updated all critical type switch macros (TRANSFORMER_ENGINE_TYPE_SWITCH_ALL, TE_ROUTER_PROBS_TYPE_SWITCH_ALL, TE_ROUTER_INDEX_TYPE_SWITCH_ALL) to handle FP64 → double mapping
  • Integrated FP64 with CUDA runtime (CUDA_R_64F), PyTorch (at::kDouble), and Python bindings
  • Marked FP64 as a high precision dtype in is_high_precision_dtype()
  • Updated test infrastructure to include FP64 in type lists and mappings
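
The FP64 → double mapping in the type switch macros can be sketched as follows. This is a minimal illustration, not the library's actual macro: `SKETCH_TYPE_SWITCH` and `element_size` are hypothetical names, the enum is cut down to two members, and only the value kFloat64 = 11 comes from the summary above.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>

// Simplified stand-in for the library's dtype enum. Only kFloat64 = 11 is
// taken from the summary; the value of kFloat32 is a placeholder.
enum class DType : int { kFloat32 = 0, kFloat64 = 11 };

// Hypothetical type-switch macro: binds `type` to the C++ type that matches
// `dtype`, then runs the body with that alias in scope.
#define SKETCH_TYPE_SWITCH(dtype, type, ...)              \
  switch (dtype) {                                        \
    case DType::kFloat32: {                               \
      using type = float;                                 \
      { __VA_ARGS__ }                                     \
    } break;                                              \
    case DType::kFloat64: {                               \
      using type = double; /* the new FP64 branch */      \
      { __VA_ARGS__ }                                     \
    } break;                                              \
    default:                                              \
      throw std::invalid_argument("unsupported dtype");   \
  }

// Returns the size in bytes of the C++ type selected for `dtype`.
std::size_t element_size(DType dtype) {
  std::size_t size = 0;
  SKETCH_TYPE_SWITCH(dtype, T, size = sizeof(T);)
  return size;
}
```

Calling `element_size(DType::kFloat64)` goes through the new branch and instantiates the body with `double`, which is the behavior the bullet on type switch macros describes.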

Scope: This is intentionally partial support focused on router and padding APIs. Other specialized type switch macros (TRANSFORMER_ENGINE_TYPE_SWITCH_FLOAT, _OUTPUT, _INPUT, _NON_FP8ONLY) were not modified, which is appropriate for the stated use case.
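
What "partial support" means for callers can be sketched like this. The sketch assumes that a type switch left without a kFloat64 case rejects FP64 through a default error branch; the summary does not state this, and `dispatch_all`, `dispatch_float_only`, and `is_supported` are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Simplified stand-in enum; only kFloat64 = 11 comes from the summary.
enum class DType : int { kFloat32 = 0, kFloat64 = 11 };

// Hypothetical "all types" dispatcher: has an FP64 case, like the macros
// the summary lists as updated.
const char* dispatch_all(DType dtype) {
  switch (dtype) {
    case DType::kFloat32: return "float";
    case DType::kFloat64: return "double";
  }
  throw std::invalid_argument("unsupported dtype");
}

// Hypothetical narrower dispatcher standing in for a macro the PR left
// untouched: with no kFloat64 case, FP64 reaches the error path.
const char* dispatch_float_only(DType dtype) {
  switch (dtype) {
    case DType::kFloat32: return "float";
    default: break;
  }
  throw std::invalid_argument("unsupported dtype");
}

// Reports whether a dispatcher accepts `dtype` without throwing.
bool is_supported(const char* (*dispatch)(DType), DType dtype) {
  try {
    dispatch(dtype);
    return true;
  } catch (const std::invalid_argument&) {
    return false;
  }
}
```

Under that assumption, an FP64 tensor works in code paths that use the updated switches and fails loudly elsewhere, rather than being silently reinterpreted as another type.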

Confidence Score: 5/5

  • This PR is safe to merge with no identified issues
  • The implementation is thorough, consistent, and correctly addresses the previously reported issue (line 503 now maps kFloat64 to double instead of float). All type mappings are correct across enums, CUDA types, PyTorch types, and Python bindings. The enum value (11) is consistently used, and all critical type switch macros include proper FP64 handling. The scope is appropriately limited to the stated use case.
  • No files require special attention

Important Files Changed

File Analysis

| Filename | Score | Overview |
| --- | --- | --- |
| transformer_engine/common/include/transformer_engine/transformer_engine.h | 5/5 | Added kFloat64 enum value (11) to both NVTEDType and DType enums, and updated is_high_precision_dtype() to include Float64 |
| transformer_engine/common/common.h | 5/5 | Added fp64 type alias for double, registered it with type system, added to TypeInfo tuple, and added kFloat64 case to TRANSFORMER_ENGINE_TYPE_SWITCH_ALL macro |
| transformer_engine/common/fused_router/utils.h | 5/5 | Added kFloat64 -> double mappings to both TE_ROUTER_PROBS_TYPE_SWITCH_ALL and TE_ROUTER_INDEX_TYPE_SWITCH_ALL macros |
| transformer_engine/pytorch/csrc/common.h | 5/5 | Added FP64 support to typeToNumBits(), GetATenDType() (maps to at::kDouble), and GetTransformerEngineDType() functions |
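
The helper-function changes listed per file can be pictured with a small sketch. The functions below are hypothetical analogues of typeToNumBits() and is_high_precision_dtype(), not the library's code; treating FP32 as high precision is an assumption, and only the FP64 behavior (64 bits, high precision, enum value 11) follows from the summary.

```cpp
#include <cassert>
#include <cstddef>

// Simplified stand-in enum; only kFloat64 = 11 comes from the summary.
enum class DType : int { kFloat32 = 0, kFloat64 = 11 };

// Hypothetical analogue of a bit-width helper: FP64 reports 64 bits.
constexpr std::size_t type_to_num_bits(DType dtype) {
  switch (dtype) {
    case DType::kFloat32: return 32;
    case DType::kFloat64: return 64;
  }
  return 0;  // unknown dtype
}

// Hypothetical analogue of a high-precision check that includes FP64.
// Listing FP32 here is an assumption made for the sketch.
constexpr bool is_high_precision(DType dtype) {
  return dtype == DType::kFloat32 || dtype == DType::kFloat64;
}

// Bytes needed to store `numel` elements of `dtype`.
constexpr std::size_t buffer_bytes(DType dtype, std::size_t numel) {
  return numel * type_to_num_bits(dtype) / 8;
}
```

With these definitions, a buffer of FP64 router probabilities takes twice the bytes of the same buffer in FP32, which is the practical cost of selecting fp64 for the router.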

Sequence Diagram

sequenceDiagram
    participant User as MCore/User Code
    participant PyBind as Python Binding
    participant PyTorch as PyTorch Interface
    participant Core as TE Core (common.h)
    participant Router as Fused Router
    participant CUDA as CUDA Runtime

    User->>PyBind: Request fp64 dtype (moe_router_dtype=fp64)
    PyBind->>PyTorch: Convert to DType::kFloat64
    PyTorch->>PyTorch: Map kFloat64 to at::kDouble
    PyTorch->>Core: Create tensor with DType::kFloat64
    Core->>Core: TRANSFORMER_ENGINE_TYPE_SWITCH_ALL<br/>maps kFloat64 -> double
    Core->>Router: Pass fp64 tensor to fused router
    Router->>Router: TE_ROUTER_PROBS_TYPE_SWITCH_ALL<br/>handles kFloat64 -> double
    Router->>CUDA: get_cuda_dtype(kFloat64)
    CUDA->>CUDA: Returns CUDA_R_64F
    CUDA-->>Router: 64-bit float operations
    Router-->>User: permuted_token_probs (FP64)

@greptile-apps greptile-apps bot left a comment

9 files reviewed, 1 comment

Signed-off-by: Phuong Nguyen <phuonguyen@nvidia.com>
yaox12 previously approved these changes Nov 25, 2025
@greptile-apps greptile-apps bot left a comment

9 files reviewed, no comments

@phu0ngng

/te-ci L0

Signed-off-by: Phuong Nguyen <phuonguyen@nvidia.com>
@phu0ngng

/te-ci L0

@phu0ngng phu0ngng requested a review from ptrendx November 25, 2025 16:10
@greptile-apps greptile-apps bot left a comment

7 files reviewed, no comments

Signed-off-by: Phuong Nguyen <phuonguyen@nvidia.com>
@phu0ngng

/te-ci L0

@greptile-apps greptile-apps bot left a comment

9 files reviewed, no comments

3 participants