[Common] Add kFloat64 partial support #2417

phu0ngng · 2025-11-24T23:23:44Z

Description

MCore wants to use moe_router_dtype=fp64, in which case the resulting permuted_token_probs tensor has FP64 dtype.
This PR adds partial support for FP64, mainly for use in the fused router and padding APIs.

Type of change

Documentation change (change only to the documentation, either a fix or a new content)
Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to not work as expected)
Infra/Build change
Code refactoring

Checklist:

I have read and followed the contributing guidelines
The functionality is complete
I have commented my code, particularly in hard-to-understand areas
I have made corresponding changes to the documentation
My changes generate no new warnings
I have added tests that prove my fix is effective or that my feature works
New and existing unit tests pass locally with my changes

Signed-off-by: Phuong Nguyen <phuonguyen@nvidia.com>

for more information, see https://pre-commit.ci

timmoon10

LGTM, pending CI

phu0ngng · 2025-11-25T02:13:26Z

/te-ci L0

greptile-apps · 2025-11-25T02:16:35Z

Greptile Overview

Greptile Summary

Adds partial FP64 (double-precision) support to TransformerEngine, enabling MCore to use moe_router_dtype=fp64 for router operations. The implementation adds the kFloat64 enum value (11) consistently across all type systems and ensures proper type mappings throughout the stack.

Key changes:

Added DType::kFloat64 and NVTEDType::kNVTEFloat64 enum values with consistent numbering (11)
Updated all critical type switch macros (TRANSFORMER_ENGINE_TYPE_SWITCH_ALL, TE_ROUTER_PROBS_TYPE_SWITCH_ALL, TE_ROUTER_INDEX_TYPE_SWITCH_ALL) to handle FP64 → double mapping
Integrated FP64 with CUDA runtime (CUDA_R_64F), PyTorch (at::kDouble), and Python bindings
Marked FP64 as a high precision dtype in is_high_precision_dtype()
Updated test infrastructure to include FP64 in type lists and mappings
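The dispatch pattern behind these changes can be sketched as follows. This is a minimal, self-contained illustration, not TransformerEngine's actual code: `DTypeSketch`, `SKETCH_TYPE_SWITCH_ALL`, `element_size` and `is_high_precision_sketch` are hypothetical names, and only the value 11 for `kFloat64` comes from the PR. The point it shows is that the new case must bind the type alias to `double`; binding it to `float` would make FP64 data be handled with the wrong element size and precision.

```cpp
// Hypothetical sketch of a TE-style dtype dispatch macro with an FP64 case.
// Not the real TransformerEngine code; only kFloat64 = 11 is taken from the PR.
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>

// Placeholder enum; the kInt32/kFloat32 values are arbitrary here.
enum class DTypeSketch : int { kInt32, kFloat32, kFloat64 = 11 };

// Maps a runtime dtype to a compile-time alias `type`, then runs the body.
// Unhandled dtypes fall through to an error (assumed behavior for the sketch).
#define SKETCH_TYPE_SWITCH_ALL(dtype, type, ...)     \
  switch (dtype) {                                   \
    case DTypeSketch::kInt32: {                      \
      using type = std::int32_t;                     \
      { __VA_ARGS__ }                                \
    } break;                                         \
    case DTypeSketch::kFloat32: {                    \
      using type = float;                            \
      { __VA_ARGS__ }                                \
    } break;                                         \
    case DTypeSketch::kFloat64: {                    \
      using type = double; /* double, not float */   \
      { __VA_ARGS__ }                                \
    } break;                                         \
    default:                                         \
      throw std::invalid_argument("Invalid type.");  \
  }

// Element size in bytes, obtained through the dispatch macro.
inline std::size_t element_size(DTypeSketch dtype) {
  std::size_t size = 0;
  SKETCH_TYPE_SWITCH_ALL(dtype, T, size = sizeof(T););
  return size;
}

// Analogue of is_high_precision_dtype(); the real check may also cover
// other dtypes (e.g. FP16/BF16) that this placeholder enum omits.
inline bool is_high_precision_sketch(DTypeSketch dtype) {
  return dtype == DTypeSketch::kFloat32 || dtype == DTypeSketch::kFloat64;
}
```

On platforms where `double` is IEEE 754 binary64, `element_size(DTypeSketch::kFloat64)` yields 8.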

Scope: This is intentionally partial support focused on router and padding APIs. Other specialized type switch macros (TRANSFORMER_ENGINE_TYPE_SWITCH_FLOAT, _OUTPUT, _INPUT, _NON_FP8ONLY) were not modified, which is appropriate for the stated use case.
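To make the "partial" scope concrete: a dispatch macro that was not given a `kFloat64` case has no `double` instantiation to run. The sketch below assumes, for illustration only, that unsupported dtypes reach a default branch that raises an error; `SKETCH_TYPE_SWITCH_FLOAT` and `float_only_path_accepts` are hypothetical names, not TransformerEngine APIs.

```cpp
// Hypothetical sketch: a narrower dispatch macro that was NOT extended with FP64.
// Assumption: unsupported dtypes hit a default branch that raises an error.
#include <cassert>
#include <stdexcept>

// Placeholder enum; only kFloat64 = 11 is taken from the PR.
enum class DTypeSketch : int { kInt32, kFloat32, kFloat64 = 11 };

// Float-only switch: FP32 is handled, FP64 deliberately has no case.
#define SKETCH_TYPE_SWITCH_FLOAT(dtype, type, ...)   \
  switch (dtype) {                                   \
    case DTypeSketch::kFloat32: {                    \
      using type = float;                            \
      { __VA_ARGS__ }                                \
    } break;                                         \
    default:                                         \
      throw std::invalid_argument("Invalid type.");  \
  }

// Returns true if the float-only path accepts `dtype`, false if it rejects it.
inline bool float_only_path_accepts(DTypeSketch dtype) {
  try {
    SKETCH_TYPE_SWITCH_FLOAT(dtype, T, static_cast<void>(sizeof(T)););
    return true;
  } catch (const std::invalid_argument&) {
    return false;
  }
}
```

Under this assumption, FP64 tensors are usable only along code paths whose macros were extended (the router and padding paths in this PR).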

Confidence Score: 5/5

This PR is safe to merge with no identified issues
The implementation is thorough, consistent, and correctly addresses the previously reported issue (line 503 now maps kFloat64 to double instead of float). All type mappings are correct across enums, CUDA types, PyTorch types, and Python bindings. The enum value (11) is consistently used, and all critical type switch macros include proper FP64 handling. The scope is appropriately limited to the stated use case.
No files require special attention

Important Files Changed

File Analysis

Filename	Score	Overview
transformer_engine/common/include/transformer_engine/transformer_engine.h	5/5	Added `kFloat64` enum value (11) to both `NVTEDType` and `DType` enums, and updated `is_high_precision_dtype()` to include Float64
transformer_engine/common/common.h	5/5	Added `fp64` type alias for `double`, registered it with type system, added to TypeInfo tuple, and added `kFloat64` case to `TRANSFORMER_ENGINE_TYPE_SWITCH_ALL` macro
transformer_engine/common/fused_router/utils.h	5/5	Added `kFloat64` -> `double` mappings to both `TE_ROUTER_PROBS_TYPE_SWITCH_ALL` and `TE_ROUTER_INDEX_TYPE_SWITCH_ALL` macros
transformer_engine/pytorch/csrc/common.h	5/5	Added FP64 support to `typeToNumBits()`, `GetATenDType()` (maps to `at::kDouble`), and `GetTransformerEngineDType()` functions
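The PyTorch-side changes amount to keeping the bit-width table and both conversion directions in agreement for the new dtype. The following is a hedged sketch with hypothetical stand-in enums: `DTypeSketch` plays the TE dtype and `FrameworkDTypeSketch` plays PyTorch's scalar type (its `kDouble` standing in for `at::kDouble`). The functions are analogues of `typeToNumBits()`, `GetATenDType()` and `GetTransformerEngineDType()`, not their real signatures.

```cpp
// Hypothetical analogues of the PyTorch-extension dtype helpers.
#include <cassert>
#include <cstddef>
#include <stdexcept>

// Placeholder enums; only kFloat64 = 11 is taken from the PR.
enum class DTypeSketch : int { kInt32, kFloat32, kFloat64 = 11 };
enum class FrameworkDTypeSketch { kInt, kFloat, kDouble };

// Analogue of typeToNumBits(): bit width per dtype.
inline std::size_t type_to_num_bits(DTypeSketch t) {
  switch (t) {
    case DTypeSketch::kInt32:   return 32;
    case DTypeSketch::kFloat32: return 32;
    case DTypeSketch::kFloat64: return 64;
  }
  throw std::invalid_argument("Invalid type.");
}

// Analogue of GetATenDType(): TE dtype -> framework dtype.
inline FrameworkDTypeSketch to_framework(DTypeSketch t) {
  switch (t) {
    case DTypeSketch::kInt32:   return FrameworkDTypeSketch::kInt;
    case DTypeSketch::kFloat32: return FrameworkDTypeSketch::kFloat;
    case DTypeSketch::kFloat64: return FrameworkDTypeSketch::kDouble;
  }
  throw std::invalid_argument("Invalid type.");
}

// Analogue of GetTransformerEngineDType(): framework dtype -> TE dtype.
inline DTypeSketch from_framework(FrameworkDTypeSketch t) {
  switch (t) {
    case FrameworkDTypeSketch::kInt:    return DTypeSketch::kInt32;
    case FrameworkDTypeSketch::kFloat:  return DTypeSketch::kFloat32;
    case FrameworkDTypeSketch::kDouble: return DTypeSketch::kFloat64;
  }
  throw std::invalid_argument("Invalid type.");
}
```

A round trip such as `from_framework(to_framework(DTypeSketch::kFloat64))` should return `kFloat64`; a missing case in either direction would break it.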

Sequence Diagram

sequenceDiagram
    participant User as MCore/User Code
    participant PyBind as Python Binding
    participant PyTorch as PyTorch Interface
    participant Core as TE Core (common.h)
    participant Router as Fused Router
    participant CUDA as CUDA Runtime

    User->>PyBind: Request fp64 dtype (moe_router_dtype=fp64)
    PyBind->>PyTorch: Convert to DType::kFloat64
    PyTorch->>PyTorch: Map kFloat64 to at::kDouble
    PyTorch->>Core: Create tensor with DType::kFloat64
    Core->>Core: TRANSFORMER_ENGINE_TYPE_SWITCH_ALL<br/>maps kFloat64 -> double
    Core->>Router: Pass fp64 tensor to fused router
    Router->>Router: TE_ROUTER_PROBS_TYPE_SWITCH_ALL<br/>handles kFloat64 -> double
    Router->>CUDA: get_cuda_dtype(kFloat64)
    CUDA->>CUDA: Returns CUDA_R_64F
    CUDA-->>Router: 64-bit float operations
    Router-->>User: permuted_token_probs (FP64)
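The router step in the diagram, where a probs-type switch selects a `double` instantiation, can be illustrated on the CPU. This is an assumption-laden sketch: `router_softmax_sketch` and `router_softmax` are hypothetical, the loop stands in for a CUDA kernel, logits are taken as `double` for simplicity, and softmax is used only as a representative router computation. It does not reproduce TransformerEngine's fused router kernels.

```cpp
// Hypothetical CPU sketch of dtype-dispatched router probabilities.
// Not TransformerEngine's fused router; a plain loop stands in for a CUDA kernel.
#include <cassert>
#include <cmath>
#include <cstddef>
#include <stdexcept>

// Placeholder enum; only kFloat64 = 11 is taken from the PR.
enum class DTypeSketch : int { kFloat32, kFloat64 = 11 };

// Numerically stable softmax over one row of logits, written into a raw
// buffer whose element type ProbT is chosen by the dispatcher below.
template <typename ProbT>
void router_softmax_sketch(const double* logits, std::size_t n, void* probs_out) {
  if (n == 0) return;
  ProbT* probs = static_cast<ProbT*>(probs_out);
  double max_logit = logits[0];
  for (std::size_t i = 1; i < n; ++i) max_logit = std::fmax(max_logit, logits[i]);
  ProbT sum = 0;
  for (std::size_t i = 0; i < n; ++i) {
    probs[i] = static_cast<ProbT>(std::exp(logits[i] - max_logit));
    sum += probs[i];
  }
  for (std::size_t i = 0; i < n; ++i) probs[i] /= sum;
}

// Runtime dispatch on the probs dtype, analogous in spirit to a probs type
// switch: kFloat64 selects the double instantiation.
inline void router_softmax(DTypeSketch probs_dtype, const double* logits,
                           std::size_t n, void* probs_out) {
  switch (probs_dtype) {
    case DTypeSketch::kFloat32:
      router_softmax_sketch<float>(logits, n, probs_out);
      break;
    case DTypeSketch::kFloat64:
      router_softmax_sketch<double>(logits, n, probs_out);
      break;
    default:
      throw std::invalid_argument("Invalid type.");
  }
}
```

The caller owns the output buffer and must size it to match the dispatched dtype (8 bytes per element for `kFloat64`), which is why the dtype-to-type mapping has to be exact.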

greptile-apps

9 files reviewed, 1 comment


transformer_engine/common/common.h

Signed-off-by: Phuong Nguyen <phuonguyen@nvidia.com>

greptile-apps

9 files reviewed, no comments


phu0ngng · 2025-11-25T14:58:08Z

/te-ci L0

Signed-off-by: Phuong Nguyen <phuonguyen@nvidia.com>

phu0ngng · 2025-11-25T16:10:11Z

/te-ci L0

greptile-apps

7 files reviewed, no comments


Signed-off-by: Phuong Nguyen <phuonguyen@nvidia.com>

phu0ngng · 2025-11-26T16:13:08Z

/te-ci L0

greptile-apps

9 files reviewed, no comments


added Float64

1e52343

Signed-off-by: Phuong Nguyen <phuonguyen@nvidia.com>

phu0ngng force-pushed the dtype_fp64 branch from 7ac7987 to 1e52343 November 24, 2025 23:24

[pre-commit.ci] auto fixes from pre-commit.com hooks

0f09be2

for more information, see https://pre-commit.ci

timmoon10 previously approved these changes Nov 25, 2025


phu0ngng marked this pull request as ready for review November 25, 2025 02:13

greptile-apps bot reviewed Nov 25, 2025


transformer_engine/common/common.h (conversation resolved)

fix mapping in common.h

a8ce721

Signed-off-by: Phuong Nguyen <phuonguyen@nvidia.com>

phu0ngng dismissed timmoon10’s stale review via a8ce721 November 25, 2025 02:25

yaox12 previously approved these changes Nov 25, 2025


greptile-apps bot reviewed Nov 25, 2025


reset tests/cpp

0fbb553

Signed-off-by: Phuong Nguyen <phuonguyen@nvidia.com>

phu0ngng dismissed yaox12’s stale review via 0fbb553 November 25, 2025 16:09

phu0ngng requested a review from ptrendx November 25, 2025 16:10

greptile-apps bot reviewed Nov 25, 2025


fix test_common.h compilation

70f22cd

Signed-off-by: Phuong Nguyen <phuonguyen@nvidia.com>

greptile-apps bot reviewed Nov 26, 2025



3 participants
