[Common] Add kFloat64 partial support #2417
base: main
Conversation
Signed-off-by: Phuong Nguyen <phuonguyen@nvidia.com>
for more information, see https://pre-commit.ci
timmoon10 left a comment:
LGTM, pending CI
/te-ci L0
Greptile Summary: Adds partial FP64 (double precision) support to TransformerEngine, enabling MCore to use `moe_router_dtype=fp64`.
Scope: This is intentionally partial support focused on the fused router and padding APIs; other specialized type switch macros are left unchanged.
Confidence Score: 5/5
Sequence Diagram:

```mermaid
sequenceDiagram
    participant User as MCore/User Code
    participant PyBind as Python Binding
    participant PyTorch as PyTorch Interface
    participant Core as TE Core (common.h)
    participant Router as Fused Router
    participant CUDA as CUDA Runtime
    User->>PyBind: Request fp64 dtype (moe_router_dtype=fp64)
    PyBind->>PyTorch: Convert to DType::kFloat64
    PyTorch->>PyTorch: Map kFloat64 to at::kDouble
    PyTorch->>Core: Create tensor with DType::kFloat64
    Core->>Core: TRANSFORMER_ENGINE_TYPE_SWITCH_ALL<br/>maps kFloat64 -> double
    Core->>Router: Pass fp64 tensor to fused router
    Router->>Router: TE_ROUTER_PROBS_TYPE_SWITCH_ALL<br/>handles kFloat64 -> double
    Router->>CUDA: get_cuda_dtype(kFloat64)
    CUDA->>CUDA: Returns CUDA_R_64F
    CUDA-->>Router: 64-bit float operations
    Router-->>User: permuted_token_probs (FP64)
```
9 files reviewed, 1 comment
Signed-off-by: Phuong Nguyen <phuonguyen@nvidia.com>
9 files reviewed, no comments
/te-ci L0
Signed-off-by: Phuong Nguyen <phuonguyen@nvidia.com>
/te-ci L0
7 files reviewed, no comments
Signed-off-by: Phuong Nguyen <phuonguyen@nvidia.com>
/te-ci L0
9 files reviewed, no comments
Description
MCore wants to use `moe_router_dtype=fp64`, i.e., the resulting `permuted_token_probs` is a tensor of FP64 dtype. This PR adds partial support for FP64, mainly for usage in the fused router and padding APIs.