Pull requests: NVIDIA/TransformerEngine
Pull requests list
[PyTorch] Bump minimum cuDNN version for fused attention with FP8 current scaling
bug
#2236
opened Oct 4, 2025 by
timmoon10
7 of 13 tasks
[JAX] xla_home logging during JAX build
#2232
opened Oct 3, 2025 by
jberchtold-nvidia
13 tasks
[JAX] Fix for GEMM + fuse bias + AllReduce
#2230
opened Oct 2, 2025 by
phu0ngng
7 of 13 tasks
Enable SWA with CP for THD input format
#2220
opened Sep 30, 2025 by
sudhakarsingh27
1 of 6 tasks
[Draft][PyTorch][MOE] Support NVFP4 Grouped Linear
#2215
opened Sep 30, 2025 by
zhongbozhu
3 of 17 tasks
[JAX][Draft] Async issuing D2H memcpy for grouped_gemm group_sizes array
#2213
opened Sep 29, 2025 by
huanghua1994
6 of 13 tasks
Test to see if SWA and Causal compute can be removed from seqlens and…
#2201
opened Sep 25, 2025 by
KshitijLakhani
Draft
13 tasks
Honor COMPACT data_format for FP8 blockwise scales in MoE up-projection path to remove 5× redundant rowwise_scale_inv.T.contiguous() passes
#2199
opened Sep 24, 2025 by
xiaoxi-wangfj
2 of 13 tasks
[PyTorch] Add max_score support for MuonClip
2.9.0
#2195
opened Sep 22, 2025 by
cyanguwa
8 of 13 tasks
[Feature] Enable rope application with offsets for training
2.9.0
#2188
opened Sep 19, 2025 by
sudhakarsingh27
1 of 13 tasks
Context Parallel integration tests with a transformer layer: BSHD and THD + CP
2.9.0
#2176
opened Sep 16, 2025 by
jomitchellnv
7 of 13 tasks
blockwise fp8 weight memory optimization: on-demand columnwise fp8 weight creation
#2168
opened Sep 10, 2025 by
skydoorkai
7 of 13 tasks
[Common][PyTorch][Rework] PDL for Quantization
#2150
opened Sep 4, 2025 by
yaox12
1 of 13 tasks
[main][feature][under updating]adapt for offload activation
#2145
opened Sep 2, 2025 by
GeYuhong
1 of 13 tasks
[PyTorch] Add record_stream and untyped_storage func op in QuantizedTensor
#2144
opened Sep 2, 2025 by
xiaoxi-wangfj
1 of 13 tasks
[PyTorch Debug] Support precision debug tools for fp8 model parameters.
#2141
opened Sep 1, 2025 by
pggPL
8 of 13 tasks