# Release Notes – Release 1.8
## Key Features and Enhancements
- [PyTorch] Added a new argument, `softmax_scale`, to the `DotProductAttention` API (see the first sketch after this list).
- [PyTorch] Extended Transformer Engine's PyTorch build to always compile with tensor parallelism (TP) communication overlap support, removing the MPI dependency. Also exposed the `initialize_ub` and `destroy_ub` APIs for communication-GEMM overlap configuration (see the second sketch after this list).
- [PyTorch] Improved documentation for the `DotProductAttention` API, including benchmarks and end-to-end test scripts.
- [PyTorch] Incorporated the Fused Adam and Fused SGD optimizers into Transformer Engine; they previously had to be installed from the GitHub repository https://github.com/NVIDIA/apex (see the third sketch after this list).
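The snippet below is a minimal sketch of the new argument, assuming `softmax_scale` is an optional constructor keyword that overrides the default `1/sqrt(head_dim)` scaling; the tensor shapes and other constructor values are placeholders.

```python
import torch
import transformer_engine.pytorch as te

# Override the default 1/sqrt(head_dim) softmax scaling (assumed semantics
# of the `softmax_scale` keyword added in this release).
attn = te.DotProductAttention(
    num_attention_heads=16,
    kv_channels=64,
    softmax_scale=1.0 / 64,
)

# Inputs use the default "sbhd" layout: (seq, batch, heads, head_dim).
q = torch.randn(128, 2, 16, 64, dtype=torch.bfloat16, device="cuda")
k = torch.randn(128, 2, 16, 64, dtype=torch.bfloat16, device="cuda")
v = torch.randn(128, 2, 16, 64, dtype=torch.bfloat16, device="cuda")
out = attn(q, k, v)
```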
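A hedged sketch of the setup/teardown flow for communication-GEMM overlap follows; the buffer shape and `tp_size` values are placeholders, and the exact `initialize_ub` signature should be checked against the installed version. With the MPI dependency removed, the bootstrap is assumed to run over an initialized `torch.distributed` process group.

```python
import torch
import transformer_engine.pytorch as te

# Assumes torch.distributed is already initialized (no MPI required).
seq_len, batch_size, hidden_size, tp_size = 2048, 2, 4096, 8

# Allocate the shared buffers used to overlap TP communication with GEMMs.
te.initialize_ub(
    shape=[seq_len * batch_size, hidden_size],  # placeholder buffer shape
    tp_size=tp_size,
)

# ... construct and train TP-parallel Transformer Engine modules here ...

# Free the communication buffers when finished.
te.destroy_ub()
```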
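Since the fused optimizers now ship with Transformer Engine, a separate Apex installation is no longer required. A minimal sketch, assuming they are importable from `transformer_engine.pytorch.optimizers`:

```python
import torch
from transformer_engine.pytorch.optimizers import FusedAdam, FusedSGD

model = torch.nn.Linear(1024, 1024).cuda()
optimizer = FusedAdam(model.parameters(), lr=1e-4)
# Alternatively: FusedSGD(model.parameters(), lr=1e-3, momentum=0.9)

loss = model(torch.randn(8, 1024, device="cuda")).sum()
loss.backward()
optimizer.step()
optimizer.zero_grad()
```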
## Fixed Issues
- [PyTorch] Made internal changes to reduce CPU overhead.
- [PyTorch] Fixed a crash that occurred when using TorchDynamo with the `checkpoint` API (see the sketch after this list).
- [PyTorch] Fixed an issue with loading an FP8 checkpoint when using FP8 attention.
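The sketch below illustrates the pattern affected by the TorchDynamo fix, assuming `te.checkpoint` wraps a module for activation recompute and the surrounding function is compiled with `torch.compile`; the layer sizes are placeholders.

```python
import torch
import transformer_engine.pytorch as te

layer = te.TransformerLayer(
    hidden_size=1024, ffn_hidden_size=4096, num_attention_heads=16
).cuda()

def forward_fn(x):
    # Recompute activations in the backward pass to save memory.
    return te.checkpoint(layer, x)

compiled = torch.compile(forward_fn)
x = torch.randn(128, 2, 1024, device="cuda", requires_grad=True)
out = compiled(x)  # this combination previously crashed
```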
## Known Issues in This Release
There are no known issues in this release.
## Breaking Changes in This Release
There are no breaking changes in this release.
## Deprecated Features
There are no deprecated features in this release.