PR #15331: Support cuDNN frontend scaled dot product attention for FP8. Part 2 (backward) #18100

Merged 1 commit on Oct 9, 2024

Commits on Oct 9, 2024

  1. PR #15331: Support cuDNN frontend scaled dot product attention for FP8. Part 2 (backward)
    
    Imported from GitHub PR #15331
    
    As the second part of #15092.
    NOTE: this feature relies on cudnn-frontend v1.6.1, which is not yet in XLA. (A reference sketch of the attention computation follows the commit details below.)
    Copybara import of the project:
    
    --
    06db3c8 by shuw <shuw@nvidia.com>:
    
    Scaled dot product attention implementation by cudnn.
    
    --
    937b0e2 by shuw <shuw@nvidia.com>:
    
    Improve after review 1
    
    --
    398b2ba by shuw <shuw@nvidia.com>:
    
    clang-format
    
    --
    0825789 by Shu Wang <shuw@nvidia.com>:
    
    fix typo.
    
    --
    d0ae3cf by shuw <shuw@nvidia.com>:
    
    Refactor test
    
    Merging this change closes #15331
    
    COPYBARA_INTEGRATE_REVIEW=#15331 from wenscarl:sdpa_fp8_bwd d0ae3cf
    PiperOrigin-RevId: 684062495
    wenscarl authored and Google-ML-Automation committed Oct 9, 2024
    Full commit SHA: 467563e
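For context on what the fused kernel computes, below is a rough pure-JAX reference of scaled dot product attention with an FP8 round-trip of the inputs. This is an illustrative sketch only: the function names, shapes, and scale handling are assumptions made for the example and do not reflect the XLA custom-call interface added in this PR, which lowers to a fused cuDNN attention graph rather than the einsum graph shown here. The "Part 2 (backward)" work corresponds to the gradients of this computation.

```python
import jax
import jax.numpy as jnp

def sdpa_reference(q, k, v, scale):
    # Standard scaled dot product attention: softmax(q @ k^T * scale) @ v.
    logits = jnp.einsum("bhqd,bhkd->bhqk", q, k) * scale
    probs = jax.nn.softmax(logits, axis=-1)
    return jnp.einsum("bhqk,bhkd->bhqd", probs, v)

key = jax.random.PRNGKey(0)
q, k, v = (jax.random.normal(subkey, (1, 2, 8, 16), jnp.float32)
           for subkey in jax.random.split(key, 3))

# FP8 (e4m3) round-trip of the inputs, as a stand-in for the quantize/descale
# handling that the fused cuDNN kernel performs with explicit scale factors.
q8, k8, v8 = (t.astype(jnp.float8_e4m3fn).astype(jnp.float32) for t in (q, k, v))

scale = 1.0 / 16 ** 0.5  # 1 / sqrt(head_dim)
out = sdpa_reference(q8, k8, v8, scale)

# "Part 2 (backward)": the gradients of the same computation, obtained here
# with autodiff; this PR adds the fused cuDNN backward kernel for that step.
loss = lambda q, k, v: sdpa_reference(q, k, v, scale).sum()
dq, dk, dv = jax.grad(loss, argnums=(0, 1, 2))(q8, k8, v8)
```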