PR #15331: Support cuDNN frontend scaled dot product attention for FP8. Part 2 (backward) #18100

Merged 1 commit on Oct 9, 2024

Commits on Oct 9, 2024

  1. PR #15331: Support cuDNN frontend scaled dot product attention for FP8. Part 2 (backward)
    
    Imported from GitHub PR #15331
    
    As the second part of #15092.
    NOTE: this feature relies on cudnn-frontend v1.6.1, which is not yet in XLA. (A reference sketch of the attention computation follows the commit details below.)
    Copybara import of the project:
    
    --
    06db3c8 by shuw <shuw@nvidia.com>:
    
    Scaled dot product attention implementation by cudnn.
    
    --
    937b0e2 by shuw <shuw@nvidia.com>:
    
    Improve after review 1
    
    --
    398b2ba by shuw <shuw@nvidia.com>:
    
    clang-format
    
    --
    0825789 by Shu Wang <shuw@nvidia.com>:
    
    fix typo.
    
    --
    d0ae3cf by shuw <shuw@nvidia.com>:
    
    Refactor test
    
    Merging this change closes #15331
    
    COPYBARA_INTEGRATE_REVIEW=#15331 from wenscarl:sdpa_fp8_bwd d0ae3cf
    PiperOrigin-RevId: 684062495
    wenscarl authored and Google-ML-Automation committed Oct 9, 2024
    Full commit SHA: 467563e
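For context on what the fused kernel computes, below is a rough pure-JAX reference of scaled dot product attention with an FP8 round-trip of the inputs. This is an illustrative sketch only: the function names, shapes, and scale handling are assumptions made for the example and do not reflect the XLA custom-call interface added in this PR, which lowers to a fused cuDNN attention graph rather than the einsum graph shown here. The "Part 2 (backward)" work corresponds to the gradients of this computation.

```python
import jax
import jax.numpy as jnp

def sdpa_reference(q, k, v, scale):
    # Standard scaled dot product attention: softmax(q @ k^T * scale) @ v.
    logits = jnp.einsum("bhqd,bhkd->bhqk", q, k) * scale
    probs = jax.nn.softmax(logits, axis=-1)
    return jnp.einsum("bhqk,bhkd->bhqd", probs, v)

key = jax.random.PRNGKey(0)
q, k, v = (jax.random.normal(subkey, (1, 2, 8, 16), jnp.float32)
           for subkey in jax.random.split(key, 3))

# FP8 (e4m3) round-trip of the inputs, as a stand-in for the quantize/descale
# handling that the fused cuDNN kernel performs with explicit scale factors.
q8, k8, v8 = (t.astype(jnp.float8_e4m3fn).astype(jnp.float32) for t in (q, k, v))

scale = 1.0 / 16 ** 0.5  # 1 / sqrt(head_dim)
out = sdpa_reference(q8, k8, v8, scale)

# "Part 2 (backward)": the gradients of the same computation, obtained here
# with autodiff; this PR adds the fused cuDNN backward kernel for that step.
loss = lambda q, k, v: sdpa_reference(q, k, v, scale).sum()
dq, dk, dv = jax.grad(loss, argnums=(0, 1, 2))(q8, k8, v8)
```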