sdpa: support attn_mask.requires_grad, support expanded number of heads in attn_mask #1563
Azure Pipelines / lightning-thunder (GPUs) (testing ubuntu22.04 | cuda 12.1 | python 3.10 | torch 2.5.1 | distributed)
Succeeded on Dec 17, 2024 in 23m 11s