Add Unified Sequence Parallel attention #12693

Bissmella · 2025-11-21T08:49:33Z

What does this PR do?

This is a draft implementation of the Unified SP attention approach.

Implements _all_to_all_dim_exchange with custom scatter and gather indices
Implements TemplatedUnifiedAttention

Core implementation complete, needs:

Testing
Validation
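For readers new to the approach, the sketch below shows what an all-to-all "dimension exchange" does in Ulysses-style sequence parallelism. It simulates the collective in a single process on nested lists. It assumes that the "scatter and gather indices" above select which dimension is split and sent to peers and which dimension received chunks are concatenated along. `split`, `concat`, and `simulate_all_to_all` are hypothetical names used for illustration only. They are not the signature of this PR's `_all_to_all_dim_exchange`, which would run a real collective such as `torch.distributed.all_to_all_single`.

```python
from typing import List

# A 2-D "tensor" as a nested list indexed [seq][head]; stands in for a real tensor.
Grid = List[List[object]]


def split(t: Grid, n: int, dim: int) -> List[Grid]:
    """Split t into n equal chunks along dim (0 = sequence, 1 = heads)."""
    if dim == 0:
        step = len(t) // n
        return [t[i * step:(i + 1) * step] for i in range(n)]
    step = len(t[0]) // n
    return [[row[i * step:(i + 1) * step] for row in t] for i in range(n)]


def concat(chunks: List[Grid], dim: int) -> Grid:
    """Concatenate chunks along dim."""
    if dim == 0:
        return [row for chunk in chunks for row in chunk]
    return [[x for chunk in chunks for x in chunk[r]] for r in range(len(chunks[0]))]


def simulate_all_to_all(shards: List[Grid], scatter_dim: int, gather_dim: int) -> List[Grid]:
    """Single-process stand-in for an all-to-all over len(shards) ranks.

    Rank src splits its shard into P chunks along scatter_dim and sends chunk dst
    to rank dst; each rank concatenates what it receives, ordered by source rank,
    along gather_dim.
    """
    world = len(shards)
    outgoing = [split(s, world, scatter_dim) for s in shards]  # outgoing[src][dst]
    return [concat([outgoing[src][dst] for src in range(world)], gather_dim)
            for dst in range(world)]


# 2 ranks, sequence length 4, 4 heads; each element records its (seq, head) position.
full = [[(s, h) for h in range(4)] for s in range(4)]
seq_shards = split(full, 2, dim=0)  # before attention: each rank holds a sequence slice
head_shards = simulate_all_to_all(seq_shards, scatter_dim=1, gather_dim=0)
# Each rank now holds the full sequence for half of the heads.
restored = simulate_all_to_all(head_shards, scatter_dim=0, gather_dim=1)
```

Swapping the two dimensions on the second call inverts the exchange, which is how the attention output can be returned to the original sequence-sharded layout.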

sayakpaul · 2025-11-21T09:00:07Z

It would be nice to get a testing script so that we can quickly check things.

KarthikSundar2002 · 2025-11-21T11:54:13Z

I added a basic test script with a simple forward and backward op. Would it be better to have a test script that uses flash_attention_backward and forward?


sayakpaul · 2025-11-29T11:59:17Z

Let us know if this is ready for a review!

Bissmella · 2025-11-29T12:42:43Z

Yep, ready for review! I tested it with a 4-process setup (2×2 mesh, on CPU) and everything checks out: shapes look good and gradients flow correctly. Looking forward to feedback, and happy to address any issues.
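To make the "4-process, 2×2 mesh" setup concrete, here is a minimal sketch of how four ranks could split into Ulysses (all-to-all) groups and ring groups. It assumes a row-major layout with the ring dimension outer and the Ulysses dimension inner, i.e. `rank = ring_idx * ulysses_degree + ulysses_idx`. That ordering is an assumption for illustration, and `mesh_groups` is a hypothetical helper. The PR's actual mesh construction may order the dimensions differently.

```python
from typing import List, Tuple


def mesh_groups(ring_degree: int, ulysses_degree: int) -> Tuple[List[List[int]], List[List[int]]]:
    """Derive Ulysses and ring rank groups for a (ring, ulysses) mesh.

    Assumes (hypothetically) a row-major layout:
    rank = ring_idx * ulysses_degree + ulysses_idx.
    """
    mesh = [[r * ulysses_degree + u for u in range(ulysses_degree)] for r in range(ring_degree)]
    # Ranks in the same row exchange heads/sequence with each other via all-to-all.
    ulysses_groups = [list(row) for row in mesh]
    # Ranks in the same column pass K/V blocks around a ring.
    ring_groups = [[mesh[r][u] for r in range(ring_degree)] for u in range(ulysses_degree)]
    return ulysses_groups, ring_groups


ulysses_groups, ring_groups = mesh_groups(ring_degree=2, ulysses_degree=2)
print("ulysses groups:", ulysses_groups)  # [[0, 1], [2, 3]]
print("ring groups:", ring_groups)        # [[0, 2], [1, 3]]
```

Every rank belongs to exactly one group of each kind, so a 2×2 mesh exercises both the all-to-all path and the ring path at once. In PyTorch, a 2-D mesh like this can be created with `torch.distributed.device_mesh.init_device_mesh`, with per-dimension groups obtained from the resulting `DeviceMesh`.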

Bissmella mentioned this pull request Nov 21, 2025

[feature] help us implement unified attention #12570

Open

Bissmella force-pushed the unified-SP-attention branch from a244006 to 9dee8f8 Compare November 24, 2025 10:54

Bissmella marked this pull request as ready for review November 24, 2025 10:56

Bissmella and others added 8 commits November 25, 2025 00:00

initial scheme of unified-sp (4b0c647)

initial all_to_all_double (81494b8)

bug fixes, added cmnts (83fc606)

unified attention prototype done (fcb06e5)

remove raising value error in contextParallelConfig to enable unified attention (4b71777)

bug fix (e0ed41e)

feat: Adds Test for Unified SP Attention and Fixes a bug in Template Ring Attention (3a407d8)

bug fix, lse calculation, testing (9ebcff5)

bug fixes, lse calculation: switched to _all_to_all_single helper in _all_to_all_dim_exchange due to contiguity issues; bug fix; bug fix; bug fix
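Several of the commits above mention the lse (log-sum-exp) calculation. In ring-style attention each rank attends to one K/V block at a time, so the partial outputs must be combined using each block's log-sum-exp to reproduce the full softmax. The sketch below illustrates this for a single query in pure Python. `attend_block` and `merge` are hypothetical helpers, not this PR's code. Real kernels (e.g. flash-attention-style ones) typically return the lse per query and head alongside the output.

```python
import math
from typing import List, Sequence, Tuple


def attend_block(q: Sequence[float], keys: Sequence[Sequence[float]],
                 values: Sequence[Sequence[float]]) -> Tuple[List[float], float]:
    """Softmax attention of one query over one K/V block.

    Returns the block-normalized output and the block's log-sum-exp of scores.
    """
    scale = 1.0 / math.sqrt(len(q))
    scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
    m = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - m) for s in scores]
    denom = sum(weights)
    out = [sum(w * v[d] for w, v in zip(weights, values)) / denom
           for d in range(len(values[0]))]
    return out, m + math.log(denom)


def merge(out_a: List[float], lse_a: float,
          out_b: List[float], lse_b: float) -> Tuple[List[float], float]:
    """Combine partial results computed over two disjoint K/V blocks."""
    m = max(lse_a, lse_b)
    lse = m + math.log(math.exp(lse_a - m) + math.exp(lse_b - m))
    # Each partial output is re-weighted by its share of the total softmax mass.
    wa, wb = math.exp(lse_a - lse), math.exp(lse_b - lse)
    return [wa * a + wb * b for a, b in zip(out_a, out_b)], lse


q = [0.5, -1.0]
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 2.0]]
values = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]

# Process the K/V blocks one at a time, as a ring step would, and fold them together.
out, lse = attend_block(q, keys[:2], values[:2])
out, lse = merge(out, lse, *attend_block(q, keys[2:], values[2:]))
reference, ref_lse = attend_block(q, keys, values)
```

The merged `out` and `lse` match attention over the full key set up to floating-point rounding. Because `merge` is associative, a rank can fold in each incoming block as it arrives around the ring.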

Bissmella force-pushed the unified-SP-attention branch from 9dee8f8 to 9ebcff5 Compare November 24, 2025 23:00

3 participants

Add Unified Sequence Parallel attention #12693

Are you sure you want to change the base?

Add Unified Sequence Parallel attention #12693

Conversation

Bissmella commented Nov 21, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What does this PR do?

Uh oh!

sayakpaul commented Nov 21, 2025

Uh oh!

KarthikSundar2002 commented Nov 21, 2025

Uh oh!

sayakpaul commented Nov 29, 2025

Uh oh!

Bissmella commented Nov 29, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Bissmella commented Nov 21, 2025 •

edited

Loading