gpt-oss model enablement #1754

wwwjn · 2025-09-24T20:37:33Z

Continues development on top of #1559. Thanks @KhoomeiK for the initial contribution!

All runs below are initialized from the same seed checkpoint, with seed=0 and deterministic = True.

Run 1: dp_shard = 2

Run 2: dp_shard = 2, TP degree = 2 (NGPU=4)

Run 3: dp_shard = 2, TP degree = 2, EP degree = 2 (NGPU=4)

Run 4: dp_shard = 2, TP degree = 2, EP degree = 2, ETP degree = 2 (NGPU=4)

Run 5: dp_shard = 2, EP degree = 2 (NGPU=2)
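As a quick sanity check on the run list above, here is a minimal sketch that tabulates the listed settings and checks them against the stated GPU counts. It assumes (this is not stated in the PR) that only `dp_shard` and TP multiply into the GPU count, while EP and ETP reuse those ranks; that assumption is at least consistent with the NGPU values given. `RUNS`, `implied_world_size`, and `check_runs` are hypothetical names for illustration, not torchtitan APIs.

```python
# Parallelism settings copied from the PR description.
# "ngpu": None means the description did not state NGPU for that run.
RUNS = {
    "Run 1": {"dp_shard": 2, "ngpu": None},
    "Run 2": {"dp_shard": 2, "tp": 2, "ngpu": 4},
    "Run 3": {"dp_shard": 2, "tp": 2, "ep": 2, "ngpu": 4},
    "Run 4": {"dp_shard": 2, "tp": 2, "ep": 2, "etp": 2, "ngpu": 4},
    "Run 5": {"dp_shard": 2, "ep": 2, "ngpu": 2},
}


def implied_world_size(cfg):
    # ASSUMPTION: only dp_shard and TP contribute to the GPU count;
    # EP and ETP are taken to reuse those ranks rather than add new ones.
    return cfg["dp_shard"] * cfg.get("tp", 1)


def check_runs(runs):
    """Map each run to True/False (matches stated NGPU) or None (NGPU not stated)."""
    result = {}
    for name, cfg in runs.items():
        ngpu = cfg["ngpu"]
        result[name] = None if ngpu is None else implied_world_size(cfg) == ngpu
    return result


if __name__ == "__main__":
    for name, ok in check_runs(RUNS).items():
        print(name, ok)
```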


wwwjn · 2025-09-24T21:48:48Z

torchtitan/models/attention.py

        block_mask = FlexAttention.block_masks[self.mask_key]
        return FlexAttention.flex_attn(q, k, v, block_mask=block_mask, scale=scale)

+    def _forward_with_sink(


I'd like some early comments / suggestions from @fegin @tianyu-l.

LGTM.

I'm curious how expensive it is to always return lse. If it is actually no cost, we can merge the FlexAttention call to the original forward.

cc @drisspg
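For context on why the LSE matters here, below is a minimal pure-Python sketch of one way an attention sink can be folded in after the fact using the log-sum-exp of the attention scores. It is not the code from this PR. It assumes the sink is a single extra logit that joins the softmax denominator but has no value vector; under that assumption the sink-aware output equals the ordinary attention output scaled by `sigmoid(lse - sink_logit)`. All function names are hypothetical.

```python
import math


def attention_with_lse(q, keys, values, scale):
    """Single-query softmax attention; returns (output, lse of scaled scores)."""
    scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    denom = sum(exps)
    lse = m + math.log(denom)
    dim = len(values[0])
    out = [sum(e * v[d] for e, v in zip(exps, values)) / denom for d in range(dim)]
    return out, lse


def apply_sink(out, lse, sink_logit):
    """Rescale a sink-free output: exp(lse) / (exp(lse) + exp(sink)) = sigmoid(lse - sink)."""
    gate = 1.0 / (1.0 + math.exp(sink_logit - lse))
    return [gate * o for o in out]


def reference_with_sink(q, keys, values, scale, sink_logit):
    """Direct computation: the sink logit enters the denominator only."""
    scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
    m = max(scores + [sink_logit])
    exps = [math.exp(s - m) for s in scores]
    denom = sum(exps) + math.exp(sink_logit - m)
    dim = len(values[0])
    return [sum(e * v[d] for e, v in zip(exps, values)) / denom for d in range(dim)]


if __name__ == "__main__":
    q = [1.0, 0.5]
    keys = [[0.2, -0.1], [1.0, 0.3], [-0.5, 0.8]]
    values = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
    out, lse = attention_with_lse(q, keys, values, scale=2 ** -0.5)
    print(apply_sink(out, lse, sink_logit=0.3))
    print(reference_with_sink(q, keys, values, 2 ** -0.5, 0.3))
```

The two printed vectors should agree up to floating-point error. The rescaling only needs the LSE that the attention kernel already computes internally, which is why the cost of returning it is the question raised above.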

wwwjn · 2025-09-30T22:55:49Z

Need to rebase onto #1776

wwwjn · 2025-09-30T23:08:47Z

torchtitan/models/attention.py

+]  # (mask_type, fixed_block_size, sliding_window)


 class FlexAttention(torch.nn.Module):


@wwwjn will rebase onto #1776

Sorry for the disruption. I should have done this earlier.

As for FlexAttention, @drisspg confirmed that, while it is probably just a minor overhead, the AuxOutput does incur some extra memory and memory writes. So let's keep it optional.

Rohan Pandey and others added 9 commits September 23, 2025 13:35

gptoss experimental support

9461315

clean up tentative licensing

371f204

training fixes: expert load balancing, TP for sinks + experts, EP wor…

4957bb0

…ks but reduces mfu for 20b

only assert sdpa backends if using sdpa; improve conversion script

c3fc9e7

fixed conversion script with param by param

b696028

new lse-based flexattn implementation for sinks

4010fa2

test

2e71aaf

rebase

122e93a

fix flexattn

589ce62

meta-cla bot added the CLA Signed label Sep 24, 2025

wwwjn commented Sep 24, 2025


wwwjn added 2 commits September 24, 2025 15:54

check and replace rope

4fc78a3

FSDP work, TP doesn't work

b28fe7c

wwwjn force-pushed the gpt-oss branch from ca57b78 to b28fe7c Compare September 29, 2025 21:26

wwwjn added 2 commits September 29, 2025 15:41

test

bb8ee6f

fix sink

07c0ff4

wwwjn force-pushed the gpt-oss branch from 48b2a11 to 07c0ff4 Compare September 30, 2025 04:34

wwwjn added 3 commits September 29, 2025 22:55

test EP

a2727a6

working on ETP

e7f9a56

clean up

ef146e1

Merge branch 'main' into gpt-oss

6f41f6c

wwwjn marked this pull request as ready for review September 30, 2025 23:01

wwwjn requested review from tianyu-l and wconstab as code owners September 30, 2025 23:01

wwwjn added 2 commits September 30, 2025 16:01

clean up

2b47774

fix lint

cd89d26

wwwjn changed the title ~~[WIP] gpt-oss model enablement~~ gpt-oss model enablement Sep 30, 2025

wwwjn commented Sep 30, 2025
