Flash is not flash #144

Open
liujuncn opened this issue May 17, 2023 · 1 comment
liujuncn commented May 17, 2023

I tested x-transformers' flash attention against HF GPT-2 using a PyTorch Lightning wrapper, but it is slower than transformers.GPT2LMHeadModel with the same config parameters. Not sure where I am going wrong?

[image: benchmark plot; purple is x-transformers flash attention]

```python
import pytorch_lightning as pl
from x_transformers import TransformerWrapper, Decoder, AutoregressiveWrapper

class FlashAttentionLM(pl.LightningModule):

    def __init__(self, config):
        super().__init__()
        model = TransformerWrapper(
            num_tokens = config.vocab_size,
            max_seq_len = config.seq_length,
            attn_layers = Decoder(
                dim = config.embd_size,
                depth = config.n_layer + 1,  # note: one more layer than the HF baseline's n_layer
                heads = 8,
                attn_flash = True            # enable flash attention
            )
        )
        self.model = AutoregressiveWrapper(model)
```

lucidrains (Owner) commented

You are comparing the entire transformer implementation, not the attention mechanism itself.
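To see the flash kernel's benefit in isolation, one would time just the attention op rather than a full model forward pass. Below is a minimal sketch in plain PyTorch; the benchmark harness, tensor shapes, and the `naive_attention` baseline are illustrative assumptions, not from this thread, and it assumes PyTorch >= 2.0 on a CUDA device.

```python
# Sketch: time the attention op alone, not the whole transformer (assumption,
# not from this thread). Requires PyTorch >= 2.0 and a CUDA GPU.
import torch
import torch.nn.functional as F

def naive_attention(q, k, v):
    # standard attention: materializes the full (seq x seq) score matrix
    scale = q.shape[-1] ** -0.5
    sim = (q @ k.transpose(-2, -1)) * scale
    return sim.softmax(dim = -1) @ v

def bench_ms(fn, *args, iters = 50):
    # average CUDA time per call, in milliseconds
    for _ in range(5):  # warmup
        fn(*args)
    torch.cuda.synchronize()
    start = torch.cuda.Event(enable_timing = True)
    end = torch.cuda.Event(enable_timing = True)
    start.record()
    for _ in range(iters):
        fn(*args)
    end.record()
    torch.cuda.synchronize()
    return start.elapsed_time(end) / iters

# (batch, heads, seq, dim_head) - illustrative shapes
q, k, v = (torch.randn(4, 8, 2048, 64, device = 'cuda', dtype = torch.float16) for _ in range(3))

print('naive :', bench_ms(naive_attention, q, k, v), 'ms')
print('sdpa  :', bench_ms(F.scaled_dot_product_attention, q, k, v), 'ms')
```

In a full model, attention is only one of many ops, so differences elsewhere in the two codebases can easily dominate; flash attention's advantage also grows with sequence length, since the naive path materializes the full seq x seq matrix.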
