wait_for_kv() waits for K & V but not Q — what guarantees Q is ready? #4

@Beibei-Zhou

Issue Description

In attention_partial::launcher::wait_for_kv, the code polls the Bar counters for the K and V blocks of the previous RMS_QKV_MatVecRopeAppend op:

// K
while (Bar[{layer_idx,
            OPCODE_RMS_QKV_MatVecRopeAppend - 1,
            num_attention_heads + kv_head_idx}] < 4) { … }

// V
while (Bar[{layer_idx,
            OPCODE_RMS_QKV_MatVecRopeAppend - 1,
            num_attention_heads + num_kv_heads + kv_head_idx}] < 4) { … }
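
For concreteness, the slot layout implied by these two loops can be sketched in plain host-side C++. This is only my reading of the indices, not the project's code: the struct name QkvSlotLayout and its methods are hypothetical, and the placement of Q slots at [0, num_attention_heads) is an assumption inferred from the K offset, which the snippet itself does not confirm.

```cpp
#include <cstddef>

// Hypothetical helper: barrier-slot layout inferred from the K and V loops above.
struct QkvSlotLayout {
    std::size_t num_attention_heads;
    std::size_t num_kv_heads;

    // ASSUMPTION: Q heads occupy the first num_attention_heads slots.
    std::size_t q_slot(std::size_t q_head_idx) const {
        return q_head_idx;
    }

    // Matches the K loop: num_attention_heads + kv_head_idx.
    std::size_t k_slot(std::size_t kv_head_idx) const {
        return num_attention_heads + kv_head_idx;
    }

    // Matches the V loop: num_attention_heads + num_kv_heads + kv_head_idx.
    std::size_t v_slot(std::size_t kv_head_idx) const {
        return num_attention_heads + num_kv_heads + kv_head_idx;
    }
};
```

With 32 attention heads and 8 KV heads, for example, kv_head_idx = 3 would map to K slot 35 and V slot 43 under this reading.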

There is no analogous polling for the corresponding Q block(s).

Because Q, K and V are generated by the same upstream opcode, I’d expect the launcher to wait on all three. What ensures that Q is already available (or otherwise not required) when PartialAttention begins?
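
To illustrate what I mean by an analogous check, here is a minimal host-side sketch of a Q readiness test. It is not taken from the repository. BarRow and q_slots_ready are hypothetical names, std::atomic stands in for the real Bar storage, and two things are assumed rather than known: that Q slots occupy [0, num_attention_heads), and that the Q heads sharing a KV head are contiguous (the usual grouped-query layout). The expected count is left as a parameter because I do not know whether Q would use the same threshold of 4.

```cpp
#include <atomic>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for one (layer, opcode) row of Bar: one counter per slot.
using BarRow = std::vector<std::atomic<int>>;

// Returns true once every Q slot that feeds kv_head_idx has reached `expected`.
// ASSUMPTIONS: Q slots are [0, num_attention_heads), and Q heads are grouped
// contiguously per KV head. Neither is confirmed by the source.
bool q_slots_ready(const BarRow& row,
                   std::size_t num_attention_heads,
                   std::size_t num_kv_heads,
                   std::size_t kv_head_idx,
                   int expected) {
    const std::size_t group = num_attention_heads / num_kv_heads;
    const std::size_t first = kv_head_idx * group;
    for (std::size_t q = first; q < first + group; ++q) {
        // Acquire load so that data written before the counter bump is visible.
        if (row[q].load(std::memory_order_acquire) < expected) {
            return false;
        }
    }
    return true;
}
```

A launcher-side wait would spin on this predicate the same way the K and V loops spin on their counters. If no such wait is needed, I would like to understand which existing dependency makes it redundant.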
