Open
Description
I've noticed that the main simulation procedure does not implement the new architecture feature in DeepSeek V3.2: DSA (DeepSeek Sparse Attention).
model_config does contain "index_topk", but not index_head_dim, index_n_heads, etc. There is also a kernel_sim program for DSA (sparse_mla_fp8.py) and some CSV result files, but they seem unrelated to the main and model code.
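To make the gap concrete, here is a minimal sketch of the check I have in mind. It assumes model_config can be treated as a plain dict, which may not match how the repo actually loads it. The helper name `missing_dsa_fields` and the example value are hypothetical; only the three field names come from the description above.

```python
# Hypothetical helper: report which DSA indexer fields a model config lacks.
# The field names are the ones mentioned in this issue; the dict layout of
# model_config is an assumption, not the repo's actual structure.
DSA_INDEXER_FIELDS = ("index_topk", "index_head_dim", "index_n_heads")


def missing_dsa_fields(model_config: dict) -> list[str]:
    """Return the DSA indexer fields that are absent from model_config."""
    return [name for name in DSA_INDEXER_FIELDS if name not in model_config]


# Example config with only index_topk set (the value is illustrative).
example_config = {"index_topk": 2048}
print(missing_dsa_fields(example_config))
```

With only index_topk present, the helper lists the other two fields as missing, which is the situation described above.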
Is the DSA feature on the roadmap?