Heretic v1.2.0 crashes when loading Qwen/Qwen3-Next-80B-A3B-Instruct-FP8 with an AttributeError on self_attn.o_proj.
Architecture: Qwen3NextForCausalLM (model_type: qwen3_next)
This is a massive MoE model (80B total / 3B active) with a non-standard layer structure. The main transformer layers have no self_attn attribute — only MoE MLP components. Attention exists only in the mtp.layers (multi-token prediction) head.
Layer 0 structure:
• mlp.experts.{0-511}.{down_proj,gate_proj,up_proj} (512 experts)
• mlp.shared_expert.{down_proj,gate_proj,up_proj}
• mlp.shared_expert_gate
• mlp.gate
• post_attention_layernorm (exists but no self_attn)
MTP layers have standard attention:
• mtp.layers.0.self_attn.{q_proj,k_proj,v_proj,o_proj,q_norm,k_norm}
• mtp.layers.0.mlp.experts.{0-511}.*
Traceback:
    heretic/model.py:345 in get_layer_modules
        try_add("attn.o_proj", layer.self_attn.o_proj)
    AttributeError: 'Qwen3NextDecoderLayer' object has no attribute 'self_attn'
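One possible fix is to make try_add tolerant of missing components instead of assuming every layer has self_attn. This is a hypothetical sketch (try_add/get_layer_modules are named after the traceback; the real Heretic code at model.py:345 may be structured differently):

```python
def get_layer_modules(layers):
    """Collect known projection modules per layer, skipping any a layer lacks."""
    modules = {}
    for i, layer in enumerate(layers):
        def try_add(name, module):
            # Silently skip components that this layer does not have,
            # so attention-free MoE layers no longer raise AttributeError.
            if module is not None:
                modules[f"layers.{i}.{name}"] = module

        attn = getattr(layer, "self_attn", None)
        try_add("attn.o_proj", getattr(attn, "o_proj", None))
        mlp = getattr(layer, "mlp", None)
        try_add("mlp.down_proj", getattr(mlp, "down_proj", None))
    return modules
```

With this guard, a layer exposing only MoE MLP components contributes its mlp entries and nothing else, while MTP-style layers contribute their attention projections too.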
GPU: NVIDIA RTX PRO 6000 Blackwell (96GB VRAM)
Command: heretic --model /path/to/Qwen3-Next-80B-FP8 --study-checkpoint-dir /path/to/output --device-map auto
Similar to the NemotronH hybrid architecture in #166, this likely needs custom layer detection that scans all layers for the union of component types.
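A minimal sketch of what that union-scan could look like: walk every layer, probe a set of candidate attribute paths, and keep whichever are present anywhere in the stack. The candidate paths here are illustrative, not Heretic's actual detection table:

```python
def scan_component_union(layers, candidates=("self_attn.o_proj", "mlp.down_proj")):
    """Return the union of candidate component paths found on any layer."""
    found = set()
    for layer in layers:
        for path in candidates:
            obj = layer
            # Walk the dotted path; bail out as soon as a segment is missing.
            for part in path.split("."):
                obj = getattr(obj, part, None)
                if obj is None:
                    break
            if obj is not None:
                found.add(path)
    return found
```

For a hybrid stack like this one, the scan would report both attention and MoE components even though no single layer has both, letting later per-layer extraction skip whatever a given layer lacks.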