Support for Qwen3NextForCausalLM (Qwen3-Next-80B-A3B) #181

@amq8

Description

Heretic v1.2.0 crashes when loading Qwen/Qwen3-Next-80B-A3B-Instruct-FP8 with an AttributeError on self_attn.o_proj.

Architecture: Qwen3NextForCausalLM (model_type: qwen3_next)

This is a massive MoE model (80B total / 3B active) with a non-standard layer structure. The main transformer layers have no self_attn attribute; they contain only MoE MLP components. Attention exists only in the mtp.layers (multi-token prediction) head.

Layer 0 structure:

• mlp.experts.{0-511}.{down_proj,gate_proj,up_proj} (512 experts)
• mlp.shared_expert.{down_proj,gate_proj,up_proj}
• mlp.shared_expert_gate
• mlp.gate
• post_attention_layernorm (exists but no self_attn)
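
Given that structure, one way to avoid the crash is to probe each layer for optional components instead of assuming self_attn exists. This is only a sketch; the registry shape and the attribute names below mirror this report, not Heretic's actual internals:

```python
def collect_layer_modules(layer, registry):
    """Register only the projection modules a layer actually has.

    Tolerates MoE-only layers (no self_attn) as well as standard
    attention layers, so hybrid architectures don't raise AttributeError.
    """
    attn = getattr(layer, "self_attn", None)
    if attn is not None and hasattr(attn, "o_proj"):
        registry["attn.o_proj"] = attn.o_proj
    mlp = getattr(layer, "mlp", None)
    if mlp is not None:
        for name in ("down_proj", "gate_proj", "up_proj"):
            proj = getattr(mlp, name, None)
            if proj is not None:
                registry[f"mlp.{name}"] = proj
    return registry
```

On a Qwen3-Next main layer this would register only the MLP projections and skip the attention entry silently.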

MTP layers have standard attention:

• mtp.layers.0.self_attn.{q_proj,k_proj,v_proj,o_proj,q_norm,k_norm}
• mtp.layers.0.mlp.experts.{0-511}.*

Traceback:

heretic/model.py:345 in get_layer_modules
    try_add("attn.o_proj", layer.self_attn.o_proj)
AttributeError: 'Qwen3NextDecoderLayer' object has no attribute 'self_attn'

GPU: NVIDIA RTX PRO 6000 Blackwell (96GB VRAM)
Command: heretic --model /path/to/Qwen3-Next-80B-FP8 --study-checkpoint-dir /path/to/output --device-map auto

As with the NemotronH hybrid architecture in #166, this likely needs custom layer detection that scans all layers and takes the union of the component types found.
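
The scan-for-the-union idea could look roughly like this. Everything here is illustrative (the candidate paths and function name are not Heretic's API); it just shows how a hybrid model with attention in some layers and MoE-only layers elsewhere would still yield a complete component set:

```python
def union_of_components(layers, candidates=("self_attn.o_proj", "mlp.down_proj")):
    """Return the dotted module paths present on at least one layer."""
    found = set()
    for layer in layers:
        for path in candidates:
            obj = layer
            for part in path.split("."):
                obj = getattr(obj, part, None)
                if obj is None:
                    break
            if obj is not None:
                found.add(path)
    return sorted(found)
```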
