Heretic v1.2.0 crashes when loading Qwen/Qwen3-Next-80B-A3B-Instruct-FP8 with an AttributeError on self_attn.o_proj.
Architecture: Qwen3NextForCausalLM (model_type: qwen3_next)
This is a massive MoE model (80B total / 3B active) with a non-standard layer structure. The main transformer layers have no self_attn attribute — only MoE MLP components. Attention exists only in the mtp.layers (multi-token prediction) head.
Layer 0 structure:
• mlp.experts.{0-511}.{down_proj,gate_proj,up_proj} (512 experts)
• mlp.shared_expert.{down_proj,gate_proj,up_proj}
• mlp.shared_expert_gate
• mlp.gate
• post_attention_layernorm (exists but no self_attn)
MTP layers have standard attention:
• mtp.layers.0.self_attn.{q_proj,k_proj,v_proj,o_proj,q_norm,k_norm}
• mtp.layers.0.mlp.experts.{0-511}.*
Traceback:
    heretic/model.py:345 in get_layer_modules
        try_add("attn.o_proj", layer.self_attn.o_proj)
    AttributeError: 'Qwen3NextDecoderLayer' object has no attribute 'self_attn'
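One possible fix is to make try_add tolerant of missing components instead of assuming every layer has self_attn. This is a hypothetical sketch (try_add/get_layer_modules are named after the traceback; the real Heretic code at model.py:345 may be structured differently):

```python
def get_layer_modules(layers):
    """Collect known projection modules per layer, skipping any a layer lacks."""
    modules = {}
    for i, layer in enumerate(layers):
        def try_add(name, module):
            # Silently skip components that this layer does not have,
            # so attention-free MoE layers no longer raise AttributeError.
            if module is not None:
                modules[f"layers.{i}.{name}"] = module

        attn = getattr(layer, "self_attn", None)
        try_add("attn.o_proj", getattr(attn, "o_proj", None))
        mlp = getattr(layer, "mlp", None)
        try_add("mlp.down_proj", getattr(mlp, "down_proj", None))
    return modules
```

With this guard, a layer exposing only MoE MLP components contributes its mlp entries and nothing else, while MTP-style layers contribute their attention projections too.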
GPU: NVIDIA RTX PRO 6000 Blackwell (96GB VRAM)
Command: heretic --model /path/to/Qwen3-Next-80B-FP8 --study-checkpoint-dir /path/to/output --device-map auto
Similar to the NemotronH hybrid architecture in #166, this likely needs custom layer detection that scans all layers for the union of component types.
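A minimal sketch of what that union-scan could look like: walk every layer, probe a set of candidate attribute paths, and keep whichever are present anywhere in the stack. The candidate paths here are illustrative, not Heretic's actual detection table:

```python
def scan_component_union(layers, candidates=("self_attn.o_proj", "mlp.down_proj")):
    """Return the union of candidate component paths found on any layer."""
    found = set()
    for layer in layers:
        for path in candidates:
            obj = layer
            # Walk the dotted path; bail out as soon as a segment is missing.
            for part in path.split("."):
                obj = getattr(obj, part, None)
                if obj is None:
                    break
            if obj is not None:
                found.add(path)
    return found
```

For a hybrid stack like this one, the scan would report both attention and MoE components even though no single layer has both, letting later per-layer extraction skip whatever a given layer lacks.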