Conversation

@Aznix07

@Aznix07 Aznix07 commented Nov 3, 2025

What does this PR do?

Fixes the Qwen3 model configuration to use NormalizedTextConfigWithGQA instead of NormalizedTextConfig.

Problem

Qwen3 models use a Grouped Query Attention (GQA) architecture similar to Qwen2, with:

  • num_attention_heads: 16
  • num_key_value_heads: 8
  • head_dim: 128

The previous configuration used NormalizedTextConfig, which does not have the num_key_value_heads attribute needed for GQA models. This causes:

  1. Incorrect head dimension calculations (64 instead of 128)
  2. ONNX export failures with a shape-mismatch error (index: 3 Got: 64 Expected: 128, as reported in #2379)
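To make the 64 vs. 128 mismatch concrete, here is a minimal sketch of one plausible way the wrong value arises: deriving the head dimension from hidden_size // num_attention_heads instead of reading head_dim from the config. The hidden_size of 1024 is an assumption (it is not stated in this PR), and the helper names are hypothetical, not part of optimum.

```python
# Values from the PR description; hidden_size is an ASSUMPTION, not stated in the PR.
qwen3_config = {
    "hidden_size": 1024,  # assumed value for illustration
    "num_attention_heads": 16,
    "num_key_value_heads": 8,
    "head_dim": 128,
}


def inferred_head_dim(cfg):
    """Hypothetical helper: head dim derived from hidden size and query heads."""
    return cfg["hidden_size"] // cfg["num_attention_heads"]


def kv_cache_shape(cfg, batch_size, seq_len, head_dim):
    """Hypothetical helper: shape of one key/value cache tensor under GQA."""
    return (batch_size, cfg["num_key_value_heads"], seq_len, head_dim)


# Shape built from the derived head dim vs. the explicit head_dim field.
derived = kv_cache_shape(qwen3_config, 1, 4, inferred_head_dim(qwen3_config))
explicit = kv_cache_shape(qwen3_config, 1, 4, qwen3_config["head_dim"])

print("derived :", derived)
print("explicit:", explicit)
```

Under these assumptions the two shapes differ only in the last axis (index 3), which matches the position named in the error message.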

Solution

Changed the normalized config class for qwen3 from NormalizedTextConfig to NormalizedTextConfigWithGQA in optimum/utils/normalized_config.py (line 314).
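The effect of the swap can be illustrated with a simplified stand-in. The classes below only mimic the idea of a normalized config that maps normalized attribute names onto model config fields; they are not the actual optimum implementation, and the REGISTRY dict and its layout are hypothetical.

```python
class NormalizedTextConfig:
    """Simplified stand-in: exposes only the fields it declares a mapping for."""

    NUM_ATTENTION_HEADS = "num_attention_heads"

    def __init__(self, config):
        self.config = config

    def __getattr__(self, name):
        # Called only when normal lookup fails: resolve through the class-level mapping.
        key = getattr(type(self), name.upper(), None)
        if key is None or key not in self.config:
            raise AttributeError(name)
        return self.config[key]


class NormalizedTextConfigWithGQA(NormalizedTextConfig):
    """Stand-in for the GQA variant: additionally maps num_key_value_heads."""

    NUM_KEY_VALUE_HEADS = "num_key_value_heads"


# Hypothetical registry keyed by model type, showing the change for qwen3.
REGISTRY = {
    "qwen2": NormalizedTextConfigWithGQA,
    "qwen3": NormalizedTextConfigWithGQA,  # previously NormalizedTextConfig
}

qwen3_config = {"num_attention_heads": 16, "num_key_value_heads": 8, "head_dim": 128}

before = NormalizedTextConfig(qwen3_config)
after = REGISTRY["qwen3"](qwen3_config)

print(hasattr(before, "num_key_value_heads"))
print(after.num_key_value_heads)
```

In this sketch the base class cannot resolve num_key_value_heads at all, while the GQA variant returns the value from the model config, which is the behaviour the one-line change is meant to restore.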

Testing

✅ Tested with Qwen/Qwen3-0.6B - all assertions passed:

  • Correct normalized config class
  • head_dim = 128
  • GQA structure validated

Fixes #2379

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests?

Note: This is a configuration fix. No user-facing documentation changes needed. Manually verified the fix works correctly with Qwen3 models.

Who can review?

@echarlaix @JingyaHuang @michaelbenayoun @IlyasMoutawwakil

@IlyasMoutawwakil
Member

Hi! Thanks for the catch! Which tests did you run, exactly? Shouldn't this be done in optimum-onnx at the config level, in https://github.com/huggingface/optimum-onnx/blob/c3db0acb978a916cf418350272242bb817276758/optimum/exporters/onnx/model_configs.py#L491

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Development

Successfully merging this pull request may close these issues.

qwen3-0.6 to onnx : index: 3 Got: 64 Expected: 128
