Fix: Use NormalizedTextConfigWithGQA for Qwen3 model #2382

Aznix07 · 2025-11-03T11:15:42Z

What does this PR do?

Fixes the Qwen3 model configuration to use NormalizedTextConfigWithGQA instead of NormalizedTextConfig.

Problem

Qwen3 models use a Grouped Query Attention (GQA) architecture, similar to Qwen2, with:

num_attention_heads: 16
num_key_value_heads: 8
head_dim: 128

The previous configuration used NormalizedTextConfig, which does not expose the num_key_value_heads attribute that GQA models need. This causes:

Incorrect head dimension calculations (64 instead of 128)
ONNX export failures with error
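
To make the 64 vs. 128 mismatch concrete, here is a minimal arithmetic sketch. The head counts and head_dim are the values listed above; hidden_size = 1024 is an assumption inferred from the reported wrong value (64 × 16), not something stated in this PR.

```python
# Values quoted in the PR description for Qwen3.
num_attention_heads = 16
num_key_value_heads = 8
head_dim = 128

# Assumption: inferred from the reported wrong head dimension (64 * 16),
# not stated anywhere in this PR.
hidden_size = 1024

# Deriving the head dimension from hidden_size gives the wrong value,
# because Qwen3 sets head_dim explicitly in its config.
derived_head_dim = hidden_size // num_attention_heads

# Projection widths implied by the explicit head_dim.
query_width = num_attention_heads * head_dim
kv_width = num_key_value_heads * head_dim

# Under GQA, several query heads share one key/value head.
queries_per_kv_head = num_attention_heads // num_key_value_heads

print(derived_head_dim)     # 64
print(head_dim)             # 128
print(query_width)          # 2048
print(kv_width)             # 1024
print(queries_per_kv_head)  # 2
```

Any shape built from num_attention_heads alone, or from hidden_size // num_attention_heads, will therefore disagree with the real key/value tensors.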

Solution

Changed the normalized config class for qwen3 from NormalizedTextConfig to NormalizedTextConfigWithGQA in optimum/utils/normalized_config.py (line 314).
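
The effect of the swap can be illustrated with a simplified stand-in. Everything in this sketch (the *Sketch classes, ATTRIBUTE_MAP, the two registries, and kv_heads) is hypothetical and only mimics the idea of mapping a model type to a normalized config class; it is not the actual optimum implementation.

```python
from types import SimpleNamespace


class NormalizedTextConfigSketch:
    """Hypothetical stand-in: exposes only the attributes it knows about."""

    ATTRIBUTE_MAP = {"num_attention_heads": "num_attention_heads"}

    def __init__(self, config):
        self.config = config

    def __getattr__(self, name):
        # Called only when normal attribute lookup fails.
        attribute_map = type(self).ATTRIBUTE_MAP
        if name in attribute_map:
            return getattr(self.config, attribute_map[name])
        raise AttributeError(f"{type(self).__name__} has no attribute {name!r}")


class NormalizedTextConfigWithGQASketch(NormalizedTextConfigSketch):
    """Hypothetical stand-in: additionally exposes num_key_value_heads."""

    ATTRIBUTE_MAP = {
        **NormalizedTextConfigSketch.ATTRIBUTE_MAP,
        "num_key_value_heads": "num_key_value_heads",
    }


# Model-type registries before and after the one-line change (illustrative only).
REGISTRY_BEFORE = {"qwen3": NormalizedTextConfigSketch}
REGISTRY_AFTER = {"qwen3": NormalizedTextConfigWithGQASketch}

# Config values taken from the PR description.
qwen3_config = SimpleNamespace(
    num_attention_heads=16, num_key_value_heads=8, head_dim=128
)


def kv_heads(registry, model_type, config):
    """Return num_key_value_heads, or None if the normalized class lacks it."""
    normalized = registry[model_type](config)
    return getattr(normalized, "num_key_value_heads", None)


before = kv_heads(REGISTRY_BEFORE, "qwen3", qwen3_config)
after = kv_heads(REGISTRY_AFTER, "qwen3", qwen3_config)
print(before)  # None
print(after)   # 8
```

With the old mapping, the key/value head count is simply unavailable to downstream code; with the new mapping it resolves to the value in the model config.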

Testing

✅ Tested with Qwen/Qwen3-0.6B - all assertions passed:

Correct normalized config class
head_dim = 128
GQA structure validated

Fixes #2379

Before submitting

This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
Did you make sure to update the documentation with your changes?
Did you write any new necessary tests?

Note: This is a configuration fix. No user-facing documentation changes needed. Manually verified the fix works correctly with Qwen3 models.

Who can review?

@echarlaix @JingyaHuang @michaelbenayoun @IlyasMoutawwakil

IlyasMoutawwakil · 2025-11-10T22:14:35Z

Hi! Thanks for the catch! What tests did you run exactly? Shouldn't this be done in optimum-onnx, at the config level, in https://github.com/huggingface/optimum-onnx/blob/c3db0acb978a916cf418350272242bb817276758/optimum/exporters/onnx/model_configs.py#L491

HuggingFaceDocBuilderDev · 2025-11-10T22:17:39Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Fix: Use NormalizedTextConfigWithGQA for Qwen models

de99128
