
Add Qwen3 family recipes #259

Open
hanbitmyths wants to merge 4 commits into microsoft:main from hanbitmyths:sunghcho/qwen3-family

@hanbitmyths

This PR adds recipes for the Qwen3 family (0.6B, 1.7B, 4B, 8B, and 14B) targeting CPU, CUDA, and WebGPU.

  • 0.6B-8B: KLD gradient quantization.
  • 14B: k_quant_mixed quantization instead, due to GPU memory limits.
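The PR does not describe how the KLD gradient criterion works internally. As a rough illustration of the general idea behind KL-divergence-driven mixed precision, the sketch below scores each layer by the KL divergence between the full-precision output distribution and the output observed when only that layer is quantized, then keeps the most sensitive layers at higher precision. This is an assumption-laden stand-in: it uses per-layer KL scores rather than gradients, all function and layer names are hypothetical, and it is not Olive's SelectiveMixedPrecision implementation.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)


def rank_layers_by_kld(ref_logits, per_layer_quant_logits):
    """Rank layers from most to least sensitive to quantization.

    ref_logits: logits of the full-precision model for one calibration position.
    per_layer_quant_logits: {layer_name: logits observed when only that layer is quantized}.
    """
    p = softmax(ref_logits)
    scores = {name: kl_divergence(p, softmax(q)) for name, q in per_layer_quant_logits.items()}
    return sorted(scores, key=scores.get, reverse=True)


def select_high_precision(ranked_layers, fraction):
    """Keep the top `fraction` of layers (at least one) at higher precision, e.g. int8."""
    k = max(1, round(len(ranked_layers) * fraction))
    return set(ranked_layers[:k])


if __name__ == "__main__":
    ref = [2.0, 1.0, 0.1]
    observed = {
        "layer_a": [2.0, 1.0, 0.1],  # quantizing this layer changes nothing
        "layer_b": [0.1, 1.0, 2.0],  # large shift in the output distribution
        "layer_c": [1.9, 1.0, 0.2],  # small shift
    }
    ranked = rank_layers_by_kld(ref, observed)
    print(ranked)
    print(select_high_precision(ranked, 0.34))
```

A real pass would aggregate scores over many calibration tokens and a full model; the toy logits here only show the ranking logic.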

… WebGPU

- 0.6B-8B: kld_gradient SelectiveMixedPrecision + GPTQ + RTN + ModelBuilder (int4)
- 14B: k_quant_mixed SelectiveMixedPrecision + GPTQ + RTN + ModelBuilder (int4)
- All models include cpu, cuda, and webgpu execution provider configs
- Standardized naming: {model}_{ep}_int4.json
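The naming convention and the size-based algorithm choice above can be sketched as a small generator. This is a hypothetical helper, not part of the PR: the mapping from a Hugging Face id such as "Qwen/Qwen3-0.6B" to the "Qwen-Qwen3-0.6B" directory name is an assumption, and the pass type strings and keys in the skeleton are unverified placeholders that only mirror the pass order described in the commit message.

```python
import json

# Execution providers covered by this PR.
EXECUTION_PROVIDERS = ("cpu", "cuda", "webgpu")


def recipe_dirname(model_id: str) -> str:
    # Assumption: directory names are the Hugging Face id with "/" replaced by "-",
    # e.g. "Qwen/Qwen3-0.6B" -> "Qwen-Qwen3-0.6B".
    return model_id.replace("/", "-")


def recipe_filename(model_id: str, ep: str) -> str:
    """Apply the {model}_{ep}_int4.json naming convention."""
    if ep not in EXECUTION_PROVIDERS:
        raise ValueError(f"unsupported execution provider: {ep}")
    return f"{recipe_dirname(model_id)}_{ep}_int4.json"


def mixed_precision_algorithm(model_size_b: float) -> str:
    # 0.6B-8B use kld_gradient; 14B falls back to k_quant_mixed due to GPU memory limits.
    return "k_quant_mixed" if model_size_b >= 14 else "kld_gradient"


def recipe_skeleton(model_id: str, model_size_b: float) -> dict:
    """Illustrative pass order only; pass type strings and keys are unverified placeholders."""
    return {
        "input_model": {"type": "HfModel", "model_path": model_id},
        "passes": {
            "mixed_precision": {
                "type": "SelectiveMixedPrecision",
                "algorithm": mixed_precision_algorithm(model_size_b),
            },
            "gptq": {"type": "GPTQ"},
            "rtn": {"type": "RTN"},
            "model_builder": {"type": "ModelBuilder", "precision": "int4"},
        },
    }


def render_recipe(model_id: str, model_size_b: float) -> str:
    """Serialize the skeleton as pretty-printed JSON."""
    return json.dumps(recipe_skeleton(model_id, model_size_b), indent=2)


if __name__ == "__main__":
    for ep in EXECUTION_PROVIDERS:
        print(f"{recipe_dirname('Qwen/Qwen3-14B')}/{ep}/{recipe_filename('Qwen/Qwen3-14B', ep)}")
    print(render_recipe("Qwen/Qwen3-14B", 14))
```

Centralizing the name in one function keeps the CPU, CUDA, and WebGPU files consistent, which is the point of the standardization.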
Copilot AI review requested due to automatic review settings March 14, 2026 00:57
Copilot AI left a comment

Pull request overview

Adds Olive recipe bundles for the Qwen3 model family across CPU, CUDA, and WebGPU execution providers, using INT4 quantization (KLD-gradient-based mixed precision for 0.6B–8B and k_quant_mixed for 14B due to memory constraints).

Changes:

  • Add per-model CPU/CUDA/WebGPU recipe configs (*.json) plus info.yaml, requirements.txt, and backend READMEs.
  • Introduce Qwen3 14B recipes using k_quant_mixed instead of kld_gradient.
  • Rename/standardize some CPU recipe references (e.g., removing _kld_gradient suffix for 0.6B/4B).

Reviewed changes

Copilot reviewed 60 out of 62 changed files in this pull request and generated 6 comments.

| File | Description |
| --- | --- |
| Qwen-Qwen3-0.6B/LICENSE | Add model license file. |
| Qwen-Qwen3-0.6B/cpu/info.yaml | Register CPU recipe metadata. |
| Qwen-Qwen3-0.6B/cpu/requirements.txt | Define CPU recipe Python deps. |
| Qwen-Qwen3-0.6B/cpu/README.md | Document CPU recipe usage. |
| Qwen-Qwen3-0.6B/cpu/Qwen-Qwen3-0.6B_cpu_int4.json | Add CPU INT4 recipe config. |
| Qwen-Qwen3-0.6B/cuda/info.yaml | Register CUDA recipe metadata. |
| Qwen-Qwen3-0.6B/cuda/requirements.txt | Define CUDA recipe Python deps. |
| Qwen-Qwen3-0.6B/cuda/README.md | Document CUDA recipe usage. |
| Qwen-Qwen3-0.6B/cuda/Qwen-Qwen3-0.6B_cuda_int4.json | Add CUDA INT4 recipe config. |
| Qwen-Qwen3-0.6B/webgpu/info.yaml | Register WebGPU recipe metadata. |
| Qwen-Qwen3-0.6B/webgpu/requirements.txt | Define WebGPU recipe Python deps. |
| Qwen-Qwen3-0.6B/webgpu/README.md | Document WebGPU recipe usage. |
| Qwen-Qwen3-0.6B/webgpu/Qwen-Qwen3-0.6B_webgpu_int4.json | Add WebGPU INT4 recipe config. |
| Qwen-Qwen3-1.7B/LICENSE | Add model license file. |
| Qwen-Qwen3-1.7B/cpu/info.yaml | Register CPU recipe metadata. |
| Qwen-Qwen3-1.7B/cpu/requirements.txt | Define CPU recipe Python deps. |
| Qwen-Qwen3-1.7B/cpu/README.md | Document CPU recipe usage. |
| Qwen-Qwen3-1.7B/cpu/Qwen-Qwen3-1.7B_cpu_int4.json | Add CPU INT4 recipe config. |
| Qwen-Qwen3-1.7B/cuda/info.yaml | Register CUDA recipe metadata. |
| Qwen-Qwen3-1.7B/cuda/requirements.txt | Define CUDA recipe Python deps. |
| Qwen-Qwen3-1.7B/cuda/README.md | Document CUDA recipe usage. |
| Qwen-Qwen3-1.7B/cuda/Qwen-Qwen3-1.7B_cuda_int4.json | Add CUDA INT4 recipe config. |
| Qwen-Qwen3-1.7B/webgpu/info.yaml | Register WebGPU recipe metadata. |
| Qwen-Qwen3-1.7B/webgpu/requirements.txt | Define WebGPU recipe Python deps. |
| Qwen-Qwen3-1.7B/webgpu/README.md | Document WebGPU recipe usage. |
| Qwen-Qwen3-1.7B/webgpu/Qwen-Qwen3-1.7B_webgpu_int4.json | Add WebGPU INT4 recipe config. |
| Qwen-Qwen3-4B/LICENSE | Add model license file. |
| Qwen-Qwen3-4B/cpu/info.yaml | Register CPU recipe metadata (rename/standardize). |
| Qwen-Qwen3-4B/cpu/README.md | Update CPU README to match recipe name/file. |
| Qwen-Qwen3-4B/cpu/Qwen-Qwen3-4B_cpu_int4.json | Add CPU INT4 recipe config. |
| Qwen-Qwen3-4B/webgpu/info.yaml | Register WebGPU recipe metadata. |
| Qwen-Qwen3-4B/webgpu/requirements.txt | Define WebGPU recipe Python deps. |
| Qwen-Qwen3-4B/webgpu/README.md | Document WebGPU recipe usage. |
| Qwen-Qwen3-4B/webgpu/Qwen-Qwen3-4B_webgpu_int4.json | Add WebGPU INT4 recipe config. |
| Qwen-Qwen3-4B/cuda/info.yaml | Register CUDA recipe metadata. |
| Qwen-Qwen3-4B/cuda/requirements.txt | Define CUDA recipe Python deps. |
| Qwen-Qwen3-4B/cuda/README.md | Document CUDA recipe usage. |
| Qwen-Qwen3-4B/cuda/Qwen-Qwen3-4B_cuda_int4.json | Add CUDA INT4 recipe config. |
| Qwen-Qwen3-8B/LICENSE | Add model license file. |
| Qwen-Qwen3-8B/cpu/info.yaml | Register CPU recipe metadata. |
| Qwen-Qwen3-8B/cpu/requirements.txt | Define CPU recipe Python deps. |
| Qwen-Qwen3-8B/cpu/README.md | Document CPU recipe usage. |
| Qwen-Qwen3-8B/cpu/Qwen-Qwen3-8B_cpu_int4.json | Add CPU INT4 recipe config. |
| Qwen-Qwen3-8B/cuda/info.yaml | Register CUDA recipe metadata. |
| Qwen-Qwen3-8B/cuda/requirements.txt | Define CUDA recipe Python deps. |
| Qwen-Qwen3-8B/cuda/README.md | Document CUDA recipe usage. |
| Qwen-Qwen3-8B/cuda/Qwen-Qwen3-8B_cuda_int4.json | Add CUDA INT4 recipe config. |
| Qwen-Qwen3-8B/webgpu/info.yaml | Register WebGPU recipe metadata. |
| Qwen-Qwen3-8B/webgpu/requirements.txt | Define WebGPU recipe Python deps. |
| Qwen-Qwen3-8B/webgpu/README.md | Document WebGPU recipe usage. |
| Qwen-Qwen3-8B/webgpu/Qwen-Qwen3-8B_webgpu_int4.json | Add WebGPU INT4 recipe config. |
| Qwen-Qwen3-14B/LICENSE | Add model license file. |
| Qwen-Qwen3-14B/cpu/info.yaml | Register CPU recipe metadata. |
| Qwen-Qwen3-14B/cpu/requirements.txt | Define CPU recipe Python deps. |
| Qwen-Qwen3-14B/cpu/README.md | Document CPU recipe usage and 14B quantization rationale. |
| Qwen-Qwen3-14B/cpu/Qwen-Qwen3-14B_cpu_int4.json | Add CPU INT4 recipe config (k_quant_mixed). |
| Qwen-Qwen3-14B/cuda/info.yaml | Register CUDA recipe metadata. |
| Qwen-Qwen3-14B/cuda/requirements.txt | Define CUDA recipe Python deps. |
| Qwen-Qwen3-14B/cuda/README.md | Document CUDA recipe usage and 14B quantization rationale. |
| Qwen-Qwen3-14B/cuda/Qwen-Qwen3-14B_cuda_int4.json | Add CUDA INT4 recipe config (k_quant_mixed). |
| Qwen-Qwen3-14B/webgpu/info.yaml | Register WebGPU recipe metadata. |
| Qwen-Qwen3-14B/webgpu/requirements.txt | Define WebGPU recipe Python deps. |
| Qwen-Qwen3-14B/webgpu/README.md | Document WebGPU recipe usage and 14B quantization rationale. |
| Qwen-Qwen3-14B/webgpu/Qwen-Qwen3-14B_webgpu_int4.json | Add WebGPU INT4 recipe config (k_quant_mixed). |

hanbitmyths and others added 2 commits March 15, 2026 00:30
…TieWordEmbeddings

- Add RTN pass with 8-bit quantization for lm_head and embeddings (with overrides)
- Add systems section with CPUExecutionProvider to all CPU configs
- Add TieWordEmbeddings graph surgery for 0.6B, 1.7B, 4B (tie_word_embeddings=true)
- Update group_size to 128 for CPU/CUDA, 32 for WebGPU
- Update all READMEs with accurate pipeline descriptions
- Address PR microsoft#259 review comments
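To make the group_size and 8-bit override changes concrete, the sketch below shows generic blockwise symmetric round-to-nearest (RTN) quantization, where each run of group_size weights shares one scale. Smaller groups (32 for WebGPU versus 128 for CPU/CUDA) store more scales and typically reduce quantization error. This is a plain-Python illustration, not Olive's RTN pass: the tensor names and the bits_for helper are hypothetical stand-ins for the "overrides" that keep lm_head and the embeddings at 8 bits.

```python
# Group sizes per execution provider, as listed in this commit.
GROUP_SIZE = {"cpu": 128, "cuda": 128, "webgpu": 32}

# Hypothetical tensor names: keep embeddings and lm_head at 8 bits, everything else at 4.
EIGHT_BIT_TENSORS = {"model.embed_tokens", "lm_head"}


def bits_for(tensor_name: str) -> int:
    """Return the bit width for a tensor under the assumed override rule."""
    return 8 if tensor_name in EIGHT_BIT_TENSORS else 4


def rtn_quantize(weights, bits, group_size):
    """Symmetric round-to-nearest quantization with one scale per group."""
    qmax = 2 ** (bits - 1) - 1   # 7 for int4, 127 for int8
    qmin = -(2 ** (bits - 1))    # -8 for int4, -128 for int8
    q, scales = [], []
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        max_abs = max(abs(w) for w in group)
        scale = max_abs / qmax if max_abs > 0 else 1.0  # avoid dividing by zero
        scales.append(scale)
        q.extend(max(qmin, min(qmax, round(w / scale))) for w in group)
    return q, scales


def dequantize(q, scales, group_size):
    """Map integer codes back to floats using each element's group scale."""
    return [v * scales[i // group_size] for i, v in enumerate(q)]


if __name__ == "__main__":
    import random

    random.seed(0)
    w = [random.uniform(-1, 1) for _ in range(256)]
    for ep, gs in GROUP_SIZE.items():
        q, s = rtn_quantize(w, bits_for("model.layers.0.mlp.up_proj"), gs)
        err = max(abs(a - b) for a, b in zip(w, dequantize(q, s, gs)))
        print(ep, gs, len(s), round(err, 4))
```

Per-element error is bounded by half of that element's group scale, which is why finer groups help when weight magnitudes vary within a row.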