Add Qwen3 family recipes by hanbitmyths · Pull Request #259 · microsoft/olive-recipes

hanbitmyths · 2026-03-14T00:57:55Z

This PR adds recipes for the Qwen3 family: 0.6B, 1.7B, 4B, 8B, and 14B, each for CPU, CUDA, and WebGPU.

0.6B-8B: kld_gradient quantization.
14B: k_quant_mixed quantization, because of GPU memory limits.

… WebGPU
- 0.6B-8B: kld_gradient SelectiveMixedPrecision + GPTQ + RTN + ModelBuilder (int4)
- 14B: k_quant_mixed SelectiveMixedPrecision + GPTQ + RTN + ModelBuilder (int4)
- All models include cpu, cuda, and webgpu execution provider configs
- Standardized naming: {model}_{ep}_int4.json
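The size-based choice of mixed-precision algorithm listed above can be sketched in a few lines of Python. This is only an illustration, not code from the PR: the helper name `select_algorithm` and the use of size strings such as `"14B"` are assumptions; only the two algorithm names and the size split come from the description.

```python
# Illustrative sketch only: the helper name and size strings are hypothetical.
KLD_GRADIENT_SIZES = {"0.6B", "1.7B", "4B", "8B"}
K_QUANT_MIXED_SIZES = {"14B"}  # switched because of the GPU memory limit


def select_algorithm(size: str) -> str:
    """Return the mixed-precision algorithm the PR description names for a Qwen3 size."""
    if size in KLD_GRADIENT_SIZES:
        return "kld_gradient"
    if size in K_QUANT_MIXED_SIZES:
        return "k_quant_mixed"
    raise ValueError(f"No recipe described for Qwen3 size {size!r}")
```

For example, `select_algorithm("8B")` yields `kld_gradient`, while `select_algorithm("14B")` yields `k_quant_mixed`.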

Copilot

Pull request overview

Adds Olive recipe bundles for the Qwen3 model family across CPU, CUDA, and WebGPU execution providers, using INT4 quantization (KLD-gradient-based mixed precision for 0.6B–8B and k_quant_mixed for 14B due to memory constraints).

Changes:

Add per-model CPU/CUDA/WebGPU recipe configs (*.json) plus info.yaml, requirements.txt, and backend READMEs.
Introduce Qwen3 14B recipes using k_quant_mixed instead of kld_gradient.
Rename/standardize some CPU recipe references (e.g., removing _kld_gradient suffix for 0.6B/4B).

Reviewed changes

Copilot reviewed 60 out of 62 changed files in this pull request and generated 6 comments.

File	Description
Qwen-Qwen3-0.6B/LICENSE	Add model license file.
Qwen-Qwen3-0.6B/cpu/info.yaml	Register CPU recipe metadata.
Qwen-Qwen3-0.6B/cpu/requirements.txt	Define CPU recipe Python deps.
Qwen-Qwen3-0.6B/cpu/README.md	Document CPU recipe usage.
Qwen-Qwen3-0.6B/cpu/Qwen-Qwen3-0.6B_cpu_int4.json	Add CPU INT4 recipe config.
Qwen-Qwen3-0.6B/cuda/info.yaml	Register CUDA recipe metadata.
Qwen-Qwen3-0.6B/cuda/requirements.txt	Define CUDA recipe Python deps.
Qwen-Qwen3-0.6B/cuda/README.md	Document CUDA recipe usage.
Qwen-Qwen3-0.6B/cuda/Qwen-Qwen3-0.6B_cuda_int4.json	Add CUDA INT4 recipe config.
Qwen-Qwen3-0.6B/webgpu/info.yaml	Register WebGPU recipe metadata.
Qwen-Qwen3-0.6B/webgpu/requirements.txt	Define WebGPU recipe Python deps.
Qwen-Qwen3-0.6B/webgpu/README.md	Document WebGPU recipe usage.
Qwen-Qwen3-0.6B/webgpu/Qwen-Qwen3-0.6B_webgpu_int4.json	Add WebGPU INT4 recipe config.
Qwen-Qwen3-1.7B/LICENSE	Add model license file.
Qwen-Qwen3-1.7B/cpu/info.yaml	Register CPU recipe metadata.
Qwen-Qwen3-1.7B/cpu/requirements.txt	Define CPU recipe Python deps.
Qwen-Qwen3-1.7B/cpu/README.md	Document CPU recipe usage.
Qwen-Qwen3-1.7B/cpu/Qwen-Qwen3-1.7B_cpu_int4.json	Add CPU INT4 recipe config.
Qwen-Qwen3-1.7B/cuda/info.yaml	Register CUDA recipe metadata.
Qwen-Qwen3-1.7B/cuda/requirements.txt	Define CUDA recipe Python deps.
Qwen-Qwen3-1.7B/cuda/README.md	Document CUDA recipe usage.
Qwen-Qwen3-1.7B/cuda/Qwen-Qwen3-1.7B_cuda_int4.json	Add CUDA INT4 recipe config.
Qwen-Qwen3-1.7B/webgpu/info.yaml	Register WebGPU recipe metadata.
Qwen-Qwen3-1.7B/webgpu/requirements.txt	Define WebGPU recipe Python deps.
Qwen-Qwen3-1.7B/webgpu/README.md	Document WebGPU recipe usage.
Qwen-Qwen3-1.7B/webgpu/Qwen-Qwen3-1.7B_webgpu_int4.json	Add WebGPU INT4 recipe config.
Qwen-Qwen3-4B/LICENSE	Add model license file.
Qwen-Qwen3-4B/cpu/info.yaml	Register CPU recipe metadata (rename/standardize).
Qwen-Qwen3-4B/cpu/README.md	Update CPU README to match recipe name/file.
Qwen-Qwen3-4B/cpu/Qwen-Qwen3-4B_cpu_int4.json	Add CPU INT4 recipe config.
Qwen-Qwen3-4B/webgpu/info.yaml	Register WebGPU recipe metadata.
Qwen-Qwen3-4B/webgpu/requirements.txt	Define WebGPU recipe Python deps.
Qwen-Qwen3-4B/webgpu/README.md	Document WebGPU recipe usage.
Qwen-Qwen3-4B/webgpu/Qwen-Qwen3-4B_webgpu_int4.json	Add WebGPU INT4 recipe config.
Qwen-Qwen3-4B/cuda/info.yaml	Register CUDA recipe metadata.
Qwen-Qwen3-4B/cuda/requirements.txt	Define CUDA recipe Python deps.
Qwen-Qwen3-4B/cuda/README.md	Document CUDA recipe usage.
Qwen-Qwen3-4B/cuda/Qwen-Qwen3-4B_cuda_int4.json	Add CUDA INT4 recipe config.
Qwen-Qwen3-8B/LICENSE	Add model license file.
Qwen-Qwen3-8B/cpu/info.yaml	Register CPU recipe metadata.
Qwen-Qwen3-8B/cpu/requirements.txt	Define CPU recipe Python deps.
Qwen-Qwen3-8B/cpu/README.md	Document CPU recipe usage.
Qwen-Qwen3-8B/cpu/Qwen-Qwen3-8B_cpu_int4.json	Add CPU INT4 recipe config.
Qwen-Qwen3-8B/cuda/info.yaml	Register CUDA recipe metadata.
Qwen-Qwen3-8B/cuda/requirements.txt	Define CUDA recipe Python deps.
Qwen-Qwen3-8B/cuda/README.md	Document CUDA recipe usage.
Qwen-Qwen3-8B/cuda/Qwen-Qwen3-8B_cuda_int4.json	Add CUDA INT4 recipe config.
Qwen-Qwen3-8B/webgpu/info.yaml	Register WebGPU recipe metadata.
Qwen-Qwen3-8B/webgpu/requirements.txt	Define WebGPU recipe Python deps.
Qwen-Qwen3-8B/webgpu/README.md	Document WebGPU recipe usage.
Qwen-Qwen3-8B/webgpu/Qwen-Qwen3-8B_webgpu_int4.json	Add WebGPU INT4 recipe config.
Qwen-Qwen3-14B/LICENSE	Add model license file.
Qwen-Qwen3-14B/cpu/info.yaml	Register CPU recipe metadata.
Qwen-Qwen3-14B/cpu/requirements.txt	Define CPU recipe Python deps.
Qwen-Qwen3-14B/cpu/README.md	Document CPU recipe usage and 14B quantization rationale.
Qwen-Qwen3-14B/cpu/Qwen-Qwen3-14B_cpu_int4.json	Add CPU INT4 recipe config (`k_quant_mixed`).
Qwen-Qwen3-14B/cuda/info.yaml	Register CUDA recipe metadata.
Qwen-Qwen3-14B/cuda/requirements.txt	Define CUDA recipe Python deps.
Qwen-Qwen3-14B/cuda/README.md	Document CUDA recipe usage and 14B quantization rationale.
Qwen-Qwen3-14B/cuda/Qwen-Qwen3-14B_cuda_int4.json	Add CUDA INT4 recipe config (`k_quant_mixed`).
Qwen-Qwen3-14B/webgpu/info.yaml	Register WebGPU recipe metadata.
Qwen-Qwen3-14B/webgpu/requirements.txt	Define WebGPU recipe Python deps.
Qwen-Qwen3-14B/webgpu/README.md	Document WebGPU recipe usage and 14B quantization rationale.
Qwen-Qwen3-14B/webgpu/Qwen-Qwen3-14B_webgpu_int4.json	Add WebGPU INT4 recipe config (`k_quant_mixed`).
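The directory layout in the table follows a regular pattern, which the sketch below enumerates. It is an assumption-laden illustration rather than a tool from the repository: the function name `expected_files` is hypothetical, and the pattern is the general one, so it lists a `cpu/requirements.txt` for every model even though the table shows no such row for Qwen-Qwen3-4B.

```python
# Illustrative sketch only: enumerates the general per-model layout seen in the table.
EXECUTION_PROVIDERS = ["cpu", "cuda", "webgpu"]


def expected_files(model: str) -> list[str]:
    """List the files a model directory would contain under the general pattern."""
    files = [f"{model}/LICENSE"]
    for ep in EXECUTION_PROVIDERS:
        per_ep = [
            "info.yaml",
            "requirements.txt",
            "README.md",
            f"{model}_{ep}_int4.json",  # standardized name: {model}_{ep}_int4.json
        ]
        files.extend(f"{model}/{ep}/{name}" for name in per_ep)
    return files
```

Comparing this list against a checkout is one way to spot a missing or misnamed recipe file.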

Qwen-Qwen3-14B/cpu/Qwen-Qwen3-14B_cpu_int4.json

Qwen-Qwen3-14B/cuda/README.md

Qwen-Qwen3-1.7B/cpu/Qwen-Qwen3-1.7B_cpu_int4.json

Qwen-Qwen3-8B/cpu/Qwen-Qwen3-8B_cpu_int4.json

Qwen-Qwen3-14B/cpu/README.md

Qwen-Qwen3-14B/webgpu/README.md

…TieWordEmbeddings
- Add RTN pass with 8-bit quantization for lm_head and embeddings (with overrides)
- Add systems section with CPUExecutionProvider to all CPU configs
- Add TieWordEmbeddings graph surgery for 0.6B, 1.7B, 4B (tie_word_embeddings=true)
- Update group_size to 128 for CPU/CUDA, 32 for WebGPU
- Update all READMEs with accurate pipeline descriptions
- Address PR microsoft#259 review comments
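The per-target settings from this commit message can be summarized in a small lookup. The sketch below is hypothetical: the function name and dictionary keys are invented for illustration and are not the Olive recipe JSON schema; only the values (group size 128 for CPU/CUDA and 32 for WebGPU, 8-bit lm_head and embeddings, tied embeddings for 0.6B, 1.7B, and 4B) come from the commit message.

```python
# Illustrative sketch only: keys are hypothetical, NOT the Olive recipe schema.
GROUP_SIZE_BY_EP = {"cpu": 128, "cuda": 128, "webgpu": 32}
TIED_EMBEDDING_SIZES = {"0.6B", "1.7B", "4B"}  # tie_word_embeddings=true


def recipe_settings(size: str, ep: str) -> dict:
    """Summarize the settings the commit message states for a model size and EP."""
    if ep not in GROUP_SIZE_BY_EP:
        raise ValueError(f"Unknown execution provider {ep!r}")
    return {
        "int4_group_size": GROUP_SIZE_BY_EP[ep],
        "lm_head_and_embedding_bits": 8,  # RTN pass at 8-bit
        "tie_word_embeddings_surgery": size in TIED_EMBEDDING_SIZES,
    }
```

Under these assumptions, a 4B WebGPU recipe would use group size 32 with the tied-embedding surgery, while a 14B CUDA recipe would use group size 128 without it.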

Copilot AI review requested due to automatic review settings March 14, 2026 00:57

Merge branch 'main' into sunghcho/qwen3-family

7d7a204

Copilot started reviewing on behalf of hanbitmyths March 14, 2026 00:58

Copilot AI reviewed Mar 14, 2026

hanbitmyths and others added 2 commits March 15, 2026 00:30

Merge branch 'main' into sunghcho/qwen3-family

82df52a

kunal-vaishnavi mentioned this pull request Mar 17, 2026

Add Qwen-3 LLM recipes #221

Closed

shaahji approved these changes Mar 19, 2026

hanbitmyths wants to merge 4 commits into microsoft:main from hanbitmyths:sunghcho/qwen3-family

3 participants
