
Conversation


@vkuzo vkuzo commented Dec 1, 2025

Summary:

Adds an e2e example of how to use torchao to quantize LLaMa 4 Scout.

Note that this needs:
* a recent `transformers` version (higher than 4.57, not officially released yet, so users need to build from source)
* a recent `fbgemm_gpu` nightly from `2025.11.22` or later
* to run this in vLLM, vllm-project/vllm#28421 (not yet landed)

Test Plan:

```bash
with-proxy time python examples/quantize_llama_4.py ~/local/tmp/20251201_test/
```

Reviewers:

Subscribers:

Tasks:

Tags:

[ghstack-poisoned]

vkuzo commented Dec 1, 2025

Stack from ghstack (oldest at bottom):


pytorch-bot bot commented Dec 1, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/3408

Note: Links to docs will display an error until the docs builds have been completed.

This comment was automatically generated by Dr. CI and updates every 15 minutes.

vkuzo added a commit that referenced this pull request Dec 1, 2025
Summary:

Adds an e2e example of how to use torchao to quantize LLaMa 4 Scout.

Note that this needs:
* a recent `transformers` version (higher than 4.57, not officially
  released yet, so users need to build from source)
* a recent `fbgemm_gpu` nightly from `2025.11.22` or later
* to run this in vLLM, vllm-project/vllm#28421
  is needed (not yet landed).

Test Plan:

```bash
with-proxy time python examples/quantize_llama_4.py ~/local/tmp/20251201_test/
```

Reviewers:

Subscribers:

Tasks:

Tags:
ghstack-source-id: 3c47130
ghstack-comment-id: 3599037297
Pull-Request: #3408
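As background for the quantization step the example script performs, symmetric per-row int8 weight-only quantization can be sketched in plain Python. This is a minimal illustration of the arithmetic only; the function names below are hypothetical, and torchao's actual configs and fused kernels differ.

```python
# Hedged sketch: symmetric per-row int8 weight-only quantization.
# Each row of a float weight matrix gets one scale = amax / 127, values
# are rounded into [-128, 127], and dequantization multiplies back.

def quantize_rows(weights):
    """Quantize each row of a float matrix to int8 with a per-row scale."""
    q_rows, scales = [], []
    for row in weights:
        amax = max(abs(v) for v in row) or 1.0  # avoid divide-by-zero rows
        scale = amax / 127.0
        q_rows.append([max(-128, min(127, round(v / scale))) for v in row])
        scales.append(scale)
    return q_rows, scales

def dequantize_rows(q_rows, scales):
    """Recover approximate float weights from int8 values and scales."""
    return [[q * s for q in row] for row, s in zip(q_rows, scales)]

w = [[0.5, -1.0, 0.25], [2.0, 0.0, -2.0]]
q, s = quantize_rows(w)
w_hat = dequantize_rows(q, s)
```

The round-trip error per element is bounded by half a quantization step (`scale / 2`), which is why per-row (or per-channel) scales recover accuracy better than one scale for the whole tensor.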
meta-cla bot added the CLA Signed label Dec 1, 2025
vkuzo added the topic: documentation label Dec 1, 2025
[ghstack-poisoned]
vkuzo added a commit that referenced this pull request Dec 2, 2025
(commit message identical to the Dec 1 commit above)
ghstack-source-id: 76125e9
ghstack-comment-id: 3599037297
Pull-Request: #3408
@vkuzo vkuzo merged commit 5977905 into main Dec 2, 2025
35 of 51 checks passed