NVIDIA / TensorRT-Model-Optimizer Public

Fork 167
Star 1.4k

Pull requests: NVIDIA/TensorRT-Model-Optimizer

24 Open 136 Closed

Pull requests list

Fix: supporting gpt-oss HF eagle

#398 opened Oct 1, 2025 by h-guo18

Update ReadMe for torch_quant_to_onnx.py example

#395 opened Sep 30, 2025 by ajrasane

EAGLE parallel draft with auto regression; kv cache in EAGLE training

#391 opened Sep 29, 2025 by yeyu-nvidia

Remove Qwen tokenizer modification

#390 opened Sep 29, 2025 by cjluo-nv

[5545101]: AutoCast: Add options to force include node/op in F16

#386 opened Sep 28, 2025 by galagam

Updated Prune-KD NeMo flow

#382 opened Sep 26, 2025 by AAnoosheh

Support kv cache quantization for mcore using bmm_quantizers

#375 opened Sep 25, 2025 by kaix-nv

[New feature] Support fakequant serve with latest vllm and generalize calib logic

#369 opened Sep 25, 2025 by RalphMao

megatron realquant FP8 WIP

#367 opened Sep 24, 2025 by cjluo-nv • Draft

Sync amax & AWQ-Lite act_scale in context parallel/data parallel [OMNIML-2813]

#359 opened Sep 24, 2025 by jenchen13

1 task

QLoRA DDP export

#353 opened Sep 22, 2025 by sugunav14

4 of 6 tasks

add new trainer

#352 opened Sep 22, 2025 by h-guo18 • Draft

Refactor set_multi_step_attn_mask for arbitrary step

#348 opened Sep 19, 2025 by yeyu-nvidia

Support nemotron nano vlm v1 nvfp4 quantize + export

#347 opened Sep 19, 2025 by Edwardf0t1 • Draft

Feat: Hardware-aware autoquant

#343 opened Sep 19, 2025 by h-guo18 • Draft

[1/N] ModelOPT PEFT mode support for the megatron-lm

#342 opened Sep 18, 2025 by jingyu-ml

113

support W4afp8 quant in v3.1

#337 opened Sep 18, 2025 by Bruce-x-1997

FP8 Block quantize onnx export support

#324 opened Sep 15, 2025 by jingyu-ml

add end_process in deepseek ptq

#317 opened Sep 15, 2025 by Bruce-x-1997

Update eagle notebook example with sglang

#316 opened Sep 14, 2025 by jamieliNVIDIA

[Autocast] Fix edge case casting input directly to output

#305 opened Sep 9, 2025 by aboubezari

Fix the bug in realquant

#301 opened Sep 8, 2025 by yeyu-nvidia

Fix speculative decoding example (label: stale, not updated in a long time)

#214 opened Jun 13, 2025 by Framartin

Bump the pip group across 3 directories with 1 update

#205 opened Jun 5, 2025 by dependabot bot
