Torch-TensorRT v2.4.0
C++ runtime support on Windows, enhanced dynamic shape support in converters, PyTorch 2.4, CUDA 12.4, TensorRT 10.1, Python 3.12
Torch-TensorRT 2.4.0 targets PyTorch 2.4, CUDA 12.4 (builds for CUDA 11.8/12.1 are available via the PyTorch package index: https://download.pytorch.org/whl/cu118 and https://download.pytorch.org/whl/cu121) and TensorRT 10.1.
This version introduces official support for the C++ runtime on the Windows platform. Windows support is limited to the Dynamo frontend, but covers both AOT and JIT workflows, and users can now utilize both the Python and C++ runtimes on Windows. Additionally, this release expands coverage to all ATen Core Operators except `torch.nonzero`, and significantly increases dynamic shape support across converters. Python 3.12 is supported for the first time in this release.
Full Windows Support
In this release we introduce both C++ and Python runtime support on Windows. Users can now directly optimize PyTorch models with TensorRT on Windows, with no code changes. The C++ runtime is the default option; users can enable the Python runtime by specifying `use_python_runtime=True`.
import torch
import torch_tensorrt
import torchvision.models as models
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT).eval().to("cuda")
input = torch.randn((1, 3, 224, 224)).to("cuda")
trt_mod = torch_tensorrt.compile(model, ir="dynamo", inputs=[input])
trt_mod(input)
Enhanced Op support in Converters
Converter support now covers nearly 100% of core ATen. At this point, fallback to PyTorch execution is due either to specific converter limitations or to some combination of user compiler settings (e.g. `torch_executed_ops`, dynamic shape). This release also expands the number of operators that support dynamic shape. The `dryrun` option will report exactly what is supported for your model and settings.
What's Changed
- fix: FakeTensors appearing in `get_attr` calls by @gs-olive in #2669
- feat: support adaptive_avg_pool1d dynamo converter by @zewenli98 in #2614
- fix: Add cmake missing source file ref for core_lowering.passes by @Arktische in #2672
- ci: Torch nightly version upgrade to `2.4.0` by @gs-olive in #2704
- Add support for `aten.pixel_unshuffle` dynamo converter by @HolyWu in #2696
- feat: support aten.atan2 converter by @chohk88 in #2689
- feat: support aten.index_select converter by @chohk88 in #2710
- feat: support aten.isnan converter by @chohk88 in #2711
- feat: support adaptive avg pool 2d and 3d dynamo converters by @zewenli98 in #2632
- feat: support aten.expm1 converter by @chohk88 in #2714
- fix: Add dependencies to Docker container for `apt` versioning TRT by @gs-olive in #2746
- fix: Missing parameters in compiler settings by @gs-olive in #2749
- fix: param bug in `test_binary_ops_aten` by @zewenli98 in #2733
- aten::empty_like by @apbose in #2654
- empty_permute decomposition by @apbose in #2698
- Removing grid lowering by @apbose in #2686
- Selectively enable different frontends by @narendasan in #2693
- chore(deps): bump transformers from 4.33.2 to 4.36.0 in /tools/perf by @dependabot in #2555
- Fix upsample converter not properly registered by @HolyWu in #2683
- feat: TS Add converter support for aten::grid_sampler by @mfeliz-cruise in #2717
- fix: Bump `torchvision` version by @gs-olive in #2770
- fix: convert_module_to_trt_engine by @zewenli98 in #2728
- chore: cherry pick of save API by @peri044 in #2719
- chore: Upgrade TensorRT version to TRT 10 EA (#2699) by @peri044 in #2774
- Fix minor grammatical corrections by @aakashapoorv in #2779
- feat: cherry-pick of Implement symbolic shape propagation, sym_size converter by @peri044 in #2751
- feat: cherry-pick of torch.compile dynamic shapes by @peri044 in #2750
- chore: bump deps for default workspace file by @narendasan in #2786
- fix: Point infra branch to main by @gs-olive in #2785
- "empty_like" decomposition test correction by @apbose in #2784
- chore: Bump versions by @narendasan in #2787
- fix: refactor layer norm converter with INormalization Layer by @zewenli98 in #2755
- TRT-10 GA Support for main branch by @zewenli98 in #2781
- chore(//tests): Update tests to use assertEqual by @narendasan in #2800
- feat: Add support for `is_causal` argument in attention by @gs-olive in #2780
- feat: Adding support for native int64 by @narendasan in #2789
- chore: small mypy issue by @narendasan in #2803
- Rand converter - evaluator by @apbose in #2580
- cherry-pick: Python Runtime Windows Builds on TRT 10 (#2764) by @gs-olive in #2776
- feat: support 1d ITensor offsets for embedding_bag converter by @zewenli98 in #2677
- chore(deps): bump transformers from 4.36.0 to 4.38.0 in /tools/perf by @dependabot in #2766
- fix: a bug in func run_test_compare_tensor_attributes_only by @zewenli98 in #2809
- Fix ModuleNotFoundError in ptq by @HolyWu in #2814
- docs: Example on how to use custom kernels in Torch-TensorRT by @narendasan in #2812
- typo fix in doc on saving models by @laikhtewari in #2818
- chore: Remove CUDNN dependencies by @zewenli98 in #2804
- fix: bug in elementwise base for static inputs by @zewenli98 in #2819
- Use environment for docgen by @atalman in #2826
- tool: Opset coverage notebook by @narendasan in #2831
- ci: Add release flag for nightly build tag by @gs-olive in #2821
- [doc] Update options documentation for torch.compile by @lanluo-nvidia in #2834
- feat(//py/torch_tensorrt/dynamo): Support for BF16 by @narendasan in #2833
- feat: data parallel inference examples by @bowang007 in #2805
- fix: bugs in TRT 10 upgrade by @zewenli98 in #2832
- feat: support aten._cdist_forward converter by @chohk88 in #2726
- chore: cherry pick of #2805 by @bowang007 in #2851
- feat: Add support for multi-device safe mode in C++ by @gs-olive in #2824
- feat: support aten.log1p converter by @chohk88 in #2823
- feat: support aten.as_strided converter by @chohk88 in #2735
- fix: Fix deconv kernel channel num_output_maps where wts are ITensor by @andi4191 in #2678
- Aten scatter converter by @apbose in #2664
- fix user_guide and tutorial docs by @yoosful in #2854
- chore: Make from and to methods use the same TRT API by @narendasan in #2858
- add aten.topk implementation by @lanluo-nvidia in #2841
- feat: support aten.atan2.out converter by @chohk88 in #2829
- chore: update docker, refactor CI TRT dep to main by @peri044 in #2793
- feat: Cherry pick of Add validators for dynamic shapes in converter registration by @peri044 in #2849
- feat: support aten.diagonal converter by @chohk88 in #2856
- Remove ops from decompositions where converters exist by @HolyWu in #2681
- slice_scatter decomposition by @apbose in #2519
- select_scatter decomp by @apbose in #2515
- manylinux wheel file build update for TensorRT-10.0.1 by @lanluo-nvidia in #2868
- replace itemset due to numpy version 2.0 removed itemset api by @lanluo-nvidia in #2879
- chore: cherry-pick of DS feature by @peri044 in #2857
- feat: TS Add converter support for aten::flip by @mfeliz-cruise in #2722
- ptq test error correction by @apbose in #2860
- feat: Add dynamic shape support for sub by @keehyuna in #2888
- feat: dynamic shapes support for sqrt and copy by @chohk88 in #2889
- add dynamic shape support for aten.ops.gt and aten.ops.ge by @lanluo-nvidia in #2883
- chore: cherry-pick FP8 by @peri044 in #2892
- add dynamic shape support for sin/cos/cat by @lanluo-nvidia in #2887
- Cancel in-progress ci build when a new commit is pushed by @lanluo-nvidia in #2903
- readme by @laikhtewari in #2864
- Only trigger doc gen if it is not a pytorchbot commit by @lanluo-nvidia in #2909
- fix: Handle dynamic shapes in where ops by @keehyuna in #2853
- chore: Dynamic support for split (#2871) into main by @peri044 in #2914
- feat: C++ runtime on Windows by @HolyWu in #2806
- chore: cherry pick of #2709 by @peri044 in #2850
- Add dynamic shape support for layer_norm/native_group_norm/group_norm by @lanluo-nvidia in #2908
- feat: dynamic shapes support for neg ops by @keehyuna in #2878
- empty_stride decomposition by @apbose in #2859
- empty_memory_format evaluator by @apbose in #2745
- gather converter by @apbose in #2905
- feat: Win/Linux Dual Compatible `WORKSPACE` + Upgrade CUDA + Upgrade PyT by @gs-olive in #2907
- chore: add dynamic shapes section in the resnet tutorial by @peri044 in #2904
- fix: Remove build artifact by @gs-olive in #2924
- feat: Use a global timing cache and add a save option by @peri044 in #2898
- chore: fix ValueRanges computation in symbolic nodes by @peri044 in #2918
- scatter CI failures by @apbose in #2925
- chore: Update layer_norm converter to use INormalizationLayer by @mfeliz-cruise in #2509
- Add dynamic shape support for leaky_relu/elu/hard_sigmoid/softplus by @lanluo-nvidia in #2927
- feat: Improve logging throughout the Dynamo path by @gs-olive in #2405
- fix unsqueeze cannot work on more than 1 dynamic_shape dimensions by @lanluo-nvidia in #2933
- feat: support `native_dropout` dynamo converter by @zewenli98 in #2931
- feat: support aten index_put converter for accumulate=False by @chohk88 in #2880
- feat: support aten.resize_ converter by @chohk88 in #2874
- fix the docker build failure on main by @lanluo-nvidia in #2942
- feat: Add Branches to Docker Build File by @gs-olive in #2935
- add dynamic shape support for amax/amin/max/min/prod/sum by @lanluo-nvidia in #2943
- fix: bug in vgg16_fp8_ptq example by @zewenli98 in #2950
- Fixed layernorm when weight and bias is None in Stable Diffusion 3 by @cehongwang in #2936
- chore: dynamic shape support for rsqrt/erf ops by @keehyuna in #2929
- feat: dynamic shape support for tan, sinh, cosh, asin and acos by @chohk88 in #2941
- fix: Repair integer inputs in dynamic shape cases by @gs-olive in #2876
- Update PYTORCH to 2.4 by @lanluo-nvidia in #2953
- Automate release artifacts build: usage pytorch cxx11 builder base image by @lanluo-nvidia in #2988
- chore: cherrypick of #2855 by @zewenli98 in #3027
- cherry pick 2740 to release2.4 branch. by @lanluo-nvidia in #3033
- cherry pick from 3008 to release/2.4 by @lanluo-nvidia in #3035
- assertEquals is deprecated in TestCase in Python 3.12 by @lanluo-nvidia in #3038
- fix the artifacts name issue by @lanluo-nvidia in #3041
New Contributors
- @Arktische made their first contribution in #2672
- @aakashapoorv made their first contribution in #2779
- @atalman made their first contribution in #2826
- @yoosful made their first contribution in #2854
Full Changelog: v2.3.0...v2.4.0