TensorRT Plugins

Plugins

Grid Sampler

OP Name	Attributes	Inputs	Outputs	FP32 Speed	FP16 Speed	INT8 Speed	Half Type	Tensor Format	Test Device
GridSampler2DTRT	interpolation_mode: int padding_mode: int align_corners: int	input: T grid: T	output: T	x1	x2.0	x3.8	nv_half	kLinear, kCHW4	RTX 2080Ti
GridSampler2DTRT2	interpolation_mode: int padding_mode: int align_corners: int	input: T grid: T	output: T	x1	x3.1	x3.8	nv_half2	kLinear, kCHW2, kCHW4	RTX 2080Ti
GridSampler3DTRT	interpolation_mode: int padding_mode: int align_corners: int	input: T grid: T	output: T	x1	x1.3	-	nv_half	kLinear	RTX 2080Ti
GridSampler3DTRT2	interpolation_mode: int padding_mode: int align_corners: int	input: T grid: T	output: T	x1	x2.2	-	nv_half2	kLinear	RTX 2080Ti

Inputs

input: T[float/half/half2/int8]

Tensor shape: [N, C, H_in, W_in] (4D case) or [N, C, D_in, H_in, W_in] (5D case)
grid: T[float/half/half2/int8]

Tensor shape: [N, 2, H_out, W_out] (4D case) or [N, 3, D_out, H_out, W_out] (5D case)

grid specifies the sampling pixel locations, normalized by the input spatial dimensions. Most of its values should therefore lie in the range [-1, 1]. For example, x = -1, y = -1 is the top-left pixel of input, and x = 1, y = 1 is the bottom-right pixel of input.

Attributes

interpolation_mode: int

Interpolation mode used to calculate output values. (0: bilinear, 1: nearest, 2: bicubic)

Note: bicubic supports only 4-D input.
padding_mode: int

Padding mode for outside grid values. (0: zeros, 1: border, 2: reflection)
align_corners: int

If align_corners=1, the extrema (-1 and 1) are considered as referring to the center points of the input's corner pixels. If align_corners=0, they are instead considered as referring to the corner points of the input's corner pixels, making the sampling more resolution agnostic.
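
As a rough illustration of how align_corners changes the mapping from normalized grid values to pixel coordinates, the sketch below follows the convention used by torch.nn.functional.grid_sample. That the plugin matches this convention in every edge case is an assumption, and `unnormalize` is a hypothetical helper name, not part of the plugin.

```python
def unnormalize(coord: float, size: int, align_corners: bool) -> float:
    """Map a normalized coordinate in [-1, 1] to a pixel coordinate
    along an axis that is `size` pixels long (assumed convention)."""
    if align_corners:
        # -1 and 1 refer to the centers of the first and last pixels.
        return (coord + 1.0) / 2.0 * (size - 1)
    # -1 and 1 refer to the outer edges of the first and last pixels.
    return ((coord + 1.0) * size - 1.0) / 2.0


if __name__ == "__main__":
    for align_corners in (True, False):
        coords = [unnormalize(c, 4, align_corners) for c in (-1.0, 0.0, 1.0)]
        print(f"align_corners={align_corners}: {coords}")
```

With align_corners=0 the extrema land half a pixel outside the first and last pixel centers, which is where padding_mode decides what value is sampled.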

Outputs

output: T[float/half/half2/int8]

Tensor shape: [N, C, H_out, W_out] (4D case) or [N, C, D_out, H_out, W_out] (5D case)

Multi-scale Deformable Attention

OP Name	Attributes	Inputs	Outputs	FP32 Speed	FP16 Speed	INT8/FP16 Speed	Half Type	Tensor Format	Test Device
MultiScaleDeformableAttnTRT	-	value: T value_spatial_shapes: T sampling_locations: T attention_weights: T	output: T	x1	x1.3	x3.2	nv_half	kLinear	RTX 2080Ti
MultiScaleDeformableAttnTRT2	-	value: T value_spatial_shapes: T value_level_start_index: T sampling_locations: T attention_weights: T	output: T	x1	x2.0	x2.7	nv_half2	kLinear	RTX 2080Ti

Inputs

value: T[float/half/half2/int8]

Tensor shape: [N, num_keys, num_heads, channel]
value_spatial_shapes: T[int32]

Spatial shape of each feature map. Tensor shape: [num_levels, 2], where the last dimension holds (h, w).
reference_points: T[float/half2]

The reference points.

Tensor shape: [N, num_queries, 1, points_per_group * 2]
sampling_offsets: T[float/half/half2/int8]

The offset of sampling points.

Tensor shape: [N, num_queries, num_heads, num_levels * num_points * 2]
attention_weights: T[float/half/int8]

The weights of the sampling points used when calculating attention (before softmax). Tensor shape: [N, num_queries, num_heads, num_levels * num_points].
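
The relationship between value_spatial_shapes, num_keys, and the value_level_start_index input listed for MultiScaleDeformableAttnTRT2 can be sketched as follows. This assumes the layout used by the Deformable DETR reference implementation, where the feature maps of all levels are flattened and concatenated along the key axis; `level_layout` is a hypothetical helper, not part of the plugin.

```python
from itertools import accumulate


def level_layout(spatial_shapes):
    """Return (level_start_index, num_keys) for a list of (h, w) shapes.

    Assumption: levels are flattened to h * w keys each and concatenated
    in order along the num_keys axis of `value`.
    """
    sizes = [h * w for h, w in spatial_shapes]
    # Exclusive prefix sum: level i starts where the previous levels end.
    starts = list(accumulate([0] + sizes[:-1]))
    return starts, sum(sizes)


if __name__ == "__main__":
    starts, num_keys = level_layout([(4, 4), (2, 2), (1, 1)])
    print(starts, num_keys)
```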

Attributes

-

Outputs

output: T[float/half/int8]

Tensor shape: [N, num_queries, num_heads, channel]

Modulated Deformable Conv2d

OP Name	Attributes	Inputs	Outputs	FP32 Speed	FP16 Speed	INT8/FP16 Speed	Half Type	Tensor Format	Test Device
ModulatedDeformableConv2dTRT	stride: int[2] padding: int[2] dilation: int[2] groups: int deform_groups: int	input: T offset: T mask: T weight: T bias: T (optional)	output: T	x1	x2.9	x3.7	nv_half	kLinear, kCHW4	RTX 2080Ti
ModulatedDeformableConv2dTRT2	stride: int[2] padding: int[2] dilation: int[2] groups: int deform_groups: int	input: T offset: T mask: T weight: T bias: T (optional)	output: T	x1	x3.5	x3.7	nv_half2	kLinear, kCHW2, kCHW4	RTX 2080Ti

Inputs

input: T[float/half/half2/int8]

Tensor shape: [N, C_in, H_in, W_in]
offset: T[float/half/half2/int8]

Tensor shape: [N, deform_groups*K_h*K_w*2, H_out, W_out]
mask: T[float/half/half2/int8]

Tensor shape: [N, deform_groups*K_h*K_w, H_out, W_out]
weight: T[float/half/half2/int8]

Tensor shape: [C_out, C_in/groups, K_h, K_w]
bias: T[float/half/half2] (optional)

Tensor shape: [C_out]

Attributes

stride: int[2]

Same as torch.nn.Conv2d.
padding: int[2]

Same as torch.nn.Conv2d.
dilation: int[2]

Same as torch.nn.Conv2d.
groups: int

Same as torch.nn.Conv2d.
deform_groups: int

Deformable conv2d groups.

Outputs

output: T[float/half/half2/int8]

Tensor shape: [N, C_out, H_out, W_out]

NOTE: Values (C_in / groups) and (C_in / deform_groups) should be even numbers.
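
The tensor shapes above can be derived from the attributes with a small helper. This is a minimal sketch that assumes H_out and W_out follow the standard torch.nn.Conv2d output-size formula; `deform_conv_shapes` and its argument names are hypothetical and only mirror the names used in this section.

```python
def deform_conv_shapes(n, c_in, h_in, w_in, c_out, kernel,
                       stride, padding, dilation, groups, deform_groups):
    """Return the expected shapes of offset, mask, weight, and output."""
    k_h, k_w = kernel
    if c_in % groups or c_in % deform_groups:
        raise ValueError("C_in must be divisible by groups and deform_groups")
    # Constraint from the NOTE above: both quotients should be even.
    if (c_in // groups) % 2 or (c_in // deform_groups) % 2:
        raise ValueError("C_in / groups and C_in / deform_groups should be even")

    def out_size(size, k, s, p, d):
        # Standard Conv2d output-size formula (assumed to apply here).
        return (size + 2 * p - d * (k - 1) - 1) // s + 1

    h_out = out_size(h_in, k_h, stride[0], padding[0], dilation[0])
    w_out = out_size(w_in, k_w, stride[1], padding[1], dilation[1])
    return {
        "offset": (n, deform_groups * k_h * k_w * 2, h_out, w_out),
        "mask": (n, deform_groups * k_h * k_w, h_out, w_out),
        "weight": (c_out, c_in // groups, k_h, k_w),
        "output": (n, c_out, h_out, w_out),
    }


if __name__ == "__main__":
    print(deform_conv_shapes(1, 64, 32, 32, 128, (3, 3),
                             (1, 1), (1, 1), (1, 1), 1, 1))
```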

Rotate

OP Name	Attributes	Inputs	Outputs	FP32 Speed	FP16 Speed	INT8/FP16 Speed	Half Type	Tensor Format	Test Device
RotateTRT	interpolation: int	img: T angle: T center: T	output: T	x1	x1.8	x4.4	nv_half	kLinear, kCHW4	RTX 2080Ti
RotateTRT2	interpolation: int	img: T angle: T center: T	output: T	x1	x2.2	x4.4	nv_half2	kLinear, kCHW2, kCHW4	RTX 2080Ti

Inputs

img: T[float/half/half2/int8]

Tensor shape: [C, H, W]
angle: T[float/half/half2]

Tensor shape: [1]
center: T[float/half/half2]

Tensor shape: [2]

Attributes

interpolation: int

Interpolation mode used to calculate output values. (0: bilinear, 1: nearest)

Outputs

output: T[float/half/half2/int8]

Tensor shape: [C, H, W]

Inverse

OP Name	Attributes	Inputs	Outputs	Tensor Format	Test Device
InverseTRT	-	input: T[float]	output: T[float]	kLinear	RTX 2080Ti

Inputs

input: T[float]

Tensor shape: [B, C, H, W]

Outputs

output: T[float]

Tensor shape: [B, C, H, W]

BEV Pool

OP Name	Attributes	Inputs	Outputs	FP32 Speed	FP16 Speed	INT8 Speed	Half Type	Tensor Format	Test Device
BEVPoolV2TRT	out_height: int out_width: int	depth: T feat: T ranks_depth: T ranks_feat: T ranks_bev: T interval_starts: T interval_lengths: T	output: T	x1	x1.1	x2.1	nv_half	kLinear	RTX 2080Ti
BEVPoolV2TRT2	out_height: int out_width: int	depth: T feat: T ranks_depth: T ranks_feat: T ranks_bev: T interval_starts: T interval_lengths: T	output: T	x1	x1.4	x2.1	nv_half2	kLinear	RTX 2080Ti

Inputs

depth: T[float/half/half2/int8]

Tensor shape: [Cam, D, H, W]
feat: T[float/half/half2/int8]

Tensor shape: [Cam, H, W, C]
ranks_depth: T[int32]
ranks_feat: T[int32]
ranks_bev: T[int32]
interval_starts: T[int32]
interval_lengths: T[int32]

Attributes

out_height: int

BEV feature height
out_width: int

BEV feature width

Outputs

output: T[float/half/half2/int8]

Tensor shape: [1, out_height, out_width, C]
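
The README does not spell out how the rank and interval inputs are used. The sketch below shows one plausible reading, assuming the plugin follows the semantics of BEVDet's bev_pool_v2: each interval groups points that fall into the same BEV cell, and the output for that cell is the depth-weighted sum of their features. The function name, the flat-list layout, and the explicit `channels` argument are illustrative assumptions only.

```python
def bev_pool_v2_reference(depth, feat, ranks_depth, ranks_feat, ranks_bev,
                          interval_starts, interval_lengths,
                          out_height, out_width, channels):
    """Naive reference for the assumed BEVPoolV2 semantics.

    depth: flattened [Cam, D, H, W] values.
    feat:  flattened [Cam, H, W, C] values (channel-last).
    Returns a flattened [1, out_height, out_width, C] list.
    """
    out = [0.0] * (out_height * out_width * channels)
    for start, length in zip(interval_starts, interval_lengths):
        # Assumption: every point in an interval shares one BEV cell.
        bev = ranks_bev[start]
        for c in range(channels):
            acc = 0.0
            for i in range(start, start + length):
                acc += depth[ranks_depth[i]] * feat[ranks_feat[i] * channels + c]
            out[bev * channels + c] = acc
    return out


if __name__ == "__main__":
    out = bev_pool_v2_reference(
        depth=[0.5, 0.25], feat=[1.0, 2.0, 3.0, 4.0],
        ranks_depth=[0, 1], ranks_feat=[0, 1], ranks_bev=[1, 1],
        interval_starts=[0], interval_lengths=[2],
        out_height=1, out_width=2, channels=2)
    print(out)
```

BEV cells that no interval maps to stay at zero in this sketch.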

Multi-Head Attention

OP Name	Inputs	Outputs	FP32 Speed NMHA	FP16 Speed NMHA	FP32 Speed FMHA	FP16 Speed FMHA	INT8 Speed FMHA	Half Type	Test Device
QKVTRT	query: T key: T value: T	output: T	x1	x2.0	x4.6	x6.1	x8.2	nv_half	RTX 2080Ti
QKVTRT2	query: T key: T value: T	output: T	x1	x2.1	x4.6	x6.3	x8.2	nv_half2	RTX 2080Ti

Inputs

query: T[float/half/half2/int8]

Tensor shape: [batch, q_len, channel]
key: T[float/half/half2/int8]

Tensor shape: [batch, kv_len, channel]
value: T[float/half/half2/int8]

Tensor shape: [batch, kv_len, channel]

Attributes

-

Outputs

output: T[float/half/half2/int8]

Tensor shape: [batch, q_len, channel]

NOTE: If q_len and kv_len are both multiples of 64, the plugin runs Flash Multi-Head Attention (FMHA); otherwise it falls back to Naive Multi-Head Attention (NMHA).
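
The dispatch rule in the note can be written as a small predicate, which is handy for checking in advance which path a given model shape will take. `select_attention_kernel` is a hypothetical helper for illustration; the plugin makes this choice internally.

```python
def select_attention_kernel(q_len: int, kv_len: int) -> str:
    """Return which attention path the note above implies for these lengths."""
    if q_len % 64 == 0 and kv_len % 64 == 0:
        return "FMHA"  # Flash Multi-Head Attention
    return "NMHA"      # Naive Multi-Head Attention


if __name__ == "__main__":
    for q_len, kv_len in [(128, 256), (100, 256), (64, 63)]:
        print(q_len, kv_len, select_attention_kernel(q_len, kv_len))
```

Padding q_len and kv_len up to the next multiple of 64 is one way to reach the faster path, though whether padding is safe depends on the model and is not covered here.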
