OP Name | Attributes | Inputs | Outputs | FP32 Speed | FP16 Speed | INT8 Speed | Half Type | Tensor Format | Test Device |
---|---|---|---|---|---|---|---|---|---|
GridSampler2DTRT | interpolation_mode: int padding_mode: int align_corners: int | input: T grid: T | output: T | x1 | x2.0 | x3.8 | nv_half | kLinear, kCHW4 | RTX 2080Ti |
GridSampler2DTRT2 | interpolation_mode: int padding_mode: int align_corners: int | input: T grid: T | output: T | x1 | x3.1 | x3.8 | nv_half2 | kLinear, kCHW2, kCHW4 | RTX 2080Ti |
GridSampler3DTRT | interpolation_mode: int padding_mode: int align_corners: int | input: T grid: T | output: T | x1 | x1.3 | - | nv_half | kLinear | RTX 2080Ti |
GridSampler3DTRT2 | interpolation_mode: int padding_mode: int align_corners: int | input: T grid: T | output: T | x1 | x2.2 | - | nv_half2 | kLinear | RTX 2080Ti |
- input: T[float/half/half2/int8]
  Tensor shape: `[N, C, H_in, W_in]` (4D case) or `[N, C, D_in, H_in, W_in]` (5D case)
- grid: T[float/half/half2/int8]
  Tensor shape: `[N, 2, H_out, W_out]` (4D case) or `[N, 3, D_out, H_out, W_out]` (5D case)
  `grid` specifies the sampling pixel locations normalized by the `input` spatial dimensions. Therefore, most of its values should lie in the range `[-10, 10]`. For example, `x = -10, y = -10` is the top-left pixel of `input`, and `x = 10, y = 10` is the bottom-right pixel of `input`.
- interpolation_mode: int
  Interpolation mode used to calculate output values (0: `bilinear`, 1: `nearest`, 2: `bicubic`). Note: `bicubic` supports only 4-D input.
- padding_mode: int
  Padding mode for outside grid values (0: `zeros`, 1: `border`, 2: `reflection`).
- align_corners: int
  If `align_corners=1`, the extrema (`-1` and `1`) are considered as referring to the center points of the input's corner pixels. If `align_corners=0`, they are instead considered as referring to the corner points of the input's corner pixels, making the sampling more resolution agnostic.
- output: T[float/half/half2/int8]
  Tensor shape: `[N, C, H_out, W_out]` (4D case) or `[N, C, D_out, H_out, W_out]` (5D case)
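
The effect of `align_corners` can be illustrated with a short sketch. It assumes the PyTorch `grid_sample` convention, in which normalized coordinates span -1 to 1; the helper name `unnormalize` is hypothetical, and any additional scaling the plugin applies to `grid` is not modeled here.

```python
def unnormalize(coord: float, size: int, align_corners: bool) -> float:
    """Map a normalized coordinate in [-1, 1] to a pixel index along one axis.

    Assumption: this follows the PyTorch grid_sample convention and is not
    taken from the plugin source.
    """
    if align_corners:
        # -1 and 1 refer to the centers of the first and last pixels.
        return (coord + 1.0) / 2.0 * (size - 1)
    # -1 and 1 refer to the outer edges of the first and last pixels.
    return ((coord + 1.0) * size - 1.0) / 2.0


if __name__ == "__main__":
    for align in (True, False):
        print(align, unnormalize(-1.0, 4, align), unnormalize(1.0, 4, align))
```

For an axis of 4 pixels, the extrema map to pixel indices 0 and 3 when `align_corners=1`, and to -0.5 and 3.5 when `align_corners=0`.
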
OP Name | Attributes | Inputs | Outputs | FP32 Speed | FP16 Speed | INT8/FP16 Speed | Half Type | Tensor Format | Test Device |
---|---|---|---|---|---|---|---|---|---|
MultiScaleDeformableAttnTRT | - | value: T value_spatial_shapes: T sampling_locations: T attention_weights: T | output: T | x1 | x1.3 | x3.2 | nv_half | kLinear | RTX 2080Ti |
MultiScaleDeformableAttnTRT2 | - | value: T value_spatial_shapes: T value_level_start_index: T sampling_locations: T attention_weights: T | output: T | x1 | x2.0 | x2.7 | nv_half2 | kLinear | RTX 2080Ti |
- value: T[float/half/half2/int8]
  Tensor shape: `[N, num_keys, num_heads, channel]`
- value_spatial_shapes: T[int32]
  Spatial shape of each feature map. Tensor shape: `[num_levels, 2]`; the last dimension holds `(h, w)`.
- reference_points: T[float/half2]
  The reference points. Tensor shape: `[N, num_queries, 1, points_per_group * 2]`
- sampling_offsets: T[float/half/half2/int8]
  The offsets of the sampling points. Tensor shape: `[N, num_queries, num_heads, num_levels * num_points * 2]`
- attention_weights: T[float/half/int8]
  The weights of the sampling points used when calculating the attention (before softmax). Tensor shape: `[N, num_queries, num_heads, num_levels * num_points]`
- output: T[float/half/int8]
  Tensor shape: `[N, num_queries, num_heads, channel]`
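
As a rough aid for preparing inputs, the sketch below derives `num_keys` and a per-level start index from the `(h, w)` pairs in `value_spatial_shapes`. It assumes, as in Deformable DETR, that the feature maps of all levels are flattened and concatenated along the key axis, so that `num_keys` is the sum of `h * w` over levels and `value_level_start_index` holds the offset of each level. Neither point is stated above, and the helper names are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def level_layout(spatial_shapes: Sequence[Tuple[int, int]]) -> Tuple[List[int], int]:
    """Return (level_start_index, num_keys) for per-level (h, w) shapes.

    Assumption: levels are flattened and concatenated in order.
    """
    starts: List[int] = []
    offset = 0
    for h, w in spatial_shapes:
        starts.append(offset)
        offset += h * w
    return starts, offset


def expected_shapes(n: int, num_queries: int, num_heads: int, channel: int,
                    num_points: int,
                    spatial_shapes: Sequence[Tuple[int, int]]) -> Dict[str, List[int]]:
    """Collect the tensor shapes listed in this section for one configuration."""
    _, num_keys = level_layout(spatial_shapes)
    num_levels = len(spatial_shapes)
    return {
        "value": [n, num_keys, num_heads, channel],
        "value_spatial_shapes": [num_levels, 2],
        "sampling_offsets": [n, num_queries, num_heads, num_levels * num_points * 2],
        "attention_weights": [n, num_queries, num_heads, num_levels * num_points],
        "output": [n, num_queries, num_heads, channel],
    }


if __name__ == "__main__":
    print(level_layout([(4, 4), (2, 2)]))
    print(expected_shapes(1, 10, 8, 32, 4, [(4, 4), (2, 2)]))
```
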
OP Name | Attributes | Inputs | Outputs | FP32 Speed | FP16 Speed | INT8/FP16 Speed | Half Type | Tensor Format | Test Device |
---|---|---|---|---|---|---|---|---|---|
ModulatedDeformableConv2dTRT | stride: int[2] padding: int[2] dilation: int[2] groups: int deform_groups: int | input: T offset: T mask: T weight: T bias: T (optional) | output: T | x1 | x2.9 | x3.7 | nv_half | kLinear, kCHW4 | RTX 2080Ti |
ModulatedDeformableConv2dTRT2 | stride: int[2] padding: int[2] dilation: int[2] groups: int deform_groups: int | input: T offset: T mask: T weight: T bias: T (optional) | output: T | x1 | x3.5 | x3.7 | nv_half2 | kLinear, kCHW2, kCHW4 | RTX 2080Ti |
- input: T[float/half/half2/int8]
  Tensor shape: `[N, C_in, H_in, W_in]`
- offset: T[float/half/half2/int8]
  Tensor shape: `[N, deform_groups*K_h*K_w*2, H_out, W_out]`
- mask: T[float/half/half2/int8]
  Tensor shape: `[N, deform_groups*K_h*K_w, H_out, W_out]`
- weight: T[float/half/half2/int8]
  Tensor shape: `[C_out, C_in/groups, K_h, K_w]`
- bias: T[float/half/half2] (optional)
  Tensor shape: `[C_out]`
- stride: int[2]
  Same as torch.nn.Conv2d.
- padding: int[2]
  Same as torch.nn.Conv2d.
- dilation: int[2]
  Same as torch.nn.Conv2d.
- groups: int
  Same as torch.nn.Conv2d.
- deform_groups: int
  Deformable conv2d groups.
- output: T[float/half/half2/int8]
  Tensor shape: `[N, C_out, H_out, W_out]`
NOTE: Values (C_in / groups) and (C_in / deform_groups) should be even numbers.
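
The shapes above can be checked before building an engine. The sketch below computes `H_out` and `W_out` with the output-size formula documented for `torch.nn.Conv2d`, derives the `offset`, `mask`, and `output` shapes, and tests the even-number condition from the note. The function names are hypothetical, and the `(height, width)` ordering of `stride`, `padding`, and `dilation` is assumed to match `torch.nn.Conv2d`.

```python
from typing import Dict, List, Sequence


def conv_out_size(size: int, kernel: int, stride: int, padding: int, dilation: int) -> int:
    """Output size along one axis, using the torch.nn.Conv2d formula."""
    return (size + 2 * padding - dilation * (kernel - 1) - 1) // stride + 1


def deform_conv_shapes(input_shape: Sequence[int], weight_shape: Sequence[int],
                       stride: Sequence[int], padding: Sequence[int],
                       dilation: Sequence[int], deform_groups: int) -> Dict[str, List[int]]:
    """Return the offset, mask, and output shapes listed in this section."""
    n, _c_in, h_in, w_in = input_shape
    c_out, _c_in_per_group, k_h, k_w = weight_shape
    h_out = conv_out_size(h_in, k_h, stride[0], padding[0], dilation[0])
    w_out = conv_out_size(w_in, k_w, stride[1], padding[1], dilation[1])
    return {
        "offset": [n, deform_groups * k_h * k_w * 2, h_out, w_out],
        "mask": [n, deform_groups * k_h * k_w, h_out, w_out],
        "output": [n, c_out, h_out, w_out],
    }


def satisfies_even_note(c_in: int, groups: int, deform_groups: int) -> bool:
    """True if C_in / groups and C_in / deform_groups are both even."""
    return (c_in // groups) % 2 == 0 and (c_in // deform_groups) % 2 == 0


if __name__ == "__main__":
    print(deform_conv_shapes((1, 4, 8, 8), (6, 4, 3, 3), (1, 1), (1, 1), (1, 1), 1))
    print(satisfies_even_note(4, 1, 1))
```
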
OP Name | Attributes | Inputs | Outputs | FP32 Speed | FP16 Speed | INT8/FP16 Speed | Half Type | Tensor Format | Test Device |
---|---|---|---|---|---|---|---|---|---|
RotateTRT | interpolation: int | img: T angle: T center: T | output: T | x1 | x1.8 | x4.4 | nv_half | kLinear, kCHW4 | RTX 2080Ti |
RotateTRT2 | interpolation: int | img: T angle: T center: T | output: T | x1 | x2.2 | x4.4 | nv_half2 | kLinear, kCHW2, kCHW4 | RTX 2080Ti |
- img: T[float/half/half2/int8]
  Tensor shape: `[C, H, W]`
- angle: T[float/half/half2]
  Tensor shape: `[1]`
- center: T[float/half/half2]
  Tensor shape: `[2]`
- interpolation: int
  Interpolation mode used to calculate output values (0: `bilinear`, 1: `nearest`).
- output: T[float/half/half2/int8]
  Tensor shape: `[C, H, W]`
OP Name | Attributes | Inputs | Outputs | Tensor Format | Test Device |
---|---|---|---|---|---|
InverseTRT | - | input: T[float] | output: T[float] | kLinear | RTX 2080Ti |
- input: T[float]
  Tensor shape: `[B, C, H, W]`
- output: T[float]
  Tensor shape: `[B, C, H, W]`
OP Name | Attributes | Inputs | Outputs | FP32 Speed | FP16 Speed | INT8 Speed | Half Type | Tensor Format | Test Device |
---|---|---|---|---|---|---|---|---|---|
BEVPoolV2TRT | out_height: int out_width: int | depth: T feat: T ranks_depth: T ranks_feat: T ranks_bev: T interval_starts: T interval_lengths: T | output: T | x1 | x1.1 | x2.1 | nv_half | kLinear | RTX 2080Ti |
BEVPoolV2TRT2 | out_height: int out_width: int | depth: T feat: T ranks_depth: T ranks_feat: T ranks_bev: T interval_starts: T interval_lengths: T | output: T | x1 | x1.4 | x2.1 | nv_half2 | kLinear | RTX 2080Ti |
- depth: T[float/half/half2/int8]
  Tensor shape: `[Cam, D, H, W]`
- feat: T[float/half/half2/int8]
  Tensor shape: `[Cam, H, W, C]`
- ranks_depth: T[int32]
- ranks_feat: T[int32]
- ranks_bev: T[int32]
- interval_starts: T[int32]
- interval_lengths: T[int32]
- out_height: int
  BEV feature height.
- out_width: int
  BEV feature width.
- output: T[float/half/half2/int8]
  Tensor shape: `[1, out_height, out_width, C]`
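
The roles of the rank and interval inputs are not described above. The following pure-Python reference is a sketch under the assumption that the plugin follows the semantics of the `bev_pool_v2` operator from BEVDet: each interval is a run of points that fall into the same BEV cell, and the cell receives the sum of depth-weighted features over that run. The function name and the flattened data layout are illustrative only.

```python
from typing import List, Sequence


def bev_pool_v2_reference(depth: Sequence[float],
                          feat: Sequence[Sequence[float]],
                          ranks_depth: Sequence[int],
                          ranks_feat: Sequence[int],
                          ranks_bev: Sequence[int],
                          interval_starts: Sequence[int],
                          interval_lengths: Sequence[int],
                          out_height: int,
                          out_width: int) -> List[List[float]]:
    """Pool depth-weighted features into BEV cells (assumed BEVDet semantics).

    depth: flattened [Cam * D * H * W] values.
    feat:  flattened [Cam * H * W] rows, each holding C channels.
    Returns out_height * out_width rows of C channels (row-major BEV grid).
    """
    channels = len(feat[0])
    out = [[0.0] * channels for _ in range(out_height * out_width)]
    for start, length in zip(interval_starts, interval_lengths):
        acc = [0.0] * channels
        for i in range(start, start + length):
            d = depth[ranks_depth[i]]      # depth probability of this point
            row = feat[ranks_feat[i]]      # image feature of this point
            for c in range(channels):
                acc[c] += d * row[c]
        # All points of an interval share one BEV cell.
        out[ranks_bev[start]] = acc
    return out


if __name__ == "__main__":
    result = bev_pool_v2_reference(
        depth=[0.5, 0.25, 1.0],
        feat=[[1.0, 2.0], [3.0, 4.0]],
        ranks_depth=[0, 1, 2],
        ranks_feat=[0, 0, 1],
        ranks_bev=[0, 0, 1],
        interval_starts=[0, 2],
        interval_lengths=[2, 1],
        out_height=1,
        out_width=2,
    )
    print(result)
```
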
OP Name | Inputs | Outputs | FP32 Speed NMHA | FP16 Speed NMHA | FP32 Speed FMHA | FP16 Speed FMHA | INT8 Speed FMHA | Half Type | Test Device |
---|---|---|---|---|---|---|---|---|---|
QKVTRT | query: T key: T value: T | output: T | x1 | x2.0 | x4.6 | x6.1 | x8.2 | nv_half | RTX 2080Ti |
QKVTRT2 | query: T key: T value: T | output: T | x1 | x2.1 | x4.6 | x6.3 | x8.2 | nv_half2 | RTX 2080Ti |
- query: T[float/half/half2/int8]
  Tensor shape: `[batch, q_len, channel]`
- key: T[float/half/half2/int8]
  Tensor shape: `[batch, kv_len, channel]`
- value: T[float/half/half2/int8]
  Tensor shape: `[batch, kv_len, channel]`
- output: T[float/half/half2/int8]
  Tensor shape: `[batch, q_len, channel]`
NOTE: If `q_len` and `kv_len` are both multiples of 64, the plugin runs with Flash Multi-Head Attention (FMHA); otherwise it falls back to Naive Multi-Head Attention (NMHA).
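
The dispatch rule in the note can be written as a small predicate, which is handy for checking in advance which kernel a given pair of sequence lengths would select. This is a sketch of the stated rule only; the function names are hypothetical and are not part of the plugin.

```python
from typing import List


def attention_kernel(q_len: int, kv_len: int) -> str:
    """Return "FMHA" if both lengths are multiples of 64, else "NMHA"."""
    if q_len % 64 == 0 and kv_len % 64 == 0:
        return "FMHA"
    return "NMHA"


def qkv_output_shape(batch: int, q_len: int, channel: int) -> List[int]:
    """The output has the same shape as the query: [batch, q_len, channel]."""
    return [batch, q_len, channel]


if __name__ == "__main__":
    for q_len, kv_len in [(128, 64), (100, 64), (64, 96)]:
        print(q_len, kv_len, attention_kernel(q_len, kv_len))
```

Padding `q_len` and `kv_len` up to the next multiple of 64 is one way to reach the faster path, though whether padding is appropriate depends on the model and is outside the scope of this sketch.
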