TensorRT Plugins

Plugins

Grid Sampler

| OP Name | Attributes | Inputs | Outputs | FP32 Speed | FP16 Speed | INT8 Speed | Half Type | Tensor Format | Test Device |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| GridSampler2DTRT | interpolation_mode: int; padding_mode: int; align_corners: int | input: T; grid: T | output: T | x1 | x2.0 | x3.8 | nv_half | kLinear, kCHW4 | RTX 2080Ti |
| GridSampler2DTRT2 | interpolation_mode: int; padding_mode: int; align_corners: int | input: T; grid: T | output: T | x1 | x3.1 | x3.8 | nv_half2 | kLinear, kCHW2, kCHW4 | RTX 2080Ti |
| GridSampler3DTRT | interpolation_mode: int; padding_mode: int; align_corners: int | input: T; grid: T | output: T | x1 | x1.3 | - | nv_half | kLinear | RTX 2080Ti |
| GridSampler3DTRT2 | interpolation_mode: int; padding_mode: int; align_corners: int | input: T; grid: T | output: T | x1 | x2.2 | - | nv_half2 | kLinear | RTX 2080Ti |

Inputs

  • input: T[float/half/half2/int8]

    Tensor shape: [N, C, H_in, W_in] (4D case) or [N, C, D_in, H_in, W_in] (5D case)

  • grid: T[float/half/half2/int8]

    Tensor shape: [N, 2, H_out, W_out] (4D case) or [N, 3, D_out, H_out, W_out] (5D case)

    grid specifies the sampling pixel locations normalized by the input spatial dimensions, so most of its values should lie in the range [-1, 1]. For example, x = -1, y = -1 is the top-left pixel of input, and x = 1, y = 1 is the bottom-right pixel of input.

Attributes

  • interpolation_mode: int

    Interpolation mode used to calculate output values. (0: bilinear, 1: nearest, 2: bicubic)

    Note: bicubic supports only 4-D input.

  • padding_mode: int

    Padding mode for outside grid values. (0: zeros, 1: border, 2: reflection)

  • align_corners: int

    If align_corners=1, the extrema (-1 and 1) are considered as referring to the center points of the input's corner pixels. If align_corners=0, they are instead considered as referring to the corner points of the input's corner pixels, making the sampling more resolution agnostic.
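The two align_corners conventions can be made concrete with a small coordinate mapping. The sketch below assumes the plugin follows the same unnormalization as `torch.nn.functional.grid_sample`; the helper name `unnormalize` is hypothetical and is not part of the plugin.

```python
def unnormalize(coord: float, size: int, align_corners: bool) -> float:
    """Map a normalized coordinate in [-1, 1] to a pixel index along one axis.

    Assumption: same convention as torch.nn.functional.grid_sample.
    """
    if align_corners:
        # -1 and 1 refer to the centers of the first and last pixels.
        return (coord + 1.0) / 2.0 * (size - 1)
    # -1 and 1 refer to the outer edges of the first and last pixels.
    return ((coord + 1.0) * size - 1.0) / 2.0


# Compare both conventions on an axis with 4 pixels.
for x in (-1.0, 0.0, 1.0):
    print(x, unnormalize(x, 4, True), unnormalize(x, 4, False))
```

With align_corners=0 the extrema land half a pixel outside the first and last pixel centers, which is why padding_mode matters even for grid values inside [-1, 1].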

Outputs

  • output: T[float/half/half2/int8]

    Tensor shape: [N, C, H_out, W_out] (4D case) or [N, C, D_out, H_out, W_out] (5D case)
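As a rough illustration of how one output value is produced, the following sketch samples a single channel at a position that is already expressed in pixel coordinates, using bilinear interpolation (interpolation_mode 0) and zeros padding (padding_mode 0). It is a plain-Python reference under those assumptions, not the plugin's CUDA kernel, and `bilinear_sample_zeros` is a hypothetical name.

```python
import math


def bilinear_sample_zeros(image, x, y):
    """Bilinearly sample a 2D list image[h][w] at pixel position (x, y).

    Neighbors that fall outside the image contribute zero.
    """
    height, width = len(image), len(image[0])
    x0, y0 = math.floor(x), math.floor(y)
    wx, wy = x - x0, y - y0  # fractional distance to the top-left neighbor

    def pixel(ix, iy):
        if 0 <= ix < width and 0 <= iy < height:
            return image[iy][ix]
        return 0.0  # zeros padding

    top = (1 - wx) * pixel(x0, y0) + wx * pixel(x0 + 1, y0)
    bottom = (1 - wx) * pixel(x0, y0 + 1) + wx * pixel(x0 + 1, y0 + 1)
    return (1 - wy) * top + wy * bottom


image = [[0.0, 1.0], [2.0, 3.0]]
print(bilinear_sample_zeros(image, 0.5, 0.5))  # center of the four pixels
```

Repeating this for every (H_out, W_out) location and every channel yields the output tensor.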

Multi-scale Deformable Attention

| OP Name | Attributes | Inputs | Outputs | FP32 Speed | FP16 Speed | INT8/FP16 Speed | Half Type | Tensor Format | Test Device |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| MultiScaleDeformableAttnTRT | - | value: T; value_spatial_shapes: T; sampling_locations: T; attention_weights: T | output: T | x1 | x1.3 | x3.2 | nv_half | kLinear | RTX 2080Ti |
| MultiScaleDeformableAttnTRT2 | - | value: T; value_spatial_shapes: T; value_level_start_index: T; sampling_locations: T; attention_weights: T | output: T | x1 | x2.0 | x2.7 | nv_half2 | kLinear | RTX 2080Ti |

Inputs

  • value: T[float/half/half2/int8]

    Tensor shape: [N, num_keys, num_heads, channel]

  • value_spatial_shapes: T[int32]

    Spatial shape of each feature map. Tensor shape: [num_levels, 2], where the last dimension holds (h, w).

  • reference_points: T[float/half2]

    The reference points.

    Tensor shape: [N, num_queries, 1, points_per_group * 2]

  • sampling_offsets: T[float/half/half2/int8]

    The offset of sampling points.

    Tensor shape: [N, num_queries, num_heads, num_levels * num_points * 2]

  • attention_weights: T[float/half/int8]

    The weights of the sampling points used when calculating the attention (before softmax). Tensor shape: [N, num_queries, num_heads, num_levels * num_points].
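The table above lists value_level_start_index as an input of MultiScaleDeformableAttnTRT2 without describing it. A minimal sketch of how it relates to value_spatial_shapes, assuming the Deformable DETR convention in which every level's h * w positions are flattened and concatenated along num_keys (the function name is hypothetical):

```python
def level_start_index(spatial_shapes):
    """Return the start offset of each level along num_keys, plus num_keys.

    Assumption: levels are flattened and concatenated in order, as in
    Deformable DETR.
    """
    starts, total = [], 0
    for h, w in spatial_shapes:
        starts.append(total)
        total += h * w
    return starts, total


# Three feature levels of decreasing resolution.
starts, num_keys = level_start_index([(4, 4), (2, 2), (1, 1)])
print(starts, num_keys)
```

Under this assumption, num_keys in the value tensor equals the sum of h * w over all levels.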

Attributes

  None.

Outputs

  • output: T[float/half/int8]

    Tensor shape: [N, num_queries, num_heads, channel]

Modulated Deformable Conv2d

| OP Name | Attributes | Inputs | Outputs | FP32 Speed | FP16 Speed | INT8/FP16 Speed | Half Type | Tensor Format | Test Device |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| ModulatedDeformableConv2dTRT | stride: int[2]; padding: int[2]; dilation: int[2]; groups: int; deform_groups: int | input: T; offset: T; mask: T; weight: T; bias: T (optional) | output: T | x1 | x2.9 | x3.7 | nv_half | kLinear, kCHW4 | RTX 2080Ti |
| ModulatedDeformableConv2dTRT2 | stride: int[2]; padding: int[2]; dilation: int[2]; groups: int; deform_groups: int | input: T; offset: T; mask: T; weight: T; bias: T (optional) | output: T | x1 | x3.5 | x3.7 | nv_half2 | kLinear, kCHW2, kCHW4 | RTX 2080Ti |

Inputs

  • input: T[float/half/half2/int8]

    Tensor shape: [N, C_in, H_in, W_in]

  • offset: T[float/half/half2/int8]

    Tensor shape: [N, deform_groups*K_h*K_w*2, H_out, W_out]

  • mask: T[float/half/half2/int8]

    Tensor shape: [N, deform_groups*K_h*K_w, H_out, W_out]

  • weight: T[float/half/half2/int8]

    Tensor shape: [C_out, C_in/groups, K_h, K_w]

  • bias: T[float/half/half2] (optional)

    Tensor shape: [C_out]

Attributes

  • stride: int[2]

    Same as torch.nn.Conv2d.

  • padding: int[2]

    Same as torch.nn.Conv2d.

  • dilation: int[2]

    Same as torch.nn.Conv2d.

  • groups: int

    Same as torch.nn.Conv2d.

  • deform_groups: int

    Deformable conv2d groups.

Outputs

  • output: T[float/half/half2/int8]

    Tensor shape: [N, C_out, H_out, W_out]

NOTE: Values (C_in / groups) and (C_in / deform_groups) should be even numbers.
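The shape relationships above can be checked before building an engine. The sketch below derives H_out and W_out with the `torch.nn.Conv2d` output-size formula (an assumption that follows from the attributes being "Same as torch.nn.Conv2d") and enforces the evenness note. `deform_conv_shapes` is a hypothetical helper, and the divisibility check is an added assumption rather than something stated in this document.

```python
def conv_out_size(size, kernel, stride, padding, dilation):
    """Output length along one axis, using the torch.nn.Conv2d formula."""
    return (size + 2 * padding - dilation * (kernel - 1) - 1) // stride + 1


def deform_conv_shapes(input_shape, c_out, kernel, stride=(1, 1),
                       padding=(0, 0), dilation=(1, 1),
                       groups=1, deform_groups=1):
    """Return the expected tensor shapes for one modulated deformable conv."""
    n, c_in, h_in, w_in = input_shape
    k_h, k_w = kernel
    # Assumption: same divisibility requirement as a grouped Conv2d.
    if c_in % groups or c_in % deform_groups:
        raise ValueError("C_in must be divisible by groups and deform_groups")
    # Constraint from the NOTE above.
    if (c_in // groups) % 2 or (c_in // deform_groups) % 2:
        raise ValueError("C_in / groups and C_in / deform_groups should be even")
    h_out = conv_out_size(h_in, k_h, stride[0], padding[0], dilation[0])
    w_out = conv_out_size(w_in, k_w, stride[1], padding[1], dilation[1])
    return {
        "offset": [n, deform_groups * k_h * k_w * 2, h_out, w_out],
        "mask": [n, deform_groups * k_h * k_w, h_out, w_out],
        "weight": [c_out, c_in // groups, k_h, k_w],
        "bias": [c_out],
        "output": [n, c_out, h_out, w_out],
    }


print(deform_conv_shapes([1, 16, 32, 32], 32, (3, 3), padding=(1, 1)))
```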

Rotate

| OP Name | Attributes | Inputs | Outputs | FP32 Speed | FP16 Speed | INT8/FP16 Speed | Half Type | Tensor Format | Test Device |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| RotateTRT | interpolation: int | img: T; angle: T; center: T | output: T | x1 | x1.8 | x4.4 | nv_half | kLinear, kCHW4 | RTX 2080Ti |
| RotateTRT2 | interpolation: int | img: T; angle: T; center: T | output: T | x1 | x2.2 | x4.4 | nv_half2 | kLinear, kCHW2, kCHW4 | RTX 2080Ti |

Inputs

  • img: T[float/half/half2/int8]

    Tensor shape: [C, H, W]

  • angle: T[float/half/half2]

    Tensor shape: [1]

  • center: T[float/half/half2]

    Tensor shape: [2]

Attributes

  • interpolation: int

    Interpolation mode used to calculate output values. (0: bilinear, 1: nearest)

Outputs

  • output: T[float/half/half2/int8]

    Tensor shape: [C, H, W]
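For intuition, a rotation like this is usually computed by inverse mapping: each output pixel looks up the source position that lands on it. The sketch below does this for one channel with nearest interpolation (interpolation 1). Several details are assumptions not confirmed by this document: the angle is taken in degrees, positive angles rotate counter-clockwise as seen on screen, center is (x, y) in pixels, and out-of-range samples are filled with zero. `rotate_nearest` is a hypothetical name.

```python
import math


def rotate_nearest(img, angle_deg, center):
    """Rotate one channel img[h][w] about center=(cx, cy), nearest sampling.

    Assumptions: degrees, counter-clockwise positive, zero fill outside.
    """
    height, width = len(img), len(img[0])
    cx, cy = center
    cos_a = math.cos(math.radians(angle_deg))
    sin_a = math.sin(math.radians(angle_deg))
    out = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            dx, dy = x - cx, y - cy
            # Inverse mapping: source position that lands on (x, y).
            src_x = cx + dx * cos_a - dy * sin_a
            src_y = cy + dx * sin_a + dy * cos_a
            ix = math.floor(src_x + 0.5)  # round to the nearest pixel
            iy = math.floor(src_y + 0.5)
            if 0 <= ix < width and 0 <= iy < height:
                out[y][x] = img[iy][ix]
    return out


img = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
for row in rotate_nearest(img, 90.0, (1, 1)):
    print(row)
```

If the plugin uses the opposite sign convention or radians, only the computation of cos_a and sin_a changes.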

Inverse

| OP Name | Attributes | Inputs | Outputs | Tensor Format | Test Device |
| --- | --- | --- | --- | --- | --- |
| InverseTRT | - | input: T[float] | output: T[float] | kLinear | RTX 2080Ti |

Inputs

  • input: T[float]

    Tensor shape: [B, C, H, W]

Outputs

  • output: T[float]

    Tensor shape: [B, C, H, W]
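The document does not spell out which dimensions are inverted. A plausible reading, assumed here, is a batched matrix inverse over the last two dimensions with H equal to W, as `torch.linalg.inv` does. The plain-Python reference below uses Gauss-Jordan elimination with partial pivoting; `invert` and `batched_inverse` are hypothetical names.

```python
def invert(matrix):
    """Invert one square matrix (list of rows) by Gauss-Jordan elimination."""
    n = len(matrix)
    # Build the augmented matrix [A | I].
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        # Partial pivoting: pick the row with the largest entry in this column.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    # The right half now holds the inverse.
    return [row[n:] for row in aug]


def batched_inverse(x):
    """Apply invert() to every [H, W] matrix of a nested list x[B][C][H][W]."""
    return [[invert(m) for m in channels] for channels in x]


print(batched_inverse([[[[2.0, 0.0], [0.0, 4.0]]]]))
```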

BEV Pool

| OP Name | Attributes | Inputs | Outputs | FP32 Speed | FP16 Speed | INT8 Speed | Half Type | Tensor Format | Test Device |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| BEVPoolV2TRT | out_height: int; out_width: int | depth: T; feat: T; ranks_depth: T; ranks_feat: T; ranks_bev: T; interval_starts: T; interval_lengths: T | output: T | x1 | x1.1 | x2.1 | nv_half | kLinear | RTX 2080Ti |
| BEVPoolV2TRT2 | out_height: int; out_width: int | depth: T; feat: T; ranks_depth: T; ranks_feat: T; ranks_bev: T; interval_starts: T; interval_lengths: T | output: T | x1 | x1.4 | x2.1 | nv_half2 | kLinear | RTX 2080Ti |

Inputs

  • depth: T[float/half/half2/int8]

    Tensor shape: [Cam, D, H, W]

  • feat: T[float/half/half2/int8]

    Tensor shape: [Cam, H, W, C]

  • ranks_depth: T[int32]

  • ranks_feat: T[int32]

  • ranks_bev: T[int32]

  • interval_starts: T[int32]

  • interval_lengths: T[int32]

Attributes

  • out_height: int

    BEV feature height

  • out_width: int

    BEV feature width

Outputs

  • output: T[float/half/half2/int8]

    Tensor shape: [1, out_height, out_width, C]
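The rank and interval inputs are not described above. The sketch below shows how they could fit together, assuming the plugin follows the BEVPoolV2 operator from BEVDet: the ranks index into the flattened depth, feat, and BEV tensors, and each interval is a run of consecutive points that fall into the same BEV cell. This is a plain-Python reference with hypothetical argument layouts (depth as a flat list, feat as a list of C-length rows), not the plugin's kernel.

```python
def bev_pool_v2(depth, feat, ranks_depth, ranks_feat, ranks_bev,
                interval_starts, interval_lengths, out_height, out_width,
                channels):
    """Accumulate depth-weighted features into BEV cells.

    Returns a list of out_height * out_width rows with `channels` values.
    Assumption: semantics match BEVDet's bev_pool_v2.
    """
    out = [[0.0] * channels for _ in range(out_height * out_width)]
    for start, length in zip(interval_starts, interval_lengths):
        cell = ranks_bev[start]  # every point in the interval shares this cell
        for c in range(channels):
            out[cell][c] = sum(
                depth[ranks_depth[i]] * feat[ranks_feat[i]][c]
                for i in range(start, start + length)
            )
    return out


# Three points: the first two fall into BEV cell 0, the third into cell 3.
result = bev_pool_v2(
    depth=[0.5, 0.5, 1.0],
    feat=[[1.0, 2.0], [3.0, 4.0]],
    ranks_depth=[0, 1, 2],
    ranks_feat=[0, 1, 1],
    ranks_bev=[0, 0, 3],
    interval_starts=[0, 2],
    interval_lengths=[2, 1],
    out_height=2, out_width=2, channels=2,
)
print(result)
```

Cells that receive no points stay at zero, which matches the fixed [1, out_height, out_width, C] output shape.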

Multi-Head Attention

| OP Name | Inputs | Outputs | FP32 Speed NMHA | FP16 Speed NMHA | FP32 Speed FMHA | FP16 Speed FMHA | INT8 Speed FMHA | Half Type | Test Device |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| QKVTRT | query: T; key: T; value: T | output: T | x1 | x2.0 | x4.6 | x6.1 | x8.2 | nv_half | RTX 2080Ti |
| QKVTRT2 | query: T; key: T; value: T | output: T | x1 | x2.1 | x4.6 | x6.3 | x8.2 | nv_half2 | RTX 2080Ti |

Inputs

  • query: T[float/half/half2/int8]

    Tensor shape: [batch, q_len, channel]

  • key: T[float/half/half2/int8]

    Tensor shape: [batch, kv_len, channel]

  • value: T[float/half/half2/int8]

    Tensor shape: [batch, kv_len, channel]

Attributes

  None.

Outputs

  • output: T[float/half/half2/int8]

    Tensor shape: [batch, q_len, channel]

NOTE: If q_len and kv_len are both multiples of 64, the plugin runs Flash Multi-Head Attention (FMHA); otherwise it falls back to Naive Multi-Head Attention (NMHA).
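The rule in the note can be checked ahead of time to predict which path a given model will take. A minimal sketch of that check; the function name is hypothetical and the plugin does not expose such a helper:

```python
def select_attention_kernel(q_len: int, kv_len: int) -> str:
    """Return which attention path the note above implies for these lengths."""
    if q_len % 64 == 0 and kv_len % 64 == 0:
        return "FMHA"  # Flash Multi-Head Attention
    return "NMHA"  # Naive Multi-Head Attention


for q_len, kv_len in [(128, 256), (100, 256), (128, 250)]:
    print(q_len, kv_len, select_attention_kernel(q_len, kv_len))
```

Since the speed table shows FMHA well ahead of NMHA, it can be worth choosing sequence lengths that satisfy the rule when the model design allows it.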