Replies: 2 comments
-
cc: @apbose who is looking at porting torch2trt-style converters to the FX frontend #1557
-
Currently, we have two different paths, developed separately by NVIDIA and Meta. The TS path is a C++-based converter that can be used and deployed in C++ applications; it is also accessible from Python through pybind.
The FX path is purely Pythonic and requires no C++ involvement, which makes debugging and deployment convenient for developers. Each path has its pros and cons; I listed the comparison here:
PT2.0 was announced in December, and we are looking forward to its release next month. We are seriously considering aligning the next generation of Torch-TensorRT (version 2.0) with PT2.0 development. Given the tech stack we have already built and how we will develop converters against the new tracer, there are a few problems to work through, which I discuss in the next few sections.
Our current code structure is quite straightforward and stable right now. It can be separated into a few parts:
One exciting feature of PT2.0 is the new tracer (torchdynamo + AOT autograd), which can capture a more stable graph from a model. It also supports dynamic shapes via symbolic shape representation. It is not mature yet, but it should meet our basic requirements now. It also comes with functionalization, which replaces every mutation operation in a model with an immutable counterpart. These merits do not exist in the FX tracer.
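To make the functionalization idea concrete, here is a minimal, dependency-free sketch (plain Python, not the actual torch implementation): in-place ops such as add_ are rewritten into out-of-place ops writing to a fresh name, and later uses of the mutated value are redirected to that name. The trace representation and op names here are hypothetical stand-ins for FX nodes.

```python
def functionalize(ops):
    """ops: list of (output_name, op_name, arg_names).
    In-place ops (trailing '_') become out-of-place ops with a fresh output,
    and later references to the mutated value are renamed accordingly."""
    renames = {}   # original tensor name -> latest functionalized name
    result = []
    fresh = 0
    for out, op, args in ops:
        args = tuple(renames.get(a, a) for a in args)
        if op.endswith("_"):                # mutation, e.g. x.add_(y)
            mutated = args[0]
            new_name = f"{mutated}_{fresh}"
            fresh += 1
            result.append((new_name, op.rstrip("_"), args))
            renames[mutated] = new_name     # future uses of x see x_0
        else:
            result.append((out, op, args))
    return result

trace = [("x", "relu", ("inp",)),
         (None, "add_", ("x", "y")),      # mutates x in place
         ("z", "mul", ("x", "w"))]        # reads the mutated x
print(functionalize(trace))
```

After the rewrite, the in-place add_ becomes an add producing x_0, and the following mul consumes x_0 instead of x, so every op in the trace is immutable.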
Migrate to Aten ops
This is straightforward, as we only need to change the tracer. The PT2.0 tracer is the replacement for our current tracers: it can stand in for either the FX tracer or the TS tracer. For example, the FX frontend lowering interface has an is_aten flag (the implementation is here).
It can also be propagated to the main frontend interface by passing ir="aten".
Existing passes need to change
For the FX frontend, we have some passes designed for FX traces, like lower_basic_pass. They are based on acc_ops and need modification to adapt to aten ops.
The TS-based trace should not need much work, since it is already based on aten ops.
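A sketch of what "adapting a pass to aten ops" could look like: since most acc_ops map 1-1 to aten ops, an existing pass can often be retargeted through a small translation table. The op names and the dict-based node stand-in below are illustrative only, not the real registry entries or fx.Node objects.

```python
# Hypothetical 1-1 translation table; real entries would reference the
# actual acc_ops / torch.ops.aten callables rather than strings.
ACC_TO_ATEN = {
    "acc_ops.linear": "aten.linear",
    "acc_ops.relu": "aten.relu",
    "acc_ops.add": "aten.add.Tensor",
}

def retarget_pass(nodes):
    """nodes: list of dicts with a 'target' key (a toy stand-in for fx.Node).
    Rewrites any acc_op target to its aten equivalent; unknown targets pass
    through unchanged."""
    return [dict(n, target=ACC_TO_ATEN.get(n["target"], n["target"]))
            for n in nodes]
```

An existing acc_op-based pass that pattern-matches on node targets could then run unchanged after this retargeting step, which is why the migration effort stays small when the mapping is 1-1.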
Aten Ops Based TensorRT Converters in Python
We still want to keep the Python converters for a few reasons:
At Meta, the fx2trt flow is used in production and is based on a pure Python environment.
The team found that the Pythonic approach is good for dev efficiency and improves debugging.
It is beneficial if we can keep the converters in Python. In the FX folder, I initiated a new converter file, aten_ops_converters.py. As we can see, the effort is not large, since most of the acc_ops have a 1-1 mapping to aten ops. We can implement it in the following order:
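As a rough sketch of what a file like aten_ops_converters.py could contain (the decorator name tensorrt_converter and the op key "aten.add.Tensor" are illustrative assumptions, and the body just records what a real converter would build via the TensorRT network APIs):

```python
CONVERTERS = {}

def tensorrt_converter(op_name):
    """Decorator that registers a converter function for a given aten op."""
    def register(fn):
        CONVERTERS[op_name] = fn
        return fn
    return register

@tensorrt_converter("aten.add.Tensor")
def convert_add(network, target, args, kwargs, name):
    # A real converter would call TensorRT network-building APIs here
    # (e.g. add an elementwise layer); we just describe the intent.
    return f"trt.elementwise_add({args[0]}, {args[1]})"

# Lookup and invocation, as a splitter/interpreter would do per graph node:
print(CONVERTERS["aten.add.Tensor"](None, "aten.add.Tensor", ("x", "y"), {}, "add_1"))
```

With a 1-1 op mapping, porting an acc_ops converter is often just re-registering the same body under the aten op key.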
A few things to expect:
For example, chunk is decomposed into
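The exact decomposition is elided above, but the general idea is that chunk splits a tensor into N pieces along a dimension, which can be expressed as a sequence of slice ops. A hedged illustration on a plain Python list (not tensors, and not the actual aten decomposition):

```python
import math

def chunk(xs, n_chunks):
    """Split xs into up to n_chunks pieces, mirroring chunk semantics:
    each piece is just a slice -- the decomposed form of the op."""
    step = math.ceil(len(xs) / n_chunks)
    return [xs[i:i + step] for i in range(0, len(xs), step)]

print(chunk([1, 2, 3, 4, 5, 6], 3))
```

Because the pieces are ordinary slices, a backend that only knows how to convert slicing can still support chunk once the decomposition has run.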
Backend options
Some TS customers do not have a Python environment and have to implement their applications and deploy their models in C++. Currently, we do not support integration with PT2.0 for them, since the new tracer does not have a C++ interface yet.
Another scenario is that the customer has a Python environment but expects to deploy in a C++ environment. We have two ways to do this.
Method 1: They can use the "Converters in Python" mentioned above, with the FX backend doing the rest of the work: splitting, conversion, and model wrapping. We have TRTModuleNext to wrap the TRT engine in C++, or we can use EngineHolder to wrap the model.
Method 2: They can use the "Converters in Python" mentioned above, but with the TS path backend. The benefit is that existing customers see no surprises, since the backend is still the TS one; they can keep the same pass optimizations, splitting, and deployment process.
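The two methods can be summarized as a dispatch on the backend choice after the shared Python conversion step. Everything below is hypothetical scaffolding (the function name, the "fx"/"ts" keys, and the engine string) meant only to show the shape of the decision, not real Torch-TensorRT API:

```python
def compile_with_python_converters(model, backend="fx"):
    """Shared step: the Python converters produce a TRT engine; the backend
    choice then decides how it is wrapped for deployment."""
    engine = f"trt_engine({model})"
    if backend == "fx":
        # Method 1: FX backend handles splitting/wrapping; the engine is
        # held by a C++-deployable wrapper such as TRTModuleNext/EngineHolder.
        return ("TRTModuleNext", engine)
    elif backend == "ts":
        # Method 2: hand the engine to the TS backend so existing TS pass
        # optimizations, splitting, and C++ deployment remain unchanged.
        return ("ts_backend", engine)
    raise ValueError(f"unknown backend: {backend}")
```

The design point is that the converter layer is shared; only the wrapping and deployment layer differs between the two methods.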
gdoc link