Replies: 2 comments
-
cc: @apbose who is looking at porting torch2trt-style converters to the FX frontend #1557
-
Currently, we have two different paths, developed separately by NVIDIA and Meta. The TS path is a C++-based converter that can be used and deployed in C++ applications; it is also accessible from Python through pybind.
The FX path is purely Pythonic and requires no C++ involvement, which makes debugging and deployment convenient for developers. Each path has its pros and cons; I listed the comparison here:
PT2.0 was announced in December, and we are looking forward to its release next month. We are seriously considering aligning the next generation of Torch-TensorRT (version 2.0) with PT2.0 development. Given the tech stack we have already built and how we will develop converters against the new tracer, there are a few problems to work through, which I discuss in the next few sections.
Our current code structure is quite straightforward and stable right now. It can be separated into a few parts:
One exciting feature of PT2.0 is the new tracer (torchdynamo + AOT autograd), which can capture a more stable graph from a model. It also supports dynamic shapes via symbolic shape representation. It is not mature yet, but it should meet our basic requirements now. It also comes with functionalization, which replaces every mutation operation in a model with an immutable counterpart. These merits do not exist in the FX tracer.
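To make the functionalization idea concrete, here is a minimal, dependency-free sketch (plain Python, not the actual torch implementation): in-place ops such as add_ are rewritten into out-of-place ops writing to a fresh name, and later uses of the mutated value are redirected to that name. The trace representation and op names here are hypothetical stand-ins for FX nodes.

```python
def functionalize(ops):
    """ops: list of (output_name, op_name, arg_names).
    In-place ops (trailing '_') become out-of-place ops with a fresh output,
    and later references to the mutated value are renamed accordingly."""
    renames = {}   # original tensor name -> latest functionalized name
    result = []
    fresh = 0
    for out, op, args in ops:
        args = tuple(renames.get(a, a) for a in args)
        if op.endswith("_"):                # mutation, e.g. x.add_(y)
            mutated = args[0]
            new_name = f"{mutated}_{fresh}"
            fresh += 1
            result.append((new_name, op.rstrip("_"), args))
            renames[mutated] = new_name     # future uses of x see x_0
        else:
            result.append((out, op, args))
    return result

trace = [("x", "relu", ("inp",)),
         (None, "add_", ("x", "y")),      # mutates x in place
         ("z", "mul", ("x", "w"))]        # reads the mutated x
print(functionalize(trace))
```

After the rewrite, the in-place add_ becomes an add producing x_0, and the following mul consumes x_0 instead of x, so every op in the trace is immutable.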
Migrate to Aten ops
This is straightforward, as we only need to change the tracer. The PT2.0 tracer is the replacement for our current tracers: it can stand in for either the FX tracer or the TS tracer. For example, the FX frontend lowering interface has an is_aten flag (the implementation is here).
It can also be propagated to the main frontend interface by passing ir="aten".
Existing passes need to change
For the FX frontend, we have some passes designed for FX traces, like lower_basic_pass. They are based on acc_ops and need modification to adapt to aten ops.
The TS-based trace should not need much work, since it is already based on aten ops.
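A sketch of what "adapting a pass to aten ops" could look like: since most acc_ops map 1-1 to aten ops, an existing pass can often be retargeted through a small translation table. The op names and the dict-based node stand-in below are illustrative only, not the real registry entries or fx.Node objects.

```python
# Hypothetical 1-1 translation table; real entries would reference the
# actual acc_ops / torch.ops.aten callables rather than strings.
ACC_TO_ATEN = {
    "acc_ops.linear": "aten.linear",
    "acc_ops.relu": "aten.relu",
    "acc_ops.add": "aten.add.Tensor",
}

def retarget_pass(nodes):
    """nodes: list of dicts with a 'target' key (a toy stand-in for fx.Node).
    Rewrites any acc_op target to its aten equivalent; unknown targets pass
    through unchanged."""
    return [dict(n, target=ACC_TO_ATEN.get(n["target"], n["target"]))
            for n in nodes]
```

An existing acc_op-based pass that pattern-matches on node targets could then run unchanged after this retargeting step, which is why the migration effort stays small when the mapping is 1-1.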
Aten Ops Based TensorRT Converters in Python
We still want to keep the Python converters for a few reasons:
At Meta, the fx2trt flow is used in production and is based on a pure Python environment.
The team found that the Pythonic approach is good for dev efficiency and improves debugging.
It is beneficial if we can keep the converters in Python. In the FX folder, I initiated a new converter file, aten_ops_converters.py. As we can see, the effort is not large, since most of the acc_ops have a 1-1 mapping to aten ops. We can implement it in the following order:
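As a rough sketch of what a file like aten_ops_converters.py could contain (the decorator name tensorrt_converter and the op key "aten.add.Tensor" are illustrative assumptions, and the body just records what a real converter would build via the TensorRT network APIs):

```python
CONVERTERS = {}

def tensorrt_converter(op_name):
    """Decorator that registers a converter function for a given aten op."""
    def register(fn):
        CONVERTERS[op_name] = fn
        return fn
    return register

@tensorrt_converter("aten.add.Tensor")
def convert_add(network, target, args, kwargs, name):
    # A real converter would call TensorRT network-building APIs here
    # (e.g. add an elementwise layer); we just describe the intent.
    return f"trt.elementwise_add({args[0]}, {args[1]})"

# Lookup and invocation, as a splitter/interpreter would do per graph node:
print(CONVERTERS["aten.add.Tensor"](None, "aten.add.Tensor", ("x", "y"), {}, "add_1"))
```

With a 1-1 op mapping, porting an acc_ops converter is often just re-registering the same body under the aten op key.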
A few things to expect:
For example, chunk is decomposed into
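The exact decomposition is elided above, but the general idea is that chunk splits a tensor into N pieces along a dimension, which can be expressed as a sequence of slice ops. A hedged illustration on a plain Python list (not tensors, and not the actual aten decomposition):

```python
import math

def chunk(xs, n_chunks):
    """Split xs into up to n_chunks pieces, mirroring chunk semantics:
    each piece is just a slice -- the decomposed form of the op."""
    step = math.ceil(len(xs) / n_chunks)
    return [xs[i:i + step] for i in range(0, len(xs), step)]

print(chunk([1, 2, 3, 4, 5, 6], 3))
```

Because the pieces are ordinary slices, a backend that only knows how to convert slicing can still support chunk once the decomposition has run.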
Backend options
Some TS customers do not have a Python environment and have to implement their applications and deploy their models in C++. Currently, we do not support integration with PT2.0 for them, since the new tracer does not have a C++ interface yet.
Another scenario is that the customer has a Python environment but expects to deploy in a C++ environment. We have two ways to do this.
Method 1: They can use the "Converters in Python" mentioned above, with the FX backend doing the rest of the work: splitting, conversion, and model wrapping. We have TRTModuleNext to wrap the TRT engine in C++, or we can use EngineHolder to wrap the model.
Method 2: They can use the "Converters in Python" mentioned above, but with the TS path backend. The benefit is that existing customers see no surprises, since the backend is still the TS one; they can keep the same pass optimizations, splitting, and deployment process.
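The two methods can be summarized as a dispatch on the backend choice after the shared Python conversion step. Everything below is hypothetical scaffolding (the function name, the "fx"/"ts" keys, and the engine string) meant only to show the shape of the decision, not real Torch-TensorRT API:

```python
def compile_with_python_converters(model, backend="fx"):
    """Shared step: the Python converters produce a TRT engine; the backend
    choice then decides how it is wrapped for deployment."""
    engine = f"trt_engine({model})"
    if backend == "fx":
        # Method 1: FX backend handles splitting/wrapping; the engine is
        # held by a C++-deployable wrapper such as TRTModuleNext/EngineHolder.
        return ("TRTModuleNext", engine)
    elif backend == "ts":
        # Method 2: hand the engine to the TS backend so existing TS pass
        # optimizations, splitting, and C++ deployment remain unchanged.
        return ("ts_backend", engine)
    raise ValueError(f"unknown backend: {backend}")
```

The design point is that the converter layer is shared; only the wrapping and deployment layer differs between the two methods.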
gdoc link