Replies: 2 comments
- Note to add to docs that in FX you need classes but in PyTorch you need strings that point to node kinds.
- There are two APIs:
## `torch_executed_ops` in FX

### TL;DR

In line with unifying the TorchScript/FX frontends, the `torch_executed_ops` field from TorchScript should be available for FX use as well. The existing FX `leaf_module_list` attribute in the tracer accomplishes a similar result (for modules); however, the naming and functionality of the two features should be unified.

### Goals + Usecases
Specifying that certain operations/modules should be run in Torch, as opposed to the accelerated TensorRT framework, is a common feature among both the FX and TS paths of Torch-TensorRT. However, the method of invoking this feature differs between TS and FX, and could be unified, in line with the ongoing effort to consolidate the frontend interface (RFC #1372, PR #1404). Specifically, while compiling in the TorchScript/FX paths is as easy as toggling `ir="fx"` or `ir="ts"` in `torch_tensorrt.compile(...)`, one cannot do the same for `torch_executed_ops`. Enabling dual TS/FX use of `torch_executed_ops`, alongside other fields currently used exclusively for TorchScript, would improve and streamline the existing compilation process.

### Proposed APIs / UX

#### Example Workflow
A user would interact with this feature through the `torch_tensorrt.compile(...)` function, with the argument `ir="fx"` and a list of excluded operations to be executed in Torch (non-accelerated).

Currently, users can exclude modules (like `torch.nn.ReLU`) by setting the `leaf_module_list` field of the `acc_tracer`, but operations like `torch.add` cannot be excluded this way. The snippet referenced below presents a method to compile a model via the FX path using the `acc_tracer` manually:

TensorRT/examples/fx/fx2trt_example.py (lines 23 to 55 in `deda87b`)
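As a sketch of the unified UX this RFC proposes, the two call forms could be compared side by side. Note this is illustrative only: the FX form of `torch_executed_ops` is the proposal, not current behavior, and the `ReLU`/`add` stand-ins below replace `torch.nn.ReLU`/`torch.add` so the sketch carries no torch dependency.

```python
# Stand-ins for torch objects (hypothetical; avoids a torch dependency here).
class ReLU:          # stand-in for torch.nn.ReLU
    pass

def add(a, b):       # stand-in for torch.add
    return a + b

# TorchScript path today: operators are named by "aten::..." strings.
ts_compile_kwargs = {
    "ir": "ts",
    "torch_executed_ops": ["aten::where"],
}

# FX path under this proposal: operators/modules are Python objects.
fx_compile_kwargs = {
    "ir": "fx",
    "torch_executed_ops": [ReLU, add],
}
```

Either dict would then be splatted into `torch_tensorrt.compile(model, **kwargs)`; the point of the proposal is that only the entry types differ between the two paths.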
Below is a preview of the `leaf_module_list` argument in the tracer:

TensorRT/py/torch_tensorrt/fx/tracer/acc_tracer/acc_tracer.py (lines 274 to 309 in `deda87b`)

Finally, we have the `exclude_support_node_name` argument of the `TRTSplitterSetting`:

TensorRT/py/torch_tensorrt/fx/tools/trt_splitter.py (lines 44 to 58 in `a343650`)
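To illustrate the role of that setting, here is a simplified stand-in (hypothetical: `SplitterSettingSketch` is not the real `TRTSplitterSetting`, and `"acc_ops.add"` is an assumed node-name string used only as an example):

```python
from dataclasses import dataclass, field

@dataclass
class SplitterSettingSketch:
    # Node target names listed here are treated as unsupported by the
    # splitter, so the matching nodes fall back to Torch execution.
    exclude_support_node_name: set = field(default_factory=set)

setting = SplitterSettingSketch()
setting.exclude_support_node_name.add("acc_ops.add")  # assumed example name
```

The real setting object is passed to the `TRTSplitter`, which consults this set when partitioning the graph into TensorRT-accelerated and Torch-executed subgraphs.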
### Internal Implementation

#### Design
The design of this feature would begin with TS/FX unification of the `torch_executed_ops` argument. Specifically, this argument should be capable of taking two different types of inputs:

- TorchScript path (`ir="ts"`) [Already Supported]: strings naming operators (e.g., `torch_executed_ops=["aten::where"]`)
- FX path (`ir="fx"`) [To Add]: Torch classes and callables (e.g., `torch_executed_ops=[torch.nn.ReLU, torch.add]`), including modules from `torch.nn`, such as `torch.nn.Softmax`, which has the corresponding aten operator `aten::softmax`

Then, for the FX path, the next step would be to add functionality for these operators to be excluded during the tracing/splitting. Specifically, this would include marking certain operations, like `torch.add`, to register as unsupported, as though their accelerated counterpart were unimplemented. This would likely involve adding modules to `leaf_module_list` and operators to `exclude_support_node_name`, and writing functionality to distinguish which modules/operators should go where.

#### Extensions Required to Core API implementations
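The routing step described above can be sketched as a small helper. This is a hypothetical function (the name `partition_torch_executed_ops` is not part of Torch-TensorRT): it sends module classes to `leaf_module_list`, and free functions or pre-formatted node-name strings to `exclude_support_node_name`.

```python
import inspect

def partition_torch_executed_ops(torch_executed_ops):
    """Sketch: split a mixed exclusion list into the two FX mechanisms.

    Classes (e.g. torch.nn.ReLU) go to leaf_module_list; free functions
    (e.g. torch.add) and node-name strings go to exclude_support_node_name.
    """
    leaf_module_list = set()
    exclude_support_node_name = set()
    for op in torch_executed_ops:
        if inspect.isclass(op):
            # Module types are kept as leaves so the tracer never opens them.
            leaf_module_list.add(op)
        elif callable(op):
            # Free functions are matched by node target name in the splitter.
            exclude_support_node_name.add(op.__name__)
        else:
            # Assume a pre-formatted node-name string.
            exclude_support_node_name.add(str(op))
    return leaf_module_list, exclude_support_node_name
```

The real implementation would additionally need to map Python callables to the acc-tracer node names the splitter actually sees; the sketch only shows the class-versus-callable distinction.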
The existing core Python library would need changes to the `compile` function to support passing arguments from `torch_executed_ops` to the FX compiler, and to handle the parsing and proper assignment of those operators to FX.

#### Details specific for TorchScript Support

Implementation exists.

#### Details specific for FX support
A key challenge to note here is the overload of the terms "module" and "operation". Certain modules, such as `torch.nn.Conv2d`, can map to a single `aten::convolution` operator, while the operator `torch.add` maps to `aten::add`. In this sense, aten and the TorchScript path have a clearer notion of a single operation (for example, excluding convolutions and adds), whereas in the FX path, convolution might be considered a "module" while add would be an "operation". Thus, to disable `torch.add`, one would need to employ a different method than adding `torch.add` to the `leaf_module_list`, since `torch.add` is not considered a module. `exclude_support_node_name`, as discussed above, is a feasible option for excluding individual operators.

Note: Ensure consideration/differentiation of ops in TorchScript (strings like `"aten::add"`) versus Torch objects as needed in FX. The `torch_executed_ops` field could take a mix of these types.

### Implementation Phases
#### Prototype - Extra Small / Small

- Connect the `torch_executed_modules` argument to the `leaf_module_list` argument, and allow pass-through of operators between the `torch_tensorrt.compile(ir="fx")` function invocation and the `acc_tracer` invocation.
- Verify that the `torch_executed_modules` list provided by the user is valid for FX, if `ir="fx"` is specified.
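The validity check in the prototype phase could be as simple as the following sketch (hypothetical helper; since `leaf_module_list` operates on module types, every entry must be a class when `ir="fx"`):

```python
import inspect

def validate_torch_executed_modules(modules, ir):
    """Sketch: for ir="fx", require every entry to be a module class
    (e.g. torch.nn.ReLU), as leaf_module_list matches module types."""
    if ir != "fx":
        # Other frontends keep their existing validation.
        return list(modules)
    invalid = [m for m in modules if not inspect.isclass(m)]
    if invalid:
        raise TypeError(
            f"torch_executed_modules with ir='fx' expects classes, got: {invalid}"
        )
    return list(modules)
```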
#### MVP `1.4.0` - Small / Medium

- Support `torch_executed_ops` for simple operations marked to be executed in Torch, in addition to modules.