Create MLIR functions for ONNX operators that are functions #3409

Merged · 1 commit · Jun 14, 2024

Conversation

andfau-amd
Contributor

@andfau-amd commented May 31, 2024

Resolves #3384.

Please read the commit message for more information!

I'd really appreciate any feedback! I am completely new to this. :)

@andfau-amd marked this pull request as draft on May 31, 2024 at 18:28
@andfau-amd force-pushed the onnx-to-torch-function-expansion branch 2 times, most recently from 50060d7 to eefc1d4, on June 4, 2024 at 15:39
@andfau-amd changed the title from "WIP: Create MLIR functions for ONNX operators that are functions" to "Create MLIR functions for ONNX operators that are functions" on Jun 4, 2024
@andfau-amd marked this pull request as ready for review on June 4, 2024 at 16:35
@andfau-amd
Contributor Author

Marking this as ready for review. check-torch-mlir, check-torch-mlir-python, and the CI (including the e2e tests) all pass now. I'm reasonably confident in the quality of the code and would appreciate some feedback.

However, I probably need to add some kind of regression testing for this before it can be merged. I'm also going to run some external testing.

This might be a disruptive change, as discussed in the commit message. A different denylisting or allowlisting strategy might be worth discussing?

Collaborator

@stellaraccident left a comment


First-pass review. I feel like something complicated like this should also have a lit test that verifies at least a simple structural case more explicitly.

See for example: https://github.com/llvm/torch-mlir/blob/main/test/python/onnx_importer/import_onnx_tool.runlit#L3

This can be annoyingly hard in ONNX, but if you can identify at least one simple example of function outlining like this, then it would be good to back it up with a lit test like that as it makes it obvious what the corresponding IR structure imports as. Will also make it easier to port to the C version.
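
(Aside, for illustration only: a minimal ONNX model exercising a function-defined operator could be generated along the lines below and then driven from a lit RUN line. The script and file names are hypothetical, and MeanVarianceNormalization is just one convenient example of an operator whose schema carries a function body; this is not part of the PR itself.)

  # generate_mvn_model.py -- hypothetical helper for such a lit test.
  # Builds a one-node model whose operator (MeanVarianceNormalization) is
  # defined by an ONNX function, so a function-expanding importer should
  # outline it into a private MLIR function plus a call.
  import onnx
  from onnx import TensorProto, checker, helper

  x = helper.make_tensor_value_info("x", TensorProto.FLOAT, [1, 3, 4, 4])
  y = helper.make_tensor_value_info("y", TensorProto.FLOAT, [1, 3, 4, 4])

  node = helper.make_node("MeanVarianceNormalization", ["x"], ["y"])
  graph = helper.make_graph([node], "mvn_example", [x], [y])
  model = helper.make_model(graph, opset_imports=[helper.make_opsetid("", 17)])

  checker.check_model(model)
  onnx.save(model, "mvn_example.onnx")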

@andfau-amd
Contributor Author

I feel like something complicated like this should also have a lit test that verifies at least a simple structural case more explicitly.

See for example: https://github.com/llvm/torch-mlir/blob/main/test/python/onnx_importer/import_onnx_tool.runlit#L3

This can be annoyingly hard in ONNX, but if you can identify at least one simple example of function outlining like this, then it would be good to back it up with a lit test like that as it makes it obvious what the corresponding IR structure imports as.

Oh, somehow I hadn't realised I could use LIT tests here. I can definitely add a few of those!

Will also make it easier to port to the C version.

Ah, right, do the two importers need to stay in sync? Ideally I would have wanted to implement a lot of this in C++, and then the two importers could perhaps share it, but I don't know if that's possible without putting that C++ into ONNX itself or something.

@stellaraccident
Collaborator

Ah, right, do the two importers need to stay in sync? Ideally I would have wanted to implement a lot of this in C++, and then the two importers could perhaps share it, but I don't know if that's possible without putting that C++ into ONNX itself or something.

The C one gets used for ORT integration and is just starting point code for that use case. I've just aimed at keeping the importers relatively structural and eating the cost of keeping them in sync.

The Python version can go a lot of places the C one can't, and vice versa. As an example, the Python one doesn't infect every consuming project with a native protobuf/absl dependency. That alone pays for the code duplication.

@andfau-amd
Contributor Author

andfau-amd commented Jun 4, 2024

Hmm. As an aside, there are quite a few things in this code that are effectively working around deficiencies in the ONNX Python APIs. We could potentially simplify this Python importer a bit, and make it easier to keep in sync with the C++ importer, if some improvements and additions to ONNX could be upstreamed.

@stellaraccident
Collaborator

Hmm. As an aside, there are quite a few things in this code that are effectively working around deficiencies in the ONNX Python APIs. We could potentially simplify this Python importer a bit, and make it easier to keep in sync with the C++ importer, if some improvements and additions to ONNX could be upstreamed.

Example? Most of the messiness I've seen is just silly twenty-year-old protobuf-is-really-not-a-great-IR issues. Most of that stuff is just never going to be very good.

@andfau-amd force-pushed the onnx-to-torch-function-expansion branch from eefc1d4 to 01570af on June 5, 2024 at 14:18
@stellaraccident
Collaborator

stellaraccident commented Jun 5, 2024

I hear you. But in various downstreams, we are basically never going to take a dep on protobuf or ONNX in the C++ code... My experience with ONNX on this stuff is limited, but it is essentially a clone of TF GraphDef, which I have a lot of sad history with. I hate to say it, but at some point you lose the "this could be better upstream" aspect and just try to ensure local sanity. For that, I just try to keep the mechanics contained to mostly write-once importers that have reasonable tests.

But if you see a chance to make the actual ONNX Python API better, feel free to send them a patch. The protobuf API hasn't changed much in a decade and is what it is, though.

ONNX only releases quarterly, so it could be a while before you can actually use any contributions.

@andfau-amd
Contributor Author

andfau-amd commented Jun 5, 2024

Wait, how does the C++ importer avoid depending on ONNX's C++ libraries or Protobuf? (Maybe we should discuss this elsewhere.)

@stellaraccident
Collaborator

Wait, how does the C++ importer avoid depending on ONNX's C++ libraries or Protobuf? (Maybe we should discuss this elsewhere.)

The C++ importer doesn't avoid it. The Python importer does.

@andfau-amd
Contributor Author

Oh, well, the Python importer doesn't depend on C++ directly, but it does depend on the ONNX Python library, and that depends on ONNX's C++ code. But I think I understand: we can't add C++ code of our own to the importer, and adding stuff to ONNX would take a long time.

@andfau-amd
Contributor Author

Uh, but more importantly, we might be able to make the C++ version of this code a bit simpler than the Python one since there's more stuff exposed by ONNX for C++, I think.

@andfau-amd
Contributor Author

andfau-amd commented Jun 13, 2024

@stellaraccident I added some LIT tests in 79ea75b. Do those look good? Could you take another look at the PR?

@rsuderman
Contributor

I am a little worried about seeing multiple compounding changes on top of the general support for multiple function imports. I may not have full context, but it would surprise me if we need to tweak the decompositions due to wanting additional function inclusion.

@andfau-amd
Contributor Author

andfau-amd commented Jun 13, 2024

I am a little worried about seeing multiple compounding changes on top of the general support for multiple function imports. I may not have full context, but it would surprise me if we need to tweak the decompositions due to wanting additional function inclusion.

We discussed this further privately. As I understand it, Rob's concern was about disruption being caused by the function visibility change, and by the function expansion applying to all operations by default (so, it would affect operations we already support and significantly change their lowering).

We already fixed the function visibility thing.

To solve the other problem, we've decided to switch to an allowlisting approach: only a tiny set of operations will be expanded this way by default. In this patch that set is just MeanVarianceNormalization (resolves nod-ai/SHARK-ModelDev#697) and NegativeLogLikelihoodLoss (which might be removed, see the #3380 discussion). This way there's no immediate disruption.

I guess there can be some tracking issue for gradually expanding the allowlist.
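
To make the allowlisting idea concrete, a gate along the lines below would do the job. This is a hypothetical sketch: the Config field and helper names here are illustrative, not necessarily the actual onnx_importer.py API.

  # Hypothetical allowlist gate. None means "expand every operator that has a
  # function definition" (useful for testing); otherwise only listed operators
  # are expanded, keyed by ONNX domain ("" is the default ai.onnx domain).
  from dataclasses import dataclass, field
  from typing import Dict, Optional, Set

  @dataclass
  class Config:
      function_expansion_allowlist_by_domain: Optional[Dict[str, Set[str]]] = field(
          default_factory=lambda: {"": {"MeanVarianceNormalization"}}
      )

  def should_expand_function(config: Config, domain: str, op_type: str) -> bool:
      allowlist = config.function_expansion_allowlist_by_domain
      if allowlist is None:
          return True
      return op_type in allowlist.get(domain, set())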

@andfau-amd requested a review from rsuderman on June 14, 2024 at 09:25
@andfau-amd force-pushed the onnx-to-torch-function-expansion branch 2 times, most recently from cdfc7e1 to 66920b8, on June 14, 2024 at 10:53
Resolves llvm#3384.

Many ONNX operators are defined by functions and therefore could be
expanded into simpler ONNX operations during importing, avoiding the
need for tools downstream to support these operators directly.

This commit adds this capability to onnx_importer.py. When importing a
node, the schema for the node's operator is retrieved. If the schema
provides a function for the operator, a specialized version for the
node's types and attributes will be created and imported as an MLIR
function with private visibility. An MLIR function call will then be
emitted, instead of a normal operator node. Caching is used to avoid
generating redundant functions within the same module.

In order to avoid a disruptive change to the importer output for a
large number of operators that already have TorchOnnxToTorch support,
an allowlist strategy is used by default. With this commit, only one
operator is allowlisted for expansion, MeanVarianceNormalization.
However, many other operators can be correctly expanded by the current
code, so hopefully the allowlist can be gradually extended. It is
possible to disable the allowlist in the configuration, in which case
all functions are expanded (useful for testing).

Tools downstream of the importer may now need to do inlining when
consuming the output of the importer, e.g.:

  cat imported.mlir | torch-mlir-opt --inline --convert-onnx-to-torch

Explanations for subtle code changes:

- Looking up the correct schema and function for an operator requires
  knowing the opset version. NodeImporter retrieves this from the
  opset imports on the ModelProto retained by the GraphInfo. Previously,
  the model_proto field on GraphInfo was None when importing a subgraph
  in import_regions, but this conflicts with the new need for opset
  version info. Since the apparent purpose of setting it to None was to
  control how GraphInfo generates its input map, a new flag is added to
  GraphInfo (is_subgraph) to control this behavior, so that the actual
  ModelProto can now be provided without breaking this. This also turned
  out to be useful for getting the Config via ModelInfo via GraphInfo.
- Some operators' functions are context-dependent, which means the
  function definition depends on the types of the inputs. Therefore node
  importing now needs to look up the types of a node's inputs, not just
  its outputs as was the case previously. Consequently the operand to
  find_type_proto_for_name() may now be a graph input or initializer in
  some cases, so it has to be updated.
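
For reference, the schema/function lookup described above roughly corresponds to the public onnx.defs API. The sketch below is an approximation under that assumption; the helper name is hypothetical, it is not the actual onnx_importer.py code, and the context-dependent-function path is omitted.

  # Rough sketch of the schema/function lookup the commit message describes.
  import onnx
  from onnx import defs

  def find_function_proto(model: onnx.ModelProto, node: onnx.NodeProto):
      # The opset version comes from the model's opset imports ("" is the
      # default ai.onnx domain), as noted above.
      opset_version = next(
          (oi.version for oi in model.opset_import if oi.domain == node.domain),
          None,
      )
      if opset_version is None:
          return None
      schema = defs.get_schema(
          node.op_type, max_inclusive_version=opset_version, domain=node.domain
      )
      if schema.has_function:
          return schema.function_body  # a FunctionProto to specialize and import
      # Context-dependent functions additionally require the input types; that
      # path is omitted here for brevity.
      return None
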
@andfau-amd force-pushed the onnx-to-torch-function-expansion branch from 66920b8 to 46e39ee on June 14, 2024 at 16:19
Successfully merging this pull request may close these issues.

[ONNX] Systematically expand ONNX functions before conversion to Torch to avoid needing bespoke conversions