
Rework iree-compile pipeline for ONNX #18803

Closed
zjgarvey opened this issue Oct 17, 2024 · 11 comments
Labels: enhancement ➕ New feature or request

Comments

@zjgarvey (Contributor)

Request description

The current ONNX-specific intake logic in PluginRegistration.cpp is insufficient for iree-compile: this single pass is not enough to get the torch IR into an acceptable state to hand off to the torch-to-iree pipeline that follows.

I'd like to request that we do the following.

Use the following passes when lowering torch-onnx-to-torch (a combined invocation is sketched after this list):

  • "--convert-torch-onnx-to-torch"
  • take the relevant bits out of "--torch-lower-to-backend-contract". In particular, "--torch-shape-refinement-pipeline" and "--torch-decompose-complex-ops".
  • "--torch-scalarize-shapes": This pass de-tensorizes shape computations, which are horrifically profuse in models coming from ONNX.
  • Re-apply "--torch-shape-refinement-pipeline", since shape scalarization might have unblocked shape inference so that it can do what it is supposed to. Yes, it is unfortunately necessary to do shape inference both before and after shape scalarization: for some models that fail to get any meaningful shape information from ONNX's shape inference tools, cycling between shape inference and scalarization seems to be the only reliable way to untangle a mess of frontend IR.
  • On the off-chance that the return type of a func op is still a static info cast op after shape inference, "--torch-refine-public-return" and "--canonicalize" can clean up the boundary.
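
For concreteness, here is a rough sketch of that sequence as a single torch-mlir-opt invocation. The file names are placeholders, and the exact flag spellings should be checked against the torch-mlir build in use:

    # Sketch only: lower ONNX-flavored torch IR toward the torch backend contract.
    # Input/output file names are hypothetical.
    torch-mlir-opt model.onnx.torch.mlir \
      --convert-torch-onnx-to-torch \
      --torch-shape-refinement-pipeline \
      --torch-decompose-complex-ops \
      --torch-scalarize-shapes \
      --torch-shape-refinement-pipeline \
      --torch-refine-public-return \
      --canonicalize \
      -o model.torch.mlir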

Remove the following pass from torch-to-iree pipeline:

  • "--torch-decompose-complex-ops". Why remove this here? Well, since this pass pipeline is already missing shape and dtype refinement passes, I'm going to run under the assumption that any "torch" input IR for the torch-to-IREE pipeline satisfies the loosely defined torch-backend contract. Any pipeline like "-to-(torch-)backend-" already includes decompose complex ops. If we actually do need this for the ONNX side because some cycling between shape scalarization and inference unblocks decomposition patterns we need, then it should be included in the onnx-to-torch pipeline.

Recommended:

I'd recommend rolling the "torch-onnx-to-torch-backend passes" into a pass pipeline in torch-mlir; it would then be much easier to include that pipeline in the iree-compile pipeline for ONNX models. Maybe call it --torch-onnx-to-torch-backend-pipeline.
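
If such a pipeline existed, the ONNX input conversion could be exercised locally with a single flag. Note that the flag below is only the proposed name and does not exist yet:

    # Hypothetical: assumes the proposed pipeline has been registered in
    # torch-mlir under the name suggested above.
    torch-mlir-opt model.onnx.torch.mlir \
      --torch-onnx-to-torch-backend-pipeline \
      -o model.torch.mlir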

If reasonable, it would be a good idea to re-apply cycles of shape refinement, shape scalarization, and possibly decompose-complex-ops until the IR reaches a fixed state (or a maximum number of iterations). If doing this, please disable the unflatten.int and flatten.using_ints decompositions, since these might create a cycle with the view op.
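
As a sketch of the cycling idea (purely illustrative; a real implementation would live inside the pass pipeline rather than in a shell loop, and would also need the flatten/unflatten decompositions disabled as noted above):

    # Hypothetical fixed-point driver: rerun refinement + scalarization until the
    # IR stops changing or a maximum iteration count is reached.
    cp model.torch.mlir prev.mlir
    for i in 1 2 3 4 5; do
      torch-mlir-opt prev.mlir \
        --torch-shape-refinement-pipeline \
        --torch-scalarize-shapes \
        --torch-decompose-complex-ops \
        -o next.mlir
      if cmp -s prev.mlir next.mlir; then break; fi
      mv next.mlir prev.mlir
    done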

What component(s) does this issue relate to?

Frontends, Compiler, MLIR

Additional context

We are currently pre-applying torch-mlir passes in our test suite before handing off linalg IR to iree-compile. Making the suggested changes would mean we could just run iree-compile on the ONNX IR and get the same results (or better, since torch-to-iree might handle the lowering to linalg more carefully).

The removal of the additional pre-compile processing could also resolve some compilation issues. For example, the affine.apply ops that seem to be relevant to issue #18386 (comment) evidently are not created when using torch-to-iree, but are created when using torch-backend-to-linalg-on-tensors-backend-pipeline.

zjgarvey added the enhancement ➕ New feature or request label Oct 17, 2024
@ScottTodd (Member)

> We are currently pre-applying torch-mlir passes in our test suite before handing off linalg IR to iree-compile.

Oof. That's not good. Test suites should be using the same tools that we recommend to users. If tests fail when using those tools, the tools should be fixed instead of changing the tests.

> The current ONNX-specific intake logic in PluginRegistration.cpp is insufficient for iree-compile: this single pass is not enough to get the torch IR into an acceptable state to hand off to the torch-to-iree pipeline that follows.

Generally:

  1. iree-import-onnx should handle .onnx --> .mlir that iree-compile can use.
  2. iree-compile with --iree-input-type=onnx (or the default auto) should be able to compile any .mlir produced by iree-import-onnx for all backends (see the sketch after this list).
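
As a concrete sketch of that flow (the target backend and file names are illustrative):

    # Illustrative two-step flow; backend choice and file names are placeholders.
    iree-import-onnx model.onnx -o model.mlir
    iree-compile model.mlir \
      --iree-input-type=onnx \
      --iree-hal-target-backends=llvm-cpu \
      -o model.vmfb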

We can put options in the torch (onnx) plugin if there are decisions for users to make, like the existing --iree-torch-use-strict-symbolic-shapes option.

If the passes and pipelines in torch-mlir change, IREE should adapt as well. See also #17021.


I don't have specific feedback about the proposed concrete changes. They all look pretty standard for converting from one representation to another.

@vivekkhandelwal1 (Member)

I have added the WIP patch here: llvm/torch-mlir#3801 for adding the torch-onnx-to-torch-backend pipeline in Torch-MLIR.

@zjgarvey (Contributor, Author)

> Oof. That's not good. Test suites should be using the same tools that we recommend to users. If tests fail when using those tools, the tools should be fixed instead of changing the tests.

I only disagree based on my incompetence. It's been some trial and error to figure out what passes we actually need, and where to put them in the pipeline. It would have taken several iterations of sloppy IREE commits to converge on something meaningful. Instead, we tried some different combinations of pre-processing passes in the test suite, and now have a good understanding of what passes are actually necessary to apply to onnx-importer-generated IR. It's time to upstream the correct pipeline.

@ScottTodd (Member)

Fair points. It's definitely the end state that matters, not necessarily the journey. I think there are some process / development workflow changes we could make that would make that convergence less sloppy. Manually specifying pass pipelines and adding extra flags is a crutch that we've been leaning on in many places for far too long. The tools need to work (correctly and with high performance) out of the box.

@vivekkhandelwal1 (Member)

For the first proposed change, I have added a PR here: llvm/torch-mlir#3801

CC: @zjgarvey @ScottTodd

@ScottTodd (Member)

ScottTodd commented Dec 6, 2024

What's the status of this issue? I'm comparing test coverage between https://github.com/nod-ai/SHARK-TestSuite/tree/main/alt_e2eshark/onnx_tests and https://github.com/iree-org/iree-test-suites/tree/main/onnx_models and I'm wondering if the higher test pass rate in SHARK-TestSuite is due to some of the extra torch-mlir-opt passes being run here: https://github.com/nod-ai/SHARK-TestSuite/blob/22a000a685ede0764b51d234ccad6b973e2724d9/alt_e2eshark/e2e_testing/test_configs/onnxconfig.py#L148-L223. Can that test suite be switched to using the user-facing tools (iree-import-onnx) now?

zjgarvey closed this as completed Dec 7, 2024
@zjgarvey (Contributor, Author)

zjgarvey commented Dec 7, 2024

> What's the status of this issue? I'm comparing test coverage between https://github.com/nod-ai/SHARK-TestSuite/tree/main/alt_e2eshark/onnx_tests and https://github.com/iree-org/iree-test-suites/tree/main/onnx_models and I'm wondering if the higher test pass rate in SHARK-TestSuite is due to some of the extra torch-mlir-opt passes being run here: https://github.com/nod-ai/SHARK-TestSuite/blob/22a000a685ede0764b51d234ccad6b973e2724d9/alt_e2eshark/e2e_testing/test_configs/onnxconfig.py#L148-L223. Can that test suite be switched to using the user-facing tools (iree-import-onnx) now?

We aren't running torch-mlir preprocessing in the CI anymore, so we can close this issue.

@ScottTodd (Member)

Are you sure about that? The code I linked is using torch-mlir directly in alt_e2eshark/, while I only see references to iree-import-onnx in e2eshark/: https://github.com/nod-ai/SHARK-TestSuite/blob/22a000a685ede0764b51d234ccad6b973e2724d9/e2eshark/run.py#L197

@zjgarvey (Contributor, Author)

zjgarvey commented Dec 7, 2024

Yeah, the preprocessing stage is only used if the flag --torchtolinalg is passed. We turned that off in the CI after updating the torch-to-iree pipeline. We will soon use iree-import-onnx when adding some larger models to the CI. So right now the only place torch-mlir is being used in the CI is the import phase.

@ScottTodd (Member)

Ah. I think the import phase should be moved to iree-import-onnx too. Users (and test suites for IREE) shouldn't be using https://github.com/llvm/torch-mlir directly at all.

I specifically was looking at this test failure earlier and was wondering if the torch-mlir import was affected by the same bug:

https://github.com/iree-org/iree-test-suites/actions/runs/12202034018/job/34041791439#step:6:180

ERROR    onnx_models.conftest:conftest.py:140 artifacts/model_zoo/validated/vision/classification/rcnn-ilsvrc13-9.mlir:13:82: error: expected '>'
    %10 = torch.operator "onnx.Constant"() {torch.onnx.value = dense_resource<_fc-rcnn_b_0> : tensor<200xf32>} : () -> !torch.vtensor<[200],f32>

(maybe some invalid symbol in a resource name?)

@zjgarvey (Contributor, Author)

zjgarvey commented Dec 9, 2024

Yeah, this "-" issue should get fixed by llvm/torch-mlir#3901.

When that lands and gets bumped into IREE, the importer there should be fixed too.
