
Conversation

@Hardcode84 Hardcode84 commented Nov 16, 2025

Add a new lowering pipeline which doesn't depend on IREE and uses only upstream dialects.

Design overview:

  • emitter.py emit_host_func generates a host wrapper on top of the GPU kernel func:
    • Converts PyObject* inputs to the actual native types
    • Computes dynamic kernel launch params (grid dims, dynamic values)
    • Invokes the kernel func using gpu.launch_func
  • water_lowering_pipeline constructs the lowering pipeline for the host/kernel funcs:
    • Lowers high-level dialects (e.g. affine)
    • Runs optimizations
    • Compiles the kernel code into a binary and embeds it into the IR
    • Runs a custom lowering pass for the Wave runtime
  • GPUToGPURuntime.cpp is a lowering pass that runs the GPU binary through our custom runtime. Upstream MLIR GPU lowering uses the MLIR runtime_wrappers, which are still at the toy stage and not really suitable for production use, so we need a custom runtime.
  • execution_engine includes three support libraries:
    • execution_engine itself, which converts the host code from MLIR to LLVM (the host code only, since the GPU kernel was compiled to a binary earlier in the pipeline), runs the LLVM opt pipeline, does x86 codegen, and executes the result in memory using the LLVM execution engine
    • buffer_utils, which provides conversion from Python to native types
    • wave_hip_runtime, which handles HIP binary loading and kernel launch, similar to our existing wave_runtime
  • execution_engine.py is a Python wrapper on top of the above runtimes
  • compile.py creates the execution engine, loads the final module into it, and invokes the resulting native function using ctypes (see the sketch after this list)
  • setup.py is updated to optionally build the execution engine and water-opt
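
To make the host-side flow concrete, here is a minimal Python sketch of how compile.py is meant to drive the execution engine. All names in it (the engine object, load_module, lookup, the symbol name) are hypothetical placeholders standing in for the execution_engine.py / compile.py APIs added in this PR, not the actual interfaces:

    import ctypes

    def run_host_wrapper(engine, lowered_module: str, kernel_args):
        # Load the fully lowered host module into the JIT engine; the GPU kernel
        # is already embedded in the IR as a binary at this point.
        handle = engine.load_module(lowered_module)

        # Look up the emitted host wrapper. It takes PyObject* inputs, so on
        # CPython we can pass id(obj), which is the address of the PyObject.
        func_ptr = engine.lookup(handle, "kernel_host_wrapper")  # hypothetical symbol
        proto = ctypes.CFUNCTYPE(None, *([ctypes.c_void_p] * len(kernel_args)))
        host_func = proto(func_ptr)
        host_func(*(ctypes.c_void_p(id(a)) for a in kernel_args))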

TODOs for follow-up PRs:

  • direct asm support in host wrapper
  • compiled binary caching
  • dump_intermediates support

@Hardcode84 Hardcode84 force-pushed the host-wrapper branch 2 times, most recently from f6708e0 to 35daa6b on November 22, 2025 20:06
@Hardcode84 Hardcode84 marked this pull request as ready for review November 23, 2025 20:38
Copilot AI review requested due to automatic review settings November 23, 2025 20:38
Copilot finished reviewing on behalf of Hardcode84 November 23, 2025 20:42
Copilot AI left a comment

Pull request overview

This PR adds a fully upstream lowering pipeline for Wave that doesn't depend on IREE, using only upstream MLIR dialects.

Purpose: Enable Wave compilation using standard MLIR tooling without IREE dependencies

Key Changes:

  • Host wrapper generation that converts PyObject inputs to native types and launches GPU kernels
  • New execution engine with LLVM JIT compilation and HIP runtime integration
  • Water lowering pipeline for GPU dialect to binary compilation

Reviewed changes

Copilot reviewed 29 out of 30 changed files in this pull request and generated 16 comments.

Summary per file:

  • wave.py: Adds water pipeline integration and host function emission
  • water.py: Implements water binary discovery and lowering pipeline construction
  • compile_options.py: Adds use_water_pipeline flag
  • compile.py: Implements WaveKernelExecutionEngine for JIT execution
  • emitter.py: Adds emit_host_func to generate host wrapper functions
  • execution_engine/*: New C++ execution engine with Python bindings
  • buffer_utils.*: PyObject to native type conversion helpers
  • wave_hip_runtime.*: HIP kernel loading and launch runtime
  • GPUToGPURuntime.cpp: MLIR pass lowering GPU ops to runtime calls
  • setup.py: Build configuration for execution engine and water
  • tests/*: New tests and test utilities for water pipeline


@Hardcode84 Hardcode84 changed the title from "[WIP] Fully upstream lowering pipeline" to "Fully upstream lowering pipeline" Nov 23, 2025
Copilot AI review requested due to automatic review settings November 24, 2025 00:29
Copilot finished reviewing on behalf of Hardcode84 November 24, 2025 00:31
Copilot AI left a comment

Pull request overview

Copilot reviewed 30 out of 31 changed files in this pull request and generated 8 comments.



@Hardcode84 (author) commented:

Ready for review, I think. There are a few TODOs I will leave to follow-up PRs.

Comment on lines 199 to 203
std::string strVal = str.str();
strVal.append(std::string_view("\0", 1));
return LLVM::createGlobalString(loc, builder,
getUniqueLLVMGlobalName(mod, varName),
strVal, LLVM::Linkage::Internal);
Contributor:
Suggested change
std::string strVal = str.str();
strVal.append(std::string_view("\0", 1));
return LLVM::createGlobalString(loc, builder,
getUniqueLLVMGlobalName(mod, varName),
strVal, LLVM::Linkage::Internal);
Twine strVal = str + "\0";
return LLVM::createGlobalString(loc, builder,
getUniqueLLVMGlobalName(mod, varName),
strVal.str(), LLVM::Linkage::Internal);

Contributor:
nit: use Twine instead of std::string_view

@Hardcode84 (author) replied on Nov 24, 2025:
Didn't work, as it dropped the final \0: a "\0" string literal decays to an empty C string, so concatenating it into a Twine appends nothing.

Value dataPtr = LLVM::createGlobalString(
loc, builder, getUniqueLLVMGlobalName(mod, "kernel_data"), objData,
LLVM::Linkage::Internal);
Value dataSize = createConst(objData.size(), i64Type);
Contributor:
Suggested change
Value dataSize = createConst(objData.size(), i64Type);
Value dataSize = LLVM::ConstantOp::create(builder, loc, i64Type, objData.size());

Contributor:
nit: maybe don't use a lambda for a one-liner.

@Hardcode84 (author) replied:
It's used in 4 places, so I'd prefer to keep it.

@ftynse left a comment:

Looks good in general, though I may have missed things due to the sheer size of the PR. It may have been better to send, e.g., the execution engine and the runtime separately, then the lowering pass and logic, then the overall connection (and even then each patch is 1kLoC).

Overall comments:

  • We need to update the documentation if we are going to refer to the MLIR base layer for Wave as Water; so far it has been a stealth-ish effort to build the dialect.
  • We need to somehow ensure we have enough coverage on the Python/C++ boundary as well as around runtime linking. This is one of the places that are difficult to debug, so we'd rather be pedantic now than spend weeks printf-debugging later.
  • Let's not depend on the entirety of upstream dialects and passes; this increases distribution sizes significantly for no reason.

Comment on lines +253 to +243
# TODO: we are linking water shared libs with static LLVM libraries,
# which doesn't really work if multiple of them linked with LLVM proper.
# shared_libs: ["ON", "OFF"]
Contributor:
The point of the shared-library build was to catch missing, transitively covered dependencies that are invisible in a monolithic build. Otherwise, the only sane way of having multiple MLIR downstreams co-exist at the Python level is to statically link everything into one DSO per Python extension. If we remove the IREE dependency entirely, we could then build and ship our own distribution with a single DSO instead of depending on IREE's.

Comment on lines 181 to 183
invoke_git(
"clone", "https://github.com/llvm/llvm-project.git", ".", cwd=llvm_dir
)
Contributor:
Cloning LLVM takes a lot of time and space; we can limit that by making a shallow clone at the required commit, or even downloading a tarball of that commit.

Separately, consider factoring the URL out into a constant so it can be changed should we need to depend on a fork.
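
As a rough sketch of the shallow-clone idea, assuming a helper equivalent to setup.py's invoke_git: the LLVM_URL / LLVM_COMMIT names are placeholders introduced here, and the actual pinned SHA is deliberately elided.

    import subprocess

    LLVM_URL = "https://github.com/llvm/llvm-project.git"  # factored out so a fork can be substituted
    LLVM_COMMIT = "<pinned-sha>"  # placeholder for the pinned revision

    def invoke_git(*args, cwd=None):
        # Minimal stand-in for setup.py's invoke_git helper.
        subprocess.check_call(["git", *args], cwd=cwd)

    def shallow_clone_llvm(llvm_dir):
        # Fetch only the pinned commit instead of the full history. This relies on
        # the server allowing fetch-by-SHA for reachable commits, which github.com supports.
        invoke_git("init", cwd=llvm_dir)
        invoke_git("remote", "add", "origin", LLVM_URL, cwd=llvm_dir)
        invoke_git("fetch", "--depth", "1", "origin", LLVM_COMMIT, cwd=llvm_dir)
        invoke_git("checkout", "--detach", "FETCH_HEAD", cwd=llvm_dir)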

Comment on lines +285 to +290
CMakeExtension(
"wave_execution_engine",
"wave_lang/kernel/wave/execution_engine",
install_dir="wave_lang/kernel/wave/execution_engine",
need_llvm=True,
),
Contributor:
So the Wave EE is under the BUILD_WATER flag. Are we moving towards rebranding the MLIR layer of Wave as Water? If so, let's update the documentation.

// Licensed under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception

Contributor:
Is this adapted from somewhere (so I don't have to re-review the code)?

@Hardcode84 (author) replied:
It was adapted from some of my personal code :)

Comment on lines 197 to 207
path = find_local_water_binary_path("water-opt")
if path:
return path

try:
from water_mlir import binaries as water_bin
except ImportError as err:
raise RuntimeError(
"optional water_mlir module not installed but its use is requested"
) from err
binary = water_bin.find_binary("water-opt")
return water_bin.find_binary("water-opt")
Contributor:
Why do we need two mechanisms?

Copilot AI review requested due to automatic review settings November 24, 2025 15:22
Copilot finished reviewing on behalf of Hardcode84 November 24, 2025 15:25
Copilot AI left a comment

Pull request overview

Copilot reviewed 30 out of 31 changed files in this pull request and generated 15 comments.

Comments suppressed due to low confidence (1)

wave_lang/kernel/wave/compile.py:337

  • This comment appears to contain commented-out code.
    # def __del__(self):
    #     """Clean up the loaded module when the kernel is destroyed."""
    #     if self._module_handle is not None and self._engine is not None:
    #         self._engine.release_module(self._module_handle)


Copilot AI review requested due to automatic review settings November 24, 2025 15:47
Copilot finished reviewing on behalf of Hardcode84 November 24, 2025 15:50
Copilot AI left a comment

Pull request overview

Copilot reviewed 30 out of 31 changed files in this pull request and generated 1 comment.



Copilot AI review requested due to automatic review settings November 24, 2025 15:58
Copilot AI review requested due to automatic review settings November 26, 2025 14:10
Copilot finished reviewing on behalf of Hardcode84 November 26, 2025 14:11
Copilot AI left a comment

Pull request overview

Copilot reviewed 28 out of 28 changed files in this pull request and generated 2 comments.



Copilot AI review requested due to automatic review settings November 26, 2025 14:25
Copilot finished reviewing on behalf of Hardcode84 November 26, 2025 14:28
Copilot AI left a comment

Pull request overview

Copilot reviewed 28 out of 28 changed files in this pull request and generated 12 comments.

Comments suppressed due to low confidence (1)

wave_lang/kernel/wave/compile.py:337

  • This comment appears to contain commented-out code.
    # def __del__(self):
    #     """Clean up the loaded module when the kernel is destroyed."""
    #     if self._module_handle is not None and self._engine is not None:
    #         self._engine.release_module(self._module_handle)


Hardcode84 added a commit that referenced this pull request Nov 27, 2025
Extracted from #452

This pass expects IR with `gpu.launch_func` ops and converts it to wave
runtime lib calls.

---------

Signed-off-by: Ivan Butygin <ivan.butygin@gmail.com>
