
Conversation

@Hardcode84 Hardcode84 commented Nov 16, 2025

Add a new lowering pipeline which doesn't depend on IREE and uses only upstream dialects.

Design overview:

  • emitter.py emit_host_func generates a host wrapper on top of the GPU kernel func:
    • Converts PyObject* inputs to the actual native types
    • Computes dynamic kernel launch params (grid dims, dynamic values)
    • Invokes the kernel func using gpu.launch_func
  • water_lowering_pipeline constructs the lowering pipeline for the host/kernel funcs:
    • Lowers high-level dialects (e.g. affine)
    • Runs optimizations
    • Compiles the kernel code into a binary and embeds it into the IR
    • Runs a custom lowering pass for the Wave runtime
  • GPUToGPURuntime.cpp is a lowering pass that runs the GPU binary through our custom runtime. Upstream MLIR GPU lowering uses the MLIR runtime_wrappers, which are still at the toy stage and not really suitable for production use, so we need a custom runtime.
  • execution_engine includes three support libraries:
    • execution_engine itself, which converts the host code from MLIR to LLVM (the host code only, since the GPU kernel was compiled to a binary earlier in the pipeline), runs the LLVM opt pipeline, does x86 codegen, and executes the result in memory using the LLVM execution engine
    • buffer_utils, which provides conversion from Python to native types
    • wave_hip_runtime, which handles HIP binary loading and kernel launch, similar to our existing wave_runtime
  • execution_engine.py is a Python wrapper on top of the above runtimes
  • compile.py creates the execution engine, loads the final module into it, and invokes the resulting native function using ctypes (see the sketch after this list)
  • setup.py is updated to optionally build the execution engine and water-opt
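
To make the host-side flow concrete, here is a minimal Python sketch of how compile.py is meant to drive the execution engine. All names in it (the engine object, load_module, lookup, the symbol name) are hypothetical placeholders standing in for the execution_engine.py / compile.py APIs added in this PR, not the actual interfaces:

    import ctypes

    def run_host_wrapper(engine, lowered_module: str, kernel_args):
        # Load the fully lowered host module into the JIT engine; the GPU kernel
        # is already embedded in the IR as a binary at this point.
        handle = engine.load_module(lowered_module)

        # Look up the emitted host wrapper. It takes PyObject* inputs, so on
        # CPython we can pass id(obj), which is the address of the PyObject.
        func_ptr = engine.lookup(handle, "kernel_host_wrapper")  # hypothetical symbol
        proto = ctypes.CFUNCTYPE(None, *([ctypes.c_void_p] * len(kernel_args)))
        host_func = proto(func_ptr)
        host_func(*(ctypes.c_void_p(id(a)) for a in kernel_args))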

TODOs for follow-up PRs:

  • direct asm support in host wrapper
  • compiled binary caching
  • dump_intermediates support

@Hardcode84 Hardcode84 force-pushed the host-wrapper branch 2 times, most recently from f6708e0 to 35daa6b on November 22, 2025 20:06
@Hardcode84 Hardcode84 marked this pull request as ready for review November 23, 2025 20:38
Copilot AI review requested due to automatic review settings November 23, 2025 20:38
Copilot finished reviewing on behalf of Hardcode84 November 23, 2025 20:42
Copilot AI left a comment

Pull request overview

This PR adds a fully upstream lowering pipeline for Wave that doesn't depend on IREE, using only upstream MLIR dialects.

Purpose: Enable Wave compilation using standard MLIR tooling without IREE dependencies

Key Changes:

  • Host wrapper generation that converts PyObject inputs to native types and launches GPU kernels
  • New execution engine with LLVM JIT compilation and HIP runtime integration
  • Water lowering pipeline for GPU dialect to binary compilation

Reviewed changes

Copilot reviewed 29 out of 30 changed files in this pull request and generated 16 comments.

Summary per file:

  • wave.py: Adds water pipeline integration and host function emission
  • water.py: Implements water binary discovery and lowering pipeline construction
  • compile_options.py: Adds use_water_pipeline flag
  • compile.py: Implements WaveKernelExecutionEngine for JIT execution
  • emitter.py: Adds emit_host_func to generate host wrapper functions
  • execution_engine/*: New C++ execution engine with Python bindings
  • buffer_utils.*: PyObject to native type conversion helpers
  • wave_hip_runtime.*: HIP kernel loading and launch runtime
  • GPUToGPURuntime.cpp: MLIR pass lowering GPU ops to runtime calls
  • setup.py: Build configuration for execution engine and water
  • tests/*: New tests and test utilities for water pipeline


@Hardcode84 Hardcode84 changed the title from "[WIP] Fully upstream lowering pipeline" to "Fully upstream lowering pipeline" Nov 23, 2025
Copilot AI review requested due to automatic review settings November 24, 2025 00:29
Copilot finished reviewing on behalf of Hardcode84 November 24, 2025 00:31
Copilot AI left a comment

Pull request overview

Copilot reviewed 30 out of 31 changed files in this pull request and generated 8 comments.



@Hardcode84 (author) commented:

Ready for review, I think. There are a few TODOs I will leave to follow-up PRs.

Comment on lines 199 to 203
std::string strVal = str.str();
strVal.append(std::string_view("\0", 1));
return LLVM::createGlobalString(loc, builder,
getUniqueLLVMGlobalName(mod, varName),
strVal, LLVM::Linkage::Internal);
Contributor:
Suggested change
std::string strVal = str.str();
strVal.append(std::string_view("\0", 1));
return LLVM::createGlobalString(loc, builder,
getUniqueLLVMGlobalName(mod, varName),
strVal, LLVM::Linkage::Internal);
Twine strVal = str + "\0";
return LLVM::createGlobalString(loc, builder,
getUniqueLLVMGlobalName(mod, varName),
strVal.str(), LLVM::Linkage::Internal);

Contributor:
nit: use Twine instead of std::string_view

@Hardcode84 (author) replied on Nov 24, 2025:
Didn't work, as it dropped the final \0: a "\0" string literal decays to an empty C string, so concatenating it into a Twine appends nothing.

Value dataPtr = LLVM::createGlobalString(
loc, builder, getUniqueLLVMGlobalName(mod, "kernel_data"), objData,
LLVM::Linkage::Internal);
Value dataSize = createConst(objData.size(), i64Type);
Contributor:
Suggested change
Value dataSize = createConst(objData.size(), i64Type);
Value dataSize = LLVM::ConstantOp::create(builder, loc, i64Type, objData.size());

Contributor:
nit: maybe don't use a lambda for a one-liner.

@Hardcode84 (author) replied:
It's used in 4 places, so I'd prefer to keep it.

@ftynse left a comment:

Looks good in general, though I may have missed things due to the sheer size of the PR. It may have been better to send, e.g., the execution engine and the runtime separately, then the lowering pass and logic, then the overall connection (and even then each patch is 1kLoC).

Overall comments:

  • We need to update the documentation if we are going to refer to the MLIR base layer for Wave as Water; so far it has been a stealth-ish effort to build the dialect.
  • We need to somehow ensure we have enough coverage on the Python/C++ boundary as well as around runtime linking. This is one of the places that are difficult to debug, so we'd rather be pedantic now than spend weeks printf-debugging later.
  • Let's not depend on the entirety of upstream dialects and passes; this increases distribution sizes significantly for no reason.

Comment on lines +253 to +243
# TODO: we are linking water shared libs with static LLVM libraries,
# which doesn't really work if multiple of them linked with LLVM proper.
# shared_libs: ["ON", "OFF"]
Contributor:
The point of the shared-library build was to catch missing, transitively covered dependencies that are invisible in a monolithic build. Otherwise, the only sane way of having multiple MLIR downstreams co-exist at the Python level is to statically link everything into one DSO per Python extension. If we remove the IREE dependency entirely, we could then build and ship our own distribution with a single DSO instead of depending on IREE's.

Comment on lines 181 to 183
invoke_git(
"clone", "https://github.com/llvm/llvm-project.git", ".", cwd=llvm_dir
)
Contributor:
Cloning LLVM takes a lot of time and space; we can limit that by making a shallow clone at the required commit, or even downloading a tarball of that commit.

Separately, consider factoring the URL out into a constant so it can be changed should we need to depend on a fork.
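
As a rough sketch of the shallow-clone idea, assuming a helper equivalent to setup.py's invoke_git: the LLVM_URL / LLVM_COMMIT names are placeholders introduced here, and the actual pinned SHA is deliberately elided.

    import subprocess

    LLVM_URL = "https://github.com/llvm/llvm-project.git"  # factored out so a fork can be substituted
    LLVM_COMMIT = "<pinned-sha>"  # placeholder for the pinned revision

    def invoke_git(*args, cwd=None):
        # Minimal stand-in for setup.py's invoke_git helper.
        subprocess.check_call(["git", *args], cwd=cwd)

    def shallow_clone_llvm(llvm_dir):
        # Fetch only the pinned commit instead of the full history. This relies on
        # the server allowing fetch-by-SHA for reachable commits, which github.com supports.
        invoke_git("init", cwd=llvm_dir)
        invoke_git("remote", "add", "origin", LLVM_URL, cwd=llvm_dir)
        invoke_git("fetch", "--depth", "1", "origin", LLVM_COMMIT, cwd=llvm_dir)
        invoke_git("checkout", "--detach", "FETCH_HEAD", cwd=llvm_dir)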

Comment on lines +285 to +290
CMakeExtension(
"wave_execution_engine",
"wave_lang/kernel/wave/execution_engine",
install_dir="wave_lang/kernel/wave/execution_engine",
need_llvm=True,
),
Contributor:
So the Wave EE is under the BUILD_WATER flag. Are we moving towards rebranding the MLIR layer of Wave as Water? If so, let's update the documentation.

// Licensed under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception

Contributor:
Is this adapted from somewhere (so I don't have to re-review the code)?

@Hardcode84 (author) replied:
It was adapted from some of my personal code :)

Comment on lines 197 to 207
path = find_local_water_binary_path("water-opt")
if path:
return path

try:
from water_mlir import binaries as water_bin
except ImportError as err:
raise RuntimeError(
"optional water_mlir module not installed but its use is requested"
) from err
binary = water_bin.find_binary("water-opt")
return water_bin.find_binary("water-opt")
Contributor:
Why do we need two mechanisms?

Copilot AI review requested due to automatic review settings November 24, 2025 15:22
Copilot finished reviewing on behalf of Hardcode84 November 24, 2025 15:25
Copilot AI left a comment

Pull request overview

Copilot reviewed 30 out of 31 changed files in this pull request and generated 15 comments.

Comments suppressed due to low confidence (1)

wave_lang/kernel/wave/compile.py:337

  • This comment appears to contain commented-out code.
    # def __del__(self):
    #     """Clean up the loaded module when the kernel is destroyed."""
    #     if self._module_handle is not None and self._engine is not None:
    #         self._engine.release_module(self._module_handle)


Copilot AI review requested due to automatic review settings November 24, 2025 15:47
Copilot finished reviewing on behalf of Hardcode84 November 24, 2025 15:50
Copilot AI left a comment

Pull request overview

Copilot reviewed 30 out of 31 changed files in this pull request and generated 1 comment.



Copilot AI review requested due to automatic review settings November 24, 2025 15:58
Copilot AI review requested due to automatic review settings November 26, 2025 14:10
Copilot finished reviewing on behalf of Hardcode84 November 26, 2025 14:11
Copilot AI left a comment

Pull request overview

Copilot reviewed 28 out of 28 changed files in this pull request and generated 2 comments.



Copilot AI review requested due to automatic review settings November 26, 2025 14:25
Copilot finished reviewing on behalf of Hardcode84 November 26, 2025 14:28
Copilot AI left a comment

Pull request overview

Copilot reviewed 28 out of 28 changed files in this pull request and generated 12 comments.

Comments suppressed due to low confidence (1)

wave_lang/kernel/wave/compile.py:337

  • This comment appears to contain commented-out code.
    # def __del__(self):
    #     """Clean up the loaded module when the kernel is destroyed."""
    #     if self._module_handle is not None and self._engine is not None:
    #         self._engine.release_module(self._module_handle)


Hardcode84 added a commit that referenced this pull request Nov 27, 2025
Extracted from #452

This pass expects IR with `gpu.launch_func` ops and converts it to wave
runtime lib calls.

---------

Signed-off-by: Ivan Butygin <ivan.butygin@gmail.com>
