Fully upstream lowering pipeline #452
Conversation
Pull request overview
This PR adds a fully upstream lowering pipeline for Wave that doesn't depend on IREE, using only upstream MLIR dialects. Key components include:
Purpose: Enable Wave compilation using standard MLIR tooling without IREE dependencies
Key Changes:
- Host wrapper generation that converts PyObject inputs to native types and launches GPU kernels
- New execution engine with LLVM JIT compilation and HIP runtime integration
- Water lowering pipeline for GPU dialect to binary compilation
Reviewed changes
Copilot reviewed 29 out of 30 changed files in this pull request and generated 16 comments.
| File | Description |
|---|---|
| wave.py | Adds water pipeline integration and host function emission |
| water.py | Implements water binary discovery and lowering pipeline construction |
| compile_options.py | Adds use_water_pipeline flag |
| compile.py | Implements WaveKernelExecutionEngine for JIT execution |
| emitter.py | Adds emit_host_func to generate host wrapper functions |
| execution_engine/* | New C++ execution engine with Python bindings |
| buffer_utils.* | PyObject to native type conversion helpers |
| wave_hip_runtime.* | HIP kernel loading and launch runtime |
| GPUToGPURuntime.cpp | MLIR pass lowering GPU ops to runtime calls |
| setup.py | Build configuration for execution engine and water |
| tests/* | New tests and test utilities for water pipeline |
Pull request overview
Copilot reviewed 30 out of 31 changed files in this pull request and generated 8 comments.
Ready for review, I think. There are a few TODOs I will leave to follow-up PRs.
```cpp
std::string strVal = str.str();
strVal.append(std::string_view("\0", 1));
return LLVM::createGlobalString(loc, builder,
                                getUniqueLLVMGlobalName(mod, varName),
                                strVal, LLVM::Linkage::Internal);
```
Suggested change:
```diff
- std::string strVal = str.str();
- strVal.append(std::string_view("\0", 1));
+ Twine strVal = str + "\0";
  return LLVM::createGlobalString(loc, builder,
                                  getUniqueLLVMGlobalName(mod, varName),
-                                 strVal, LLVM::Linkage::Internal);
+                                 strVal.str(), LLVM::Linkage::Internal);
```
nit: use Twine instead of std::string_view
Didn't work, as it dropped the final `\0`.
```cpp
Value dataPtr = LLVM::createGlobalString(
    loc, builder, getUniqueLLVMGlobalName(mod, "kernel_data"), objData,
    LLVM::Linkage::Internal);
Value dataSize = createConst(objData.size(), i64Type);
```
Suggested change:
```diff
- Value dataSize = createConst(objData.size(), i64Type);
+ Value dataSize = LLVM::ConstantOp::create(builder, loc, i64Type, objData.size());
```
nit: maybe don't use a lambda for a one-liner.
It's used in 4 places, so I'd prefer to keep it.
ftynse left a comment:
Looks good in general, though I may have missed things due to the sheer size of the PR. It may have been better to send, e.g., the execution engine and the runtime separately, then the lowering pass and logic, then the overall connection (even then, each patch is ~1kLoC).
Overall comments:
- We need to update the documentation if we are going to refer to the MLIR base layer for Wave as Water; so far it has been a stealth-ish effort to build the dialect.
- We need to somehow ensure we have enough coverage on the Python/C++ boundary as well as around runtime linking. This is one of the places that are difficult to debug, so we'd rather be pedantic now than spend weeks printf-debugging later.
- Let's not depend on the entirety of upstream dialects and passes; this increases distribution sizes significantly for no reason.
```python
# TODO: we are linking water shared libs with static LLVM libraries,
# which doesn't really work if multiple of them linked with LLVM proper.
# shared_libs: ["ON", "OFF"]
```
The point of shared-library build was to catch missing transitively covered dependencies that are invisible in a one monolithic build. Otherwise, the only sane way of having multiple MLIR downstreams co-exist at Python level is to statically link everything into one DSO per Python extension. If we remove the IREE dependency entirely, we could then build and ship our own distribution with a single DSO instead of depending on IREE's.
```python
invoke_git(
    "clone", "https://github.com/llvm/llvm-project.git", ".", cwd=llvm_dir
)
```
Cloning LLVM takes a lot of time and space; we can limit that by making a shallow clone at the required commit, or even downloading a tarball of that commit.
Separately, consider factoring out the URL to a constant so it can be changed should we need to depend on a fork.
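A shallow fetch at a pinned commit could look roughly like the sketch below. The `LLVM_REPO_URL` constant follows the reviewer's factoring suggestion; `shallow_clone_commands` and `shallow_checkout` are hypothetical helpers for illustration, not the PR's actual setup.py code:

```python
import subprocess

# Factored out as suggested, so it can point at a fork if needed.
LLVM_REPO_URL = "https://github.com/llvm/llvm-project.git"


def shallow_clone_commands(commit: str) -> list[list[str]]:
    """Git invocations that fetch only a single pinned commit,
    instead of cloning the full LLVM history."""
    return [
        ["git", "init", "."],
        ["git", "remote", "add", "origin", LLVM_REPO_URL],
        # --depth 1 downloads just this commit's tree, not all of history.
        ["git", "fetch", "--depth", "1", "origin", commit],
        ["git", "checkout", "FETCH_HEAD"],
    ]


def shallow_checkout(commit: str, llvm_dir: str) -> None:
    for cmd in shallow_clone_commands(commit):
        subprocess.run(cmd, cwd=llvm_dir, check=True)
```

A tarball download (`.../archive/<commit>.tar.gz` on GitHub) avoids git entirely, at the cost of losing the ability to update in place.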
```python
CMakeExtension(
    "wave_execution_engine",
    "wave_lang/kernel/wave/execution_engine",
    install_dir="wave_lang/kernel/wave/execution_engine",
    need_llvm=True,
),
```
So the Wave EE is under the BUILD_WATER flag. Are we moving towards rebranding the MLIR layer of Wave as Water? If so, let's update the documentation.
```cpp
// Licensed under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
```
Is this adapted from somewhere (so I don't have to re-review the code)?
It was adapted from some of my personal code :)
wave_lang/kernel/wave/water.py
Outdated
```python
path = find_local_water_binary_path("water-opt")
if path:
    return path

try:
    from water_mlir import binaries as water_bin
except ImportError as err:
    raise RuntimeError(
        "optional water_mlir module not installed but its use is requested"
    ) from err
binary = water_bin.find_binary("water-opt")
return water_bin.find_binary("water-opt")
```
Why do we need two mechanisms?
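For illustration, the two lookup mechanisms in the snippet above (a locally built binary and the optional `water_mlir` package) amount to one ordered fallback chain. This sketch abstracts the lookups as callables; the structure mirrors the diff, but it is a hypothetical refactoring, not the PR's code:

```python
from typing import Callable, Optional


def find_water_opt(
    local_lookup: Callable[[str], Optional[str]],
    package_lookup: Callable[[str], str],
) -> str:
    """Prefer a locally built water-opt; otherwise fall back to the
    optional water_mlir package, raising if neither is available."""
    path = local_lookup("water-opt")
    if path:
        return path
    try:
        return package_lookup("water-opt")
    except ImportError as err:
        raise RuntimeError(
            "optional water_mlir module not installed but its use is requested"
        ) from err
```

Collapsing the lookups this way also removes the unused `binary =` assignment visible in the diff.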
Pull request overview
Copilot reviewed 30 out of 31 changed files in this pull request and generated 15 comments.
Comments suppressed due to low confidence (1)
wave_lang/kernel/wave/compile.py:337
- This comment appears to contain commented-out code.
```python
# def __del__(self):
#     """Clean up the loaded module when the kernel is destroyed."""
#     if self._module_handle is not None and self._engine is not None:
#         self._engine.release_module(self._module_handle)
```
Pull request overview
Copilot reviewed 30 out of 31 changed files in this pull request and generated 1 comment.
Signed-off-by: Ivan Butygin <ivan.butygin@gmail.com>
Pull request overview
Copilot reviewed 28 out of 28 changed files in this pull request and generated 2 comments.
Pull request overview
Copilot reviewed 28 out of 28 changed files in this pull request and generated 12 comments.
Comments suppressed due to low confidence (1)
wave_lang/kernel/wave/compile.py:337
- This comment appears to contain commented-out code.
```python
# def __del__(self):
#     """Clean up the loaded module when the kernel is destroyed."""
#     if self._module_handle is not None and self._engine is not None:
#         self._engine.release_module(self._module_handle)
```
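A common alternative to the commented-out `__del__` above is `weakref.finalize`, which runs cleanup reliably even during interpreter shutdown, where `__del__` ordering is fragile. The engine and handle names below mirror the flagged snippet but are assumptions about the actual Wave classes:

```python
import weakref


class WaveKernel:
    """Sketch: release the loaded module via weakref.finalize
    instead of __del__ (assumed engine/handle API)."""

    def __init__(self, engine, module_handle):
        self._engine = engine
        self._module_handle = module_handle
        # The finalizer must not capture self, or the object would
        # never become collectable; pass the engine and handle directly.
        self._finalizer = weakref.finalize(
            self, engine.release_module, module_handle
        )
```

The finalizer fires when the kernel is garbage-collected, can be invoked manually, and runs at most once.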
Extracted from #452. This pass expects IR with `gpu.launch_func` ops and converts it to wave runtime lib calls.
Add a new lowering pipeline which doesn't depend on IREE and uses only upstream dialects.
Design overview:
- `emitter.py`: `emit_host_func` generates a host wrapper on top of the gpu kernel func:
  - converts `PyObject*` inputs to the actual types
  - launches the kernel via `gpu.launch_func`
- `water_lowering_pipeline` constructs the lowering pipeline for the host/kernel funcs
- `GPUToGPURuntime.cpp`: lowering pass to run the gpu binary using our custom runtime. Upstream MLIR GPU lowering uses MLIR `runtime_wrappers`, which are still in a toy stage and not really suitable for production use, so we need a custom runtime.
- `execution_engine` includes 3 support libraries:
  - `execution_engine` itself, which includes conversion from MLIR to LLVM for the host code (and the host code only, as the gpu kernel was converted to binary earlier in the pipeline), running the LLVM opt pipeline, x86 codegen, and executing it in memory using the LLVM execution engine
  - `buffer_utils`, which provides conversion from Python to native types
  - `wave_hip_runtime`, which contains HIP binary load and kernel launch, similar to our existing `wave_runtime`
- `execution_engine.py`, a Python wrapper on top of the above runtimes
- `compile.py` creates the execution engine, loads the final module into it, and invokes the resulting native function using `ctypes`
- `setup.py` optionally builds the execution engine and `water-opt`

TODOs for follow-up PRs:
- `dump_intermediates` support
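The last compile.py step, calling a JIT-compiled host wrapper through `ctypes`, might be sketched as below. The address-lookup step and the wrapper taking one `PyObject*` per argument are assumptions drawn from the design overview, not the actual Wave API:

```python
import ctypes


def invoke_host_func(func_addr: int, args: list) -> None:
    """Call a JIT-compiled host wrapper at a raw address.

    func_addr is assumed to come from the execution engine's symbol
    lookup; the emitted wrapper is assumed to take one PyObject* per
    kernel argument and return nothing.
    """
    # Build a function prototype matching the host wrapper's signature:
    # void wrapper(PyObject*, PyObject*, ...).
    proto = ctypes.CFUNCTYPE(None, *([ctypes.py_object] * len(args)))
    fn = proto(func_addr)
    fn(*args)
```

Passing `ctypes.py_object` lets the native wrapper receive the interpreter's own object pointers, which is where the `buffer_utils` PyObject-to-native conversion would take over.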