Add simple user epilogues to persistent gemm kernel #53
Motivation
Add epilogue function support to tritonBLAS persistent GEMM kernels, enabling users to apply element-wise operations (e.g., activation functions) directly to the output accumulator. This provides better performance through kernel fusion by eliminating separate kernel launches for common post-GEMM operations.
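As a rough illustration of what "applying an epilogue to the output accumulator" means, here is a minimal pure-Python sketch. It is not the tritonBLAS kernel: `gemm_with_epilogue` and `relu` are hypothetical names for this example, and only the optional `epilogue_fn` argument (default `None`) mirrors the parameter described in this PR. The point is that the element-wise function runs on each accumulated value before it is stored, so no second pass over the output is needed.

```python
def relu(x):
    """Element-wise ReLU on a single accumulator value."""
    return x if x > 0.0 else 0.0


def gemm_with_epilogue(a, b, epilogue_fn=None):
    """Naive reference C = epilogue_fn(A @ B) on lists of lists.

    Hypothetical helper for illustration only; not part of tritonBLAS.
    """
    m, k, n = len(a), len(b), len(b[0])
    c = [[0.0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            acc = 0.0
            for p in range(k):
                acc += a[i][p] * b[p][j]
            # Fused epilogue: applied to the accumulator before the store.
            if epilogue_fn is not None:
                acc = epilogue_fn(acc)
            c[i][j] = acc
    return c
```

Calling `gemm_with_epilogue(a, b)` gives the plain product, while `gemm_with_epilogue(a, b, epilogue_fn=relu)` clamps negative outputs to zero in the same loop that computes them.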
Technical Details
Core Changes:
- New `epilogue.py` module with built-in activation functions (ReLU, GELU, SiLU, Sigmoid, Tanh, Leaky ReLU, Identity)
- `persistent_gemm.py` updated to accept an optional `epilogue_fn` parameter (default: `None`)
- With `epilogue_fn=None`, the Triton JIT compiler optimizes it out (zero overhead)
Files Modified:
- `include/tritonblas/kernels/stages/algorithms/epilogue.py` (new)
- `include/tritonblas/kernels/stages/algorithms/__init__.py`
- `include/tritonblas/kernels/persistent_gemm.py`
- `tests/test_epilogues.py` (new)
- `examples/example_matmul_epilogue.py` (new)
- `docs/EPILOGUES.md` (new)
Key Features:
- `@triton.jit`
Test Plan
- `tests/test_epilogues.py` with parametrized tests
- `epilogue_fn=None` (no epilogue)
Test Result
All tests pass with expected fp16 precision tolerance (rtol=1e-2, atol=1e-2).
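For readers unfamiliar with how `rtol` and `atol` combine, the sketch below shows one common convention, the one used by `numpy.allclose`: a value passes when `|actual - expected| <= atol + rtol * |expected|`. Whether the tests in this PR use exactly this rule is an assumption, and `within_tolerance` is a hypothetical helper written for this example.

```python
def within_tolerance(actual, expected, rtol=1e-2, atol=1e-2):
    """Return True if every pair satisfies |a - e| <= atol + rtol * |e|.

    Hypothetical helper; mirrors the numpy.allclose convention.
    """
    if len(actual) != len(expected):
        raise ValueError("actual and expected must have the same length")
    return all(
        abs(a - e) <= atol + rtol * abs(e)
        for a, e in zip(actual, expected)
    )
```

With these defaults the allowed error grows with the magnitude of the expected value, which suits fp16 outputs whose absolute error scales with the size of the result.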
Submission Checklist