@ryanswann-amd (Collaborator)
Motivation

Add epilogue function support to the tritonBLAS persistent GEMM kernels, enabling users to apply element-wise operations (e.g., activation functions) directly to the output accumulator. Fusing these operations into the GEMM kernel improves performance by eliminating the separate kernel launches otherwise required for common post-GEMM operations.

Technical Details

Core Changes:

  • Added epilogue.py module with built-in activation functions (ReLU, GELU, SiLU, Sigmoid, Tanh, Leaky ReLU, Identity)
  • Modified persistent_gemm.py to accept an optional epilogue_fn parameter (default: None)
  • The epilogue is applied after the GEMM, scales, and bias, but before conversion to the output dtype (see the sketch after this list)
  • When epilogue_fn=None, the Triton JIT compiler optimizes the branch away entirely (zero overhead)
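
A minimal sketch of where the epilogue is applied, assuming the kernel receives the function as a compile-time constant (the helper name and arguments here are illustrative, not the actual kernel source):

```python
import triton
import triton.language as tl

# Illustrative sketch only; the real logic lives inside persistent_gemm.py.
# Because epilogue_fn arrives as a compile-time constant, the None check is
# folded away by the JIT, so epilogue_fn=None adds zero overhead.
@triton.jit
def _apply_epilogue_and_store(acc, c_ptr, offs, mask, epilogue_fn: tl.constexpr):
    if epilogue_fn is not None:
        acc = epilogue_fn(acc)              # element-wise op on the fp32 tile
    c = acc.to(c_ptr.dtype.element_ty)      # output-dtype cast happens after
    tl.store(c_ptr + offs, c, mask=mask)
```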

Files Modified:

  • include/tritonblas/kernels/stages/algorithms/epilogue.py (new)
  • include/tritonblas/kernels/stages/algorithms/__init__.py
  • include/tritonblas/kernels/persistent_gemm.py
  • tests/test_epilogues.py (new)
  • examples/example_matmul_epilogue.py (new)
  • docs/EPILOGUES.md (new)

Key Features:

  • Numerically stable implementations (tanh/GELU are computed in forms that avoid overflow)
  • Custom epilogue functions are easy to define with @triton.jit (see the sketches after this list)
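
For illustration, here is a numerically stable tanh in the spirit of the first point, plus a hypothetical custom epilogue like the clamp used in examples/example_matmul_epilogue.py (the signatures in the actual module may differ):

```python
import triton
import triton.language as tl

# Sketch of a numerically stable tanh: computing exp(-2|x|) keeps the
# exponential in (0, 1] so it never overflows; the sign is restored at the end.
@triton.jit
def stable_tanh(x):
    e = tl.exp(-2.0 * tl.abs(x))
    t = (1.0 - e) / (1.0 + e)
    return tl.where(x >= 0, t, -t)

# Hypothetical custom epilogue: clamp the accumulator to [-1.0, 1.0].
@triton.jit
def clamp_epilogue(acc):
    return tl.minimum(tl.maximum(acc, -1.0), 1.0)
```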

Test Plan

  • Created a comprehensive pytest suite (tests/test_epilogues.py) with parametrized tests (one case is sketched below)
  • Tests all built-in epilogue functions (ReLU, GELU, SiLU, Sigmoid, Tanh, Leaky ReLU, Identity)
  • Tests epilogue with bias addition
  • Tests epilogue_fn=None (no epilogue)
  • Validates against PyTorch reference implementations
  • Multiple problem sizes tested (256×256, 512×512, 128×256×512)
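
The tests take roughly this shape; the tritonblas entry point and epilogue name below are assumptions for illustration, and the real suite in tests/test_epilogues.py may differ:

```python
import pytest
import torch
import tritonblas  # the matmul/relu names below are assumptions, not verified API

@pytest.mark.parametrize("m, n, k", [(256, 256, 256), (512, 512, 512), (128, 256, 512)])
def test_relu_epilogue(m, n, k):
    a = torch.randn((m, k), device="cuda", dtype=torch.float16)
    b = torch.randn((k, n), device="cuda", dtype=torch.float16)
    ref = torch.relu(a @ b)                                     # PyTorch reference
    out = tritonblas.matmul(a, b, epilogue_fn=tritonblas.relu)  # assumed names
    torch.testing.assert_close(out, ref, rtol=1e-2, atol=1e-2)
```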

Test Result

All tests pass with expected fp16 precision tolerance (rtol=1e-2, atol=1e-2):

  • ✓ Identity, ReLU, GELU, SiLU, Tanh, Sigmoid, Leaky ReLU epilogues
  • ✓ Epilogue with bias
  • ✓ No epilogue (None)
  • ✓ Example's custom clamp epilogue matches the PyTorch reference exactly

Copilot AI review requested due to automatic review settings, January 21, 2026 20:39

Copilot AI (Contributor) left a comment:

Pull request overview

This PR adds support for epilogue functions to the tritonBLAS persistent GEMM kernel, enabling fused element-wise operations on the output accumulator for improved performance through kernel fusion.

Changes:

  • Introduced a new epilogue.py module with built-in activation functions (ReLU, GELU, SiLU, Sigmoid, Tanh, Leaky ReLU, Identity)
  • Modified persistent_gemm.py to accept an optional epilogue_fn parameter
  • Added comprehensive test coverage and example demonstrating custom epilogue usage

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| include/tritonblas/kernels/stages/algorithms/epilogue.py | New module implementing built-in activation functions as JIT-compiled epilogue operations |
| include/tritonblas/kernels/stages/algorithms/__init__.py | Exports the epilogue functions from the new module |
| include/tritonblas/kernels/persistent_gemm.py | Adds the optional epilogue_fn parameter and applies it to the accumulator before type conversion |
| tests/test_epilogues.py | Comprehensive test suite validating all epilogue functions against PyTorch references |
| examples/example_matmul_epilogue.py | Example demonstrating custom epilogue function creation and usage |
