
Commit 612732a

Add comments in response to feedback
1 parent 8b6207b commit 612732a

2 files changed: +12 -3 lines changed


flashinfer/gemm/gemm_base.py

Lines changed: 6 additions & 3 deletions
@@ -1993,12 +1993,15 @@ def _heuristic_func_mm_fp4(
     use_nvfp4: bool = True,
 ):
     r"""
-    Heuristic function for mm_fp4 backend selection. Routes to either cudnn or cutlass, but not trtllm.
+    Heuristic function for mm_fp4 backend selection. Routes to either cudnn or cutlass.
+    Note: trtllm is not considered in the backend selection because it requires a specific
+    input quantization (swizzling/shuffling) that differs from the preparation used
+    for cudnn and cutlass backends.
 
     Logic for which comes first:
     - If cuda version is 12 - use cutlass.
-    - If cuda version is 13 and cudnn version is less than 9.14 - use cutlass.
-    - If cuda version is 13 and cudnn version is 9.14 or greater - use cudnn.
+    - If cuda version is 13 and cudnn version is less than 9.15 - use cutlass.
+    - If cuda version is 13 and cudnn version is 9.15 or greater - use cudnn.
 
     """
     cuda_major, _ = get_cuda_version()
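
The ordering rule described in the updated docstring can be sketched as a small standalone function. This is only an illustration under stated assumptions, not the code in gemm_base.py: the name `order_fp4_backends`, its parameters, and the (major, minor) tuple form of the cuDNN version are hypothetical, and any case the docstring does not mention (for example a CUDA major version other than 12 or 13) is assumed here to prefer cutlass.

```python
from typing import List, Tuple


def order_fp4_backends(cuda_major: int, cudnn_version: Tuple[int, int]) -> List[str]:
    """Return candidate backends, most preferred first (hypothetical helper).

    trtllm is deliberately absent: per the docstring it needs a different
    input preparation (swizzling/shuffling) than cudnn and cutlass.
    """
    # CUDA 13 with cuDNN 9.15 or newer: prefer cudnn.
    if cuda_major == 13 and cudnn_version >= (9, 15):
        return ["cudnn", "cutlass"]
    # CUDA 12, or CUDA 13 with cuDNN older than 9.15: prefer cutlass.
    # Assumption: any other CUDA version also falls through to this branch.
    return ["cutlass", "cudnn"]
```

Comparing versions as integer tuples avoids the usual pitfall of float comparison, where 9.2 would sort above 9.15.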

flashinfer/utils.py

Lines changed: 6 additions & 0 deletions
@@ -921,6 +921,12 @@ def backend_requirement(
         backends. Should accept the same arguments as the decorated function and return
         True if requirements are met, False otherwise.
         In the case where the kernel function does not have any specific backends, this can be decorated with @supported_compute_capability to specify the function's supported compute capabilities.
+    heuristic_func : callable, optional
+        An optional function that performs heuristic backend selection when backend is "auto". Does not do anything if backend is not "auto".
+        Should accept the same arguments as the decorated function.
+        Should return an ordered list of runnable backends with the most preferred backend first.
+        When decorated function is not autotuned, the first backend in the heuristic list will be run.
+        When decorated function is autotuned, the backends in the heuristic list will be autotuned over to find the best backend.
 
     Returns
     -------