Reverse sort matmul benchmarks by memory use to avoid fragmentation by jacobhinkle · Pull Request #4449 · NVIDIA/Fuser

jacobhinkle · 2025-05-14T13:01:03Z

Before this PR, memory use on H200 grows slowly until it reaches 126 GiB of the 128 GiB capacity.

After this PR memory use on H200 never goes above about 66 GiB.

Also, we previously had a 20 GiB cutoff for included benchmarks, but this PR changes the cutoff to 90% of GPU capacity.

github-actions · 2025-05-14T13:01:44Z

Review updated until commit be28037

Description

Reverse sort matmul benchmarks by memory use
Update OOM cutoff to 90% of GPU capacity
Introduce maybe_skip_oom_case utility
Add docstrings and utility functions

Changes walkthrough 📝

Relevant files

Enhancement

test_matmul.py: `Enhance matmul benchmark memory management` (benchmarks/python/test_matmul.py, +42/-5)
Import `functools` for comparison utilities
Add `row_mem` function to compute memory usage
Add `three_way_cmp` and `mem_cmp` functions for sorting
Reverse sort matmul problems by memory use
Introduce `maybe_skip_oom_case` to skip OOM cases
Update OOM cutoff to 90% of GPU capacity
Use `maybe_skip_oom_case` in test functions

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

🧪 PR contains tests

⚡ Recommended focus areas for review

Memory Calculation

The memory calculation in `row_mem` and `maybe_skip_oom_case` assumes half-precision (FP16) data types. Ensure that this assumption is correct for all use cases and that the memory calculation is accurate.

        def row_mem(row):
            """Compute gmem bytes used in a half-precision GEMM

            This computes only the space required for operands and output,
            ignoring intermediates like split-K buffers and assuming no bias
            terms.
            """
            m, n, k, _ = row
            return ((m + n) * k + m * n) * 2

        def three_way_cmp(a, b) -> int:
            """Perform a three-way comparison like the Python2 cmp() function

            This returns 0 if a == b, -1 if a < b, and 1 if a > b
            """
            return int(a > b) - int(a < b)

        def mem_cmp(row1, row2):
            """Compare two rows based on memory use"""
            return three_way_cmp(row_mem(row1), row_mem(row2))

        # Reverse sort by expected memory use to avoid fragmentation
        rows.sort(key=functools.cmp_to_key(mem_cmp), reverse=True)

        return rows


def maybe_skip_oom_case(m: int, n: int, k: int):
    expected_mem = (m * k + n * k + m * n) * 2  # operands plus output
    expected_mem *= 2  # account for multiple runs/deferred frees

    _, total = torch.cuda.mem_get_info()
    max_mem = total * 0.9
    if expected_mem > max_mem:
        pytest.skip(
            f"Case takes more than {max_mem / (2 ** 30): .2f} GiB. Skipping to avoid OOM"
        )
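To make the estimate concrete, here is a minimal sketch that mirrors the arithmetic of `maybe_skip_oom_case` without needing a GPU. The helper names `expected_gemm_bytes` and `would_skip`, the `total_bytes` parameter, and the example shapes are hypothetical and not part of the PR; the real function reads capacity from `torch.cuda.mem_get_info()` and calls `pytest.skip`.

```python
GIB = 2**30


def expected_gemm_bytes(m: int, n: int, k: int) -> int:
    """Bytes for two half-precision operands plus the output (2 bytes each)."""
    return (m * k + n * k + m * n) * 2


def would_skip(m: int, n: int, k: int, total_bytes: int) -> bool:
    """Mirror the PR's check: double the estimate, compare to 90% of capacity."""
    expected = expected_gemm_bytes(m, n, k) * 2  # multiple runs/deferred frees
    return expected > total_bytes * 0.9


total = 128 * GIB  # assumed capacity, for illustration only
small = (8192, 8192, 8192)
large = (131072, 131072, 131072)

print(expected_gemm_bytes(*small) / GIB)  # 0.375
print(expected_gemm_bytes(*large) / GIB)  # 96.0
print(would_skip(*small, total), would_skip(*large, total))  # False True
```

Note that the large case would fit under the 90% cutoff on its own (96 GiB against 115.2 GiB) and is skipped only because the estimate is doubled.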

Three-Way Comparison

The `three_way_cmp` function is a reimplementation of the Python 2 `cmp` function. Consider using Python's built-in comparison operators or `functools.cmp_to_key` directly for simplicity and readability.

def three_way_cmp(a, b) -> int:
    """Perform a three-way comparison like the Python2 cmp() function

    This returns 0 if a == b, -1 if a < b, and 1 if a > b
    """
    return int(a > b) - int(a < b)
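One way to read this suggestion is that a plain key function gives the same ordering as the comparator route. The sketch below copies `row_mem`, `three_way_cmp`, and `mem_cmp` from the PR and compares both sorts on hypothetical rows; treating the fourth field as a layout string is an assumption, since the PR only shows that it is ignored.

```python
import functools


def row_mem(row):
    m, n, k, _ = row
    return ((m + n) * k + m * n) * 2


def three_way_cmp(a, b) -> int:
    return int(a > b) - int(a < b)


def mem_cmp(row1, row2):
    return three_way_cmp(row_mem(row1), row_mem(row2))


# Hypothetical rows; the last field is assumed to be a layout tag.
rows = [
    (1024, 1024, 1024, "NN"),
    (4096, 4096, 512, "TN"),
    (512, 512, 8192, "NT"),
    (1024, 1024, 1024, "TT"),
]

# Comparator-based sort, as in the PR.
by_cmp = sorted(rows, key=functools.cmp_to_key(mem_cmp), reverse=True)
# Key-based sort: compares the integer byte counts directly.
by_key = sorted(rows, key=row_mem, reverse=True)

assert by_cmp == by_key
print([r[3] for r in by_key])  # ['TN', 'NT', 'NN', 'TT']
```

Both sorts are stable, so rows with equal memory use keep their original relative order either way.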

Memory Threshold

The memory threshold is set to 90% of GPU capacity. Verify that this threshold is appropriate for all target hardware and that it balances performance and resource utilization effectively.

max_mem = total * 0.9
if expected_mem > max_mem:

jacobhinkle · 2025-05-14T13:01:45Z

!build

benchmarks/python/test_matmul.py

Priya2698 · 2025-05-14T19:21:13Z

I believe you switched to custom memory checks for matmul because `retry_on_oom` resulted in errors about the benchmark fixture being used twice?

Priya2698 · 2025-05-19T20:12:27Z

benchmarks/python/test_matmul.py

+            if a < b:
+                return -1
+            elif a > b:
+                return 1
+            return 0


I think this condition can be rewritten as `return (a > b) - (a < b)`.
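A quick check that the two forms agree, as a minimal sketch. The function names `cmp_branches` and `cmp_compact` are hypothetical and used only for this comparison.

```python
def cmp_branches(a, b) -> int:
    """The branching form quoted in the diff above."""
    if a < b:
        return -1
    elif a > b:
        return 1
    return 0


def cmp_compact(a, b) -> int:
    """The suggested form; bool is a subclass of int, so the result is an int."""
    return (a > b) - (a < b)


pairs = [(1, 2), (2, 1), (3, 3), (0.5, 0.25)]
assert all(cmp_branches(a, b) == cmp_compact(a, b) for a, b in pairs)
print([cmp_compact(a, b) for a, b in pairs])  # [-1, 1, 0, 1]
```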

Priya2698 · 2025-05-19T20:13:10Z

benchmarks/python/test_matmul.py

+        return rows
+
+
+def maybe_skip_oom_case(m: int, n: int, k: int):


Should we also clear allocated memory here?

Since we reverse sort the benchmarks by required memory, we shouldn't need to clear the allocated memory, since subsequent tests will fit into the already-allocated memory pool.
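The reasoning in this reply can be illustrated with a toy model. The sketch below is not PyTorch's caching allocator: it assumes a pool that keeps freed blocks, reuses a block only when it is at least as large as the request, and never splits, merges, or releases blocks. The function name `peak_reserved` and the sizes are hypothetical.

```python
def peak_reserved(request_sizes):
    """Total memory a toy pool ends up holding for a sequence of cases.

    Each case requests one block, runs, then frees it back to the pool.
    """
    cached = []  # sizes of free blocks currently held by the pool
    reserved = 0  # total memory obtained from the device so far
    for size in request_sizes:
        fits = [block for block in cached if block >= size]
        if fits:
            block = min(fits)  # reuse the smallest block that fits
            cached.remove(block)
        else:
            block = size  # nothing fits: grow the pool
            reserved += size
        cached.append(block)  # case finished, block returns to the pool
    return reserved


sizes = [1, 2, 4, 8]  # arbitrary units
print(peak_reserved(sorted(sizes)))                # 15
print(peak_reserved(sorted(sizes, reverse=True)))  # 8
```

In this simplified model, running the largest case first lets every later case reuse the same block, while ascending order forces a new allocation each time.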

Priya2698

LGTM

jacobhinkle · 2025-05-20T17:40:34Z

!build

Reverse sort matmul benchmarks by memory use to avoid fragmentation

ffc15de

jacobhinkle requested a review from Priya2698 May 14, 2025 13:01

Update OOM cutoff to allow cases up to 90% of capacity

d8952f5

Priya2698 reviewed May 14, 2025

Priya2698 reviewed May 14, 2025

jacobhinkle added 3 commits May 19, 2025 12:06

Merge remote-tracking branch 'origin/main' into jh/sort_benchmark_pro…

9383144

…blems

Introduce maybe_skip_oom_case utility

ab3ac08

Add doc strings, three_way_cmp function. Ignore layout

fedc4e4

jacobhinkle requested a review from Priya2698 May 19, 2025 16:18

Priya2698 reviewed May 19, 2025

jacobhinkle added 2 commits May 20, 2025 09:32

Use more compact formula

f106613

Merge remote-tracking branch 'origin/main' into jh/sort_benchmark_pro…

be28037

…blems

Priya2698 approved these changes May 20, 2025

jacobhinkle merged commit c1d8a3c into main May 20, 2025
15 of 16 checks passed

jacobhinkle deleted the jh/sort_benchmark_problems branch May 20, 2025 17:59

jacobhinkle merged 7 commits into main from jh/sort_benchmark_problems
