Reverse sort matmul benchmarks by memory use to avoid fragmentation #4449
jacobhinkle merged 7 commits into main from
Review updated until commit be28037.
!build

I believe you switched to custom memory checks for matmul since
benchmarks/python/test_matmul.py
    if a < b:
        return -1
    elif a > b:
        return 1
    return 0
I think this condition can be rewritten as: return (a > b) - (a < b)
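A minimal sketch to check the reviewer's suggestion: the explicit three-way comparator from the diff and the one-liner return the same value for every pair. The names compare_verbose and compare_compact are hypothetical, and feeding the comparator to sorted() through functools.cmp_to_key is an assumption about how such a function would be used, not necessarily what test_matmul.py does.

```python
from functools import cmp_to_key


def compare_verbose(a, b):
    # Explicit three-way comparison, as written in the diff.
    if a < b:
        return -1
    elif a > b:
        return 1
    return 0


def compare_compact(a, b):
    # Reviewer's one-liner: bool is a subclass of int, so this is -1, 0 or 1.
    return (a > b) - (a < b)


values = [3, 1, 2, 2]
# reverse=True turns the ascending cmp-based order into descending order.
desc = sorted(values, key=cmp_to_key(compare_compact), reverse=True)
print(desc)  # [3, 2, 2, 1]
```

When the ordering depends on a single numeric quantity, passing that quantity directly as key= is usually simpler than going through a cmp-style function.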
    return rows
def maybe_skip_oom_case(m: int, n: int, k: int):
Should we also clear allocated memory here?
Since we reverse sort the benchmarks by required memory, we shouldn't need to clear the allocated memory: each subsequent test needs no more memory than the one before it, so it will fit into the already-allocated memory pool.
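The reasoning in this reply can be sketched as follows: order the (m, n, k) cases by estimated memory, largest first, so that a caching allocator (such as PyTorch's) grows its pool to the peak on the first case and later, smaller cases reuse the cached blocks. This sketch assumes a hypothetical required_bytes estimate covering A (m x k), B (k x n) and the output (m x n) in a 2-byte dtype; the PR's actual estimate may differ.

```python
def required_bytes(m: int, n: int, k: int, elem_size: int = 2) -> int:
    # Hypothetical estimate: A (m x k) + B (k x n) + output (m x n),
    # assuming 2-byte elements (e.g. fp16/bf16).
    return elem_size * (m * k + k * n + m * n)


configs = [(1024, 1024, 1024), (8192, 8192, 8192), (4096, 256, 4096)]

# Largest footprint first: the pool peaks once, then smaller cases reuse it.
ordered = sorted(configs, key=lambda mnk: required_bytes(*mnk), reverse=True)
print(ordered)  # [(8192, 8192, 8192), (4096, 256, 4096), (1024, 1024, 1024)]
```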
!build
Before this PR, memory use on H200 grew slowly until it reached 126 GiB of the 128 GiB capacity.
After this PR, memory use on H200 never exceeds about 66 GiB.
Also, we previously skipped any benchmark needing more than 20 GiB; this PR raises that cutoff to 90% of GPU capacity.
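The effect of the new cutoff is easiest to see with numbers. A minimal sketch, assuming a hypothetical required_bytes estimate (A, B and output in a 2-byte dtype) and a hypothetical should_skip helper standing in for maybe_skip_oom_case; a real version would presumably call pytest.skip instead of returning a bool. With 128 GiB of capacity, 90% is 115.2 GiB, so a 65536^3 case needing 24 GiB was excluded by the old 20 GiB cutoff but is now kept.

```python
GiB = 1024 ** 3


def required_bytes(m: int, n: int, k: int, elem_size: int = 2) -> int:
    # Hypothetical estimate: A (m x k), B (k x n) and output (m x n), 2-byte elements.
    return elem_size * (m * k + k * n + m * n)


def should_skip(m: int, n: int, k: int, capacity_bytes: int, fraction: float = 0.9) -> bool:
    # Skip a case whose estimated footprint exceeds `fraction` of device memory.
    return required_bytes(m, n, k) > fraction * capacity_bytes


# On a real GPU the capacity could come from
# torch.cuda.get_device_properties(0).total_memory; it is fixed here at the
# 128 GiB figure quoted for H200 so the sketch runs without a GPU.
capacity = 128 * GiB
old_cutoff = 20 * GiB

need = required_bytes(65536, 65536, 65536)
print(need / GiB)  # 24.0
print(need > old_cutoff)  # True: excluded under the old 20 GiB cutoff
print(should_skip(65536, 65536, 65536, capacity))  # False: 24 GiB < 115.2 GiB
```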