Conversation
Pull Request Overview
This PR refactors the clear_l2_cache function to improve L2 cache clearing for GPU benchmarking. The implementation is updated based on the Triton library's approach and optimized for newer GPU architectures like GB200.
Key changes:
- Added configurable `device` parameter with default value `'cuda'`
- Updated cache clearing strategy from allocating 32 MB and filling with value 42 to allocating 512 MB and zeroing
- Added comprehensive documentation explaining the rationale and hardware-specific considerations
    Adapted from triton.testing.do_bench.
    """
    cache_size = 512 * 1024 * 1024
    cache = torch.empty(int(cache_size // 4), dtype=torch.int32, device=device)
The magic number 4 in the division cache_size // 4 should be explained with a comment or replaced with a named constant. The division by 4 likely converts bytes to int32 elements (4 bytes per int32), but this is not immediately clear to readers.
Suggested change:
-    cache = torch.empty(int(cache_size // 4), dtype=torch.int32, device=device)
+    BYTES_PER_INT32 = 4  # Number of bytes in a 32-bit integer
+    cache = torch.empty(int(cache_size // BYTES_PER_INT32), dtype=torch.int32, device=device)
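The byte-to-element conversion behind this suggestion can be checked in isolation. The sketch below is illustrative only: `CACHE_SIZE_BYTES` and `int32_element_count` are hypothetical names that do not appear in the PR, and the code uses plain Python integers rather than `torch` so it runs without a GPU.

```python
# Hypothetical names for illustration; not part of the PR under review.
CACHE_SIZE_BYTES = 512 * 1024 * 1024  # buffer size in bytes, as in the diff
BYTES_PER_INT32 = 4                   # one torch.int32 element occupies 4 bytes


def int32_element_count(size_bytes: int, bytes_per_element: int = BYTES_PER_INT32) -> int:
    """Return how many fixed-width elements fill a buffer of size_bytes exactly."""
    if size_bytes % bytes_per_element != 0:
        raise ValueError("buffer size must be a multiple of the element size")
    return size_bytes // bytes_per_element


num_elements = int32_element_count(CACHE_SIZE_BYTES)
print(num_elements)  # prints 134217728
```

Because `//` on two Python integers already returns an integer, the `int(...)` wrapper in the reviewed line is redundant whichever constant name is used.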
    Clears GPU L2 cache by allocating and zeroing a buffer.

    GB200 has 126 MB L2 cache. Using 512 MB (4x buffer).
    See: https://docs.nvidia.com/cuda/blackwell-tuning-guide/
The URL in the documentation appears to be incomplete or generic. The link should point to a specific section of the Blackwell tuning guide that discusses L2 cache specifications, if available, to help readers verify the 126 MB L2 cache claim.
Suggested change:
-    See: https://docs.nvidia.com/cuda/blackwell-tuning-guide/
+    See: https://docs.nvidia.com/cuda/blackwell-tuning-guide/index.html#l2-cache
+    # Section: "L2 Cache" in the Blackwell tuning guide.
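The "4x buffer" figure in the docstring can be sanity-checked with simple arithmetic. This sketch takes the 126 MB L2 size from the docstring as given, without independent verification, and treats both sizes as being in the same unit. The names `L2_CACHE_MB`, `BUFFER_MB`, and `buffer_ratio` are hypothetical and not part of the PR.

```python
# Figures copied from the docstring under review; the 126 MB value is not verified here.
L2_CACHE_MB = 126  # stated GB200 L2 cache size
BUFFER_MB = 512    # buffer allocated by clear_l2_cache


def buffer_ratio(buffer_mb: float, l2_mb: float) -> float:
    """Return how many times larger the flush buffer is than the L2 cache."""
    if l2_mb <= 0:
        raise ValueError("L2 cache size must be positive")
    return buffer_mb / l2_mb


ratio = buffer_ratio(BUFFER_MB, L2_CACHE_MB)
print(f"{ratio:.2f}x")  # prints 4.06x
```

A ratio just above 4 matches the docstring's "4x" claim, so the comment is accurate provided the stated L2 size is correct.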