
Add triton inspired l2 cache clearing #370

Closed
msaroufim wants to merge 1 commit into main from l2_cache_clear

Conversation

@msaroufim
Member

No description provided.

Copilot AI review requested due to automatic review settings November 8, 2025 23:56
@github-actions

github-actions bot commented Nov 8, 2025

Coverage report

This PR does not seem to contain any modification to coverable code.

Contributor

Copilot AI left a comment


Pull Request Overview

This PR refactors the clear_l2_cache function to improve L2 cache clearing for GPU benchmarking. The implementation is updated to follow the Triton library's approach and sized for newer GPU architectures such as GB200.

Key changes:

  • Added a configurable device parameter with a default value of 'cuda'
  • Updated the cache-clearing strategy from allocating 32 MB and filling it with the value 42 to allocating 512 MB and zeroing it
  • Added comprehensive documentation explaining the rationale and hardware-specific considerations
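
Taken together with the diff excerpts quoted below, these changes suggest the following shape for the refactored function. This is a minimal sketch assembled from those fragments, not the PR's exact code; in particular, the final zero_() call is an assumption, since only the docstring and the allocation appear in the excerpts (Triton's do_bench zeroes its scrub buffer the same way).

    import torch

    def clear_l2_cache(device: str = "cuda") -> None:
        """Clears GPU L2 cache by allocating and zeroing a buffer.

        GB200 has 126 MB L2 cache. Using 512 MB (4x buffer).
        Adapted from triton.testing.do_bench.
        """
        cache_size = 512 * 1024 * 1024  # buffer size in bytes
        # int32 elements are 4 bytes each, so divide the byte count by 4
        cache = torch.empty(int(cache_size // 4), dtype=torch.int32, device=device)
        cache.zero_()  # assumed zeroing step; not shown in the quoted excerpts

In a benchmark harness this would be called between timed iterations so that each run starts from a cold L2, which is how triton.testing.do_bench uses its own scrub buffer.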


    Adapted from triton.testing.do_bench.
    """
    cache_size = 512 * 1024 * 1024
    cache = torch.empty(int(cache_size // 4), dtype=torch.int32, device=device)

Copilot AI Nov 8, 2025


The magic number 4 in the division cache_size // 4 should be explained with a comment or replaced with a named constant. The division by 4 likely converts bytes to int32 elements (4 bytes per int32), but this is not immediately clear to readers.

Suggested change
    cache = torch.empty(int(cache_size // 4), dtype=torch.int32, device=device)
    BYTES_PER_INT32 = 4  # Number of bytes in a 32-bit integer
    cache = torch.empty(int(cache_size // BYTES_PER_INT32), dtype=torch.int32, device=device)
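
A hypothetical alternative to a hand-written constant is to derive the element size from the dtype itself; torch.dtype.itemsize is available in recent PyTorch releases (2.1+). This variant is illustrative, not part of the PR or of the suggestion above:

    import torch

    # Derive bytes-per-element from the dtype instead of hardcoding 4.
    cache_size = 512 * 1024 * 1024  # bytes
    dtype = torch.int32
    cache = torch.empty(cache_size // dtype.itemsize, dtype=dtype, device="cuda")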

    Clears GPU L2 cache by allocating and zeroing a buffer.

    GB200 has 126 MB L2 cache. Using 512 MB (4x buffer).
    See: https://docs.nvidia.com/cuda/blackwell-tuning-guide/

Copilot AI Nov 8, 2025


The URL in the documentation appears to be incomplete or generic. The link should point to a specific section of the Blackwell tuning guide that discusses L2 cache specifications, if available, to help readers verify the 126 MB L2 cache claim.

Suggested change
    See: https://docs.nvidia.com/cuda/blackwell-tuning-guide/
    See: https://docs.nvidia.com/cuda/blackwell-tuning-guide/index.html#l2-cache
    # Section: "L2 Cache" in the Blackwell tuning guide.
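
Rather than hardcoding the 512 MB figure, one could also size the buffer from the device's reported L2 capacity. This is a hypothetical variant, not code from the PR; it assumes the L2_cache_size field that recent PyTorch builds expose on the CUDA device properties:

    import torch

    props = torch.cuda.get_device_properties(0)
    l2_bytes = props.L2_cache_size  # ~126 MB on GB200 per the tuning guide
    cache_size = 4 * l2_bytes       # keep the PR's roughly 4x headroom
    # torch.zeros allocates and zeroes in one step (int32 = 4 bytes/element)
    cache = torch.zeros(cache_size // 4, dtype=torch.int32, device="cuda")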

@msaroufim msaroufim closed this Nov 9, 2025