@haykh (Collaborator) commented Sep 11, 2025

There is a persistent bug where random number generation fails with a memory allocation error when starting runs on 700+ GPUs on Aurora (similar behavior has also been seen on Frontier). See #132.

@haykh added the bug (Something isn't working) and patch (Patched version of stable release) labels on Sep 11, 2025
@haykh linked an issue on Sep 11, 2025 that may be closed by this pull request
Development

Successfully merging this pull request may close these issues.

[BUG] Memory allocation error from Kokkos random number generator
2 participants