[Bug]: test_random fails on AMD GPU #1682

Open
ClaudiaComito opened this issue Oct 16, 2024 · 0 comments · Fixed by #1677
Labels
bug (Something isn't working), HW:ROCm, MPI (Anything related to MPI communication)

Comments

@ClaudiaComito
Contributor

What happened?

Our tests on the AMD-ROCm runner have been failing at test_random in the 2-process GPU tests.

The failure occurs in one of the many dndarray.numpy() calls, which in turn call Allgather or Allgatherv.
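
For readers unfamiliar with the collective involved, here is a minimal single-process sketch of what an Allgatherv does when every process contributes a chunk of possibly different length. It is a plain-Python simulation, not the library's code and not a reproducer for the failure; the function name `simulate_allgatherv` and the example chunks are hypothetical and chosen only for illustration.

```python
# Hypothetical single-process simulation of the Allgatherv data movement.
# This is NOT the library's implementation and does not use MPI or a GPU.
def simulate_allgatherv(local_chunks):
    """Return what every rank would hold after an Allgatherv.

    local_chunks[r] is the list of elements owned by rank r.
    """
    # Number of elements each rank contributes.
    counts = [len(chunk) for chunk in local_chunks]
    # Offset of each rank's block in the receive buffer (exclusive prefix sum).
    displs = [sum(counts[:r]) for r in range(len(counts))]

    recvbuf = [None] * sum(counts)
    for rank, chunk in enumerate(local_chunks):
        recvbuf[displs[rank]:displs[rank] + counts[rank]] = chunk

    # After the collective, every rank holds an identical full copy.
    gathered = [list(recvbuf) for _ in local_chunks]
    return gathered, counts, displs


# Two "processes" holding uneven parts of a 5-element array.
gathered, counts, displs = simulate_allgatherv([[0, 1, 2], [3, 4]])
```

With equal chunk sizes on all processes the same result can be obtained with a plain Allgather, which needs no counts or displacements.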

Code snippet triggering the error

No response

Error message or erroneous outcome

No response

Version

main (development branch)

Python version

None

PyTorch version

None

MPI version

No response

@ClaudiaComito added the bug, MPI, and HW:ROCm labels Oct 16, 2024
@mtar mtar reopened this Oct 28, 2024