[BUG] hash_aggregate_test.py::test_hash_multiple_grpby_pivot and row_conversion_test.py::test_row_conversions_fixed_width_wide fails on GB100 and cuda12.8 #12231

Open
yinqingh opened this issue Feb 26, 2025 · 1 comment
Labels
? - Needs Triage (Need team to review and classify), bug (Something isn't working)

Comments

@yinqingh
Collaborator

yinqingh commented Feb 26, 2025

Describe the bug
Observed the following similar failures in the tests on GB100 with CUDA 12.8:

  • hash_aggregate_test.py::test_hash_multiple_grpby_pivot
Exception in task 0.0 in stage 35.0 (TID 90) ai.rapids.cudf.CudfException: CUDA error at: build-spark-rapids-jni-cuda12/target/libcudf/cmake-build/_deps/cuco-src/include/cuco/detail/open_addressing/open_addressing_impl.cuh798: cudaErrorInvalidDevice invalid device ordinal
...
Exception in task 0.0 in stage 39.0 (TID 95) ai.rapids.cudf.CudfException: parallel_for failed: cudaErrorInvalidDevice: invalid device ordinal
  • row_conversion_test.py::test_row_conversions_fixed_width_wide
ai.rapids.cudf.CudfException: parallel_for failed: cudaErrorInvalidDevice: invalid device ordinal
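
All three messages above appear to carry the same underlying CUDA error, even though they surface through different code paths (a cuco call site and Thrust's `parallel_for`). A minimal sketch for checking that across log lines, assuming the message shapes shown in this report; the helper name `extract_cuda_error` and the regular expression are illustrative and not part of spark-rapids or cudf:

```python
import re
from typing import Optional, Tuple

# Exception messages copied verbatim from the failures listed above.
MESSAGES = [
    "ai.rapids.cudf.CudfException: CUDA error at: build-spark-rapids-jni-cuda12/"
    "target/libcudf/cmake-build/_deps/cuco-src/include/cuco/detail/"
    "open_addressing/open_addressing_impl.cuh798: "
    "cudaErrorInvalidDevice invalid device ordinal",
    "ai.rapids.cudf.CudfException: parallel_for failed: "
    "cudaErrorInvalidDevice: invalid device ordinal",
    "ai.rapids.cudf.CudfException: parallel_for failed: "
    "cudaErrorInvalidDevice: invalid device ordinal",
]

# Hypothetical pattern: a cudaError* name, an optional colon, then the description.
CUDA_ERROR_RE = re.compile(r"\b(cudaError\w+):?\s+(.*)$")


def extract_cuda_error(message: str) -> Optional[Tuple[str, str]]:
    """Return (error_name, description) if the message names a CUDA error."""
    match = CUDA_ERROR_RE.search(message)
    if match is None:
        return None
    return match.group(1), match.group(2).strip()


if __name__ == "__main__":
    # Collapse the messages to their distinct (name, description) pairs.
    distinct = {extract_cuda_error(m) for m in MESSAGES}
    for name, description in sorted(e for e in distinct if e is not None):
        print(f"{name}: {description}")
```

If the set of distinct errors contains a single entry, the failures can be triaged together as one problem rather than as separate test bugs.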

Steps/Code to reproduce bug
Please provide a list of steps or a code sample to reproduce the issue.
Avoid posting private or sensitive data.

Expected behavior
A clear and concise description of what you expected to happen.

Environment details (please complete the following information)
GB100 + cuda12.8

Revision

spark-rapids: 1b0bce5 (branch-25.02)
spark-rapids-jni: f6fd3f96f286b88d3e6e7bdf2a13dede753b2d09 (branch-25.02)
cudf: 1fe744fb43044e9beb728c244f3f4a0beac3588f (branch-25.02)

Additional context
Add any other context about the problem here.

@yinqingh added the "? - Needs Triage" (Need team to review and classify) and "bug" (Something isn't working) labels on Feb 26, 2025
@pxLi
Member

pxLi commented Feb 26, 2025

cc @ttnghia @sameerz for visibility

@yinqingh changed the title from "[BUG] test_hash_multiple_grpby_pivot and test_many_column_project fails on GB100 and cuda12.8" to "[BUG] test_hash_multiple_grpby_pivot and test_row_conversions_fixed_width_wide fails on GB100 and cuda12.8" on Feb 26, 2025
@yinqingh changed the title from "[BUG] test_hash_multiple_grpby_pivot and test_row_conversions_fixed_width_wide fails on GB100 and cuda12.8" to "[BUG] hash_aggregate_test.py::test_hash_multiple_grpby_pivot and row_conversion_test.py::test_row_conversions_fixed_width_wide fails on GB100 and cuda12.8" on Feb 26, 2025

2 participants