Describe the bug
First seen in rapids_nightly-pre_release-github, run 754. Please keep monitoring subsequent runs...
The run failed on Spark shims 321, 323, 331, and 334, and passed on all other shims.
[2025-01-28T06:05:35.112Z] HostAllocSuite:
[2025-01-28T06:05:35.367Z] [2025-01-28 06:05:35.223] [RMM] [error] [A][Stream 0][Upstream 256B][FAILURE maximum pool size exceeded]
[2025-01-28T06:05:35.368Z] - simple pinned tryAlloc
[2025-01-28T06:05:36.289Z] [2025-01-28 06:05:36.241] [RMM] [error] [A][Stream 0][Upstream 4096B][FAILURE maximum pool size exceeded]
[2025-01-28T06:05:36.290Z] [2025-01-28 06:05:36.242] [RMM] [error] [A][Stream 0][Upstream 256B][FAILURE maximum pool size exceeded]
[2025-01-28T06:05:36.290Z] - simple non-pinned tryAlloc
[2025-01-28T06:05:37.647Z] [2025-01-28 06:05:37.252] [RMM] [error] [A][Stream 0][Upstream 4096B][FAILURE maximum pool size exceeded]
[2025-01-28T06:05:37.648Z] [2025-01-28 06:05:37.253] [RMM] [error] [A][Stream 0][Upstream 256B][FAILURE maximum pool size exceeded]
[2025-01-28T06:05:37.648Z] - simple mixed tryAlloc
[2025-01-28T06:05:39.007Z] [2025-01-28 06:05:38.613] [RMM] [error] [A][Stream 0][Upstream 256B][FAILURE maximum pool size exceeded]
[2025-01-28T06:05:39.008Z] - simple pinned blocking alloc
[2025-01-28T06:05:39.931Z] [2025-01-28 06:05:39.636] [RMM] [error] [A][Stream 0][Upstream 4096B][FAILURE maximum pool size exceeded]
[2025-01-28T06:05:39.932Z] [2025-01-28 06:05:39.637] [RMM] [error] [A][Stream 0][Upstream 256B][FAILURE maximum pool size exceeded]
[2025-01-28T06:05:39.932Z] [2025-01-28 06:05:39.648] [RMM] [error] [A][Stream 0][Upstream 256B][FAILURE maximum pool size exceeded]
[2025-01-28T06:05:39.932Z] - simple non-pinned blocking alloc
[2025-01-28T06:05:40.853Z] [2025-01-28 06:05:40.662] [RMM] [error] [A][Stream 0][Upstream 256B][FAILURE maximum pool size exceeded]
[2025-01-28T06:05:40.854Z] - simple mixed blocking alloc
[2025-01-28T06:05:41.781Z] [2025-01-28 06:05:41.692] [RMM] [error] [A][Stream 0][Upstream 256B][FAILURE maximum pool size exceeded]
[2025-01-28T06:05:41.782Z] - pinned blocking alloc with spill
[2025-01-28T06:05:43.139Z] [2025-01-28 06:05:42.744] [RMM] [error] [A][Stream 0][Upstream 4096B][FAILURE maximum pool size exceeded]
[2025-01-28T06:05:43.140Z] [2025-01-28 06:05:42.746] [RMM] [error] [A][Stream 0][Upstream 256B][FAILURE maximum pool size exceeded]
[2025-01-28T06:05:43.140Z] [2025-01-28 06:05:42.753] [RMM] [error] [A][Stream 0][Upstream 256B][FAILURE maximum pool size exceeded]
[2025-01-28T06:05:43.140Z] - non-pinned blocking alloc with spill
[2025-01-28T06:05:44.061Z] [2025-01-28 06:05:43.769] [RMM] [error] [A][Stream 0][Upstream 256B][FAILURE maximum pool size exceeded]
[2025-01-28T06:05:44.062Z] - mixed blocking alloc with spill
[2025-01-28T06:05:44.985Z] [2025-01-28 06:05:44.792] [RMM] [error] [A][Stream 0x1][Upstream 1073741824B][FAILURE maximum pool size exceeded]
[2025-01-28T06:05:44.985Z] SUITE ABORTED - HostAllocSuite: Error initializing pinned memory pool
[2025-01-28T06:05:44.985Z] java.lang.RuntimeException: Error initializing pinned memory pool
[2025-01-28T06:05:44.985Z] at ai.rapids.cudf.PinnedMemoryPool.getSingleton(PinnedMemoryPool.java:92)
[2025-01-28T06:05:44.986Z] at ai.rapids.cudf.PinnedMemoryPool.getTotalPoolSizeBytes(PinnedMemoryPool.java:212)
[2025-01-28T06:05:44.986Z] at com.nvidia.spark.rapids.HostAlloc.<init>(HostAlloc.scala:29)
[2025-01-28T06:05:44.986Z] at com.nvidia.spark.rapids.HostAlloc$.initialize(HostAlloc.scala:267)
[2025-01-28T06:05:44.986Z] at com.nvidia.spark.rapids.HostAllocSuite.afterAll(HostAllocSuite.scala:354)
[2025-01-28T06:05:44.986Z] at org.scalatest.BeforeAndAfterAll.$anonfun$run$1(BeforeAndAfterAll.scala:225)
[2025-01-28T06:05:44.986Z] at org.scalatest.Status.$anonfun$withAfterEffect$1(Status.scala:377)
[2025-01-28T06:05:44.986Z] at org.scalatest.Status.$anonfun$withAfterEffect$1$adapted(Status.scala:373)
[2025-01-28T06:05:44.986Z] at org.scalatest.CompositeStatus.whenCompleted(Status.scala:962)
[2025-01-28T06:05:44.986Z] at org.scalatest.Status.withAfterEffect(Status.scala:373)
[2025-01-28T06:05:44.986Z] ...
[2025-01-28T06:05:44.986Z] Cause: java.util.concurrent.ExecutionException: java.lang.OutOfMemoryError: Could not allocate native memory: std::bad_alloc: out_of_memory: RMM failure at:/home/jenkins/agent/workspace/jenkins-spark-rapids-jni_nightly-pre_release-393-cuda11/target/libcudf/cmake-build/_deps/rmm-src/include/rmm/mr/device/pool_memory_resource.hpp:276: Maximum pool size exceeded
[2025-01-28T06:05:44.986Z] at java.util.concurrent.FutureTask.report(FutureTask.java:122)
[2025-01-28T06:05:44.986Z] at java.util.concurrent.FutureTask.get(FutureTask.java:192)
[2025-01-28T06:05:44.986Z] at ai.rapids.cudf.PinnedMemoryPool.getSingleton(PinnedMemoryPool.java:90)
[2025-01-28T06:05:44.986Z] at ai.rapids.cudf.PinnedMemoryPool.getTotalPoolSizeBytes(PinnedMemoryPool.java:212)
[2025-01-28T06:05:44.986Z] at com.nvidia.spark.rapids.HostAlloc.<init>(HostAlloc.scala:29)
[2025-01-28T06:05:44.986Z] at com.nvidia.spark.rapids.HostAlloc$.initialize(HostAlloc.scala:267)
[2025-01-28T06:05:44.986Z] at com.nvidia.spark.rapids.HostAllocSuite.afterAll(HostAllocSuite.scala:354)
[2025-01-28T06:05:44.986Z] at org.scalatest.BeforeAndAfterAll.$anonfun$run$1(BeforeAndAfterAll.scala:225)
[2025-01-28T06:05:44.986Z] at org.scalatest.Status.$anonfun$withAfterEffect$1(Status.scala:377)
[2025-01-28T06:05:44.986Z] at org.scalatest.Status.$anonfun$withAfterEffect$1$adapted(Status.scala:373)
[2025-01-28T06:05:44.986Z] ...
[2025-01-28T06:05:44.986Z] Cause: java.lang.OutOfMemoryError: Could not allocate native memory: std::bad_alloc: out_of_memory: RMM failure at:/home/jenkins/agent/workspace/jenkins-spark-rapids-jni_nightly-pre_release-393-cuda11/target/libcudf/cmake-build/_deps/rmm-src/include/rmm/mr/device/pool_memory_resource.hpp:276: Maximum pool size exceeded
[2025-01-28T06:05:44.986Z] at ai.rapids.cudf.Rmm.newPinnedPoolMemoryResource(Native Method)
[2025-01-28T06:05:44.986Z] at ai.rapids.cudf.PinnedMemoryPool.<init>(PinnedMemoryPool.java:225)
[2025-01-28T06:05:44.986Z] at ai.rapids.cudf.PinnedMemoryPool.lambda$initialize$1(PinnedMemoryPool.java:142)
[2025-01-28T06:05:44.986Z] at java.util.concurrent.FutureTask.run(FutureTask.java:266)
[2025-01-28T06:05:44.986Z] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
[2025-01-28T06:05:44.986Z] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
[2025-01-28T06:05:44.986Z] at java.lang.Thread.run(Thread.java:750)
[2025-01-28T06:05:44.986Z] ...
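When comparing this run against subsequent nightly runs, it may help to tally the `[RMM] [error]` lines in the log by requested size, so that the 1073741824B request immediately preceding the suite abort stands out from the small 256B and 4096B requests logged during the individual tests. The following is only a rough triage sketch: the helper name `summarize_rmm_failures` and the regular expression are assumptions made for illustration, based solely on the line format visible in the log above, and are not part of RMM or spark-rapids.

```python
import re
from collections import Counter

# Assumed line shape, taken from the log above:
#   [RMM] [error] [A][Stream 0][Upstream 256B][FAILURE maximum pool size exceeded]
# \s+ after "Upstream" tolerates a line wrap between the word and the size.
RMM_FAILURE = re.compile(
    r"\[RMM\] \[error\] \[A\]\[Stream (?P<stream>\w+)\]"
    r"\[Upstream\s+(?P<size>\d+)B\]\[FAILURE (?P<reason>[^\]]+)\]"
)


def summarize_rmm_failures(log_text):
    """Count RMM allocation failures by requested upstream size in bytes."""
    counts = Counter()
    for match in RMM_FAILURE.finditer(log_text):
        counts[int(match.group("size"))] += 1
    return counts


# Three lines copied from the log above, used as a small sample.
sample = (
    "[2025-01-28T06:05:36.289Z] [2025-01-28 06:05:36.241] [RMM] [error] "
    "[A][Stream 0][Upstream 4096B][FAILURE maximum pool size exceeded]\n"
    "[2025-01-28T06:05:36.290Z] [2025-01-28 06:05:36.242] [RMM] [error] "
    "[A][Stream 0][Upstream 256B][FAILURE maximum pool size exceeded]\n"
    "[2025-01-28T06:05:44.985Z] [2025-01-28 06:05:44.792] [RMM] [error] "
    "[A][Stream 0x1][Upstream 1073741824B][FAILURE maximum pool size exceeded]\n"
)

if __name__ == "__main__":
    for size, count in sorted(summarize_rmm_failures(sample).items()):
        print(f"{size} bytes: {count} failure(s)")
```

Running the same function over the full console log of a failing and a passing shim would show whether the large request appears only in the failing runs; the sketch does not attempt to explain why the pool limit is hit.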
Steps/Code to reproduce bug Please provide a list of steps or a code sample to reproduce the issue. Avoid posting private or sensitive data.
Expected behavior A clear and concise description of what you expected to happen.
Environment details (please complete the following information)
Additional context Add any other context about the problem here.
This one was closed because it did not reproduce for a few days, but we saw the occurrence again recently, so we filed #12194.