We are using the latest version of Ray on nodes with 2 TB of RAM to assemble a snake genome.
Ray was compiled with GCC 5.1 using the following make command::
make PREFIX=/afs/<your_preferred_install_directory> MAXKMERLENGTH=128 MPICXX=mpic++ HAVE_LIBZ=y MPI_IO=y
Everything worked fine, except that when running on this large dataset we get the following error::
Critical exception: The system is out of memory, returned NULL.
Requested -2147483648 bytes of type RAY_MALLOC_TYPE_GRID_TABLE
Primary job terminated normally, but 1 process returned
a non-zero exit code.. Per user-direction, the job has been aborted.
mpiexec detected that one or more processes exited with non-zero
status, thus causing
the job to be terminated. The first process to do so was:
Process name: [[19389,1],0]
Exit code: 42
This appears to be a memory issue, yet we observed that not all of the 2 TB of RAM was in use.
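The negative size in the log may be a clue: -2147483648 is -2^31, the smallest value a 32-bit signed integer can hold, which is what a byte count of exactly 2^31 becomes when it is narrowed to such a type. The sketch below illustrates that effect only. The function names, the entry count and the 8-byte entry size are hypothetical and are not taken from Ray's source.

```cpp
#include <cstdint>

// Hypothetical illustration, not Ray's code: the byte count is computed
// correctly in 64 bits but then narrowed to a 32-bit signed integer.
// The narrowing wraps modulo 2^32 (guaranteed since C++20, and the
// behaviour of mainstream compilers before that).
std::int32_t bytesAsInt32(std::uint64_t entries, std::uint64_t bytesPerEntry) {
    std::uint64_t bytes = entries * bytesPerEntry;
    return static_cast<std::int32_t>(bytes);
}

// Same computation, kept in 64 bits all the way through.
std::uint64_t bytesAsUint64(std::uint64_t entries, std::uint64_t bytesPerEntry) {
    return entries * bytesPerEntry;
}
```

With the assumed values of 268435456 entries (2^28) of 8 bytes each, `bytesAsInt32` yields -2147483648 while `bytesAsUint64` yields 2147483648, so an allocator that receives the narrowed value would report a failure even though plenty of RAM is free.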
We made the following changes to Ray:
. Compiled with the Intel compiler rather than the GNU compiler (i-compilers 15.0.2 and intelmpi 5.0.3)
. Compiled the code with the flag -mcmodel=medium; in total...::
. Changed line 571 in RayPlatform/RayPlatform/structures/MyHashTable.h
. In RayPlatform/RayPlatform/memory/allocator.h
. In RayPlatform/RayPlatform/memory/allocator.h at line 28
. In RayPlatform/RayPlatform/memory/allocator.cpp at line 36
. In RayPlatform/RayPlatform/memory/allocator.cpp at line 56
For consistency, perhaps we should use uint64_t rather than size_t, since I see that other parts of the
source code already use it.
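On a 64-bit Linux build, size_t and uint64_t are both 64 bits wide, so either can hold a request larger than 2 GiB; uint64_t simply makes the width explicit. A minimal sketch of an allocation wrapper written that way follows. The names `fitsInSizeT` and `allocateBytes` are hypothetical and do not correspond to functions in RayPlatform's allocator.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <limits>

// Hypothetical helper: true if the 64-bit byte count can be passed to
// malloc without truncation on this platform.
bool fitsInSizeT(std::uint64_t bytes) {
    return bytes <= std::numeric_limits<std::size_t>::max();
}

// Hypothetical wrapper: takes the size as uint64_t and converts to
// size_t only after checking that nothing is lost.
// Returns nullptr for a zero-sized or unrepresentable request.
void* allocateBytes(std::uint64_t bytes) {
    if (bytes == 0 || !fitsInSizeT(bytes)) {
        return nullptr;
    }
    return std::malloc(static_cast<std::size_t>(bytes));
}
```

The caller releases the memory with `std::free`. The explicit check matters mainly on platforms where size_t is narrower than 64 bits.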
The assembly has now been running for 18 days and has not generated any errors, at least so far.
Do you have any thoughts about this matter?
With kind regards
Henric Zazzi