Crash while allocating memory #239

Open
menzzana opened this issue Nov 23, 2015 · 0 comments

Hi

We are using the latest version of Ray on nodes with 2 TB of RAM to assemble a snake genome.
Ray was compiled with GCC 5.1 using the following make command:
make PREFIX=/afs/<your_preferred_install_directory> MAXKMERLENGTH=128 MPICXX=mpic++ HAVE_LIBZ=y MPI_IO=y

Everything worked fine, except that when running on this large dataset we get::

Critical exception: The system is out of memory, returned NULL.
Requested -2147483648 bytes of type RAY_MALLOC_TYPE_GRID_TABLE


Primary job terminated normally, but 1 process returned
a non-zero exit code.. Per user-direction, the job has been aborted.



mpiexec detected that one or more processes exited with non-zero
status, thus causing
the job to be terminated. The first process to do so was:

Process name: [[19389,1],0]
Exit code: 42


This seems to be a memory issue, but we could see that not all of the 2 TB of RAM was in use.
We made the following changes to Ray:

. Compilation was done using the Intel compiler rather than the GNU compiler

 i-compilers 15.0.2 and intelmpi 5.0.3

. I compiled the code with the flag -mcmodel=medium; the full command was::

 make PREFIX=/afs/<your_preferred_install_directory> MAXKMERLENGTH=128 MPICXX=mpiicpc \
      HAVE_LIBZ=y MPI_IO=y CXXFLAGS='-O3 -std=c++98 -Wall -g -mcmodel=medium'

. Changed line 571 in RayPlatform/RayPlatform/structures/MyHashTable.h

 size_t requiredBytes=sizeof(MyHashTableGroup<KEY,VALUE>)*(size_t)m_numberOfGroups;

. In RayPlatform/RayPlatform/memory/allocator.h

Added #include <stddef.h>

. In RayPlatform/RayPlatform/memory/allocator.h at line 28

void*__Malloc(size_t c,const char*description,bool show);

. In RayPlatform/RayPlatform/memory/allocator.cpp at line 36

void*__Malloc(size_t c,const char*description,bool show){

. In RayPlatform/RayPlatform/memory/allocator.cpp at line 56

printf("%s %i\t%s\t%zu bytes, ret\t%p\t%s\n",__FILE__,__LINE__,__func__,c,a,description);
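
The source changes listed above share one idea: carry the byte count in `size_t` from the multiplication through to the allocation and the log message. A minimal sketch of that idea follows. `checkedBytes` and `mallocOrReport` are hypothetical names, not RayPlatform functions, and the overflow check is an addition for illustration that the patch above does not include. The sketch assumes C++11 or later for `<cstdint>` and `%zu`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <cstdlib>
#include <limits>

// Hypothetical helper: multiply an element size by a count in size_t and
// report whether the product is representable, instead of wrapping silently.
bool checkedBytes(std::size_t elementSize, std::uint64_t count, std::size_t* out) {
    assert(out != NULL);
    const std::size_t maxBytes = std::numeric_limits<std::size_t>::max();
    if (elementSize != 0 && count > maxBytes / elementSize) {
        return false; // product would not fit in size_t
    }
    *out = elementSize * static_cast<std::size_t>(count);
    return true;
}

// Hypothetical wrapper in the spirit of the patched __Malloc: the size is a
// size_t all the way down and is printed with %zu, so large requests are
// neither truncated nor shown as negative numbers.
void* mallocOrReport(std::size_t bytes, const char* description) {
    void* p = std::malloc(bytes);
    if (p == NULL) {
        std::fprintf(stderr, "Out of memory: requested %zu bytes for %s\n",
                     bytes, description);
    }
    return p;
}
```

With a 64-bit `size_t`, `checkedBytes(16, 134217728, &n)` yields 2147483648 rather than a negative value, and the result can be passed straight to `mallocOrReport`.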

For consistency, perhaps we should use uint64_t rather than size_t, since I see that other parts of the
source code already use it.
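
One point to weigh in that choice: `malloc` takes a `size_t`, so a `uint64_t` byte count still has to be converted at the allocation boundary. On a 64-bit build the two types have the same width and the conversion is lossless, but a guard makes that assumption explicit. The sketch below is only an illustration; `fitsInSizeT` and `toSizeT` are hypothetical names, and it assumes C++11 or later.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>

// Hypothetical guard: true when a 64-bit byte count can be passed to an
// API that takes size_t without losing bits.
bool fitsInSizeT(std::uint64_t bytes) {
    return bytes <= std::numeric_limits<std::size_t>::max();
}

// Hypothetical conversion used at the allocation boundary; it fails loudly
// in debug builds instead of truncating the request.
std::size_t toSizeT(std::uint64_t bytes) {
    assert(fitsInSizeT(bytes));
    return static_cast<std::size_t>(bytes);
}
```

Sizes could then be kept as `uint64_t` throughout the code and converted with `toSizeT` only where `malloc` is called.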

The assembly has now been running for 18 days and has not produced any errors so far.
Do you have any thoughts about this matter?

With kind regards
Henric Zazzi
