While testing other code, I tried to generate 168 indexed_binary sample files using a single dlio_benchmark process. As each file is created, the process's memory grows; by the time it is creating file number 49, its memory has reached 240 GB and the kernel kills the process.
The memory growth occurs in the generate() method in indexed_binary_generator.py. Since only a single process was used (comm_size == 1), the else clause in that routine is what produces the sample files, and its struct.pack() call is the statement at fault: print statements placed before it execute (including for file #49), but a print statement after it never prints because the process is killed there. Searching online, I found that the struct module caches data. I couldn't find documentation on the caching policy, or whether evictions are ever done, but there is a function
struct._clearcache()
which, if called immediately after the binary_data has been written to data_file, releases the cache memory, and the size of the process then stays reasonably constant as all 168 files are created.
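A minimal sketch of the single-process loop with the workaround applied (the function name, file names, and sizes here are hypothetical; the real generator also writes index files):

```python
import os
import struct
import tempfile

def generate_files(num_files, samples_per_file, sample_size, out_dir):
    # Hypothetical sketch of the single-process (comm_size == 1) path:
    # pack each sample with struct.pack(), write it, and clear the
    # struct module's internal cache after every file.
    for i in range(num_files):
        path = os.path.join(out_dir, f"img_{i}.bin")
        with open(path, "wb") as data_file:
            for _ in range(samples_per_file):
                # per-call format string; struct caches compiled formats
                binary_data = struct.pack(f"{sample_size}s", b"\x00" * sample_size)
                data_file.write(binary_data)
        # the workaround: release struct's cache after each file is written
        struct._clearcache()

out_dir = tempfile.mkdtemp()
generate_files(3, 4, 8, out_dir)
print(sorted(os.listdir(out_dir)))
```

Note that struct._clearcache() is an undocumented, underscore-prefixed helper, so relying on it carries some risk of changing between Python versions.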
krehm added a commit to krehm/dlio_benchmark that referenced this issue on Apr 8, 2024:
…gonne-lcf#181)
The struct.pack() call in generate() in indexed_binary_generator.py
caches the data that it produces and apparently never evicts the
cache, such that after 49 indexed_binary files have been created the
kernel kills the process due to OOM. This mod adds a call to
struct._clearcache() after each data file is written to release
the cached data, keeping the process memory size stable.
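For context (not part of the patch above), the cache that struct._clearcache() empties is the module-level one consulted by struct.pack() when it compiles a format string; a precompiled struct.Struct object compiles its format once and packs without that lookup. A minimal illustration with a hypothetical format string:

```python
import struct

# A precompiled Struct compiles its format once and packs without
# going through the module-level cache that struct.pack() consults.
record = struct.Struct("I8s")  # hypothetical record: uint32 + 8 bytes

binary_data = record.pack(7, b"sampledt")
print(record.size, len(binary_data))
```

Whether this avoids the growth seen here would need to be verified against the actual generator code; it is offered only as a sketch of an alternative to calling a private function.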