
shmem_malloc Interface to Leverage Hierarchical & Heterogeneous Memory Characteristics #258

Closed
manjugv opened this issue Nov 26, 2018 · 12 comments

Comments

@manjugv
Collaborator

manjugv commented Nov 26, 2018

Problem:

A typical node in current HPC systems is composed of a variety of memories, organized into multiple hierarchies and/or with different affinities to the PEs and threads. The OpenSHMEM programming model and its memory allocation routines are oblivious to these variations. As a consequence, it is a challenge for an OpenSHMEM program to leverage memory characteristics and capabilities to achieve higher performance in a portable way.

Proposal:

Introduce a memory allocation interface that can pass usage hints to the OpenSHMEM implementation.
The implementation can then use these hints to optimize the allocation for the stated usage.
For example, if the user specifies that a particular allocation will be used as a pSync array, the implementation can place it in a NUMA memory bank close to the network interface; where memory is available on the network interface itself, it can allocate the pSync array there. This can improve latency characteristics.
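For illustration, an allocation with a hint might look like the sketch below. The routine name and hint constant are placeholders (the concrete interface is defined in the accompanying proposal); the pSync sizing constants are standard OpenSHMEM.

#include <shmem.h>

/* Hypothetical prototype and hint constant; illustrative only. */
void *shmem_malloc_hint(size_t size, int hint);
#define SHMEM_HINT_PSYNC 0x1

int main(void) {
    shmem_init();
    /* Hint that this allocation is a pSync array, so the
     * implementation may place it in network-near memory. */
    long *pSync = shmem_malloc_hint(SHMEM_BCAST_SYNC_SIZE * sizeof(long),
                                    SHMEM_HINT_PSYNC);
    for (int i = 0; i < SHMEM_BCAST_SYNC_SIZE; i++)
        pSync[i] = SHMEM_SYNC_VALUE;
    shmem_barrier_all(); /* all PEs initialize pSync before first use */
    /* ... use pSync in collectives ... */
    shmem_free(pSync);
    shmem_finalize();
    return 0;
}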

Impact on Users:

These interfaces give the user an opportunity to convey usage information to the implementation, which the implementation can then use to optimize for that usage. Where the implementation does optimize, programs should achieve higher performance and/or scalability.

OpenSHMEM programs that do not use these interfaces, or that use SHMEM_HINT_NONE, are not impacted.

Impact on Implementations:

This gives implementations an opportunity to optimize an allocation for a particular usage.
An implementation that does not support a given optimization is allowed to fall back to the default shmem_malloc behavior.

Useful References

  1. https://software.intel.com/sites/default/files/managed/5f/5e/MCDRAM_Tutorial.pdf
  2. SharP Unified Memory Allocator: https://www.osti.gov/biblio/1468045
  3. mbind: https://www.kernel.org/doc/html/v4.18/admin-guide/mm/numa_memory_policy.html
@naveen-rn
Contributor

How is this different from #195? It looks like we are trying to address the same issue.

@manjugv
Collaborator Author

manjugv commented Nov 26, 2018

@naveen-rn

Naveen, I knew you would ask this. :)

The main difference is that there is only one symmetric heap here (no change to the symmetric address model), and the complexity of memory management is handled by the library. It is a small change (adding only one interface), and we get most of the benefits. The approach in #195 is more explicit, while here it is more implicit. That said, I feel there is value in having both solutions, and they can co-exist.

@jamesaross
Contributor

So this proposal is for something like this?

void* shmem_malloc_hint(size_t size, int hint);

Where hint is something like SHMEM_HINT_IS_PSYNC?

@manjugv
Collaborator Author

manjugv commented Dec 6, 2018

@jamesaross Correct.

@jamesaross
Contributor

I see now that you already added a pull request #259 but I'll continue with the discussion here.

Are these types, as identified in #259, sufficient for all use cases: LOW_LAT_MEM, HIGH_BW_MEM, NEAR_NIC_MEM, DEVICE_GPU_MEM, DEVICE_NIC_MEM? There are probably dozens of device libraries that would have to be linked against or dlopened. The proposal seems to imply runtime device querying/identification for many different device vendors. Can we reasonably expect every OpenSHMEM implementation to have special memory allocators for every device that could be attached to a node? This is a lot of work. It also seems to make the implementation more fragile: it would need to be updated every time a new device/API becomes available.

Why put this significant burden on the OpenSHMEM implementer when it's the application developer that has a specific allocator and/or physical memory location in mind?

What do you think about alternative interfaces like these?

void* shmem_malloc_ptr(size_t size, void* (*ptr_malloc)(size_t));
void shmem_free_ptr(void* ptr, void (*ptr_free)(void*));

Application developers should just say what they want.
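As an illustration, usage might look like the following. Here my_pinned_malloc/my_pinned_free are hypothetical stand-ins for whatever allocator the application prefers (e.g., a wrapper around a device or NUMA allocator); the shmem_malloc_ptr/shmem_free_ptr prototypes are the ones proposed above.

#include <stdlib.h>
#include <shmem.h>

/* Proposed prototypes from above: */
void *shmem_malloc_ptr(size_t size, void *(*ptr_malloc)(size_t));
void  shmem_free_ptr(void *ptr, void (*ptr_free)(void *));

/* Hypothetical application-chosen allocator pair. */
static void *my_pinned_malloc(size_t size) { return malloc(size); }
static void  my_pinned_free(void *ptr)     { free(ptr); }

void example(size_t nelems) {
    /* The application, not the library, decides where the memory
     * comes from by supplying its own allocator. */
    long *buf = shmem_malloc_ptr(nelems * sizeof(long), my_pinned_malloc);
    /* ... use buf ... */
    shmem_free_ptr(buf, my_pinned_free);
}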

@naveen-rn
Contributor

@jamesaross If I understand correctly, you are expecting users to create the heap and pass its address to the OpenSHMEM implementation. If so, this looks more like MPI windows (MPI_Win_create). To me, this would create an unnecessary burden on OpenSHMEM implementations, which would have to maintain these base addresses, register them, and perform lookup operations.

In @manjugv's proposal, these are hints to the library. It is not mandatory for all implementations to support all memory types.

@jamesaross
Contributor

@naveen-rn How does the current proposal get around the OpenSHMEM implementation creating a device heap and maintaining device addresses for every conceivable device? Also, a lookup for the default case is trivial if the implementation is clever about it. The address returned from shmem_malloc_ptr could be padded appropriately with metadata.
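A sketch of that padding trick (illustrative only; alignment handling is simplified):

#include <stdlib.h>

/* Stash the matching free function just before the pointer handed to
 * the user, so the later lookup costs a single header read. */
typedef struct {
    void (*free_fn)(void *);
} alloc_header_t;

static void *malloc_with_header(size_t size,
                                void *(*alloc_fn)(size_t),
                                void (*free_fn)(void *)) {
    alloc_header_t *h = alloc_fn(sizeof(alloc_header_t) + size);
    if (h == NULL) return NULL;
    h->free_fn = free_fn;
    return h + 1; /* user pointer starts just past the header */
}

static void free_with_header(void *ptr) {
    alloc_header_t *h = (alloc_header_t *)ptr - 1;
    h->free_fn(h); /* free with the allocator that created it */
}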

@naveen-rn
Contributor

@jamesaross My understanding of this proposal is: implementations will pin/register a single big chunk of memory as the SHEAP at some point before the actual allocation (maybe at shmem_init()). The total size of the SHEAP is fixed. Let us assume that this memory spans multiple NUMA nodes (maybe an INTERLEAVED mmap). With the hints in this call, implementations can select a particular memory block (if possible) during the actual allocation. There is no new registration during allocation, and only an offset calculation is necessary when we perform RMA/AMO operations on this memory. This is where it differs from #195: there we need to perform both SHEAP identification and offset calculation, since we have multiple SHEAPs.
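To illustrate that last point (names here are illustrative, not from any implementation): with a single registered SHEAP, translating a local symmetric address to a remote one needs only an offset, with no heap-identification step.

#include <stddef.h>

/* One registered symmetric heap per PE. */
typedef struct {
    char  *base;       /* local base of the single SHEAP           */
    char **peer_bases; /* per-PE remote bases, exchanged at startup */
} sheap_t;

/* With one heap there is no heap lookup, only an offset calculation: */
static inline void *remote_addr(const sheap_t *h, const void *local, int pe) {
    ptrdiff_t offset = (const char *)local - h->base;
    return h->peer_bases[pe] + offset;
}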

I haven't thought through possible usages for all the hints mentioned in this proposal, but hints like SHMEM_HINT_PSYNC, SHMEM_HINT_PWORK, and SHMEM_HINT_ATOMICS can be used effectively at the shmem_malloc operation.

@jamesaross
Contributor

HPC application portability is rarely defined by the small burden of replacing an allocator or swapping it with a macro. If it's expected that most implementations won't bother with supporting most hint types and the implementations that do will support a very specific subset of devices, wouldn't it be simpler to have vendor-specific special allocator extensions?

Below is an example of portable code with a vendor-specific special allocator.

#include "shmemx.h"
#if SHMEMX_SPECIAL_ALLOCATOR_AVAILABLE
#define shmem_malloc_special(size) shmemx_malloc_special((size), SHMEM_HINT_IS_PSYNC)
#else
#define shmem_malloc_special(size) shmem_malloc((size))
#endif
// ...
// this is now portable:
size_t sz = log(shmem_n_pes()) + 2;
int* pSync = shmem_malloc_special(sz);

@manjugv
Collaborator Author

manjugv commented Jan 3, 2019

Not sure if I understand your point entirely.

With the interface in this proposal, there is a two-way communication and agreement: (1) the user tells the library that a particular allocation will be used in a specific way; (2) the library uses that information and optimizes for that usage. If the user keeps that promise and the library can optimize, there will be performance benefits for applications.

I disagree that it is a huge burden to implement. The network libraries can already support some of these hints, and today there is no way to pass this information along for the user's benefit. Also, most of these hints are easy to implement with wrappers, without any need for fancy allocators. For example, one could use Memkind to support many of these hints. I'm open to trimming some of these hints if we find one that is terribly difficult to implement and does not provide large benefits. Again, remember that supporting hints is optional.
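For instance, a hypothetical mapping of a bandwidth hint onto memkind might look like this (the SHMEM_HINT_ prefix on the HIGH_BW_MEM name from #259 is my assumption; the mapping itself is only a sketch):

#include <memkind.h>

static void *alloc_with_hint(size_t size, int hint) {
    switch (hint) {
    case SHMEM_HINT_HIGH_BW_MEM: /* assuming this name from PR #259 */
        /* Prefer high-bandwidth (e.g., MCDRAM) memory; memkind falls
         * back to DDR if none is available. */
        return memkind_malloc(MEMKIND_HBW_PREFERRED, size);
    default:
        /* Unsupported hints fall back to the default heap, matching
         * the plain shmem_malloc behavior. */
        return memkind_malloc(MEMKIND_DEFAULT, size);
    }
}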

@jdinan
Collaborator

jdinan commented Jan 31, 2020

@manjugv Was this closed by #259?

@manjugv
Collaborator Author

manjugv commented Jan 31, 2020

Yes @jdinan. Closing it now.

manjugv closed this as completed Jan 31, 2020