Handoff data to other distributed services #223

Open
mrocklin opened this issue Mar 10, 2017 · 4 comments

@mrocklin

mrocklin commented Mar 10, 2017

My end goal is to share data between distributed Dask.array and Elemental. For most of this question, though, the Dask specifics shouldn't be important. Instead, consider the case where I have several Python processes running in an MPI world and each Python process has a few numpy arrays which, when arranged together, form the chunks of a distributed array. To be concrete, here is a case with a world of size two:

Process 1

>>> MPI.COMM_WORLD.Get_rank()
0
>>> arrays
{(0, 0): np.array(...),
 (0, 1): np.array(...),
 (1, 1): np.array(...)}

>>> elemental_array = give_arrays_to_elemental(arrays)

Process 2

>>> MPI.COMM_WORLD.Get_rank()
1
>>> arrays
{(1, 0): np.array(...)}

>>> give_arrays_to_elemental(arrays)

I also know the shape of the array, the datatypes, etc. My chunks aren't necessarily uniformly arranged. In this case rank 0 has three chunks while rank 1 has one. I would be willing to rearrange chunks arbitrarily if necessary, but would prefer to avoid the communication if possible.
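For concreteness, a minimal sketch of how each chunk key might be mapped to a global (row, column) offset given dask-style per-dimension chunk sizes; the chunks tuple below is a hypothetical example, not the actual layout:

import numpy as np

# Hypothetical dask-style chunk sizes: row blocks of 500/500 and
# column blocks of 300/700, i.e. a 1000 x 1000 global array.
chunks = ((500, 500), (300, 700))

# The global offset of block (i, j) is the total size of the
# preceding blocks along each dimension.
row_starts = np.concatenate(([0], np.cumsum(chunks[0])[:-1]))
col_starts = np.concatenate(([0], np.cumsum(chunks[1])[:-1]))

def chunk_offset(key):
    i, j = key
    return int(row_starts[i]), int(col_starts[j])

# chunk_offset((1, 1)) -> (500, 300)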

I would like to take all of these numpy arrays, hand them to Elemental within each process, and then do more computationally intensive operations using Elemental's algorithms. Afterwards, I would like to reverse the process and get back a bunch of NumPy arrays in all of my Python processes:

Process 1

>>> do stuff with elemental_array
>>> arrays = get_local_chunks_from_elemental(...)

Process 2

>>> participate in MPI/elemental computations
>>> arrays = get_local_chunks_from_elemental(...)

Is this feasible? If so then what is the best way to go about it?

@poulson
Member

poulson commented Mar 10, 2017

There is a generic (but usually not the fastest) El::DistMatrix<T>::QueueUpdate( El::Int row, El::Int column, const T& value ) interface that can queue an update to an arbitrary entry of a distributed matrix from any process. The setup routine is El::DistMatrix<T>::Reserve( El::Int numUpdates ) and the collective finalization routine is El::DistMatrix<T>::ProcessQueues().
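As a rough sketch of how the push path might look from Python, assuming the bindings expose Reserve, QueueUpdate, and ProcessQueues as DistMatrix methods mirroring the C++ names, and assuming a precomputed chunk_offsets mapping from chunk key to global (row, column) offset:

# Sketch only: the El.DistMatrix method names below assume the Python
# bindings mirror the C++ Reserve/QueueUpdate/ProcessQueues interface;
# chunk_offsets is a hypothetical {key: (row_offset, col_offset)} dict.
import El

def give_arrays_to_elemental(arrays, chunk_offsets, global_shape):
    m, n = global_shape
    A = El.DistMatrix()      # double-precision real by default (assumed)
    El.Zeros(A, m, n)        # QueueUpdate adds to entries, so start from zero

    # Reserve room for every entry this rank is about to queue.
    A.Reserve(sum(chunk.size for chunk in arrays.values()))

    # Queue each local value at its global (row, column) position.
    for key, chunk in arrays.items():
        row0, col0 = chunk_offsets[key]
        rows, cols = chunk.shape
        for i in range(rows):
            for j in range(cols):
                A.QueueUpdate(row0 + i, col0 + j, chunk[i, j])

    # Collective: every rank must participate to exchange the queued updates.
    A.ProcessQueues()
    return A

Since QueueUpdate accumulates into the existing entry, the matrix is zeroed first so that the updates act as sets; whether the wrapper accepts numpy scalars directly is another detail worth checking.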

I would recommend starting with these interfaces for loading data into Elemental, and there are equivalent "pull" analogues to the above "puts": El::DistMatrix<T>::ReservePulls, El::DistMatrix<T>::QueuePull( El::Int row, El::Int column ), and El::DistMatrix<T>::ProcessPullQueue( T* pullBuf ).
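And a matching sketch of the pull direction, under the same assumptions about the Python wrapper (ReservePulls, QueuePull, and ProcessPullQueue with the C++ semantics, with ProcessPullQueue filling a flat buffer in queueing order); chunk_offsets and chunk_shapes are hypothetical per-rank mappings:

# Sketch only: assumes the pull-queue methods exist on the Python wrapper
# and that ProcessPullQueue fills a flat buffer in the order the pulls
# were queued; chunk_offsets and chunk_shapes are hypothetical dicts
# keyed by this rank's chunk keys.
import numpy as np

def get_local_chunks_from_elemental(A, chunk_offsets, chunk_shapes):
    keys = sorted(chunk_shapes)
    num_pulls = sum(r * c for r, c in chunk_shapes.values())
    A.ReservePulls(num_pulls)

    # Queue one pull per entry of every chunk this rank wants back.
    for key in keys:
        row0, col0 = chunk_offsets[key]
        rows, cols = chunk_shapes[key]
        for i in range(rows):
            for j in range(cols):
                A.QueuePull(row0 + i, col0 + j)

    # Collective: exchanges the requests and fills the buffer with values.
    buf = np.empty(num_pulls, dtype=np.float64)
    A.ProcessPullQueue(buf)

    # Slice the flat buffer back into per-chunk numpy arrays.
    arrays, pos = {}, 0
    for key in keys:
        rows, cols = chunk_shapes[key]
        arrays[key] = buf[pos:pos + rows * cols].reshape(rows, cols)
        pos += rows * cols
    return arrays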

Please see the section of the headers marked "// Batch updating of remote entries" for the prototypes.

Once this is working, we could look into ways of avoiding all of the unnecessary row and column metadata sent by said routines.

@poulson
Member

poulson commented Mar 10, 2017

And, as luck would have it, I already exposed a Python interface to the above:

lib.ElDistMatrixQueueUpdate_i.argtypes = [c_void_p,iType,iType,iType]

@mrocklin
Author

That is indeed fortunate. I suspect we might end up being bound by Python for loops, but I agree that this is probably enough to demonstrate feasibility, and that demonstrating feasibility should come before optimization.

@poulson
Copy link
Member

poulson commented Mar 10, 2017

I haven't yet added C/Python interfaces to El::DoubleDouble, El::QuadDouble, El::BigFloat, or their complex variants, but I am assuming single- and double-precision are okay for you for now.
