Added schedule_exchange() and schedule_wait() to communication_object #190

philip-paul-mueller wants to merge 88 commits into ghex-org:master

Conversation
…t compiles though.
…here is another error.
I have now run everything on Säntis and, with the exception of the "inplace test" in the concepts, the unit tests pass.

The failure in the Python bindings in CPU mode occurs because the library transparently falls back to the normal exchange.
include/ghex/device/event_pool.hpp (outdated)

```cpp
struct event_pool
{
public: // constructors
    event_pool(std::size_t) {};
```

Suggested change:

```diff
-    event_pool(std::size_t) {};
+    event_pool(std::size_t) = default;
```
include/ghex/device/event_pool.hpp (outdated)

```cpp
event_pool(event_pool&& other) noexcept = default;
event_pool& operator=(event_pool&&) noexcept = default;

void rewind() {};
```

Suggested change:

```diff
-    void rewind() {};
+    void rewind() {}
```
include/ghex/device/event.hpp (outdated)

```cpp
{
struct cuda_event
{
    cuda_event() {};
```

Suggested change:

```diff
-    cuda_event() {};
+    cuda_event() = default;
```
include/ghex/device/cuda/stream.hpp (outdated)

```cpp
// We do not use `reserve()` to ensure that the events are initialized now
// and not in the hot path when they are actually queried.
```

I like your current version, thanks!
include/ghex/device/cuda/stream.hpp (outdated)

```cpp
 * and recreating them. It requires however, that a user can guarantee
 * that the events are no longer in use.
 */
void rewind_pool()
```

Yep, that sounds reasonable. So:

- rewind_pool -> rewind
- reset_pool -> clear

?
```cpp
initialize_data(d, field, levels, levels_first);
data_descriptor_cpu_int_type data{d, field, levels, levels_first};

cudaDeviceSynchronize();
```
msimberg left a comment:

A few more minor comments.
```cpp
    if (!m_moved) { GHEX_CHECK_CUDA_RESULT_NO_THROW(cudaEventDestroy(m_event)) }
}

operator bool() const noexcept { return m_moved; }
```

Hmm, this seems backwards. I would expect:

```diff
-    operator bool() const noexcept { return m_moved; }
+    operator bool() const noexcept { return !m_moved; }
```
i.e. bool() is true when the event is valid, false otherwise?
I see stream has the same "backwards" semantics, so this probably comes from there.
What do you think?
Now that I think about it, I would say that it should return true when it is valid, so it should be !m_moved.
However, I think it should follow the same logic as stream, because anything else is just confusing.
So I would keep it that way but add a comment that explains what true means.
In any case we should check with Fabian, as he is the one who added that function; changing the semantics would mean changing it throughout GHEX (which is not an argument against doing it).
Yep, it'd be great to hear if @boeschf had other semantics in mind for it originally. Otherwise, it simply smells a bit like a bug that should be changed for both stream and event.
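For reference, a minimal, GHEX-independent sketch of the "true means valid" (`!m_moved`) semantics discussed above; the type and member names are illustrative only, not GHEX code:

```cpp
#include <cassert>
#include <utility>

// Move-aware RAII-style wrapper whose operator bool() reports validity:
// true while the object still owns its resource, false after being moved from.
struct handle
{
    handle() = default;

    handle(handle&& other) noexcept : m_moved(other.m_moved) { other.m_moved = true; }
    handle& operator=(handle&& other) noexcept
    {
        m_moved = other.m_moved;
        other.m_moved = true;
        return *this;
    }

    // true  -> valid (still owns the resource)
    // false -> moved from
    explicit operator bool() const noexcept { return !m_moved; }

private:
    bool m_moved = false;
};

int main()
{
    handle a;
    assert(a);           // valid after construction
    handle b = std::move(a);
    assert(b && !a);     // ownership transferred to b
}
```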
```cpp
while (!(m_next_event < m_events.size())) { m_events.emplace_back(cuda_event()); };

const std::size_t event_to_use = m_next_event;
assert(!bool(m_events[event_to_use]));
```

Related to semantics of operator bool() for event and stream:

```diff
-    assert(!bool(m_events[event_to_use]));
+    assert(bool(m_events[event_to_use]));
```
Skipped for now, see discussion above.
```cpp
    if (!m_moved) { GHEX_CHECK_CUDA_RESULT_NO_THROW(cudaStreamDestroy(m_stream)) }
}

operator bool() const noexcept { return m_moved; }
```
See comment about operator bool() for event. I think this is also backwards.
Skipped for now, see discussion above.
```cpp
 *
 * TODO: Should the handle expose this function?
 */
void complete_schedule_exchange()
```
What do you think about making this private and, if a caller needs to synchronize after a schedule_wait, just making them call wait instead?

```cpp
schedule_exchange(stream);
schedule_wait(stream);
// now I really want to synchronize
wait();
```
IMO this would simplify the API in that wait can always be called as a wait-for-everything-to-finish regardless of how the exchange or wait was done previously. wait may of course need to call complete_schedule_exchange internally (or parts of it) for this approach to work.
We should consider it.
I have thought about it again.
Regarding your example: while I can understand why it should work (in fact I also think it must work), I do not think that it is the right (whatever that means) way to achieve it.
The scheduled functions essentially introduce a kind of stream semantics into GHEX.
So I would argue that, to check whether the transmission has finished, one should synchronize with the stream passed to schedule_exchange().
The only things GHEX itself must guarantee are that a new exchange does not start before the old one has finished, and that nothing still in use by an ongoing transfer gets deleted; this is what complete_schedule_exchange() does.
So I think you are right that it should become private, as it should probably never be called directly by a user.
Instead, all exchange*() functions call it to make sure that the previous exchange has happened and it is safe to start a new one.
Because wait() deallocates memory, it must also call it to make sure that it does not delete something that is still in use.
As a side effect, your example code will work and do what you want, i.e. the full synchronization.
But, as outlined above, in my opinion it is not the right way of doing it, even though it must work.
Does this make sense?

I think it should be private, but currently the tests need it and I have no good idea how to solve that.
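To illustrate the stream-semantics argument above, here is a hedged sketch; the free-standing `schedule_exchange`/`schedule_wait` stand-ins mirror the earlier example, and their exact signatures in GHEX are an assumption:

```cpp
#include <cuda_runtime.h>

// Stand-ins for the API added by this PR; in GHEX these live on the
// communication object / handle, and their exact signatures may differ.
void schedule_exchange(cudaStream_t);
void schedule_wait(cudaStream_t);

void synchronize_via_stream()
{
    cudaStream_t stream;
    cudaStreamCreate(&stream);

    schedule_exchange(stream);   // packing waits for prior work on `stream`
    schedule_wait(stream);       // unpacking is ordered into `stream`

    // To know when the transmission has finished, synchronize with the stream
    // that was passed to schedule_exchange()/schedule_wait() ...
    cudaStreamSynchronize(stream);

    // ... rather than relying on a library-level wait-for-everything (wait()).
    cudaStreamDestroy(stream);
}
```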
```cpp
#ifdef GHEX_CUDACC
assert(has_scheduled_exchange());
#endif
```
This doesn't belong here, or the logic is wrong elsewhere. clear() is called by complete_scheduled_exchange after the event has been reset, meaning this always fails. Should this be assert(!has_scheduled_exchange())?
You are right, it should be assert(!...); I am just wondering why the CI "passes".
Probably because asserts are disabled?
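For context, assert is compiled out whenever NDEBUG is defined, which release-type CI builds typically do; a minimal illustration:

```cpp
#include <cassert>

int main()
{
    // Aborts in a debug build; expands to nothing when NDEBUG is defined
    // (e.g. via -DNDEBUG in Release builds), so the condition is never checked.
    assert(false);
    return 0;
}
```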
```cpp
 * exchange. A user will never have to call it directly. If there was no such
 * exchange or GPU support was disabled, the function does nothing.
 *
 * \note This should be a private function, but the tests need them.
```
Why do the tests need this function?
Some test calls it (https://github.com/ghex-org/GHEX/pull/190/changes#diff-c9ca950ae3be0eda232ecea6c5e5224c87bdee091f1e3a7c9efa081badda3fc2R364), but I think this is more of a bad test.
This PR adds the schedule_exchange() and schedule_wait() functions to the communication object.
They behave similarly to the regular exchange() and wait(), but they accept an additional CUDA stream as argument.
schedule_exchange() will delay packing until all work that has been scheduled on the passed stream has finished, which removes the need for an external synchronization.
It is important to note that the function only returns after all data has been sent, which is the same behaviour as exchange().
schedule_wait() is similar to wait(): it launches the unpacking, but synchronizes it with the passed stream.
This means that any work submitted to the stream after the function has returned will not start until all unpacking has been completed.
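A hedged sketch of the intended call pattern; the kernels and the exact schedule_*() signatures are illustrative assumptions, not the verified API of this PR:

```cpp
#include <cuda_runtime.h>

// Stand-ins for the functions added by this PR; exact signatures are assumed.
void schedule_exchange(cudaStream_t);
void schedule_wait(cudaStream_t);

// Hypothetical user kernels.
__global__ void produce_boundary_data();
__global__ void consume_halos();

void halo_update(cudaStream_t stream)
{
    produce_boundary_data<<<1, 1, 0, stream>>>();   // prior GPU work on `stream`

    // Packing waits for the work already enqueued on `stream`; the call itself
    // still returns only after all data has been sent (like exchange()).
    schedule_exchange(stream);

    // Unpacking is synchronized with `stream`: work submitted to the stream
    // afterwards will not start before the unpacking has completed.
    schedule_wait(stream);

    consume_halos<<<1, 1, 0, stream>>>();           // sees the received halos
}
```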
The PR also extends the Python bindings.
The bindings are able to interpret the following Python objects as a stream (see the sketch after this list):

- None is interpreted as the default stream, i.e. nullptr.
- If the object has a __cuda_stream__ method, it is assumed to follow Nvidia's stream protocol.
- If the object has a .ptr attribute, it is assumed to follow CuPy's Stream implementation.
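A hedged sketch of how such a dispatch could look with pybind11; this is not the PR's actual binding code, and the assumption that __cuda_stream__ returns a (version, pointer) tuple is mine:

```cpp
#include <cstdint>
#include <cuda_runtime.h>
#include <pybind11/pybind11.h>

namespace py = pybind11;

// Illustrative helper, not the PR's implementation: map a Python object to a
// cudaStream_t following the three cases listed above.
cudaStream_t to_stream(const py::object& obj)
{
    if (obj.is_none()) return nullptr;  // None -> default stream

    if (py::hasattr(obj, "__cuda_stream__"))
    {
        // Assumption: the protocol returns a (version, stream_pointer) tuple.
        auto info = obj.attr("__cuda_stream__")().cast<py::tuple>();
        return reinterpret_cast<cudaStream_t>(info[1].cast<std::uintptr_t>());
    }

    if (py::hasattr(obj, "ptr"))
    {
        // CuPy-style Stream: the raw handle is exposed as an integer `.ptr`.
        return reinterpret_cast<cudaStream_t>(obj.attr("ptr").cast<std::uintptr_t>());
    }

    throw py::type_error("expected None, a __cuda_stream__ object, or a CuPy stream");
}
```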
Note:
For CPU memory, schedule_exchange() and schedule_wait() behave the exact same way as exchange() and wait(), i.e. the exchange starts to pack immediately and the wait only returns after the unpacking has finished.
If CPU and GPU memory are exchanged in the same transaction, the behaviour is a mix of both: the CPU parts are packed immediately, but the packing of the GPU memory synchronizes with the stream.
The same holds for schedule_wait(), which only returns once the CPU memory has been unpacked, but returns as soon as the unpacking of the GPU memory has been initiated.
What happens exactly also depends on whether the CPU memory is processed before the GPU memory.
Thus it is safe to mix the two, but it is not recommended.
NOTE:

- include/ghex/communication_object.hpp: adds the schedule_*() functions.
- include/ghex/device/cuda/stream.hpp: adds a (very simple) event pool.
- bindings/python/src/_pyghex/unstructured/communication_object.cpp: updates the bindings.

TODO: