- This moves the message container to RAJA to avoid dependency
issues with the required atomic operations. Currently, testing and
waiting for messages will block until the stream is synchronized.

I think the design would benefit from a view-like object that can be used in device kernels.

@bechols97, can you add an example in the examples folder? I have a use case where this may be handy. In my application, if a thread gets a negative value, I want to take note and output it at the end of the kernel by the root rank. Currently I am using printf, and every thread that encounters the negative value spews information onto the screen. To double check, would this be a use case?

Hi @artv3, yes, that could be a use case for this. I will add an example of something similar to the
- This allows the message queue to be passed to RAJA kernels
- This also allows the message queue to be allocated with pinned
memory when needed
- Currently, the example requires XNACK with HIP (the message queue should be using
pinned memory, so this needs to be looked into)

@bechols97 so one of the libraries I maintain has a failure macro that, on the host side, throws an error with useful error messages attached, which give users and developers context for why something failed. Do you think we could emulate something like this with this framework if we provided the absolute maximum size of the char array / string literal we'd like, and then passed that into your message class at the error site?

I think it would be possible to add a fixed-capacity string-like object that could be passed through this interface.

@rcarson3 Yes, the original intention behind this idea is to be used as a device-side error handler (though it was left more generic in case there are other use cases). As @MrBurmark mentioned, it would be possible to use a fixed-string object with the message handler. For string literals, you would want some type of fixed-string object / char array to store the string that is later passed to a host-side callback. In addition to the fixed-string object, any type that is trivially destructible should work as well.
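A minimal sketch of such a fixed-capacity string, assuming (per the comment above) that message arguments only need to be trivially destructible. The name `fixed_string` and its members are hypothetical, not part of the PR. The characters are stored inline, so the object has no destructor to run and can be copied into a message buffer as plain bytes; a manual copy loop is used instead of `std::strlen`/`std::memcpy` so the same code could plausibly be reused in device code.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// Hypothetical fixed-capacity string for message arguments. Storage is
// inline, so the type is trivially copyable and trivially destructible.
template <std::size_t N>
struct fixed_string
{
  static_assert(N > 0, "need room for the null terminator");

  char data[N];
  std::size_t size;

  // Copies at most N - 1 characters from s and always null-terminates.
  static fixed_string from(const char* s)
  {
    assert(s != nullptr);
    fixed_string out{};
    while (out.size + 1 < N && s[out.size] != '\0') {
      out.data[out.size] = s[out.size];
      ++out.size;
    }
    out.data[out.size] = '\0';
    return out;
  }

  const char* c_str() const { return data; }
};

// The properties the message queue is assumed to rely on.
static_assert(std::is_trivially_destructible<fixed_string<64>>::value,
              "message arguments should be trivially destructible");
static_assert(std::is_trivially_copyable<fixed_string<64>>::value,
              "inline storage keeps the type trivially copyable");
```

A failure site could then build `fixed_string<64>::from("negative value")` and pass it as a message argument; anything longer than the capacity is truncated rather than overflowing.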

The current state looks good if you're trying to handle a single kind of error.

There are use cases where we might want to handle multiple kinds of errors, each with different data, in the same loop. Does anyone else have such a use case? What do you think of a slightly more general interface that looks more like this? By moving the signature to the queues, we don't know the message sizes up front, so I'm not sure whether it makes sense to do the sizing up front or later, when we know what kinds of messages are possible.

Another use case that I'm considering is having a long-lived logger with an allocation. Then I can enqueue multiple types of messages in it while keeping the GPU running, and check for messages occasionally to avoid extra synchronizes.

Being able to support multiple error/logging messages within the same loop is definitely a use case that we would want to support. This is something that the library I help maintain uses. There are a couple of concerns with moving the callback to be a parameter of the
Just to show another option with the current interface: (Please note this example is not entirely the same and requires some additional types to be created; however, one could create a type similar to |

@bechols97, can we bring this up to date?
- Also, fixes examples to not use stream 2 memory in stream 1
- This allows messages to be sorted, filtered, etc.
…LLNL/RAJA into feature/bechols97/device_messages
- Moves internal `queue` class to be public
- Moves typedef back to struct for message args

std::cout << "\n Running RAJA omp_parallel_for_static_exec (default chunksize) vector addition...\n";

RAJA::forall<RAJA::omp_parallel_for_static_exec< >>(host, RAJA::RangeSegment(0, N),
I think you may have extra < > in the policy?
Oh, I see, it has a default chunk size that you can modify; see RAJA/include/RAJA/policy/openmp/policy.hpp, lines 242 to 244 in 1d2ed09.
The chunk size doesn't really matter in this example, so if you prefer having an explicit value in the `< >` for clarity, I don't mind updating this.
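Why the `< >` is needed at all: assuming `omp_parallel_for_static_exec` is a template whose chunk-size parameter has a default (as the linked policy.hpp lines suggest), C++ still requires an argument list to name it, even an empty one. The sketch below uses hypothetical stand-in names (`static_exec`, `static_sched`, `default_chunk_size`) rather than RAJA's actual definitions, and the default value is a placeholder.

```cpp
#include <cassert>
#include <type_traits>

// Placeholder default; RAJA's real default lives in its OpenMP policy header.
constexpr int default_chunk_size = 0;

template <int ChunkSize>
struct static_sched
{
  static constexpr int chunk_size = ChunkSize;
};

// Alias template with a defaulted parameter. `static_exec` alone does not
// name a type; `static_exec<>` (or `static_exec< >`) does.
template <int ChunkSize = default_chunk_size>
using static_exec = static_sched<ChunkSize>;

using default_policy = static_exec< >;   // default chunk size
using explicit_policy = static_exec<64>; // explicit chunk size

static_assert(!std::is_same<default_policy, explicit_policy>::value,
              "different chunk sizes give different policy types");
```

Writing an explicit value only changes readability here; both forms name a valid policy type.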
#if defined(RAJA_ENABLE_CUDA)
RAJA::resources::Cuda res_gpu1;
RAJA::resources::Cuda res_gpu2;
using EXEC_POLICY = RAJA::cuda_exec_async<GPU_BLOCK_SIZE>;
#elif defined(RAJA_ENABLE_HIP)
RAJA::resources::Hip res_gpu1;
RAJA::resources::Hip res_gpu2;
using EXEC_POLICY = RAJA::hip_exec_async<GPU_BLOCK_SIZE>;
#elif defined(RAJA_ENABLE_SYCL)
RAJA::resources::Sycl res_gpu1;
RAJA::resources::Sycl res_gpu2;
using EXEC_POLICY = RAJA::sycl_exec<GPU_BLOCK_SIZE>;

I think you can simplify this by templating the resource on the policy type. See line 60 in 1d2ed09.

For the purpose of this example, the hope was to show that this feature works with non-default resources as well as with multiple GPU resources. However, there are two examples showing multiple non-default resources. Would you prefer that I update the first example to use only the default stream (since most use cases will likely use RAJA's default stream)?
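A minimal sketch of templating the resource on the policy type. All names here (`resource_for`, `resource_for_t`, and the stand-in policy and resource structs) are hypothetical; RAJA may already provide an equivalent trait at the referenced line, and this sketch does not assume its name. With such a trait, each backend branch only selects `EXEC_POLICY`, and the resource type follows from it.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical stand-ins for the backend policies and resources.
struct cuda_resource {};
struct hip_resource {};
template <int BlockSize> struct cuda_exec_async {};
template <int BlockSize> struct hip_exec_async {};

// Trait mapping an execution policy to the resource type it runs on.
template <typename Policy>
struct resource_for;

template <int BlockSize>
struct resource_for<cuda_exec_async<BlockSize>> { using type = cuda_resource; };

template <int BlockSize>
struct resource_for<hip_exec_async<BlockSize>> { using type = hip_resource; };

template <typename Policy>
using resource_for_t = typename resource_for<Policy>::type;

// Only the policy is chosen per backend; the resource type is derived.
using EXEC_POLICY = hip_exec_async<256>;
using RES = resource_for_t<EXEC_POLICY>;
```

The multi-GPU case would then declare `RES res_gpu1; RES res_gpu2;` the same way for every backend, leaving a single `#if` chain that only picks the policy.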
};

template<typename Fn>
struct get_signature;

We should move these into a generic function_signature_helper.hpp header. See https://github.com/llnl/RAJA/pull/1949/changes#diff-72533564b1cbd49c320bfd7981489ca5e5a08143353955c70c3ef535e38fc4ccR56. Arturo recently added similar methods for deducing the index of template parameters. These types of metaprogramming utilities are broadly useful and should live in a more visible header.

It would be nice to move both to a generic header, maybe in internal. I think it's good to consolidate stuff like this so we don't end up re-implementing the methods when we need them. For example, we have type_trait helper headers like https://github.com/llnl/RAJA/blob/develop/include/RAJA/pattern/kernel/TypeTraits.hpp

Is there a preferred location for these types of metaprogramming utilities? Would https://github.com/llnl/RAJA/tree/develop/include/RAJA/util be a good place for the more generic header?

I think so, yes. See for example https://github.com/llnl/RAJA/blob/develop/include/RAJA/util/EnableIf.hpp, which is already there. I would try to move Arturo's helpers from https://github.com/llnl/RAJA/pull/1949/changes#diff-72533564b1cbd49c320bfd7981489ca5e5a08143353955c70c3ef535e38fc4ccR56 there as well, and name it FunctionSignatureUtil.hpp or something similar.
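A sketch of what a generic header for `get_signature` could contain, assuming it only needs to recover the return type and argument types of free functions, function pointers, and non-generic lambdas or functors. The member names `return_type` and `args` are hypothetical, not taken from the PR. Generic lambdas, overloaded call operators, and `noexcept` function types are not handled.

```cpp
#include <cassert>
#include <tuple>
#include <type_traits>

// Primary template: class types (lambdas, functors) are handled by
// deducing from the type of their call operator.
template <typename Fn>
struct get_signature : get_signature<decltype(&Fn::operator())> {};

// Plain function types: the base case that exposes the results.
template <typename R, typename... Args>
struct get_signature<R(Args...)>
{
  using return_type = R;
  using args = std::tuple<Args...>;
};

// Function pointers.
template <typename R, typename... Args>
struct get_signature<R (*)(Args...)> : get_signature<R(Args...)> {};

// Member call operators: non-const (mutable lambdas) and const.
template <typename C, typename R, typename... Args>
struct get_signature<R (C::*)(Args...)> : get_signature<R(Args...)> {};

template <typename C, typename R, typename... Args>
struct get_signature<R (C::*)(Args...) const> : get_signature<R(Args...)> {};
```

Callers holding a forwarding reference would apply `std::decay` before using the trait, since reference types have no `operator()` member to look up.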
~message_bus() { reset(); }

// Copy ctor/operator
message_bus(const message_bus&) = delete;

I think we might also need message_bus (message_bus) = delete;
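Whether `message_bus` itself should be movable is left open in this thread, but the design notes ask for `message_handler` to be move-only. A minimal sketch of that pattern, using a hypothetical `owning_buffer` in place of the PR's classes: once the copy operations are user-declared (even as `= delete`), the compiler no longer declares the move operations implicitly, so `std::move(a)` would select the deleted copy constructor and fail to compile. A move-only type therefore has to declare its moves itself.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <utility>

// Hypothetical owning type illustrating the move-only pattern.
class owning_buffer
{
public:
  explicit owning_buffer(std::size_t n) : m_data(new char[n]), m_size(n) {}
  ~owning_buffer() { delete[] m_data; }

  // Copies are disabled to avoid accidental copies (e.g. into a lambda).
  owning_buffer(const owning_buffer&) = delete;
  owning_buffer& operator=(const owning_buffer&) = delete;

  // Moves transfer ownership and leave the source empty.
  owning_buffer(owning_buffer&& other) noexcept
      : m_data(std::exchange(other.m_data, nullptr)),
        m_size(std::exchange(other.m_size, 0))
  {}

  owning_buffer& operator=(owning_buffer&& other) noexcept
  {
    if (this != &other) {
      delete[] m_data;
      m_data = std::exchange(other.m_data, nullptr);
      m_size = std::exchange(other.m_size, 0);
    }
    return *this;
  }

  std::size_t size() const { return m_size; }

private:
  char* m_data;
  std::size_t m_size;
};
```

If the class should be neither copyable nor movable instead, deleting the copy operations alone is enough, since no move operations are then declared.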
/// Currently, this forces a synchronize prior to calling
/// the callback function or testing if there are any messages.
///
class message_manager

Do we want this casing for the new classes, @llnl/raja-core? The kernel code all uses the MessageManager naming convention for classes.
template<typename Callable>
void subscribe(msg_id id, Callable&& c)
{
  auto callback = RAJA::msg_callback {std::forward<Callable>(c)};

It might be better to use an explicit constructor here.
{
  auto& fn_list = m_callback_map.at(id);
  auto it = std::find_if(fn_list.begin(), fn_list.end(), [](const auto& fn) {
    return typeid(Callable).hash_code() == fn->hash();

I don't know if this is how we want to be hashing values. For one, I think it's possible to have hash collisions here, so find_if could just match the first of several possible matches. I think it might be better to hash together the msg_id with std::type_index, and just make m_callback_map a std::unordered_map<HashCode, vector<msg_callback_t>>.
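A minimal sketch of a collision-safe lookup along the lines suggested, with one deliberate difference: rather than keying the map on a combined hash code, it keys on the pair (`msg_id`, `std::type_index`) and only uses the hash to pick a bucket. Because `std::unordered_map` still compares full keys with `operator==`, two different callables cannot be confused even if their hashes collide (`std::type_info::hash_code` is not guaranteed to be unique across types, whereas `std::type_index` compares the underlying `type_info` objects). The `msg_id` alias, `callback_key`, `make_key`, and the `std::function<void()>` callback type are hypothetical stand-ins for the PR's types.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <typeindex>
#include <typeinfo>
#include <unordered_map>
#include <vector>

using msg_id = int;  // hypothetical stand-in for the PR's msg_id type

// Key combining the message id with the callable's exact type.
struct callback_key
{
  msg_id id;
  std::type_index type;

  bool operator==(const callback_key& o) const
  {
    return id == o.id && type == o.type;
  }
};

struct callback_key_hash
{
  std::size_t operator()(const callback_key& k) const
  {
    // Hash combine; a collision only costs an extra comparison, because
    // lookups still check the full key with operator==.
    std::size_t h1 = std::hash<msg_id>{}(k.id);
    std::size_t h2 = std::hash<std::type_index>{}(k.type);
    return h1 ^ (h2 + 0x9e3779b9 + (h1 << 6) + (h1 >> 2));
  }
};

using callback_map =
    std::unordered_map<callback_key, std::vector<std::function<void()>>,
                       callback_key_hash>;

template <typename Callable>
callback_key make_key(msg_id id)
{
  return callback_key{id, std::type_index(typeid(Callable))};
}
```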
size_type new_sz = old_sz + msg_sz;
local_sz = old_sz; // offset to start of message
// Checks if fits in queue
if (new_sz <= m_container->m_capacity)

Do we ever resize the m_container? This could be a race condition if so.

The mpsc_queue and spsc_queue are view-like containers over the owning message_bus. Any resizing operation on message_bus forces the resource in message_bus to synchronize before resizing, and ignores any messages that are currently stored.
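Given that answer (the capacity is fixed while kernels run, and resizing only happens after a synchronize), the remaining race is between producers claiming space. A minimal host-side sketch of a race-free check-and-claim, assuming a compare-and-swap loop on the current size: a producer only advances the size if its whole message fits, so the capacity check and the claim happen as one atomic step. The `arena` type is hypothetical, `std::atomic` stands in for whatever device atomics RAJA would use, and this is not necessarily how the snippet above is implemented.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

// Hypothetical fixed-capacity byte arena shared by many producers.
// The capacity is assumed not to change while producers are running.
struct arena
{
  std::size_t capacity;
  std::atomic<std::size_t> used{0};

  explicit arena(std::size_t cap) : capacity(cap) {}

  // Claims [offset, offset + n) only if the whole message fits.
  // On failure nothing is claimed and `offset` is left unchanged.
  bool reserve(std::size_t n, std::size_t& offset)
  {
    std::size_t old_sz = used.load(std::memory_order_relaxed);
    do {
      if (old_sz + n > capacity) { return false; }
      // If another producer won the race, compare_exchange_weak reloads
      // old_sz and the capacity check is repeated with the new value.
    } while (!used.compare_exchange_weak(old_sz, old_sz + n,
                                         std::memory_order_relaxed));
    offset = old_sz;
    return true;
  }
};
```

Relaxed ordering only makes the claim itself atomic; publishing the message contents to the consumer is assumed to rely on a separate synchronization, such as the stream synchronize. Compared with a plain `fetch_add` followed by a check, this loop never advances `used` past `capacity`, so a rejected message does not consume space.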
#include "RAJA/util/resource.hpp"

/*
* Vector Addition Example

Maybe we can rename it to "RAJA::messages example" and update the description.
Summary
This PR adds a feature for device messages (i.e., storing function arguments on the device to be handled by a host callback at a later time). The original idea for the feature is better error handling on the device, with any output handled by a host callback. This moves the code from Camp to RAJA due to its dependency on atomic operations.
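To make the idea concrete, a minimal host-only sketch of "store the arguments now, call the host callback later", roughly matching the negative-value use case raised earlier in the thread. Everything here (`message_queue`, `try_post`, `drain`, `missed`) is hypothetical and much simpler than the PR: there is no device memory and no resource, and `drain` is assumed to run only after all producers have finished (the analogue of synchronizing the stream). Messages beyond the fixed capacity are dropped and counted.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <tuple>
#include <utility>

// Hypothetical fixed-capacity queue of argument tuples.
template <std::size_t Capacity, typename... Args>
class message_queue
{
public:
  // Producer side: record the arguments if a slot is free.
  bool try_post(Args... args)
  {
    std::size_t slot = m_count.fetch_add(1, std::memory_order_relaxed);
    if (slot >= Capacity) { return false; }  // full: message is dropped
    m_msgs[slot] = std::make_tuple(args...);
    return true;
  }

  // Number of messages dropped since the last drain.
  std::size_t missed() const
  {
    std::size_t n = m_count.load();
    return n > Capacity ? n - Capacity : 0;
  }

  // Consumer side: call fn(args...) for every stored message, then clear.
  template <typename Fn>
  void drain(Fn&& fn)
  {
    std::size_t n = m_count.load();
    if (n > Capacity) { n = Capacity; }
    for (std::size_t i = 0; i < n; ++i) {
      replay(fn, m_msgs[i], std::index_sequence_for<Args...>{});
    }
    m_count.store(0);
  }

private:
  // Unpacks a stored tuple into a call (a C++14 stand-in for std::apply).
  template <typename Fn, std::size_t... I>
  static void replay(Fn& fn, const std::tuple<Args...>& t,
                     std::index_sequence<I...>)
  {
    fn(std::get<I>(t)...);
  }

  std::array<std::tuple<Args...>, Capacity> m_msgs{};
  std::atomic<std::size_t> m_count{0};
};
```

In the real feature, the producer side would run inside a kernel and the storage would live in device or pinned memory; the fixed capacity is what lets producers avoid allocating.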
Design review
For the design, there are some open questions (regarding these open questions, please see the Design notes below).
Design notes
Based on discussion from a meeting:
- The message_handler class that stores the callback should be move-only to prevent accidental copies into a lambda, while supporting a view-like queue that can be copied into device kernels (as mentioned below by @MrBurmark).
Additional design notes
Based on further discussions from a meeting:
- Ability to bind args for a callback, such as source location or strings. `std::bind_front` can be used. However, if needed, this can be added in a future PR
- Store number of missed messages
- Remove duplicate messages