[DRAFT] CUDA Scan implementation #1250
for (size_t i = 0; i < 10; ++i) {
  std::cout << host_out[i] << std::endl;
}
self.op_state_.propagate_completion_signal(stdexec::set_value, d_out);
If you change `d_out` to `*d_out`, you can confirm that the data is being scanned properly. I imagine it won't compile, though, because the completion signatures are wrong.
template <class SenderId, class ReceiverId, class InitT, class Fun>
struct receiver_t
  : public __algo_range_init_fun::receiver_t<
This reuses `algorithm_base.cuh`. The `ExclusiveScan` API I used is the overload that lets you specify an initial value, so I could easily reuse this base. Nearly all of `scan.cuh` is identical to `reduce`, except for the CUB API it calls and the final return type.
The only real difference is that a reduce returns a single value, whereas a scan returns an array of data; otherwise the two are very similar.
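To make the comparison in the comment above concrete, here is a minimal host-only sketch using the C++17 standard library rather than CUB. The helper names `host_reduce`, `host_exclusive_scan`, and `scan_vs_reduce_demo` are hypothetical and are not part of this PR or of stdexec; the sketch only illustrates that both operations take the same range, initial value, and binary function, and differ in what they return.

```cpp
#include <cassert>
#include <functional>
#include <numeric>
#include <vector>

// Hypothetical helper (not from the PR): a reduce folds the whole range
// into a single value, starting from `init`.
template <class T, class Fun>
T host_reduce(const std::vector<T>& in, T init, Fun fun) {
  return std::reduce(in.begin(), in.end(), init, fun);
}

// Hypothetical helper (not from the PR): an exclusive scan takes the same
// inputs but returns one output per input element. Element i holds the
// fold of `init` with in[0..i), so out[0] == init.
template <class T, class Fun>
std::vector<T> host_exclusive_scan(const std::vector<T>& in, T init, Fun fun) {
  std::vector<T> out(in.size());
  std::exclusive_scan(in.begin(), in.end(), out.begin(), init, fun);
  return out;
}

// Small usage example; the input values are made up for illustration.
inline void scan_vs_reduce_demo() {
  const std::vector<int> in{1, 2, 3, 4};

  const int total = host_reduce(in, 0, std::plus<>{});
  const std::vector<int> prefix = host_exclusive_scan(in, 0, std::plus<>{});

  assert(total == 10);
  assert((prefix == std::vector<int>{0, 1, 3, 6}));

  // Combining the last scan output with the last input gives the reduce result.
  assert(std::plus<>{}(prefix.back(), in.back()) == total);
}
```

Because the inputs are identical, sharing a base between the two algorithms is natural; only the output type (one value versus one value per element) has to change, which is also what the completion signature needs to reflect.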
// template <class Range>
// using _set_value_t = completion_signatures<set_value_t(
//   std::vector<typename __algo_range_init_fun::binary_invoke_result_t<Range, InitT, Fun>>)>;
I'm not sure how to get the completion signatures right and am hoping for some guidance.
@gevtushenko Can you take a look?
This is just an initial skeleton of a scan implementation for CUDA. For brevity, I reused the reduce test spec to test my changes. Obviously, it will need its own spec.