Add iota kernel #1946

Merged
merged 15 commits into feature/topk from add-iota-kernel
Dec 27, 2024

oleksandr-pavlyk
Collaborator

This PR builds on top of the feature/topk branch.

It adds iota_impl in a new sort_utils.hpp file, and uses it in merge_sort.hpp, radix_sort.hpp and topk.hpp.

It also fixes a possible USM allocation leak in exception handling.
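
For readers unfamiliar with the term, an iota operation fills an output array with consecutive values. The following is a minimal host-only sketch of that idea, not the iota_impl from sort_utils.hpp: the names iota_sketch and identity_indices are hypothetical, and a plain loop stands in for the device kernel. Index-tracking sorts typically start from such an identity permutation, which is presumably why the sorting and top-k headers share it.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical host-side stand-in for an iota kernel: element i receives
// start + i * step. In a device kernel each work-item would handle its own
// subset of i values; here a plain loop does the same work serially.
template <typename IndexT>
void iota_sketch(IndexT *out,
                 std::size_t n,
                 IndexT start = IndexT{0},
                 IndexT step = IndexT{1})
{
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = start + static_cast<IndexT>(i) * step;
    }
}

// Returns the identity permutation 0, 1, ..., n-1, the usual starting point
// for a sort that must also report where each element came from.
inline std::vector<std::size_t> identity_indices(std::size_t n)
{
    std::vector<std::size_t> idx(n);
    iota_sketch(idx.data(), n);
    return idx;
}
```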

  • Have you provided a meaningful PR description?
  • Have you added a test, reproducer or referred to an issue with a reproducer?
  • Have you tested your changes locally for CPU and GPU devices?
  • Have you made sure that new changes do not introduce compiler warnings?
  • Have you checked performance impact of proposed changes?
  • Have you added documentation for your changes, if necessary?
  • Have you added your changes to the changelog?
  • If this PR is a work in progress, are you opening the PR as a draft?

github-actions bot commented Dec 20, 2024

Deleted rendered PR docs from intelpython.github.com/dpctl, latest should be updated shortly. 🤞

Array API standard conformance tests for dpctl=0.19.0dev0=py310h93fe807_336 ran successfully.
Passed: 893
Failed: 3
Skipped: 118

Array API standard conformance tests for dpctl=0.19.0dev0=py310h93fe807_337 ran successfully.
Passed: 895
Failed: 1
Skipped: 118

Array API standard conformance tests for dpctl=0.19.0dev0=py310h93fe807_338 ran successfully.
Passed: 895
Failed: 1
Skipped: 118

Array API standard conformance tests for dpctl=0.19.0dev0=py310h93fe807_339 ran successfully.
Passed: 893
Failed: 3
Skipped: 118

Array API standard conformance tests for dpctl=0.19.0dev0=py310h93fe807_340 ran successfully.
Passed: 894
Failed: 2
Skipped: 118

Until it is passed over to the host function, and
unique_ptr's ownership is released.

Also reduced allocation sizes, where too much was being
allocated.

Introduce smart_malloc_device, etc.

The smart_malloc_device<T>(count, q) makes USM allocation
and returns a unique_ptr<T, USMDeleter> which owns the
allocation. The function throws an exception (std::runtime_error)
if USM allocation is not successful.
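
The allocate-or-throw pattern described above can be sketched without a SYCL runtime. This is a host-only analog under stated assumptions: std::malloc and std::free stand in for the USM allocation and deallocation calls, HostDeleter is a hypothetical stand-in for USMDeleter, and smart_malloc_host_sketch is not a dpctl function. Only the return type shape and the std::runtime_error on failure are taken from the description.

```cpp
#include <cstddef>
#include <cstdlib>
#include <memory>
#include <stdexcept>

// Hypothetical deleter standing in for USMDeleter. A real USM deleter would
// need the queue or context in order to free device memory; this one simply
// calls std::free.
struct HostDeleter
{
    void operator()(void *p) const noexcept { std::free(p); }
};

// Host-only analog of smart_malloc_device<T>(count, q): allocate storage for
// `count` elements (count > 0 assumed) and hand ownership to a unique_ptr,
// throwing std::runtime_error if the allocation fails.
template <typename T>
std::unique_ptr<T, HostDeleter> smart_malloc_host_sketch(std::size_t count)
{
    T *p = static_cast<T *>(std::malloc(count * sizeof(T)));
    if (p == nullptr) {
        throw std::runtime_error("allocation failed");
    }
    return std::unique_ptr<T, HostDeleter>(p);
}
```

Because the owner is a unique_ptr, an exception thrown between the allocation and the point where ownership is handed off still frees the memory during stack unwinding.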

Introduce async_smart_free.

This function intends to replace use of host_task submissions
to manage USM temporary deallocations.

The usage is as follows:

```
  // returns unique_ptr
  auto alloc_owner = smart_malloc_device<T>(count, q);

  // get raw pointer for use in kernels
  T *data = alloc_owner.get();

  [..SNIP..]

  // submit host_task that releases the unique_ptr
  // after the host task was successfully submitted
  // and ownership of USM allocation is transferred to
  // the said host task
  sycl::event ht_ev =
      async_smart_free(q,
      dependent_events,
      alloc_owner);

  [...SNIP...]
```
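
The ownership hand-off in the usage above can be illustrated with a host-only sketch. Everything here is hypothetical: TaskQueue stands in for a queue that runs host tasks later, CountingDeleter stands in for USMDeleter, and deferred_free_sketch is not the async_smart_free implementation. The point it tries to show is the ordering: the unique_ptr gives up ownership only after the submission has succeeded.

```cpp
#include <cstdlib>
#include <functional>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for a queue that executes submitted tasks later.
struct TaskQueue
{
    std::vector<std::function<void()>> pending;

    void submit(std::function<void()> task)
    {
        pending.push_back(std::move(task));
    }

    void run_all()
    {
        for (auto &task : pending) {
            task();
        }
        pending.clear();
    }
};

// Hypothetical deleter that also counts deallocations, for illustration.
struct CountingDeleter
{
    int *freed;
    void operator()(int *p) const
    {
        std::free(p);
        ++(*freed);
    }
};

using Owner = std::unique_ptr<int, CountingDeleter>;

// The task captures the raw pointer and a copy of the deleter. If submit()
// throws, `owner` still owns the allocation and frees it on unwinding; only
// after a successful submit() is ownership released to the task.
inline void deferred_free_sketch(TaskQueue &q, Owner &owner)
{
    int *raw = owner.get();
    CountingDeleter del = owner.get_deleter();
    q.submit([raw, del]() { del(raw); });
    owner.release(); // the queued task now owns the allocation
}
```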

Array API standard conformance tests for dpctl=0.19.0dev0=py310h93fe807_341 ran successfully.
Passed: 894
Failed: 2
Skipped: 118

Array API standard conformance tests for dpctl=0.19.0dev0=py310h93fe807_338 ran successfully.
Passed: 894
Failed: 2
Skipped: 118

Array API standard conformance tests for dpctl=0.19.0dev0=py310h93fe807_339 ran successfully.
Passed: 894
Failed: 2
Skipped: 118

Array API standard conformance tests for dpctl=0.19.0dev0=py310h93fe807_340 ran successfully.
Passed: 894
Failed: 2
Skipped: 118

Factored out map_back_impl, which projects a flat index to a
row-wise index.

Removed dead code excluded by preprocessor conditional.
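
One plausible reading of the flat-to-row-wise projection is sketched below. It assumes a C-contiguous batch of rows of equal length row_size, so that the position within a row is the flat index modulo row_size. The name map_back_sketch and its signature are hypothetical, and the actual map_back_impl may differ.

```cpp
#include <cstddef>

// Assumption: the data is a C-contiguous batch of rows, each of length
// row_size, and flat_indices holds positions in the flattened array.
// The position within its row is then flat_index % row_size.
template <typename IndexT>
void map_back_sketch(const IndexT *flat_indices,
                     IndexT *row_indices,
                     std::size_t n,
                     std::size_t row_size)
{
    for (std::size_t i = 0; i < n; ++i) {
        const std::size_t flat = static_cast<std::size_t>(flat_indices[i]);
        row_indices[i] = static_cast<IndexT>(flat % row_size);
    }
}
```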

Array API standard conformance tests for dpctl=0.19.0dev0=py310h93fe807_341 ran successfully.
Passed: 894
Failed: 2
Skipped: 118

@ndgrigorian
Collaborator

Ping @AlexanderKalistratov
If the utilities added here (smart_malloc_* functions) are merged in, they can be reused by dpnp and replace the similar code in choose PR

Replaced it with a hand-written implementation of ceil_log2(n),
such that n <= (decltype(n){1} << ceil_log2(n)) is true for all
positive values of `n` in the range.
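
A minimal sketch of a function with the stated property follows. It is not the code from this PR: the name ceil_log2_sketch and the bit-counting approach are assumptions, and it requires C++14 for the loop in a constexpr function.

```cpp
#include <type_traits>

// Returns the smallest k such that n <= (T{1} << k), for n >= 1.
// Works by counting the bits needed to represent n - 1, which avoids
// shifting a probe value left and overflowing for large n.
template <typename T>
constexpr int ceil_log2_sketch(T n)
{
    static_assert(std::is_unsigned<T>::value,
                  "sketch assumes an unsigned integer type");
    int k = 0;
    T m = n - 1; // n == 1 gives m == 0, hence k == 0
    while (m > 0) {
        m >>= 1;
        ++k;
    }
    return k;
}
```

For values of n above half the range of T, the result equals the bit width of T, so the shift in the stated inequality would itself overflow; a caller checking the property should stay below that bound.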

Array API standard conformance tests for dpctl=0.19.0dev0=py310h93fe807_342 ran successfully.
Passed: 895
Failed: 1
Skipped: 118

Array API standard conformance tests for dpctl=0.19.0dev0=py310h93fe807_344 ran successfully.
Passed: 892
Failed: 4
Skipped: 118

One asserts that at least one unique pointer is specified.
The other asserts that the specified arguments are unique pointers
with USMDeleter.
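
The two compile-time checks can be sketched as follows. The commit title is not shown here, so which function carries these assertions is an assumption; SketchDeleter, is_owned_ptr, all_owned and count_owners_sketch are hypothetical names, with SketchDeleter standing in for USMDeleter.

```cpp
#include <cstddef>
#include <memory>
#include <type_traits>

// Hypothetical stand-in for USMDeleter.
struct SketchDeleter
{
    void operator()(void *) const noexcept {}
};

// True only for std::unique_ptr<T, SketchDeleter>.
template <typename T>
struct is_owned_ptr : std::false_type {};

template <typename T>
struct is_owned_ptr<std::unique_ptr<T, SketchDeleter>> : std::true_type {};

// True when every type in the pack satisfies is_owned_ptr (after decay).
template <typename... Ts>
struct all_owned : std::true_type {};

template <typename T, typename... Rest>
struct all_owned<T, Rest...>
    : std::integral_constant<bool,
                             is_owned_ptr<std::decay_t<T>>::value &&
                                 all_owned<Rest...>::value> {};

// Variadic function guarded by both assertions; it only reports how many
// owners it was given.
template <typename... UniquePtrs>
std::size_t count_owners_sketch(const UniquePtrs &...)
{
    static_assert(sizeof...(UniquePtrs) > 0,
                  "at least one unique pointer must be specified");
    static_assert(all_owned<UniquePtrs...>::value,
                  "arguments must be unique_ptr with the expected deleter");
    return sizeof...(UniquePtrs);
}
```

Calling count_owners_sketch() with no arguments, or with a raw pointer or a unique_ptr that uses the default deleter, fails to compile with the corresponding message.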

Array API standard conformance tests for dpctl=0.19.0dev0=py310h93fe807_346 ran successfully.
Passed: 894
Failed: 2
Skipped: 118

@oleksandr-pavlyk
Collaborator Author

I suggest we exclude these failing dpt.int8 and dpt.int16 tests in the workflow, now that we know they are caused by a CPU driver issue on certain architectures.

Collaborator

@ndgrigorian ndgrigorian left a comment

This LGTM, we can merge this into the topk branch and drop the test file PR, then remove the commit that adds test_top_k_largest_1d_radix_i1

@oleksandr-pavlyk oleksandr-pavlyk merged commit e161ea1 into feature/topk Dec 27, 2024
41 of 52 checks passed
@oleksandr-pavlyk oleksandr-pavlyk deleted the add-iota-kernel branch December 27, 2024 18:57
2 participants