Implements top_k in dpctl.tensor #1921

Merged
merged 30 commits into from
Jan 5, 2025

@ndgrigorian ndgrigorian commented Dec 4, 2024

This PR implements the top_k function, following the proposal in the array API spec.

Radix sort and merge sort are used. Modified merge-sort kernels are also introduced; they sort the array in chunks and write the k largest or smallest values out to a temporary.
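To illustrate the chunked approach described above, here is a minimal pure-Python sketch. It is not the SYCL kernel from this PR: the function name `top_k_chunked`, the `chunk_size` parameter, and the tie-breaking by index are assumptions made for illustration only.

```python
from typing import List, Sequence, Tuple


def top_k_chunked(
    values: Sequence[float],
    k: int,
    chunk_size: int = 4,
    largest: bool = True,
) -> Tuple[List[float], List[int]]:
    """Return the k largest (or smallest) values and their indices.

    Hypothetical helper: each chunk is sorted independently and only its
    first k entries are kept in a temporary list, which is then sorted.
    """
    if not 0 <= k <= len(values):
        raise ValueError("k must satisfy 0 <= k <= len(values)")
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")

    def sort_key(pair):
        value, index = pair
        # Ties are broken by the original index (an assumption of this sketch).
        return (-value, index) if largest else (value, index)

    candidates = []  # plays the role of the temporary
    for start in range(0, len(values), chunk_size):
        stop = min(start + chunk_size, len(values))
        chunk = sorted(((values[i], i) for i in range(start, stop)), key=sort_key)
        candidates.extend(chunk[:k])  # keep at most k entries per chunk

    top = sorted(candidates, key=sort_key)[:k]
    return [v for v, _ in top], [i for _, i in top]
```

Any element among the overall top k must also be among the top k of its own chunk, which is why discarding the rest of each chunk is safe.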

  • Have you provided a meaningful PR description?
  • Have you added a test, reproducer or referred to an issue with a reproducer?
  • Have you tested your changes locally for CPU and GPU devices?
  • Have you made sure that new changes do not introduce compiler warnings?
  • Have you checked performance impact of proposed changes?
  • Have you added documentation for your changes, if necessary?
  • Have you added your changes to the changelog?
  • If this PR is a work in progress, are you opening the PR as a draft?

github-actions bot commented Dec 4, 2024

Deleted rendered PR docs from intelpython.github.com/dpctl, latest should be updated shortly. 🤞

github-actions bot commented Dec 4, 2024

Array API standard conformance tests for dpctl=0.19.0dev0=py310hdf72452_296 ran successfully.
Passed: 895
Failed: 1
Skipped: 118

coveralls commented Dec 4, 2024

Coverage Status

coverage: 87.716% (+0.06%) from 87.659%
when pulling 8c6abf5 on feature/topk
into f7cb1b1 on master.

@oleksandr-pavlyk
Collaborator

@ndgrigorian Please add top_k to the docs/doc_sources/api_reference/dpctl/tensor.sorting_functions.rst:

diff --git a/docs/doc_sources/api_reference/dpctl/tensor.sorting_functions.rst b/docs/doc_sources/api_reference/dpctl/tensor.sorting_functions.rst
index ae1605d988..ef20f4654c 100644
--- a/docs/doc_sources/api_reference/dpctl/tensor.sorting_functions.rst
+++ b/docs/doc_sources/api_reference/dpctl/tensor.sorting_functions.rst
@@ -10,3 +10,4 @@ Sorting functions

    argsort
    sort
+   top_k

github-actions bot commented Dec 5, 2024

Array API standard conformance tests for dpctl=0.19.0dev0=py310hdf72452_295 ran successfully.
Passed: 894
Failed: 2
Skipped: 118

github-actions bot commented Dec 6, 2024

Array API standard conformance tests for dpctl=0.19.0dev0=py310hdf72452_297 ran successfully.
Passed: 895
Failed: 1
Skipped: 118

Array API standard conformance tests for dpctl=0.19.0dev0=py310h93fe807_326 ran successfully.
Passed: 895
Failed: 1
Skipped: 118

Array API standard conformance tests for dpctl=0.19.0dev0=py310h93fe807_327 ran successfully.
Passed: 894
Failed: 2
Skipped: 118

@ndgrigorian ndgrigorian marked this pull request as ready for review December 12, 2024 09:24

Array API standard conformance tests for dpctl=0.19.0dev0=py310h93fe807_331 ran successfully.
Passed: 894
Failed: 2
Skipped: 118

@ndgrigorian ndgrigorian force-pushed the feature/topk branch 2 times, most recently from 8bcb100 to 8f38b80 Compare December 13, 2024 01:00

Array API standard conformance tests for dpctl=0.19.0dev0=py310h93fe807_331 ran successfully.
Passed: 895
Failed: 1
Skipped: 118

Array API standard conformance tests for dpctl=0.19.0dev0=py310h93fe807_331 ran successfully.
Passed: 894
Failed: 2
Skipped: 118

Array API standard conformance tests for dpctl=0.19.0dev0=py310h93fe807_388 ran successfully.
Passed: 893
Failed: 3
Skipped: 118

@ndgrigorian ndgrigorian force-pushed the feature/topk branch 4 times, most recently from 84d1388 to 809cb70 Compare December 28, 2024 01:13

Array API standard conformance tests for dpctl=0.19.0dev0=py310h93fe807_391 ran successfully.
Passed: 894
Failed: 2
Skipped: 118

Array API standard conformance tests for dpctl=0.19.0dev0=py310h93fe807_386 ran successfully.
Passed: 894
Failed: 2
Skipped: 118

Array API standard conformance tests for dpctl=0.19.0dev0=py310h93fe807_385 ran successfully.
Passed: 895
Failed: 1
Skipped: 118

Array API standard conformance tests for dpctl=0.19.0dev0=py310h93fe807_386 ran successfully.
Passed: 895
Failed: 1
Skipped: 118

Array API standard conformance tests for dpctl=0.19.0dev0=py310h93fe807_387 ran successfully.
Passed: 894
Failed: 2
Skipped: 118

@ndgrigorian
Collaborator Author

@oleksandr-pavlyk
I've added a skip for both the smallest and largest tests with int8

@oleksandr-pavlyk
Collaborator

Good to see the CI green again!

I was suggesting to only skip it in the test command used in the workflow so that we can still provide a reproducer to the CPU team.

I was thinking we could add a file of tests to skip, and pass it as argument to pytest.

I think this is the approach taken by dpnp.
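As a rough sketch of the skip-file idea, the helper below turns the contents of a plain-text file of test IDs into `--deselect` arguments for pytest. The helper name `deselect_args`, the file format (one node ID per line, `#` comments), and the test IDs in the usage note are hypothetical; this is not the mechanism dpnp actually uses.

```python
from typing import List


def deselect_args(skip_file_text: str) -> List[str]:
    """Build pytest ``--deselect`` arguments from the text of a skip file.

    Assumed format: one pytest node ID per line; blank lines and anything
    after a ``#`` are ignored.
    """
    args = []
    for line in skip_file_text.splitlines():
        entry = line.split("#", 1)[0].strip()
        if entry:
            args.append(f"--deselect={entry}")
    return args
```

With a file containing, say, `tests/test_top_k.py::test_largest[int8]`, the workflow could append the returned arguments to its pytest command line, while a plain local `pytest` run would still execute the tests and serve as a reproducer.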

oleksandr-pavlyk and others added 18 commits January 4, 2025 17:16
Until it is passed over to the host function, and
unique_ptr's ownership is released.

Also reduced allocation sizes, where too much was being
allocated.

Introduce smart_malloc_device, etc.

The smart_malloc_device<T>(count, q) function makes a USM allocation
and returns a unique_ptr<T, USMDeleter> which owns the
allocation. The function throws an exception (std::runtime_error)
if the USM allocation is not successful.

Introduce async_smart_free.

This function is intended to replace the use of host_task submissions
to manage deallocation of USM temporaries.

The usage is as follows:

```
  // returns unique_ptr
  auto alloc_owner = smart_malloc_device<T>(count, q);

  // get raw pointer for use in kernels
  T *data = alloc_owner.get();

  [..SNIP..]

  // submit host_task that releases the unique_ptr
  // after the host task was successfully submitted
  // and ownership of USM allocation is transferred to
  // the said host task
  sycl::event ht_ev =
      async_smart_free(q,
      dependent_events,
      alloc_owner);

  [...SNIP...]
```
Replaced three duplicates of the same kernel with calls to
this function.
Factored out map_back_impl, which projects indexing from a flat index to a
row-wise index.
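A minimal sketch of what mapping a flat index back to a row-wise index can look like, assuming a row-major layout where every row has `row_size` elements. The function name `map_back` and its signature are hypothetical and do not reproduce the C++ `map_back_impl` kernel.

```python
from typing import List


def map_back(flat_indices: List[int], row_size: int) -> List[int]:
    """Convert flat indices into positions within their own row.

    Assumes row-major storage, so flat index f lives in row f // row_size
    at position f % row_size.
    """
    if row_size <= 0:
        raise ValueError("row_size must be positive")
    return [flat % row_size for flat in flat_indices]
```

For rows of five elements, flat index 7 is the third element of the second row, so it maps back to position 2.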

Removed dead code excluded by preprocessor conditional.
Replaced it with a hand-written implementation of ceil_log2(n),
such that n <= (decltype(n){1} << ceil_log2(n)) is true for all
positive values of `n` in the range.
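The property stated above can be checked with a short Python sketch. This is an illustration of the invariant, not a port of the C++ implementation in the PR; the loop-based body is an assumption about what "hand-written" could look like.

```python
def ceil_log2(n: int) -> int:
    """Return the smallest exponent e such that n <= (1 << e), for n >= 1."""
    if n <= 0:
        raise ValueError("n must be positive")
    exponent = 0
    # Grow the power of two until it is no smaller than n.
    while (1 << exponent) < n:
        exponent += 1
    return exponent
```

For positive integers this agrees with `(n - 1).bit_length()`, which offers a quick cross-check.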
Add check of computed against expected indices
One asserts that at least one unique pointer is specified.
Another asserts that the specified arguments are unique pointers with
USMDeleter.
gid-lane_id is already a multiple of sg_size.
Change kernel to process a few data elements per work-item.
Counters cannot exceed uint16_t max, because the kernel
assumes that the number of elements to sort fits into uint16_t.
The change reduces the kernel SLM footprint.

Also, remove use of std::move, uint16_t->std::uint16_t, etc

Replace size_t->std::size_t, uint32_t->std::uint32_t

Use `if constexpr` in order-preserving-cast for better readability.
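For readers unfamiliar with order-preserving casts in radix sorting, the sketch below shows the common textbook technique: map signed integers and floats to unsigned keys whose unsigned ordering matches the original ordering. The function names are hypothetical, and the cast used in the PR's C++ code may differ in detail.

```python
import struct


def order_preserving_key_i8(x: int) -> int:
    """Map a signed 8-bit integer to an unsigned 8-bit key with the same order."""
    if not -128 <= x <= 127:
        raise ValueError("x must fit in a signed 8-bit integer")
    # Flipping the sign bit moves negatives below non-negatives.
    return (x & 0xFF) ^ 0x80


def order_preserving_key_f32(x: float) -> int:
    """Map a float32 value to an unsigned 32-bit key with the same order."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    if bits & 0x80000000:
        # Negative: flip all bits so larger magnitudes sort first.
        return bits ^ 0xFFFFFFFF
    # Non-negative: set the sign bit so these sort above all negatives.
    return bits | 0x80000000
```

Sorting by these keys as unsigned integers yields the same order as sorting the original values. NaN handling is left out of this sketch.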
The team developing OpenCL:CPU device runtime and compiler was notified.
See CMPLRLLVM-64592

Once fixed, the work-around should be removed.
was applied in C++.

Add tests for 2d input arrays, for axis=0 and axis=1

Add a test for non-contiguous input, 0d input, validation

100% coverage of top_k function implementation achieved
@ndgrigorian
Collaborator Author

> I'd suggest rebasing the branch on top of the targeted base branch to remove two cherry-picked commits fixing the workflow for building with nightly DPC++ bundle

Done. I will add top_k to the changelog.

github-actions bot commented Jan 5, 2025

Array API standard conformance tests for dpctl=0.19.0dev0=py310h93fe807_395 ran successfully.
Passed: 892
Failed: 4
Skipped: 118

@ndgrigorian ndgrigorian changed the title from "Implements top_k functions in dpctl.tensor" to "Implements top_k in dpctl.tensor" Jan 5, 2025

github-actions bot commented Jan 5, 2025

Array API standard conformance tests for dpctl=0.19.0dev0=py310h93fe807_396 ran successfully.
Passed: 893
Failed: 3
Skipped: 118

@ndgrigorian ndgrigorian merged commit 3a1a7c5 into master Jan 5, 2025
58 of 61 checks passed
@ndgrigorian ndgrigorian deleted the feature/topk branch January 5, 2025 23:06