Merge of 0.15.1dev2 tag to gold/2021 #1471

Merged
merged 100 commits into from
Nov 9, 2023

Conversation

oleksandr-pavlyk
Collaborator

This PR merges development milestone 0.15.1dev2, complete with reduction functions and statistical functions, into gold/2021 (golden old days) for the purpose of building the milestone and uploading it to the internal package channel.

  • Have you provided a meaningful PR description?
  • Have you added a test, reproducer or referred to an issue with a reproducer?
  • Have you tested your changes locally for CPU and GPU devices?
  • Have you made sure that new changes do not introduce compiler warnings?
  • Have you checked performance impact of proposed changes?
  • If this PR is a work in progress, are you opening the PR as a draft?

oleksandr-pavlyk and others added 30 commits September 27, 2023 11:14
Fix upload_linux/upload_windows steps
* Implements necessary sycl utilities for custom reductions

* Implements dpctl.tensor.max and dpctl.tensor.min

* Adds tests for min and max

* Reductions now set max_wg to the minimum of the max work group size and 2048
- This prevents running out of resources when using local memory on CPU

* max and min nan propagation fixed for CPU devices
- drops use of fetch_max/fetch_min for floats, which do not handle nans correctly

* Tweak to test_reduction_kernels

* Implements dpctl.tensor.argmax and argmin

* Tests for argmin and argmax

Also fixes argmin and argmax for scalar inputs

* Argmin and argmax now handle identities correctly

Adds a test for this behavior

Fixed a typo in argmin and argmax causing shared local memory variant to be used for more types than expected

* Replaced `std::min` with `idx_reduction_op_`

* reductions now well-behaved for size-zero arrays
- comparison and search reductions will throw an error in this case
- slips in change to align sum signature with array API spec

* removed unnecessary copies in reduction templates

* Refactors sum to use generic reduction templates

* Sum now uses a generic Python API

* Docstrings added for argmax, argmin, max, and min

* Small reduction clean-ups

Removed unnecessary copies in custom_reduce_over_group

Sequential reduction now casts before calling operator (makes behavior explicit rather than implicit)

* Added test for argmin with keepdims=True

* Added a test for raised errors in reductions

Also removed unused `_usm_types` in `test_tensor_sum`

* Removed `void` overloads from reduction utilities

These were unused by dpctl

* Added missing include, Identity to use has_known_identity

Implementation of Identity trait should call sycl::known_identity
if trait sycl::has_known_identity is a true_type.

Added IsMultiplies, and identity value for it, since sycl::known_identity
for multiplies is only defined for real-valued types.

* Adding functor factories for product over axis

* Added Python API for _prod_over_axis

* Common reduction template takes functions to test if atomics are applicable

Passing these function pointers around allows atomics to be turned off altogether
if desired.

A custom trait checks whether reduce_over_group can be used. This makes it possible to
work around a bug, or to switch to custom code for reduction over a group if desired.

This custom trait works around an issue where sycl::reduce_over_group returns an
incorrect result for the sycl::multiplies operator with 64-bit integral types.

* Defined dpctl.tensor.prod

Also tweaked docstring for sum.

* Added tests for dpt.prod, removed uses of numpy

* Corrected prod docstring

Small tweaks to sum, min, and max docstrings

---------

Co-authored-by: Oleksandr Pavlyk <oleksandr.pavlyk@intel.com>
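
The identity, zero-size, and NaN-handling behavior listed in the commit above can be illustrated with a small pure-Python sketch. This is not dpctl code: `IDENTITY`, `nan_max`, and `reduce_1d` are hypothetical names chosen for illustration, and plain lists stand in for `usm_ndarray`.

```python
import math
import operator

# Hypothetical table of reduction identities (illustration only,
# not dpctl's Identity trait).
IDENTITY = {operator.add: 0, operator.mul: 1}


def nan_max(a, b):
    """Binary max that propagates NaN, unlike a plain comparison."""
    if math.isnan(a):
        return a
    if math.isnan(b):
        return b
    return a if a >= b else b


def reduce_1d(op, values):
    """Reduce a flat list with `op`, starting from its identity if known."""
    if op in IDENTITY:
        acc = IDENTITY[op]
        for v in values:
            acc = op(acc, v)
        return acc
    # Comparison reductions have no identity in this sketch,
    # so a zero-size input is an error.
    if not values:
        raise ValueError("reduction over a zero-size input has no identity")
    acc = values[0]
    for v in values[1:]:
        acc = op(acc, v)
    return acc
```

With this model, an empty sum yields 0 and an empty product yields 1, while an empty `nan_max` reduction raises, mirroring the split between arithmetic and comparison reductions described above.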
* Implements flat overload for repeat

Adds tests for new functionality

* repeat `repeats` parameter relaxed to permit lists and ranges

Docstring has been adjusted to reflect changes to `axis` as well as new `repeats` types

Corrected a bug in the behavior of `repeat` for size 1 `repeats` Python sequences

* Fixed repeat error syntax for `repeats` array with ndim > 1
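
A rough pure-Python model of the relaxed `repeats` handling follows. It assumes NumPy-like semantics, which the commit does not spell out: an integer or a length-1 sequence applies to every element, and a longer sequence must match the input length. `repeat_flat` is a hypothetical helper, not the dpctl function.

```python
def repeat_flat(values, repeats):
    """Repeat each element of a flat list.

    `repeats` may be an int, a list, or a range.
    """
    n = len(values)
    if isinstance(repeats, int):
        counts = [repeats] * n
    else:
        counts = list(repeats)
        if len(counts) == 1:
            # Assumed behavior: a size-1 sequence is broadcast to all elements.
            counts = counts * n
        elif len(counts) != n:
            raise ValueError("`repeats` must have length 1 or match the input")
    if any(c < 0 for c in counts):
        raise ValueError("`repeats` must be non-negative")
    out = []
    for v, c in zip(values, counts):
        out.extend([v] * c)
    return out
```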
[CI] Fix test_linux step in conda-packages workflow
https://github.com/intel/llvm/blob/sycl/sycl/doc/extensions/supported/sycl_ext_intel_device_info.md

This includes HW characteristics for Intel Level-Zero GPU devices as well as
access to PCI device-identifier.
Adds a test for _is_gen9 helper utility useful for skipping
tests known to fail on Gen9.

Adds a test for intel_device_info function.
Test that descriptor names do not have typos.
1d variant of repeat was not passed host task event dependency for allocating shapes and strides on the device.
This caused sporadic segfaults, where the kernel would attempt to access unallocated device data.
Removed cmake/IntelDPCPP.cmake, vendored cmake/IntelSYCL.cmake script

Changed project's CMake scripts to use IntelSYCL. Renamed
libsyclinterface/cmake/modules/FindIntelSycl.cmake to
libsyclinterface/cmake/modules/FindIntelSyclCompiler.cmake to
avoid possible name conflict on OS with case insensitive FS
Removed -fsycl for default linking options on Windows. The option
is added as needed.
Set a variable in public CI to override the use of interprocedural optimization,
to avoid an insufficient resources failure during compilation
on Windows.
oleksandr-pavlyk and others added 27 commits October 26, 2023 19:56
…ffset-handling

Fix reduction contig impl offset handling
* max and min now use MinMaxAtomicSupportFactory

These functions were using ArithmeticAtomicSupportFactory, which disables atomics for floating point types

* Resolves #1455

This issue was caused by a typo: where the `axis0` kernels
for tree and atomic reductions should have been called, the `axis1` kernel
was called instead

* Adds tests for #1455 resolution
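
As a pure-Python analogy for why calling the `axis1` variant in place of the `axis0` variant matters, the sketch below reduces a nested list over each axis. The function names are hypothetical and unrelated to the C++ kernels; nested lists stand in for 2D arrays.

```python
def sum_axis0(rows):
    """Reduce over axis 0 of a 2D nested list: one result per column."""
    return [sum(col) for col in zip(*rows)]


def sum_axis1(rows):
    """Reduce over axis 1 of a 2D nested list: one result per row."""
    return [sum(row) for row in rows]


def sum_over_axis(rows, axis):
    """Dispatch to the variant that matches the requested axis."""
    if axis == 0:
        return sum_axis0(rows)
    if axis == 1:
        return sum_axis1(rows)
    raise ValueError("axis must be 0 or 1 for a 2D input")
```

For a square input the two variants return results of the same length but with different values, so a dispatch mix-up of this kind can only be caught by checking values, not shapes.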
Closes gh-1457

```

In [1]: import dpctl.tensor as dpt

In [2]: dpt.asnumpy([1,2,3])
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[2], line 1
----> 1 dpt.asnumpy([1,2,3])

File ~/repos/dpctl/dpctl/tensor/_copy_utils.py:185, in asnumpy(usm_ary)
    169 def asnumpy(usm_ary):
    170     """
    171     asnumpy(usm_ary)
    172
   (...)
    183             of `usm_ary`
    184     """
--> 185     return _copy_to_numpy(usm_ary)

File ~/repos/dpctl/dpctl/tensor/_copy_utils.py:40, in _copy_to_numpy(ary)
     38 def _copy_to_numpy(ary):
     39     if not isinstance(ary, dpt.usm_ndarray):
---> 40         raise TypeError(
     41             f"Expected dpctl.tensor.usm_ndarray, got {type(ary)}"
     42         )
     43     nb = ary.usm_data.nbytes
     44     hh = dpm.MemoryUSMHost(nb, queue=ary.sycl_queue)

TypeError: Expected dpctl.tensor.usm_ndarray, got <class 'list'>

In [3]: quit
```
#1462)

* Fixes correctness regression in search functions

``py_search_over_axis`` no longer calls the ``axis1`` contiguous variant

``py_search_over_axis`` now only calls ``axis0`` variant wh

* Adds tests for fixed search reduction behavior
1. Renamed misspelled variable
2. If reduction_nelems is small, use SequentialReductionKernel
   for tree-reductions, as is done for atomic reductions
3. Tweak scaling down logic for moderately-sized number of elements
   to reduce.

   We should also use max_wg if the iter_nelems is very small (one),
   since choosing max_wg for large iter_nelems may lead to under-
   utilization of GPU.
_tensor_impl continues holding constructors, where, clip

_tensor_elementwise_impl holds elementwise functions
_tensor_reductions_impl holds reduction functions.
Added stable API to retrieve implementation functions in each elementwise
function class instance to allow `dpnp` to access that information using
stable API.
…at types

Added entries for float and double types to TypePairSupportDataForCompReductionAtomic
as spotted by @ndgrigorian in the PR review.

Also moved comments around.
This removes use of dpnp.matmul from the example, making this example
self-contained.
…ts (#1464)

* Adds SequentialSearchReduction functor to search reductions

* Search reductions use correct branch for float16

constexpr branch logic accounted for floating point types but not sycl::half,
which meant NaNs were not propagating for float16 data
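
A sequential search reduction with NaN propagation might look like the pure-Python sketch below. It assumes NumPy-style semantics, where the index of the first NaN is returned; the commit states only that NaNs should propagate. `sequential_argmax` is a hypothetical name, not the `SequentialSearchReduction` functor itself.

```python
import math


def sequential_argmax(values):
    """Index of the first maximum; the first NaN wins if any NaN is present."""
    if not values:
        raise ValueError("argmax of a zero-size input is undefined")
    best_idx = 0
    best_val = values[0]
    for idx in range(1, len(values)):
        if math.isnan(best_val):
            break  # a NaN was already found, so it is the result
        val = values[idx]
        # Strict comparison keeps the first occurrence on ties.
        if math.isnan(val) or val > best_val:
            best_idx, best_val = idx, val
    return best_idx
```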
…duction

Improve performance of reduction for small number of elements to reduce for types where tree-reduction is needed
Removed mention of dtype kwarg in usage line
Function _reduce_over_axis promotes the input array to the requested
result data type and carries out the reduction computation in that
data type. This is done in dtype if the implementation supports it.

If the implementation does not support the requested dtype, we reduce
in the default_dtype and cast to the requested dtype afterwards.
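
The two paths described above can be sketched in pure Python, with the built-in types `int`, `float`, and `complex` standing in for array data types. `sum_in_dtype` and its `supported` argument are hypothetical and do not reflect the signature of `_reduce_over_axis`.

```python
def sum_in_dtype(values, res_dtype, supported, default_dtype):
    """Sum in `res_dtype` if supported, else sum in `default_dtype` and cast."""
    if res_dtype in supported:
        # Promote the input first, then reduce in the requested type.
        acc = res_dtype(0)
        for v in values:
            acc = acc + res_dtype(v)
        return acc
    # Fallback: reduce in the default type, then cast the result.
    acc = default_dtype(0)
    for v in values:
        acc = acc + default_dtype(v)
    return res_dtype(acc)
```

For example, `sum_in_dtype([1, 2, 3], complex, {int, float}, int)` takes the fallback path: it sums as `int` and converts the total to `complex` only at the end.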
Fix for gh-1468 in arithmetic reduction when type promotion is needed
…hape-is-integral-numpy-scalar

Fix usm_ndarray ctor when shape is integral numpy scalar
* Adds __array_namespace_info__ inspection utility

This inspection utility is coming to the array API specification in the near future

* Set __array_api_version__ to "2022.12"

* Remove --ci from array API conformity workflow

* Adds __array_namespace_info__ docstrings

Disallows dtypes for `kind` kwarg in __array_namespace_info__().dtypes

Removes `float16` from dtypes listed by __array_namespace_info__ as per spec

Permits dpctl.tensor.Device objects in device keyword arguments in array API inspection utilities

* Adds tests for array API inspection
* Resolves gh-1456

Tree reductions now populate destination with the identity when reducing over
zero-size axes. As a result, logic was removed for handling zero-size axes.

``argmax``, ``argmin``, ``max``, and ``min`` still raise an error for
zero-size axes.

Reductions now return a copy when provided an empty axis tuple.

Adds additional supported dtype combinations to ``prod`` and ``sum``, specifically for input integers and inexact output type

* Implements mean, var, and std

* Adds more tests for statistical functions

* Adds docstrings for statistical functions

* Adds more supported types to arithmetic reductions

Permits `float` accumulation type with 64 bit integer and unsigned integer inputs
to prevent unnecessary copies on devices that don't support double precision

* Changes mean reduction to use output data type as sum accumulation type

Mean in-place division now uses the real type for the denominator
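
The statistical functions in this commit can be modeled in pure Python as below. This is a sketch under stated assumptions, not the dpctl implementation: flat lists stand in for arrays, `out_type` is a hypothetical parameter standing in for the output data type, and `correction` follows the array API specification's keyword for `var` and `std`. Empty inputs are not handled.

```python
import math


def mean(values, out_type=float):
    """Accumulate the sum in the output type, then divide by the count."""
    acc = out_type(0)
    for v in values:
        acc += out_type(v)
    # The denominator is a real number even if `out_type` is complex.
    return acc / float(len(values))


def var(values, correction=0.0):
    """Variance of real values with a degrees-of-freedom correction."""
    m = mean(values)
    sq_dev = sum((float(v) - m) ** 2 for v in values)
    return sq_dev / (len(values) - correction)


def std(values, correction=0.0):
    """Standard deviation as the square root of the variance."""
    return math.sqrt(var(values, correction))
```

Passing `correction=1` gives the sample (Bessel-corrected) variance, while the default of 0 gives the population variance.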

github-actions bot commented Nov 9, 2023

@oleksandr-pavlyk oleksandr-pavlyk merged commit 7957990 into gold/2021 Nov 9, 2023
47 of 53 checks passed

github-actions bot commented Nov 9, 2023

Deleted rendered PR docs from intelpython.github.com/dpctl, latest should be updated shortly. 🤞


3 participants