Merge of 0.15.1dev2 tag to gold/2021 #1471
Merged
Fix upload_linux/upload_windows steps
* Implements necessary sycl utilities for custom reductions
* Implements dpctl.tensor.max and dpctl.tensor.min
* Adds tests for min and max
* Reductions now set max_wg to the minimum of the max work group size and 2048
  - This prevents running out of resources when using local memory on CPU
* max and min nan propagation fixed for CPU devices
  - drops use of fetch_max/fetch_min for floats, which do not handle nans correctly
* Tweak to test_reduction_kernels
* Implements dpctl.tensor.argmax and argmin
* Tests for argmin and argmax
  Also fixes argmin and argmax for scalar inputs
* Argmin and argmax now handle identities correctly
  Adds a test for this behavior
  Fixed a typo in argmin and argmax causing shared local memory variant to be used for more types than expected
* Replaced `std::min` with `idx_reduction_op_`
* reductions now well-behaved for size-zero arrays
  - comparison and search reductions will throw an error in this case
  - slips in change to align sum signature with array API spec
* removed unnecessary copies in reduction templates
* Refactors sum to use generic reduction templates
* Sum now uses a generic Python API
* Docstrings added for argmax, argmin, max, and min
* Small reduction clean-ups
  Removed unnecessary copies in custom_reduce_over_group
  Sequential reduction now casts before calling operator (makes behavior explicit rather than implicit)
* Added test for argmin with keepdims=True
* Added a test for raised errors in reductions
  Also removed unused `_usm_types` in `test_tensor_sum`
* Removed `void` overloads from reduction utilities
  These were unused by dpctl
* Added missing include, Identity to use has_known_identity
  Implementation of Identity trait should call sycl::known_identity if trait sycl::has_known_identity is a true_type.
  Added IsMultiplies, and identity value for it, since sycl::known_identity for multiplies is only defined for real-valued types.
* Adding functor factories for product over axis
* Added Python API for _prod_over_axis
* Common reduction template takes functions to test if atomics are applicable
  Passing these function pointers around allows atomics to be turned off altogether if desired.
  Use custom trait to check if reduce_over_group can be used. This makes it possible to work around a bug, or to switch to custom code for reduction over group if desired.
  Such a custom trait type works around an issue with an incorrect result returned from sycl::reduce_over_group for the sycl::multiplies operator for 64-bit integral types.
* Defined dpctl.tensor.prod
  Also tweaked docstring for sum.
* Added tests for dpt.prod, removed uses of numpy
* Corrected prod docstring
  Small tweaks to sum, min, and max docstrings

---------

Co-authored-by: Oleksandr Pavlyk <oleksandr.pavlyk@intel.com>
* Implements flat overload for repeat
  Adds tests for new functionality
* repeat `repeats` parameter relaxed to permit lists and ranges
  Docstring has been adjusted to reflect changes to `axis` as well as new `repeats` types
  Corrected a bug in the behavior of `repeat` for size 1 `repeats` Python sequences
* Fixed repeat error syntax for `repeats` array with ndim > 1
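The flat overload and the relaxed `repeats` types described above can be illustrated with a small pure-Python sketch. This is not dpctl's implementation: `repeat_flat` is a hypothetical helper that works on plain lists, and the broadcasting of a size-1 `repeats` sequence is an assumption based on the array API requirement that `repeats` be broadcast-compatible with the input.

```python
def repeat_flat(values, repeats):
    """Repeat each element of a flat sequence (hypothetical helper).

    `repeats` may be an int, or a sequence such as a list, tuple or range.
    A size-1 sequence is assumed to broadcast to every element.
    """
    n = len(values)
    if isinstance(repeats, int):
        counts = [repeats] * n
    else:
        counts = list(repeats)  # accepts lists, tuples and ranges alike
        if len(counts) == 1:
            counts = counts * n  # size-1 sequence behaves like a scalar
        elif len(counts) != n:
            raise ValueError("`repeats` must have size 1 or match the input size")
    if any(c < 0 for c in counts):
        raise ValueError("`repeats` must be non-negative")

    out = []
    for value, count in zip(values, counts):
        out.extend([value] * count)
    return out


print(repeat_flat([1, 2, 3], 2))         # scalar repeats
print(repeat_flat([1, 2, 3], range(3)))  # per-element repeats from a range
print(repeat_flat([1, 2], [3]))          # size-1 sequence
```

Treating the size-1 sequence the same way as a scalar keeps the three input forms consistent, which is the case the bug fix above concerns.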
[CI] Fix test_linux step in conda-packages workflow
https://github.com/intel/llvm/blob/sycl/sycl/doc/extensions/supported/sycl_ext_intel_device_info.md This includes HW characteristics for Intel Level-Zero GPU devices as well as access to PCI device-identifier.
Adds a test for _is_gen9 helper utility useful for skipping tests known to fail on Gen9. Adds a test for intel_device_info function. Test that descriptor names do not have typos.
1d variant of repeat was not passed host task event dependency for allocating shapes and strides on the device. This caused sporadic segfaults, where the kernel would attempt to access unallocated device data.
Removed cmake/IntelDPCPP.cmake, vendored cmake/IntelSYCL.cmake script.
Changed project's CMake scripts to use IntelSYCL.
Renamed libsyclinterface/cmake/modules/FindIntelSycl.cmake to libsyclinterface/cmake/modules/FindIntelSyclCompiler.cmake to avoid possible name conflict on OS with case insensitive FS
Removed -fsycl for default linking options on Windows. The option is added as needed.
…'s python_add_library
Set a variable in public CI to disable interprocedural optimization, avoiding insufficient-resources failures during compilation on Windows.
…ffset-handling Fix reduction contig impl offset handling
* max and min now use MinMaxAtomicSupportFactory
  These functions were using ArithmeticAtomicSupportFactory, which disables atomics for floating point types
* Resolves #1455
  This issue was caused by a typo: when the `axis0` kernels for tree and atomic reductions should have been called, the `axis1` kernel was called instead
* Adds tests for #1455 resolution
Closes gh-1457

```
In [1]: import dpctl.tensor as dpt

In [2]: dpt.asnumpy([1,2,3])
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[2], line 1
----> 1 dpt.asnumpy([1,2,3])

File ~/repos/dpctl/dpctl/tensor/_copy_utils.py:185, in asnumpy(usm_ary)
    169 def asnumpy(usm_ary):
    170     """
    171     asnumpy(usm_ary)
    172
   (...)
    183         of `usm_ary`
    184     """
--> 185     return _copy_to_numpy(usm_ary)

File ~/repos/dpctl/dpctl/tensor/_copy_utils.py:40, in _copy_to_numpy(ary)
     38 def _copy_to_numpy(ary):
     39     if not isinstance(ary, dpt.usm_ndarray):
---> 40         raise TypeError(
     41             f"Expected dpctl.tensor.usm_ndarray, got {type(ary)}"
     42         )
     43     nb = ary.usm_data.nbytes
     44     hh = dpm.MemoryUSMHost(nb, queue=ary.sycl_queue)

TypeError: Expected dpctl.tensor.usm_ndarray, got <class 'list'>

In [3]: quit
```
#1462)
* Fixes correctness regression in search functions
  ``py_search_over_axis`` no longer calls the ``axis1`` contiguous variant
  ``py_search_over_axis`` now only calls ``axis0`` variant wh
* Adds tests for fixed search reduction behavior
1. Renamed misspelled variable
2. If reduction_nelems is small, use SequentialReductionKernel for tree-reductions, as is done for atomic reductions
3. Tweaked the scaling-down logic for a moderately-sized number of elements to reduce. We should also use max_wg if iter_nelems is very small (one), since choosing max_wg for large iter_nelems may lead to under-utilization of GPU.
_tensor_impl continues holding constructors, where, clip.
_tensor_elementwise_impl holds elementwise functions.
_tensor_reductions_impl holds reduction functions.
Added stable API to retrieve implementation functions in each elementwise function class instance to allow `dpnp` to access that information using stable API.
…at types
Added entries for float and double types to TypePairSupportDataForCompReductionAtomic as spotted by @ndgrigorian in the PR review. Also moved comments around.
This removes use of dpnp.matmul from the example, making this example self-contained.
…ts (#1464)
* Adds SequentialSearchReduction functor to search reductions
* Search reductions use correct branch for float16
  constexpr branch logic accounted for floating point types but not sycl::half, which meant NaNs were not propagating for float16 data
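The NaN propagation that the float16 fix restores can be sketched in pure Python. The actual kernels are SYCL C++ and are not shown on this page; `argmax_nan_propagating` below is a hypothetical helper that only illustrates the intended semantics, under the assumption that a search reduction returns the index of the first NaN when one is present.

```python
import math


def argmax_nan_propagating(values):
    """Index of the maximum, or of the first NaN if any (hypothetical helper)."""
    if len(values) == 0:
        raise ValueError("argmax of an empty sequence is undefined")
    best_idx, best_val = 0, values[0]
    for idx in range(1, len(values)):
        if math.isnan(best_val):
            break  # a NaN has been found; it wins over every later element
        val = values[idx]
        # A plain `val > best_val` is always False for NaN, so NaN needs
        # its own branch, mirroring the float-specific branch in the kernels.
        if math.isnan(val) or val > best_val:
            best_idx, best_val = idx, val
    return best_idx


print(argmax_nan_propagating([1.0, float("nan"), 3.0]))
print(argmax_nan_propagating([1.0, 3.0, 3.0]))
```

If the NaN branch is skipped for some floating point type, as happened for sycl::half, the comparison silently ignores NaNs and the reduction returns the index of an ordinary maximum instead.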
…duction
Improve performance of reduction for a small number of elements to reduce, for types where tree-reduction is needed
Removed mention of dtype kwarg in usage line
Function _reduce_over_axis promotes the input array to the requested result data type and carries out the reduction computation in that data type. The reduction is done in dtype if the implementation supports it. If the implementation does not support the requested dtype, we reduce in the default_dtype and cast to the requested dtype afterwards.
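The dtype selection described above can be sketched as a small decision function. This is a sketch of the described logic only, not the body of `_reduce_over_axis`: `plan_reduction`, its parameters, and the use of strings for dtypes are all hypothetical.

```python
def plan_reduction(requested_dtype, default_dtype, supported_dtypes):
    """Return (accumulation_dtype, cast_afterwards) for a reduction.

    All names are illustrative; `supported_dtypes` stands in for whatever
    the implementation reports as supported for the given input type.
    """
    if requested_dtype is None:
        # No dtype requested: reduce in the default result type.
        return default_dtype, False
    if requested_dtype in supported_dtypes:
        # Implementation can reduce directly in the requested type.
        return requested_dtype, False
    # Fall back: reduce in the default type, then cast the result.
    return default_dtype, True


supported = {"float32", "float64"}
print(plan_reduction("float64", "float32", supported))
print(plan_reduction("float16", "float32", supported))
```

The second call shows the fallback path: the reduction runs in the default type and the result is cast to the requested type afterwards.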
Fix for gh-1468 in arithmetic reduction when type promotion is needed
…hape-is-integral-numpy-scalar Fix usm_ndarray ctor when shape is integral numpy scalar
* Adds __array_namespace_info__ inspection utility
  This inspection utility is coming to the array API specification in the near future
* Set __array_api_version__ to "2022.12"
* Remove --ci from array API conformity workflow
* Adds __array_namespace_info__ docstrings
  Disallows dtypes for `kind` kwarg in __array_namespace_info__().dtypes
  Removes `float16` from dtypes listed by __array_namespace_info__ as per spec
  Permits dpctl.tensor.Device objects in device keyword arguments in array API inspection utilities
* Adds tests for array API inspection
* Resolves gh-1456
  Tree reductions now populate destination with the identity when reducing over zero-size axes. As a result, logic was removed for handling zero-size axes.
  ``argmax``, ``argmin``, ``max``, and ``min`` still raise an error for zero-size axes.
  Reductions now return a copy when provided an empty axis tuple.
  Adds additional supported dtype combinations to ``prod`` and ``sum``, specifically for input integers and inexact output type
* Implements mean, var, and std
* Adds more tests for statistical functions
* Adds docstrings for statistical functions
* Adds more supported types to arithmetic reductions
  Permits `float` accumulation type with 64 bit integer and unsigned integer inputs to prevent unnecessary copies on devices that don't support double precision
* Changes mean reduction to use output data type as sum accumulation type
  Mean in-place division now uses the real type for the denominator
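The zero-size behavior described above follows from whether a reduction has an identity element. A minimal pure-Python sketch, assuming a hypothetical `reduce_1d` helper over plain lists rather than dpctl arrays:

```python
import operator
from functools import reduce

# Identity elements for the arithmetic reductions.
_IDENTITY = {"sum": 0, "prod": 1}


def reduce_1d(values, op):
    """Reduce a flat sequence (hypothetical helper, not dpctl's API)."""
    if op in _IDENTITY:
        fn = operator.add if op == "sum" else operator.mul
        # Starting from the identity makes the empty case well defined.
        return reduce(fn, values, _IDENTITY[op])
    if op in ("max", "min"):
        if len(values) == 0:
            # No identity exists for a comparison over an empty input.
            raise ValueError(f"{op} of a zero-size axis is undefined")
        return max(values) if op == "max" else min(values)
    raise ValueError(f"unknown reduction: {op}")


print(reduce_1d([], "sum"))
print(reduce_1d([], "prod"))
print(reduce_1d([2, 3, 4], "prod"))
```

Seeding the accumulator with the identity removes the need for a separate zero-size code path, while comparison reductions keep raising because no such seed exists for them.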
View rendered docs @ https://intelpython.github.io/dpctl/pulls/1471/index.html
Deleted rendered PR docs from intelpython.github.com/dpctl, latest should be updated shortly. 🤞
This PR merges development milestone 0.15.1dev2, complete with reduction functions and statistical functions, to gold/2021 (golden old days) for the purpose of building the milestone and uploading it to the internal package channel.