Enable compiling for cuda #1454

oleksandr-pavlyk · 2023-10-25T15:18:02Z

This PR fixes the CUDA build by replacing overlooked uses of std namespace functions on complex-typed inputs with their sycl::ext::oneapi::experimental namespace counterparts.

CMake scripts are modified to allow building binaries for multiple SYCL targets. This can be enabled by specifying -DDPCTL_TARGET_CUDA:BOOL=ON, or by setting the environment variable DPCTL_TARGET_CUDA=1.

It is also possible to manually specify the SYCL targets string via -DDPCTL_SYCL_TARGETS=nvptx64-nvidia-cuda,spir64-unknown-unknown.
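
How these two settings might combine can be sketched in Python. This is only an illustration, not the PR's CMake logic: the helper name `resolve_sycl_targets`, the precedence (an explicit targets string wins over the Boolean switch, and the CMake option wins over the environment variable), the default targets string, and the accepted truthy spellings are all assumptions.

```python
# Hypothetical helper; the real logic lives in the project's CMake scripts.
# Assumed default: the targets string quoted in the PR description.
CUDA_TARGETS = "nvptx64-nvidia-cuda,spir64-unknown-unknown"
# Assumed spellings that count as "enabled".
TRUTHY = {"1", "ON", "TRUE", "YES"}


def resolve_sycl_targets(cmake_options, environ):
    """Return the SYCL targets string to build for, or None for the default build."""
    # Assumption: an explicit DPCTL_SYCL_TARGETS always takes precedence.
    explicit = cmake_options.get("DPCTL_SYCL_TARGETS")
    if explicit:
        return explicit
    # Assumption: the CMake option overrides the environment variable.
    switch = cmake_options.get("DPCTL_TARGET_CUDA")
    if switch is None:
        switch = environ.get("DPCTL_TARGET_CUDA", "")
    if str(switch).strip().upper() in TRUTHY:
        return CUDA_TARGETS
    return None
```

Under these assumptions, `resolve_sycl_targets({}, {"DPCTL_TARGET_CUDA": "1"})` yields the CUDA plus SPIR-V targets string, while an empty configuration yields `None`.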

The test suite runs to completion when ONEAPI_DEVICE_SELECTOR=cuda:gpu is set, but with many test failures that must be investigated, e.g.

FAILED dpctl/tests/test_tensor_sum.py::test_sum_arg_dtype_default_output_dtype_matrix[f2] - RuntimeError: Native API failed. Native API returns: -5 (PI_ERROR_OUT_OF_RESOURCES) -5 (PI_ERROR_OUT_OF_RESOURCES)
FAILED dpctl/tests/test_tensor_sum.py::test_sum_arg_dtype_default_output_dtype_matrix[c8] - RuntimeError: Native API failed. Native API returns: -5 (PI_ERROR_OUT_OF_RESOURCES) -5 (PI_ERROR_OUT_OF_RESOURCES)
FAILED dpctl/tests/test_tensor_sum.py::test_sum_arg_dtype_default_output_dtype_matrix[c16] - RuntimeError: Native API failed. Native API returns: -5 (PI_ERROR_OUT_OF_RESOURCES) -5 (PI_ERROR_OUT_OF_RESOURCES)

This may be caused by compiler support not yet implemented for these types.
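
When triaging a long failure list like this, it can help to group entries by test and error code. A minimal sketch, assuming the short-summary line format shown above; the regular expression and the helper name `group_failures` are illustrative and not part of dpctl or pytest:

```python
import re
from collections import defaultdict

# Matches lines of the form:
# FAILED <file>::<test>[<param>] - <Exception>: ... (PI_ERROR_...)
FAILED_LINE = re.compile(
    r"^FAILED (?P<file>\S+?)::(?P<test>[^\[\s]+)"
    r"(?:\[(?P<param>[^\]]*)\])?"
    r" - (?P<exc>\w+): .*?\((?P<code>PI_[A-Z_]+)\)"
)


def group_failures(report_lines):
    """Map (test name, first PI error code) to the list of failing parameters."""
    groups = defaultdict(list)
    for line in report_lines:
        match = FAILED_LINE.match(line.strip())
        if match:
            groups[(match["test"], match["code"])].append(match["param"])
    return dict(groups)


# The three summary lines quoted in the PR description.
_PREFIX = "FAILED dpctl/tests/test_tensor_sum.py::test_sum_arg_dtype_default_output_dtype_matrix"
_SUFFIX = (
    " - RuntimeError: Native API failed. Native API returns: "
    "-5 (PI_ERROR_OUT_OF_RESOURCES) -5 (PI_ERROR_OUT_OF_RESOURCES)"
)
report = [_PREFIX + "[" + dtype + "]" + _SUFFIX for dtype in ("f2", "c8", "c16")]
grouped = group_failures(report)
```

For the three quoted lines, all parameters (f2, c8, c16) fall into a single group keyed by the same test and PI_ERROR_OUT_OF_RESOURCES.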

Have you provided a meaningful PR description?
Have you added a test, reproducer or referred to an issue with a reproducer?
Have you tested your changes locally for CPU and GPU devices?
Have you made sure that new changes do not introduce compiler warnings?
Have you checked performance impact of proposed changes?
If this PR is a work in progress, are you opening the PR as a draft?

* Implements dpctl.tensor.cbrt
* Implements copysign and exp2 elementwise funcs
* Adds tests for cbrt, copysign, exp2
* Implements rsqrt and tests for rsqrt
* Modified tests for cbrt, copysign, and rsqrt
  Now test more type combinations/output types

Use the sycl_complex extension to implement complex-valued trigonometric and hyperbolic functions and their inverses. This works around the use of double-precision functions/literals in the MSVC headers' implementations of these functions, which caused offload failures on Iris Xe for single-precision input, citing lack of fp64 support by the hardware.

github-actions · 2023-10-25T16:47:43Z

View rendered docs @ https://intelpython.github.io/dpctl/pulls/1454/index.html

ndgrigorian

This is very exciting. The project now builds on my local hardware as well, with similar outcomes with respect to the test suite, but with it freezing in test_tensor_sum rather than failing.

github-actions · 2023-10-25T17:22:36Z

Array API standard conformance tests for dpctl=0.15.1dev0=py310ha25a700_71 ran successfully.
Passed: 935
Failed: 65
Skipped: 119

github-actions · 2023-10-25T17:30:59Z

Array API standard conformance tests for dpctl=0.15.1dev0=py310ha25a700_72 ran successfully.
Passed: 935
Failed: 65
Skipped: 119

github-actions · 2023-10-25T19:46:02Z

Deleted rendered PR docs from intelpython.github.com/dpctl, latest should be updated shortly. 🤞

oleksandr-pavlyk and others added 19 commits October 17, 2023 01:32

Tweaked test_intel_device_info

c9cc505

Merge pull request #1445 from IntelPython/fix-intel-device-test

1d57614

Tweaked test_intel_device_info

Set SYCL_EXT_ONEAPI_COMPLEX on Windows as well

1d5fdce

Use sycl_complex in add, conj

8df4745

More transitions to experimental complex

ef2563d

More files change to use sycl_complex

0717bbe

Use oneapi extension for complexes for remaining elementwise functions

c5f26eb

Used functions from sycl::ext::oneapi::experimental context to implement evaluation on data of complex type.

Changes include CL/sycl.hpp to sycl/sycl.hpp per SYCL-2020 spec

f6c3e56

Change include CL/sycl.hpp to sycl/sycl.hpp per SYCL-2020 spec

3b9d81d

Use experimental::complex for in-place division

44abcb4

include "CL/sycl.hpp" -> include "sycl/sycl.hpp"

23aeec6

include CL/sycl.hpp -> include sycl/sycl.hpp

66ba04e

Add target_compile_options setting sycl-targets for targets needing SYCL

fd9df2a

For every CMake target where add_sycl_to_target is used, we also run target_compile_options( ${target_name} PRIVATE -fsycl-targets=spir64-unknown-unknown,nvptx64-nvidia-cuda )
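
The compiler option passed here is `-fsycl-targets=` followed by a comma-separated list of target triples. A small Python sketch of composing that option from a list of triples; the helper name `sycl_targets_flag` and its validation rules (strip whitespace, reject empty entries, drop repeats) are assumptions for illustration and do not mirror the CMake code:

```python
def sycl_targets_flag(targets):
    """Build a -fsycl-targets=... option from an iterable of target triples."""
    unique = []
    for triple in targets:
        triple = triple.strip()
        if not triple:
            raise ValueError("empty SYCL target triple")
        # Keep the first occurrence of each triple, preserving order.
        if triple not in unique:
            unique.append(triple)
    if not unique:
        raise ValueError("at least one SYCL target triple is required")
    return "-fsycl-targets=" + ",".join(unique)


flag = sycl_targets_flag(["spir64-unknown-unknown", "nvptx64-nvidia-cuda"])
```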

Fix compilation error

9561b6e

Replace overlooked std::log, std::sinh, std::exp for complex types

c101748

Replaced them with uses of sycl::ext::oneapi::experimental namespace functions instead.

Replaced include CL/sycl.hpp with include sycl/sycl.hpp

0827f3d

Add DPCTL_TARGET_CUDA Boolean cmake option

5eefdd1

The DPCTL_SYCL_TARGETS parameter can also be used to specify the targets to build for. DPCTL_TARGET_CUDA can be set via the cmake option or via an environment variable, e.g.

```
$ DPCTL_TARGET_CUDA=1 python scripts/build_locally.py --verbose
```
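
The same invocation can be driven from Python, for example in a CI helper. A minimal sketch: the script path and `--verbose` flag come from the command quoted in this commit message, while the function names `cuda_build_invocation` and `run_cuda_build` and the `repo_root` parameter are hypothetical.

```python
import os
import subprocess
import sys


def cuda_build_invocation(base_env=None):
    """Return (command, environment) for a build with DPCTL_TARGET_CUDA=1."""
    # Copy the environment so the caller's mapping is never modified.
    env = dict(os.environ if base_env is None else base_env)
    env["DPCTL_TARGET_CUDA"] = "1"
    cmd = [sys.executable, "scripts/build_locally.py", "--verbose"]
    return cmd, env


def run_cuda_build(repo_root="."):
    """Run the build from the repository root; raises if the build fails."""
    cmd, env = cuda_build_invocation()
    subprocess.run(cmd, cwd=repo_root, env=env, check=True)
```

Calling `run_cuda_build()` from a dpctl checkout would be expected to behave like the shell command, assuming the build script is present at that relative path.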

oleksandr-pavlyk requested a review from ndgrigorian October 25, 2023 15:18

oleksandr-pavlyk added 2 commits October 25, 2023 10:47

Merge branch 'use-sycl-ext-oneapi-experimental-for-complex' into compile-for-cuda

1d51752

clang-format fixes

986dc6f

oleksandr-pavlyk changed the title ~~Compile for cuda~~ Enable compiling for cuda Oct 25, 2023

ndgrigorian approved these changes Oct 25, 2023

oleksandr-pavlyk merged commit 479a969 into use-sycl-ext-oneapi-experimental-for-complex Oct 25, 2023
23 of 26 checks passed

oleksandr-pavlyk deleted the compile-for-cuda branch October 25, 2023 19:45
