Enable compiling for cuda #1454
* Implements dpctl.tensor.cbrt
* Implements copysign and exp2 elementwise funcs
* Adds tests for cbrt, copysign, exp2
* Implements rsqrt and tests for rsqrt
* Modified tests for cbrt, copysign, and rsqrt. Now test more type combinations/output types
Tweaked test_intel_device_info
Use sycl_complex extension to implement complex-valued trigonometric, hyperbolic functions and their inverses. This works around use of double precision functions/literals in implementations of these functions in MSVC headers, causing failures to offload on Iris Xe for single precision input citing lack of fp64 support by the hardware.
Used functions from sycl::ext::oneapi::experimental context to implement evaluation on data of complex type.
For every CMake target where add_sycl_to_target is used, we also run `target_compile_options(${target_name} PRIVATE -fsycl-targets=spir64-unknown-unknown,nvptx64-nvidia-cuda)`
Replaced them with uses of sycl::ext::oneapi::experimental namespace functions instead.
Also, the DPCTL_SYCL_TARGETS parameter can be used to specify targets to build for. DPCTL_TARGET_CUDA can be set via a CMake option, or via an environment variable, e.g.
```
$ DPCTL_TARGET_CUDA=1 python scripts/build_locally.py --verbose
```
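As a rough illustration of how the two switches relate, the sketch below maps the `DPCTL_TARGET_CUDA` environment variable, or an explicit list of SYCL targets, onto the corresponding CMake arguments. The function name `cmake_args_for_targets` and its precedence rule (an explicit target list wins over the environment variable) are assumptions made for this example and are not taken from the dpctl build scripts.

```python
def cmake_args_for_targets(environ, sycl_targets=None):
    """Hypothetical helper: translate build settings into CMake arguments.

    environ      -- mapping of environment variables (e.g. os.environ)
    sycl_targets -- optional list of SYCL target triples to build for
    """
    # Assumption: an explicit targets list takes precedence over the env var.
    if sycl_targets:
        return ["-DDPCTL_SYCL_TARGETS=" + ",".join(sycl_targets)]
    # DPCTL_TARGET_CUDA=1 in the environment enables the CUDA target.
    if environ.get("DPCTL_TARGET_CUDA", "0") == "1":
        return ["-DDPCTL_TARGET_CUDA:BOOL=ON"]
    # Otherwise add nothing and leave the default targets in place.
    return []


if __name__ == "__main__":
    import os

    print(cmake_args_for_targets(os.environ))
```

Passing `["nvptx64-nvidia-cuda", "spir64-unknown-unknown"]` as `sycl_targets` yields a single `-DDPCTL_SYCL_TARGETS=...` argument with the triples joined by commas.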
View rendered docs @ https://intelpython.github.io/dpctl/pulls/1454/index.html
This is very exciting. The project now builds on my local hardware as well, with similar outcomes with respect to the test suite, but with it freezing in `test_tensor_sum` rather than failing.
Array API standard conformance tests for dpctl=0.15.1dev0=py310ha25a700_71 ran successfully.
Array API standard conformance tests for dpctl=0.15.1dev0=py310ha25a700_72 ran successfully.
Merged commit 479a969 into use-sycl-ext-oneapi-experimental-for-complex
Deleted rendered PR docs from intelpython.github.com/dpctl, latest should be updated shortly. 🤞
This PR fixes the build for CUDA by replacing overlooked uses of `std` namespace functions for complex type inputs with uses of `sycl::ext::oneapi::experimental` namespace functions.

CMake scripts are modified to allow building binaries for multiple SYCL targets. This can be done by specifying `-DDPCTL_TARGET_CUDA:BOOL=ON`, or by setting the environment variable `DPCTL_TARGET_CUDA=1`.

It is also possible to manually specify the SYCL targets string via `-DDPCTL_SYCL_TARGETS=nvptx64-nvidia-cuda,spir64-unknown-unknown`.

The test suite runs to completion when `ONEAPI_DEVICE_SELECTOR=cuda:gpu` is set, but with many test failures that must be investigated. These may be caused by compiler support not yet being implemented for the affected types.
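A minimal sketch of launching the test suite against the CUDA device from Python, with `ONEAPI_DEVICE_SELECTOR=cuda:gpu` set for the child process only. The helper names `cuda_test_env` and `run_tests`, the `--pyargs dpctl` pytest invocation, and the optional timeout are assumptions for illustration; the timeout is included because a hang in the test suite was reported in this conversation.

```python
import os
import subprocess
import sys


def cuda_test_env(base_env=None):
    """Return a copy of the environment that selects the CUDA GPU device."""
    env = dict(os.environ if base_env is None else base_env)
    env["ONEAPI_DEVICE_SELECTOR"] = "cuda:gpu"
    return env


def run_tests(pytest_args=("--pyargs", "dpctl"), timeout=None):
    """Hypothetical runner: invoke pytest with the CUDA device selected.

    Returns pytest's exit code. subprocess.TimeoutExpired is raised if
    `timeout` (in seconds) elapses, e.g. when a test freezes.
    """
    cmd = [sys.executable, "-m", "pytest", *pytest_args]
    result = subprocess.run(cmd, env=cuda_test_env(), check=False, timeout=timeout)
    return result.returncode


if __name__ == "__main__":
    sys.exit(run_tests())
```

Because the variable is set on a copy of the environment, the parent shell and any other processes keep their existing device selection.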