
Enable compiling for cuda #1454

Merged


oleksandr-pavlyk
Collaborator

This PR fixes the CUDA build by replacing overlooked uses of `std` namespace functions on complex-typed inputs with the corresponding functions from the `sycl::ext::oneapi::experimental` namespace.

CMake scripts are modified to allow building binaries for multiple SYCL targets. This is enabled by specifying `-DDPCTL_TARGET_CUDA:BOOL=ON`, or by setting the environment variable `DPCTL_TARGET_CUDA=1`.

It is also possible to specify the SYCL targets string manually via `-DDPCTL_SYCL_TARGETS=nvptx64-nvidia-cuda,spir64-unknown-unknown`.
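The two ways of selecting targets described above can be summarized as a small decision function. This is a minimal Python sketch under stated assumptions, not dpctl's actual CMake logic: `resolve_sycl_targets` and `compile_flag` are hypothetical names, the precedence of an explicit `DPCTL_SYCL_TARGETS` over `DPCTL_TARGET_CUDA` is assumed, and the SPIR-V plus CUDA pairing mirrors the `spir64-unknown-unknown,nvptx64-nvidia-cuda` target list used in the commit notes.

```python
from typing import List, Mapping, Optional

# Target triples taken from the PR text.
SPIR64_TARGET = "spir64-unknown-unknown"
CUDA_TARGET = "nvptx64-nvidia-cuda"


def _truthy(value: Optional[str]) -> bool:
    """Interpret CMake/env style booleans such as "1", "ON", "TRUE"."""
    return value is not None and value.strip().upper() in {"1", "ON", "TRUE", "YES", "Y"}


def resolve_sycl_targets(
    env: Mapping[str, str], cmake_defs: Mapping[str, str]
) -> Optional[str]:
    """Hypothetical helper: return the SYCL targets string, or None for the default.

    Assumed precedence (not confirmed by the PR):
      1. an explicit DPCTL_SYCL_TARGETS cache definition is used verbatim;
      2. otherwise DPCTL_TARGET_CUDA, from a CMake definition or the
         environment, selects SPIR-V plus the CUDA NVPTX target;
      3. otherwise None, i.e. the compiler's default target.
    Keys in cmake_defs are bare names, without the ":BOOL" type suffix.
    """
    explicit = cmake_defs.get("DPCTL_SYCL_TARGETS")
    if explicit:
        return explicit
    if _truthy(cmake_defs.get("DPCTL_TARGET_CUDA")) or _truthy(env.get("DPCTL_TARGET_CUDA")):
        return f"{SPIR64_TARGET},{CUDA_TARGET}"
    return None


def compile_flag(targets: Optional[str]) -> List[str]:
    """Turn a targets string into the extra compiler option (empty if default)."""
    return [f"-fsycl-targets={targets}"] if targets else []
```

For example, `compile_flag(resolve_sycl_targets({"DPCTL_TARGET_CUDA": "1"}, {}))` yields `["-fsycl-targets=spir64-unknown-unknown,nvptx64-nvidia-cuda"]`. Keeping the resolution separate from flag formatting makes the precedence rule easy to test in isolation.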

The test suite runs to completion when `ONEAPI_DEVICE_SELECTOR=cuda:gpu` is set, but with many test failures that must be investigated, e.g.:

FAILED dpctl/tests/test_tensor_sum.py::test_sum_arg_dtype_default_output_dtype_matrix[f2] - RuntimeError: Native API failed. Native API returns: -5 (PI_ERROR_OUT_OF_RESOURCES) -5 (PI_ERROR_OUT_OF_RESOURCES)
FAILED dpctl/tests/test_tensor_sum.py::test_sum_arg_dtype_default_output_dtype_matrix[c8] - RuntimeError: Native API failed. Native API returns: -5 (PI_ERROR_OUT_OF_RESOURCES) -5 (PI_ERROR_OUT_OF_RESOURCES)
FAILED dpctl/tests/test_tensor_sum.py::test_sum_arg_dtype_default_output_dtype_matrix[c16] - RuntimeError: Native API failed. Native API returns: -5 (PI_ERROR_OUT_OF_RESOURCES) -5 (PI_ERROR_OUT_OF_RESOURCES)

This may be caused by compiler support not yet implemented for these types.
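Reproducing such a run from Python only requires adding the device selector to the environment of the pytest process. A minimal sketch, assuming pytest is installed and the suite lives under `dpctl/tests` (as the failure paths above suggest); `cuda_test_invocation` is a hypothetical helper, not part of dpctl.

```python
import os
import sys
from typing import Dict, List, Mapping, Optional, Tuple


def cuda_test_invocation(
    base_env: Optional[Mapping[str, str]] = None,
    test_path: str = "dpctl/tests",
) -> Tuple[List[str], Dict[str, str]]:
    """Hypothetical helper: build (argv, env) for a pytest run restricted to CUDA GPUs.

    A copy of the environment is made so the caller's mapping is not modified.
    """
    env = dict(os.environ if base_env is None else base_env)
    # Only expose CUDA GPU devices to the SYCL runtime.
    env["ONEAPI_DEVICE_SELECTOR"] = "cuda:gpu"
    # Run pytest with the same interpreter that runs this script.
    argv = [sys.executable, "-m", "pytest", test_path]
    return argv, env
```

The result can be passed to `subprocess.run(argv, env=env)`; passing `test_path="dpctl/tests/test_tensor_sum.py"` narrows the run to the module with the failures listed above.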


  • Have you provided a meaningful PR description?
  • Have you added a test, reproducer or referred to an issue with a reproducer?
  • Have you tested your changes locally for CPU and GPU devices?
  • Have you made sure that new changes do not introduce compiler warnings?
  • Have you checked performance impact of proposed changes?
  • If this PR is a work in progress, are you opening the PR as a draft?

oleksandr-pavlyk and others added 19 commits October 17, 2023 01:32
* Implements dpctl.tensor.cbrt

* Implements copysign and exp2 elementwise funcs

* Adds tests for cbrt, copysign, exp2

* Implements rsqrt and tests for rsqrt

* Modified tests for cbrt, copysign, and rsqrt

Now test more type combinations/output types
Use sycl_complex extension to implement complex-valued trigonometric,
hyperbolic functions and their inverses.

This works around the use of double-precision functions and literals in the
MSVC headers' implementations of these functions, which caused offloading to
fail on Iris Xe for single-precision inputs, citing the hardware's lack of fp64 support.
Used functions from the sycl::ext::oneapi::experimental namespace to implement
evaluation on complex-typed data.
For every CMake target on which add_sycl_to_target is used, we also call
target_compile_options(
   ${target_name}
   PRIVATE
   -fsycl-targets=spir64-unknown-unknown,nvptx64-nvidia-cuda
)
Replaced remaining uses of std namespace functions on complex inputs with
sycl::ext::oneapi::experimental namespace functions instead.
The DPCTL_SYCL_TARGETS parameter can also be used to specify the targets
to build for.

DPCTL_TARGET_CUDA can be set via a CMake option or via an environment
variable, e.g.

```
$ DPCTL_TARGET_CUDA=1 python scripts/build_locally.py --verbose
```
@oleksandr-pavlyk oleksandr-pavlyk changed the title from "Compile for cuda" to "Enable compiling for cuda" on Oct 25, 2023

@ndgrigorian ndgrigorian left a comment


This is very exciting. The project now builds on my local hardware as well, with similar test-suite results, except that the suite hangs in test_tensor_sum rather than failing.

@github-actions

Array API standard conformance tests for dpctl=0.15.1dev0=py310ha25a700_71 ran successfully.
Passed: 935
Failed: 65
Skipped: 119

@github-actions

Array API standard conformance tests for dpctl=0.15.1dev0=py310ha25a700_72 ran successfully.
Passed: 935
Failed: 65
Skipped: 119

@oleksandr-pavlyk oleksandr-pavlyk merged commit 479a969 into use-sycl-ext-oneapi-experimental-for-complex Oct 25, 2023
23 of 26 checks passed
@oleksandr-pavlyk oleksandr-pavlyk deleted the compile-for-cuda branch October 25, 2023 19:45
@github-actions

Deleted rendered PR docs from intelpython.github.com/dpctl, latest should be updated shortly. 🤞
