[SYCL][ext] Define and Implement sycl_ext_tensor_map #16247
This depends on Hugh's work for Unified Runtime here
DeviceConfigFile changes LGTM
3d2c85f to 16f22f6
I saw this build error locally due to a stale build tree. Are we not doing clean checkouts for CI?
No, we use cached checkouts.
How do I clear them?
I have to log into the runners and do it manually, but I don't know whether other PRs will end up in the cache and cause the same problem. I'll try, give me a sec.
Thanks. I'll see if I can track down why the |
add_custom_command(
  OUTPUT  ${OUT_HEADERS_IN_SYCL_DIR}
          ${OUT_HEADERS_IN_CL_DIR}
          ${OUT_HEADERS_IN_STD_DIR}
          ${OUT_HEADERS_IN_SYCLCOMPAT_DIR}
  DEPENDS ${HEADERS_IN_SYCL_DIR}
          ${HEADERS_IN_CL_DIR}
          ${HEADERS_IN_STD_DIR}
          ${HEADERS_IN_SYCLCOMPAT_DIR}
  COMMAND ${CMAKE_COMMAND} -E copy_directory ${sycl_inc_dir}/sycl ${SYCL_INCLUDE_BUILD_DIR}/sycl
  COMMAND ${CMAKE_COMMAND} -E copy_directory ${sycl_inc_dir}/CL ${SYCL_INCLUDE_BUILD_DIR}/CL
  COMMAND ${CMAKE_COMMAND} -E copy_directory ${sycl_inc_dir}/std ${SYCL_INCLUDE_BUILD_DIR}/std
  COMMAND ${CMAKE_COMMAND} -E copy_directory ${sycl_inc_dir}/syclcompat ${SYCL_INCLUDE_BUILD_DIR}/syclcompat
  COMMAND ${CMAKE_COMMAND} -E copy ${sycl_inc_dir}/syclcompat.hpp ${SYCL_INCLUDE_BUILD_DIR}/syclcompat.hpp
  COMMAND ${CMAKE_COMMAND} -E copy ${UNIFIED_RUNTIME_INCLUDE_DIR}/ur_api.h ${SYCL_INCLUDE_BUILD_DIR}
  COMMAND ${CMAKE_COMMAND} -E copy ${UNIFIED_RUNTIME_INCLUDE_DIR}/ur_api_funcs.def ${SYCL_INCLUDE_BUILD_DIR}
  COMMAND ${CMAKE_COMMAND} -E copy ${UNIFIED_RUNTIME_INCLUDE_DIR}/ur_print.hpp ${SYCL_INCLUDE_BUILD_DIR}
  COMMENT "Copying SYCL headers ...")
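Yeah, there's no dependency on the input files for the UR headers: the command copies them, but they aren't listed under DEPENDS, so a change to them doesn't trigger a re-copy. I'll submit a patch.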
Cool. I tried clearing the cache but it didn't work because we check out intel/llvm HEAD first, so we hit the problem again. Ping me on the PR for the CMake fix and I'll try to fast-track it.
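For illustration, the missing dependency discussed above could be addressed roughly as follows. This is only a sketch of one possible fix, not the patch that was actually submitted. It reuses the variable names from the quoted snippet and assumes they are all defined earlier in the same CMakeLists.txt; the only change is the three extra DEPENDS entries for the UR headers.

```cmake
# Sketch only; not the patch that was actually submitted.
# All variables come from the snippet quoted in this thread and are assumed
# to be defined earlier in the same CMakeLists.txt.
add_custom_command(
  OUTPUT  ${OUT_HEADERS_IN_SYCL_DIR}
          ${OUT_HEADERS_IN_CL_DIR}
          ${OUT_HEADERS_IN_STD_DIR}
          ${OUT_HEADERS_IN_SYCLCOMPAT_DIR}
  DEPENDS ${HEADERS_IN_SYCL_DIR}
          ${HEADERS_IN_CL_DIR}
          ${HEADERS_IN_STD_DIR}
          ${HEADERS_IN_SYCLCOMPAT_DIR}
          # Added: the UR headers are inputs to the copy steps below, so
          # list them as dependencies of this command as well.
          ${UNIFIED_RUNTIME_INCLUDE_DIR}/ur_api.h
          ${UNIFIED_RUNTIME_INCLUDE_DIR}/ur_api_funcs.def
          ${UNIFIED_RUNTIME_INCLUDE_DIR}/ur_print.hpp
  COMMAND ${CMAKE_COMMAND} -E copy_directory ${sycl_inc_dir}/sycl ${SYCL_INCLUDE_BUILD_DIR}/sycl
  COMMAND ${CMAKE_COMMAND} -E copy_directory ${sycl_inc_dir}/CL ${SYCL_INCLUDE_BUILD_DIR}/CL
  COMMAND ${CMAKE_COMMAND} -E copy_directory ${sycl_inc_dir}/std ${SYCL_INCLUDE_BUILD_DIR}/std
  COMMAND ${CMAKE_COMMAND} -E copy_directory ${sycl_inc_dir}/syclcompat ${SYCL_INCLUDE_BUILD_DIR}/syclcompat
  COMMAND ${CMAKE_COMMAND} -E copy ${sycl_inc_dir}/syclcompat.hpp ${SYCL_INCLUDE_BUILD_DIR}/syclcompat.hpp
  COMMAND ${CMAKE_COMMAND} -E copy ${UNIFIED_RUNTIME_INCLUDE_DIR}/ur_api.h ${SYCL_INCLUDE_BUILD_DIR}
  COMMAND ${CMAKE_COMMAND} -E copy ${UNIFIED_RUNTIME_INCLUDE_DIR}/ur_api_funcs.def ${SYCL_INCLUDE_BUILD_DIR}
  COMMAND ${CMAKE_COMMAND} -E copy ${UNIFIED_RUNTIME_INCLUDE_DIR}/ur_print.hpp ${SYCL_INCLUDE_BUILD_DIR}
  COMMENT "Copying SYCL headers ...")
```

With the extra DEPENDS entries, the build system should treat the copied headers as out of date whenever one of the UR headers is newer than the listed outputs, which is the situation a cached checkout or stale build tree runs into.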
e169672 to 0033748
0033748 to 2b161d2
2b161d2 to 0c32697
sycl/doc/extensions/experimental/sycl_ext_codeplay_cuda_tensor_map.asciidoc
sycl/include/sycl/ext/codeplay/experimental/cuda_tensor_map.hpp
43d7ed1 to c8dee17
Could you please add tests for this feature?
should not rely on APIs defined in this specification.* It is likely to be
generalized and significantly change in later revisions as more backend vendors
implement analogous features of more or less expressivity and generality than
shown here.
I'm not wild about adding this new category of "oneapi-only" extension. I'm guessing this is for XeTLA? If that is the only library we expect to use this, why not add the support directly in that library, rather than adding a SYCL API for it? I imagine this would end up calling CUDA APIs or inline asm statements, but I think XeTLA does this already for other devices. I'd feel differently if we were adding a general API that other applications could make use of, but that's not what we're doing in this PR.
For CUTLASS, actually, but yeah, it's not great to be so limiting.
We discussed a couple of ways to do this (with one of my suggestions being interop), but it seems there's little appetite for including CUDA-specific headers in these ports.
To be clear, there's no reason we really need to be so limiting with our wording here. I just added it because I'm not completely confident it has uses outside of the CUTLASS case and wanted to limit the maintenance burden.
Would relaxing the language here be appropriate?
I've used the standard boilerplate language for the Status section
This is a fairly mechanical implementation of the basic infrastructure required to access CUDA TMA descriptors from within SYCL kernels, while initializing them on the host. The new feature exposes two new classes and associated support structure in `sycl::ext::codeplay::experimental::cuda`. There's some ugliness involved to make this work on account of the way NVIDIA implemented this basic feature, but it's all in the name of {legitimate-field-of-endeavour}.
c8dee17 to 1fba3c9
@againull Good catch. I've added aspect and macro tests. Actually using this feature requires an sm90+ GPU and inline assembly, so I've left that part untested. Hope that's enough.