-
Notifications
You must be signed in to change notification settings - Fork 747
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Driver][SYCL] Improve support for -fsycl-link with AOT #12828
Conversation
Support for -fsycl-link was restricted to only providing device linking and providing the final device binary for JIT. These changes expand on that functionality to pull in the ability to also perform offline compilation steps to support AOT targets. Example: clang++ -fsycl -fsycl-targets=spir64_gen -c file.cpp clang++ -fsycl -fsycl-link -fsycl-targets=spir64_gen -Xsycl-target-backend "-device skl" file.o -o newfile.o clang++ file.o newfile.o -o finalexe 'newfile.o' here is a final device binary for GPU that is wrapped and allowed to be linked in as a regular host object. The changes here are significant, performing adjustments on when the device linking occurs within the compilation toolchain within the driver. This improves the general dependency flow, allowing for the device linking to occur in a common location for -fsycl-link and normal link steps. Due to this, a number of tests relying on phase checking required to be tweaked.
Not ready for review yet - still requires a number of test updates and an additional fix for a multiple unbundle issue seen with early testing. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Coding style comments only (not a "real" review :)).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
my approval here is more "no flags" than anything, i don't feel qualified to do an in depth review but the overall approach looks good to me, a couple of nits and unrelated questions
appendSYCLDeviceLink(CAList, TargetInfo.TC, DA, DeviceLinkActions, | ||
TargetInfo.BoundArch, | ||
/*AddOffloadAction=*/true); | ||
appendSYCLDeviceLink(CAList, TargetInfo.TC, DeviceLinkActions, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this comment is not really related to this PR so feel free to respond with lower priority, but i've started working on -fno-sycl-rdc
with the new offload driver, and it seems like in that case we don't use the DeviceActionBuilders
, so it makes it somewhat hard to call the SYCL device linking backend. my idea is to call it at Driver::BuildOffloadingActions
, but to do that it seems we would need to make appendSYCLDeviceLink
callable without the builders. is that the right idea or am i misunderstanding the flow?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That sounds about right. Typical expectation is for the appendSYCLDeviceLink
to only occur within the SYCL specific toolchain. If we require access to this behavior otherwise, we need to pull it out and generalize it, which I would assume may require additional parameters to satisfy any AOT specific behaviors within.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
@intel/dpcpp-nativecpu-pi-reviewers, please review. This update involved a more general flow update for the SYCL toolchain, impacting one of your sycl-native-cpu tests. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Change on the Native CPU test looks good to me, I ran some tests offline with this PR and there are a couple breakages on Native CPU (runtime tests for Native CPU are not enabled in Github CI yet), but I'll probably create a follow up PR after this one goes in.
I think that the errors on Native CPU are due to the fact that clang-offload-wrapper
is called twice, and since for Native CPU host and device triple are the same, the cache entries used by BuildJobsForAction
get overwritten.
@intel/llvm-gatekeepers, this should be ready for merge - thanks! |
Support for -fsycl-link was restricted to only providing device linking and providing the final device binary for JIT. These changes expand on that functionality to pull in the ability to also perform offline compilation steps to support AOT targets.
Example:
clang++ -fsycl -fsycl-targets=spir64_gen -c file.cpp
clang++ -fsycl -fsycl-link -fsycl-targets=spir64_gen
-Xsycl-target-backend "-device skl" file.o -o newfile.o
clang++ file.o newfile.o -o finalexe
'newfile.o' here is a final device binary for GPU that is wrapped and allowed to be linked in as a regular host object.
The changes here are significant, performing adjustments on when the device linking occurs within the compilation toolchain within the driver. This improves the general dependency flow, allowing for the device linking to occur in a common location for -fsycl-link and normal link steps. Due to this, a number of tests relying on phase checking required to be tweaked.