Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

discrepancy between clang and nvcc regarding std::is_invocable_v #69956

Closed
yxsamliu opened this issue Oct 23, 2023 · 8 comments · Fixed by #70369
Closed

discrepancy between clang and nvcc regarding std::is_invocable_v #69956

yxsamliu opened this issue Oct 23, 2023 · 8 comments · Fixed by #70369
Labels
clang:codegen clang:driver 'clang' and 'clang++' user-facing binaries. Not 'clang-cl' clang:frontend Language frontend issues, e.g. anything involving "Sema" clang Clang issues not falling into any other category cuda

Comments

@yxsamliu
Copy link
Collaborator

https://godbolt.org/z/3cP5v4cbd

Basically, nvcc does not check availability by host/device attributes in std::is_invocable_v but clang does. This makes std::is_invocable_v returns false for a host function in a kernel for clang but not for nvcc.

Open this issue to discuss whether this should be treated as a clang bug.

@Artem-B

@github-actions github-actions bot added the clang Clang issues not falling into any other category label Oct 23, 2023
@Artem-B
Copy link
Member

Artem-B commented Oct 23, 2023

It looks like the same issue as what we've recently seen with concepts and template deduction guidelines.

In order to determine callability, we need to plumb through the caller context, and we currently don't (or assume global/host context, as it appears to be the case in the reproducer above.)

AFAICT, NVCC gets it wrong in the opposite direction and claims that everything is callable from everywhere:
https://godbolt.org/z/b4Wja18Eo

It may end up being the least bad choice here, as the user may instantiate a type based on std::is_invocable_v and the same type may be applicable to GPU and the host and we can't give two different answers within the same compilation for the same type we're creating with std::is_invocable_v and the given function object.

I guess it boils down to what we want std::is_invocable_v to mean? That the function object will really callable at runtime in the current sub-compilation? Or that it may potentially be callable in some subcompilation?

Ideally, within your reproducer we'd want std::is_invocable_v to be false for the host-only function objects within GPU-only functions and vice versa.

@Artem-B Artem-B added libc++ libc++ C++ Standard Library. Not GNU libstdc++. Not libc++abi. cuda labels Oct 23, 2023
@ldionne
Copy link
Member

ldionne commented Oct 23, 2023

FWIW we have experience with using strict availability attributes to influence how things SFINAE-away (which is what's happening here under the hood), and it has been the cause of a lot of issues. I am not well versed in CUDA, but from the outside I would strongly recommend that __host__ and __device__ does not influence what std::is_invocable returns.

@Artem-B
Copy link
Member

Artem-B commented Oct 23, 2023

That suggests that we may need to stick with the "true if it may be callable somewhere" interpretation, which is what NVCC appears to do.

On the positive side, it should be consistent for both host and GPU sub-compilations.
The downside is that it will be rather counter-intuitive to see "true" for a function object that is obviously not callable (e.g. an object with a GPU-only operator() used in an assert() within a host-only function).

We may need to grow some sort of target-specific extension std::is_callable_on_host and std::is_callable_on_gpu which could be used to check callability more specifically, where it matters.

@ldionne
Copy link
Member

ldionne commented Oct 23, 2023

I think something like this is quite desirable as we work towards running e.g. PSTL algorithms on GPUs (see for example #66968). However, it would probably operate at a different level than something like std::is_invocable, which clearly operates at the C++ "front-end" level in the sense that it checks whether some expression is syntactically valid.

@yxsamliu
Copy link
Collaborator Author

The reduced test case is here https://godbolt.org/z/xfYs16M35

clang tries to substitute __is_invocable<DeviceAdd,int, int>

for that, it tries to substitute _S_test<DeviceAdd,int, int>

for that it tries to substitute std::declval<DeviceAdd>()(std::declval<int, int>())

which ends up as a call DeviceAdd(0,0) with the current function _S_test. IsAllowedCUDACall thinks that it is a call of device function in a host function. This call is discarded, which causes substitution failure.

adding __device__ __host__ to _S_test makes the test pass.

In a sense, clang's behavior is reasonable. When we do substitutions for decltype(std::declval<_F>()(std::declval<_Args>()...)) _S_test(int);, we may want to limit _F to functions that are available to _S_test, because we could have defined functions of the same name but different signatures for device and host. It is not so useful to do substitution with functions not available, since it will just result in inconsistencies after the type is really instantiated.

However, we are now facing the issue that <type_traits> are written in C++ and all function templates are assumed to be host by default. constexpr can make them implicitly host device but in this case we have a function that is not constexpr.

One workaround for this issue is to add a wrapper header for <type_traits> where we wrap the original header with #pragma clang force_cuda_host_device begin and #pragma clang force_cuda_host_device end.

Another solution is to disable availability check during template substitution, but this could cause consistency issues and regressions.

yxsamliu added a commit to yxsamliu/llvm-project that referenced this issue Oct 26, 2023
Currently std::is_invocable does not work for CUDA/HIP since
its implementation requires checking whether a function
is invocable in the context of a synthesized host function.

In general, to make <type_traits> work with CUDA/HIP, the
template functions need to be defined as
so that they are available in both host and device contexts.

Fixes: llvm#69956

Fixes: SWDEV-428314
yxsamliu added a commit to yxsamliu/llvm-project that referenced this issue Nov 1, 2023
Added option -foffload-implicit-host-device-templates which is off by default.

When the option is on, template functions and specializations without
host/device attributes have implicit host device attributes.

They can be overridden by device template functions with the same signagure.
They are emitted on device side only if they are used on device side.

This feature is added as an extension.
`__has_extension(cuda_implicit_host_device_templates)` can be used to
check whether it is enabled.

This is to facilitate using standard C++ headers for device.

Fixes: llvm#69956

Fixes: SWDEV-428314
yxsamliu added a commit that referenced this issue Nov 10, 2023
Added option -foffload-implicit-host-device-templates which is off by
default.

When the option is on, template functions and specializations without
host/device attributes have implicit host device attributes.

They can be overridden by device template functions with the same
signagure.
They are emitted on device side only if they are used on device side.

This feature is added as an extension.
`__has_extension(cuda_implicit_host_device_templates)` can be used to
check whether it is enabled.

This is to facilitate using standard C++ headers for device.

Fixes: #69956

Fixes: SWDEV-428314
@EugeneZelenko EugeneZelenko added clang:driver 'clang' and 'clang++' user-facing binaries. Not 'clang-cl' clang:frontend Language frontend issues, e.g. anything involving "Sema" clang:codegen and removed libc++ libc++ C++ Standard Library. Not GNU libstdc++. Not libc++abi. labels Nov 10, 2023
@llvmbot
Copy link
Collaborator

llvmbot commented Nov 10, 2023

@llvm/issue-subscribers-clang-codegen

Author: Yaxun (Sam) Liu (yxsamliu)

https://godbolt.org/z/3cP5v4cbd

Basically, nvcc does not check availability by host/device attributes in std::is_invocable_v but clang does. This makes std::is_invocable_v returns false for a host function in a kernel for clang but not for nvcc.

Open this issue to discuss whether this should be treated as a clang bug.

@Artem-B

@llvmbot
Copy link
Collaborator

llvmbot commented Nov 10, 2023

@llvm/issue-subscribers-clang-driver

Author: Yaxun (Sam) Liu (yxsamliu)

https://godbolt.org/z/3cP5v4cbd

Basically, nvcc does not check availability by host/device attributes in std::is_invocable_v but clang does. This makes std::is_invocable_v returns false for a host function in a kernel for clang but not for nvcc.

Open this issue to discuss whether this should be treated as a clang bug.

@Artem-B

@llvmbot
Copy link
Collaborator

llvmbot commented Nov 10, 2023

@llvm/issue-subscribers-clang-frontend

Author: Yaxun (Sam) Liu (yxsamliu)

https://godbolt.org/z/3cP5v4cbd

Basically, nvcc does not check availability by host/device attributes in std::is_invocable_v but clang does. This makes std::is_invocable_v returns false for a host function in a kernel for clang but not for nvcc.

Open this issue to discuss whether this should be treated as a clang bug.

@Artem-B

zahiraam pushed a commit to zahiraam/llvm-project that referenced this issue Nov 20, 2023
Added option -foffload-implicit-host-device-templates which is off by
default.

When the option is on, template functions and specializations without
host/device attributes have implicit host device attributes.

They can be overridden by device template functions with the same
signagure.
They are emitted on device side only if they are used on device side.

This feature is added as an extension.
`__has_extension(cuda_implicit_host_device_templates)` can be used to
check whether it is enabled.

This is to facilitate using standard C++ headers for device.

Fixes: llvm#69956

Fixes: SWDEV-428314
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clang:codegen clang:driver 'clang' and 'clang++' user-facing binaries. Not 'clang-cl' clang:frontend Language frontend issues, e.g. anything involving "Sema" clang Clang issues not falling into any other category cuda
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants