discrepancy between clang and nvcc regarding std::is_invocable_v #69956

yxsamliu · 2023-10-23T18:14:39Z

Basically, nvcc does not check availability by host/device attributes in std::is_invocable_v but clang does. This makes std::is_invocable_v returns false for a host function in a kernel for clang but not for nvcc.

Open this issue to discuss whether this should be treated as a clang bug.

@Artem-B

Artem-B · 2023-10-23T21:53:03Z

It looks like the same issue as what we've recently seen with concepts and template deduction guidelines.

In order to determine callability, we need to plumb through the caller context, and we currently don't (or assume global/host context, as it appears to be the case in the reproducer above.)

AFAICT, NVCC gets it wrong in the opposite direction and claims that everything is callable from everywhere:
https://godbolt.org/z/b4Wja18Eo

It may end up being the least bad choice here, as the user may instantiate a type based on std::is_invocable_v and the same type may be applicable to GPU and the host and we can't give two different answers within the same compilation for the same type we're creating with std::is_invocable_v and the given function object.

I guess it boils down to what we want std::is_invocable_v to mean? That the function object will really callable at runtime in the current sub-compilation? Or that it may potentially be callable in some subcompilation?

Ideally, within your reproducer we'd want std::is_invocable_v to be false for the host-only function objects within GPU-only functions and vice versa.

ldionne · 2023-10-23T22:04:51Z

FWIW we have experience with using strict availability attributes to influence how things SFINAE-away (which is what's happening here under the hood), and it has been the cause of a lot of issues. I am not well versed in CUDA, but from the outside I would strongly recommend that __host__ and __device__ does not influence what std::is_invocable returns.

Artem-B · 2023-10-23T22:23:53Z

That suggests that we may need to stick with the "true if it may be callable somewhere" interpretation, which is what NVCC appears to do.

On the positive side, it should be consistent for both host and GPU sub-compilations.
The downside is that it will be rather counter-intuitive to see "true" for a function object that is obviously not callable (e.g. an object with a GPU-only operator() used in an assert() within a host-only function).

We may need to grow some sort of target-specific extension std::is_callable_on_host and std::is_callable_on_gpu which could be used to check callability more specifically, where it matters.

ldionne · 2023-10-23T22:29:31Z

I think something like this is quite desirable as we work towards running e.g. PSTL algorithms on GPUs (see for example #66968). However, it would probably operate at a different level than something like std::is_invocable, which clearly operates at the C++ "front-end" level in the sense that it checks whether some expression is syntactically valid.

yxsamliu · 2023-10-25T20:55:02Z

The reduced test case is here https://godbolt.org/z/xfYs16M35

clang tries to substitute __is_invocable<DeviceAdd,int, int>

for that, it tries to substitute _S_test<DeviceAdd,int, int>

for that it tries to substitute std::declval<DeviceAdd>()(std::declval<int, int>())

which ends up as a call DeviceAdd(0,0) with the current function _S_test. IsAllowedCUDACall thinks that it is a call of device function in a host function. This call is discarded, which causes substitution failure.

adding __device__ __host__ to _S_test makes the test pass.

In a sense, clang's behavior is reasonable. When we do substitutions for decltype(std::declval<_F>()(std::declval<_Args>()...)) _S_test(int);, we may want to limit _F to functions that are available to _S_test, because we could have defined functions of the same name but different signatures for device and host. It is not so useful to do substitution with functions not available, since it will just result in inconsistencies after the type is really instantiated.

However, we are now facing the issue that <type_traits> are written in C++ and all function templates are assumed to be host by default. constexpr can make them implicitly host device but in this case we have a function that is not constexpr.

One workaround for this issue is to add a wrapper header for <type_traits> where we wrap the original header with #pragma clang force_cuda_host_device begin and #pragma clang force_cuda_host_device end.

Another solution is to disable availability check during template substitution, but this could cause consistency issues and regressions.

Currently std::is_invocable does not work for CUDA/HIP since its implementation requires checking whether a function is invocable in the context of a synthesized host function. In general, to make <type_traits> work with CUDA/HIP, the template functions need to be defined as so that they are available in both host and device contexts. Fixes: llvm#69956 Fixes: SWDEV-428314

Added option -foffload-implicit-host-device-templates which is off by default. When the option is on, template functions and specializations without host/device attributes have implicit host device attributes. They can be overridden by device template functions with the same signagure. They are emitted on device side only if they are used on device side. This feature is added as an extension. `__has_extension(cuda_implicit_host_device_templates)` can be used to check whether it is enabled. This is to facilitate using standard C++ headers for device. Fixes: llvm#69956 Fixes: SWDEV-428314

Added option -foffload-implicit-host-device-templates which is off by default. When the option is on, template functions and specializations without host/device attributes have implicit host device attributes. They can be overridden by device template functions with the same signagure. They are emitted on device side only if they are used on device side. This feature is added as an extension. `__has_extension(cuda_implicit_host_device_templates)` can be used to check whether it is enabled. This is to facilitate using standard C++ headers for device. Fixes: #69956 Fixes: SWDEV-428314

llvmbot · 2023-11-10T01:42:54Z

@llvm/issue-subscribers-clang-codegen

Author: Yaxun (Sam) Liu (yxsamliu)

https://godbolt.org/z/3cP5v4cbd

Basically, nvcc does not check availability by host/device attributes in std::is_invocable_v but clang does. This makes std::is_invocable_v returns false for a host function in a kernel for clang but not for nvcc.

Open this issue to discuss whether this should be treated as a clang bug.

@Artem-B

llvmbot · 2023-11-10T01:42:58Z

@llvm/issue-subscribers-clang-driver

Author: Yaxun (Sam) Liu (yxsamliu)

https://godbolt.org/z/3cP5v4cbd

Basically, nvcc does not check availability by host/device attributes in std::is_invocable_v but clang does. This makes std::is_invocable_v returns false for a host function in a kernel for clang but not for nvcc.

Open this issue to discuss whether this should be treated as a clang bug.

@Artem-B

llvmbot · 2023-11-10T01:42:59Z

@llvm/issue-subscribers-clang-frontend

Author: Yaxun (Sam) Liu (yxsamliu)

https://godbolt.org/z/3cP5v4cbd

Basically, nvcc does not check availability by host/device attributes in std::is_invocable_v but clang does. This makes std::is_invocable_v returns false for a host function in a kernel for clang but not for nvcc.

Open this issue to discuss whether this should be treated as a clang bug.

@Artem-B

Added option -foffload-implicit-host-device-templates which is off by default. When the option is on, template functions and specializations without host/device attributes have implicit host device attributes. They can be overridden by device template functions with the same signagure. They are emitted on device side only if they are used on device side. This feature is added as an extension. `__has_extension(cuda_implicit_host_device_templates)` can be used to check whether it is enabled. This is to facilitate using standard C++ headers for device. Fixes: llvm#69956 Fixes: SWDEV-428314

github-actions bot added the clang Clang issues not falling into any other category label Oct 23, 2023

Artem-B added libc++ libc++ C++ Standard Library. Not GNU libstdc++. Not libc++abi. cuda labels Oct 23, 2023

Pierre-vh mentioned this issue Oct 26, 2023

[clang] std::is_invocable fails for __device__ operator() #70006

Closed

yxsamliu mentioned this issue Oct 26, 2023

[CUDA][HIP] Make template implicitly host device #70369

Merged

yxsamliu closed this as completed in #70369 Nov 10, 2023

EugeneZelenko added clang:driver 'clang' and 'clang++' user-facing binaries. Not 'clang-cl' clang:frontend Language frontend issues, e.g. anything involving "Sema" clang:codegen and removed libc++ libc++ C++ Standard Library. Not GNU libstdc++. Not libc++abi. labels Nov 10, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

discrepancy between clang and nvcc regarding std::is_invocable_v #69956

discrepancy between clang and nvcc regarding std::is_invocable_v #69956

yxsamliu commented Oct 23, 2023

Artem-B commented Oct 23, 2023

ldionne commented Oct 23, 2023 •

edited

Loading

Artem-B commented Oct 23, 2023 •

edited

Loading

ldionne commented Oct 23, 2023

yxsamliu commented Oct 25, 2023

llvmbot commented Nov 10, 2023

llvmbot commented Nov 10, 2023

llvmbot commented Nov 10, 2023

discrepancy between clang and nvcc regarding std::is_invocable_v #69956

discrepancy between clang and nvcc regarding std::is_invocable_v #69956

Comments

yxsamliu commented Oct 23, 2023

Artem-B commented Oct 23, 2023

ldionne commented Oct 23, 2023 • edited Loading

Artem-B commented Oct 23, 2023 • edited Loading

ldionne commented Oct 23, 2023

yxsamliu commented Oct 25, 2023

llvmbot commented Nov 10, 2023

llvmbot commented Nov 10, 2023

llvmbot commented Nov 10, 2023

ldionne commented Oct 23, 2023 •

edited

Loading

Artem-B commented Oct 23, 2023 •

edited

Loading