Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding Separate OpenMP Offloading Backend to libcxx/include/__algorithm/pstl_backends #66968

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

AntonRydahl
Copy link
Contributor

@AntonRydahl AntonRydahl commented Sep 21, 2023

To address some of the good feedback from the recent request for comments, I have added a separate OpenMP offloading backend in libcxx/include/__algorithm/pstl_backends.

To begin with, I have added std::for_each, std::transform, std::transform_reduce, and std::fill.

I know there are still a lot of things missing, such as LIT tests, but I think we should try to agree on the project structure first.

@llvmbot llvmbot added the libc++ libc++ C++ Standard Library. Not GNU libstdc++. Not libc++abi. label Sep 21, 2023
@llvmbot
Copy link
Collaborator

llvmbot commented Sep 21, 2023

@llvm/pr-subscribers-github-workflow
@llvm/pr-subscribers-libcxxabi

@llvm/pr-subscribers-libcxx

Changes

To address some of the good feedback from the recent request for comments, I have added a separate OpenMP offloading backend in libcxx/include/__algorithm/pstl_backends.

To begin with, I have only added std::for_each and std::fill.

I know there are still a lot of things missing, such as LIT tests, but I think we should try to agree on the project structure first.


Full diff: https://github.com/llvm/llvm-project/pull/66968.diff

9 Files Affected:

  • (modified) libcxx/CMakeLists.txt (+14)
  • (modified) libcxx/include/CMakeLists.txt (+5)
  • (modified) libcxx/include/__algorithm/pstl_backend.h (+8)
  • (added) libcxx/include/__algorithm/pstl_backends/gpu_backend.h (+21)
  • (added) libcxx/include/__algorithm/pstl_backends/gpu_backends/backend.h (+33)
  • (added) libcxx/include/__algorithm/pstl_backends/gpu_backends/fill.h (+59)
  • (added) libcxx/include/__algorithm/pstl_backends/gpu_backends/for_each.h (+59)
  • (added) libcxx/include/__algorithm/pstl_backends/gpu_backends/omp_offload.h (+94)
  • (modified) libcxx/include/__config_site.in (+1)
diff --git a/libcxx/CMakeLists.txt b/libcxx/CMakeLists.txt
index bb2898b799bcef9..43d2a448de79584 100644
--- a/libcxx/CMakeLists.txt
+++ b/libcxx/CMakeLists.txt
@@ -290,6 +290,8 @@ option(LIBCXX_HAS_WIN32_THREAD_API "Ignore auto-detection and force use of win32
 option(LIBCXX_HAS_EXTERNAL_THREAD_API
   "Build libc++ with an externalized threading API.
    This option may only be set to ON when LIBCXX_ENABLE_THREADS=ON." OFF)
+option(LIBCXX_ENABLE_GPU_OFFLOAD 
+  "Build libc++ with support for GPU offload" OFF)
 
 if (LIBCXX_ENABLE_THREADS)
   set(LIBCXX_PSTL_CPU_BACKEND "std_thread" CACHE STRING "Which PSTL CPU backend to use")
@@ -297,6 +299,14 @@ else()
   set(LIBCXX_PSTL_CPU_BACKEND "serial" CACHE STRING "Which PSTL CPU backend to use")
 endif()
 
+if (NOT DEFINED LIBCXX_PSTL_GPU_BACKEND)
+  if (${LIBCXX_ENABLE_GPU_OFFLOAD})
+    set(LIBCXX_PSTL_GPU_BACKEND "omp_offload" CACHE STRING "Which PSTL GPU backend to use")
+  else()
+    set(LIBCXX_PSTL_GPU_BACKEND "none" CACHE STRING "Which PSTL GPU backend to use")
+  endif()
+endif()
+
 # Misc options ----------------------------------------------------------------
 # FIXME: Turn -pedantic back ON. It is currently off because it warns
 # about #include_next which is used everywhere.
@@ -809,6 +819,10 @@ else()
                        Valid backends are: serial, std_thread and libdispatch")
 endif()
 
+if (LIBCXX_PSTL_GPU_BACKEND STREQUAL "omp_offload")
+  config_define(1 _LIBCPP_PSTL_GPU_BACKEND_OMP_OFFLOAD)
+endif()
+
 if (LIBCXX_ABI_DEFINES)
   set(abi_defines)
   foreach (abi_define ${LIBCXX_ABI_DEFINES})
diff --git a/libcxx/include/CMakeLists.txt b/libcxx/include/CMakeLists.txt
index 2ec755236dbaee2..a3d72df61a86dde 100644
--- a/libcxx/include/CMakeLists.txt
+++ b/libcxx/include/CMakeLists.txt
@@ -85,6 +85,11 @@ set(files
   __algorithm/pstl_backends/cpu_backends/thread.h
   __algorithm/pstl_backends/cpu_backends/transform.h
   __algorithm/pstl_backends/cpu_backends/transform_reduce.h
+  __algorithm/pstl_backends/gpu_backend.h
+  __algorithm/pstl_backends/gpu_backends/backend.h
+  __algorithm/pstl_backends/gpu_backends/fill.h
+  __algorithm/pstl_backends/gpu_backends/for_each.h
+  __algorithm/pstl_backends/gpu_backends/omp_offload.h
   __algorithm/pstl_copy.h
   __algorithm/pstl_count.h
   __algorithm/pstl_fill.h
diff --git a/libcxx/include/__algorithm/pstl_backend.h b/libcxx/include/__algorithm/pstl_backend.h
index 93372f019031b63..47f5191b48517ba 100644
--- a/libcxx/include/__algorithm/pstl_backend.h
+++ b/libcxx/include/__algorithm/pstl_backend.h
@@ -10,6 +10,7 @@
 #define _LIBCPP___ALGORITHM_PSTL_BACKEND_H
 
 #include <__algorithm/pstl_backends/cpu_backend.h>
+#include <__algorithm/pstl_backends/gpu_backend.h>
 #include <__config>
 #include <execution>
 
@@ -179,10 +180,17 @@ struct __select_backend<std::execution::parallel_policy> {
   using type = __cpu_backend_tag;
 };
 
+#    if defined(_LIBCPP_PSTL_GPU_BACKEND_OMP_OFFLOAD)
+template <>
+struct __select_backend<std::execution::parallel_unsequenced_policy> {
+  using type = __gpu_backend_tag;
+};
+#    else
 template <>
 struct __select_backend<std::execution::parallel_unsequenced_policy> {
   using type = __cpu_backend_tag;
 };
+#    endif
 
 #  else
 
diff --git a/libcxx/include/__algorithm/pstl_backends/gpu_backend.h b/libcxx/include/__algorithm/pstl_backends/gpu_backend.h
new file mode 100644
index 000000000000000..7237036156a1bf3
--- /dev/null
+++ b/libcxx/include/__algorithm/pstl_backends/gpu_backend.h
@@ -0,0 +1,21 @@
+//===----------------------------------------------------------------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#ifndef _LIBCPP___ALGORITHM_PSTL_BACKENDS_GPU_BACKEND_H
+#define _LIBCPP___ALGORITHM_PSTL_BACKENDS_GPU_BACKEND_H
+
+#include <__config>
+
+#include <__algorithm/pstl_backends/gpu_backends/backend.h>
+
+#if defined(_LIBCPP_PSTL_GPU_BACKEND_OMP_OFFLOAD)
+#  include <__algorithm/pstl_backends/gpu_backends/fill.h>
+#  include <__algorithm/pstl_backends/gpu_backends/for_each.h>
+#endif
+
+#endif // _LIBCPP___ALGORITHM_PSTL_BACKENDS_GPU_BACKEND_H
diff --git a/libcxx/include/__algorithm/pstl_backends/gpu_backends/backend.h b/libcxx/include/__algorithm/pstl_backends/gpu_backends/backend.h
new file mode 100644
index 000000000000000..a8b400afbb94d9d
--- /dev/null
+++ b/libcxx/include/__algorithm/pstl_backends/gpu_backends/backend.h
@@ -0,0 +1,33 @@
+//===----------------------------------------------------------------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#ifndef _LIBCPP___ALGORITHM_PSTL_BACKENDS_GPU_BACKEND_BACKEND_H
+#define _LIBCPP___ALGORITHM_PSTL_BACKENDS_GPU_BACKEND_BACKEND_H
+
+#include <__config>
+#include <cstddef>
+
+#if defined(_LIBCPP_PSTL_GPU_BACKEND_OMP_OFFLOAD)
+#  include <__algorithm/pstl_backends/gpu_backends/omp_offload.h>
+#endif
+
+#if !defined(_LIBCPP_HAS_NO_PRAGMA_SYSTEM_HEADER)
+#  pragma GCC system_header
+#endif
+
+#if _LIBCPP_STD_VER >= 17
+
+_LIBCPP_BEGIN_NAMESPACE_STD
+
+struct __gpu_backend_tag {};
+
+_LIBCPP_END_NAMESPACE_STD
+
+#endif // _LIBCPP_STD_VER >= 17
+
+#endif // _LIBCPP___ALGORITHM_PSTL_BACKENDS_GPU_BACKEND_BACKEND_H
diff --git a/libcxx/include/__algorithm/pstl_backends/gpu_backends/fill.h b/libcxx/include/__algorithm/pstl_backends/gpu_backends/fill.h
new file mode 100644
index 000000000000000..32926da87e2a083
--- /dev/null
+++ b/libcxx/include/__algorithm/pstl_backends/gpu_backends/fill.h
@@ -0,0 +1,59 @@
+//===----------------------------------------------------------------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#ifndef _LIBCPP___ALGORITHM_PSTL_BACKENDS_GPU_BACKNEDS_FILL_H
+#define _LIBCPP___ALGORITHM_PSTL_BACKENDS_GPU_BACKNEDS_FILL_H
+
+#include <__algorithm/fill.h>
+#include <__algorithm/pstl_backends/cpu_backends/backend.h>
+#include <__algorithm/pstl_backends/gpu_backends/backend.h>
+#include <__config>
+#include <__iterator/concepts.h>
+#include <__type_traits/is_execution_policy.h>
+#include <__utility/terminate_on_exception.h>
+#include <stdio.h>
+
+#if !defined(_LIBCPP_HAS_NO_PRAGMA_SYSTEM_HEADER)
+#  pragma GCC system_header
+#endif
+
+#if !defined(_LIBCPP_HAS_NO_INCOMPLETE_PSTL) && _LIBCPP_STD_VER >= 17
+
+_LIBCPP_BEGIN_NAMESPACE_STD
+
+template <class _ExecutionPolicy, class _ForwardIterator, class _Tp>
+_LIBCPP_HIDE_FROM_ABI void
+__pstl_fill(__gpu_backend_tag, _ForwardIterator __first, _ForwardIterator __last, const _Tp& __value) {
+  // It is only safe to execute for_each on the GPU, it the execution policy is
+  // parallel unsequenced, as it is the only execution policy prohibiting throwing
+  // exceptions and allowing SIMD instructions
+  if constexpr (__is_unsequenced_execution_policy_v<_ExecutionPolicy> &&
+                __has_random_access_iterator_category_or_concept<_ForwardIterator>::value) {
+    std::__par_backend::__parallel_for_simd_val_1(__first, __last - __first, __value);
+  }
+  // Else if the excution policy is parallel, we execute for_each on the CPU instead
+  else if constexpr (__is_parallel_execution_policy_v<_ExecutionPolicy> &&
+                     __has_random_access_iterator_category_or_concept<_ForwardIterator>::value) {
+    std::__terminate_on_exception([&] {
+      __par_backend::__parallel_for(
+          __first, __last, [&__value](_ForwardIterator __brick_first, _ForwardIterator __brick_last) {
+            std::__pstl_fill<__remove_parallel_policy_t<_ExecutionPolicy>>(
+                __cpu_backend_tag{}, __brick_first, __brick_last, __value);
+          });
+    });
+    // Else we execute for_each in serial
+  } else {
+    std::fill(__first, __last, __value);
+  }
+}
+
+_LIBCPP_END_NAMESPACE_STD
+
+#endif // !defined(_LIBCPP_HAS_NO_INCOMPLETE_PSTL) && _LIBCPP_STD_VER >= 17
+
+#endif // _LIBCPP___ALGORITHM_PSTL_BACKENDS_GPU_BACKNEDS_FILL_H
diff --git a/libcxx/include/__algorithm/pstl_backends/gpu_backends/for_each.h b/libcxx/include/__algorithm/pstl_backends/gpu_backends/for_each.h
new file mode 100644
index 000000000000000..14de2af8e4a15c6
--- /dev/null
+++ b/libcxx/include/__algorithm/pstl_backends/gpu_backends/for_each.h
@@ -0,0 +1,59 @@
+//===----------------------------------------------------------------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#ifndef _LIBCPP___ALGORITHM_PSTL_BACKENDS_GPU_BACKNEDS_FOR_EACH_H
+#define _LIBCPP___ALGORITHM_PSTL_BACKENDS_GPU_BACKNEDS_FOR_EACH_H
+
+#include <__algorithm/for_each.h>
+#include <__algorithm/pstl_backends/cpu_backends/backend.h>
+#include <__algorithm/pstl_backends/gpu_backends/backend.h>
+#include <__config>
+#include <__iterator/concepts.h>
+#include <__type_traits/is_execution_policy.h>
+#include <__utility/terminate_on_exception.h>
+#include <stdio.h>
+
+#if !defined(_LIBCPP_HAS_NO_PRAGMA_SYSTEM_HEADER)
+#  pragma GCC system_header
+#endif
+
+#if !defined(_LIBCPP_HAS_NO_INCOMPLETE_PSTL) && _LIBCPP_STD_VER >= 17
+
+_LIBCPP_BEGIN_NAMESPACE_STD
+
+template <class _ExecutionPolicy, class _ForwardIterator, class _Functor>
+_LIBCPP_HIDE_FROM_ABI void
+__pstl_for_each(__gpu_backend_tag, _ForwardIterator __first, _ForwardIterator __last, _Functor __func) {
+  // It is only safe to execute for_each on the GPU, it the execution policy is
+  // parallel unsequenced, as it is the only execution policy prohibiting throwing
+  // exceptions and allowing SIMD instructions
+  if constexpr (__is_unsequenced_execution_policy_v<_ExecutionPolicy> &&
+                __has_random_access_iterator_category_or_concept<_ForwardIterator>::value) {
+    std::__par_backend::__parallel_for_simd_1(__first, __last - __first, __func);
+  }
+  // Else if the excution policy is parallel, we execute for_each on the CPU instead
+  else if constexpr (__is_parallel_execution_policy_v<_ExecutionPolicy> &&
+                     __has_random_access_iterator_category_or_concept<_ForwardIterator>::value) {
+    std::__terminate_on_exception([&] {
+      std::__par_backend::__parallel_for(
+          __first, __last, [__func](_ForwardIterator __brick_first, _ForwardIterator __brick_last) {
+            std::__pstl_for_each<__remove_parallel_policy_t<_ExecutionPolicy>>(
+                __cpu_backend_tag{}, __brick_first, __brick_last, __func);
+          });
+    });
+    // Else we execute for_each in serial
+  } else {
+    std::for_each(__first, __last, __func);
+  }
+}
+
+_LIBCPP_END_NAMESPACE_STD
+
+#endif // !defined(_LIBCPP_HAS_NO_INCOMPLETE_PSTL) && _LIBCPP_STD_VER >= 17
+
+#endif // _LIBCPP___ALGORITHM_PSTL_BACKENDS_GPU_BACKNEDS_FOR_EACH_H
diff --git a/libcxx/include/__algorithm/pstl_backends/gpu_backends/omp_offload.h b/libcxx/include/__algorithm/pstl_backends/gpu_backends/omp_offload.h
new file mode 100644
index 000000000000000..4baa4e7f65859d1
--- /dev/null
+++ b/libcxx/include/__algorithm/pstl_backends/gpu_backends/omp_offload.h
@@ -0,0 +1,94 @@
+//===----------------------------------------------------------------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#ifndef _LIBCPP___ALGORITHM_PSTL_BACKENDS_CPU_BACKENDS_OMP_OFFLOAD_H
+#define _LIBCPP___ALGORITHM_PSTL_BACKENDS_CPU_BACKENDS_OMP_OFFLOAD_H
+
+#include <__assert>
+#include <__config>
+#include <__utility/move.h>
+#include <cstddef>
+
+#if !defined(_LIBCPP_HAS_NO_PRAGMA_SYSTEM_HEADER)
+#  pragma GCC system_header
+#endif
+
+_LIBCPP_PUSH_MACROS
+#include <__undef_macros>
+
+#if !defined(_LIBCPP_HAS_NO_INCOMPLETE_PSTL) && _LIBCPP_STD_VER >= 17
+
+_LIBCPP_BEGIN_NAMESPACE_STD
+
+namespace __par_backend {
+inline namespace __omp_gpu_backend {
+
+// In OpenMP, we need to extract the pointer for the underlying data for data
+// structures like std::vector and std::array to be able to map the data to the
+// device.
+
+template <typename T>
+_LIBCPP_HIDE_FROM_ABI inline T __omp_extract_base_ptr(T p) {
+  return p;
+}
+
+template <typename T>
+_LIBCPP_HIDE_FROM_ABI inline T __omp_extract_base_ptr(std::__wrap_iter<T> w) {
+  std::pointer_traits<std::__wrap_iter<T>> PT;
+  return PT.to_address(w);
+}
+
+// Applying function or lambda in a loop
+
+template <class _Iterator, class _DifferenceType, class _Function>
+_LIBCPP_HIDE_FROM_ABI _Iterator
+__omp_parallel_for_simd_1(_Iterator __first, _DifferenceType __n, _Function __f) noexcept {
+#  pragma omp target teams distribute parallel for simd map(tofrom : __first[0 : __n])
+  for (_DifferenceType __i = 0; __i < __n; ++__i)
+    __f(__first[__i]);
+
+  return __first + __n;
+}
+
+// Extracting the underlying pointer
+
+template <class _Iterator, class _DifferenceType, class _Function>
+_LIBCPP_HIDE_FROM_ABI _Iterator __parallel_for_simd_1(_Iterator __first, _DifferenceType __n, _Function __f) noexcept {
+  __omp_parallel_for_simd_1(__omp_gpu_backend::__omp_extract_base_ptr(__first), __n, __f);
+  return __first + __n;
+}
+
+// Assigning a value in a loop
+
+template <class _Index, class _DifferenceType, class _Tp>
+_LIBCPP_HIDE_FROM_ABI _Index
+__omp_parallel_for_simd_val_1(_Index __first, _DifferenceType __n, const _Tp& __value) noexcept {
+#  pragma omp target teams distribute parallel for simd map(tofrom : __first[0 : __n]) map(to : __value)
+  for (_DifferenceType __i = 0; __i < __n; ++__i)
+    __first[__i] = __value;
+
+  return __first + __n;
+}
+
+template <class _Index, class _DifferenceType, class _Tp>
+_LIBCPP_HIDE_FROM_ABI _Index
+__parallel_for_simd_val_1(_Index __first, _DifferenceType __n, const _Tp& __value) noexcept {
+  __omp_parallel_for_simd_val_1(__omp_gpu_backend::__omp_extract_base_ptr(__first), __n, __value);
+  return __first + __n;
+}
+
+} // namespace __omp_gpu_backend
+} // namespace __par_backend
+
+_LIBCPP_END_NAMESPACE_STD
+
+#endif // !defined(_LIBCPP_HAS_NO_INCOMPLETE_PSTL) && && _LIBCPP_STD_VER >= 17
+
+_LIBCPP_POP_MACROS
+
+#endif // _LIBCPP___ALGORITHM_PSTL_BACKENDS_CPU_BACKENDS_OMP_OFFLOAD_H
diff --git a/libcxx/include/__config_site.in b/libcxx/include/__config_site.in
index c85cbcd02c441b9..e0edddce3afc3ff 100644
--- a/libcxx/include/__config_site.in
+++ b/libcxx/include/__config_site.in
@@ -34,6 +34,7 @@
 #cmakedefine _LIBCPP_PSTL_CPU_BACKEND_SERIAL
 #cmakedefine _LIBCPP_PSTL_CPU_BACKEND_THREAD
 #cmakedefine _LIBCPP_PSTL_CPU_BACKEND_LIBDISPATCH
+#cmakedefine _LIBCPP_PSTL_GPU_BACKEND_OMP_OFFLOAD
 
 // Hardening.
 #cmakedefine01 _LIBCPP_ENABLE_HARDENED_MODE_DEFAULT

@AntonRydahl
Copy link
Contributor Author

Do we want to enable GPU offloading when compiling libcxx, for instance, by using #cmakedefine instead of #define, or should the user be able to switch it on and off when using the library?

@AntonRydahl
Copy link
Contributor Author

I made it depend on CMake options only.

@AntonRydahl AntonRydahl marked this pull request as ready for review September 22, 2023 00:09
@AntonRydahl AntonRydahl requested a review from a team as a code owner September 22, 2023 00:09
@github-actions
Copy link

github-actions bot commented Oct 2, 2023

✅ With the latest revision this PR passed the C/C++ code formatter.

@AntonRydahl AntonRydahl force-pushed the libcxx_pstl_omp_offload_backend branch from a5f7752 to 96adadf Compare October 2, 2023 20:25
Copy link
Member

@ldionne ldionne left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot for the patch! I think this is really great and I'm glad we're paving the way for new backends being added in the library. Since this is the first such backend, there are some thing to change as we discussed during our live review, but I definitely see a path forward.

Note to self: did not have time to review omp_offload.h yet, will do next time.

libcxx/CMakeLists.txt Outdated Show resolved Hide resolved
libcxx/include/__algorithm/pstl_backend.h Outdated Show resolved Hide resolved
@@ -0,0 +1,349 @@
//===----------------------------------------------------------------------===//
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not attached to this line:

We need to discuss how exceptions are handled when we offload to the GPU. @AntonRydahl mentioned that throw was compiled to __builtin_trap(). This is going to both fail our tests and make us non-conforming. Our preference would be that the OpenMP backend doesn't try to offload if there are exceptions in the code path we're considering offloading.

Needs more discussion.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have written a section about this in the libc++ documentation. Do you think the description is sufficiently detailed?

libcxx/include/__algorithm/pstl_backends/gpu_backend.h Outdated Show resolved Hide resolved
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//
//===----------------------------------------------------------------------===//
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can actually remove this header entirely until you have an implementation for it, since it'll be implemented using std::find_if. Note that if you run into issues while doing that, it's a pre-existing PSTL bug.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there is unfortunately a bug in the PSTL logic. I cannot compile libcxx if I remove the overlay header files for any_of, find_if, or stable_sort.

template <class _T1, class _T2, class _T3>
struct _LIBCPP_HIDE_FROM_ABI __is_supported_reduction : std::false_type {};

# define __PSTL_IS_SUPPORTED_REDUCTION(funname) \
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's something trying to emerge here. We already have libcxx/include/__type_traits/predicate_traits.h and libcxx/include/__type_traits/operation_traits.h. IMO we could unify those into something like:

template <>
struct __desugars_to<__equal_to, std::equal_to<>> : true_type {};

template <>
struct __desugars_to<ranges::equal_to, std::equal_to<>> : true_type {};

template <>
struct __desugars_to<ranges::plus, std::plus<>> : std::true_type {};

// etc...

Then, this becomes simply:

template <class _Func>
struct __is_supported_reduction : bool_constant<
    __desugars_to<_Func, std::minus<>>::value ||
    __desugars_to<_Func, std::multiplies<>>::value ||
    __desugars_to<_Func, std::logical_and<>>::value ||
    __desugars_to<_Func, std::logical_or<>>::value ||
    __desugars_to<_Func, std::bit_and<>>::value ||
    __desugars_to<_Func, std::bit_or<>>::value ||
    __desugars_to<_Func, std::bit_xor<>>::value
> {};

This would mandate some prior refactoring, but it would be quite nice.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To not make this PR even bigger, I have added a refactor to use __desugars_to here: #68642

I hope that is what you meant. 😄

ldionne added a commit to ldionne/llvm-project that referenced this pull request Oct 4, 2023
…TION_POINT

The _LIBCPP_PSTL_CUSTOMIZATION_POINT macro was assuming that the policy
was called _RawPolicy and referencing it by name. It happened to always
work but this was definitely accidental and an oversight in the original
implementation. This patch fixes that by passing the policy to the macro
explicitly. Noticed while reviewing llvm#66968.
ldionne added a commit that referenced this pull request Oct 4, 2023
…TION_POINT (#68238)

The _LIBCPP_PSTL_CUSTOMIZATION_POINT macro was assuming that the policy
was called _RawPolicy and referencing it by name. It happened to always
work but this was definitely accidental and an oversight in the original
implementation. This patch fixes that by passing the policy to the macro
explicitly. Noticed while reviewing #66968.
@philnik777 philnik777 added the pstl Issues related to the C++17 Parallel STL label Oct 6, 2023
@EricWF
Copy link
Member

EricWF commented Oct 9, 2023

@philnik777 Last we spoke, the design work around the customization points was still ongoing. Have we locked it in now?

@philnik777
Copy link
Contributor

@philnik777 Last we spoke, the design work around the customization points was still ongoing. Have we locked it in now?

Kind-of, but not really. We'll probably get more insights while working on this. We're working closely with @AntonRydahl and others to get this backend into libc++.

Copy link
Member

@ldionne ldionne left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is starting to look really good! Note to self: still need to dive into the implementation of most algorithms, but we discussed for_each and offloading conditions in some details.

libcxx/CMakeLists.txt Outdated Show resolved Hide resolved
libcxx/include/__algorithm/pstl_backends/openmp/any_of.h Outdated Show resolved Hide resolved
libcxx/include/__algorithm/pstl_backends/openmp/for_each.h Outdated Show resolved Hide resolved
ldionne pushed a commit that referenced this pull request Oct 14, 2023
This PR addresses a smaller detail discussed in the code review for
#66968. Currently, some
functions in the `libc++` PSTL CPU backend have been appended with a
digit to indicate the number of input iterator arguments. However, there
is no need to change the name for each version as overloading can be
used instead. This PR will make the naming more consistent in the the
CPU and the proposed OpenMP backend.
libcxx/include/__algorithm/pstl_backends/openmp/fill.h Outdated Show resolved Hide resolved
libcxx/include/__algorithm/pstl_backends/openmp/find_if.h Outdated Show resolved Hide resolved
libcxx/include/__algorithm/pstl_backends/openmp/for_each.h Outdated Show resolved Hide resolved
Comment on lines 57 to 58
// parallel unsequenced, as it is the only execution policy prohibiting throwing
// exceptions and allowing SIMD instructions
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment is not technically correct, since exceptions can be thrown from user code even when par_unseq is used.

And unseq also allows for SIMD. So I think we need to explain that we don't have sequencing between the teams (thanks @jdoerfert !).

Also, I think it might be worth promoting this comment to openmp.h since it is not specific for for_each.

blueboxd pushed a commit to blueboxd/libcxx that referenced this pull request Oct 17, 2023
This PR addresses a smaller detail discussed in the code review for
llvm/llvm-project#66968. Currently, some
functions in the `libc++` PSTL CPU backend have been appended with a
digit to indicate the number of input iterator arguments. However, there
is no need to change the name for each version as overloading can be
used instead. This PR will make the naming more consistent in the the
CPU and the proposed OpenMP backend.

NOKEYCHECK=True
GitOrigin-RevId: f2b79ed9c6c858426b15a0374103ab901b5b2ef3
@AntonRydahl AntonRydahl requested a review from a team as a code owner October 26, 2023 05:52
@AntonRydahl
Copy link
Contributor Author

AntonRydahl commented Oct 26, 2023

When adding the logic to error out when OpenMP is not enabled, I had to add a bit of CMake logic to allow installing the library without getting that same error. On the good side, this also made it possible to enable the OpenMP toolchain when running the tests, if the OpenMP backend was selected.

I now found that a few of the tests failed, because I had not thought of handling cases where the base type of the iterator is not trivially copyable. For instance, some of the tests may throw exceptions in the copy constructor. The commit I pushed today makes those tests pass while still allowing my local tests to be executed on the GPU.

What do you think of this change?

@AntonRydahl
Copy link
Contributor Author

I have also added a number of tests that verify that vectors of primitive data types are offloaded to the GPU. They use omp_is_initial_device() to test whether the loop bodies run on the host or the device.

If there are no available devices, the new tests simply don't run:

  if (omp_get_num_devices() < 1)
    return 0;

Also, I have introduced a new feature such that they only run when the OpenMP PSTL backend is enabled:

// REQUIRES: openmp_pstl_backend

@ldionne
Copy link
Member

ldionne commented Mar 27, 2024

@jdoerfert @AntonRydahl The new CI job I added is failing due to the inability to find -lomp: https://github.com/llvm/llvm-project/actions/runs/8457375990/job/23175325016?pr=66968

@jdoerfert
Copy link
Member

jdoerfert commented Mar 28, 2024

@jdoerfert @AntonRydahl The new CI job I added is failing due to the inability to find -lomp: https://github.com/llvm/llvm-project/actions/runs/8457375990/job/23175325016?pr=66968

Can you add openmp as a runtime (or project) to the cmake invocation? We want to build our own libomp, not rely on the system.

@ldionne
Copy link
Member

ldionne commented Mar 28, 2024

@jdoerfert @AntonRydahl The new CI job I added is failing due to the inability to find -lomp: https://github.com/llvm/llvm-project/actions/runs/8457375990/job/23175325016?pr=66968

Can you add openmp as a runtime (or project) to the cmake invocation? We want to build our own libomp, not rely on the system.

Surely I need to do more than that? We are not building our own Clang, so I'd have to configure the test suite to use the just-built omp library as well, right?

@ldionne
Copy link
Member

ldionne commented Mar 28, 2024

Isn't openmp available by default on Linux?

generate-cmake -C "${MONOREPO_ROOT}/libcxx/cmake/caches/Generic-pstl-openmp.cmake"
generate-cmake-openmp -C "${MONOREPO_ROOT}/libcxx/cmake/caches/Generic-pstl-openmp.cmake"
echo "+++ Installing OpenMP and Clang"
${NINJA} -vC "${BUILD_DIR}" install
check-runtimes
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think this can be done in a more elegant way, @ldionne? By installing Clang and OpenMP as a project rather than a runtime, I got the build-bot to work locally.

@jdoerfert
Copy link
Member

jdoerfert commented Mar 28, 2024

@jdoerfert @AntonRydahl The new CI job I added is failing due to the inability to find -lomp: https://github.com/llvm/llvm-project/actions/runs/8457375990/job/23175325016?pr=66968

Can you add openmp as a runtime (or project) to the cmake invocation? We want to build our own libomp, not rely on the system.

Surely I need to do more than that? We are not building our own Clang, so I'd have to configure the test suite to use the just-built omp library as well, right?

If you don't have clang, put openmp as a project. That will work for CPU parallelism (not offloading). I would suggest clang though...

Isn't openmp available by default on Linux?

You can install it, for sure.

@jdoerfert
Copy link
Member

@AntonRydahl does this pass smoke tests in the builder now?

@AntonRydahl
Copy link
Contributor Author

I am uncertain why the new OpenMP buildbot failed. I guess that is the only thing blocking this PR right now. If you have time, could I get you to try to run it with ./libcxx/utils/ci/run-buildbot generic-pstl-openmp on one of your Linux machines?

@jdoerfert
Copy link
Member

I run the builder locally, it did not reproduce the errors we see here.
Nothing in the error seems to be related to openmp anyway.

FAILED: libcxx/src/CMakeFiles/cxx_static.dir/filesystem/operations.cpp.o 
/home/runner/_work/llvm-project/llvm-project/build/generic-pstl-openmp/./bin/clang++ --target=x86_64-unknown-linux-gnu -DLIBCXX_BUILDING_LIBCXXABI -D_LIBCPP_BUILDING_LIBRARY -D_LIBCPP_HAS_NO_PRAGMA_SYSTEM_HEADER -D_LIBCPP_LINK_PTHREAD_LIB -D_LIBCPP_LINK_RT_LIB -D_LIBCPP_REMOVE_TRANSITIVE_INCLUDES -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -I/home/runner/_work/llvm-project/llvm-project/libcxx/src -I/home/runner/_work/llvm-project/llvm-project/build/generic-pstl-openmp/include/x86_64-unknown-linux-gnu/c++/v1 -I/home/runner/_work/llvm-project/llvm-project/build/generic-pstl-openmp/include/c++/v1 -I/home/runner/_work/llvm-project/llvm-project/libcxxabi/include -fPIC -fno-semantic-interposition -fvisibility-inlines-hidden -Werror=date-time -Werror=unguarded-availability-new -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wmissing-field-initializers -Wimplicit-fallthrough -Wcovered-switch-default -Wno-noexcept-type -Wnon-virtual-dtor -Wdelete-non-virtual-dtor -Wsuggest-override -Wstring-conversion -Wmisleading-indentation -Wctad-maybe-unsupported -fdiagnostics-color -ffunction-sections -fdata-sections -O2 -g -DNDEBUG -faligned-allocation -nostdinc++ -fvisibility-inlines-hidden -fvisibility=hidden -fopenmp -Wall -Wextra -Wnewline-eof -Wshadow -Wwrite-strings -Wno-unused-parameter -Wno-long-long -Werror=return-type -Wextra-semi -Wundef -Wunused-template -Wformat-nonliteral -Wno-user-defined-literals -Wno-covered-switch-default -Wno-suggest-override -Werror -std=c++2b -MD -MT libcxx/src/CMakeFiles/cxx_static.dir/filesystem/operations.cpp.o -MF libcxx/src/CMakeFiles/cxx_static.dir/filesystem/operations.cpp.o.d -o libcxx/src/CMakeFiles/cxx_static.dir/filesystem/operations.cpp.o -c /home/runner/_work/llvm-project/llvm-project/libcxx/src/filesystem/operations.cpp
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace, preprocessed source, and associated run script.
Stack dump:
0.	Program arguments: /home/runner/_work/llvm-project/llvm-project/build/generic-pstl-openmp/./bin/clang++ --target=x86_64-unknown-linux-gnu -DLIBCXX_BUILDING_LIBCXXABI -D_LIBCPP_BUILDING_LIBRARY -D_LIBCPP_HAS_NO_PRAGMA_SYSTEM_HEADER -D_LIBCPP_LINK_PTHREAD_LIB -D_LIBCPP_LINK_RT_LIB -D_LIBCPP_REMOVE_TRANSITIVE_INCLUDES -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -I/home/runner/_work/llvm-project/llvm-project/libcxx/src -I/home/runner/_work/llvm-project/llvm-project/build/generic-pstl-openmp/include/x86_64-unknown-linux-gnu/c++/v1 -I/home/runner/_work/llvm-project/llvm-project/build/generic-pstl-openmp/include/c++/v1 -I/home/runner/_work/llvm-project/llvm-project/libcxxabi/include -fPIC -fno-semantic-interposition -fvisibility-inlines-hidden -Werror=date-time -Werror=unguarded-availability-new -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wmissing-field-initializers -Wimplicit-fallthrough -Wcovered-switch-default -Wno-noexcept-type -Wnon-virtual-dtor -Wdelete-non-virtual-dtor -Wsuggest-override -Wstring-conversion -Wmisleading-indentation -Wctad-maybe-unsupported -fdiagnostics-color -ffunction-sections -fdata-sections -O2 -g -DNDEBUG -faligned-allocation -nostdinc++ -fvisibility-inlines-hidden -fvisibility=hidden -fopenmp -Wall -Wextra -Wnewline-eof -Wshadow -Wwrite-strings -Wno-unused-parameter -Wno-long-long -Werror=return-type -Wextra-semi -Wundef -Wunused-template -Wformat-nonliteral -Wno-user-defined-literals -Wno-covered-switch-default -Wno-suggest-override -Werror -std=c++2b -MD -MT libcxx/src/CMakeFiles/cxx_static.dir/filesystem/operations.cpp.o -MF libcxx/src/CMakeFiles/cxx_static.dir/filesystem/operations.cpp.o.d -o libcxx/src/CMakeFiles/cxx_static.dir/filesystem/operations.cpp.o -c /home/runner/_work/llvm-project/llvm-project/libcxx/src/filesystem/operations.cpp
1.	<eof> parser at end of file
2.	Per-file LLVM IR generation
3.	/home/runner/_work/llvm-project/llvm-project/build/generic-pstl-openmp/include/c++/v1/string:1850:3: Generating code for declaration 'std::basic_string<char>::__set_short_size'
#0 0x000055d30970f368 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) /home/runner/_work/llvm-project/llvm-project/llvm/lib/Support/Unix/Signals.inc:723:13
#1 0x000055d30970d570 llvm::sys::RunSignalHandlers() /home/runner/_work/llvm-project/llvm-project/llvm/lib/Support/Signals.cpp:106:18
#2 0x000055d3096909b9 (anonymous namespace)::CrashRecoveryContextImpl::HandleCrash(int, unsigned long) /home/runner/_work/llvm-project/llvm-project/llvm/lib/Support/CrashRecoveryContext.cpp:73:5
#3 0x000055d3096909b9 CrashRecoverySignalHandler(int) /home/runner/_work/llvm-project/llvm-project/llvm/lib/Support/CrashRecoveryContext.cpp:390:51
#4 0x00007fb76eb59520 (/lib/x86_64-linux-gnu/libc.so.6+0x42520)
#5 0x000055d309bb3841 clang::AtomicType::classof(clang::Type const*) /home/runner/_work/llvm-project/llvm-project/clang/include/clang/AST/Type.h:6993:30
#6 0x000055d309bb3841 llvm::isa_impl<clang::AtomicType, clang::Type, void>::doit(clang::Type const&) /home/runner/_work/llvm-project/llvm-project/llvm/include/llvm/Support/Casting.h:64:53
#7 0x000055d309bb3841 llvm::isa_impl_cl<clang::AtomicType, clang::Type const*>::doit(clang::Type const*) /home/runner/_work/llvm-project/llvm-project/llvm/include/llvm/Support/Casting.h:110:12
#8 0x000055d309bb3841 llvm::isa_impl_wrap<clang::AtomicType, clang::Type const*, clang::Type const*>::doit(clang::Type const* const&) /home/runner/_work/llvm-project/llvm-project/llvm/include/llvm/Support/Casting.h:137:12
#9 0x000055d309bb3841 llvm::isa_impl_wrap<clang::AtomicType, clang::Type const* const, clang::Type const*>::doit(clang::Type const* const&) /home/runner/_work/llvm-project/llvm-project/llvm/include/llvm/Support/Casting.h:127:12
#10 0x000055d309bb3841 llvm::CastIsPossible<clang::AtomicType, clang::Type const*, void>::isPossible(clang::Type const* const&) /home/runner/_work/llvm-project/llvm-project/llvm/include/llvm/Support/Casting.h:255:12
#11 0x000055d309bb3841 llvm::CastInfo<clang::AtomicType, clang::QualType const, void>::isPossible(clang::QualType const&) /home/runner/_work/llvm-project/llvm-project/llvm/include/llvm/Support/Casting.h:509:12
#12 0x000055d309bb3841 bool llvm::isa<clang::AtomicType, clang::QualType>(clang::QualType const&) /home/runner/_work/llvm-project/llvm-project/llvm/include/llvm/Support/Casting.h:549:10
#13 0x000055d309bb3841 clang::AtomicType const* clang::Type::getAs<clang::AtomicType>() const /home/runner/_work/llvm-project/llvm-project/clang/include/clang/AST/Type.h:7922:8
#14 0x000055d314449c50 
clang++: error: clang frontend command failed with exit code 139 (use -v to see invocation)
clang version 19.0.0git (https://github.com/llvm/llvm-project b0ff6ebc3d3ad79fb6661b3b38c4abd8020f42ef)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /home/runner/_work/llvm-project/llvm-project/build/generic-pstl-openmp/bin
clang++: note: diagnostic msg: 
********************

@jdoerfert
Copy link
Member

@AntonRydahl I tried to rebase this but there are conflicts that I cannot easily resolve.
Can you give it a try please?

@ldionne ldionne force-pushed the libcxx_pstl_omp_offload_backend branch from 2ecd90a to f87dfc5 Compare June 13, 2024 19:21
@llvmbot llvmbot added libc++abi libc++abi C++ Runtime Library. Not libc++. github:workflow labels Jun 13, 2024
@ldionne
Copy link
Member

ldionne commented Jun 13, 2024

@jdoerfert Rebase done. There will probably be many small issues because I couldn't build this locally, but I basically resolved the rebasing issues and reformulated the backend on top of the new dispatching mechanism.

@ldionne ldionne force-pushed the libcxx_pstl_omp_offload_backend branch from f87dfc5 to 9594623 Compare June 13, 2024 19:35
@AntonRydahl
Copy link
Contributor Author

Thank you so, so much for spending time fixing this, @ldionne! I will re-run the tests and see if I can replicate the errors from the build bot.

@ldionne ldionne force-pushed the libcxx_pstl_omp_offload_backend branch from 9594623 to 736262f Compare June 14, 2024 12:48
@jdoerfert
Copy link
Member

Thanks a lot!

Bots seem clean, right?

FWIW, I can (re-)run stuff on linux machines, at least try. We also have people, incl. a GSOC student who will pick this up now.

@AntonRydahl
Copy link
Contributor Author

I think we need to wait and see if stage3 (generic-pstl-openmp, libcxx-runners-8-set) passes.

@ldionne
Copy link
Member

ldionne commented Jun 14, 2024

@AntonRydahl I don't think we ever had a clean run of the OpenMP bot, did we? IMO the testing + CI story for the OpenMP backend is the only thing blocking this PR from being merged, everything else (like improving the backend) can and should be done in separate PRs so we can capture progress here.

@AntonRydahl AntonRydahl force-pushed the libcxx_pstl_omp_offload_backend branch from 736262f to 10d408f Compare June 18, 2024 17:15
@AntonRydahl
Copy link
Contributor Author

The tests should not be able to locate libomptarget.so. Let's see what other failures we get from the build bot.

Copy link
Member

@ldionne ldionne left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should rebase on top of main, that will solve the issues you're seeing on the macOS builders.

__libcpp_is_contiguous_iterator<_ForwardOutIterator>::value && is_trivially_copyable_v<_ValueType>) {
return std::__rewrap_iter(
__outit,
__omp_transform(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
__omp_transform(
__pstl::__omp_transform(

@xevor11
Copy link

xevor11 commented Aug 15, 2024

Just pushed header files for ranges shift left and shift right for review, I was wondering if the directives and the argument types are correct, a specific format to follow? To test the files in libcxx we would have to run the executable llvm-lit in the openmp test directory?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
github:workflow libc++abi libc++abi C++ Runtime Library. Not libc++. libc++ libc++ C++ Standard Library. Not GNU libstdc++. Not libc++abi. pstl Issues related to the C++17 Parallel STL
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants