From e10de135173c26ca89595e95b7ee08237b51af00 Mon Sep 17 00:00:00 2001 From: Engin Kayraklioglu Date: Wed, 18 Sep 2024 10:04:59 -0700 Subject: [PATCH 1/4] GPU technote updates for 2.2 Signed-off-by: Engin Kayraklioglu --- doc/rst/technotes/gpu.rst | 72 ++++++++++++++++++++++++--------------- 1 file changed, 44 insertions(+), 28 deletions(-) diff --git a/doc/rst/technotes/gpu.rst b/doc/rst/technotes/gpu.rst index 68ed74d7dfb..f9193212a20 100644 --- a/doc/rst/technotes/gpu.rst +++ b/doc/rst/technotes/gpu.rst @@ -5,8 +5,10 @@ GPU Programming =============== -Chapel can be used to program GPUs. Currently NVIDIA and AMD GPUs are -supported. Support for Intel GPUs is planned but not implemented, yet. +Chapel can be used to program GPUs. The `GPU Programming in Chapel series +`_ is a good +resource for getting started with GPU programming in Chapel. This technote has +some examples, but it is closer to a reference manual than a tutorial. .. warning:: @@ -18,27 +20,38 @@ supported. Support for Intel GPUs is planned but not implemented, yet. Overview -------- -The Chapel compiler will generate GPU kernels for certain ``forall`` and -``foreach`` loops and launch these onto a GPU when the current locale (e.g. -``here``) is assigned to a special (sub)locale representing a GPU. To deploy -code to a GPU, put the relevant code in an ``on`` statement targeting a GPU -sublocale (i.e. ``here.gpus[0]``). +The Chapel compiler will generate GPU kernels for certain parallel operations +such as ``forall``/``foreach`` loops, ``reduce`` expressions and promoted +expressions. These will be launched onto a GPU when the current locale (e.g. +``here``) is the sublocale representing that particular GPU. To deploy code to a +GPU, put the relevant code in an ``on`` statement targeting a GPU sublocale +(e.g. ``here.gpus[0]``).
Any arrays that are declared by tasks executing on a GPU sublocale will, by default, be accessible on the GPU (see the `Memory Strategies`_ subsection for more information about alternate memory strategies). -Chapel will launch kernels for all eligible loops that are encountered by tasks -executing on a GPU sublocale. Loops are eligible when: +Chapel will launch kernels for all eligible data-parallel operations that are +encountered by tasks executing on a GPU sublocale. Operations are eligible +when: + +* They are order-independent, such as: + + * `forall <../users-guide/datapar/forall.html>`_ or `foreach `_ + loops over iterators that are also order-independent (i.e. the yielding loop + uses ``foreach`` loops instead of ``for``; all Chapel iterators of ranges, + domains and arrays are order-independent), + + * ``reduce`` expressions over order-independent iterators, + + * Promoted expressions over order-independent iterators. -* They are order-independent. i.e., `forall - <../users-guide/datapar/forall.html>`_ or `foreach `_ loops over - iterators that are also order-independent. -* They only make use of known compiler primitives that are fast and local. Here - "fast" means "safe to run in a signal handler" and "local" means "doesn't - cause any network communication". * They do not call out to ``extern`` functions (aside from those in an exempted set of Chapel runtime functions). + +* They do not allocate memory dynamically (i.e. no class instances or Chapel + arrays are created within). + * They are free of any call to a function that fails to meet the above criteria or accesses outer variables. @@ -120,8 +133,9 @@ used with GPU support. The following are further requirements for GPU support: -* For targeting NVIDIA or AMD GPUs, ``LLVM`` must be used as Chapel's backend - compiler (i.e. ``CHPL_LLVM`` must be set to ``system`` or ``bundled``). +* For targeting NVIDIA or AMD GPUs, the default ``LLVM`` backend must be used as + Chapel's backend compiler (i.e.
``CHPL_LLVM`` must be set to ``system`` or + ``bundled``). * Note that ``CHPL_TARGET_COMPILER`` must be ``llvm``. This is the default when ``CHPL_LLVM`` is set to ``system`` or ``bundled``. @@ -142,14 +156,16 @@ The following are further requirements for GPU support: * Specifically for targeting AMD GPUs: - * ROCm version between 4.x and 5.4 or between ROCm 6.0 and 6.2 must be installed. + * ROCm version between 4.x and 5.4 or between ROCm 6.0 and 6.2 must be + installed. * For ROCm 5.x, ``CHPL_LLVM`` must be set to ``system``. Note that, ROCm installations come with LLVM. Setting ``CHPL_LLVM=system`` will allow you to use that LLVM. - * For ROCm 6.x, only LLVM 18+ is supported. Currently, only - ``CHPL_LLVM=bundled`` is supported due to bugs in LLVM. + * For ROCm 6.x, only ``CHPL_LLVM=bundled`` is supported. The bundled LLVM is + version 18 with a patch to support ROCm 6. Said patch is in LLVM 19, as such + we expect to support system LLVM 19+ with ROCm 6 in the upcoming releases. * Specifically for using the `CPU-as-Device mode`_: @@ -427,16 +443,16 @@ For more examples see the tests under |multi_locale_dir|_ available from our Reductions and Scans ~~~~~~~~~~~~~~~~~~~~ +``+``, ``min`` and ``max`` reductions are supported via ``reduce`` expressions +and intents. We are working towards expanding this to other kinds of reductions +and ``scan`` expressions and deprecating the mentioned functions in the +:mod:`GPU` module. + The :mod:`GPU` module has standalone functions for basic reductions (e.g. :proc:`~GPU.gpuSumReduce`) and scans (e.g. :proc:`~GPU.gpuScan`). We expect these functions to be deprecated in favor of ``reduce`` and ``scan`` expressions in a future release. -As of Chapel 2.1, ``+``, ``min`` and ``max`` reductions are supported via -``reduce`` expressions and intents. We are working towards expanding this to -other kinds of reductions and ``scan`` expressions and deprecating the mentioned -functions in the :mod:`GPU` module. 
- Device-to-Device Communication Support ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Chapel supports direct communication between interconnected GPUs. The supported @@ -616,11 +632,11 @@ Tested Configurations --------------------- We have experience with the following hardware and software versions. The ones -marked with * are covered in our nightly testing configuration. +marked with * are covered in our nightly testing configurations. * NVIDIA - * Hardware: RTX A2000, P100*, V100*, A100* and H100 + * Hardware: RTX A2000, P100*, V100*, A100*, H100, GH200 * Software: CUDA 11.3*, 11.6, 11.8*, 12.0, 12.2*, 12.4 @@ -628,7 +644,7 @@ marked with * are covered in our nightly testing configuration. * Hardware: MI60*, MI100 and MI250X* - * Software:ROCm 4.2*, 4.4, 5.4*, 6.0, 6.1, 6.2 + * Software: ROCm 4.2*, 4.4, 5.4*, 6.0, 6.1, 6.2* GPU Support on Windows Subsystem for Linux From 10f552a035abe84ba1510b4f1b0b715b0a177de5 Mon Sep 17 00:00:00 2001 From: Engin Kayraklioglu Date: Wed, 18 Sep 2024 10:11:07 -0700 Subject: [PATCH 2/4] Add the blog series at the end as well Signed-off-by: Engin Kayraklioglu --- doc/rst/technotes/gpu.rst | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/doc/rst/technotes/gpu.rst b/doc/rst/technotes/gpu.rst index f9193212a20..4cd1778e455 100644 --- a/doc/rst/technotes/gpu.rst +++ b/doc/rst/technotes/gpu.rst @@ -666,6 +666,10 @@ for more information on using Chapel with WSL. Further Information ------------------- +* The `GPU Programming in Chapel series + `_ is a good + resource for getting started with GPU programming in Chapel. + * Please refer to issues with `GPU Support label `_ for other known limitations and issues.
From 1e8d36657d9232a06b0a3ba96d0844b07a86568a Mon Sep 17 00:00:00 2001 From: Engin Kayraklioglu Date: Wed, 18 Sep 2024 10:51:26 -0700 Subject: [PATCH 3/4] Drop details about ROCm 6 Signed-off-by: Engin Kayraklioglu --- doc/rst/technotes/gpu.rst | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/doc/rst/technotes/gpu.rst b/doc/rst/technotes/gpu.rst index 4cd1778e455..c8c013ae425 100644 --- a/doc/rst/technotes/gpu.rst +++ b/doc/rst/technotes/gpu.rst @@ -163,9 +163,7 @@ The following are further requirements for GPU support: installations come with LLVM. Setting ``CHPL_LLVM=system`` will allow you to use that LLVM. - * For ROCm 6.x, only ``CHPL_LLVM=bundled`` is supported. The bundled LLVM is - version 18 with a patch to support ROCm 6. Said patch is in LLVM 19, as such - we expect to support system LLVM 19+ with ROCm 6 in the upcoming releases. + * For ROCm 6.x, only ``CHPL_LLVM=bundled`` is supported. * Specifically for using the `CPU-as-Device mode`_: From 05d2c29570ea50bda8ed8b2977ad8d6630bffea3 Mon Sep 17 00:00:00 2001 From: Engin Kayraklioglu Date: Wed, 18 Sep 2024 12:16:56 -0700 Subject: [PATCH 4/4] Take Andy's suggestion for the intro Signed-off-by: Engin Kayraklioglu --- doc/rst/technotes/gpu.rst | 14 ++++++++++---- 1 file changed, 10 insertions(+), 4 deletions(-) diff --git a/doc/rst/technotes/gpu.rst b/doc/rst/technotes/gpu.rst index c8c013ae425..272279c634c 100644 --- a/doc/rst/technotes/gpu.rst +++ b/doc/rst/technotes/gpu.rst @@ -5,10 +5,16 @@ GPU Programming =============== -Chapel can be used to program GPUs. The `GPU Programming in Chapel series -`_ is a good -resource for getting started with GPU programming in Chapel. This technote has -some examples, but it is closer to a reference manual than a tutorial. +Chapel enables developers to use parallelism at different levels: from +intra-node multicore parallelism, to cross-node distributed parallelism, to +GPUs. This technote serves as a reference on how to use Chapel to program GPUs. 
+Specifically, it gives a quick overview of GPU programming, includes a handful +of examples, discusses system requirements and current limitations for GPU +support, and delves into more detail on some specific GPU-related features. + +Readers preferring a more tutorial-like introduction to Chapel's GPU support +may also wish to look at our `GPU Programming in Chapel +`_ blog series. .. warning::