Commit

Update docs

ahayashi committed Jun 4, 2021
1 parent 8afd588 commit 959bf5b
Showing 11 changed files with 100 additions and 30 deletions.
62 changes: 58 additions & 4 deletions doc/rst/api/gpuapi.rst
@@ -9,11 +9,12 @@ MID-level API Reference

.. class:: GPUArray

.. method:: proc init(ref arr)
.. method:: proc init(ref arr, pitched=false)

Allocates memory on the device. The allocation size is automatically computed by this module -i.e., ``(arr.size: size_t) * c_sizeof(arr.eltType)``.
Allocates memory on the device. The allocation size is automatically computed by this module, i.e., ``(arr.size: size_t) * c_sizeof(arr.eltType)``, which means the index space is linearized when ``arr`` is multi-dimensional. Also, if ``arr`` is 2D and ``pitched=true``, pitched allocation is performed and the host and device pitches can be obtained via ``obj.hpitch`` and ``obj.dpitch``. Note that the allocated memory is automatically reclaimed when the object is deleted.

:arg arr: The reference of the non-distributed Chapel Array that will be mapped onto the device.
:arg pitched: whether to perform pitched allocation (default: ``false``)

.. code-block:: chapel
:emphasize-lines: 6,21
@@ -121,10 +122,18 @@ MID-level API Reference
toDevice(A, B)
..
fromDevice(C);
free(A, B, C);
// GPU memory is automatically deallocated when dA, dB, and dC are deleted.
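
As a rough illustration, a pitched 2D allocation at the MID-level might look like the following minimal sketch (the array shape, the ``dPtr()`` accessor, and the commented-out kernel wrapper ``myKernel2D`` are illustrative assumptions, not part of this module):

.. code-block:: chapel

   use GPUAPI;

   var A: [0..#128, 0..#512] real(32);

   // Pitched allocation: each device row is padded so that it starts on an
   // alignment boundary preferred by the device.
   var dA = new GPUArray(A, pitched=true);
   writeln("host pitch = ", dA.hpitch, " bytes, device pitch = ", dA.dpitch, " bytes");

   toDevice(dA);                                   // host-to-device copy
   // myKernel2D(dA.dPtr(), dA.dpitch, 128, 512);  // hypothetical GPU kernel wrapper
   fromDevice(dA);                                 // device-to-host copy
   // The device allocation is reclaimed automatically when dA is deleted.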
.. class:: GPUJaggedArray

LOW-MID-level API Reference
.. method:: proc init(ref arr1, ref arr2, ...)

Allocates a jagged array on the device. It takes a set of Chapel arrays and creates an array of arrays on the device.

.. note:: A working example can be found `here <https://github.com/ahayashi/chapel-gpu/blob/master/example/gpuapi/jagged/jagged.chpl>`_.
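
As a rough sketch of how this might be used (the ``toDevice``/``fromDevice`` calls and the ``dPtr()`` accessor are assumed to follow the ``GPUArray`` pattern shown above, and the kernel wrapper is hypothetical):

.. code-block:: chapel

   use GPUAPI;

   var A: [0..#4] int = [1, 2, 3, 4];
   var B: [0..#2] int = [10, 20];
   var C: [0..#3] int = [7, 8, 9];

   // Creates an array of arrays on the device from three host arrays of
   // different lengths.
   var dJ = new GPUJaggedArray(A, B, C);
   toDevice(dJ);                  // copy each row to the device (assumed API)
   // myJaggedKernel(dJ.dPtr());  // hypothetical GPU kernel wrapper
   fromDevice(dJ);                // copy the rows back to the host (assumed API)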


MID-LOW-level API Reference
############################

.. method:: Malloc(ref devPtr: c_void_ptr, size: size_t)
@@ -170,6 +179,25 @@ LOW-MID-level API Reference
.. note:: ``c_sizeof(A.eltType)`` returns the size in bytes of the element of the Chapel array ``A``. For more details, please refer to `this <https://chapel-lang.org/docs/builtins/CPtr.html#CPtr.c_sizeof>`_.



.. method:: MallocPitch(ref devPtr: c_void_ptr, ref pitch: size_t, width: size_t, height: size_t)

Allocates pitched 2D memory on the device.

:arg devPtr: Pointer to the allocated pitched 2D device array
:type devPtr: `c_void_ptr`

:arg pitch: Pitch for allocation on the device, which is set by the runtime
:type pitch: `size_t`

:arg width: The width of the original Chapel array (in bytes)
:type width: `size_t`

:arg height: The number of rows (height)
:type height: `size_t`

.. note:: A working example can be found `here <https://github.com/ahayashi/chapel-gpu/blob/master/example/gpuapi/pitched2d/pitched2d.chpl>`_. The detailed description of the underlying CUDA API can be found `here <https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__MEMORY.html#group__CUDART__MEMORY_1g32bd7a39135594788a542ae72217775c>`_.
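
For example, a pitched allocation for a 2D Chapel array might look like the following sketch (the array shape is arbitrary; ``width`` and ``height`` are computed as documented above):

.. code-block:: chapel

   use GPUAPI;

   config const nRows = 128, nCols = 512;
   var A: [0..#nRows, 0..#nCols] real(32);

   var dA: c_void_ptr;
   var dpitch: size_t;
   const width  = (nCols: size_t) * c_sizeof(A.eltType);  // row size in bytes
   const height = nRows: size_t;                           // number of rows

   // The runtime sets dpitch (>= width); each device row starts at a
   // dpitch-byte boundary.
   MallocPitch(dA, dpitch, width, height);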

.. method:: Memcpy(dst: c_void_ptr, src: c_void_ptr, count: size_t, kind: int)

Transfers data between the host and the device
@@ -203,6 +231,32 @@ LOW-MID-level API Reference
.. note:: ``c_ptrTo(A)`` returns a pointer to the Chapel rectangular array ``A``. For more details, see `this document <https://chapel-lang.org/docs/builtins/CPtr.html#CPtr.c_ptrTo>`_.
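
For example, a round trip through device memory might look like the following sketch (the array size is arbitrary, and the ``kind`` values are assumed to use the same encoding as ``Memcpy2D`` below, i.e., ``0`` for host-to-device and ``1`` for device-to-host):

.. code-block:: chapel

   use GPUAPI;

   var A: [0..#1024] real(32);
   const sizeInBytes = (A.size: size_t) * c_sizeof(A.eltType);

   var dA: c_void_ptr;
   Malloc(dA, sizeInBytes);
   Memcpy(dA, c_ptrTo(A), sizeInBytes, 0);   // host-to-device
   // ... launch a GPU kernel that reads/writes dA ...
   Memcpy(c_ptrTo(A), dA, sizeInBytes, 1);   // device-to-host
   Free(dA);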

.. method:: Memcpy2D(dst: c_void_ptr, dpitch: size_t, src: c_void_ptr, spitch: size_t, width: size_t, height:size_t, kind: int)

Transfers a pitched 2D array between the host and the device.

:arg dst: the destination address
:type dst: `c_void_ptr`

:arg dpitch: the pitch of destination memory
:type dpitch: `size_t`

:arg src: the source address
:type src: `c_void_ptr`

:arg spitch: the pitch of source memory
:type spitch: `size_t`

:arg width: the width of 2D array to be transferred (in bytes)
:type width: `size_t`

:arg height: the height of 2D array to be transferred (# of rows)
:type height: `size_t`

:arg kind: type of transfer (``0``: host-to-device, ``1``: device-to-host)
:type kind: `int`

.. note:: A working example can be found `here <https://github.com/ahayashi/chapel-gpu/blob/master/example/gpuapi/pitched2d/pitched2d.chpl>`_. The detailed description of the underlying CUDA API can be found `here <https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__MEMORY.html#group__CUDART__MEMORY_1g3a58270f6775efe56c65ac47843e7cee>`_.
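
Combining ``MallocPitch`` and ``Memcpy2D``, a pitched round trip might look like this sketch (the array shape is arbitrary; the host rows are contiguous, so the host pitch equals the host row size in bytes):

.. code-block:: chapel

   use GPUAPI;

   config const nRows = 64, nCols = 256;
   var A: [0..#nRows, 0..#nCols] real(32);

   const width  = (nCols: size_t) * c_sizeof(A.eltType);  // row size in bytes
   const height = nRows: size_t;

   var dA: c_void_ptr;
   var dpitch: size_t;
   MallocPitch(dA, dpitch, width, height);

   // Host rows are contiguous, so the source pitch equals width.
   Memcpy2D(dA, dpitch, c_ptrTo(A), width, width, height, 0);  // host-to-device
   // ... launch a GPU kernel that works on the pitched device array ...
   Memcpy2D(c_ptrTo(A), width, dA, dpitch, width, height, 1);  // device-to-host
   Free(dA);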

.. method:: Free(devPtr: c_void_ptr)

4 changes: 2 additions & 2 deletions doc/rst/conf.py
@@ -52,7 +52,7 @@

# General information about the project.
project = 'Chapel-GPU'
copyright = '2019, Rice University, 2019-2020, Georgia Institute of Technology'
copyright = '2019, Rice University, 2019-2021, Georgia Institute of Technology'
author = 'Akihiro Hayashi, Sri Raj Paul, Vivek Sarkar'

# The version info for the project you're documenting, acts as replacement for
@@ -62,7 +62,7 @@
# The short X.Y version.
version = ''
# The full version, including alpha/beta/rc tags.
release = '0.2'
release = '0.3'

# The language for content autogenerated by Sphinx. Refer to documentation
# for a list of supported languages.
2 changes: 1 addition & 1 deletion doc/rst/details/gpuapi.rst
@@ -13,7 +13,7 @@ The GPUAPI module provides Chapel-level GPU API. The use of the API assumes case

* Example: ``var ga = new GPUArray(A);``

* `LOW-MID-level`: Provides wrapper functions for raw GPU API functions
* `MID-LOW-level`: Provides wrapper functions for raw GPU API functions

* Example: ``var ga: c_void_ptr = Malloc(sizeInBytes);``

7 changes: 7 additions & 0 deletions doc/rst/history/evolution.rst
@@ -2,6 +2,13 @@
Chapel-GPU Evolution
=============================================

version 0.3.0, June, 2021
############################

Version 0.3.0 adds the following new features to version 0.2.0:

- Update the GPUAPI module to add 1) pitched 2D array support at the MID-LOW level, and 2) jagged array support at the MID level.

version 0.2.0, August, 2020
############################

2 changes: 1 addition & 1 deletion doc/rst/index.rst
@@ -16,7 +16,7 @@ This document describes the following two Chapel modules that facilitate GPU pro

* Example: :chapel:`var ga = new GPUArray(A);`

* `LOW-MID-level`: Provides wrapper functions for raw GPU API functions
* `MID-LOW-level`: Provides wrapper functions for raw GPU API functions

* Example: :chapel:`var ga: c_void_ptr = Malloc(sizeInBytes);`

4 changes: 2 additions & 2 deletions doc/rst/instructions/build.rst
@@ -5,7 +5,7 @@ Building Chapel-GPU
Prerequisites
##############

* Chapel: 1.22 or below. Detailed instructions for installing Chapel can be found: `here <https://chapel-lang.org/docs/usingchapel/QUICKSTART.html>`_.
* Chapel: 1.24.1 or below. Detailed instructions for installing Chapel can be found: `here <https://chapel-lang.org/docs/usingchapel/QUICKSTART.html>`_.

* GPU Compilers & Runtimes: GPUIterator and GPUAPI require either of the following GPU programming environments.

@@ -17,7 +17,7 @@ Prerequisites

* cmake: 3.8.0 or above

.. note:: While ``GPUIterator`` works with OpenCL, ``GPUAPI`` with OpenCL is under developement.
.. note:: While ``GPUIterator`` works with OpenCL, ``GPUAPI`` with OpenCL/SYCL is under development.

Instructions
##############
17 changes: 13 additions & 4 deletions doc/rst/instructions/compile.rst
@@ -15,7 +15,7 @@ The repository has several example applications in ``chapel-gpu/example`` and ``
Matrix Multiplication, ``apps/mm``, Matrix-Matrix Multiply,
PageRank, ``apps/mm``, The pagerank algorithm, WIP
N-Queens, WIP, The n-queens problem, WIP

GPU API Examples, ``example/gpuapi``, ,

.. note:: This section assumes the Chapel-GPU components are already installed in ``$CHPL_GPU_HOME``. If you have not done so please see :ref:`Building Chapel-GPU`.

@@ -43,7 +43,16 @@ The example applications in ``chapel-gpu/example`` and ``chapel-gpu/apps`` direc
or
make opencl
- Example 2: ``chapel-gpu/apps/stream``
- Example 2: ``chapel-gpu/example/gpuapi``

.. code-block:: bash
cd path/to/chapel-gpu/example/gpuapi/2d
make cuda
or
make hip
- Example 3: ``chapel-gpu/apps/stream``

.. code-block:: bash
@@ -68,11 +77,11 @@ The example applications in ``chapel-gpu/example`` and ``chapel-gpu/apps`` direc
``vc.cuda.gpu``, A GPU-only implementation w/o the GPUIterator., ``make cuda.gpu``
``vc.cuda.hybrid``, The GPUIterator implementation (single-locale)., ``make cuda.hybrid``
``vc.cuda.hybrid.dist``, The GPUIterator implementation (multi-locale)., ``make cuda.hybrid.dist``
``vc.cuda.hybrid.dist.lowmid``, The LOW-MID implementation (multi-locale)., ``make cuda.hybrid.dist.lowmid``
``vc.cuda.hybrid.dist.midlow``, The MID-LOW implementation (multi-locale)., ``make cuda.hybrid.dist.midlow``
``vc.cuda.hybrid.dist.mid``, The MID implementation (multi-locale)., ``make cuda.hybrid.dist.mid``


.. tip:: If you want to compile a specific variant, please do ``make X.Y``, where ``X`` is either ``cuda``, ``hip``, or ``opencl``, and ``Y`` is either ``gpu``, ``hybrid``, ``hybrid.dist``, ``hybrid.dist.lowmid``, or ``hybrid.dist.mid``. Please also see the third column above. Also, the LOW-MID and MID variants with OpenCL are currently not supported.
.. tip:: If you want to compile a specific variant, please do ``make X.Y``, where ``X`` is either ``cuda``, ``hip``, or ``opencl``, and ``Y`` is either ``gpu``, ``hybrid``, ``hybrid.dist``, ``hybrid.dist.midlow``, or ``hybrid.dist.mid``. Please also see the third column above. Also, the MID-LOW and MID variants with OpenCL are currently not supported.

.. note:: The ``Makefile`` internally uses ``cmake`` to generate a static library from a GPU source program (``vc.cu`` in this case)

12 changes: 6 additions & 6 deletions doc/rst/instructions/guide.rst
@@ -13,31 +13,31 @@ In general, GPU programs should include typical host and device operations inclu

* - Level
- MID-level
- LOW-MID-level
- MID-LOW-level
- LOW-level
* - Kernel Invocation
- CUDA/HIP
- CUDA/HIP
- CUDA/HIP/OpenCL
* - Memory (de)allocations
- Chapel (MID)
- Chapel (LOW-MID)
- Chapel (MID-LOW)
- CUDA/HIP/OpenCL
* - Data transfers
- Chapel (MID)
- Chapel (LOW-MID)
- Chapel (MID-LOW)
- CUDA/HIP/OpenCL


.. seealso::

* :ref:`Writing MID-level programs`
* :ref:`MID-level API Reference`
* :ref:`Writing LOW-MID-level programs`
* :ref:`LOW-MID-level API Reference`
* :ref:`Writing MID-LOW-level programs`
* :ref:`MID-LOW-level API Reference`
* :ref:`Writing LOW-level (GPUIterator Only) programs`

.. note:: LOW/LOW-MID/MID levels can interoperate with each other.
.. note:: LOW/MID-LOW/MID levels can interoperate with each other.


Writing GPU program
doc/rst/instructions/{low-mid.rst → mid-low.rst}
@@ -1,13 +1,13 @@
.. default-domain:: chpl

=============================================
Writing LOW-MID-level programs
Writing MID-LOW-level programs
=============================================

LOW-MID-level API
MID-LOW-level API
######################

The biggest motivation for introducing ``LOW-MID`` and ``MID`` -level GPU API is moving some of low-level GPU operations to the Chapel-level. Consider the following GPU callback function and C function:
The biggest motivation for introducing the ``MID-LOW`` and ``MID`` -level GPU API is to move some of the low-level GPU operations to the Chapel level. Consider the following GPU callback function and C function:

.. code-block:: chapel
:caption: vc.hybrid.chpl
@@ -34,7 +34,7 @@ The biggest motivation for introducing ``LOW-MID`` and ``MID`` -level GPU API is
}
}
At the LOW-MID-level, most of the CUDA/HIP/OpenCL-level 1) device memory allocation, 2) device synchronization, and 3) data transfer can be written in Chapel. However, it's worth noting that this level of abstraction only provides thin wrapper functions for the CUDA/HIP/OpenCL-level API functions, which requires you to directly manipulate C types like ``c_void_ptr`` and so on. The LOW-MID is helpful particularly when you want to fine-tune the use of GPU API, but still want to stick with Chapel. Here is an example program written with the LOW-MID-level API:
At the MID-LOW-level, most of the CUDA/HIP/OpenCL-level operations, namely 1) device memory allocation, 2) device synchronization, and 3) data transfer, can be written in Chapel. However, it is worth noting that this level of abstraction only provides thin wrapper functions for the CUDA/HIP/OpenCL-level API functions, which requires you to directly manipulate C types such as ``c_void_ptr``. The MID-LOW level is particularly helpful when you want to fine-tune the use of the GPU API but still want to stick with Chapel. Here is an example program written with the MID-LOW-level API:

.. code-block:: chapel
:caption: vc.hybrid.chpl
@@ -52,6 +52,6 @@ At the LOW-MID-level, most of the CUDA/HIP/OpenCL-level 1) device memory allocat
Free(dB);
}
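
The full example is collapsed above; as a rough sketch of the overall shape (the callback and kernel-wrapper names are hypothetical, and the GPU kernel launch is elided), a MID-LOW-level callback body typically combines ``Malloc``, ``Memcpy``, and ``Free``:

.. code-block:: chapel

   use GPUAPI;

   var A, B: [0..#1024] real(32);

   // A sketch of a MID-LOW-level GPU callback body (names are illustrative).
   proc myGPUCallBack(lo: int, hi: int, N: int) {
     const count = (N: size_t) * c_sizeof(A.eltType);
     var dA, dB: c_void_ptr;
     Malloc(dA, count);
     Malloc(dB, count);
     Memcpy(dA, c_ptrTo(A), count, 0);   // host-to-device
     // vcKernel(dA, dB, N: c_int);      // hypothetical extern GPU kernel wrapper
     Memcpy(c_ptrTo(B), dB, count, 1);   // device-to-host
     Free(dA);
     Free(dB);
   }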
.. tip:: The LOW-MID-level API can interoperate with the MID-level API.
.. tip:: The MID-LOW-level API can interoperate with the MID-level API.

.. seealso:: :ref:`LOW-MID-level API Reference`
.. seealso:: :ref:`MID-LOW-level API Reference`
6 changes: 3 additions & 3 deletions doc/rst/instructions/mid.rst
@@ -7,7 +7,7 @@ Writing MID-level programs
MID-level API
######################

To reiterate, the biggest motivation for introducing ``LOW-MID`` and ``MID`` -level GPU API is moving some of low-level GPU operations to the Chapel-level. Consider the following GPU callback function and C function:
To reiterate, the biggest motivation for introducing the ``MID-LOW`` and ``MID`` -level GPU API is to move some of the low-level GPU operations to the Chapel level. Consider the following GPU callback function and C function:

.. code-block:: chapel
:caption: vc.hybrid.chpl
@@ -34,7 +34,7 @@ To reiterate, the biggest motivation for introducing ``LOW-MID`` and ``MID`` -le
}
}
At the MID-level, most of the CUDA/HIP/OpenCL-level 1) device memory allocation, 2) device synchronization, and 3) data transfer can be written in Chapel. Also, unlike the LOW-MID level, the MID-level API is more Chapel programmer-friendly, where you can allocate GPU memory using the ``new`` keyword and no longer need to directly manipulate C types. Here is an example program written with the MID-level API:
At the MID-level, most of the CUDA/HIP/OpenCL-level operations, namely 1) device memory allocation, 2) device synchronization, and 3) data transfer, can be written in Chapel. Also, unlike the MID-LOW level, the MID-level API is more Chapel-programmer-friendly: you can allocate GPU memory using the ``new`` keyword and no longer need to directly manipulate C types. Here is an example program written with the MID-level API:


.. code-block:: chapel
Expand All @@ -50,7 +50,7 @@ At the MID-level, most of the CUDA/HIP/OpenCL-level 1) device memory allocation,
free(dA, dB);
}
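
Again, the full example is collapsed above; a rough sketch of the equivalent MID-level callback body (the callback and kernel-wrapper names are hypothetical, and ``dPtr()`` is assumed to expose the underlying device pointer) looks like:

.. code-block:: chapel

   use GPUAPI;

   var A, B: [0..#1024] real(32);

   // A sketch of a MID-level GPU callback body (names are illustrative).
   proc myGPUCallBack(lo: int, hi: int, N: int) {
     var dA = new GPUArray(A);
     var dB = new GPUArray(B);
     toDevice(dA);                               // no byte counts or c_void_ptr handling
     // vcKernel(dA.dPtr(), dB.dPtr(), N: c_int);  // hypothetical extern GPU kernel wrapper
     fromDevice(dB);                             // device-to-host
     free(dA, dB);
   }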
.. tip:: The MID-level API can interoperate with the LOW-MID-level API.
.. tip:: The MID-level API can interoperate with the MID-LOW-level API.

.. seealso:: :ref:`MID-level API Reference`

2 changes: 1 addition & 1 deletion doc/rst/instructions/write.rst
@@ -7,7 +7,7 @@ Using Chapel-GPU
:caption: Step-by-step Guide

low
low-mid
mid-low
mid
compile
guide
