Commit

Update docs

ahayashi committed Jun 4, 2021
1 parent 8afd588 commit 959bf5b
Showing 11 changed files with 100 additions and 30 deletions.
62 changes: 58 additions & 4 deletions doc/rst/api/gpuapi.rst
@@ -9,11 +9,12 @@ MID-level API Reference

.. class:: GPUArray

.. method:: proc init(ref arr)
.. method:: proc init(ref arr, pitched=false)

Allocates memory on the device. The allocation size is automatically computed by this module -i.e., ``(arr.size: size_t) * c_sizeof(arr.eltType)``.
Allocates memory on the device. The allocation size is automatically computed by this module, i.e., ``(arr.size: size_t) * c_sizeof(arr.eltType)``, which means the index space is linearized when ``arr`` is multi-dimensional. Also, if ``arr`` is 2D and ``pitched=true``, pitched allocation is performed and the host and device pitches can be obtained via ``obj.hpitch`` and ``obj.dpitch``. Note that the allocated memory is automatically reclaimed when the object is deleted.

:arg arr: The reference of the non-distributed Chapel Array that will be mapped onto the device.
:arg pitched: whether to perform pitched allocation (default: ``false``)

.. code-block:: chapel
:emphasize-lines: 6,21
@@ -121,10 +122,18 @@ MID-level API Reference
toDevice(A, B)
..
fromDevice(C);
free(A, B, C);
// GPU memory is automatically deallocated when dA, dB, and dC are deleted.
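
As a rough illustration, a pitched 2D allocation at the MID-level might look like the following minimal sketch (the array shape, the ``dPtr()`` accessor, and the commented-out kernel wrapper ``myKernel2D`` are illustrative assumptions, not part of this module):

.. code-block:: chapel

   use GPUAPI;

   var A: [0..#128, 0..#512] real(32);

   // Pitched allocation: each device row is padded so that it starts on an
   // alignment boundary preferred by the device.
   var dA = new GPUArray(A, pitched=true);
   writeln("host pitch = ", dA.hpitch, " bytes, device pitch = ", dA.dpitch, " bytes");

   toDevice(dA);                                   // host-to-device copy
   // myKernel2D(dA.dPtr(), dA.dpitch, 128, 512);  // hypothetical GPU kernel wrapper
   fromDevice(dA);                                 // device-to-host copy
   // The device allocation is reclaimed automatically when dA is deleted.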
.. class:: GPUJaggedArray

LOW-MID-level API Reference
.. method:: proc init(ref arr1, ref arr2, ...)

Allocates a jagged array on the device. It takes a set of Chapel arrays and creates an array of arrays on the device.

.. note:: A working example can be found `here <https://github.com/ahayashi/chapel-gpu/blob/master/example/gpuapi/jagged/jagged.chpl>`_.
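
As a rough sketch of how this might be used (the ``toDevice``/``fromDevice`` calls and the ``dPtr()`` accessor are assumed to follow the ``GPUArray`` pattern shown above, and the kernel wrapper is hypothetical):

.. code-block:: chapel

   use GPUAPI;

   var A: [0..#4] int = [1, 2, 3, 4];
   var B: [0..#2] int = [10, 20];
   var C: [0..#3] int = [7, 8, 9];

   // Creates an array of arrays on the device from three host arrays of
   // different lengths.
   var dJ = new GPUJaggedArray(A, B, C);
   toDevice(dJ);                  // copy each row to the device (assumed API)
   // myJaggedKernel(dJ.dPtr());  // hypothetical GPU kernel wrapper
   fromDevice(dJ);                // copy the rows back to the host (assumed API)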


MID-LOW-level API Reference
############################

.. method:: Malloc(ref devPtr: c_void_ptr, size: size_t)
@@ -170,6 +179,25 @@ LOW-MID-level API Reference
.. note:: ``c_sizeof(A.eltType)`` returns the size in bytes of the element of the Chapel array ``A``. For more details, please refer to `this <https://chapel-lang.org/docs/builtins/CPtr.html#CPtr.c_sizeof>`_.



.. method:: MallocPitch(ref devPtr: c_void_ptr, ref pitch: size_t, width: size_t, height: size_t)

Allocates pitched 2D memory on the device.

:arg devPtr: Pointer to the allocated pitched 2D device array
:type devPtr: `c_void_ptr`

:arg pitch: Pitch for allocation on the device, which is set by the runtime
:type pitch: `size_t`

:arg width: The width of the original Chapel array (in bytes)
:type width: `size_t`

:arg height: The number of rows (height)
:type height: `size_t`

.. note:: A working example can be found `here <https://github.com/ahayashi/chapel-gpu/blob/master/example/gpuapi/pitched2d/pitched2d.chpl>`_. The detailed description of the underlying CUDA API can be found `here <https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__MEMORY.html#group__CUDART__MEMORY_1g32bd7a39135594788a542ae72217775c>`_.
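
For example, a pitched allocation for a 2D Chapel array might look like the following sketch (the array shape is arbitrary; ``width`` and ``height`` are computed as documented above):

.. code-block:: chapel

   use GPUAPI;

   config const nRows = 128, nCols = 512;
   var A: [0..#nRows, 0..#nCols] real(32);

   var dA: c_void_ptr;
   var dpitch: size_t;
   const width  = (nCols: size_t) * c_sizeof(A.eltType);  // row size in bytes
   const height = nRows: size_t;                           // number of rows

   // The runtime sets dpitch (>= width); each device row starts at a
   // dpitch-byte boundary.
   MallocPitch(dA, dpitch, width, height);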

.. method:: Memcpy(dst: c_void_ptr, src: c_void_ptr, count: size_t, kind: int)

Transfers data between the host and the device
@@ -203,6 +231,32 @@ LOW-MID-level API Reference
.. note:: ``c_ptrTo(A)`` returns a pointer to the Chapel rectangular array ``A``. For more details, see `this document <https://chapel-lang.org/docs/builtins/CPtr.html#CPtr.c_ptrTo>`_.
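
For example, a round trip through device memory might look like the following sketch (the array size is arbitrary, and the ``kind`` values are assumed to use the same encoding as ``Memcpy2D`` below, i.e., ``0`` for host-to-device and ``1`` for device-to-host):

.. code-block:: chapel

   use GPUAPI;

   var A: [0..#1024] real(32);
   const sizeInBytes = (A.size: size_t) * c_sizeof(A.eltType);

   var dA: c_void_ptr;
   Malloc(dA, sizeInBytes);
   Memcpy(dA, c_ptrTo(A), sizeInBytes, 0);   // host-to-device
   // ... launch a GPU kernel that reads/writes dA ...
   Memcpy(c_ptrTo(A), dA, sizeInBytes, 1);   // device-to-host
   Free(dA);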

.. method:: Memcpy2D(dst: c_void_ptr, dpitch: size_t, src: c_void_ptr, spitch: size_t, width: size_t, height:size_t, kind: int)

Transfers a pitched 2D array between the host and the device.

:arg dst: the destination address
:type dst: `c_void_ptr`

:arg dpitch: the pitch of destination memory
:type dpitch: `size_t`

:arg src: the source address
:type src: `c_void_ptr`

:arg spitch: the pitch of source memory
:type spitch: `size_t`

:arg width: the width of 2D array to be transferred (in bytes)
:type width: `size_t`

:arg height: the height of 2D array to be transferred (# of rows)
:type height: `size_t`

:arg kind: type of transfer (``0``: host-to-device, ``1``: device-to-host)
:type kind: `int`

.. note:: A working example can be found `here <https://github.com/ahayashi/chapel-gpu/blob/master/example/gpuapi/pitched2d/pitched2d.chpl>`_. The detailed description of the underlying CUDA API can be found `here <https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__MEMORY.html#group__CUDART__MEMORY_1g3a58270f6775efe56c65ac47843e7cee>`_.
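
Combining ``MallocPitch`` and ``Memcpy2D``, a pitched round trip might look like this sketch (the array shape is arbitrary; the host rows are contiguous, so the host pitch equals the host row size in bytes):

.. code-block:: chapel

   use GPUAPI;

   config const nRows = 64, nCols = 256;
   var A: [0..#nRows, 0..#nCols] real(32);

   const width  = (nCols: size_t) * c_sizeof(A.eltType);  // row size in bytes
   const height = nRows: size_t;

   var dA: c_void_ptr;
   var dpitch: size_t;
   MallocPitch(dA, dpitch, width, height);

   // Host rows are contiguous, so the source pitch equals width.
   Memcpy2D(dA, dpitch, c_ptrTo(A), width, width, height, 0);  // host-to-device
   // ... launch a GPU kernel that works on the pitched device array ...
   Memcpy2D(c_ptrTo(A), width, dA, dpitch, width, height, 1);  // device-to-host
   Free(dA);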

.. method:: Free(devPtr: c_void_ptr)

4 changes: 2 additions & 2 deletions doc/rst/conf.py
@@ -52,7 +52,7 @@

# General information about the project.
project = 'Chapel-GPU'
copyright = '2019, Rice University, 2019-2020, Georgia Institute of Technology'
copyright = '2019, Rice University, 2019-2021, Georgia Institute of Technology'
author = 'Akihiro Hayashi, Sri Raj Paul, Vivek Sarkar'

# The version info for the project you're documenting, acts as replacement for
@@ -62,7 +62,7 @@
# The short X.Y version.
version = ''
# The full version, including alpha/beta/rc tags.
release = '0.2'
release = '0.3'

# The language for content autogenerated by Sphinx. Refer to documentation
# for a list of supported languages.
2 changes: 1 addition & 1 deletion doc/rst/details/gpuapi.rst
@@ -13,7 +13,7 @@ The GPUAPI module provides Chapel-level GPU API. The use of the API assumes case

* Example: ``var ga = new GPUArray(A);``

* `LOW-MID-level`: Provides wrapper functions for raw GPU API functions
* `MID-LOW-level`: Provides wrapper functions for raw GPU API functions

* Example: ``var ga: c_void_ptr = Malloc(sizeInBytes);``

7 changes: 7 additions & 0 deletions doc/rst/history/evolution.rst
@@ -2,6 +2,13 @@
Chapel-GPU Evolution
=============================================

version 0.3.0, June, 2021
############################

Version 0.3.0 adds the following new features to version 0.2.0:

- Update the GPUAPI module to add 1) pitched 2D array support at the MID-LOW level, and 2) jagged array support at the MID level.

version 0.2.0, August, 2020
############################

2 changes: 1 addition & 1 deletion doc/rst/index.rst
@@ -16,7 +16,7 @@ This document describes the following two Chapel modules that facilitate GPU pro

* Example: :chapel:`var ga = new GPUArray(A);`

* `LOW-MID-level`: Provides wrapper functions for raw GPU API functions
* `MID-LOW-level`: Provides wrapper functions for raw GPU API functions

* Example: :chapel:`var ga: c_void_ptr = Malloc(sizeInBytes);`

4 changes: 2 additions & 2 deletions doc/rst/instructions/build.rst
@@ -5,7 +5,7 @@ Building Chapel-GPU
Prerequisites
##############

* Chapel: 1.22 or below. Detailed instructions for installing Chapel can be found: `here <https://chapel-lang.org/docs/usingchapel/QUICKSTART.html>`_.
* Chapel: 1.24.1 or below. Detailed instructions for installing Chapel can be found: `here <https://chapel-lang.org/docs/usingchapel/QUICKSTART.html>`_.

* GPU Compilers & Runtimes: GPUIterator and GPUAPI require either of the following GPU programming environments.

@@ -17,7 +17,7 @@ Prerequisites

* cmake: 3.8.0 or above

.. note:: While ``GPUIterator`` works with OpenCL, ``GPUAPI`` with OpenCL is under developement.
.. note:: While ``GPUIterator`` works with OpenCL, ``GPUAPI`` with OpenCL/SYCL is under development.

Instructions
##############
17 changes: 13 additions & 4 deletions doc/rst/instructions/compile.rst
@@ -15,7 +15,7 @@ The repository has several example applications in ``chapel-gpu/example`` and ``
Matrix Multiplication, ``apps/mm``, Matrix-Matrix Multiply,
PageRank, ``apps/mm``, The pagerank algorithm, WIP
N-Queens, WIP, The n-queens problem, WIP

GPU API Examples, ``example/gpuapi``, ,

.. note:: This section assumes the Chapel-GPU components are already installed in ``$CHPL_GPU_HOME``. If you have not done so please see :ref:`Building Chapel-GPU`.

@@ -43,7 +43,16 @@ The example applications in ``chapel-gpu/example`` and ``chapel-gpu/apps`` direc
or
make opencl
- Example 2: ``chapel-gpu/apps/stream``
- Example 2: ``chapel-gpu/example/gpuapi``

.. code-block:: bash
cd path/to/chapel-gpu/example/gpuapi/2d
make cuda
or
make hip
- Example 3: ``chapel-gpu/apps/stream``

.. code-block:: bash
@@ -68,11 +77,11 @@ The example applications in ``chapel-gpu/example`` and ``chapel-gpu/apps`` direc
``vc.cuda.gpu``, A GPU-only implementation w/o the GPUIterator., ``make cuda.gpu``
``vc.cuda.hybrid``, The GPUIterator implementation (single-locale)., ``make cuda.hybrid``
``vc.cuda.hybrid.dist``, The GPUIterator implementation (multi-locale)., ``make cuda.hybrid.dist``
``vc.cuda.hybrid.dist.lowmid``, The LOW-MID implementation (multi-locale)., ``make cuda.hybrid.dist.lowmid``
``vc.cuda.hybrid.dist.midlow``, The MID-LOW implementation (multi-locale)., ``make cuda.hybrid.dist.midlow``
``vc.cuda.hybrid.dist.mid``, The MID implementation (multi-locale)., ``make cuda.hybrid.dist.mid``


.. tip:: If you want to compile a specific variant, please do ``make X.Y``, where ``X`` is either ``cuda``, ``hip``, or ``opencl``, and ``Y`` is either ``gpu``, ``hybrid``, ``hybrid.dist``, ``hybrid.dist.lowmid``, or ``hybrid.dist.mid``. Please also see the third column above. Also, the LOW-MID and MID variants with OpenCL are currently not supported.
.. tip:: If you want to compile a specific variant, please do ``make X.Y``, where ``X`` is either ``cuda``, ``hip``, or ``opencl``, and ``Y`` is either ``gpu``, ``hybrid``, ``hybrid.dist``, ``hybrid.dist.midlow``, or ``hybrid.dist.mid``. Please also see the third column above. Also, the MID-LOW and MID variants with OpenCL are currently not supported.

.. note:: The ``Makefile`` internally uses ``cmake`` to generate a static library from a GPU source program (``vc.cu`` in this case)

12 changes: 6 additions & 6 deletions doc/rst/instructions/guide.rst
@@ -13,31 +13,31 @@ In general, GPU programs should include typical host and device operations inclu

* - Level
- MID-level
- LOW-MID-level
- MID-LOW-level
- LOW-level
* - Kernel Invocation
- CUDA/HIP
- CUDA/HIP
- CUDA/HIP/OpenCL
* - Memory (de)allocations
- Chapel (MID)
- Chapel (LOW-MID)
- Chapel (MID-LOW)
- CUDA/HIP/OpenCL
* - Data transfers
- Chapel (MID)
- Chapel (LOW-MID)
- Chapel (MID-LOW)
- CUDA/HIP/OpenCL


.. seealso::

* :ref:`Writing MID-level programs`
* :ref:`MID-level API Reference`
* :ref:`Writing LOW-MID-level programs`
* :ref:`LOW-MID-level API Reference`
* :ref:`Writing MID-LOW-level programs`
* :ref:`MID-LOW-level API Reference`
* :ref:`Writing LOW-level (GPUIterator Only) programs`

.. note:: LOW/LOW-MID/MID levels can interoperate with each other.
.. note:: LOW/MID-LOW/MID levels can interoperate with each other.


Writing GPU program
doc/rst/instructions/{low-mid.rst → mid-low.rst}
@@ -1,13 +1,13 @@
.. default-domain:: chpl

=============================================
Writing LOW-MID-level programs
Writing MID-LOW-level programs
=============================================

LOW-MID-level API
MID-LOW-level API
######################

The biggest motivation for introducing ``LOW-MID`` and ``MID`` -level GPU API is moving some of low-level GPU operations to the Chapel-level. Consider the following GPU callback function and C function:
The biggest motivation for introducing the ``MID-LOW`` and ``MID`` -level GPU API is to move some of the low-level GPU operations to the Chapel level. Consider the following GPU callback function and C function:

.. code-block:: chapel
:caption: vc.hybrid.chpl
@@ -34,7 +34,7 @@ The biggest motivation for introducing ``LOW-MID`` and ``MID`` -level GPU API is
}
}
At the LOW-MID-level, most of the CUDA/HIP/OpenCL-level 1) device memory allocation, 2) device synchronization, and 3) data transfer can be written in Chapel. However, it's worth noting that this level of abstraction only provides thin wrapper functions for the CUDA/HIP/OpenCL-level API functions, which requires you to directly manipulate C types like ``c_void_ptr`` and so on. The LOW-MID is helpful particularly when you want to fine-tune the use of GPU API, but still want to stick with Chapel. Here is an example program written with the LOW-MID-level API:
At the MID-LOW-level, most of the CUDA/HIP/OpenCL-level operations, namely 1) device memory allocation, 2) device synchronization, and 3) data transfer, can be written in Chapel. However, it is worth noting that this level of abstraction only provides thin wrapper functions for the CUDA/HIP/OpenCL-level API functions, which requires you to directly manipulate C types such as ``c_void_ptr``. The MID-LOW level is particularly helpful when you want to fine-tune the use of the GPU API but still want to stick with Chapel. Here is an example program written with the MID-LOW-level API:

.. code-block:: chapel
:caption: vc.hybrid.chpl
@@ -52,6 +52,6 @@ At the LOW-MID-level, most of the CUDA/HIP/OpenCL-level 1) device memory allocat
Free(dB);
}
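
The full example is collapsed above; as a rough sketch of the overall shape (the callback and kernel-wrapper names are hypothetical, and the GPU kernel launch is elided), a MID-LOW-level callback body typically combines ``Malloc``, ``Memcpy``, and ``Free``:

.. code-block:: chapel

   use GPUAPI;

   var A, B: [0..#1024] real(32);

   // A sketch of a MID-LOW-level GPU callback body (names are illustrative).
   proc myGPUCallBack(lo: int, hi: int, N: int) {
     const count = (N: size_t) * c_sizeof(A.eltType);
     var dA, dB: c_void_ptr;
     Malloc(dA, count);
     Malloc(dB, count);
     Memcpy(dA, c_ptrTo(A), count, 0);   // host-to-device
     // vcKernel(dA, dB, N: c_int);      // hypothetical extern GPU kernel wrapper
     Memcpy(c_ptrTo(B), dB, count, 1);   // device-to-host
     Free(dA);
     Free(dB);
   }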
.. tip:: The LOW-MID-level API can interoperate with the MID-level API.
.. tip:: The MID-LOW-level API can interoperate with the MID-level API.

.. seealso:: :ref:`LOW-MID-level API Reference`
.. seealso:: :ref:`MID-LOW-level API Reference`
6 changes: 3 additions & 3 deletions doc/rst/instructions/mid.rst
@@ -7,7 +7,7 @@ Writing MID-level programs
MID-level API
######################

To reiterate, the biggest motivation for introducing ``LOW-MID`` and ``MID`` -level GPU API is moving some of low-level GPU operations to the Chapel-level. Consider the following GPU callback function and C function:
To reiterate, the biggest motivation for introducing the ``MID-LOW`` and ``MID`` -level GPU API is to move some of the low-level GPU operations to the Chapel level. Consider the following GPU callback function and C function:

.. code-block:: chapel
:caption: vc.hybrid.chpl
@@ -34,7 +34,7 @@ To reiterate, the biggest motivation for introducing ``LOW-MID`` and ``MID`` -le
}
}
At the MID-level, most of the CUDA/HIP/OpenCL-level 1) device memory allocation, 2) device synchronization, and 3) data transfer can be written in Chapel. Also, unlike the LOW-MID level, the MID-level API is more Chapel programmer-friendly, where you can allocate GPU memory using the ``new`` keyword and no longer need to directly manipulate C types. Here is an example program written with the MID-level API:
At the MID-level, most of the CUDA/HIP/OpenCL-level operations, namely 1) device memory allocation, 2) device synchronization, and 3) data transfer, can be written in Chapel. Also, unlike the MID-LOW level, the MID-level API is more Chapel-programmer-friendly: you can allocate GPU memory using the ``new`` keyword and no longer need to directly manipulate C types. Here is an example program written with the MID-level API:


.. code-block:: chapel
Expand All @@ -50,7 +50,7 @@ At the MID-level, most of the CUDA/HIP/OpenCL-level 1) device memory allocation,
free(dA, dB);
}
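
Again, the full example is collapsed above; a rough sketch of the equivalent MID-level callback body (the callback and kernel-wrapper names are hypothetical, and ``dPtr()`` is assumed to expose the underlying device pointer) looks like:

.. code-block:: chapel

   use GPUAPI;

   var A, B: [0..#1024] real(32);

   // A sketch of a MID-level GPU callback body (names are illustrative).
   proc myGPUCallBack(lo: int, hi: int, N: int) {
     var dA = new GPUArray(A);
     var dB = new GPUArray(B);
     toDevice(dA);                               // no byte counts or c_void_ptr handling
     // vcKernel(dA.dPtr(), dB.dPtr(), N: c_int);  // hypothetical extern GPU kernel wrapper
     fromDevice(dB);                             // device-to-host
     free(dA, dB);
   }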
.. tip:: The MID-level API can interoperate with the LOW-MID-level API.
.. tip:: The MID-level API can interoperate with the MID-LOW-level API.

.. seealso:: :ref:`MID-level API Reference`

2 changes: 1 addition & 1 deletion doc/rst/instructions/write.rst
@@ -7,7 +7,7 @@ Using Chapel-GPU
:caption: Step-by-step Guide

low
low-mid
mid-low
mid
compile
guide
