.. meta::
  :description: This chapter presents how to port the CUDA driver API and showcases equivalent operations in HIP.
-  :keywords: AMD, ROCm, HIP, CUDA, driver API
+  :keywords: AMD, ROCm, HIP, CUDA, driver API, porting, port

.. _porting_driver_api:

*******************************************************************************
Porting CUDA driver API
*******************************************************************************

-NVIDIA provides separate CUDA driver and runtime APIs. The two APIs have
-significant overlap in functionality:
-
-* Both APIs support events, streams, memory management, memory copy, and error
-  handling.
-
-* Both APIs deliver similar performance.
+CUDA provides separate driver and runtime APIs. The two APIs generally provide
+the same functionality; however, the driver API allows finer-grained control
+over initialization, context management, and module management, all of which
+the runtime API handles implicitly.

* Driver API calls begin with the prefix ``cu``, while runtime API calls begin
  with the prefix ``cuda``. For example, the driver API contains
  ``cuEventCreate``, while the runtime API contains ``cudaEventCreate``, which
  has similar functionality.

-* The driver API defines a different, but largely overlapping, error code space
-  than the runtime API and uses a different coding convention. For example, the
-  driver API defines ``CUDA_ERROR_INVALID_VALUE``, while the runtime API defines
-  ``cudaErrorInvalidValue``.
+* The driver API offers two additional pieces of functionality not directly
+  provided by the runtime API: the ``cuModule`` and ``cuCtx`` APIs.

-The driver API offers two additional functionalities not provided by the runtime
-API: ``cuModule`` and ``cuCtx`` APIs.
+HIP does not explicitly provide two separate APIs; the functions corresponding
+to the CUDA driver API are available in the HIP runtime API and are usually
+prefixed with ``hipDrv``. The module and context functionality is available
+under the ``hipModule`` and ``hipCtx`` prefixes.
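
As a minimal sketch of this naming correspondence (the code object
``my_kernels.co`` and the kernel name ``my_kernel`` are placeholders), the
driver-style workflow ``cuModuleLoad`` / ``cuModuleGetFunction`` /
``cuLaunchKernel`` maps onto the equivalently named ``hipModule`` functions:

.. code-block:: cpp

    #include <hip/hip_runtime.h>

    void launchFromModule() {
        hipModule_t module;
        hipFunction_t kernel;

        // Load a code object from disk and look up a kernel by name,
        // mirroring cuModuleLoad and cuModuleGetFunction.
        hipModuleLoad(&module, "my_kernels.co");
        hipModuleGetFunction(&kernel, module, "my_kernel");

        // Launch the kernel, mirroring cuLaunchKernel.
        hipModuleLaunchKernel(kernel,
                              1, 1, 1,   // grid dimensions
                              64, 1, 1,  // block dimensions
                              0,         // dynamic shared memory in bytes
                              nullptr,   // stream (nullptr is the default stream)
                              nullptr,   // kernel arguments (none in this sketch)
                              nullptr);  // extra launch options
        hipModuleUnload(module);
    }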

cuModule API
================================================================================
@@ -123,8 +120,8 @@ HIPIFY translation of CUDA driver API
The HIPIFY tools convert CUDA driver APIs for streams, events, modules, devices, memory management, context, and the profiler to the equivalent HIP calls. For example, ``cuEventCreate`` is translated to ``hipEventCreate``.
HIPIFY tools also convert error codes from the driver namespace and coding conventions to the equivalent HIP error code. HIP unifies the APIs for these common functions.
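
As a hedged illustration of the kind of mapping involved (not literal HIPIFY
output), a CUDA driver API event sequence lines up one-to-one with its HIP
counterpart; the corresponding driver call is noted in the comments:

.. code-block:: cpp

    #include <hip/hip_runtime.h>

    void recordAndWait() {
        hipEvent_t ev;
        hipEventCreate(&ev);          // driver API: cuEventCreate(&ev, CU_EVENT_DEFAULT)
        hipEventRecord(ev, nullptr);  // driver API: cuEventRecord(ev, stream)
        hipEventSynchronize(ev);      // driver API: cuEventSynchronize(ev)
        hipEventDestroy(ev);          // driver API: cuEventDestroy(ev)
    }
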
-The memory copy API requires additional explanation. The CUDA driver includes the memory direction in the name of the API (``cuMemcpyH2D``), while the CUDA driver API provides a single memory copy API with a parameter that specifies the direction. It also supports a "default" direction where the runtime determines the direction automatically.
-HIP provides APIs with both styles, for example, ``hipMemcpyH2D`` as well as ``hipMemcpy``.
+The memory copy API requires additional explanation. The CUDA driver API encodes the memory direction in the function name (``cuMemcpyHtoD``), while the CUDA runtime API provides a single memory copy function with a parameter that specifies the direction. The runtime API also supports a "default" direction, where the runtime determines the direction automatically from the pointer values.
+HIP provides both versions, for example, ``hipMemcpyHtoD`` as well as ``hipMemcpy``.
The first version might be faster in some cases because it avoids any host overhead to detect the different memory directions.
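
A minimal sketch of the two styles (the buffer size and names are illustrative):

.. code-block:: cpp

    #include <hip/hip_runtime.h>
    #include <vector>

    void copyBothStyles() {
        const size_t bytes = 256 * sizeof(float);
        std::vector<float> hostData(256, 1.0f);
        float* devData = nullptr;
        hipMalloc(&devData, bytes);

        // Driver-style copy: the direction is encoded in the function name.
        hipMemcpyHtoD(reinterpret_cast<hipDeviceptr_t>(devData), hostData.data(), bytes);

        // Runtime-style copy: the direction is passed as a parameter
        // (hipMemcpyDefault lets the runtime detect it from the pointers).
        hipMemcpy(devData, hostData.data(), bytes, hipMemcpyHostToDevice);

        hipFree(devData);
    }
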
HIP defines a single error space and uses camel case for all errors (for example, ``hipErrorInvalidValue``).
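
For instance, a basic check against the unified error space might look like the
following sketch, where ``hipGetErrorString`` converts the code to a readable
message:

.. code-block:: cpp

    #include <hip/hip_runtime.h>
    #include <iostream>

    void checkedAllocation() {
        void* ptr = nullptr;
        hipError_t err = hipMalloc(&ptr, 1 << 20); // allocate 1 MiB on the device
        if (err != hipSuccess) {
            std::cerr << "hipMalloc failed: " << hipGetErrorString(err) << std::endl;
            return;
        }
        hipFree(ptr);
    }
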
@@ -547,3 +544,67 @@ The HIP version number is defined as an integer:
.. code-block:: cpp

    HIP_VERSION=HIP_VERSION_MAJOR * 10000000 + HIP_VERSION_MINOR * 100000 + HIP_VERSION_PATCH
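
For instance, this encoding can be used in a preprocessor guard; the following
is a hypothetical sketch with an arbitrary 6.2.0 threshold:

.. code-block:: cpp

    #include <hip/hip_runtime.h>

    // With the encoding above, HIP 6.2.0 corresponds to
    // 6 * 10000000 + 2 * 100000 + 0 = 60200000.
    #if HIP_VERSION >= 60200000
        // Code paths that rely on HIP 6.2 or newer.
    #else
        // Fallback for older HIP releases.
    #endif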
+
+********************************************************************************
+CU_POINTER_ATTRIBUTE_MEMORY_TYPE
+********************************************************************************
+
+To get a pointer's memory type in HIP, use :cpp:func:`hipPointerGetAttributes`.
+The first parameter of the function is a pointer to a ``hipPointerAttribute_t``
+structure; its ``type`` member indicates whether the memory pointed to is
+allocated on the device or the host.
+
+For example:
+
+.. code-block:: cpp
+
+    double* ptr;
+    hipMalloc(&ptr, sizeof(double));
+    hipPointerAttribute_t attr;
+    hipPointerGetAttributes(&attr, ptr); /* attr.type is hipMemoryTypeDevice */
+    if (attr.type == hipMemoryTypeDevice)
+        std::cout << "ptr is of type hipMemoryTypeDevice" << std::endl;
+
+    double* ptrHost;
+    hipHostMalloc(&ptrHost, sizeof(double));
+    hipPointerAttribute_t attrHost;
+    hipPointerGetAttributes(&attrHost, ptrHost); /* attrHost.type is hipMemoryTypeHost */
+    if (attrHost.type == hipMemoryTypeHost)
+        std::cout << "ptrHost is of type hipMemoryTypeHost" << std::endl;
+
+Note that the ``hipMemoryType`` enum values are different from the
+``cudaMemoryType`` enum values.
+
+For example, on the AMD platform, ``hipMemoryType`` is defined in
+``hip_runtime_api.h``:
+
+.. code-block:: cpp
+
+    typedef enum hipMemoryType {
+        hipMemoryTypeHost = 0,     ///< Memory is physically located on host
+        hipMemoryTypeDevice = 1,   ///< Memory is physically located on device (see deviceId for specific device)
+        hipMemoryTypeArray = 2,    ///< Array memory, physically located on device (see deviceId for specific device)
+        hipMemoryTypeUnified = 3,  ///< Not used currently
+        hipMemoryTypeManaged = 4   ///< Managed memory, automatically managed by the unified memory system
+    } hipMemoryType;
+
+The CUDA toolkit defines ``cudaMemoryType`` as follows:
+
+.. code-block:: cpp
+
+    enum cudaMemoryType
+    {
+        cudaMemoryTypeUnregistered = 0, // Unregistered memory
+        cudaMemoryTypeHost         = 1, // Host memory
+        cudaMemoryTypeDevice       = 2, // Device memory
+        cudaMemoryTypeManaged      = 3  // Managed memory
+    };
+
+Because of this mismatch, the memory type reported by ``hipPointerGetAttributes`` has to be translated on the NVIDIA platform so that the correct CUDA memory type is returned; this translation is handled in ``nvidia_hip_runtime_api.h``.
+
+In HIP applications that use APIs involving memory types, use ``#ifdef`` guards to select the enum values that match the platform (AMD or NVIDIA) the code is built for, as sketched below.
+
+For an example, see `hipMemcpyParam2D.cc <https://github.com/ROCm/hip-tests/tree/develop/catch/unit/memory/hipMemcpyParam2D.cc>`_.
+
+With the ``#ifdef`` condition in place, HIP APIs work as expected on both AMD and NVIDIA platforms.
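
A minimal, hypothetical sketch of that ``#ifdef`` pattern (not taken from the
linked test) selects the enum value matching the platform's numbering, using
the platform macros defined by the HIP build:

.. code-block:: cpp

    #include <hip/hip_runtime.h>

    // Pick the value that denotes host memory on the platform the code is
    // compiled for; the numbers follow the enum definitions quoted above.
    #if defined(__HIP_PLATFORM_NVIDIA__)
        constexpr int expectedHostType = cudaMemoryTypeHost;  // == 1 on NVIDIA
    #elif defined(__HIP_PLATFORM_AMD__)
        constexpr int expectedHostType = hipMemoryTypeHost;   // == 0 on AMD
    #endif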
+
+Note that ``cudaMemoryTypeUnregistered`` currently has no counterpart in the ``hipMemoryType`` enum, in order to preserve backward compatibility of existing HIP functionality.