From 7c3cc1ea490952e16011a7a157a3f853638f0502 Mon Sep 17 00:00:00 2001 From: Istvan Kiss Date: Wed, 24 Sep 2025 19:14:51 +0200 Subject: [PATCH 1/4] Revise the section on CU & WGP modes Signed-off-by: Jan Stephan --- .wordlist.txt | 2 ++ docs/how-to/hip_rtc.rst | 45 +++++++++++++++++++++++++---------------- 2 files changed, 30 insertions(+), 17 deletions(-) diff --git a/.wordlist.txt b/.wordlist.txt index 1bca54a941..c19f407f1d 100644 --- a/.wordlist.txt +++ b/.wordlist.txt @@ -39,6 +39,8 @@ Dereferencing DFT dll DirectX +DPP +dst EIGEN enqueue enqueues diff --git a/docs/how-to/hip_rtc.rst b/docs/how-to/hip_rtc.rst index f188c22839..a126d1b0cf 100644 --- a/docs/how-to/hip_rtc.rst +++ b/docs/how-to/hip_rtc.rst @@ -319,31 +319,42 @@ using the bitcode APIs provided by HIPRTC. vector kernel_bitcode(bitCodeSize); hiprtcGetBitcode(prog, kernel_bitcode.data()); -CU Mode vs WGP mode +CU mode vs WGP mode ------------------------------------------------------------------------------- -AMD GPUs consist of an array of workgroup processors, each built with 2 compute -units (CUs) capable of executing SIMD32. All the CUs inside a workgroup -processor use local data share (LDS). +All :doc:`supported AMD GPUs ` are built around a data-parallel +processor (DPP) array. -gfx10+ support execution of wavefront in CU mode and work-group processor mode -(WGP). Please refer to section 2.3 of `RDNA3 ISA reference `_. +On CDNA GPUs, the DPP is organized as a set of compute unit (CU) pipelines, with each CU containing a single SIMD64 +unit. Each CU has its own low-latency memory space called local data share (LDS), which threads from a warp running on +the CU can access. -gfx9 and below only supports CU mode. +On RDNA GPUs, the DPP is organized as a set of workgroup processor (WGP) pipelines. Each WGP contains two CUs, and each +CU contains two SIMD32 units. The LDS is attached to the WGP, so threads from different warps can access the same LDS if +they run on CUs within the same WGP. 
-In WGP mode, 4 warps of a block can simultaneously be executed on the workgroup -processor, where as in CU mode only 2 warps of a block can simultaneously -execute on a CU. In theory, WGP mode might help with occupancy and increase the -performance of certain HIP programs (if not bound to inter warp communication), -but might incur performance penalty on other HIP programs which rely on atomics -and inter warp communication. This also has effect of how the LDS is split -between warps, please refer to `RDNA3 ISA reference `_ for more information. +.. note:: + + Because CDNA GPUs do not use workgroup processors and have a different CU layout, the following information applies + only to RDNA GPUs. + +Warps are dispatched in one of two modes. These control whether warps are distributed across two SIMD32s (**CU mode**) +or across all four SIMD32s within a WGP (**WGP mode**). + +CU mode executes two warps per block on a single CU and provides only half the LDS to those warps. Independence between +CUs can improve performance for workloads avoiding inter-warp communication, but LDS capacity per CU is limited. + +WGP mode executes four warps per block on a WGP with a shared LDS. It can increase occupancy and improve performance +for workloads without heavy inter-warp communication, but it can degrade performance for programs relying on atomics or +extensive inter-warp communication. + +For more information on the differences between CU and WGP modes, please refer to the appropriate ISA reference under +`AMD RDNA architecture `__. .. note:: - HIPRTC assumes **WGP mode by default** for gfx10+. This can be overridden by - passing ``-mcumode`` to HIPRTC compile options in - :cpp:func:`hiprtcCompileProgram`. + HIPRTC assumes **WGP mode by default** for RDNA GPUs. This can be overridden by passing ``-mcumode`` as a compile + option in :cpp:func:`hiprtcCompileProgram`. 
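The note above says that ``-mcumode`` is passed as a compile option to :cpp:func:`hiprtcCompileProgram`. The following is a minimal sketch of how such an option list could be assembled, not the method used by the HIP examples. The helper names ``make_hiprtc_options`` and ``to_c_strings`` are hypothetical and are not part of HIPRTC. The HIPRTC call itself appears only in a comment so that the block builds without the HIP headers.

```cpp
#include <string>
#include <vector>

// Hypothetical helper (not part of HIPRTC): collects compile options and
// appends -mcumode when CU mode is requested instead of the WGP default.
std::vector<std::string> make_hiprtc_options(bool use_cu_mode)
{
    std::vector<std::string> options;
    if (use_cu_mode) {
        options.push_back("-mcumode");
    }
    return options;
}

// hiprtcCompileProgram takes its options as an array of C strings, so this
// converts the owning strings into non-owning pointers. The pointers remain
// valid only while `options` is alive and unmodified.
std::vector<const char*> to_c_strings(const std::vector<std::string>& options)
{
    std::vector<const char*> c_options;
    c_options.reserve(options.size());
    for (const std::string& option : options) {
        c_options.push_back(option.c_str());
    }
    return c_options;
}

// Assumed usage with an already created hiprtcProgram named `prog`:
//
//   auto options   = make_hiprtc_options(true);
//   auto c_options = to_c_strings(options);
//   hiprtcCompileProgram(prog,
//                        static_cast<int>(c_options.size()),
//                        c_options.data());
```

Keeping the options in a ``std::vector<std::string>`` and converting them only at the call site avoids dangling pointers when the list is built conditionally. Whether CU mode helps a given kernel depends on the workload, so the flag is kept behind a runtime switch in this sketch.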
Linker APIs =============================================================================== From 2f3dbcd27014939254a9288fe43d8ab99ba74d1a Mon Sep 17 00:00:00 2001 From: Istvan Kiss Date: Fri, 12 Sep 2025 12:21:32 +0200 Subject: [PATCH 2/4] Include examples from rocm examples source --- docs/how-to/hip_cpp_language_extensions.rst | 110 +---- docs/how-to/hip_runtime_api/asynchronous.rst | 319 +------------- docs/how-to/hip_runtime_api/call_stack.rst | 86 +--- .../how-to/hip_runtime_api/error_handling.rst | 71 +--- docs/how-to/hip_runtime_api/hipgraph.rst | 301 +------------ .../how-to/hip_runtime_api/initialization.rst | 22 +- .../memory_management/host_memory.rst | 118 +----- .../stream_ordered_allocator.rst | 296 ++----------- .../memory_management/unified_memory.rst | 401 ++---------------- docs/how-to/hip_runtime_api/multi_device.rst | 366 +--------------- docs/reference/api_syntax.rst | 90 +--- docs/reference/complex_math_api.rst | 118 +----- docs/reference/math_api.rst | 86 +--- docs/tools/example_codes/add_kernel.hip | 95 +++++ .../example_codes/async_kernel_execution.hip | 142 +++++++ docs/tools/example_codes/block_reduction.cu | 110 +++++ .../example_codes/call_stack_management.cpp | 58 +++ .../calling_global_functions.hip | 89 ++++ docs/tools/example_codes/compilation_apis.cpp | 165 +++++++ docs/tools/example_codes/complex_math.hip | 142 +++++++ .../example_codes/constant_memory_device.hip | 75 ++++ docs/tools/example_codes/data_prefetching.hip | 84 ++++ .../device_code_feature_identification.hip | 61 +++ .../example_codes/device_enumeration.cpp | 74 ++++ docs/tools/example_codes/device_recursion.hip | 72 ++++ docs/tools/example_codes/device_selection.hip | 98 +++++ .../dynamic_shared_memory_device.hip | 64 +++ .../example_codes/dynamic_unified_memory.hip | 74 ++++ docs/tools/example_codes/error_handling.hip | 97 +++++ .../event_based_synchronization.hip | 153 +++++++ docs/tools/example_codes/explicit_copy.cpp | 58 +++ 
docs/tools/example_codes/explicit_memory.hip | 79 ++++ .../example_codes/extern_shared_memory.hip | 53 +++ docs/tools/example_codes/graph_capture.hip | 168 ++++++++ docs/tools/example_codes/graph_creation.hip | 226 ++++++++++ .../host_code_feature_identification.cpp | 59 +++ .../host_code_feature_identification.hip | 59 +++ ...dentifying_compilation_target_platform.cpp | 48 +++ ...entifying_host_device_compilation_pass.hip | 52 +++ .../kernel_memory_allocation.hip | 75 ++++ docs/tools/example_codes/launch_bounds.hip | 91 ++++ docs/tools/example_codes/linker_apis.cpp | 200 +++++++++ docs/tools/example_codes/linker_apis_file.cpp | 219 ++++++++++ .../example_codes/linker_apis_options.cpp | 200 +++++++++ docs/tools/example_codes/load_module.cpp | 107 +++++ docs/tools/example_codes/load_module_ex.cpp | 145 +++++++ .../example_codes/load_module_ex_cuda.cpp | 134 ++++++ .../low_precision_float_fp16.hip | 111 +++++ .../example_codes/low_precision_float_fp8.hip | 130 ++++++ docs/tools/example_codes/lowered_names.cpp | 202 +++++++++ docs/tools/example_codes/math.hip | 118 ++++++ docs/tools/example_codes/memory_pool.hip | 109 +++++ .../memory_pool_resource_usage_statistics.cpp | 115 +++++ .../example_codes/memory_pool_threshold.hip | 115 +++++ docs/tools/example_codes/memory_pool_trim.cpp | 69 +++ .../example_codes/memory_range_attributes.hip | 90 ++++ .../multi_device_synchronization.hip | 133 ++++++ .../ordinary_memory_allocation.hip | 81 ++++ .../tools/example_codes/p2p_memory_access.hip | 112 +++++ .../p2p_memory_access_failed.hip | 106 +++++ .../example_codes/pageable_host_memory.cpp | 80 ++++ .../per_thread_default_stream.cpp | 78 ++++ .../example_codes/pinned_host_memory.cpp | 81 ++++ .../example_codes/pointer_memory_type.cpp | 61 +++ .../example_codes/rtc_error_handling.cpp | 79 ++++ .../sequential_kernel_execution.hip | 131 ++++++ .../example_codes/set_constant_memory.hip | 47 ++ .../example_codes/simple_device_query.cpp | 42 ++ 
.../example_codes/standard_unified_memory.hip | 73 ++++ .../static_shared_memory_device.hip | 46 ++ .../example_codes/static_unified_memory.hip | 65 +++ .../stream_ordered_memory_allocation.hip | 85 ++++ .../template_warp_size_reduction.hip | 337 ++++++++------- docs/tools/example_codes/timer.hip | 66 +++ .../example_codes/unified_memory_advice.hip | 89 ++++ .../example_codes/warp_size_reduction.hip | 295 +++++++------ docs/tools/update_example_codes.py | 299 ++++++++++++- 77 files changed, 6921 insertions(+), 2534 deletions(-) create mode 100644 docs/tools/example_codes/add_kernel.hip create mode 100644 docs/tools/example_codes/async_kernel_execution.hip create mode 100644 docs/tools/example_codes/block_reduction.cu create mode 100644 docs/tools/example_codes/call_stack_management.cpp create mode 100644 docs/tools/example_codes/calling_global_functions.hip create mode 100644 docs/tools/example_codes/compilation_apis.cpp create mode 100644 docs/tools/example_codes/complex_math.hip create mode 100644 docs/tools/example_codes/constant_memory_device.hip create mode 100644 docs/tools/example_codes/data_prefetching.hip create mode 100644 docs/tools/example_codes/device_code_feature_identification.hip create mode 100644 docs/tools/example_codes/device_enumeration.cpp create mode 100644 docs/tools/example_codes/device_recursion.hip create mode 100644 docs/tools/example_codes/device_selection.hip create mode 100644 docs/tools/example_codes/dynamic_shared_memory_device.hip create mode 100644 docs/tools/example_codes/dynamic_unified_memory.hip create mode 100644 docs/tools/example_codes/error_handling.hip create mode 100644 docs/tools/example_codes/event_based_synchronization.hip create mode 100644 docs/tools/example_codes/explicit_copy.cpp create mode 100644 docs/tools/example_codes/explicit_memory.hip create mode 100644 docs/tools/example_codes/extern_shared_memory.hip create mode 100644 docs/tools/example_codes/graph_capture.hip create mode 100644 
docs/tools/example_codes/graph_creation.hip create mode 100644 docs/tools/example_codes/host_code_feature_identification.cpp create mode 100644 docs/tools/example_codes/host_code_feature_identification.hip create mode 100644 docs/tools/example_codes/identifying_compilation_target_platform.cpp create mode 100644 docs/tools/example_codes/identifying_host_device_compilation_pass.hip create mode 100644 docs/tools/example_codes/kernel_memory_allocation.hip create mode 100644 docs/tools/example_codes/launch_bounds.hip create mode 100644 docs/tools/example_codes/linker_apis.cpp create mode 100644 docs/tools/example_codes/linker_apis_file.cpp create mode 100644 docs/tools/example_codes/linker_apis_options.cpp create mode 100644 docs/tools/example_codes/load_module.cpp create mode 100644 docs/tools/example_codes/load_module_ex.cpp create mode 100644 docs/tools/example_codes/load_module_ex_cuda.cpp create mode 100644 docs/tools/example_codes/low_precision_float_fp16.hip create mode 100644 docs/tools/example_codes/low_precision_float_fp8.hip create mode 100644 docs/tools/example_codes/lowered_names.cpp create mode 100644 docs/tools/example_codes/math.hip create mode 100644 docs/tools/example_codes/memory_pool.hip create mode 100644 docs/tools/example_codes/memory_pool_resource_usage_statistics.cpp create mode 100644 docs/tools/example_codes/memory_pool_threshold.hip create mode 100644 docs/tools/example_codes/memory_pool_trim.cpp create mode 100644 docs/tools/example_codes/memory_range_attributes.hip create mode 100644 docs/tools/example_codes/multi_device_synchronization.hip create mode 100644 docs/tools/example_codes/ordinary_memory_allocation.hip create mode 100644 docs/tools/example_codes/p2p_memory_access.hip create mode 100644 docs/tools/example_codes/p2p_memory_access_failed.hip create mode 100644 docs/tools/example_codes/pageable_host_memory.cpp create mode 100644 docs/tools/example_codes/per_thread_default_stream.cpp create mode 100644 
docs/tools/example_codes/pinned_host_memory.cpp create mode 100644 docs/tools/example_codes/pointer_memory_type.cpp create mode 100644 docs/tools/example_codes/rtc_error_handling.cpp create mode 100644 docs/tools/example_codes/sequential_kernel_execution.hip create mode 100644 docs/tools/example_codes/set_constant_memory.hip create mode 100644 docs/tools/example_codes/simple_device_query.cpp create mode 100644 docs/tools/example_codes/standard_unified_memory.hip create mode 100644 docs/tools/example_codes/static_shared_memory_device.hip create mode 100644 docs/tools/example_codes/static_unified_memory.hip create mode 100644 docs/tools/example_codes/stream_ordered_memory_allocation.hip create mode 100644 docs/tools/example_codes/timer.hip create mode 100644 docs/tools/example_codes/unified_memory_advice.hip diff --git a/docs/how-to/hip_cpp_language_extensions.rst b/docs/how-to/hip_cpp_language_extensions.rst index ca2da69783..02b265182d 100644 --- a/docs/how-to/hip_cpp_language_extensions.rst +++ b/docs/how-to/hip_cpp_language_extensions.rst @@ -103,66 +103,10 @@ The kernel arguments are listed after the configuration parameters. .. code-block:: cpp - #include - #include - - #define HIP_CHECK(expression) \ - { \ - const hipError_t err = expression; \ - if(err != hipSuccess){ \ - std::cerr << "HIP error: " << hipGetErrorString(err) \ - << " at " << __LINE__ << "\n"; \ - } \ - } - - // Performs a simple initialization of an array with the thread's index variables. - // This function is only available in device code. - __device__ void init_array(float * const a, const unsigned int arraySize){ - // globalIdx uniquely identifies a thread in a 1D launch configuration. - const int globalIdx = threadIdx.x + blockIdx.x * blockDim.x; - // Each thread initializes a single element of the array. - if(globalIdx < arraySize){ - a[globalIdx] = globalIdx; - } - } - - // Rounds a value up to the next multiple. - // This function is available in host and device code. 
- __host__ __device__ constexpr int round_up_to_nearest_multiple(int number, int multiple){ - return (number + multiple - 1)/multiple; - } - - __global__ void example_kernel(float * const a, const unsigned int N) - { - // Initialize array. - init_array(a, N); - // Perform additional work: - // - work with the array - // - use the array in a different kernel - // - ... - } - - int main() - { - constexpr int N = 100000000; // problem size - constexpr int blockSize = 256; //configurable block size - - //needed number of blocks for the given problem size - constexpr int gridSize = round_up_to_nearest_multiple(N, blockSize); - - float *a; - // allocate memory on the GPU - HIP_CHECK(hipMalloc(&a, sizeof(*a) * N)); - - std::cout << "Launching kernel." << std::endl; - example_kernel<<>>(a, N); - // make sure kernel execution is finished by synchronizing. The CPU can also - // execute other instructions during that time - HIP_CHECK(hipDeviceSynchronize()); - std::cout << "Kernel execution finished." << std::endl; - - HIP_CHECK(hipFree(a)); - } + .. literalinclude:: ../tools/example_codes/calling_global_functions.hip + :start-after: // [sphinx-start] + :end-before: // [sphinx-end] + :language: cpp Inline qualifiers -------------------------------------------------------------------------------- @@ -321,28 +265,10 @@ launch has to specify the needed amount of ``extern`` shared memory in the launc configuration. The statically allocated shared memory is allocated without this parameter. -.. 
code-block:: cpp - - #include - - extern __shared__ int shared_array[]; - - __global__ void kernel(){ - // initialize shared memory - shared_array[threadIdx.x] = threadIdx.x; - // use shared memory - synchronize to make sure, that all threads of the - // block see all changes to shared memory - __syncthreads(); - } - - int main(){ - //shared memory in this case depends on the configurable block size - constexpr int blockSize = 256; - constexpr int sharedMemSize = blockSize * sizeof(int); - constexpr int gridSize = 2; - - kernel<<>>(); - } +.. literalinclude:: ../tools/example_codes/extern_shared_memory.hip + :start-after: // [sphinx-start] + :end-before: // [sphinx-end] + :language: cpp __managed__ -------------------------------------------------------------------------------- @@ -735,22 +661,18 @@ with the actual frequency. The difference between the returned values represents the cycles used. -.. code-block:: cpp - - __global void kernel(){ - long long int start = clock64(); - // kernel code - long long int stop = clock64(); - long long int cycles = stop - start; - } +.. literalinclude:: ../tools/example_codes/timer.hip + :start-after: // [sphinx-kernel-start] + :end-before: // [sphinx-kernel-end] + :language: cpp ``long long int wall_clock64()`` returns the wall clock time on the device, with a constant, fixed frequency. The frequency is device dependent and can be queried using: -.. code-block:: cpp - - int wallClkRate = 0; //in kilohertz - hipDeviceGetAttribute(&wallClkRate, hipDeviceAttributeWallClockRate, deviceId); +.. literalinclude:: ../tools/example_codes/timer.hip + :start-after: // [sphinx-query-start] + :end-before: // [sphinx-query-end] + :language: cpp .. 
_atomic functions: diff --git a/docs/how-to/hip_runtime_api/asynchronous.rst b/docs/how-to/hip_runtime_api/asynchronous.rst index 63aeddc2cf..e1c93cd2b3 100644 --- a/docs/how-to/hip_runtime_api/asynchronous.rst +++ b/docs/how-to/hip_runtime_api/asynchronous.rst @@ -207,319 +207,24 @@ The example codes .. tab-item:: Sequential - .. code-block:: cpp - - #include - #include - #include - - #define HIP_CHECK(expression) \ - { \ - const hipError_t status = expression; \ - if(status != hipSuccess){ \ - std::cerr << "HIP error " \ - << status << ": " \ - << hipGetErrorString(status) \ - << " at " << __FILE__ << ":" \ - << __LINE__ << std::endl; \ - } \ - } - - // GPU Kernels - __global__ void kernelA(double* arrayA, size_t size){ - const size_t x = threadIdx.x + blockDim.x * blockIdx.x; - if(x < size){arrayA[x] += 1.0;} - }; - __global__ void kernelB(double* arrayA, double* arrayB, size_t size){ - const size_t x = threadIdx.x + blockDim.x * blockIdx.x; - if(x < size){arrayB[x] += arrayA[x] + 3.0;} - }; - - int main() - { - constexpr int numOfBlocks = 1 << 20; - constexpr int threadsPerBlock = 1024; - constexpr int numberOfIterations = 50; - // The array size smaller to avoid the relatively short kernel launch compared to memory copies - constexpr size_t arraySize = 1U << 25; - double *d_dataA; - double *d_dataB; - - double initValueA = 0.0; - double initValueB = 2.0; - - std::vector vectorA(arraySize, initValueA); - std::vector vectorB(arraySize, initValueB); - // Allocate device memory - HIP_CHECK(hipMalloc(&d_dataA, arraySize * sizeof(*d_dataA))); - HIP_CHECK(hipMalloc(&d_dataB, arraySize * sizeof(*d_dataB))); - for(int iteration = 0; iteration < numberOfIterations; iteration++) - { - // Host to Device copies - HIP_CHECK(hipMemcpy(d_dataA, vectorA.data(), arraySize * sizeof(*d_dataA), hipMemcpyHostToDevice)); - HIP_CHECK(hipMemcpy(d_dataB, vectorB.data(), arraySize * sizeof(*d_dataB), hipMemcpyHostToDevice)); - // Launch the GPU kernels - hipLaunchKernelGGL(kernelA, 
dim3(numOfBlocks), dim3(threadsPerBlock), 0, 0, d_dataA, arraySize); - hipLaunchKernelGGL(kernelB, dim3(numOfBlocks), dim3(threadsPerBlock), 0, 0, d_dataA, d_dataB, arraySize); - // Device to Host copies - HIP_CHECK(hipMemcpy(vectorA.data(), d_dataA, arraySize * sizeof(*vectorA.data()), hipMemcpyDeviceToHost)); - HIP_CHECK(hipMemcpy(vectorB.data(), d_dataB, arraySize * sizeof(*vectorB.data()), hipMemcpyDeviceToHost)); - } - // Wait for all operations to complete - HIP_CHECK(hipDeviceSynchronize()); - - // Verify results - const double expectedA = (double)numberOfIterations; - const double expectedB = - initValueB + (3.0 * numberOfIterations) + - (expectedA * (expectedA + 1.0)) / 2.0; - bool passed = true; - for(size_t i = 0; i < arraySize; ++i){ - if(vectorA[i] != expectedA){ - passed = false; - std::cerr << "Validation failed! Expected " << expectedA << " got " << vectorA[i] << " at index: " << i << std::endl; - break; - } - if(vectorB[i] != expectedB){ - passed = false; - std::cerr << "Validation failed! Expected " << expectedB << " got " << vectorB[i] << " at index: " << i << std::endl; - break; - } - } - - if(passed){ - std::cout << "Sequential execution completed successfully." << std::endl; - }else{ - std::cerr << "Sequential execution failed." << std::endl; - } - - // Cleanup - HIP_CHECK(hipFree(d_dataA)); - HIP_CHECK(hipFree(d_dataB)); - - return 0; - } + .. literalinclude:: ../../tools/example_codes/sequential_kernel_execution.hip + :start-after: // [sphinx-start] + :end-before: // [sphinx-end] + :language: cpp .. tab-item:: Asynchronous - .. 
code-block:: cpp - - #include - #include - #include - - #define HIP_CHECK(expression) \ - { \ - const hipError_t status = expression; \ - if(status != hipSuccess){ \ - std::cerr << "HIP error " \ - << status << ": " \ - << hipGetErrorString(status) \ - << " at " << __FILE__ << ":" \ - << __LINE__ << std::endl; \ - } \ - } - - // GPU Kernels - __global__ void kernelA(double* arrayA, size_t size){ - const size_t x = threadIdx.x + blockDim.x * blockIdx.x; - if(x < size){arrayA[x] += 1.0;} - }; - __global__ void kernelB(double* arrayA, double* arrayB, size_t size){ - const size_t x = threadIdx.x + blockDim.x * blockIdx.x; - if(x < size){arrayB[x] += arrayA[x] + 3.0;} - }; - - int main() - { - constexpr int numOfBlocks = 1 << 20; - constexpr int threadsPerBlock = 1024; - constexpr int numberOfIterations = 50; - // The array size smaller to avoid the relatively short kernel launch compared to memory copies - constexpr size_t arraySize = 1U << 25; - double *d_dataA; - double *d_dataB; - - double initValueA = 0.0; - double initValueB = 2.0; - - std::vector vectorA(arraySize, initValueA); - std::vector vectorB(arraySize, initValueB); - // Allocate device memory - HIP_CHECK(hipMalloc(&d_dataA, arraySize * sizeof(*d_dataA))); - HIP_CHECK(hipMalloc(&d_dataB, arraySize * sizeof(*d_dataB))); - // Create streams - hipStream_t streamA, streamB; - HIP_CHECK(hipStreamCreate(&streamA)); - HIP_CHECK(hipStreamCreate(&streamB)); - for(unsigned int iteration = 0; iteration < numberOfIterations; iteration++) - { - // Stream 1: Host to Device 1 - HIP_CHECK(hipMemcpyAsync(d_dataA, vectorA.data(), arraySize * sizeof(*d_dataA), hipMemcpyHostToDevice, streamA)); - // Stream 2: Host to Device 2 - HIP_CHECK(hipMemcpyAsync(d_dataB, vectorB.data(), arraySize * sizeof(*d_dataB), hipMemcpyHostToDevice, streamB)); - // Stream 1: Kernel 1 - hipLaunchKernelGGL(kernelA, dim3(numOfBlocks), dim3(threadsPerBlock), 0, streamA, d_dataA, arraySize); - // Wait for streamA finish - 
HIP_CHECK(hipStreamSynchronize(streamA)); - // Stream 2: Kernel 2 - hipLaunchKernelGGL(kernelB, dim3(numOfBlocks), dim3(threadsPerBlock), 0, streamB, d_dataA, d_dataB, arraySize); - // Stream 1: Device to Host 2 (after Kernel 1) - HIP_CHECK(hipMemcpyAsync(vectorA.data(), d_dataA, arraySize * sizeof(*vectorA.data()), hipMemcpyDeviceToHost, streamA)); - // Stream 2: Device to Host 2 (after Kernel 2) - HIP_CHECK(hipMemcpyAsync(vectorB.data(), d_dataB, arraySize * sizeof(*vectorB.data()), hipMemcpyDeviceToHost, streamB)); - } - // Wait for all operations in both streams to complete - HIP_CHECK(hipStreamSynchronize(streamA)); - HIP_CHECK(hipStreamSynchronize(streamB)); - // Verify results - double expectedA = (double)numberOfIterations; - double expectedB = - initValueB + (3.0 * numberOfIterations) + - (expectedA * (expectedA + 1.0)) / 2.0; - bool passed = true; - for(size_t i = 0; i < arraySize; ++i){ - if(vectorA[i] != expectedA){ - passed = false; - std::cerr << "Validation failed! Expected " << expectedA << " got " << vectorA[i] << " at index: " << i << std::endl; - break; - } - if(vectorB[i] != expectedB){ - passed = false; - std::cerr << "Validation failed! Expected " << expectedB << " got " << vectorB[i] << " at index: " << i << std::endl; - break; - } - } - if(passed){ - std::cout << "Asynchronous execution completed successfully." << std::endl; - }else{ - std::cerr << "Asynchronous execution failed." << std::endl; - } - - // Cleanup - HIP_CHECK(hipStreamDestroy(streamA)); - HIP_CHECK(hipStreamDestroy(streamB)); - HIP_CHECK(hipFree(d_dataA)); - HIP_CHECK(hipFree(d_dataB)); - - return 0; - } + .. literalinclude:: ../../tools/example_codes/async_kernel_execution.hip + :start-after: // [sphinx-start] + :end-before: // [sphinx-end] + :language: cpp .. tab-item:: hipStreamWaitEvent - .. 
code-block:: cpp - - #include - #include - #include - - #define HIP_CHECK(expression) \ - { \ - const hipError_t status = expression; \ - if(status != hipSuccess){ \ - std::cerr << "HIP error " \ - << status << ": " \ - << hipGetErrorString(status) \ - << " at " << __FILE__ << ":" \ - << __LINE__ << std::endl; \ - } \ - } - - // GPU Kernels - __global__ void kernelA(double* arrayA, size_t size){ - const size_t x = threadIdx.x + blockDim.x * blockIdx.x; - if(x < size){arrayA[x] += 1.0;} - }; - __global__ void kernelB(double* arrayA, double* arrayB, size_t size){ - const size_t x = threadIdx.x + blockDim.x * blockIdx.x; - if(x < size){arrayB[x] += arrayA[x] + 3.0;} - }; - - int main() - { - constexpr int numOfBlocks = 1 << 20; - constexpr int threadsPerBlock = 1024; - constexpr int numberOfIterations = 50; - // The array size smaller to avoid the relatively short kernel launch compared to memory copies - constexpr size_t arraySize = 1U << 25; - double *d_dataA; - double *d_dataB; - double initValueA = 0.0; - double initValueB = 2.0; - - std::vector vectorA(arraySize, initValueA); - std::vector vectorB(arraySize, initValueB); - // Allocate device memory - HIP_CHECK(hipMalloc(&d_dataA, arraySize * sizeof(*d_dataA))); - HIP_CHECK(hipMalloc(&d_dataB, arraySize * sizeof(*d_dataB))); - // Create streams - hipStream_t streamA, streamB; - HIP_CHECK(hipStreamCreate(&streamA)); - HIP_CHECK(hipStreamCreate(&streamB)); - // Create events - hipEvent_t event, eventA, eventB; - HIP_CHECK(hipEventCreate(&event)); - HIP_CHECK(hipEventCreate(&eventA)); - HIP_CHECK(hipEventCreate(&eventB)); - for(unsigned int iteration = 0; iteration < numberOfIterations; iteration++) - { - // Stream 1: Host to Device 1 - HIP_CHECK(hipMemcpyAsync(d_dataA, vectorA.data(), arraySize * sizeof(*d_dataA), hipMemcpyHostToDevice, streamA)); - // Stream 2: Host to Device 2 - HIP_CHECK(hipMemcpyAsync(d_dataB, vectorB.data(), arraySize * sizeof(*d_dataB), hipMemcpyHostToDevice, streamB)); - // Stream 1: Kernel 1 
- hipLaunchKernelGGL(kernelA, dim3(numOfBlocks), dim3(threadsPerBlock), 0, streamA, d_dataA, arraySize); - // Record event after the GPU kernel in Stream 1 - HIP_CHECK(hipEventRecord(event, streamA)); - // Stream 2: Wait for event before starting Kernel 2 - HIP_CHECK(hipStreamWaitEvent(streamB, event, 0)); - // Stream 2: Kernel 2 - hipLaunchKernelGGL(kernelB, dim3(numOfBlocks), dim3(threadsPerBlock), 0, streamB, d_dataA, d_dataB, arraySize); - // Stream 1: Device to Host 2 (after Kernel 1) - HIP_CHECK(hipMemcpyAsync(vectorA.data(), d_dataA, arraySize * sizeof(*vectorA.data()), hipMemcpyDeviceToHost, streamA)); - // Stream 2: Device to Host 2 (after Kernel 2) - HIP_CHECK(hipMemcpyAsync(vectorB.data(), d_dataB, arraySize * sizeof(*vectorB.data()), hipMemcpyDeviceToHost, streamB)); - // Wait for all operations in both streams to complete - HIP_CHECK(hipEventRecord(eventA, streamA)); - HIP_CHECK(hipEventRecord(eventB, streamB)); - HIP_CHECK(hipStreamWaitEvent(streamA, eventA, 0)); - HIP_CHECK(hipStreamWaitEvent(streamB, eventB, 0)); - } - // Verify results - double expectedA = (double)numberOfIterations; - double expectedB = - initValueB + (3.0 * numberOfIterations) + - (expectedA * (expectedA + 1.0)) / 2.0; - bool passed = true; - for(size_t i = 0; i < arraySize; ++i){ - if(vectorA[i] != expectedA){ - passed = false; - std::cerr << "Validation failed! Expected " << expectedA << " got " << vectorA[i] << std::endl; - break; - } - if(vectorB[i] != expectedB){ - passed = false; - std::cerr << "Validation failed! Expected " << expectedB << " got " << vectorB[i] << std::endl; - break; - } - } - if(passed){ - std::cout << "Asynchronous execution with events completed successfully." << std::endl; - }else{ - std::cerr << "Asynchronous execution with events failed." 
<< std::endl; - } - - // Cleanup - HIP_CHECK(hipEventDestroy(event)); - HIP_CHECK(hipEventDestroy(eventA)); - HIP_CHECK(hipEventDestroy(eventB)); - HIP_CHECK(hipStreamDestroy(streamA)); - HIP_CHECK(hipStreamDestroy(streamB)); - HIP_CHECK(hipFree(d_dataA)); - HIP_CHECK(hipFree(d_dataB)); - - return 0; - } + .. literalinclude:: ../../tools/example_codes/event_based_synchronization.hip + :start-after: // [sphinx-start] + :end-before: // [sphinx-end] + :language: cpp HIP Graphs =============================================================================== diff --git a/docs/how-to/hip_runtime_api/call_stack.rst b/docs/how-to/hip_runtime_api/call_stack.rst index a9d03bb493..101480bdb2 100644 --- a/docs/how-to/hip_runtime_api/call_stack.rst +++ b/docs/how-to/hip_runtime_api/call_stack.rst @@ -33,38 +33,10 @@ You can adjust the call stack size as shown in the following example, allowing fine-tuning based on specific kernel requirements. This helps prevent stack overflow errors by ensuring sufficient stack memory is allocated. -.. code-block:: cpp - - #include - #include - - #define HIP_CHECK(expression) \ - { \ - const hipError_t status = expression; \ - if(status != hipSuccess){ \ - std::cerr << "HIP error " \ - << status << ": " \ - << hipGetErrorString(status) \ - << " at " << __FILE__ << ":" \ - << __LINE__ << std::endl; \ - } \ - } - - int main() - { - size_t stackSize; - HIP_CHECK(hipDeviceGetLimit(&stackSize, hipLimitStackSize)); - std::cout << "Default stack size: " << stackSize << " bytes" << std::endl; - - // Set a new stack size - size_t newStackSize = 1024 * 8; // 8 KiB - HIP_CHECK(hipDeviceSetLimit(hipLimitStackSize, newStackSize)); - - HIP_CHECK(hipDeviceGetLimit(&stackSize, hipLimitStackSize)); - std::cout << "Updated stack size: " << stackSize << " bytes" << std::endl; - - return 0; - } +.. 
literalinclude:: ../../tools/example_codes/call_stack_management.cpp + :start-after: // [sphinx-start] + :end-before: // [sphinx-end] + :language: cpp Depending on the GPU model, at full occupancy, it can consume a significant amount of memory. For instance, an MI300X with 304 compute units (CU) and up to @@ -81,49 +53,7 @@ needed for the call stack due to the GPUs inherent parallelism. This can be achieved by increasing stack size or optimizing code to reduce stack usage. To detect stack overflow add proper error handling or use debugging tools. -.. code-block:: cpp - - #include - #include - - #define HIP_CHECK(expression) \ - { \ - const hipError_t status = expression; \ - if(status != hipSuccess){ \ - std::cerr << "HIP error " \ - << status << ": " \ - << hipGetErrorString(status) \ - << " at " << __FILE__ << ":" \ - << __LINE__ << std::endl; \ - } \ - } - - __device__ unsigned long long fibonacci(unsigned long long n) - { - if (n == 0 || n == 1) - { - return n; - } - return fibonacci(n - 1) + fibonacci(n - 2); - } - - __global__ void kernel(unsigned long long n) - { - unsigned long long result = fibonacci(n); - const size_t x = threadIdx.x + blockDim.x * blockIdx.x; - - if (x == 0) - printf("%llu! = %llu \n", n, result); - } - - int main() - { - kernel<<<1, 1>>>(10); - HIP_CHECK(hipDeviceSynchronize()); - - // With -O0 optimization option hit the stack limit - // kernel<<<1, 256>>>(2048); - // HIP_CHECK(hipDeviceSynchronize()); - - return 0; - } +.. 
literalinclude:: ../../tools/example_codes/device_recursion.hip + :start-after: // [sphinx-start] + :end-before: // [sphinx-end] + :language: cpp diff --git a/docs/how-to/hip_runtime_api/error_handling.rst b/docs/how-to/hip_runtime_api/error_handling.rst index b78df92e2d..e5d15254dd 100644 --- a/docs/how-to/hip_runtime_api/error_handling.rst +++ b/docs/how-to/hip_runtime_api/error_handling.rst @@ -68,70 +68,7 @@ Complete example A complete example to demonstrate the error handling with a simple addition of two values kernel: -.. code-block:: cpp - - #include - #include - #include - - #define HIP_CHECK(expression) \ - { \ - const hipError_t status = expression; \ - if(status != hipSuccess){ \ - std::cerr << "HIP error " \ - << status << ": " \ - << hipGetErrorString(status) \ - << " at " << __FILE__ << ":" \ - << __LINE__ << std::endl; \ - } \ - } - - // Addition of two values. - __global__ void add(int *a, int *b, int *c, size_t size) { - const size_t index = threadIdx.x + blockDim.x * blockIdx.x; - if(index < size) { - c[index] += a[index] + b[index]; - } - } - - int main() { - constexpr int numOfBlocks = 256; - constexpr int threadsPerBlock = 256; - constexpr size_t arraySize = 1U << 16; - - std::vector a(arraySize), b(arraySize), c(arraySize); - int *d_a, *d_b, *d_c; - - // Setup input values. - std::fill(a.begin(), a.end(), 1); - std::fill(b.begin(), b.end(), 2); - - // Allocate device copies of a, b and c. - HIP_CHECK(hipMalloc(&d_a, arraySize * sizeof(*d_a))); - HIP_CHECK(hipMalloc(&d_b, arraySize * sizeof(*d_b))); - HIP_CHECK(hipMalloc(&d_c, arraySize * sizeof(*d_c))); - - // Copy input values to device. - HIP_CHECK(hipMemcpy(d_a, &a, arraySize * sizeof(*d_a), hipMemcpyHostToDevice)); - HIP_CHECK(hipMemcpy(d_b, &b, arraySize * sizeof(*d_b), hipMemcpyHostToDevice)); - - // Launch add() kernel on GPU. 
- hipLaunchKernelGGL(add, dim3(numOfBlocks), dim3(threadsPerBlock), 0, 0, d_a, d_b, d_c, arraySize); - // Check the kernel launch - HIP_CHECK(hipGetLastError()); - // Check for kernel execution error - HIP_CHECK(hipDeviceSynchronize()); - - // Copy the result back to the host. - HIP_CHECK(hipMemcpy(&c, d_c, arraySize * sizeof(*d_c), hipMemcpyDeviceToHost)); - - // Cleanup allocated memory. - HIP_CHECK(hipFree(d_a)); - HIP_CHECK(hipFree(d_b)); - HIP_CHECK(hipFree(d_c)); - - // Print the result. - std::cout << a[0] << " + " << b[0] << " = " << c[0] << std::endl; - - return 0; - } +.. literalinclude:: ../../tools/example_codes/error_handling.hip + :start-after: // [sphinx-start] + :end-before: // [sphinx-end] + :language: cpp diff --git a/docs/how-to/hip_runtime_api/hipgraph.rst b/docs/how-to/hip_runtime_api/hipgraph.rst index 7f36a7d373..1b936fed82 100644 --- a/docs/how-to/hip_runtime_api/hipgraph.rst +++ b/docs/how-to/hip_runtime_api/hipgraph.rst @@ -180,124 +180,10 @@ The general flow for using stream capture to create a graph template is: The following code is an example of how to use the HIP graph API to capture a graph from a stream. -.. 
code-block:: cpp - - #include - #include - #include - - #define HIP_CHECK(expression) \ - { \ - const hipError_t status = expression; \ - if(status != hipSuccess){ \ - std::cerr << "HIP error " \ - << status << ": " \ - << hipGetErrorString(status) \ - << " at " << __FILE__ << ":" \ - << __LINE__ << std::endl; \ - } \ - } - - - __global__ void kernelA(double* arrayA, size_t size){ - const size_t x = threadIdx.x + blockDim.x * blockIdx.x; - if(x < size){arrayA[x] *= 2.0;} - }; - __global__ void kernelB(int* arrayB, size_t size){ - const size_t x = threadIdx.x + blockDim.x * blockIdx.x; - if(x < size){arrayB[x] = 3;} - }; - __global__ void kernelC(double* arrayA, const int* arrayB, size_t size){ - const size_t x = threadIdx.x + blockDim.x * blockIdx.x; - if(x < size){arrayA[x] += arrayB[x];} - }; - - struct set_vector_args{ - std::vector& h_array; - double value; - }; - - void set_vector(void* args){ - set_vector_args h_args{*(reinterpret_cast(args))}; - - std::vector& vec{h_args.h_array}; - vec.assign(vec.size(), h_args.value); - } - - int main(){ - constexpr int numOfBlocks = 1024; - constexpr int threadsPerBlock = 1024; - constexpr size_t arraySize = 1U << 20; - - // This example assumes that kernelA operates on data that needs to be initialized on - // and copied from the host, while kernelB initializes the array that is passed to it. - // Both arrays are then used as input to kernelC, where arrayA is also used as - // output, that is copied back to the host, while arrayB is only read from and not modified. 
- - double* d_arrayA; - int* d_arrayB; - std::vector h_array(arraySize); - constexpr double initValue = 2.0; - - hipStream_t captureStream; - HIP_CHECK(hipStreamCreate(&captureStream)); - - // Start capturing the operations assigned to the stream - HIP_CHECK(hipStreamBeginCapture(captureStream, hipStreamCaptureModeGlobal)); - - // hipMallocAsync and hipMemcpyAsync are needed, to be able to assign it to a stream - HIP_CHECK(hipMallocAsync(&d_arrayA, arraySize*sizeof(double), captureStream)); - HIP_CHECK(hipMallocAsync(&d_arrayB, arraySize*sizeof(int), captureStream)); - - // Assign host function to the stream - // Needs a custom struct to pass the arguments - set_vector_args args{h_array, initValue}; - HIP_CHECK(hipLaunchHostFunc(captureStream, set_vector, &args)); - - HIP_CHECK(hipMemcpyAsync(d_arrayA, h_array.data(), arraySize*sizeof(double), hipMemcpyHostToDevice, captureStream)); - - kernelA<<>>(d_arrayA, arraySize); - kernelB<<>>(d_arrayB, arraySize); - kernelC<<>>(d_arrayA, d_arrayB, arraySize); - - HIP_CHECK(hipMemcpyAsync(h_array.data(), d_arrayA, arraySize*sizeof(*d_arrayA), hipMemcpyDeviceToHost, captureStream)); - - HIP_CHECK(hipFreeAsync(d_arrayA, captureStream)); - HIP_CHECK(hipFreeAsync(d_arrayB, captureStream)); - - // Stop capturing - hipGraph_t graph; - HIP_CHECK(hipStreamEndCapture(captureStream, &graph)); - - // Create an executable graph from the captured graph - hipGraphExec_t graphExec; - HIP_CHECK(hipGraphInstantiate(&graphExec, graph, nullptr, nullptr, 0)); - - // The graph template can be deleted after the instantiation if it's not needed for later use - HIP_CHECK(hipGraphDestroy(graph)); - - // Actually launch the graph. The stream does not have - // to be the same as the one used for capturing. 
- HIP_CHECK(hipGraphLaunch(graphExec, captureStream)); - - // Verify results - constexpr double expected = initValue * 2.0 + 3; - bool passed = true; - for(size_t i = 0; i < arraySize; ++i){ - if(h_array[i] != expected){ - passed = false; - std::cerr << "Validation failed! Expected " << expected << " got " << h_array[0] << std::endl; - break; - } - } - if(passed){ - std::cerr << "Validation passed." << std::endl; - } - - // Free graph and stream resources after usage - HIP_CHECK(hipGraphExecDestroy(graphExec)); - HIP_CHECK(hipStreamDestroy(captureStream)); - } +.. literalinclude:: ../../tools/example_codes/graph_capture.hip + :start-after: // [sphinx-start] + :end-before: // [sphinx-end] + :language: cpp Explicit graph creation ================================================================================ @@ -333,178 +219,7 @@ The general flow for explicitly creating a graph is usually: The following code example demonstrates how to explicitly create nodes in order to create a graph. -.. 
code-block:: cpp - - #include - #include - #include - - #define HIP_CHECK(expression) \ - { \ - const hipError_t status = expression; \ - if(status != hipSuccess){ \ - std::cerr << "HIP error " \ - << status << ": " \ - << hipGetErrorString(status) \ - << " at " << __FILE__ << ":" \ - << __LINE__ << std::endl; \ - } \ - } - - __global__ void kernelA(double* arrayA, size_t size){ - const size_t x = threadIdx.x + blockDim.x * blockIdx.x; - if(x < size){arrayA[x] *= 2.0;} - }; - __global__ void kernelB(int* arrayB, size_t size){ - const size_t x = threadIdx.x + blockDim.x * blockIdx.x; - if(x < size){arrayB[x] = 3;} - }; - __global__ void kernelC(double* arrayA, const int* arrayB, size_t size){ - const size_t x = threadIdx.x + blockDim.x * blockIdx.x; - if(x < size){arrayA[x] += arrayB[x];} - }; - - struct set_vector_args{ - std::vector& h_array; - double value; - }; - - void set_vector(void* args){ - set_vector_args h_args{*(reinterpret_cast(args))}; - - std::vector& vec{h_args.h_array}; - vec.assign(vec.size(), h_args.value); - } - - int main(){ - constexpr int numOfBlocks = 1024; - constexpr int threadsPerBlock = 1024; - size_t arraySize = 1U << 20; - - // The pointers to the device memory don't need to be declared here, - // they are contained within the hipMemAllocNodeParams as the dptr member - std::vector h_array(arraySize); - constexpr double initValue = 2.0; - - // Create graph an empty graph - hipGraph_t graph; - HIP_CHECK(hipGraphCreate(&graph, 0)); - - // Parameters to allocate arrays - hipMemAllocNodeParams allocArrayAParams{}; - allocArrayAParams.poolProps.allocType = hipMemAllocationTypePinned; - allocArrayAParams.poolProps.location.type = hipMemLocationTypeDevice; - allocArrayAParams.poolProps.location.id = 0; // GPU on which memory resides - allocArrayAParams.bytesize = arraySize * sizeof(double); - - hipMemAllocNodeParams allocArrayBParams{}; - allocArrayBParams.poolProps.allocType = hipMemAllocationTypePinned; - 
allocArrayBParams.poolProps.location.type = hipMemLocationTypeDevice; - allocArrayBParams.poolProps.location.id = 0; // GPU on which memory resides - allocArrayBParams.bytesize = arraySize * sizeof(int); - - // Add the allocation nodes to the graph. They don't have any dependencies - hipGraphNode_t allocNodeA, allocNodeB; - HIP_CHECK(hipGraphAddMemAllocNode(&allocNodeA, graph, nullptr, 0, &allocArrayAParams)); - HIP_CHECK(hipGraphAddMemAllocNode(&allocNodeB, graph, nullptr, 0, &allocArrayBParams)); - - // Parameters for the host function - // Needs custom struct to pass the arguments - set_vector_args args{h_array, initValue}; - hipHostNodeParams hostParams{}; - hostParams.fn = set_vector; - hostParams.userData = static_cast(&args); - - // Add the host node that initializes the host array. It also doesn't have any dependencies - hipGraphNode_t hostNode; - HIP_CHECK(hipGraphAddHostNode(&hostNode, graph, nullptr, 0, &hostParams)); - - // Add memory copy node, that copies the initialized host array to the device. - // It has to wait for the host array to be initialized and the device memory to be allocated - hipGraphNode_t cpyNodeDependencies[] = {allocNodeA, hostNode}; - hipGraphNode_t cpyToDevNode; - HIP_CHECK(hipGraphAddMemcpyNode1D(&cpyToDevNode, graph, cpyNodeDependencies, 1, allocArrayAParams.dptr, h_array.data(), arraySize * sizeof(double), hipMemcpyHostToDevice)); - - // Parameters for kernelA - hipKernelNodeParams kernelAParams; - void* kernelAArgs[] = {&allocArrayAParams.dptr, static_cast(&arraySize)}; - kernelAParams.func = reinterpret_cast(kernelA); - kernelAParams.gridDim = numOfBlocks; - kernelAParams.blockDim = threadsPerBlock; - kernelAParams.sharedMemBytes = 0; - kernelAParams.kernelParams = kernelAArgs; - kernelAParams.extra = nullptr; - - // Add the node for kernelA. It has to wait for the memory copy to finish, as it depends on the values from the host array. 
- hipGraphNode_t kernelANode; - HIP_CHECK(hipGraphAddKernelNode(&kernelANode, graph, &cpyToDevNode, 1, &kernelAParams)); - - // Parameters for kernelB - hipKernelNodeParams kernelBParams; - void* kernelBArgs[] = {&allocArrayBParams.dptr, static_cast(&arraySize)}; - kernelBParams.func = reinterpret_cast(kernelB); - kernelBParams.gridDim = numOfBlocks; - kernelBParams.blockDim = threadsPerBlock; - kernelBParams.sharedMemBytes = 0; - kernelBParams.kernelParams = kernelBArgs; - kernelBParams.extra = nullptr; - - // Add the node for kernelB. It only has to wait for the memory to be allocated, as it initializes the array. - hipGraphNode_t kernelBNode; - HIP_CHECK(hipGraphAddKernelNode(&kernelBNode, graph, &allocNodeB, 1, &kernelBParams)); - - // Parameters for kernelC - hipKernelNodeParams kernelCParams; - void* kernelCArgs[] = {&allocArrayAParams.dptr, &allocArrayBParams.dptr, static_cast(&arraySize)}; - kernelCParams.func = reinterpret_cast(kernelC); - kernelCParams.gridDim = numOfBlocks; - kernelCParams.blockDim = threadsPerBlock; - kernelCParams.sharedMemBytes = 0; - kernelCParams.kernelParams = kernelCArgs; - kernelCParams.extra = nullptr; - - // Add the node for kernelC. It has to wait on both kernelA and kernelB to finish, as it depends on their results. - hipGraphNode_t kernelCNode; - hipGraphNode_t kernelCDependencies[] = {kernelANode, kernelBNode}; - HIP_CHECK(hipGraphAddKernelNode(&kernelCNode, graph, kernelCDependencies, 1, &kernelCParams)); - - // Copy the results back to the host. Has to wait for kernelC to finish. - hipGraphNode_t cpyToHostNode; - HIP_CHECK(hipGraphAddMemcpyNode1D(&cpyToHostNode, graph, &kernelCNode, 1, h_array.data(), allocArrayAParams.dptr, arraySize * sizeof(double), hipMemcpyDeviceToHost)); - - // Free array of allocNodeA. It needs to wait for the copy to finish, as kernelC stores its results in it. 
- hipGraphNode_t freeNodeA; - HIP_CHECK(hipGraphAddMemFreeNode(&freeNodeA, graph, &cpyToHostNode, 1, allocArrayAParams.dptr)); - // Free array of allocNodeB. It only needs to wait for kernelC to finish, as it is not written back to the host. - hipGraphNode_t freeNodeB; - HIP_CHECK(hipGraphAddMemFreeNode(&freeNodeB, graph, &kernelCNode, 1, allocArrayBParams.dptr)); - - // Instantiate the graph in order to execute it - hipGraphExec_t graphExec; - HIP_CHECK(hipGraphInstantiate(&graphExec, graph, nullptr, nullptr, 0)); - - // The graph can be freed after the instantiation if it's not needed for other purposes - HIP_CHECK(hipGraphDestroy(graph)); - - // Actually launch the graph - hipStream_t graphStream; - HIP_CHECK(hipStreamCreate(&graphStream)); - HIP_CHECK(hipGraphLaunch(graphExec, graphStream)); - - // Verify results - constexpr double expected = initValue * 2.0 + 3; - bool passed = true; - for(size_t i = 0; i < arraySize; ++i){ - if(h_array[i] != expected){ - passed = false; - std::cerr << "Validation failed! Expected " << expected << " got " << h_array[0] << std::endl; - break; - } - } - if(passed){ - std::cerr << "Validation passed." << std::endl; - } - - HIP_CHECK(hipGraphExecDestroy(graphExec)); - HIP_CHECK(hipStreamDestroy(graphStream)); - } +.. literalinclude:: ../../tools/example_codes/graph_creation.hip + :start-after: // [sphinx-start] + :end-before: // [sphinx-end] + :language: cpp diff --git a/docs/how-to/hip_runtime_api/initialization.rst b/docs/how-to/hip_runtime_api/initialization.rst index 864e6e16c9..7b8e9996e5 100644 --- a/docs/how-to/hip_runtime_api/initialization.rst +++ b/docs/how-to/hip_runtime_api/initialization.rst @@ -66,24 +66,10 @@ which can be used to loop over the available GPUs. Example code of querying GPUs: -.. 
code-block:: cpp - - #include - #include - - int main() { - - int deviceCount; - if (hipGetDeviceCount(&deviceCount) == hipSuccess){ - for (int i = 0; i < deviceCount; ++i){ - hipDeviceProp_t prop; - if ( hipGetDeviceProperties(&prop, i) == hipSuccess) - std::cout << "Device" << i << prop.name << std::endl; - } - } - - return 0; - } +.. literalinclude:: ../../tools/example_codes/simple_device_query.cpp + :start-after: // [sphinx-start] + :end-before: // [sphinx-end] + :language: cpp Setting the GPU -------------------------------------------------------------------------------- diff --git a/docs/how-to/hip_runtime_api/memory_management/host_memory.rst b/docs/how-to/hip_runtime_api/memory_management/host_memory.rst index 01f00ce555..93df695011 100644 --- a/docs/how-to/hip_runtime_api/memory_management/host_memory.rst +++ b/docs/how-to/hip_runtime_api/memory_management/host_memory.rst @@ -47,61 +47,10 @@ C++ application. **Example:** Using pageable host memory in HIP -.. code-block:: cpp - - #include - #include - - #define HIP_CHECK(expression) \ - { \ - const hipError_t status = expression; \ - if(status != hipSuccess){ \ - std::cerr << "HIP error " \ - << status << ": " \ - << hipGetErrorString(status) \ - << " at " << __FILE__ << ":" \ - << __LINE__ << std::endl; \ - } \ - } - - int main() - { - const int element_number = 100; - - int *host_input, *host_output; - // Host allocation - host_input = new int[element_number]; - host_output = new int[element_number]; - - // Host data preparation - for (int i = 0; i < element_number; i++) { - host_input[i] = i; - } - memset(host_output, 0, element_number * sizeof(int)); - - int *device_input, *device_output; - - // Device allocation - HIP_CHECK(hipMalloc((int **)&device_input, element_number * sizeof(int))); - HIP_CHECK(hipMalloc((int **)&device_output, element_number * sizeof(int))); - - // Device data preparation - HIP_CHECK(hipMemcpy(device_input, host_input, element_number * sizeof(int), hipMemcpyHostToDevice)); - 
HIP_CHECK(hipMemset(device_output, 0, element_number * sizeof(int))); - - // Run the kernel - // ... - - HIP_CHECK(hipMemcpy(device_input, host_input, element_number * sizeof(int), hipMemcpyHostToDevice)); - - // Free host memory - delete[] host_input; - delete[] host_output; - - // Free device memory - HIP_CHECK(hipFree(device_input)); - HIP_CHECK(hipFree(device_output)); - } +.. literalinclude:: ../../../tools/example_codes/pageable_host_memory.cpp + :start-after: // [sphinx-start] + :end-before: // [sphinx-end] + :language: cpp .. note:: @@ -133,61 +82,10 @@ processes, which can negatively impact the overall performance of the host. **Example:** Using pinned memory in HIP -.. code-block:: cpp - - #include - #include - - #define HIP_CHECK(expression) \ - { \ - const hipError_t status = expression; \ - if(status != hipSuccess){ \ - std::cerr << "HIP error " \ - << status << ": " \ - << hipGetErrorString(status) \ - << " at " << __FILE__ << ":" \ - << __LINE__ << std::endl; \ - } \ - } - - int main() - { - const int element_number = 100; - - int *host_input, *host_output; - // Host allocation - HIP_CHECK(hipHostMalloc((int **)&host_input, element_number * sizeof(int))); - HIP_CHECK(hipHostMalloc((int **)&host_output, element_number * sizeof(int))); - - // Host data preparation - for (int i = 0; i < element_number; i++) { - host_input[i] = i; - } - memset(host_output, 0, element_number * sizeof(int)); - - int *device_input, *device_output; - - // Device allocation - HIP_CHECK(hipMalloc((int **)&device_input, element_number * sizeof(int))); - HIP_CHECK(hipMalloc((int **)&device_output, element_number * sizeof(int))); - - // Device data preparation - HIP_CHECK(hipMemcpy(device_input, host_input, element_number * sizeof(int), hipMemcpyHostToDevice)); - HIP_CHECK(hipMemset(device_output, 0, element_number * sizeof(int))); - - // Run the kernel - // ... 
- - HIP_CHECK(hipMemcpy(device_input, host_input, element_number * sizeof(int), hipMemcpyHostToDevice)); - - // Free host memory - delete[] host_input; - delete[] host_output; - - // Free device memory - HIP_CHECK(hipFree(device_input)); - HIP_CHECK(hipFree(device_output)); - } +.. literalinclude:: ../../../tools/example_codes/pinned_host_memory.cpp + :start-after: // [sphinx-start] + :end-before: // [sphinx-end] + :language: cpp .. _memory_allocation_flags: diff --git a/docs/how-to/hip_runtime_api/memory_management/stream_ordered_allocator.rst b/docs/how-to/hip_runtime_api/memory_management/stream_ordered_allocator.rst index 778d9c3532..12f8783dc1 100644 --- a/docs/how-to/hip_runtime_api/memory_management/stream_ordered_allocator.rst +++ b/docs/how-to/hip_runtime_api/memory_management/stream_ordered_allocator.rst @@ -37,102 +37,17 @@ Here is how to use stream ordered memory allocation: .. tab-set:: .. tab-item:: Stream Ordered Memory Allocation - .. code-block:: cpp - - #include - #include - - // Kernel to perform some computation on allocated memory. - __global__ void myKernel(int* data, size_t numElements) { - int tid = threadIdx.x + blockIdx.x * blockDim.x; - if (tid < numElements) { - data[tid] = tid * 2; - } - } - - int main() { - // Initialize HIP. - hipInit(0); - - // Stream 0. - constexpr hipStream_t streamId = 0; - - // Allocate memory with stream ordered semantics. - constexpr size_t numElements = 1024; - int* devData; - hipMallocAsync(&devData, numElements * sizeof(*devData), streamId); - - // Launch the kernel to perform computation. - dim3 blockSize(256); - dim3 gridSize((numElements + blockSize.x - 1) / blockSize.x); - myKernel<<>>(devData, numElements); - - // Copy data back to host. - int* hostData = new int[numElements]; - hipMemcpy(hostData, devData, numElements * sizeof(*devData), hipMemcpyDeviceToHost); - - // Print the array. 
- for (size_t i = 0; i < numElements; ++i) { - std::cout << "Element " << i << ": " << hostData[i] << std::endl; - } - - // Free memory with stream ordered semantics. - hipFreeAsync(devData, streamId); - delete[] hostData; - - // Synchronize to ensure completion. - hipDeviceSynchronize(); - - return 0; - } + .. literalinclude:: ../../../tools/example_codes/stream_ordered_memory_allocation.hip + :start-after: // [sphinx-start] + :end-before: // [sphinx-end] + :language: cpp .. tab-item:: Ordinary Allocation - .. code-block:: cpp - - #include - #include - - // Kernel to perform some computation on allocated memory. - __global__ void myKernel(int* data, size_t numElements) { - int tid = threadIdx.x + blockIdx.x * blockDim.x; - if (tid < numElements) { - data[tid] = tid * 2; - } - } - - int main() { - // Initialize HIP. - hipInit(0); - - // Allocate memory. - constexpr size_t numElements = 1024; - int* devData; - hipMalloc(&devData, numElements * sizeof(*devData)); - - // Launch the kernel to perform computation. - dim3 blockSize(256); - dim3 gridSize((numElements + blockSize.x - 1) / blockSize.x); - myKernel<<>>(devData, numElements); - - // Copy data back to host. - int* hostData = new int[numElements]; - hipMemcpy(hostData, devData, numElements * sizeof(*devData), hipMemcpyDeviceToHost); - - // Print the array. - for (size_t i = 0; i < numElements; ++i) { - std::cout << "Element " << i << ": " << hostData[i] << std::endl; - } - - // Free memory. - hipFree(devData); - delete[] hostData; - - // Synchronize to ensure completion. - hipDeviceSynchronize(); - - return 0; - } + .. literalinclude:: ../../../tools/example_codes/ordinary_memory_allocation.hip + :start-after: // [sphinx-start] + :end-before: // [sphinx-end] + :language: cpp For more details, see :ref:`stream_ordered_memory_allocator_reference`. 
@@ -148,121 +63,29 @@ The ``hipMallocAsync()`` function uses the current memory pool and also provides Unlike NVIDIA CUDA, where stream-ordered memory allocation can be implicit, ROCm HIP is explicit. This requires managing memory allocation for each stream in HIP while ensuring precise control over memory usage and synchronization. -.. code-block:: cpp - - #include - #include - - // Kernel to perform some computation on allocated memory. - __global__ void myKernel(int* data, size_t numElements) { - int tid = threadIdx.x + blockIdx.x * blockDim.x; - if (tid < numElements) { - data[tid] = tid * 2; - } - } - - int main() { - // Create a stream. - hipStream_t stream; - hipStreamCreate(&stream); - - // Create a memory pool with default properties. - hipMemPoolProps poolProps = {}; - poolProps.allocType = hipMemAllocationTypePinned; - poolProps.handleTypes = hipMemHandleTypePosixFileDescriptor; - poolProps.location.type = hipMemLocationTypeDevice; - poolProps.location.id = 0; // Assuming device 0. - - hipMemPool_t memPool; - hipMemPoolCreate(&memPool, &poolProps); - - // Allocate memory from the pool asynchronously. - constexpr size_t numElements = 1024; - int* devData = nullptr; - hipMallocFromPoolAsync(&devData, numElements * sizeof(*devData), memPool, stream); - - // Define grid and block sizes. - dim3 blockSize(256); - dim3 gridSize((numElements + blockSize.x - 1) / blockSize.x); - - // Launch the kernel to perform computation. - myKernel<<>>(devData, numElements); - - // Synchronize the stream. - hipStreamSynchronize(stream); - - // Copy data back to host. - int* hostData = new int[numElements]; - hipMemcpy(hostData, devData, numElements * sizeof(*devData), hipMemcpyDeviceToHost); - - // Print the array. - for (size_t i = 0; i < numElements; ++i) { - std::cout << "Element " << i << ": " << hostData[i] << std::endl; - } - - // Free the allocated memory. - hipFreeAsync(devData, stream); - - // Synchronize the stream again to ensure all operations are complete. 
- hipStreamSynchronize(stream); - - // Destroy the memory pool and stream. - hipMemPoolDestroy(memPool); - hipStreamDestroy(stream); - - // Free host memory. - delete[] hostData; - - return 0; - } +.. literalinclude:: ../../../tools/example_codes/memory_pool.hip + :start-after: // [sphinx-start] + :end-before: // [sphinx-end] + :language: cpp Trim pools ---------- The memory allocator allows you to allocate and free memory in stream order. To control memory usage, set the release threshold attribute using ``hipMemPoolAttrReleaseThreshold``. This threshold specifies the amount of reserved memory in bytes to hold onto. -.. code-block:: cpp - - uint64_t threshold = UINT64_MAX; - hipMemPoolSetAttribute(memPool, hipMemPoolAttrReleaseThreshold, &threshold); +.. literalinclude:: ../../../tools/example_codes/memory_pool_threshold.hip + :start-after: // [sphinx-start] + :end-before: // [sphinx-end] + :language: cpp When the amount of memory held in the memory pool exceeds the threshold, the allocator tries to release memory back to the operating system during the next call to stream, event, or context synchronization. To improve performance, it is a good practice to adjust the memory pool size using ``hipMemPoolTrimTo()``. It helps to reclaim memory from an excessive memory pool, which optimizes memory usage for your application. -.. code-block:: cpp - - #include - #include - - int main() { - hipMemPool_t memPool; - hipDevice_t device = 0; // Specify the device index. - - // Initialize the device. - hipSetDevice(device); - - // Get the default memory pool for the device. - hipDeviceGetDefaultMemPool(&memPool, device); - - // Allocate memory from the pool (e.g., 1 MB). - size_t allocSize = 1 * 1024 * 1024; - void* ptr; - hipMalloc(&ptr, allocSize); - - // Free the allocated memory. - hipFree(ptr); - - // Trim the memory pool to a specific size (e.g., 512 KB). - size_t newSize = 512 * 1024; - hipMemPoolTrimTo(memPool, newSize); - - // Clean up. 
- hipMemPoolDestroy(memPool); - - std::cout << "Memory pool trimmed to " << newSize << " bytes." << std::endl; - return 0; - } +.. literalinclude:: ../../../tools/example_codes/memory_pool_trim.cpp + :start-after: // [sphinx-start] + :end-before: // [sphinx-end] + :language: cpp Resource usage statistics ------------------------- @@ -276,81 +99,10 @@ Resource usage statistics help in optimization. Here is the list of pool attribu To reset these attributes to the current value, use ``hipMemPoolSetAttribute()``. -.. code-block:: cpp - - #include - #include - - // Sample helper functions for getting the usage statistics in bulk. - struct usageStatistics { - uint64_t reservedMemCurrent; - uint64_t reservedMemHigh; - uint64_t usedMemCurrent; - uint64_t usedMemHigh; - }; - - void getUsageStatistics(hipMemPool_t memPool, struct usageStatistics *statistics) { - hipMemPoolGetAttribute(memPool, hipMemPoolAttrReservedMemCurrent, &statistics->reservedMemCurrent); - hipMemPoolGetAttribute(memPool, hipMemPoolAttrReservedMemHigh, &statistics->reservedMemHigh); - hipMemPoolGetAttribute(memPool, hipMemPoolAttrUsedMemCurrent, &statistics->usedMemCurrent); - hipMemPoolGetAttribute(memPool, hipMemPoolAttrUsedMemHigh, &statistics->usedMemHigh); - } - - // Resetting the watermarks resets them to the current value. - void resetStatistics(hipMemPool_t memPool) { - uint64_t value = 0; - hipMemPoolSetAttribute(memPool, hipMemPoolAttrReservedMemHigh, &value); - hipMemPoolSetAttribute(memPool, hipMemPoolAttrUsedMemHigh, &value); - } - - int main() { - hipMemPool_t memPool; - hipDevice_t device = 0; // Specify the device index. - - // Initialize the device. - hipSetDevice(device); - - // Get the default memory pool for the device. - hipDeviceGetDefaultMemPool(&memPool, device); - - // Allocate memory from the pool (e.g., 1 MB). - size_t allocSize = 1 * 1024 * 1024; - void* ptr; - hipMalloc(&ptr, allocSize); - - // Free the allocated memory. 
- hipFree(ptr); - - // Trim the memory pool to a specific size (e.g., 512 KB). - size_t newSize = 512 * 1024; - hipMemPoolTrimTo(memPool, newSize); - - // Get and print usage statistics before resetting. - usageStatistics statsBefore; - getUsageStatistics(memPool, &statsBefore); - std::cout << "Before resetting statistics:" << std::endl; - std::cout << "Reserved Memory Current: " << statsBefore.reservedMemCurrent << " bytes" << std::endl; - std::cout << "Reserved Memory High: " << statsBefore.reservedMemHigh << " bytes" << std::endl; - std::cout << "Used Memory Current: " << statsBefore.usedMemCurrent << " bytes" << std::endl; - std::cout << "Used Memory High: " << statsBefore.usedMemHigh << " bytes" << std::endl; - - // Reset the statistics. - resetStatistics(memPool); - - // Get and print usage statistics after resetting. - usageStatistics statsAfter; - getUsageStatistics(memPool, &statsAfter); - std::cout << "After resetting statistics:" << std::endl; - std::cout << "Reserved Memory Current: " << statsAfter.reservedMemCurrent << " bytes" << std::endl; - std::cout << "Reserved Memory High: " << statsAfter.reservedMemHigh << " bytes" << std::endl; - std::cout << "Used Memory Current: " << statsAfter.usedMemCurrent << " bytes" << std::endl; - std::cout << "Used Memory High: " << statsAfter.usedMemHigh << " bytes" << std::endl; - - // Clean up. - hipMemPoolDestroy(memPool); - - return 0; - } +.. 
literalinclude:: ../../../tools/example_codes/memory_pool_resource_usage_statistics.cpp + :start-after: // [sphinx-start] + :end-before: // [sphinx-end] + :language: cpp Memory reuse policies --------------------- diff --git a/docs/how-to/hip_runtime_api/memory_management/unified_memory.rst b/docs/how-to/hip_runtime_api/memory_management/unified_memory.rst index ac7bba454e..a0f04c5fbe 100644 --- a/docs/how-to/hip_runtime_api/memory_management/unified_memory.rst +++ b/docs/how-to/hip_runtime_api/memory_management/unified_memory.rst @@ -303,207 +303,35 @@ explicit memory management example is presented in the last tab. .. tab-item:: hipMallocManaged() - .. code-block:: cpp + .. literalinclude:: ../../../tools/example_codes/dynamic_unified_memory.hip + :start-after: // [sphinx-start] + :end-before: // [sphinx-end] :emphasize-lines: 22-25 - - #include - #include - - #define HIP_CHECK(expression) \ - { \ - const hipError_t err = expression; \ - if(err != hipSuccess){ \ - std::cerr << "HIP error: " \ - << hipGetErrorString(err) \ - << " at " << __LINE__ << "\n"; \ - } \ - } - - // Addition of two values. - __global__ void add(int *a, int *b, int *c) { - *c = *a + *b; - } - - int main() { - int *a, *b, *c; - - // Allocate memory for a, b and c that is accessible to both device and host codes. - HIP_CHECK(hipMallocManaged(&a, sizeof(*a))); - HIP_CHECK(hipMallocManaged(&b, sizeof(*b))); - HIP_CHECK(hipMallocManaged(&c, sizeof(*c))); - - // Setup input values. - *a = 1; - *b = 2; - - // Launch add() kernel on GPU. - hipLaunchKernelGGL(add, dim3(1), dim3(1), 0, 0, a, b, c); - - // Wait for GPU to finish before accessing on host. - HIP_CHECK(hipDeviceSynchronize()); - - // Print the result. - std::cout << *a << " + " << *b << " = " << *c << std::endl; - - // Cleanup allocated memory. - HIP_CHECK(hipFree(a)); - HIP_CHECK(hipFree(b)); - HIP_CHECK(hipFree(c)); - - return 0; - } + :language: cpp .. tab-item:: __managed__ - .. code-block:: cpp + .. 
literalinclude:: ../../../tools/example_codes/static_unified_memory.hip + :start-after: // [sphinx-start] + :end-before: // [sphinx-end] :emphasize-lines: 19-20 - - #include - #include - - #define HIP_CHECK(expression) \ - { \ - const hipError_t err = expression; \ - if(err != hipSuccess){ \ - std::cerr << "HIP error: " \ - << hipGetErrorString(err) \ - << " at " << __LINE__ << "\n"; \ - } \ - } - - // Addition of two values. - __global__ void add(int *a, int *b, int *c) { - *c = *a + *b; - } - - // Declare a, b and c as static variables. - __managed__ int a, b, c; - - int main() { - // Setup input values. - a = 1; - b = 2; - - // Launch add() kernel on GPU. - hipLaunchKernelGGL(add, dim3(1), dim3(1), 0, 0, &a, &b, &c); - - // Wait for GPU to finish before accessing on host. - HIP_CHECK(hipDeviceSynchronize()); - - // Prints the result. - std::cout << a << " + " << b << " = " << c << std::endl; - - return 0; - } + :language: cpp .. tab-item:: new - .. code-block:: cpp + .. literalinclude:: ../../../tools/example_codes/standard_unified_memory.hip + :start-after: // [sphinx-start] + :end-before: // [sphinx-end] :emphasize-lines: 21-24 - - #include - #include - #include - - #define HIP_CHECK(expression) \ - { \ - const hipError_t err = expression; \ - if(err != hipSuccess){ \ - std::cerr << "HIP error: " \ - << hipGetErrorString(err) \ - << " at " << __LINE__ << "\n"; \ - } \ - } - - // Addition of two values. - __global__ void add(int* a, int* b, int* c) { - *c = *a + *b; - } - - // This example requires HMM support and the environment variable HSA_XNACK needs to be set to 1 - int main() { - // Allocate memory with proper alignment for performance - int *a = new(std::align_val_t(128)) int[1]; - int *b = new(std::align_val_t(128)) int[1]; - int *c = new(std::align_val_t(128)) int[1]; - - // Setup input values. - *a = 1; - *b = 2; - - // Launch add() kernel on GPU. 
- hipLaunchKernelGGL(add, dim3(1), dim3(1), 0, 0, a, b, c); - - // Wait for GPU to finish before accessing on host. - HIP_CHECK(hipDeviceSynchronize()); - - // Prints the result. - std::cout << *a << " + " << *b << " = " << *c << std::endl; - - // Cleanup allocated memory with matching aligned delete. - ::operator delete[](a, std::align_val_t(128)); - ::operator delete[](b, std::align_val_t(128)); - ::operator delete[](c, std::align_val_t(128)); - - return 0; - } + :language: cpp .. tab-item:: Explicit Memory Management - .. code-block:: cpp + .. literalinclude:: ../../../tools/example_codes/explicit_memory.hip + :start-after: // [sphinx-start] + :end-before: // [sphinx-end] :emphasize-lines: 27-34, 39-40 - - #include - #include - - #define HIP_CHECK(expression) \ - { \ - const hipError_t err = expression; \ - if(err != hipSuccess){ \ - std::cerr << "HIP error: " \ - << hipGetErrorString(err) \ - << " at " << __LINE__ << "\n"; \ - } \ - } - - // Addition of two values. - __global__ void add(int *a, int *b, int *c) { - *c = *a + *b; - } - - int main() { - int a, b, c; - int *d_a, *d_b, *d_c; - - // Setup input values. - a = 1; - b = 2; - - // Allocate device copies of a, b and c. - HIP_CHECK(hipMalloc(&d_a, sizeof(*d_a))); - HIP_CHECK(hipMalloc(&d_b, sizeof(*d_b))); - HIP_CHECK(hipMalloc(&d_c, sizeof(*d_c))); - - // Copy input values to device. - HIP_CHECK(hipMemcpy(d_a, &a, sizeof(*d_a), hipMemcpyHostToDevice)); - HIP_CHECK(hipMemcpy(d_b, &b, sizeof(*d_b), hipMemcpyHostToDevice)); - - // Launch add() kernel on GPU. - hipLaunchKernelGGL(add, dim3(1), dim3(1), 0, 0, d_a, d_b, d_c); - - // Copy the result back to the host. - HIP_CHECK(hipMemcpy(&c, d_c, sizeof(*d_c), hipMemcpyDeviceToHost)); - - // Cleanup allocated memory. - HIP_CHECK(hipFree(d_a)); - HIP_CHECK(hipFree(d_b)); - HIP_CHECK(hipFree(d_c)); - - // Prints the result. - std::cout << a << " + " << b << " = " << c << std::endl; - - return 0; - } + :language: cpp .. 
_using unified memory: @@ -559,65 +387,11 @@ Data prefetching is a technique used to improve the performance of your application by moving data to the desired device before it's actually needed. ``hipCpuDeviceId`` is a special constant to specify the CPU as target. -.. code-block:: cpp +.. literalinclude:: ../../../tools/example_codes/data_prefetching.hip + :start-after: // [sphinx-start] + :end-before: // [sphinx-end] :emphasize-lines: 33-36,41-42 - - #include - #include - - #define HIP_CHECK(expression) \ - { \ - const hipError_t err = expression; \ - if(err != hipSuccess){ \ - std::cerr << "HIP error: " \ - << hipGetErrorString(err) \ - << " at " << __LINE__ << "\n"; \ - } \ - } - - // Addition of two values. - __global__ void add(int *a, int *b, int *c) { - *c = *a + *b; - } - - int main() { - int *a, *b, *c; - int deviceId; - HIP_CHECK(hipGetDevice(&deviceId)); // Get the current device ID - - // Allocate memory for a, b and c that is accessible to both device and host codes. - HIP_CHECK(hipMallocManaged(&a, sizeof(*a))); - HIP_CHECK(hipMallocManaged(&b, sizeof(*b))); - HIP_CHECK(hipMallocManaged(&c, sizeof(*c))); - - // Setup input values. - *a = 1; - *b = 2; - - // Prefetch the data to the GPU device. - HIP_CHECK(hipMemPrefetchAsync(a, sizeof(*a), deviceId, 0)); - HIP_CHECK(hipMemPrefetchAsync(b, sizeof(*b), deviceId, 0)); - HIP_CHECK(hipMemPrefetchAsync(c, sizeof(*c), deviceId, 0)); - - // Launch add() kernel on GPU. - hipLaunchKernelGGL(add, dim3(1), dim3(1), 0, 0, a, b, c); - - // Prefetch the result back to the CPU. - HIP_CHECK(hipMemPrefetchAsync(c, sizeof(*c), hipCpuDeviceId, 0)); - - // Wait for the prefetch operations to complete. - HIP_CHECK(hipDeviceSynchronize()); - - // Prints the result. - std::cout << *a << " + " << *b << " = " << *c << std::endl; - - // Cleanup allocated memory. 
- HIP_CHECK(hipFree(a)); - HIP_CHECK(hipFree(b)); - HIP_CHECK(hipFree(c)); - - return 0; - } + :language: cpp Memory advice -------------------------------------------------------------------------------- @@ -642,71 +416,11 @@ impact on performance can vary based on the specific use case and the system. The following is the updated version of the example above with memory advice instead of prefetching. -.. code-block:: cpp +.. literalinclude:: ../../../tools/example_codes/unified_memory_advice.hip + :start-after: // [sphinx-start] + :end-before: // [sphinx-end] :emphasize-lines: 29-41 - - #include - #include - - #define HIP_CHECK(expression) \ - { \ - const hipError_t err = expression; \ - if(err != hipSuccess){ \ - std::cerr << "HIP error: " \ - << hipGetErrorString(err) \ - << " at " << __LINE__ << "\n"; \ - } \ - } - - // Addition of two values. - __global__ void add(int *a, int *b, int *c) { - *c = *a + *b; - } - - int main() { - int deviceId; - HIP_CHECK(hipGetDevice(&deviceId)); - int *a, *b, *c; - - // Allocate memory for a, b, and c accessible to both device and host codes. - HIP_CHECK(hipMallocManaged(&a, sizeof(*a))); - HIP_CHECK(hipMallocManaged(&b, sizeof(*b))); - HIP_CHECK(hipMallocManaged(&c, sizeof(*c))); - - // Set memory advice for a and b to be read, located on and accessed by the GPU. - HIP_CHECK(hipMemAdvise(a, sizeof(*a), hipMemAdviseSetPreferredLocation, deviceId)); - HIP_CHECK(hipMemAdvise(a, sizeof(*a), hipMemAdviseSetAccessedBy, deviceId)); - HIP_CHECK(hipMemAdvise(a, sizeof(*a), hipMemAdviseSetReadMostly, deviceId)); - - HIP_CHECK(hipMemAdvise(b, sizeof(*b), hipMemAdviseSetPreferredLocation, deviceId)); - HIP_CHECK(hipMemAdvise(b, sizeof(*b), hipMemAdviseSetAccessedBy, deviceId)); - HIP_CHECK(hipMemAdvise(b, sizeof(*b), hipMemAdviseSetReadMostly, deviceId)); - - // Set memory advice for c to be read, located on and accessed by the CPU. 
- HIP_CHECK(hipMemAdvise(c, sizeof(*c), hipMemAdviseSetPreferredLocation, hipCpuDeviceId)); - HIP_CHECK(hipMemAdvise(c, sizeof(*c), hipMemAdviseSetAccessedBy, hipCpuDeviceId)); - HIP_CHECK(hipMemAdvise(c, sizeof(*c), hipMemAdviseSetReadMostly, hipCpuDeviceId)); - - // Setup input values. - *a = 1; - *b = 2; - - // Launch add() kernel on GPU. - hipLaunchKernelGGL(add, dim3(1), dim3(1), 0, 0, a, b, c); - - // Wait for GPU to finish before accessing on host. - HIP_CHECK(hipDeviceSynchronize()); - - // Prints the result. - std::cout << *a << " + " << *b << " = " << *c << std::endl; - - // Cleanup allocated memory. - HIP_CHECK(hipFree(a)); - HIP_CHECK(hipFree(b)); - HIP_CHECK(hipFree(c)); - - return 0; - } + :language: cpp Memory range attributes -------------------------------------------------------------------------------- @@ -714,70 +428,11 @@ Memory range attributes :cpp:func:`hipMemRangeGetAttribute()` allows you to query attributes of a given memory range. The attributes are given in :cpp:enum:`hipMemRangeAttribute`. -.. code-block:: cpp +.. literalinclude:: ../../../tools/example_codes/memory_range_attributes.hip + :start-after: // [sphinx-start] + :end-before: // [sphinx-end] :emphasize-lines: 44-49 - - #include - #include - - #define HIP_CHECK(expression) \ - { \ - const hipError_t err = expression; \ - if(err != hipSuccess){ \ - std::cerr << "HIP error: " \ - << hipGetErrorString(err) \ - << " at " << __LINE__ << "\n"; \ - } \ - } - - // Addition of two values. - __global__ void add(int *a, int *b, int *c) { - *c = *a + *b; - } - - int main() { - int *a, *b, *c; - unsigned int attributeValue; - constexpr size_t attributeSize = sizeof(attributeValue); - - int deviceId; - HIP_CHECK(hipGetDevice(&deviceId)); - - // Allocate memory for a, b and c that is accessible to both device and host codes. - HIP_CHECK(hipMallocManaged(&a, sizeof(*a))); - HIP_CHECK(hipMallocManaged(&b, sizeof(*b))); - HIP_CHECK(hipMallocManaged(&c, sizeof(*c))); - - // Setup input values. 
- *a = 1; - *b = 2; - - HIP_CHECK(hipMemAdvise(a, sizeof(*a), hipMemAdviseSetReadMostly, deviceId)); - - // Launch add() kernel on GPU. - hipLaunchKernelGGL(add, dim3(1), dim3(1), 0, 0, a, b, c); - - // Wait for GPU to finish before accessing on host. - HIP_CHECK(hipDeviceSynchronize()); - - // Query an attribute of the memory range. - HIP_CHECK(hipMemRangeGetAttribute(&attributeValue, - attributeSize, - hipMemRangeAttributeReadMostly, - a, - sizeof(*a))); - - // Prints the result. - std::cout << *a << " + " << *b << " = " << *c << std::endl; - std::cout << "The array a is" << (attributeValue == 1 ? "" : " NOT") << " set to hipMemRangeAttributeReadMostly" << std::endl; - - // Cleanup allocated memory. - HIP_CHECK(hipFree(a)); - HIP_CHECK(hipFree(b)); - HIP_CHECK(hipFree(c)); - - return 0; - } + :language: cpp Asynchronously attach memory to a stream -------------------------------------------------------------------------------- diff --git a/docs/how-to/hip_runtime_api/multi_device.rst b/docs/how-to/hip_runtime_api/multi_device.rst index 4d2fb98dcc..3facb80f65 100644 --- a/docs/how-to/hip_runtime_api/multi_device.rst +++ b/docs/how-to/hip_runtime_api/multi_device.rst @@ -22,43 +22,10 @@ dynamic selections during runtime to ensure optimal performance. If the application does not define a specific GPU, device 0 is selected. -.. 
code-block:: cpp - - #include - #include - - int main() - { - int deviceCount; - hipGetDeviceCount(&deviceCount); - std::cout << "Number of devices: " << deviceCount << std::endl; - - for (int deviceId = 0; deviceId < deviceCount; ++deviceId) - { - hipDeviceProp_t deviceProp; - hipGetDeviceProperties(&deviceProp, deviceId); - std::cout << "Device " << deviceId << std::endl << " Properties:" << std::endl; - std::cout << " Name: " << deviceProp.name << std::endl; - std::cout << " Total Global Memory: " << deviceProp.totalGlobalMem / (1024 * 1024) << " MiB" << std::endl; - std::cout << " Shared Memory per Block: " << deviceProp.sharedMemPerBlock / 1024 << " KiB" << std::endl; - std::cout << " Registers per Block: " << deviceProp.regsPerBlock << std::endl; - std::cout << " Warp Size: " << deviceProp.warpSize << std::endl; - std::cout << " Max Threads per Block: " << deviceProp.maxThreadsPerBlock << std::endl; - std::cout << " Max Threads per Multiprocessor: " << deviceProp.maxThreadsPerMultiProcessor << std::endl; - std::cout << " Number of Multiprocessors: " << deviceProp.multiProcessorCount << std::endl; - std::cout << " Max Threads Dimensions: [" - << deviceProp.maxThreadsDim[0] << ", " - << deviceProp.maxThreadsDim[1] << ", " - << deviceProp.maxThreadsDim[2] << "]" << std::endl; - std::cout << " Max Grid Size: [" - << deviceProp.maxGridSize[0] << ", " - << deviceProp.maxGridSize[1] << ", " - << deviceProp.maxGridSize[2] << "]" << std::endl; - std::cout << std::endl; - } - - return 0; - } +.. literalinclude:: ../../tools/example_codes/device_enumeration.cpp + :start-after: // [sphinx-start] + :end-before: // [sphinx-end] + :language: cpp .. _multi_device_selection: @@ -72,71 +39,10 @@ different GPUs might have different capabilities or workloads. By selecting the appropriate device, you ensure that the computational tasks are directed to the correct GPU, optimizing performance and resource utilization. -.. 
code-block:: cpp - - #include - #include - - #define HIP_CHECK(expression) \ - { \ - const hipError_t status = expression; \ - if (status != hipSuccess) { \ - std::cerr << "HIP error " << status \ - << ": " << hipGetErrorString(status) \ - << " at " << __FILE__ << ":" \ - << __LINE__ << std::endl; \ - exit(status); \ - } \ - } - - __global__ void simpleKernel(double *data) - { - int idx = blockIdx.x * blockDim.x + threadIdx.x; - data[idx] = idx * 2.0; - } - - int main() - { - double* deviceData0; - double* deviceData1; - size_t size = 1024 * sizeof(*deviceData0); - - int deviceId0 = 0; - int deviceId1 = 1; - - // Set device 0 and perform operations - HIP_CHECK(hipSetDevice(deviceId0)); // Set device 0 as current - HIP_CHECK(hipMalloc(&deviceData0, size)); // Allocate memory on device 0 - simpleKernel<<<1000, 128>>>(deviceData0); // Launch kernel on device 0 - HIP_CHECK(hipDeviceSynchronize()); - - // Set device 1 and perform operations - HIP_CHECK(hipSetDevice(deviceId1)); // Set device 1 as current - HIP_CHECK(hipMalloc(&deviceData1, size)); // Allocate memory on device 1 - simpleKernel<<<1000, 128>>>(deviceData1); // Launch kernel on device 1 - HIP_CHECK(hipDeviceSynchronize()); - - // Copy result from device 0 - double hostData0[1024]; - HIP_CHECK(hipSetDevice(deviceId0)); - HIP_CHECK(hipMemcpy(hostData0, deviceData0, size, hipMemcpyDeviceToHost)); - - // Copy result from device 1 - double hostData1[1024]; - HIP_CHECK(hipSetDevice(deviceId1)); - HIP_CHECK(hipMemcpy(hostData1, deviceData1, size, hipMemcpyDeviceToHost)); - - // Display results from both devices - std::cout << "Device 0 data: " << hostData0[0] << std::endl; - std::cout << "Device 1 data: " << hostData1[0] << std::endl; - - // Free device memory - HIP_CHECK(hipFree(deviceData0)); - HIP_CHECK(hipFree(deviceData1)); - - return 0; - } - +.. 
literalinclude:: ../../tools/example_codes/device_selection.hip + :start-after: // [sphinx-start] + :end-before: // [sphinx-end] + :language: cpp Stream and event behavior =============================================================================== @@ -151,100 +57,10 @@ conditions and optimizes data flow in multi-GPU systems. Together, streams and events maximize performance by enabling parallel execution, load balancing, and effective resource utilization across heterogeneous hardware. -.. code-block:: cpp - - #include - #include - - __global__ void simpleKernel(double *data) - { - int idx = blockIdx.x * blockDim.x + threadIdx.x; - data[idx] = idx * 2.0; - } - - int main() - { - int numDevices; - hipGetDeviceCount(&numDevices); - - if (numDevices < 2) { - std::cerr << "This example requires at least two GPUs." << std::endl; - return -1; - } - - double *deviceData0, *deviceData1; - size_t size = 1024 * sizeof(*deviceData0); - - // Create streams and events for each device - hipStream_t stream0, stream1; - hipEvent_t startEvent0, stopEvent0, startEvent1, stopEvent1; - - // Initialize device 0 - hipSetDevice(0); - hipStreamCreate(&stream0); - hipEventCreate(&startEvent0); - hipEventCreate(&stopEvent0); - hipMalloc(&deviceData0, size); - - // Initialize device 1 - hipSetDevice(1); - hipStreamCreate(&stream1); - hipEventCreate(&startEvent1); - hipEventCreate(&stopEvent1); - hipMalloc(&deviceData1, size); - - // Record the start event on device 0 - hipSetDevice(0); - hipEventRecord(startEvent0, stream0); - - // Launch the kernel asynchronously on device 0 - simpleKernel<<<1000, 128, 0, stream0>>>(deviceData0); - - // Record the stop event on device 0 - hipEventRecord(stopEvent0, stream0); - - // Wait for the stop event on device 0 to complete - hipEventSynchronize(stopEvent0); - - // Record the start event on device 1 - hipSetDevice(1); - hipEventRecord(startEvent1, stream1); - - // Launch the kernel asynchronously on device 1 - simpleKernel<<<1000, 128, 0, 
stream1>>>(deviceData1); - - // Record the stop event on device 1 - hipEventRecord(stopEvent1, stream1); - - // Wait for the stop event on device 1 to complete - hipEventSynchronize(stopEvent1); - - // Calculate elapsed time between the events for both devices - float milliseconds0 = 0, milliseconds1 = 0; - hipEventElapsedTime(&milliseconds0, startEvent0, stopEvent0); - hipEventElapsedTime(&milliseconds1, startEvent1, stopEvent1); - - std::cout << "Elapsed time on GPU 0: " << milliseconds0 << " ms" << std::endl; - std::cout << "Elapsed time on GPU 1: " << milliseconds1 << " ms" << std::endl; - - // Cleanup for device 0 - hipSetDevice(0); - hipEventDestroy(startEvent0); - hipEventDestroy(stopEvent0); - hipStreamSynchronize(stream0); - hipStreamDestroy(stream0); - hipFree(deviceData0); - - // Cleanup for device 1 - hipSetDevice(1); - hipEventDestroy(startEvent1); - hipEventDestroy(stopEvent1); - hipStreamSynchronize(stream1); - hipStreamDestroy(stream1); - hipFree(deviceData1); - - return 0; - } +.. literalinclude:: ../../tools/example_codes/multi_device_synchronization.hip + :start-after: // [sphinx-start] + :end-before: // [sphinx-end] + :language: cpp Peer-to-peer memory access =============================================================================== @@ -263,158 +79,16 @@ By adding peer-to-peer access to the example referenced in .. tab-item:: with peer-to-peer - .. code-block:: cpp + .. 
literalinclude:: ../../tools/example_codes/p2p_memory_access.hip + :start-after: // [sphinx-start] + :end-before: // [sphinx-end] :emphasize-lines: 31-37, 51-55 - - #include - #include - - #define HIP_CHECK(expression) \ - { \ - const hipError_t status = expression; \ - if (status != hipSuccess) { \ - std::cerr << "HIP error " << status \ - << ": " << hipGetErrorString(status) \ - << " at " << __FILE__ << ":" \ - << __LINE__ << std::endl; \ - exit(status); \ - } \ - } - - __global__ void simpleKernel(double *data) - { - int idx = blockIdx.x * blockDim.x + threadIdx.x; - data[idx] = idx * 2.0; - } - - int main() - { - double* deviceData0; - double* deviceData1; - size_t size = 1024 * sizeof(*deviceData0); - - int deviceId0 = 0; - int deviceId1 = 1; - - // Enable peer access to the memory (allocated and future) on the peer device. - // Ensure the device is active before enabling peer access. - hipSetDevice(deviceId0); - hipDeviceEnablePeerAccess(deviceId1, 0); - - hipSetDevice(deviceId1); - hipDeviceEnablePeerAccess(deviceId0, 0); - - // Set device 0 and perform operations - HIP_CHECK(hipSetDevice(deviceId0)); // Set device 0 as current - HIP_CHECK(hipMalloc(&deviceData0, size)); // Allocate memory on device 0 - simpleKernel<<<1000, 128>>>(deviceData0); // Launch kernel on device 0 - HIP_CHECK(hipDeviceSynchronize()); - - // Set device 1 and perform operations - HIP_CHECK(hipSetDevice(deviceId1)); // Set device 1 as current - HIP_CHECK(hipMalloc(&deviceData1, size)); // Allocate memory on device 1 - simpleKernel<<<1000, 128>>>(deviceData1); // Launch kernel on device 1 - HIP_CHECK(hipDeviceSynchronize()); - - // Use peer-to-peer access - hipSetDevice(deviceId0); - - // Now device 0 can access memory allocated on device 1 - hipMemcpy(deviceData0, deviceData1, size, hipMemcpyDeviceToDevice); - - // Copy result from device 0 - double hostData0[1024]; - HIP_CHECK(hipSetDevice(deviceId0)); - HIP_CHECK(hipMemcpy(hostData0, deviceData0, size, hipMemcpyDeviceToHost)); - - // 
Copy result from device 1 - double hostData1[1024]; - HIP_CHECK(hipSetDevice(deviceId1)); - HIP_CHECK(hipMemcpy(hostData1, deviceData1, size, hipMemcpyDeviceToHost)); - - // Display results from both devices - std::cout << "Device 0 data: " << hostData0[0] << std::endl; - std::cout << "Device 1 data: " << hostData1[0] << std::endl; - - // Free device memory - HIP_CHECK(hipFree(deviceData0)); - HIP_CHECK(hipFree(deviceData1)); - - return 0; - } + :language: cpp .. tab-item:: without peer-to-peer - .. code-block:: cpp + .. literalinclude:: ../../tools/example_codes/p2p_memory_access.hip + :start-after: // [sphinx-start] + :end-before: // [sphinx-end] :emphasize-lines: 43-49, 53, 58 - - #include - #include - - #define HIP_CHECK(expression) \ - { \ - const hipError_t status = expression; \ - if (status != hipSuccess) { \ - std::cerr << "HIP error " << status \ - << ": " << hipGetErrorString(status) \ - << " at " << __FILE__ << ":" \ - << __LINE__ << std::endl; \ - exit(status); \ - } \ - } - - __global__ void simpleKernel(double *data) - { - int idx = blockIdx.x * blockDim.x + threadIdx.x; - data[idx] = idx * 2.0; - } - - int main() - { - double* deviceData0; - double* deviceData1; - size_t size = 1024 * sizeof(*deviceData0); - - int deviceId0 = 0; - int deviceId1 = 1; - - // Set device 0 and perform operations - HIP_CHECK(hipSetDevice(deviceId0)); // Set device 0 as current - HIP_CHECK(hipMalloc(&deviceData0, size)); // Allocate memory on device 0 - simpleKernel<<<1000, 128>>>(deviceData0); // Launch kernel on device 0 - HIP_CHECK(hipDeviceSynchronize()); - - // Set device 1 and perform operations - HIP_CHECK(hipSetDevice(deviceId1)); // Set device 1 as current - HIP_CHECK(hipMalloc(&deviceData1, size)); // Allocate memory on device 1 - simpleKernel<<<1000, 128>>>(deviceData1); // Launch kernel on device 1 - HIP_CHECK(hipDeviceSynchronize()); - - // Attempt to use deviceData0 on device 1 (This will not work as deviceData0 is allocated on device 0) - 
HIP_CHECK(hipSetDevice(deviceId1)); - hipError_t err = hipMemcpy(deviceData1, deviceData0, size, hipMemcpyDeviceToDevice); // This should fail - if (err != hipSuccess) - { - std::cout << "Error: Cannot access deviceData0 from device 1, deviceData0 is on device 0" << std::endl; - } - - // Copy result from device 0 - double hostData0[1024]; - HIP_CHECK(hipSetDevice(deviceId0)); - HIP_CHECK(hipMemcpy(hostData0, deviceData0, size, hipMemcpyDeviceToHost)); - - // Copy result from device 1 - double hostData1[1024]; - HIP_CHECK(hipSetDevice(deviceId1)); - HIP_CHECK(hipMemcpy(hostData1, deviceData1, size, hipMemcpyDeviceToHost)); - - // Display results from both devices - std::cout << "Device 0 data: " << hostData0[0] << std::endl; - std::cout << "Device 1 data: " << hostData1[0] << std::endl; - - // Free device memory - HIP_CHECK(hipFree(deviceData0)); - HIP_CHECK(hipFree(deviceData1)); - - return 0; - } \ No newline at end of file + :language: cpp diff --git a/docs/reference/api_syntax.rst b/docs/reference/api_syntax.rst index ead33fa5e1..bba41de1b5 100644 --- a/docs/reference/api_syntax.rst +++ b/docs/reference/api_syntax.rst @@ -11,92 +11,10 @@ example and comparison table. For a complete list of mappings, visit :ref:`HIPIF The following CUDA code example illustrates several CUDA API syntaxes. -.. 
code-block:: cpp - - #include - #include - #include - - __global__ void block_reduction(const float* input, float* output, int num_elements) - { - extern __shared__ float s_data[]; - - int tid = threadIdx.x; - int global_id = blockDim.x * blockIdx.x + tid; - - if (global_id < num_elements) - { - s_data[tid] = input[global_id]; - } - else - { - s_data[tid] = 0.0f; - } - __syncthreads(); - - for (int stride = blockDim.x / 2; stride > 0; stride >>= 1) - { - if (tid < stride) - { - s_data[tid] += s_data[tid + stride]; - } - __syncthreads(); - } - - if (tid == 0) - { - output[blockIdx.x] = s_data[0]; - } - } - - int main() - { - int threads = 256; - const int num_elements = 50000; - - std::vector h_a(num_elements); - std::vector h_b((num_elements + threads - 1) / threads); - - for (int i = 0; i < num_elements; ++i) - { - h_a[i] = rand() / static_cast(RAND_MAX); - } - - float *d_a, *d_b; - cudaMalloc(&d_a, h_a.size() * sizeof(float)); - cudaMalloc(&d_b, h_b.size() * sizeof(float)); - - cudaStream_t stream; - cudaStreamCreateWithFlags(&stream, cudaStreamNonBlocking); - - cudaEvent_t start_event, stop_event; - cudaEventCreate(&start_event); - cudaEventCreate(&stop_event); - - cudaMemcpyAsync(d_a, h_a.data(), h_a.size() * sizeof(float), cudaMemcpyHostToDevice, stream); - - cudaEventRecord(start_event, stream); - - int blocks = (num_elements + threads - 1) / threads; - block_reduction<<>>(d_a, d_b, num_elements); - - cudaMemcpyAsync(h_b.data(), d_b, h_b.size() * sizeof(float), cudaMemcpyDeviceToHost, stream); - - cudaEventRecord(stop_event, stream); - cudaEventSynchronize(stop_event); - - cudaEventElapsedTime(&milliseconds, start_event, stop_event); - std::cout << "Kernel execution time: " << milliseconds << " ms\n"; - - cudaFree(d_a); - cudaFree(d_b); - - cudaEventDestroy(start_event); - cudaEventDestroy(stop_event); - cudaStreamDestroy(stream); - - return 0; - } +.. 
literalinclude:: ../tools/example_codes/block_reduction.cu + :start-after: // [sphinx-start] + :end-before: // [sphinx-end] + :language: cpp The following table maps CUDA API functions to corresponding HIP API functions, as demonstrated in the preceding code examples. diff --git a/docs/reference/complex_math_api.rst b/docs/reference/complex_math_api.rst index b65cc2e484..9b6d2ce621 100644 --- a/docs/reference/complex_math_api.rst +++ b/docs/reference/complex_math_api.rst @@ -337,117 +337,7 @@ The kernel function ``computeDFT`` shows various HIP complex math operations in The example also demonstrates proper use of complex number handling on both host and device, including memory allocation, transfer, and validation of results between CPU and GPU implementations. -.. code-block:: cpp - - #include - #include - #include - #include - #include - - #define HIP_CHECK(expression) \ - { \ - const hipError_t err = expression; \ - if (err != hipSuccess) { \ - std::cerr << "HIP error: " \ - << hipGetErrorString(err) \ - << " at " << __LINE__ << "\n"; \ - exit(EXIT_FAILURE); \ - } \ - } - - // Kernel to compute DFT - __global__ void computeDFT(const float* input, - hipFloatComplex* output, - const int N) - { - int k = blockIdx.x * blockDim.x + threadIdx.x; - if (k >= N) return; - - hipFloatComplex sum = make_hipFloatComplex(0.0f, 0.0f); - - for (int n = 0; n < N; n++) { - float angle = -2.0f * M_PI * k * n / N; - hipFloatComplex w = make_hipFloatComplex(cosf(angle), sinf(angle)); - hipFloatComplex x = make_hipFloatComplex(input[n], 0.0f); - sum = hipCaddf(sum, hipCmulf(x, w)); - } - - output[k] = sum; - } - - // CPU implementation of DFT for verification - std::vector cpuDFT(const std::vector& input) { - const int N = input.size(); - std::vector result(N); - - for (int k = 0; k < N; k++) { - hipFloatComplex sum = make_hipFloatComplex(0.0f, 0.0f); - for (int n = 0; n < N; n++) { - float angle = -2.0f * M_PI * k * n / N; - hipFloatComplex w = make_hipFloatComplex(cosf(angle), 
sinf(angle)); - hipFloatComplex x = make_hipFloatComplex(input[n], 0.0f); - sum = hipCaddf(sum, hipCmulf(x, w)); - } - result[k] = sum; - } - return result; - } - - int main() { - const int N = 256; // Signal length - const int blockSize = 256; - - // Generate input signal: sum of two sine waves - std::vector signal(N); - for (int i = 0; i < N; i++) { - float t = static_cast(i) / N; - signal[i] = sinf(2.0f * M_PI * 10.0f * t) + // 10 Hz component - 0.5f * sinf(2.0f * M_PI * 20.0f * t); // 20 Hz component - } - - // Compute reference solution on CPU - std::vector cpu_output = cpuDFT(signal); - - // Allocate device memory - float* d_signal; - hipFloatComplex* d_output; - HIP_CHECK(hipMalloc(&d_signal, N * sizeof(float))); - HIP_CHECK(hipMalloc(&d_output, N * sizeof(hipFloatComplex))); - - // Copy input to device - HIP_CHECK(hipMemcpy(d_signal, signal.data(), N * sizeof(float), - hipMemcpyHostToDevice)); - - // Launch kernel - dim3 grid((N + blockSize - 1) / blockSize); - dim3 block(blockSize); - computeDFT<<>>(d_signal, d_output, N); - HIP_CHECK(hipGetLastError()); - - // Get GPU results - std::vector gpu_output(N); - HIP_CHECK(hipMemcpy(gpu_output.data(), d_output, N * sizeof(hipFloatComplex), - hipMemcpyDeviceToHost)); - - // Verify results - bool passed = true; - const float tolerance = 1e-5f; // Adjust based on precision requirements - - for (int i = 0; i < N; i++) { - float diff_real = std::abs(hipCrealf(gpu_output[i]) - hipCrealf(cpu_output[i])); - float diff_imag = std::abs(hipCimagf(gpu_output[i]) - hipCimagf(cpu_output[i])); - - if (diff_real > tolerance || diff_imag > tolerance) { - passed = false; - break; - } - } - - std::cout << "DFT Verification: " << (passed ? "PASSED" : "FAILED") << "\n"; - - // Cleanup - HIP_CHECK(hipFree(d_signal)); - HIP_CHECK(hipFree(d_output)); - return passed ? 0 : 1; - } +.. 
literalinclude:: ../tools/example_codes/complex_math.hip + :start-after: // [sphinx-start] + :end-before: // [sphinx-end] + :language: cpp diff --git a/docs/reference/math_api.rst b/docs/reference/math_api.rst index 06902e2290..b62fbc2c08 100644 --- a/docs/reference/math_api.rst +++ b/docs/reference/math_api.rst @@ -24,88 +24,10 @@ The following C++ example shows a simplified method for computing ULP difference HIP and standard C++ math functions by first finding where the maximum absolute error occurs. -.. code-block:: cpp - - #include - #include - #include - #include - #include - - #define HIP_CHECK(expression) \ - { \ - const hipError_t err = expression; \ - if (err != hipSuccess) { \ - std::cerr << "HIP error: " \ - << hipGetErrorString(err) \ - << " at " << __LINE__ << "\n"; \ - exit(EXIT_FAILURE); \ - } \ - } - - // Simple ULP difference calculator - int64_t ulp_diff(float a, float b) { - if (a == b) return 0; - union { float f; int32_t i; } ua{a}, ub{b}; - - // For negative values, convert to a positive-based representation - if (ua.i < 0) ua.i = std::numeric_limits::max() - ua.i; - if (ub.i < 0) ub.i = std::numeric_limits::max() - ub.i; - - return std::abs((int64_t)ua.i - (int64_t)ub.i); - } - - // Test kernel - __global__ void test_sin(float* out, int n) { - int i = blockIdx.x * blockDim.x + threadIdx.x; - if (i < n) { - float x = -M_PI + (2.0f * M_PI * i) / (n - 1); - out[i] = sin(x); - } - } - - int main() { - const int n = 1000000; - const int blocksize = 256; - std::vector outputs(n); - float* d_out; - - HIP_CHECK(hipMalloc(&d_out, n * sizeof(float))); - dim3 threads(blocksize); - dim3 blocks((n + blocksize - 1) / blocksize); // Fixed grid calculation - test_sin<<>>(d_out, n); - HIP_CHECK(hipPeekAtLastError()); - HIP_CHECK(hipMemcpy(outputs.data(), d_out, n * sizeof(float), hipMemcpyDeviceToHost)); - - // Step 1: Find the maximum absolute error - double max_abs_error = 0.0; - float max_error_output = 0.0; - float max_error_expected = 0.0; - - for (int 
i = 0; i < n; i++) { - float x = -M_PI + (2.0f * M_PI * i) / (n - 1); - float expected = std::sin(x); - double abs_error = std::abs(outputs[i] - expected); - - if (abs_error > max_abs_error) { - max_abs_error = abs_error; - max_error_output = outputs[i]; - max_error_expected = expected; - } - } - - // Step 2: Compute ULP difference based on the max absolute error pair - int64_t max_ulp = ulp_diff(max_error_output, max_error_expected); - - // Output results - std::cout << "Max Absolute Error: " << max_abs_error << std::endl; - std::cout << "Max ULP Difference: " << max_ulp << std::endl; - std::cout << "Max Error Values -> Got: " << max_error_output - << ", Expected: " << max_error_expected << std::endl; - - HIP_CHECK(hipFree(d_out)); - return 0; - } +.. literalinclude:: ../tools/example_codes/math.hip + :start-after: // [sphinx-start] + :end-before: // [sphinx-end] + :language: cpp Standard mathematical functions =============================== diff --git a/docs/tools/example_codes/add_kernel.hip b/docs/tools/example_codes/add_kernel.hip new file mode 100644 index 0000000000..f99d644cea --- /dev/null +++ b/docs/tools/example_codes/add_kernel.hip @@ -0,0 +1,95 @@ +// MIT License +// +// Copyright (c) 2025 Advanced Micro Devices, Inc. All rights reserved. +// +// Permission is hereby granted, free of charge, to any person obtaining a copy +// of this software and associated documentation files (the "Software"), to deal +// in the Software without restriction, including without limitation the rights +// to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +// copies of the Software, and to permit persons to whom the Software is +// furnished to do so, subject to the following conditions: +// +// The above copyright notice and this permission notice shall be included in all +// copies or substantial portions of the Software. 
+//
+// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+// SOFTWARE.
+
+#include "example_utils.hpp"
+
+#include
+
+#include
+#include
+#include
+#include
+
+///\brief Calculates \p a[i] = \p a[i] + \p b[i] where \p i stands for the thread's index in the grid.
+// [sphinx-kernel-start]
+__global__ void AddKernel(float* a, const float* b)
+{
+    int global_idx = threadIdx.x + blockIdx.x * blockDim.x;
+
+    a[global_idx] += b[global_idx];
+}
+// [sphinx-kernel-end]
+
+int main()
+{
+    // The number of float elements in each vector.
+    constexpr unsigned int size = 1 << 20; // == 1'048'576 elements
+
+    // Bytes to allocate for each device vector.
+    constexpr size_t size_bytes = size * sizeof(float);
+
+    // Number of threads per kernel block.
+    constexpr unsigned int threads_per_block = 256;
+
+    // Number of blocks per kernel grid. The expression below calculates ceil(size / threads_per_block).
+    constexpr unsigned int number_of_blocks = ceiling_div(size, threads_per_block);
+
+    // Allocate vector a and fill it with an increasing sequence (i.e. 1, 2, 3, 4...)
+    std::vector h_a(size);
+    std::iota(h_a.begin(), h_a.end(), 1.f);
+
+    // Allocate vector b and fill it with a decreasing sequence (i.e. 1'048'576, 1'048'575, ..., 3, 2, 1)
+    std::vector h_b(size);
+    std::iota(h_b.rbegin(), h_b.rend(), 1.f);
+
+    // Allocate and copy vectors to device memory.
+    float* d_a{};
+    float* d_b{};
+    HIP_CHECK(hipMalloc(&d_a, size_bytes));
+    HIP_CHECK(hipMalloc(&d_b, size_bytes));
+    HIP_CHECK(hipMemcpy(d_a, h_a.data(), size_bytes, hipMemcpyHostToDevice));
+    HIP_CHECK(hipMemcpy(d_b, h_b.data(), size_bytes, hipMemcpyHostToDevice));
+
+    std::cout << "Calculating a[i] = a[i] + b[i] over " << size << " elements." << std::endl;
+
+    // Launch the kernel on the default stream.
+    // [sphinx-kernel-launch-start]
+    AddKernel<<>>(d_a, d_b);
+    // [sphinx-kernel-launch-end]
+
+    // Check if the kernel launch was successful.
+    HIP_CHECK(hipGetLastError());
+
+    // Copy the results back to the host. This call blocks the host's execution until the copy is finished.
+    HIP_CHECK(hipMemcpy(h_a.data(), d_a, size_bytes, hipMemcpyDeviceToHost));
+
+    // Free device memory.
+    HIP_CHECK(hipFree(d_b));
+    HIP_CHECK(hipFree(d_a));
+
+    // Print the first few elements of the results:
+    constexpr size_t elements_to_print = 10;
+    std::cout << "First " << elements_to_print << " elements of the results: "
+              << format_range(h_a.begin(), h_a.begin() + elements_to_print) << std::endl;
+
+    return EXIT_SUCCESS;
+}
diff --git a/docs/tools/example_codes/async_kernel_execution.hip b/docs/tools/example_codes/async_kernel_execution.hip
new file mode 100644
index 0000000000..23b298be5a
--- /dev/null
+++ b/docs/tools/example_codes/async_kernel_execution.hip
@@ -0,0 +1,142 @@
+// MIT License
+//
+// Copyright (c) 2025 Advanced Micro Devices, Inc. All rights reserved.
+// +// Permission is hereby granted, free of charge, to any person obtaining a copy +// of this software and associated documentation files (the "Software"), to deal +// in the Software without restriction, including without limitation the rights +// to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +// copies of the Software, and to permit persons to whom the Software is +// furnished to do so, subject to the following conditions: +// +// The above copyright notice and this permission notice shall be included in all +// copies or substantial portions of the Software. +// +// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +// SOFTWARE + +// [sphinx-start] +#include + +#include +#include +#include +#include + +#define HIP_CHECK(expression) \ +{ \ + const hipError_t status = expression; \ + if(status != hipSuccess) \ + { \ + std::cerr << "HIP error " \ + << status << ": " \ + << hipGetErrorString(status) \ + << " at " << __FILE__ << ":" \ + << __LINE__ << std::endl; \ + } \ +} + +// GPU Kernels +__global__ void kernelA(double* arrayA, std::size_t size) +{ + const std::size_t x = threadIdx.x + blockDim.x * blockIdx.x; + if(x < size) + { + arrayA[x] += 1.0; + } +} + +__global__ void kernelB(double* arrayA, double* arrayB, std::size_t size) +{ + const std::size_t x = threadIdx.x + blockDim.x * blockIdx.x; + if(x < size) + { + arrayB[x] += arrayA[x] + 3.0; + } +} + +int main() +{ + constexpr int numOfBlocks = 1 << 20; + constexpr int threadsPerBlock = 1024; + constexpr int numberOfIterations = 50; + // The array size smaller to 
avoid the relatively short kernel launch compared to memory copies + constexpr std::size_t arraySize = 1U << 25; + double *d_dataA; + double *d_dataB; + + double initValueA = 0.0; + double initValueB = 2.0; + + std::vector<double> vectorA(arraySize, initValueA); + std::vector<double> vectorB(arraySize, initValueB); + // Allocate device memory + HIP_CHECK(hipMalloc(&d_dataA, arraySize * sizeof(*d_dataA))); + HIP_CHECK(hipMalloc(&d_dataB, arraySize * sizeof(*d_dataB))); + // Create streams + hipStream_t streamA, streamB; + HIP_CHECK(hipStreamCreate(&streamA)); + HIP_CHECK(hipStreamCreate(&streamB)); + for(int iteration = 0; iteration < numberOfIterations; iteration++) + { + // Stream 1: Host to Device 1 + HIP_CHECK(hipMemcpyAsync(d_dataA, vectorA.data(), arraySize * sizeof(*d_dataA), hipMemcpyHostToDevice, streamA)); + // Stream 2: Host to Device 2 + HIP_CHECK(hipMemcpyAsync(d_dataB, vectorB.data(), arraySize * sizeof(*d_dataB), hipMemcpyHostToDevice, streamB)); + // Stream 1: Kernel 1 + kernelA<<>>(d_dataA, arraySize); + // Wait for streamA to finish, because Kernel 2 reads the array written by Kernel 1 + HIP_CHECK(hipStreamSynchronize(streamA)); + // Stream 2: Kernel 2 + kernelB<<>>(d_dataA, d_dataB, arraySize); + // Stream 1: Device to Host 1 (after Kernel 1) + HIP_CHECK(hipMemcpyAsync(vectorA.data(), d_dataA, arraySize * sizeof(*vectorA.data()), hipMemcpyDeviceToHost, streamA)); + // Stream 2: Device to Host 2 (after Kernel 2) + HIP_CHECK(hipMemcpyAsync(vectorB.data(), d_dataB, arraySize * sizeof(*vectorB.data()), hipMemcpyDeviceToHost, streamB)); + } + // Wait for all operations in both streams to complete + HIP_CHECK(hipStreamSynchronize(streamA)); + HIP_CHECK(hipStreamSynchronize(streamB)); + // Verify results + double expectedA = (double)numberOfIterations; + double expectedB = initValueB + (3.0 * numberOfIterations) + (expectedA * (expectedA + 1.0)) / 2.0; + bool passed = true; + for(std::size_t i = 0; i < arraySize; ++i) + { + if(vectorA[i] != expectedA) + { + passed = false; + std::cerr << "Validation failed!
Expected " << expectedA << " got " << vectorA[i] << " at index: " << i << std::endl; + break; + } + if(vectorB[i] != expectedB) + { + passed = false; + std::cerr << "Validation failed! Expected " << expectedB << " got " << vectorB[i] << " at index: " << i << std::endl; + break; + } + } + + if(passed) + { + std::cout << "Asynchronous execution completed successfully." << std::endl; + } + else + { + std::cerr << "Asynchronous execution failed." << std::endl; + } + + // Cleanup + HIP_CHECK(hipStreamDestroy(streamA)); + HIP_CHECK(hipStreamDestroy(streamB)); + HIP_CHECK(hipFree(d_dataA)); + HIP_CHECK(hipFree(d_dataB)); + + return EXIT_SUCCESS; +} +// [sphinx-end] diff --git a/docs/tools/example_codes/block_reduction.cu b/docs/tools/example_codes/block_reduction.cu new file mode 100644 index 0000000000..9e0b73b3c4 --- /dev/null +++ b/docs/tools/example_codes/block_reduction.cu @@ -0,0 +1,110 @@ +// MIT License +// +// Copyright (c) 2025 Advanced Micro Devices, Inc. All rights reserved. +// +// Permission is hereby granted, free of charge, to any person obtaining a copy +// of this software and associated documentation files (the "Software"), to deal +// in the Software without restriction, including without limitation the rights +// to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +// copies of the Software, and to permit persons to whom the Software is +// furnished to do so, subject to the following conditions: +// +// The above copyright notice and this permission notice shall be included in all +// copies or substantial portions of the Software. +// +// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. 
IN NO EVENT SHALL THE +// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +// SOFTWARE. + +// [sphinx-start] +#include + +#include +#include + +__global__ void block_reduction(const float* input, float* output, int num_elements) +{ + extern __shared__ float s_data[]; + + int tid = threadIdx.x; + int global_id = blockDim.x * blockIdx.x + tid; + + if (global_id < num_elements) + { + s_data[tid] = input[global_id]; + } + else + { + s_data[tid] = 0.0f; + } + __syncthreads(); + + for (int stride = blockDim.x / 2; stride > 0; stride >>= 1) + { + if (tid < stride) + { + s_data[tid] += s_data[tid + stride]; + } + __syncthreads(); + } + + if (tid == 0) + { + output[blockIdx.x] = s_data[0]; + } +} + +int main() +{ + int threads = 256; + const int num_elements = 50000; + + std::vector h_a(num_elements); + std::vector h_b((num_elements + threads - 1) / threads); + + for (int i = 0; i < num_elements; ++i) + { + h_a[i] = rand() / static_cast(RAND_MAX); + } + + float *d_a, *d_b; + cudaMalloc(&d_a, h_a.size() * sizeof(float)); + cudaMalloc(&d_b, h_b.size() * sizeof(float)); + + cudaStream_t stream; + cudaStreamCreateWithFlags(&stream, cudaStreamNonBlocking); + + cudaEvent_t start_event, stop_event; + cudaEventCreate(&start_event); + cudaEventCreate(&stop_event); + + cudaMemcpyAsync(d_a, h_a.data(), h_a.size() * sizeof(float), cudaMemcpyHostToDevice, stream); + + cudaEventRecord(start_event, stream); + + int blocks = (num_elements + threads - 1) / threads; + block_reduction<<>>(d_a, d_b, num_elements); + + cudaMemcpyAsync(h_b.data(), d_b, h_b.size() * sizeof(float), cudaMemcpyDeviceToHost, stream); + + cudaEventRecord(stop_event, stream); + cudaEventSynchronize(stop_event); + + float milliseconds = 0.f; + cudaEventElapsedTime(&milliseconds, start_event, stop_event); + std::cout << "Kernel execution 
time: " << milliseconds << " ms\n"; + + cudaFree(d_a); + cudaFree(d_b); + + cudaEventDestroy(start_event); + cudaEventDestroy(stop_event); + cudaStreamDestroy(stream); + + return 0; +} +// [sphinx-end] diff --git a/docs/tools/example_codes/call_stack_management.cpp b/docs/tools/example_codes/call_stack_management.cpp new file mode 100644 index 0000000000..852d5f07e3 --- /dev/null +++ b/docs/tools/example_codes/call_stack_management.cpp @@ -0,0 +1,58 @@ +// MIT License +// +// Copyright (c) 2025 Advanced Micro Devices, Inc. All rights reserved. +// +// Permission is hereby granted, free of charge, to any person obtaining a copy +// of this software and associated documentation files (the "Software"), to deal +// in the Software without restriction, including without limitation the rights +// to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +// copies of the Software, and to permit persons to whom the Software is +// furnished to do so, subject to the following conditions: +// +// The above copyright notice and this permission notice shall be included in all +// copies or substantial portions of the Software. +// +// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +// SOFTWARE. 
+ +// [sphinx-start] +#include + +#include +#include +#include + +#define HIP_CHECK(expression) \ +{ \ + const hipError_t status = expression; \ + if(status != hipSuccess) \ + { \ + std::cerr << "HIP error " \ + << status << ": " \ + << hipGetErrorString(status) \ + << " at " << __FILE__ << ":" \ + << __LINE__ << std::endl; \ + } \ +} + +int main() +{ + std::size_t stackSize; + HIP_CHECK(hipDeviceGetLimit(&stackSize, hipLimitStackSize)); + std::cout << "Default stack size: " << stackSize << " bytes" << std::endl; + + // Set a new stack size + std::size_t newStackSize = 1024 * 8; // 8 KiB + HIP_CHECK(hipDeviceSetLimit(hipLimitStackSize, newStackSize)); + + HIP_CHECK(hipDeviceGetLimit(&stackSize, hipLimitStackSize)); + std::cout << "Updated stack size: " << stackSize << " bytes" << std::endl; + + return EXIT_SUCCESS; +} +// [sphinx-end] diff --git a/docs/tools/example_codes/calling_global_functions.hip b/docs/tools/example_codes/calling_global_functions.hip new file mode 100644 index 0000000000..b58cff758d --- /dev/null +++ b/docs/tools/example_codes/calling_global_functions.hip @@ -0,0 +1,89 @@ +// MIT License +// +// Copyright (c) 2025 Advanced Micro Devices, Inc. All rights reserved. +// +// Permission is hereby granted, free of charge, to any person obtaining a copy +// of this software and associated documentation files (the "Software"), to deal +// in the Software without restriction, including without limitation the rights +// to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +// copies of the Software, and to permit persons to whom the Software is +// furnished to do so, subject to the following conditions: +// +// The above copyright notice and this permission notice shall be included in all +// copies or substantial portions of the Software. 
+// +// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +// SOFTWARE. + +// [sphinx-start] +#include + +#include + +#define HIP_CHECK(expression) \ +{ \ + const hipError_t err = expression; \ + if(err != hipSuccess) \ + { \ + std::cerr << "HIP error: " << hipGetErrorString(err) \ + << " at " << __LINE__ << "\n"; \ + } \ +} + +// Performs a simple initialization of an array with the thread's index variables. +// This function is only available in device code. +__device__ void init_array(float * const a, const unsigned int arraySize) +{ + // globalIdx uniquely identifies a thread in a 1D launch configuration. + const unsigned int globalIdx = threadIdx.x + blockIdx.x * blockDim.x; + // Each thread initializes a single element of the array. + if(globalIdx < arraySize) + { + a[globalIdx] = globalIdx; + } +} + +// Returns how many multiples are needed to cover number, i.e. ceil(number / multiple). +// This function is available in host and device code. +__host__ __device__ constexpr int round_up_to_nearest_multiple(int number, int multiple) +{ + return (number + multiple - 1) / multiple; +} + +__global__ void example_kernel(float * const a, const unsigned int N) +{ + // Initialize array. + init_array(a, N); + // Perform additional work: + // - work with the array + // - use the array in a different kernel + // - ...
+} + +int main() +{ + constexpr int N = 100000000; // problem size + constexpr int blockSize = 256; //configurable block size + + //needed number of blocks for the given problem size + constexpr int gridSize = round_up_to_nearest_multiple(N, blockSize); + + float *a; + // allocate memory on the GPU + HIP_CHECK(hipMalloc(&a, sizeof(*a) * N)); + + std::cout << "Launching kernel." << std::endl; + example_kernel<<>>(a, N); + // make sure kernel execution is finished by synchronizing. The CPU can also + // execute other instructions during that time + HIP_CHECK(hipDeviceSynchronize()); + std::cout << "Kernel execution finished." << std::endl; + + HIP_CHECK(hipFree(a)); +} +// [sphinx-end] diff --git a/docs/tools/example_codes/compilation_apis.cpp b/docs/tools/example_codes/compilation_apis.cpp new file mode 100644 index 0000000000..4c95ee2397 --- /dev/null +++ b/docs/tools/example_codes/compilation_apis.cpp @@ -0,0 +1,165 @@ +// MIT License +// +// Copyright (c) 2025 Advanced Micro Devices, Inc. All rights reserved. +// +// Permission is hereby granted, free of charge, to any person obtaining a copy +// of this software and associated documentation files (the "Software"), to deal +// in the Software without restriction, including without limitation the rights +// to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +// copies of the Software, and to permit persons to whom the Software is +// furnished to do so, subject to the following conditions: +// +// The above copyright notice and this permission notice shall be included in all +// copies or substantial portions of the Software. +// +// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. 
IN NO EVENT SHALL THE +// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +// SOFTWARE. + +// [sphinx-start] +#include +#include + +#include +#include +#include +#include +#include + +#define CHECK_RET_CODE(call, ret_code) \ +{ \ + if ((call) != ret_code) \ + { \ + std::cout << "Failed in call: " << #call << std::endl; \ + std::abort(); \ + } \ +} +#define HIP_CHECK(call) CHECK_RET_CODE(call, hipSuccess) +#define HIPRTC_CHECK(call) CHECK_RET_CODE(call, HIPRTC_SUCCESS) + +// source code for hiprtc +static constexpr auto kernel_source{ + R"( + extern "C" + __global__ void vector_add(float* output, float* input1, float* input2, size_t size) + { + int i = threadIdx.x; + if (i < size) + { + output[i] = input1[i] + input2[i]; + } + } +)"}; + +int main() +{ + hiprtcProgram prog; + auto rtc_ret_code = hiprtcCreateProgram(&prog, // HIPRTC program handle + kernel_source, // kernel source string + "vector_add.cpp", // Name of the file + 0, // Number of headers + nullptr, // Header sources + nullptr); // Name of header file + + if (rtc_ret_code != HIPRTC_SUCCESS) + { + std::cerr << "Failed to create program" << std::endl; + std::abort(); + } + + hipDeviceProp_t props; + int device = 0; + HIP_CHECK(hipGetDeviceProperties(&props, device)); + auto sarg = std::string{"--gpu-architecture="} + props.gcnArchName; // device for which binary is to be generated + + const char* options[] = {sarg.c_str()}; + + rtc_ret_code = hiprtcCompileProgram(prog, // hiprtcProgram + 1, // Number of options + options); // Clang Options + if (rtc_ret_code != HIPRTC_SUCCESS) + { + std::cerr << "Failed to compile program" << std::endl; + std::abort(); + } + + std::size_t logSize; + HIPRTC_CHECK(hiprtcGetProgramLogSize(prog, &logSize)); + + if (logSize) + { + std::string log(logSize, '\0'); +
HIPRTC_CHECK(hiprtcGetProgramLog(prog, &log[0])); + std::cerr << "Compilation failed or produced warnings: " << log << std::endl; + std::abort(); + } + + std::size_t codeSize; + HIPRTC_CHECK(hiprtcGetCodeSize(prog, &codeSize)); + + std::vector kernel_binary(codeSize); + HIPRTC_CHECK(hiprtcGetCode(prog, kernel_binary.data())); + + HIPRTC_CHECK(hiprtcDestroyProgram(&prog)); + + hipModule_t module; + hipFunction_t kernel; + + HIP_CHECK(hipModuleLoadData(&module, kernel_binary.data())); + HIP_CHECK(hipModuleGetFunction(&kernel, module, "vector_add")); + + constexpr std::size_t ele_size = 256; // total number of items to add + std::vector hinput, output; + hinput.reserve(ele_size); + output.reserve(ele_size); + for (std::size_t i = 0; i < ele_size; i++) + { + hinput.push_back(static_cast(i + 1)); + output.push_back(0.0f); + } + + float *dinput1, *dinput2, *doutput; + HIP_CHECK(hipMalloc(&dinput1, sizeof(float) * ele_size)); + HIP_CHECK(hipMalloc(&dinput2, sizeof(float) * ele_size)); + HIP_CHECK(hipMalloc(&doutput, sizeof(float) * ele_size)); + + HIP_CHECK(hipMemcpy(dinput1, hinput.data(), sizeof(float) * ele_size, hipMemcpyHostToDevice)); + HIP_CHECK(hipMemcpy(dinput2, hinput.data(), sizeof(float) * ele_size, hipMemcpyHostToDevice)); + + struct + { + float* output; + float* input1; + float* input2; + std::size_t size; + } args{doutput, dinput1, dinput2, ele_size}; + + auto size = sizeof(args); + void* config[] = {HIP_LAUNCH_PARAM_BUFFER_POINTER, &args, HIP_LAUNCH_PARAM_BUFFER_SIZE, &size, + HIP_LAUNCH_PARAM_END}; + + HIP_CHECK(hipModuleLaunchKernel(kernel, 1, 1, 1, ele_size, 1, 1, 0, nullptr, nullptr, config)); + + HIP_CHECK(hipMemcpy(output.data(), doutput, sizeof(float) * ele_size, hipMemcpyDeviceToHost)); + + for (std::size_t i = 0; i < ele_size; i++) + { + if ((hinput[i] + hinput[i]) != output[i]) + { + std::cout << "Failed in validation: " << (hinput[i] + hinput[i]) << " - " << output[i] << std::endl; + std::abort(); + } + } + std::cout << "Passed" << std::endl; + 
+ HIP_CHECK(hipFree(dinput1)); + HIP_CHECK(hipFree(dinput2)); + HIP_CHECK(hipFree(doutput)); + + return EXIT_SUCCESS; +} +// [sphinx-stop] diff --git a/docs/tools/example_codes/complex_math.hip b/docs/tools/example_codes/complex_math.hip new file mode 100644 index 0000000000..9a6edb7fcb --- /dev/null +++ b/docs/tools/example_codes/complex_math.hip @@ -0,0 +1,142 @@ +// MIT License +// +// Copyright (c) 2025 Advanced Micro Devices, Inc. All rights reserved. +// +// Permission is hereby granted, free of charge, to any person obtaining a copy +// of this software and associated documentation files (the "Software"), to deal +// in the Software without restriction, including without limitation the rights +// to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +// copies of the Software, and to permit persons to whom the Software is +// furnished to do so, subject to the following conditions: +// +// The above copyright notice and this permission notice shall be included in all +// copies or substantial portions of the Software. +// +// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +// SOFTWARE. 
+// [sphinx-start] +#include +#include + +#include +#include +#include +#include + +#define HIP_CHECK(expression) \ + { \ + const hipError_t err = expression; \ + if (err != hipSuccess) { \ + std::cerr << "HIP error: " \ + << hipGetErrorString(err) \ + << " at " << __LINE__ << "\n"; \ + exit(EXIT_FAILURE); \ + } \ + } + +// Kernel to compute DFT +__global__ void computeDFT(const float* input, hipFloatComplex* output, const int N) +{ + int k = blockIdx.x * blockDim.x + threadIdx.x; + if (k >= N) return; + + hipFloatComplex sum = make_hipFloatComplex(0.0f, 0.0f); + + for (int n = 0; n < N; n++) + { + float angle = -2.0f * M_PI * k * n / N; + hipFloatComplex w = make_hipFloatComplex(cosf(angle), sinf(angle)); + hipFloatComplex x = make_hipFloatComplex(input[n], 0.0f); + sum = hipCaddf(sum, hipCmulf(x, w)); + } + + output[k] = sum; +} + +// CPU implementation of DFT for verification +std::vector cpuDFT(const std::vector& input) +{ + const int N = input.size(); + std::vector result(N); + + for (int k = 0; k < N; k++) + { + hipFloatComplex sum = make_hipFloatComplex(0.0f, 0.0f); + for (int n = 0; n < N; n++) + { + float angle = -2.0f * M_PI * k * n / N; + hipFloatComplex w = make_hipFloatComplex(cosf(angle), sinf(angle)); + hipFloatComplex x = make_hipFloatComplex(input[n], 0.0f); + sum = hipCaddf(sum, hipCmulf(x, w)); + } + result[k] = sum; + } + return result; +} + +int main() +{ + const int N = 256; // Signal length + const int blockSize = 256; + + // Generate input signal: sum of two sine waves + std::vector signal(N); + for (int i = 0; i < N; i++) + { + float t = static_cast(i) / N; + signal[i] = sinf(2.0f * M_PI * 10.0f * t) + // 10 Hz component + 0.5f * sinf(2.0f * M_PI * 20.0f * t); // 20 Hz component + } + + // Compute reference solution on CPU + std::vector cpu_output = cpuDFT(signal); + + // Allocate device memory + float* d_signal; + hipFloatComplex* d_output; + HIP_CHECK(hipMalloc(&d_signal, N * sizeof(float))); + HIP_CHECK(hipMalloc(&d_output, N * 
sizeof(hipFloatComplex))); + + // Copy input to device + HIP_CHECK(hipMemcpy(d_signal, signal.data(), N * sizeof(float), hipMemcpyHostToDevice)); + + // Launch kernel + dim3 grid((N + blockSize - 1) / blockSize); + dim3 block(blockSize); + computeDFT<<>>(d_signal, d_output, N); + HIP_CHECK(hipGetLastError()); + + // Get GPU results + std::vector gpu_output(N); + HIP_CHECK(hipMemcpy(gpu_output.data(), d_output, N * sizeof(hipFloatComplex), hipMemcpyDeviceToHost)); + + // Verify results + bool passed = true; + const float tolerance = 1e-5f; // Adjust based on precision requirements + + for (int i = 0; i < N; i++) + { + float diff_real = std::abs(hipCrealf(gpu_output[i]) - hipCrealf(cpu_output[i])); + float diff_imag = std::abs(hipCimagf(gpu_output[i]) - hipCimagf(cpu_output[i])); + + if (diff_real > tolerance || diff_imag > tolerance) + { + passed = false; + break; + } + } + + std::cout << "DFT Verification: " << (passed ? "PASSED" : "FAILED") << "\n"; + + // Cleanup + HIP_CHECK(hipFree(d_signal)); + HIP_CHECK(hipFree(d_output)); + + return passed ? EXIT_SUCCESS : EXIT_FAILURE; +} +// [sphinx-end] diff --git a/docs/tools/example_codes/constant_memory_device.hip b/docs/tools/example_codes/constant_memory_device.hip new file mode 100644 index 0000000000..8ab180452d --- /dev/null +++ b/docs/tools/example_codes/constant_memory_device.hip @@ -0,0 +1,75 @@ +// MIT License +// +// Copyright (c) 2025 Advanced Micro Devices, Inc. All rights reserved. 
+// +// Permission is hereby granted, free of charge, to any person obtaining a copy +// of this software and associated documentation files (the "Software"), to deal +// in the Software without restriction, including without limitation the rights +// to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +// copies of the Software, and to permit persons to whom the Software is +// furnished to do so, subject to the following conditions: +// +// The above copyright notice and this permission notice shall be included in all +// copies or substantial portions of the Software. +// +// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +// SOFTWARE. 
+ +#include "example_utils.hpp" + +#include + +#include +#include +#include + +// [sphinx-start] +constexpr std::size_t const_array_size = 32; +__constant__ double const_array[const_array_size]; + +void set_constant_memory(double* values) +{ + HIP_CHECK(hipMemcpyToSymbol(const_array, values, const_array_size * sizeof(double))); +} + +__global__ void kernel_using_const_memory(double* array) +{ + int warpIdx = threadIdx.x / warpSize; + // uniform access of warps to const_array for best performance + array[blockIdx.x] *= const_array[warpIdx]; +} +// [sphinx-end] + +int main() +{ + std::size_t elements = 32; + std::size_t size_bytes = elements * sizeof(double); + + // allocate and zero-initialize host array + double *host_array = new double[elements](); + + // allocate device array + double *device_array = nullptr; + HIP_CHECK(hipMalloc((double**) &device_array, size_bytes)); + + // copy the host values into constant memory on the device + set_constant_memory(host_array); + + kernel_using_const_memory<<<32, 32>>>(device_array); + + // copy from device to host, to e.g. get results from the kernel + HIP_CHECK(hipMemcpy(host_array, device_array, size_bytes, hipMemcpyDeviceToHost)); + + // free memory when not needed any more + HIP_CHECK(hipFree(device_array)); + delete[] host_array; + + std::cout << "Success!" << std::endl; + + return EXIT_SUCCESS; +} diff --git a/docs/tools/example_codes/data_prefetching.hip b/docs/tools/example_codes/data_prefetching.hip new file mode 100644 index 0000000000..7984c0ec25 --- /dev/null +++ b/docs/tools/example_codes/data_prefetching.hip @@ -0,0 +1,84 @@ +// MIT License +// +// Copyright (c) 2025 Advanced Micro Devices, Inc. All rights reserved.
+// +// Permission is hereby granted, free of charge, to any person obtaining a copy +// of this software and associated documentation files (the "Software"), to deal +// in the Software without restriction, including without limitation the rights +// to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +// copies of the Software, and to permit persons to whom the Software is +// furnished to do so, subject to the following conditions: +// +// The above copyright notice and this permission notice shall be included in all +// copies or substantial portions of the Software. +// +// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +// SOFTWARE. + +// [sphinx-start] +#include + +#include + +#define HIP_CHECK(expression) \ +{ \ + const hipError_t err = expression; \ + if(err != hipSuccess) \ + { \ + std::cerr << "HIP error: " \ + << hipGetErrorString(err) \ + << " at " << __LINE__ << "\n"; \ + } \ +} + +// Addition of two values. +__global__ void add(int *a, int *b, int *c) +{ + *c = *a + *b; +} + +int main() +{ + int *a, *b, *c; + int deviceId; + HIP_CHECK(hipGetDevice(&deviceId)); // Get the current device ID + + // Allocate memory for a, b and c that is accessible to both device and host codes. + HIP_CHECK(hipMallocManaged(&a, sizeof(*a))); + HIP_CHECK(hipMallocManaged(&b, sizeof(*b))); + HIP_CHECK(hipMallocManaged(&c, sizeof(*c))); + + // Setup input values. + *a = 1; + *b = 2; + + // Prefetch the data to the GPU device. 
+ HIP_CHECK(hipMemPrefetchAsync(a, sizeof(*a), deviceId, 0)); + HIP_CHECK(hipMemPrefetchAsync(b, sizeof(*b), deviceId, 0)); + HIP_CHECK(hipMemPrefetchAsync(c, sizeof(*c), deviceId, 0)); + + // Launch add() kernel on GPU. + add<<<1, 1>>>(a, b, c); + + // Prefetch the result back to the CPU. + HIP_CHECK(hipMemPrefetchAsync(c, sizeof(*c), hipCpuDeviceId, 0)); + + // Wait for the prefetch operations to complete. + HIP_CHECK(hipDeviceSynchronize()); + + // Prints the result. + std::cout << *a << " + " << *b << " = " << *c << std::endl; + + // Cleanup allocated memory. + HIP_CHECK(hipFree(a)); + HIP_CHECK(hipFree(b)); + HIP_CHECK(hipFree(c)); + + return 0; +} +// [sphinx-end] \ No newline at end of file diff --git a/docs/tools/example_codes/device_code_feature_identification.hip b/docs/tools/example_codes/device_code_feature_identification.hip new file mode 100644 index 0000000000..deeefd4d55 --- /dev/null +++ b/docs/tools/example_codes/device_code_feature_identification.hip @@ -0,0 +1,61 @@ +// MIT License +// +// Copyright (c) 2025 Advanced Micro Devices, Inc. All rights reserved. +// +// Permission is hereby granted, free of charge, to any person obtaining a copy +// of this software and associated documentation files (the "Software"), to deal +// in the Software without restriction, including without limitation the rights +// to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +// copies of the Software, and to permit persons to whom the Software is +// furnished to do so, subject to the following conditions: +// +// The above copyright notice and this permission notice shall be included in all +// copies or substantial portions of the Software. +// +// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. 
IN NO EVENT SHALL THE +// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +// SOFTWARE. + +#include + +#include +#include + +#define HIP_CHECK(expression) \ +{ \ + const hipError_t err = expression; \ + if (err != hipSuccess) \ + { \ + std::cout << "HIP Error: " << hipGetErrorString(err) \ + << " at line " << __LINE__ << std::endl; \ + std::exit(EXIT_FAILURE); \ + } \ +} + +__global__ void test_kernel() +{ + // [sphinx-start] +//#if __CUDA_ARCH__ >= 130 // does not state which feature is required; not portable +#if __HIP_ARCH_HAS_DOUBLES__ == 1 // states explicitly which feature is required; portable between AMD and NVIDIA GPUs + // device code +#endif + // [sphinx-end] + +#if __HIP_ARCH_HAS_DOUBLES__ == 1 + printf("Device has double-precision support.\n"); +#else + printf("Device does not have double-precision support.\n"); +#endif +} + +int main() +{ + test_kernel<<<1, 1, 0, 0>>>(); + HIP_CHECK(hipDeviceSynchronize()); + + return EXIT_SUCCESS; +} diff --git a/docs/tools/example_codes/device_enumeration.cpp b/docs/tools/example_codes/device_enumeration.cpp new file mode 100644 index 0000000000..b98dc00716 --- /dev/null +++ b/docs/tools/example_codes/device_enumeration.cpp @@ -0,0 +1,74 @@ +// MIT License +// +// Copyright (c) 2025 Advanced Micro Devices, Inc. All rights reserved.
+// +// Permission is hereby granted, free of charge, to any person obtaining a copy +// of this software and associated documentation files (the "Software"), to deal +// in the Software without restriction, including without limitation the rights +// to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +// copies of the Software, and to permit persons to whom the Software is +// furnished to do so, subject to the following conditions: +// +// The above copyright notice and this permission notice shall be included in all +// copies or substantial portions of the Software. +// +// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +// SOFTWARE. 
+ +// [sphinx-start] +#include + +#include +#include + +#define HIP_CHECK(expression) \ +{ \ + const hipError_t status = expression; \ + if (status != hipSuccess) \ + { \ + std::cerr << "HIP error " << status \ + << ": " << hipGetErrorString(status) \ + << " at " << __FILE__ << ":" \ + << __LINE__ << std::endl; \ + std::exit(EXIT_FAILURE); \ + } \ +} + +int main() +{ + int deviceCount; + HIP_CHECK(hipGetDeviceCount(&deviceCount)); + std::cout << "Number of devices: " << deviceCount << std::endl; + + for (int deviceId = 0; deviceId < deviceCount; ++deviceId) + { + hipDeviceProp_t deviceProp; + HIP_CHECK(hipGetDeviceProperties(&deviceProp, deviceId)); + std::cout << "Device " << deviceId << std::endl << " Properties:" << std::endl; + std::cout << " Name: " << deviceProp.name << std::endl; + std::cout << " Total Global Memory: " << deviceProp.totalGlobalMem / (1024 * 1024) << " MiB" << std::endl; + std::cout << " Shared Memory per Block: " << deviceProp.sharedMemPerBlock / 1024 << " KiB" << std::endl; + std::cout << " Registers per Block: " << deviceProp.regsPerBlock << std::endl; + std::cout << " Warp Size: " << deviceProp.warpSize << std::endl; + std::cout << " Max Threads per Block: " << deviceProp.maxThreadsPerBlock << std::endl; + std::cout << " Max Threads per Multiprocessor: " << deviceProp.maxThreadsPerMultiProcessor << std::endl; + std::cout << " Number of Multiprocessors: " << deviceProp.multiProcessorCount << std::endl; + std::cout << " Max Threads Dimensions: [" + << deviceProp.maxThreadsDim[0] << ", " + << deviceProp.maxThreadsDim[1] << ", " + << deviceProp.maxThreadsDim[2] << "]" << std::endl; + std::cout << " Max Grid Size: [" + << deviceProp.maxGridSize[0] << ", " + << deviceProp.maxGridSize[1] << ", " + << deviceProp.maxGridSize[2] << "]" << std::endl; + std::cout << std::endl; + } + + return EXIT_SUCCESS; +} +// [sphinx-end] diff --git a/docs/tools/example_codes/device_recursion.hip b/docs/tools/example_codes/device_recursion.hip new file mode 100644 
index 0000000000..1a0ca56afd --- /dev/null +++ b/docs/tools/example_codes/device_recursion.hip @@ -0,0 +1,72 @@ +// MIT License +// +// Copyright (c) 2025 Advanced Micro Devices, Inc. All rights reserved. +// +// Permission is hereby granted, free of charge, to any person obtaining a copy +// of this software and associated documentation files (the "Software"), to deal +// in the Software without restriction, including without limitation the rights +// to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +// copies of the Software, and to permit persons to whom the Software is +// furnished to do so, subject to the following conditions: +// +// The above copyright notice and this permission notice shall be included in all +// copies or substantial portions of the Software. +// +// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +// SOFTWARE. + +// [sphinx-start] +#include + +#include +#include +#include + +#define HIP_CHECK(expression) \ +{ \ + const hipError_t status = expression; \ + if(status != hipSuccess) \ + { \ + std::cerr << "HIP error " \ + << status << ": " \ + << hipGetErrorString(status) \ + << " at " << __FILE__ << ":" \ + << __LINE__ << std::endl; \ + } \ +} + +__device__ unsigned long long fibonacci(unsigned long long n) +{ + if (n == 0 || n == 1) + { + return n; + } + return fibonacci(n - 1) + fibonacci(n - 2); +} + +__global__ void kernel(unsigned long long n) +{ + unsigned long long result = fibonacci(n); + const std::size_t x = threadIdx.x + blockDim.x * blockIdx.x; + + if (x == 0) + printf("fibonacci(%llu)
= %llu \n", n, result); +} + +int main() +{ + kernel<<<1, 1>>>(10); + HIP_CHECK(hipDeviceSynchronize()); + + // With -O0 optimization option hit the stack limit + // kernel<<<1, 256>>>(2048); + // HIP_CHECK(hipDeviceSynchronize()); + + return EXIT_SUCCESS; +} +// [sphinx-end] diff --git a/docs/tools/example_codes/device_selection.hip b/docs/tools/example_codes/device_selection.hip new file mode 100644 index 0000000000..5bde4e3304 --- /dev/null +++ b/docs/tools/example_codes/device_selection.hip @@ -0,0 +1,98 @@ +// MIT License +// +// Copyright (c) 2025 Advanced Micro Devices, Inc. All rights reserved. +// +// Permission is hereby granted, free of charge, to any person obtaining a copy +// of this software and associated documentation files (the "Software"), to deal +// in the Software without restriction, including without limitation the rights +// to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +// copies of the Software, and to permit persons to whom the Software is +// furnished to do so, subject to the following conditions: +// +// The above copyright notice and this permission notice shall be included in all +// copies or substantial portions of the Software. +// +// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +// SOFTWARE. 
+ +// [sphinx-start] +#include + +#include +#include +#include + +#define HIP_CHECK(expression) \ +{ \ + const hipError_t status = expression; \ + if (status != hipSuccess) \ + { \ + std::cerr << "HIP error " << status \ + << ": " << hipGetErrorString(status) \ + << " at " << __FILE__ << ":" \ + << __LINE__ << std::endl; \ + std::exit(EXIT_FAILURE); \ + } \ +} + +__global__ void simpleKernel(double *data) +{ + int idx = blockIdx.x * blockDim.x + threadIdx.x; + data[idx] = idx * 2.0; +} + +int main() +{ + int deviceCount; + HIP_CHECK(hipGetDeviceCount(&deviceCount)); + if(deviceCount < 2) + { + std::cout << "This example requires at least two HIP devices." << std::endl; + return EXIT_SUCCESS; + } + + double* deviceData0; + double* deviceData1; + std::size_t size = 1024 * sizeof(*deviceData0); + + int deviceId0 = 0; + int deviceId1 = 1; + + // Set device 0 and perform operations + HIP_CHECK(hipSetDevice(deviceId0)); // Set device 0 as current + HIP_CHECK(hipMalloc(&deviceData0, size)); // Allocate memory on device 0 + simpleKernel<<<1000, 128>>>(deviceData0); // Launch kernel on device 0 + HIP_CHECK(hipDeviceSynchronize()); + + // Set device 1 and perform operations + HIP_CHECK(hipSetDevice(deviceId1)); // Set device 1 as current + HIP_CHECK(hipMalloc(&deviceData1, size)); // Allocate memory on device 1 + simpleKernel<<<1000, 128>>>(deviceData1); // Launch kernel on device 1 + HIP_CHECK(hipDeviceSynchronize()); + + // Copy result from device 0 + double hostData0[1024]; + HIP_CHECK(hipSetDevice(deviceId0)); + HIP_CHECK(hipMemcpy(hostData0, deviceData0, size, hipMemcpyDeviceToHost)); + + // Copy result from device 1 + double hostData1[1024]; + HIP_CHECK(hipSetDevice(deviceId1)); + HIP_CHECK(hipMemcpy(hostData1, deviceData1, size, hipMemcpyDeviceToHost)); + + // Display results from both devices + std::cout << "Device 0 data: " << hostData0[0] << std::endl; + std::cout << "Device 1 data: " << hostData1[0] << std::endl; + + // Free device memory + 
HIP_CHECK(hipFree(deviceData0)); + HIP_CHECK(hipFree(deviceData1)); + + return EXIT_SUCCESS; +} +// [sphinx-end] diff --git a/docs/tools/example_codes/dynamic_shared_memory_device.hip b/docs/tools/example_codes/dynamic_shared_memory_device.hip new file mode 100644 index 0000000000..8f9c356757 --- /dev/null +++ b/docs/tools/example_codes/dynamic_shared_memory_device.hip @@ -0,0 +1,64 @@ +// MIT License +// +// Copyright (c) 2025 Advanced Micro Devices, Inc. All rights reserved. +// +// Permission is hereby granted, free of charge, to any person obtaining a copy +// of this software and associated documentation files (the "Software"), to deal +// in the Software without restriction, including without limitation the rights +// to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +// copies of the Software, and to permit persons to whom the Software is +// furnished to do so, subject to the following conditions: +// +// The above copyright notice and this permission notice shall be included in all +// copies or substantial portions of the Software. +// +// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +// SOFTWARE. 
+ +#include "example_utils.hpp" + +#include + +#include +#include +#include + +// [sphinx-start] +extern __shared__ int dynamic_shared[]; + +__global__ void kernel(int array1SizeX, int array1SizeY, int array2Size) +{ + // at least (array1SizeX * array1SizeY + array2Size) * sizeof(int) bytes + // dynamic shared memory need to be allocated when the kernel is launched + int* array1 = dynamic_shared; + // array1 is interpreted as 2D of size: + int array1Size = array1SizeX * array1SizeY; + + int* array2 = &(array1[array1Size]); + + if(threadIdx.x < array1SizeX && threadIdx.y < array1SizeY) + { + // access array1 with threadIdx.x + threadIdx.y * array1SizeX + } + if(threadIdx.x < array2Size) + { + // access array2 threadIdx.x + } +} +// [sphinx-end] + +int main() +{ + std::size_t shared_memory_bytes = 512 * sizeof(int); + kernel<<<64, 512, shared_memory_bytes>>>(512, 1, 512); + HIP_CHECK(hipPeekAtLastError()); + HIP_CHECK(hipDeviceSynchronize()); + + std::cout << "Success!" << std::endl; + return EXIT_SUCCESS; +} diff --git a/docs/tools/example_codes/dynamic_unified_memory.hip b/docs/tools/example_codes/dynamic_unified_memory.hip new file mode 100644 index 0000000000..d5c5fb00ba --- /dev/null +++ b/docs/tools/example_codes/dynamic_unified_memory.hip @@ -0,0 +1,74 @@ +// MIT License +// +// Copyright (c) 2025 Advanced Micro Devices, Inc. All rights reserved. +// +// Permission is hereby granted, free of charge, to any person obtaining a copy +// of this software and associated documentation files (the "Software"), to deal +// in the Software without restriction, including without limitation the rights +// to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +// copies of the Software, and to permit persons to whom the Software is +// furnished to do so, subject to the following conditions: +// +// The above copyright notice and this permission notice shall be included in all +// copies or substantial portions of the Software. 
+// +// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +// SOFTWARE. + +// [sphinx-start] +#include + +#include + +#define HIP_CHECK(expression) \ +{ \ + const hipError_t err = expression; \ + if(err != hipSuccess) \ + { \ + std::cerr << "HIP error: " \ + << hipGetErrorString(err) \ + << " at " << __LINE__ << "\n"; \ + } \ +} + +// Addition of two values. +__global__ void add(int *a, int *b, int *c) +{ + *c = *a + *b; +} + +int main() +{ + int *a, *b, *c; + + // Allocate memory for a, b and c that is accessible to both device and host codes. + HIP_CHECK(hipMallocManaged(&a, sizeof(*a))); + HIP_CHECK(hipMallocManaged(&b, sizeof(*b))); + HIP_CHECK(hipMallocManaged(&c, sizeof(*c))); + + // Setup input values. + *a = 1; + *b = 2; + + // Launch add() kernel on GPU. + add<<<1, 1>>>(a, b, c); + + // Wait for GPU to finish before accessing on host. + HIP_CHECK(hipDeviceSynchronize()); + + // Print the result. + std::cout << *a << " + " << *b << " = " << *c << std::endl; + + // Cleanup allocated memory. + HIP_CHECK(hipFree(a)); + HIP_CHECK(hipFree(b)); + HIP_CHECK(hipFree(c)); + + return 0; +} +// [sphinx-end] diff --git a/docs/tools/example_codes/error_handling.hip b/docs/tools/example_codes/error_handling.hip new file mode 100644 index 0000000000..78fd89d985 --- /dev/null +++ b/docs/tools/example_codes/error_handling.hip @@ -0,0 +1,97 @@ +// MIT License +// +// Copyright (c) 2025 Advanced Micro Devices, Inc. All rights reserved. 
+// +// Permission is hereby granted, free of charge, to any person obtaining a copy +// of this software and associated documentation files (the "Software"), to deal +// in the Software without restriction, including without limitation the rights +// to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +// copies of the Software, and to permit persons to whom the Software is +// furnished to do so, subject to the following conditions: +// +// The above copyright notice and this permission notice shall be included in all +// copies or substantial portions of the Software. +// +// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +// SOFTWARE. + +// [sphinx-start] +#include + +#include +#include +#include +#include +#include + +#define HIP_CHECK(expression) \ +{ \ + const hipError_t status = expression; \ + if(status != hipSuccess) \ + { \ + std::cerr << "HIP error " \ + << status << ": " \ + << hipGetErrorString(status) \ + << " at " << __FILE__ << ":" \ + << __LINE__ << std::endl; \ + } \ +} + +// Element-wise addition of two arrays. +__global__ void add(int *a, int *b, int *c, std::size_t size) +{ + const std::size_t index = threadIdx.x + blockDim.x * blockIdx.x; + if(index < size) + { + c[index] = a[index] + b[index]; + } +} + +int main() +{ + constexpr int numOfBlocks = 256; + constexpr int threadsPerBlock = 256; + constexpr std::size_t arraySize = 1U << 16; + + std::vector<int> a(arraySize), b(arraySize), c(arraySize); + int *d_a, *d_b, *d_c; + + // Setup input values.
+ std::fill(a.begin(), a.end(), 1); + std::fill(b.begin(), b.end(), 2); + + // Allocate device copies of a, b and c. + HIP_CHECK(hipMalloc(&d_a, arraySize * sizeof(int))); + HIP_CHECK(hipMalloc(&d_b, arraySize * sizeof(int))); + HIP_CHECK(hipMalloc(&d_c, arraySize * sizeof(int))); + + // Copy input values to device. + HIP_CHECK(hipMemcpy(d_a, a.data(), arraySize * sizeof(int), hipMemcpyHostToDevice)); + HIP_CHECK(hipMemcpy(d_b, b.data(), arraySize * sizeof(int), hipMemcpyHostToDevice)); + + // Launch add() kernel on GPU. + add<<<numOfBlocks, threadsPerBlock>>>(d_a, d_b, d_c, arraySize); + // Check the kernel launch + HIP_CHECK(hipGetLastError()); + // Check for kernel execution error + HIP_CHECK(hipDeviceSynchronize()); + + // Copy the result back to the host. + HIP_CHECK(hipMemcpy(c.data(), d_c, arraySize * sizeof(int), hipMemcpyDeviceToHost)); + + // Cleanup allocated memory. + HIP_CHECK(hipFree(d_a)); + HIP_CHECK(hipFree(d_b)); + HIP_CHECK(hipFree(d_c)); + + // Print the result. + std::cout << a[0] << " + " << b[0] << " = " << c[0] << std::endl; + + return EXIT_SUCCESS; +} +// [sphinx-end] diff --git a/docs/tools/example_codes/event_based_synchronization.hip b/docs/tools/example_codes/event_based_synchronization.hip new file mode 100644 index 0000000000..d49137530a --- /dev/null +++ b/docs/tools/example_codes/event_based_synchronization.hip @@ -0,0 +1,153 @@ +// MIT License +// +// Copyright (c) 2025 Advanced Micro Devices, Inc. All rights reserved.
+// +// Permission is hereby granted, free of charge, to any person obtaining a copy +// of this software and associated documentation files (the "Software"), to deal +// in the Software without restriction, including without limitation the rights +// to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +// copies of the Software, and to permit persons to whom the Software is +// furnished to do so, subject to the following conditions: +// +// The above copyright notice and this permission notice shall be included in all +// copies or substantial portions of the Software. +// +// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +// SOFTWARE + +// [sphinx-start] +#include + +#include +#include +#include +#include + +#define HIP_CHECK(expression) \ +{ \ + const hipError_t status = expression; \ + if(status != hipSuccess) \ + { \ + std::cerr << "HIP error " \ + << status << ": " \ + << hipGetErrorString(status) \ + << " at " << __FILE__ << ":" \ + << __LINE__ << std::endl; \ + } \ +} + +// GPU Kernels +__global__ void kernelA(double* arrayA, std::size_t size) +{ + const std::size_t x = threadIdx.x + blockDim.x * blockIdx.x; + if(x < size) + { + arrayA[x] += 1.0; + } +} + +__global__ void kernelB(double* arrayA, double* arrayB, std::size_t size) +{ + const std::size_t x = threadIdx.x + blockDim.x * blockIdx.x; + if(x < size) + { + arrayB[x] += arrayA[x] + 3.0; + } +} + +int main() +{ + constexpr int numOfBlocks = 1 << 20; + constexpr int threadsPerBlock = 1024; + constexpr int numberOfIterations = 50; + // The array size smaller to 
avoid the relatively short kernel launch compared to memory copies + constexpr std::size_t arraySize = 1U << 25; + double *d_dataA; + double *d_dataB; + double initValueA = 0.0; + double initValueB = 2.0; + + std::vector<double> vectorA(arraySize, initValueA); + std::vector<double> vectorB(arraySize, initValueB); + // Allocate device memory + HIP_CHECK(hipMalloc(&d_dataA, arraySize * sizeof(*d_dataA))); + HIP_CHECK(hipMalloc(&d_dataB, arraySize * sizeof(*d_dataB))); + // Create streams + hipStream_t streamA, streamB; + HIP_CHECK(hipStreamCreate(&streamA)); + HIP_CHECK(hipStreamCreate(&streamB)); + // Create events + hipEvent_t event, eventA, eventB; + HIP_CHECK(hipEventCreate(&event)); + HIP_CHECK(hipEventCreate(&eventA)); + HIP_CHECK(hipEventCreate(&eventB)); + for(unsigned int iteration = 0; iteration < numberOfIterations; iteration++) + { + // Stream 1: Host to Device 1 + HIP_CHECK(hipMemcpyAsync(d_dataA, vectorA.data(), arraySize * sizeof(*d_dataA), hipMemcpyHostToDevice, streamA)); + // Stream 2: Host to Device 2 + HIP_CHECK(hipMemcpyAsync(d_dataB, vectorB.data(), arraySize * sizeof(*d_dataB), hipMemcpyHostToDevice, streamB)); + // Stream 1: Kernel 1 + kernelA<<<numOfBlocks, threadsPerBlock, 0, streamA>>>(d_dataA, arraySize); + // Record event after the GPU kernel in Stream 1 + HIP_CHECK(hipEventRecord(event, streamA)); + // Stream 2: Wait for event before starting Kernel 2 + HIP_CHECK(hipStreamWaitEvent(streamB, event, 0)); + // Stream 2: Kernel 2 + kernelB<<<numOfBlocks, threadsPerBlock, 0, streamB>>>(d_dataA, d_dataB, arraySize); + // Stream 1: Device to Host 1 (after Kernel 1) + HIP_CHECK(hipMemcpyAsync(vectorA.data(), d_dataA, arraySize * sizeof(*vectorA.data()), hipMemcpyDeviceToHost, streamA)); + // Stream 2: Device to Host 2 (after Kernel 2) + HIP_CHECK(hipMemcpyAsync(vectorB.data(), d_dataB, arraySize * sizeof(*vectorB.data()), hipMemcpyDeviceToHost, streamB)); + // Block the host until all operations in both streams have completed + HIP_CHECK(hipEventRecord(eventA, streamA)); + HIP_CHECK(hipEventRecord(eventB, streamB)); + HIP_CHECK(hipEventSynchronize(eventA));
+ HIP_CHECK(hipEventSynchronize(eventB)); + } + // Verify results + double expectedA = (double)numberOfIterations; + double expectedB = initValueB + (3.0 * numberOfIterations) + (expectedA * (expectedA + 1.0)) / 2.0; + bool passed = true; + for(std::size_t i = 0; i < arraySize; ++i) + { + if(vectorA[i] != expectedA) + { + passed = false; + std::cerr << "Validation failed! Expected " << expectedA << " got " << vectorA[i] << std::endl; + break; + } + if(vectorB[i] != expectedB) + { + passed = false; + std::cerr << "Validation failed! Expected " << expectedB << " got " << vectorB[i] << std::endl; + break; + } + } + + if(passed) + { + std::cout << "Asynchronous execution with events completed successfully." << std::endl; + } + else + { + std::cerr << "Asynchronous execution with events failed." << std::endl; + } + + // Cleanup + HIP_CHECK(hipEventDestroy(event)); + HIP_CHECK(hipEventDestroy(eventA)); + HIP_CHECK(hipEventDestroy(eventB)); + HIP_CHECK(hipStreamDestroy(streamA)); + HIP_CHECK(hipStreamDestroy(streamB)); + HIP_CHECK(hipFree(d_dataA)); + HIP_CHECK(hipFree(d_dataB)); + + return 0; +} +// [sphinx-end] diff --git a/docs/tools/example_codes/explicit_copy.cpp b/docs/tools/example_codes/explicit_copy.cpp new file mode 100644 index 0000000000..49a40d6771 --- /dev/null +++ b/docs/tools/example_codes/explicit_copy.cpp @@ -0,0 +1,58 @@ +// MIT License +// +// Copyright (c) 2025 Advanced Micro Devices, Inc. All rights reserved.
+// +// Permission is hereby granted, free of charge, to any person obtaining a copy +// of this software and associated documentation files (the "Software"), to deal +// in the Software without restriction, including without limitation the rights +// to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +// copies of the Software, and to permit persons to whom the Software is +// furnished to do so, subject to the following conditions: +// +// The above copyright notice and this permission notice shall be included in all +// copies or substantial portions of the Software. +// +// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +// SOFTWARE. + +#include "example_utils.hpp" + +#include + +#include +#include +#include + +int main() +{ + // [sphinx-start] + std::size_t elements = 1 << 20; + std::size_t size_bytes = elements * sizeof(int); + + // allocate host and device memory + int *host_pointer = new int[elements]; + int *device_input, *device_result; + HIP_CHECK(hipMalloc(&device_input, size_bytes)); + HIP_CHECK(hipMalloc(&device_result, size_bytes)); + + // copy from host to the device + HIP_CHECK(hipMemcpy(device_input, host_pointer, size_bytes, hipMemcpyHostToDevice)); + + // Use memory on the device, i.e. execute kernels + + // copy from device to host, to e.g. 
get results from the kernel + HIP_CHECK(hipMemcpy(host_pointer, device_result, size_bytes, hipMemcpyDeviceToHost)); + + // free memory when not needed any more + HIP_CHECK(hipFree(device_result)); + HIP_CHECK(hipFree(device_input)); + delete[] host_pointer; + // [sphinx-end] + + return EXIT_SUCCESS; +} diff --git a/docs/tools/example_codes/explicit_memory.hip b/docs/tools/example_codes/explicit_memory.hip new file mode 100644 index 0000000000..975611afcb --- /dev/null +++ b/docs/tools/example_codes/explicit_memory.hip @@ -0,0 +1,79 @@ +// MIT License +// +// Copyright (c) 2025 Advanced Micro Devices, Inc. All rights reserved. +// +// Permission is hereby granted, free of charge, to any person obtaining a copy +// of this software and associated documentation files (the "Software"), to deal +// in the Software without restriction, including without limitation the rights +// to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +// copies of the Software, and to permit persons to whom the Software is +// furnished to do so, subject to the following conditions: +// +// The above copyright notice and this permission notice shall be included in all +// copies or substantial portions of the Software. +// +// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +// SOFTWARE. 
+ +// [sphinx-start] +#include + +#include + +#define HIP_CHECK(expression) \ +{ \ + const hipError_t err = expression; \ + if(err != hipSuccess) \ + { \ + std::cerr << "HIP error: " \ + << hipGetErrorString(err) \ + << " at " << __LINE__ << "\n"; \ + } \ +} + +// Addition of two values. +__global__ void add(int *a, int *b, int *c) +{ + *c = *a + *b; +} + +int main() +{ + int a, b, c; + int *d_a, *d_b, *d_c; + + // Setup input values. + a = 1; + b = 2; + + // Allocate device copies of a, b and c. + HIP_CHECK(hipMalloc(&d_a, sizeof(*d_a))); + HIP_CHECK(hipMalloc(&d_b, sizeof(*d_b))); + HIP_CHECK(hipMalloc(&d_c, sizeof(*d_c))); + + // Copy input values to device. + HIP_CHECK(hipMemcpy(d_a, &a, sizeof(*d_a), hipMemcpyHostToDevice)); + HIP_CHECK(hipMemcpy(d_b, &b, sizeof(*d_b), hipMemcpyHostToDevice)); + + // Launch add() kernel on GPU. + add<<<1, 1>>>(d_a, d_b, d_c); + + // Copy the result back to the host. + HIP_CHECK(hipMemcpy(&c, d_c, sizeof(*d_c), hipMemcpyDeviceToHost)); + + // Cleanup allocated memory. + HIP_CHECK(hipFree(d_a)); + HIP_CHECK(hipFree(d_b)); + HIP_CHECK(hipFree(d_c)); + + // Prints the result. + std::cout << a << " + " << b << " = " << c << std::endl; + + return 0; +} +// [sphinx-end] diff --git a/docs/tools/example_codes/extern_shared_memory.hip b/docs/tools/example_codes/extern_shared_memory.hip new file mode 100644 index 0000000000..9ba4283035 --- /dev/null +++ b/docs/tools/example_codes/extern_shared_memory.hip @@ -0,0 +1,53 @@ +// MIT License +// +// Copyright (c) 2025 Advanced Micro Devices, Inc. All rights reserved. 
+// +// Permission is hereby granted, free of charge, to any person obtaining a copy +// of this software and associated documentation files (the "Software"), to deal +// in the Software without restriction, including without limitation the rights +// to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +// copies of the Software, and to permit persons to whom the Software is +// furnished to do so, subject to the following conditions: +// +// The above copyright notice and this permission notice shall be included in all +// copies or substantial portions of the Software. +// +// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +// SOFTWARE. 
+ +// [sphinx-start] +#include + +#include +#include + +extern __shared__ int shared_array[]; + +__global__ void kernel() +{ + // initialize shared memory + shared_array[threadIdx.x] = threadIdx.x; + // use shared memory - synchronize to make sure, that all threads of the + // block see all changes to shared memory + __syncthreads(); +} + +int main() +{ + //shared memory in this case depends on the configurable block size + constexpr int blockSize = 256; + constexpr int sharedMemSize = blockSize * sizeof(int); + constexpr int gridSize = 2; + + kernel<<<gridSize, blockSize, sharedMemSize>>>(); + if(auto err = hipDeviceSynchronize(); err != hipSuccess) + std::cerr << "HIP error " << err << ": " << hipGetErrorString(err) << std::endl; + + return EXIT_SUCCESS; +} +// [sphinx-end] \ No newline at end of file diff --git a/docs/tools/example_codes/graph_capture.hip b/docs/tools/example_codes/graph_capture.hip new file mode 100644 index 0000000000..2c284be229 --- /dev/null +++ b/docs/tools/example_codes/graph_capture.hip @@ -0,0 +1,168 @@ +// MIT License +// +// Copyright (c) 2025 Advanced Micro Devices, Inc. All rights reserved. +// +// Permission is hereby granted, free of charge, to any person obtaining a copy +// of this software and associated documentation files (the "Software"), to deal +// in the Software without restriction, including without limitation the rights +// to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +// copies of the Software, and to permit persons to whom the Software is +// furnished to do so, subject to the following conditions: +// +// The above copyright notice and this permission notice shall be included in all +// copies or substantial portions of the Software. +// +// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
IN NO EVENT SHALL THE +// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +// SOFTWARE. + +// [sphinx-start] +#include + +#include +#include +#include +#include + +#define HIP_CHECK(expression) \ +{ \ + const hipError_t status = expression; \ + if(status != hipSuccess) \ + { \ + std::cerr << "HIP error " \ + << status << ": " \ + << hipGetErrorString(status) \ + << " at " << __FILE__ << ":" \ + << __LINE__ << std::endl; \ + } \ +} + +__global__ void kernelA(double* arrayA, std::size_t size) +{ + const std::size_t x = threadIdx.x + blockDim.x * blockIdx.x; + if(x < size) + { + arrayA[x] *= 2.0; + } +} + +__global__ void kernelB(int* arrayB, std::size_t size) +{ + const std::size_t x = threadIdx.x + blockDim.x * blockIdx.x; + if(x < size) + { + arrayB[x] = 3; + } +} + +__global__ void kernelC(double* arrayA, const int* arrayB, std::size_t size) +{ + const std::size_t x = threadIdx.x + blockDim.x * blockIdx.x; + if(x < size) + { + arrayA[x] += arrayB[x]; + } +} + +struct set_vector_args +{ + std::vector<double>& h_array; + double value; +}; + +void set_vector(void* args) +{ + set_vector_args h_args{*(reinterpret_cast<set_vector_args*>(args))}; + + std::vector<double>& vec{h_args.h_array}; + vec.assign(vec.size(), h_args.value); +} + +int main() +{ + constexpr int numOfBlocks = 1024; + constexpr int threadsPerBlock = 1024; + constexpr std::size_t arraySize = 1U << 20; + + // This example assumes that kernelA operates on data that needs to be initialized on + // and copied from the host, while kernelB initializes the array that is passed to it. + // Both arrays are then used as input to kernelC, where arrayA is also used as + // output, that is copied back to the host, while arrayB is only read from and not modified.
+ + double* d_arrayA; + int* d_arrayB; + std::vector<double> h_array(arraySize); + constexpr double initValue = 2.0; + + hipStream_t captureStream; + HIP_CHECK(hipStreamCreate(&captureStream)); + + // Start capturing the operations assigned to the stream + HIP_CHECK(hipStreamBeginCapture(captureStream, hipStreamCaptureModeGlobal)); + + // hipMallocAsync and hipMemcpyAsync are needed so that the operations can be assigned to a stream + HIP_CHECK(hipMallocAsync(reinterpret_cast<void**>(&d_arrayA), arraySize*sizeof(double), captureStream)); + HIP_CHECK(hipMallocAsync(reinterpret_cast<void**>(&d_arrayB), arraySize*sizeof(int), captureStream)); + + // Assign host function to the stream + // Needs a custom struct to pass the arguments + set_vector_args args{h_array, initValue}; + HIP_CHECK(hipLaunchHostFunc(captureStream, set_vector, &args)); + + HIP_CHECK(hipMemcpyAsync(d_arrayA, h_array.data(), arraySize*sizeof(double), hipMemcpyHostToDevice, captureStream)); + + kernelA<<<numOfBlocks, threadsPerBlock, 0, captureStream>>>(d_arrayA, arraySize); + kernelB<<<numOfBlocks, threadsPerBlock, 0, captureStream>>>(d_arrayB, arraySize); + kernelC<<<numOfBlocks, threadsPerBlock, 0, captureStream>>>(d_arrayA, d_arrayB, arraySize); + + HIP_CHECK(hipMemcpyAsync(h_array.data(), d_arrayA, arraySize*sizeof(*d_arrayA), hipMemcpyDeviceToHost, captureStream)); + + HIP_CHECK(hipFreeAsync(d_arrayA, captureStream)); + HIP_CHECK(hipFreeAsync(d_arrayB, captureStream)); + + // Stop capturing + hipGraph_t graph; + HIP_CHECK(hipStreamEndCapture(captureStream, &graph)); + + // Create an executable graph from the captured graph + hipGraphExec_t graphExec; + HIP_CHECK(hipGraphInstantiate(&graphExec, graph, nullptr, nullptr, 0)); + + // The graph template can be deleted after the instantiation if it's not needed for later use + HIP_CHECK(hipGraphDestroy(graph)); + + // Actually launch the graph. The stream does not have + // to be the same as the one used for capturing.
+    HIP_CHECK(hipGraphLaunch(graphExec, captureStream));
+
+    HIP_CHECK(hipStreamSynchronize(captureStream));
+
+    // Verify results
+    constexpr double expected = initValue * 2.0 + 3;
+    bool passed = true;
+    for(std::size_t i = 0; i < arraySize; ++i)
+    {
+        if(h_array[i] != expected)
+        {
+            passed = false;
+            std::cerr << "Validation failed! Expected " << expected << " got " << h_array[i] << std::endl;
+            break;
+        }
+    }
+
+    if(passed)
+    {
+        std::cerr << "Validation passed." << std::endl;
+    }
+
+    // Free graph and stream resources after usage
+    HIP_CHECK(hipGraphExecDestroy(graphExec));
+    HIP_CHECK(hipStreamDestroy(captureStream));
+
+    return EXIT_SUCCESS;
+}
+// [sphinx-end]
diff --git a/docs/tools/example_codes/graph_creation.hip b/docs/tools/example_codes/graph_creation.hip
new file mode 100644
index 0000000000..bbc6deac26
--- /dev/null
+++ b/docs/tools/example_codes/graph_creation.hip
@@ -0,0 +1,226 @@
+// MIT License
+//
+// Copyright (c) 2025 Advanced Micro Devices, Inc. All rights reserved.
+//
+// Permission is hereby granted, free of charge, to any person obtaining a copy
+// of this software and associated documentation files (the "Software"), to deal
+// in the Software without restriction, including without limitation the rights
+// to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+// copies of the Software, and to permit persons to whom the Software is
+// furnished to do so, subject to the following conditions:
+//
+// The above copyright notice and this permission notice shall be included in all
+// copies or substantial portions of the Software.
+//
+// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
IN NO EVENT SHALL THE +// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +// SOFTWARE. + +// [sphinx-start] +#include + +#include +#include +#include +#include + +#define HIP_CHECK(expression) \ +{ \ + const hipError_t status = expression; \ + if(status != hipSuccess) \ + { \ + std::cerr << "HIP error " \ + << status << ": " \ + << hipGetErrorString(status) \ + << " at " << __FILE__ << ":" \ + << __LINE__ << std::endl; \ + } \ +} + +__global__ void kernelA(double* arrayA, std::size_t size) +{ + const std::size_t x = threadIdx.x + blockDim.x * blockIdx.x; + if(x < size) + { + arrayA[x] *= 2.0; + } +} + +__global__ void kernelB(int* arrayB, std::size_t size) +{ + const std::size_t x = threadIdx.x + blockDim.x * blockIdx.x; + if(x < size) + { + arrayB[x] = 3; + } +} + +__global__ void kernelC(double* arrayA, const int* arrayB, std::size_t size) +{ + const std::size_t x = threadIdx.x + blockDim.x * blockIdx.x; + if(x < size) + { + arrayA[x] += arrayB[x]; + } +} + +struct set_vector_args +{ + std::vector& h_array; + double value; +}; + +void set_vector(void* args) +{ + set_vector_args h_args{*(reinterpret_cast(args))}; + + std::vector& vec{h_args.h_array}; + vec.assign(vec.size(), h_args.value); +} + +int main() +{ + constexpr int numOfBlocks = 1024; + constexpr int threadsPerBlock = 1024; + std::size_t arraySize = 1U << 20; + + // The pointers to the device memory don't need to be declared here, + // they are contained within the hipMemAllocNodeParams as the dptr member + std::vector h_array(arraySize); + constexpr double initValue = 2.0; + + // Create graph an empty graph + hipGraph_t graph; + HIP_CHECK(hipGraphCreate(&graph, 0)); + + // Parameters to allocate arrays + hipMemAllocNodeParams allocArrayAParams{}; + allocArrayAParams.poolProps.allocType = hipMemAllocationTypePinned; + 
allocArrayAParams.poolProps.location.type = hipMemLocationTypeDevice; + allocArrayAParams.poolProps.location.id = 0; // GPU on which memory resides + allocArrayAParams.bytesize = arraySize * sizeof(double); + + hipMemAllocNodeParams allocArrayBParams{}; + allocArrayBParams.poolProps.allocType = hipMemAllocationTypePinned; + allocArrayBParams.poolProps.location.type = hipMemLocationTypeDevice; + allocArrayBParams.poolProps.location.id = 0; // GPU on which memory resides + allocArrayBParams.bytesize = arraySize * sizeof(int); + + // Add the allocation nodes to the graph. They don't have any dependencies + hipGraphNode_t allocNodeA, allocNodeB; + HIP_CHECK(hipGraphAddMemAllocNode(&allocNodeA, graph, nullptr, 0, &allocArrayAParams)); + HIP_CHECK(hipGraphAddMemAllocNode(&allocNodeB, graph, nullptr, 0, &allocArrayBParams)); + + // Parameters for the host function + // Needs custom struct to pass the arguments + set_vector_args args{h_array, initValue}; + hipHostNodeParams hostParams{}; + hostParams.fn = set_vector; + hostParams.userData = static_cast(&args); + + // Add the host node that initializes the host array. It also doesn't have any dependencies + hipGraphNode_t hostNode; + HIP_CHECK(hipGraphAddHostNode(&hostNode, graph, nullptr, 0, &hostParams)); + + // Add memory copy node, that copies the initialized host array to the device. 
+ // It has to wait for the host array to be initialized and the device memory to be allocated + hipGraphNode_t cpyNodeDependencies[] = {allocNodeA, hostNode}; + hipGraphNode_t cpyToDevNode; + HIP_CHECK(hipGraphAddMemcpyNode1D(&cpyToDevNode, graph, cpyNodeDependencies, 2, allocArrayAParams.dptr, h_array.data(), arraySize * sizeof(double), hipMemcpyHostToDevice)); + + // Parameters for kernelA + hipKernelNodeParams kernelAParams; + void* kernelAArgs[] = {&allocArrayAParams.dptr, static_cast(&arraySize)}; + kernelAParams.func = reinterpret_cast(kernelA); + kernelAParams.gridDim = numOfBlocks; + kernelAParams.blockDim = threadsPerBlock; + kernelAParams.sharedMemBytes = 0; + kernelAParams.kernelParams = kernelAArgs; + kernelAParams.extra = nullptr; + + // Add the node for kernelA. It has to wait for the memory copy to finish, as it depends on the values from the host array. + hipGraphNode_t kernelANode; + HIP_CHECK(hipGraphAddKernelNode(&kernelANode, graph, &cpyToDevNode, 1, &kernelAParams)); + + // Parameters for kernelB + hipKernelNodeParams kernelBParams; + void* kernelBArgs[] = {&allocArrayBParams.dptr, static_cast(&arraySize)}; + kernelBParams.func = reinterpret_cast(kernelB); + kernelBParams.gridDim = numOfBlocks; + kernelBParams.blockDim = threadsPerBlock; + kernelBParams.sharedMemBytes = 0; + kernelBParams.kernelParams = kernelBArgs; + kernelBParams.extra = nullptr; + + // Add the node for kernelB. It only has to wait for the memory to be allocated, as it initializes the array. 
+ hipGraphNode_t kernelBNode; + HIP_CHECK(hipGraphAddKernelNode(&kernelBNode, graph, &allocNodeB, 1, &kernelBParams)); + + // Parameters for kernelC + hipKernelNodeParams kernelCParams; + void* kernelCArgs[] = {&allocArrayAParams.dptr, &allocArrayBParams.dptr, static_cast(&arraySize)}; + kernelCParams.func = reinterpret_cast(kernelC); + kernelCParams.gridDim = numOfBlocks; + kernelCParams.blockDim = threadsPerBlock; + kernelCParams.sharedMemBytes = 0; + kernelCParams.kernelParams = kernelCArgs; + kernelCParams.extra = nullptr; + + // Add the node for kernelC. It has to wait on both kernelA and kernelB to finish, as it depends on their results. + hipGraphNode_t kernelCNode; + hipGraphNode_t kernelCDependencies[] = {kernelANode, kernelBNode}; + HIP_CHECK(hipGraphAddKernelNode(&kernelCNode, graph, kernelCDependencies, 2, &kernelCParams)); + + // Copy the results back to the host. Has to wait for kernelC to finish. + hipGraphNode_t cpyToHostNode; + HIP_CHECK(hipGraphAddMemcpyNode1D(&cpyToHostNode, graph, &kernelCNode, 1, h_array.data(), allocArrayAParams.dptr, arraySize * sizeof(double), hipMemcpyDeviceToHost)); + + // Free array of allocNodeA. It needs to wait for the copy to finish, as kernelC stores its results in it. + hipGraphNode_t freeNodeA; + HIP_CHECK(hipGraphAddMemFreeNode(&freeNodeA, graph, &cpyToHostNode, 1, allocArrayAParams.dptr)); + // Free array of allocNodeB. It only needs to wait for kernelC to finish, as it is not written back to the host. 
+    hipGraphNode_t freeNodeB;
+    HIP_CHECK(hipGraphAddMemFreeNode(&freeNodeB, graph, &kernelCNode, 1, allocArrayBParams.dptr));
+
+    // Instantiate the graph in order to execute it
+    hipGraphExec_t graphExec;
+    HIP_CHECK(hipGraphInstantiate(&graphExec, graph, nullptr, nullptr, 0));
+
+    // The graph can be freed after the instantiation if it's not needed for other purposes
+    HIP_CHECK(hipGraphDestroy(graph));
+
+    // Actually launch the graph
+    hipStream_t graphStream;
+    HIP_CHECK(hipStreamCreate(&graphStream));
+    HIP_CHECK(hipGraphLaunch(graphExec, graphStream));
+
+    HIP_CHECK(hipStreamSynchronize(graphStream));
+
+    // Verify results
+    constexpr double expected = initValue * 2.0 + 3;
+    bool passed = true;
+    for(std::size_t i = 0; i < arraySize; ++i)
+    {
+        if(h_array[i] != expected)
+        {
+            passed = false;
+            std::cerr << "Validation failed! Expected " << expected << " got " << h_array[i] << std::endl;
+            break;
+        }
+    }
+
+    if(passed)
+    {
+        std::cerr << "Validation passed." << std::endl;
+    }
+
+    HIP_CHECK(hipGraphExecDestroy(graphExec));
+    HIP_CHECK(hipStreamDestroy(graphStream));
+
+    return EXIT_SUCCESS;
+}
+// [sphinx-end]
diff --git a/docs/tools/example_codes/host_code_feature_identification.cpp b/docs/tools/example_codes/host_code_feature_identification.cpp
new file mode 100644
index 0000000000..6a8377fd4a
--- /dev/null
+++ b/docs/tools/example_codes/host_code_feature_identification.cpp
@@ -0,0 +1,59 @@
+// MIT License
+//
+// Copyright (c) 2025 Advanced Micro Devices, Inc. All rights reserved.
+// +// Permission is hereby granted, free of charge, to any person obtaining a copy +// of this software and associated documentation files (the "Software"), to deal +// in the Software without restriction, including without limitation the rights +// to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +// copies of the Software, and to permit persons to whom the Software is +// furnished to do so, subject to the following conditions: +// +// The above copyright notice and this permission notice shall be included in all +// copies or substantial portions of the Software. +// +// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +// SOFTWARE. + +// [sphinx-start] +#include + +#include +#include + +#define HIP_CHECK(expression) \ +{ \ + const hipError_t err = expression; \ + if (err != hipSuccess) \ + { \ + std::cout << "HIP Error: " << hipGetErrorString(err) \ + << " at line " << __LINE__ << std::endl; \ + std::exit(EXIT_FAILURE); \ + } \ +} + +int main() +{ + int deviceCount; + HIP_CHECK(hipGetDeviceCount(&deviceCount)); + + int device = 0; // Query first available GPU. 
Can be replaced with any + // integer up to, not including, deviceCount + hipDeviceProp_t deviceProp; + HIP_CHECK(hipGetDeviceProperties(&deviceProp, device)); + + std::cout << "The queried device "; + if (deviceProp.arch.hasSharedInt32Atomics) // portable HIP feature query + std::cout << "supports"; + else + std::cout << "does not support"; + std::cout << " shared int32 atomic operations" << std::endl; + + return EXIT_SUCCESS; +} +// [sphinx-end] diff --git a/docs/tools/example_codes/host_code_feature_identification.hip b/docs/tools/example_codes/host_code_feature_identification.hip new file mode 100644 index 0000000000..6a8377fd4a --- /dev/null +++ b/docs/tools/example_codes/host_code_feature_identification.hip @@ -0,0 +1,59 @@ +// MIT License +// +// Copyright (c) 2025 Advanced Micro Devices, Inc. All rights reserved. +// +// Permission is hereby granted, free of charge, to any person obtaining a copy +// of this software and associated documentation files (the "Software"), to deal +// in the Software without restriction, including without limitation the rights +// to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +// copies of the Software, and to permit persons to whom the Software is +// furnished to do so, subject to the following conditions: +// +// The above copyright notice and this permission notice shall be included in all +// copies or substantial portions of the Software. +// +// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +// SOFTWARE. 
+ +// [sphinx-start] +#include + +#include +#include + +#define HIP_CHECK(expression) \ +{ \ + const hipError_t err = expression; \ + if (err != hipSuccess) \ + { \ + std::cout << "HIP Error: " << hipGetErrorString(err) \ + << " at line " << __LINE__ << std::endl; \ + std::exit(EXIT_FAILURE); \ + } \ +} + +int main() +{ + int deviceCount; + HIP_CHECK(hipGetDeviceCount(&deviceCount)); + + int device = 0; // Query first available GPU. Can be replaced with any + // integer up to, not including, deviceCount + hipDeviceProp_t deviceProp; + HIP_CHECK(hipGetDeviceProperties(&deviceProp, device)); + + std::cout << "The queried device "; + if (deviceProp.arch.hasSharedInt32Atomics) // portable HIP feature query + std::cout << "supports"; + else + std::cout << "does not support"; + std::cout << " shared int32 atomic operations" << std::endl; + + return EXIT_SUCCESS; +} +// [sphinx-end] diff --git a/docs/tools/example_codes/identifying_compilation_target_platform.cpp b/docs/tools/example_codes/identifying_compilation_target_platform.cpp new file mode 100644 index 0000000000..26e223ad78 --- /dev/null +++ b/docs/tools/example_codes/identifying_compilation_target_platform.cpp @@ -0,0 +1,48 @@ +// MIT License +// +// Copyright (c) 2025 Advanced Micro Devices, Inc. All rights reserved. +// +// Permission is hereby granted, free of charge, to any person obtaining a copy +// of this software and associated documentation files (the "Software"), to deal +// in the Software without restriction, including without limitation the rights +// to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +// copies of the Software, and to permit persons to whom the Software is +// furnished to do so, subject to the following conditions: +// +// The above copyright notice and this permission notice shall be included in all +// copies or substantial portions of the Software. 
+//
+// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+// SOFTWARE.
+
+#include <hip/hip_runtime.h>
+
+#include <cstdlib>
+
+int main()
+{
+    // [sphinx-amd-start]
+#ifdef __HIP_PLATFORM_AMD__
+    // This code path is compiled when amdclang++ is used for compilation
+#endif
+    // [sphinx-amd-end]
+
+    // [sphinx-nvidia-start]
+#ifdef __HIP_PLATFORM_NVIDIA__
+    // This code path is compiled when nvcc is used for compilation
+    // Could be compiling with CUDA language extensions enabled (for example, a .cu file)
+    // Could be in pass-through mode to an underlying host compiler (for example, a .cpp file)
+#endif
+    // [sphinx-nvidia-end]
+
+#if !defined(__HIP_PLATFORM_AMD__) && !defined(__HIP_PLATFORM_NVIDIA__)
+# error "No compatible HIP platform defined!"
+#endif
+
+    return EXIT_SUCCESS;
+}
diff --git a/docs/tools/example_codes/identifying_host_device_compilation_pass.hip b/docs/tools/example_codes/identifying_host_device_compilation_pass.hip
new file mode 100644
index 0000000000..71d9ca0153
--- /dev/null
+++ b/docs/tools/example_codes/identifying_host_device_compilation_pass.hip
@@ -0,0 +1,52 @@
+// MIT License
+//
+// Copyright (c) 2025 Advanced Micro Devices, Inc. All rights reserved.
+// +// Permission is hereby granted, free of charge, to any person obtaining a copy +// of this software and associated documentation files (the "Software"), to deal +// in the Software without restriction, including without limitation the rights +// to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +// copies of the Software, and to permit persons to whom the Software is +// furnished to do so, subject to the following conditions: +// +// The above copyright notice and this permission notice shall be included in all +// copies or substantial portions of the Software. +// +// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +// SOFTWARE. + +// [sphinx-start] +#include + +#include +#include + +__host__ __device__ void call_func() +{ + #ifdef __HIP_DEVICE_COMPILE__ + printf("device\n"); + #else + std::cout << "host" << std::endl; + #endif +} + +__global__ void test_kernel() +{ + call_func(); +} + +int main() +{ + test_kernel<<<1, 1, 0, 0>>>(); + if(auto err = hipDeviceSynchronize(); err != hipSuccess) + std::cerr << "HIP error " << err << ": " << hipGetErrorString(err) << std::endl; + + call_func(); + return EXIT_SUCCESS; +} +// [sphinx-end] diff --git a/docs/tools/example_codes/kernel_memory_allocation.hip b/docs/tools/example_codes/kernel_memory_allocation.hip new file mode 100644 index 0000000000..3a14d7822b --- /dev/null +++ b/docs/tools/example_codes/kernel_memory_allocation.hip @@ -0,0 +1,75 @@ +// MIT License +// +// Copyright (c) 2025 Advanced Micro Devices, Inc. All rights reserved. 
+// +// Permission is hereby granted, free of charge, to any person obtaining a copy +// of this software and associated documentation files (the "Software"), to deal +// in the Software without restriction, including without limitation the rights +// to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +// copies of the Software, and to permit persons to whom the Software is +// furnished to do so, subject to the following conditions: +// +// The above copyright notice and this permission notice shall be included in all +// copies or substantial portions of the Software. +// +// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +// SOFTWARE. + +#include "example_utils.hpp" + +#include + +#include +#include +#include + +// [sphinx-kernel-start] +__global__ void kernel_memory_allocation() +{ + // The pointer is stored in shared memory, so that all + // threads of the block can access the pointer + __shared__ int *memory; + + std::size_t blockSize = blockDim.x; + constexpr std::size_t elementsPerThread = 1024; + if(threadIdx.x == 0) + { + // allocate memory in one contiguous block + memory = new int[blockDim.x * elementsPerThread]; + } + __syncthreads(); + + // load pointer into thread-local variable to avoid + // unnecessary accesses to shared memory + int *localPtr = memory; + + // work with allocated memory, e.g. 
initialization + for(int i = 0; i < elementsPerThread; ++i) + { + // access in a contiguous way + localPtr[i * blockSize + threadIdx.x] = i; + } + + // synchronize to make sure no thread is accessing the memory before freeing + __syncthreads(); + if(threadIdx.x == 0) + { + delete[] memory; + } +} +// [sphinx-kernel-end] + +int main() +{ + kernel_memory_allocation<<<64, 1024>>>(); + HIP_CHECK(hipGetLastError()); + + std::cout << "Success!" << std::endl; + + return EXIT_SUCCESS; +} diff --git a/docs/tools/example_codes/launch_bounds.hip b/docs/tools/example_codes/launch_bounds.hip new file mode 100644 index 0000000000..f5c7241809 --- /dev/null +++ b/docs/tools/example_codes/launch_bounds.hip @@ -0,0 +1,91 @@ +// MIT License +// +// Copyright (c) 2025 Advanced Micro Devices, Inc. All rights reserved. +// +// Permission is hereby granted, free of charge, to any person obtaining a copy +// of this software and associated documentation files (the "Software"), to deal +// in the Software without restriction, including without limitation the rights +// to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +// copies of the Software, and to permit persons to whom the Software is +// furnished to do so, subject to the following conditions: +// +// The above copyright notice and this permission notice shall be included in all +// copies or substantial portions of the Software. +// +// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +// SOFTWARE. 
+
+// [sphinx-start]
+#include <hip/hip_runtime.h>
+
+#include <iostream>
+
+#define HIP_CHECK(expression)                                \
+{                                                            \
+    const hipError_t err = expression;                       \
+    if(err != hipSuccess)                                    \
+    {                                                        \
+        std::cerr << "HIP error: " << hipGetErrorString(err) \
+                  << " at " << __LINE__ << "\n";             \
+    }                                                        \
+}
+
+// Performs a simple initialization of an array with the thread's index variables.
+// This function is only available in device code.
+__device__ void init_array(float * const a, const unsigned int arraySize)
+{
+    // globalIdx uniquely identifies a thread in a 1D launch configuration.
+    const int globalIdx = threadIdx.x + blockIdx.x * blockDim.x;
+    // Each thread initializes a single element of the array.
+    if(globalIdx < arraySize)
+    {
+        a[globalIdx] = globalIdx;
+    }
+}
+
+// Computes the ceiling of number / multiple, that is, how many multiples are needed to cover number.
+// This function is available in host and device code.
+__host__ __device__ constexpr int round_up_to_nearest_multiple(int number, int multiple)
+{
+    return (number + multiple - 1)/multiple;
+}
+
+__global__
+__launch_bounds__(512, 4) // This kernel requires at most 512 threads per block and at least 4 warps per execution unit.
+void example_kernel(float * const a, const unsigned int N)
+{
+    // Initialize array.
+    init_array(a, N);
+    // Perform additional work:
+    // - work with the array
+    // - use the array in a different kernel
+    // - ...
+}
+
+int main()
+{
+    constexpr int N = 100000000; // problem size
+    constexpr int blockSize = 256; // configurable block size
+
+    // needed number of blocks for the given problem size
+    constexpr int gridSize = round_up_to_nearest_multiple(N, blockSize);
+
+    float *a;
+    // allocate memory on the GPU
+    HIP_CHECK(hipMalloc(&a, sizeof(*a) * N));
+
+    std::cout << "Launching kernel." << std::endl;
+    example_kernel<<<gridSize, blockSize>>>(a, N);
+    // make sure kernel execution is finished by synchronizing. The CPU can also
+    // execute other instructions during that time
+    HIP_CHECK(hipDeviceSynchronize());
+    std::cout << "Kernel execution finished."
<< std::endl; + + HIP_CHECK(hipFree(a)); +} +// [sphinx-end] diff --git a/docs/tools/example_codes/linker_apis.cpp b/docs/tools/example_codes/linker_apis.cpp new file mode 100644 index 0000000000..9ef10150be --- /dev/null +++ b/docs/tools/example_codes/linker_apis.cpp @@ -0,0 +1,200 @@ +// MIT License +// +// Copyright (c) 2025 Advanced Micro Devices, Inc. All rights reserved. +// +// Permission is hereby granted, free of charge, to any person obtaining a copy +// of this software and associated documentation files (the "Software"), to deal +// in the Software without restriction, including without limitation the rights +// to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +// copies of the Software, and to permit persons to whom the Software is +// furnished to do so, subject to the following conditions: +// +// The above copyright notice and this permission notice shall be included in all +// copies or substantial portions of the Software. +// +// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +// SOFTWARE. 
+
+// [sphinx-start]
+#include <hip/hip_runtime.h>
+#include <hip/hiprtc.h>
+
+#include <cstddef>
+#include <cstdlib>
+#include <iostream>
+#include <string>
+#include <vector>
+
+#define CHECK_RET_CODE(call, ret_code)                         \
+{                                                              \
+    if ((call) != ret_code)                                    \
+    {                                                          \
+        std::cout << "Failed in call: " << #call << std::endl; \
+        std::abort();                                          \
+    }                                                          \
+}
+#define HIP_CHECK(call) CHECK_RET_CODE(call, hipSuccess)
+#define HIPRTC_CHECK(call) CHECK_RET_CODE(call, HIPRTC_SUCCESS)
+
+// source code for hiprtc
+static constexpr auto kernel_source{
+    R"(
+    extern "C"
+    __global__ void vector_add(float* output, float* input1, float* input2, size_t size)
+    {
+        int i = threadIdx.x;
+        if (i < size)
+        {
+            output[i] = input1[i] + input2[i];
+        }
+    }
+)"};
+
+int main()
+{
+    hiprtcProgram prog;
+    auto rtc_ret_code = hiprtcCreateProgram(&prog,            // HIPRTC program handle
+                                            kernel_source,    // kernel source string
+                                            "vector_add.cpp", // Name of the file
+                                            0,                // Number of headers
+                                            nullptr,          // Header sources
+                                            nullptr);         // Name of header file
+
+    if (rtc_ret_code != HIPRTC_SUCCESS)
+    {
+        std::cerr << "Failed to create program" << std::endl;
+        std::abort();
+    }
+
+    // [sphinx-options-start]
+    auto sarg = std::string{"-fgpu-rdc"};
+    const char* compile_options[] = {sarg.c_str()};
+
+    rtc_ret_code = hiprtcCompileProgram(prog, // hiprtcProgram
+                                        1,    // Number of options
+                                        compile_options);
+    // [sphinx-options-end]
+    if (rtc_ret_code != HIPRTC_SUCCESS)
+    {
+        std::cerr << "Failed to compile program" << std::endl;
+        std::abort();
+    }
+
+    std::size_t logSize;
+    HIPRTC_CHECK(hiprtcGetProgramLogSize(prog, &logSize));
+
+    if (logSize)
+    {
+        std::string log(logSize, '\0');
+        HIPRTC_CHECK(hiprtcGetProgramLog(prog, &log[0]));
+        std::cerr << "Compilation failed or produced warnings: " << log << std::endl;
+        std::abort();
+    }
+
+    // [sphinx-bitcode-start]
+    std::size_t bitCodeSize;
+    HIPRTC_CHECK(hiprtcGetBitcodeSize(prog, &bitCodeSize));
+
+    std::vector<char> kernel_bitcode(bitCodeSize);
+    HIPRTC_CHECK(hiprtcGetBitcode(prog, kernel_bitcode.data()));
+    // [sphinx-bitcode-end]
+
+
HIPRTC_CHECK(hiprtcDestroyProgram(&prog)); + + auto num_options = 0u; + hiprtcJIT_option* options = nullptr; + void* option_vals[] = {nullptr}; + auto rtc_link_state = hiprtcLinkState{}; + // [sphinx-link-create-start] + HIPRTC_CHECK(hiprtcLinkCreate(num_options, // number of options + options, // Array of options + option_vals, // Array of option values cast to void* + &rtc_link_state)); // HIPRTC link state created upon success + // [sphinx-link-create-end] + + auto input_type = HIPRTC_JIT_INPUT_LLVM_BITCODE; + auto bit_code_ptr = kernel_bitcode.data(); + auto bit_code_size = bitCodeSize; + // [sphinx-link-add-start] + HIPRTC_CHECK(hiprtcLinkAddData(rtc_link_state, // HIPRTC link state + input_type, // type of the input data or bitcode + bit_code_ptr, // input data which is null terminated + bit_code_size, // size of the input data + "a", // optional name for this input + 0, // size of the options + nullptr, // Array of options applied to this input + nullptr)); // Array of option values cast to void* + // [sphinx-link-add-end] + + void* binary = nullptr; + auto binarySize = std::size_t{}; + // [sphinx-link-complete-start] + HIPRTC_CHECK(hiprtcLinkComplete(rtc_link_state, // HIPRTC link state + &binary, // upon success, points to the output binary + &binarySize)); // size of the binary is stored (optional) + // [sphinx-link-complete-end] + + hipModule_t module; + hipFunction_t kernel; + + HIP_CHECK(hipModuleLoadData(&module, binary)); + HIP_CHECK(hipModuleGetFunction(&kernel, module, "vector_add")); + + HIPRTC_CHECK(hiprtcLinkDestroy(rtc_link_state)); + + constexpr std::size_t ele_size = 256; // total number of items to add + std::vector hinput, output; + hinput.reserve(ele_size); + output.reserve(ele_size); + for (std::size_t i = 0; i < ele_size; i++) + { + hinput.push_back(static_cast(i + 1)); + output.push_back(0.0f); + } + + float *dinput1, *dinput2, *doutput; + HIP_CHECK(hipMalloc(&dinput1, sizeof(float) * ele_size)); + HIP_CHECK(hipMalloc(&dinput2, 
sizeof(float) * ele_size)); + HIP_CHECK(hipMalloc(&doutput, sizeof(float) * ele_size)); + + HIP_CHECK(hipMemcpy(dinput1, hinput.data(), sizeof(float) * ele_size, hipMemcpyHostToDevice)); + HIP_CHECK(hipMemcpy(dinput2, hinput.data(), sizeof(float) * ele_size, hipMemcpyHostToDevice)); + + struct + { + float* output; + float* input1; + float* input2; + std::size_t size; + } args{doutput, dinput1, dinput2, ele_size}; + + auto size = sizeof(args); + void* config[] = {HIP_LAUNCH_PARAM_BUFFER_POINTER, &args, HIP_LAUNCH_PARAM_BUFFER_SIZE, &size, + HIP_LAUNCH_PARAM_END}; + + HIP_CHECK(hipModuleLaunchKernel(kernel, 1, 1, 1, ele_size, 1, 1, 0, nullptr, nullptr, config)); + + HIP_CHECK(hipMemcpy(output.data(), doutput, sizeof(float) * ele_size, hipMemcpyDeviceToHost)); + + for (std::size_t i = 0; i < ele_size; i++) + { + if ((hinput[i] + hinput[i]) != output[i]) + { + std::cout << "Failed in validation: " << (hinput[i] + hinput[i]) << " - " << output[i] << std::endl; + std::abort(); + } + } + std::cout << "Passed" << std::endl; + + HIP_CHECK(hipFree(dinput1)); + HIP_CHECK(hipFree(dinput2)); + HIP_CHECK(hipFree(doutput)); + + return EXIT_SUCCESS; +} +// [sphinx-stop] diff --git a/docs/tools/example_codes/linker_apis_file.cpp b/docs/tools/example_codes/linker_apis_file.cpp new file mode 100644 index 0000000000..c65b7d1eab --- /dev/null +++ b/docs/tools/example_codes/linker_apis_file.cpp @@ -0,0 +1,219 @@ +// MIT License +// +// Copyright (c) 2025 Advanced Micro Devices, Inc. All rights reserved. 
+// +// Permission is hereby granted, free of charge, to any person obtaining a copy +// of this software and associated documentation files (the "Software"), to deal +// in the Software without restriction, including without limitation the rights +// to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +// copies of the Software, and to permit persons to whom the Software is +// furnished to do so, subject to the following conditions: +// +// The above copyright notice and this permission notice shall be included in all +// copies or substantial portions of the Software. +// +// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +// SOFTWARE. 
+ +// [sphinx-start] +#include +#include + +#include +#include +#include +#include +#include +#include +#include + +#if __has_include() + #include + namespace fs = std::filesystem; +#elif __has_include() + #include + namespace fs = std::experimental::filesystem; +#else + static_assert(false, "filesystem not available"); +#endif + + +#define CHECK_RET_CODE(call, ret_code) \ +{ \ + if ((call) != ret_code) \ + { \ + std::cout << "Failed in call: " << #call << std::endl; \ + std::abort(); \ + } \ +} +#define HIP_CHECK(call) CHECK_RET_CODE(call, hipSuccess) +#define HIPRTC_CHECK(call) CHECK_RET_CODE(call, HIPRTC_SUCCESS) + +// source code for hiprtc +static constexpr auto kernel_source{ + R"( + extern "C" + __global__ void vector_add(float* output, float* input1, float* input2, size_t size) + { + int i = threadIdx.x; + if (i < size) + { + output[i] = input1[i] + input2[i]; + } + } +)"}; + +int main() +{ + hiprtcProgram prog; + auto rtc_ret_code = hiprtcCreateProgram(&prog, // HIPRTC program handle + kernel_source, // kernel source string + "vector_add.cpp", // Name of the file + 0, // Number of headers + nullptr, // Header sources + nullptr); // Name of header file + + if (rtc_ret_code != HIPRTC_SUCCESS) + { + std::cerr << "Failed to create program" << std::endl; + std::abort(); + } + + // [sphinx-options-start] + auto sarg = std::string{"-fgpu-rdc"}; + const char* compile_options[] = {sarg.c_str()}; + + rtc_ret_code = hiprtcCompileProgram(prog, // hiprtcProgram + 1, // Number of options + compile_options); + // [sphinx-options-end] + if (rtc_ret_code != HIPRTC_SUCCESS) + { + std::cerr << "Failed to create program" << std::endl; + std::abort(); + } + + std::size_t logSize; + HIPRTC_CHECK(hiprtcGetProgramLogSize(prog, &logSize)); + + if (logSize) + { + std::string log(logSize, '\0'); + HIPRTC_CHECK(hiprtcGetProgramLog(prog, &log[0])); + std::cerr << "Compilation failed or produced warnings: " << log << std::endl; + std::abort(); + } + + // [sphinx-bitcode-start] + 
std::size_t bitCodeSize; + HIPRTC_CHECK(hiprtcGetBitcodeSize(prog, &bitCodeSize)); + + std::vector kernel_bitcode(bitCodeSize); + HIPRTC_CHECK(hiprtcGetBitcode(prog, kernel_bitcode.data())); + // [sphinx-bitcode-end] + + HIPRTC_CHECK(hiprtcDestroyProgram(&prog)); + + auto num_options = 0u; + hiprtcJIT_option* options = nullptr; + void* option_vals[] = {nullptr}; + auto rtc_link_state = hiprtcLinkState{}; + // [sphinx-link-create-start] + HIPRTC_CHECK(hiprtcLinkCreate(num_options, // number of options + options, // Array of options + option_vals, // Array of option values cast to void* + &rtc_link_state)); // HIPRTC link state created upon success + // [sphinx-link-create-end] + + auto input_type = HIPRTC_JIT_INPUT_LLVM_BITCODE; + auto bc_file_path = std::string{"bitcode.bc"}; + auto bc_file = std::fstream{bc_file_path.c_str(), std::ios::binary | std::ios::out}; + if(!bc_file.is_open()) + { + std::cerr << "Could not open bitcode file for writing!" << std::endl; + std::abort(); + } + bc_file.write(kernel_bitcode.data(), bitCodeSize); + bc_file.close(); + // [sphinx-link-add-start] + HIPRTC_CHECK(hiprtcLinkAddFile(rtc_link_state, // HIPRTC link state + input_type, // type of the input data or bitcode + bc_file_path.c_str(), // input data which is null terminated + 0, // size of the options + nullptr, // Array of options applied to this input + nullptr)); // Array of option values cast to void* + // [sphinx-link-add-end] + fs::remove(bc_file_path); + + void* binary = nullptr; + auto binarySize = std::size_t{}; + // [sphinx-link-complete-start] + HIPRTC_CHECK(hiprtcLinkComplete(rtc_link_state, // HIPRTC link state + &binary, // upon success, points to the output binary + &binarySize)); // size of the binary is stored (optional) + // [sphinx-link-complete-end] + + hipModule_t module; + hipFunction_t kernel; + + HIP_CHECK(hipModuleLoadData(&module, binary)); + HIP_CHECK(hipModuleGetFunction(&kernel, module, "vector_add")); + + 
HIPRTC_CHECK(hiprtcLinkDestroy(rtc_link_state)); + + constexpr std::size_t ele_size = 256; // total number of items to add + std::vector hinput, output; + hinput.reserve(ele_size); + output.reserve(ele_size); + for (std::size_t i = 0; i < ele_size; i++) + { + hinput.push_back(static_cast(i + 1)); + output.push_back(0.0f); + } + + float *dinput1, *dinput2, *doutput; + HIP_CHECK(hipMalloc(&dinput1, sizeof(float) * ele_size)); + HIP_CHECK(hipMalloc(&dinput2, sizeof(float) * ele_size)); + HIP_CHECK(hipMalloc(&doutput, sizeof(float) * ele_size)); + + HIP_CHECK(hipMemcpy(dinput1, hinput.data(), sizeof(float) * ele_size, hipMemcpyHostToDevice)); + HIP_CHECK(hipMemcpy(dinput2, hinput.data(), sizeof(float) * ele_size, hipMemcpyHostToDevice)); + + struct + { + float* output; + float* input1; + float* input2; + std::size_t size; + } args{doutput, dinput1, dinput2, ele_size}; + + auto size = sizeof(args); + void* config[] = {HIP_LAUNCH_PARAM_BUFFER_POINTER, &args, HIP_LAUNCH_PARAM_BUFFER_SIZE, &size, + HIP_LAUNCH_PARAM_END}; + + HIP_CHECK(hipModuleLaunchKernel(kernel, 1, 1, 1, ele_size, 1, 1, 0, nullptr, nullptr, config)); + + HIP_CHECK(hipMemcpy(output.data(), doutput, sizeof(float) * ele_size, hipMemcpyDeviceToHost)); + + for (std::size_t i = 0; i < ele_size; i++) + { + if ((hinput[i] + hinput[i]) != output[i]) + { + std::cout << "Failed in validation: " << (hinput[i] + hinput[i]) << " - " << output[i] << std::endl; + std::abort(); + } + } + std::cout << "Passed" << std::endl; + + HIP_CHECK(hipFree(dinput1)); + HIP_CHECK(hipFree(dinput2)); + HIP_CHECK(hipFree(doutput)); + + return EXIT_SUCCESS; +} +// [sphinx-stop] diff --git a/docs/tools/example_codes/linker_apis_options.cpp b/docs/tools/example_codes/linker_apis_options.cpp new file mode 100644 index 0000000000..6b219b9acf --- /dev/null +++ b/docs/tools/example_codes/linker_apis_options.cpp @@ -0,0 +1,200 @@ +// MIT License +// +// Copyright (c) 2025 Advanced Micro Devices, Inc. All rights reserved. 
+// +// Permission is hereby granted, free of charge, to any person obtaining a copy +// of this software and associated documentation files (the "Software"), to deal +// in the Software without restriction, including without limitation the rights +// to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +// copies of the Software, and to permit persons to whom the Software is +// furnished to do so, subject to the following conditions: +// +// The above copyright notice and this permission notice shall be included in all +// copies or substantial portions of the Software. +// +// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +// SOFTWARE. 
+ +// [sphinx-start] +#include +#include + +#include +#include +#include +#include +#include + +#define CHECK_RET_CODE(call, ret_code) \ +{ \ + if ((call) != ret_code) \ + { \ + std::cout << "Failed in call: " << #call << std::endl; \ + std::abort(); \ + } \ +} +#define HIP_CHECK(call) CHECK_RET_CODE(call, hipSuccess) +#define HIPRTC_CHECK(call) CHECK_RET_CODE(call, HIPRTC_SUCCESS) + +// source code for hiprtc +static constexpr auto kernel_source{ + R"( + extern "C" + __global__ void vector_add(float* output, float* input1, float* input2, size_t size) + { + int i = threadIdx.x; + if (i < size) + { + output[i] = input1[i] + input2[i]; + } + } +)"}; + +int main() +{ + hiprtcProgram prog; + auto rtc_ret_code = hiprtcCreateProgram(&prog, // HIPRTC program handle + kernel_source, // kernel source string + "vector_add.cpp", // Name of the file + 0, // Number of headers + nullptr, // Header sources + nullptr); // Name of header file + + if (rtc_ret_code != HIPRTC_SUCCESS) + { + std::cerr << "Failed to create program" << std::endl; + std::abort(); + } + + // [sphinx-options-start] + auto sarg = std::string{"-fgpu-rdc"}; + const char* compile_options[] = {sarg.c_str()}; + + rtc_ret_code = hiprtcCompileProgram(prog, // hiprtcProgram + 1, // Number of options + compile_options); + // [sphinx-options-end] + if (rtc_ret_code != HIPRTC_SUCCESS) + { + std::cerr << "Failed to create program" << std::endl; + std::abort(); + } + + std::size_t logSize; + HIPRTC_CHECK(hiprtcGetProgramLogSize(prog, &logSize)); + + if (logSize) + { + std::string log(logSize, '\0'); + HIPRTC_CHECK(hiprtcGetProgramLog(prog, &log[0])); + std::cerr << "Compilation failed or produced warnings: " << log << std::endl; + std::abort(); + } + + // [sphinx-bitcode-start] + std::size_t bitCodeSize; + HIPRTC_CHECK(hiprtcGetBitcodeSize(prog, &bitCodeSize)); + + std::vector kernel_bitcode(bitCodeSize); + HIPRTC_CHECK(hiprtcGetBitcode(prog, kernel_bitcode.data())); + // [sphinx-bitcode-end] + + 
HIPRTC_CHECK(hiprtcDestroyProgram(&prog)); + + // [sphinx-link-create-start] + const char* isaopts[] = {"-mllvm", "-inline-threshold=1", "-mllvm", "-inlinehint-threshold=1"}; + std::vector jit_options = {HIPRTC_JIT_IR_TO_ISA_OPT_EXT, + HIPRTC_JIT_IR_TO_ISA_OPT_COUNT_EXT}; + std::size_t isaoptssize = 4; + void* lopts[] = {reinterpret_cast(isaopts), + reinterpret_cast(isaoptssize)}; + hiprtcLinkState linkstate; + HIPRTC_CHECK(hiprtcLinkCreate(2u, jit_options.data(), reinterpret_cast(lopts), &linkstate)); + // [sphinx-link-create-end] + + auto input_type = HIPRTC_JIT_INPUT_LLVM_BITCODE; + auto bit_code_ptr = kernel_bitcode.data(); + auto bit_code_size = bitCodeSize; + // [sphinx-link-add-start] + HIPRTC_CHECK(hiprtcLinkAddData(linkstate, // HIPRTC link state + input_type, // type of the input data or bitcode + bit_code_ptr, // input data which is null terminated + bit_code_size, // size of the input data + "a", // optional name for this input + 0, // size of the options + nullptr, // Array of options applied to this input + nullptr)); // Array of option values cast to void* + // [sphinx-link-add-end] + + void* binary = nullptr; + auto binarySize = std::size_t{}; + // [sphinx-link-complete-start] + HIPRTC_CHECK(hiprtcLinkComplete(linkstate, // HIPRTC link state + &binary, // upon success, points to the output binary + &binarySize)); // size of the binary is stored (optional) + // [sphinx-link-complete-end] + + hipModule_t module; + hipFunction_t kernel; + + HIP_CHECK(hipModuleLoadData(&module, binary)); + HIP_CHECK(hipModuleGetFunction(&kernel, module, "vector_add")); + + HIPRTC_CHECK(hiprtcLinkDestroy(linkstate)); + + constexpr std::size_t ele_size = 256; // total number of items to add + std::vector hinput, output; + hinput.reserve(ele_size); + output.reserve(ele_size); + for (std::size_t i = 0; i < ele_size; i++) + { + hinput.push_back(static_cast(i + 1)); + output.push_back(0.0f); + } + + float *dinput1, *dinput2, *doutput; + HIP_CHECK(hipMalloc(&dinput1, 
sizeof(float) * ele_size)); + HIP_CHECK(hipMalloc(&dinput2, sizeof(float) * ele_size)); + HIP_CHECK(hipMalloc(&doutput, sizeof(float) * ele_size)); + + HIP_CHECK(hipMemcpy(dinput1, hinput.data(), sizeof(float) * ele_size, hipMemcpyHostToDevice)); + HIP_CHECK(hipMemcpy(dinput2, hinput.data(), sizeof(float) * ele_size, hipMemcpyHostToDevice)); + + struct + { + float* output; + float* input1; + float* input2; + std::size_t size; + } args{doutput, dinput1, dinput2, ele_size}; + + auto size = sizeof(args); + void* config[] = {HIP_LAUNCH_PARAM_BUFFER_POINTER, &args, HIP_LAUNCH_PARAM_BUFFER_SIZE, &size, + HIP_LAUNCH_PARAM_END}; + + HIP_CHECK(hipModuleLaunchKernel(kernel, 1, 1, 1, ele_size, 1, 1, 0, nullptr, nullptr, config)); + + HIP_CHECK(hipMemcpy(output.data(), doutput, sizeof(float) * ele_size, hipMemcpyDeviceToHost)); + + for (std::size_t i = 0; i < ele_size; i++) + { + if ((hinput[i] + hinput[i]) != output[i]) + { + std::cout << "Failed in validation: " << (hinput[i] + hinput[i]) << " - " << output[i] << std::endl; + std::abort(); + } + } + std::cout << "Passed" << std::endl; + + HIP_CHECK(hipFree(dinput1)); + HIP_CHECK(hipFree(dinput2)); + HIP_CHECK(hipFree(doutput)); + + return EXIT_SUCCESS; +} +// [sphinx-stop] diff --git a/docs/tools/example_codes/load_module.cpp b/docs/tools/example_codes/load_module.cpp new file mode 100644 index 0000000000..fa42071d8a --- /dev/null +++ b/docs/tools/example_codes/load_module.cpp @@ -0,0 +1,107 @@ +// MIT License +// +// Copyright (c) 2025 Advanced Micro Devices, Inc. All rights reserved. 
+// +// Permission is hereby granted, free of charge, to any person obtaining a copy +// of this software and associated documentation files (the "Software"), to deal +// in the Software without restriction, including without limitation the rights +// to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +// copies of the Software, and to permit persons to whom the Software is +// furnished to do so, subject to the following conditions: +// +// The above copyright notice and this permission notice shall be included in all +// copies or substantial portions of the Software. +// +// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +// SOFTWARE. 
+ +// [sphinx-start] +#include + +#include +#include +#include +#include + +#define HIP_CHECK(expression) \ +{ \ + const hipError_t err = expression; \ + if (err != hipSuccess) \ + { \ + std::cout << "HIP Error: " << hipGetErrorString(err) \ + << " at line " << __LINE__ << std::endl; \ + std::exit(EXIT_FAILURE); \ + } \ +} + +int main() +{ + std::size_t elements = 64*1024; + std::size_t size_bytes = elements * sizeof(float); + + std::vector A(elements), B(elements); + + // On NVIDIA platforms the driver runtime needs to be initiated + #ifdef __HIP_PLATFORM_NVIDIA__ + hipInit(0); + hipDevice_t device; + hipCtx_t context; + HIP_CHECK(hipDeviceGet(&device, 0)); + HIP_CHECK(hipCtxCreate(&context, 0, device)); + #endif + + // Allocate device memory + hipDeviceptr_t d_A, d_B; + HIP_CHECK(hipMalloc(reinterpret_cast(&d_A), size_bytes)); + HIP_CHECK(hipMalloc(reinterpret_cast(&d_B), size_bytes)); + + // Copy data to device + HIP_CHECK(hipMemcpyHtoD(d_A, A.data(), size_bytes)); + HIP_CHECK(hipMemcpyHtoD(d_B, B.data(), size_bytes)); + + // Load module + hipModule_t Module; + // For AMD the module file has to contain architecture specific object code + // For NVIDIA the module file has to contain PTX, found in e.g. 
"vcpy_isa.ptx"
+    #ifdef __HIP_PLATFORM_AMD__
+    HIP_CHECK(hipModuleLoad(&Module, "vcpy_isa.hsaco"));
+    #elif defined(__HIP_PLATFORM_NVIDIA__)
+    HIP_CHECK(hipModuleLoad(&Module, "vcpy_isa.ptx"));
+    #endif
+    // Get kernel function from the module via its name
+    hipFunction_t Function;
+    HIP_CHECK(hipModuleGetFunction(&Function, Module, "hello_world"));
+
+    // Create buffer for kernel arguments
+    std::vector<void*> argBuffer{reinterpret_cast<void*>(d_A), reinterpret_cast<void*>(d_B)};
+    std::size_t arg_size_bytes = argBuffer.size() * sizeof(void*);
+
+    // Create configuration passed to the kernel as arguments
+    void* config[] = {HIP_LAUNCH_PARAM_BUFFER_POINTER, argBuffer.data(),
+                      HIP_LAUNCH_PARAM_BUFFER_SIZE, &arg_size_bytes,
+                      HIP_LAUNCH_PARAM_END};
+
+    int threads_per_block = 128;
+    int blocks = (elements + threads_per_block - 1) / threads_per_block;
+
+    // Actually launch kernel
+    HIP_CHECK(hipModuleLaunchKernel(Function, blocks, 1, 1, threads_per_block, 1, 1, 0, 0, NULL, config));
+
+    HIP_CHECK(hipMemcpyDtoH(A.data(), d_A, size_bytes));
+    HIP_CHECK(hipMemcpyDtoH(B.data(), d_B, size_bytes));
+
+    HIP_CHECK(hipFree(reinterpret_cast<void*>(d_A)));
+    HIP_CHECK(hipFree(reinterpret_cast<void*>(d_B)));
+
+    #ifdef __HIP_PLATFORM_NVIDIA__
+    HIP_CHECK(hipCtxDestroy(context));
+    #endif
+
+    return EXIT_SUCCESS;
+}
+// [sphinx-end]
diff --git a/docs/tools/example_codes/load_module_ex.cpp b/docs/tools/example_codes/load_module_ex.cpp
new file mode 100644
index 0000000000..07e7955d84
--- /dev/null
+++ b/docs/tools/example_codes/load_module_ex.cpp
@@ -0,0 +1,145 @@
+// MIT License
+//
+// Copyright (c) 2025 Advanced Micro Devices, Inc. All rights reserved.
+// +// Permission is hereby granted, free of charge, to any person obtaining a copy +// of this software and associated documentation files (the "Software"), to deal +// in the Software without restriction, including without limitation the rights +// to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +// copies of the Software, and to permit persons to whom the Software is +// furnished to do so, subject to the following conditions: +// +// The above copyright notice and this permission notice shall be included in all +// copies or substantial portions of the Software. +// +// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +// SOFTWARE. 
+ +#include + +#include +#include +#include +#include +#include +#include +#include + +#define HIP_CHECK(expression) \ +{ \ + const hipError_t err = expression; \ + if (err != hipSuccess) \ + { \ + std::cout << "HIP Error: " << hipGetErrorString(err) \ + << " at line " << __LINE__ << std::endl; \ + std::exit(EXIT_FAILURE); \ + } \ +} + +void* populate_data_pointer() +{ +#ifdef __HIP_PLATFORM_AMD__ + auto filename = std::string{"myKernel.hsaco"}; +#elif defined(__HIP_PLATFORM_NVIDIA__) + auto filename = std::string{"myKernel.ptx"}; +#endif + std::fstream file{filename, std::ios::in | std::ios::binary | std::ios::ate}; + if(!file.is_open()) + { + std::cerr << "Error opening file " << filename << std::endl; + std::exit(EXIT_FAILURE); + } + + auto filesize = file.tellg(); + auto storage = new char[filesize]; + + file.seekg(0, std::ios::beg); + file.read(storage, filesize); + + return storage; +} + +int main() +{ + std::size_t elements = 64*1024; + std::size_t size_bytes = elements * sizeof(float); + + std::vector A(elements), B(elements); + + // On NVIDIA platforms the driver runtime needs to be initiated + #ifdef __HIP_PLATFORM_NVIDIA__ + HIP_CHECK(hipInit(0)); + hipDevice_t device; + hipCtx_t context; + HIP_CHECK(hipDeviceGet(&device, 0)); + HIP_CHECK(hipCtxCreate(&context, 0, device)); + #endif + + // Allocate device memory + hipDeviceptr_t d_A, d_B; + HIP_CHECK(hipMalloc(reinterpret_cast(&d_A), size_bytes)); + HIP_CHECK(hipMalloc(reinterpret_cast(&d_B), size_bytes)); + + // Copy data to device + HIP_CHECK(hipMemcpyHtoD(d_A, A.data(), size_bytes)); + HIP_CHECK(hipMemcpyHtoD(d_B, B.data(), size_bytes)); + + // Load module + + // For AMD the module file has to contain architecture specific object code + // For NVIDIA the module file has to contain PTX, found in e.g. 
"myKernel.ptx"
+    // [sphinx-start]
+    hipModule_t module;
+    void* imagePtr = populate_data_pointer();
+
+    const int numOptions = 1;
+    hipJitOption options[numOptions];
+    void *optionValues[numOptions];
+
+    options[0] = hipJitOptionMaxRegisters;
+    unsigned maxRegs = 15;
+    optionValues[0] = static_cast<void*>(&maxRegs);
+
+    // On the HIP-Clang path, hipModuleLoadData(module, imagePtr) is called and the JIT options are ignored.
+    // On the NVCC path, cuModuleLoadDataEx(module, imagePtr, numOptions, options, optionValues) is called.
+    HIP_CHECK(hipModuleLoadDataEx(&module, imagePtr, numOptions, options, optionValues));
+
+    // Get kernel function from the module via its name
+    hipFunction_t k;
+    HIP_CHECK(hipModuleGetFunction(&k, module, "myKernel"));
+    // [sphinx-end]
+
+    // Create buffer for kernel arguments
+    std::vector<void*> argBuffer{reinterpret_cast<void*>(d_A), reinterpret_cast<void*>(d_B)};
+    std::size_t arg_size_bytes = argBuffer.size() * sizeof(void*);
+
+    // Create configuration passed to the kernel as arguments
+    void* config[] = {HIP_LAUNCH_PARAM_BUFFER_POINTER, argBuffer.data(),
+                      HIP_LAUNCH_PARAM_BUFFER_SIZE, &arg_size_bytes,
+                      HIP_LAUNCH_PARAM_END};
+
+    int threads_per_block = 128;
+    int blocks = (elements + threads_per_block - 1) / threads_per_block;
+
+    // Actually launch kernel
+    HIP_CHECK(hipModuleLaunchKernel(k, blocks, 1, 1, threads_per_block, 1, 1, 0, 0, NULL, config));
+
+    HIP_CHECK(hipMemcpyDtoH(A.data(), d_A, size_bytes));
+    HIP_CHECK(hipMemcpyDtoH(B.data(), d_B, size_bytes));
+
+    HIP_CHECK(hipFree(reinterpret_cast<void*>(d_A)));
+    HIP_CHECK(hipFree(reinterpret_cast<void*>(d_B)));
+
+    #ifdef __HIP_PLATFORM_NVIDIA__
+    HIP_CHECK(hipCtxDestroy(context));
+    #endif
+
+    delete[] static_cast<char*>(imagePtr);
+
+    return EXIT_SUCCESS;
+}
diff --git a/docs/tools/example_codes/load_module_ex_cuda.cpp b/docs/tools/example_codes/load_module_ex_cuda.cpp
new file mode 100644
index 0000000000..e43f592ff0
--- /dev/null
+++ b/docs/tools/example_codes/load_module_ex_cuda.cpp
@@ -0,0 +1,134 @@
+// MIT License
+//
+// Copyright (c) 2025 Advanced Micro Devices, Inc. All rights reserved. +// +// Permission is hereby granted, free of charge, to any person obtaining a copy +// of this software and associated documentation files (the "Software"), to deal +// in the Software without restriction, including without limitation the rights +// to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +// copies of the Software, and to permit persons to whom the Software is +// furnished to do so, subject to the following conditions: +// +// The above copyright notice and this permission notice shall be included in all +// copies or substantial portions of the Software. +// +// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +// SOFTWARE. 
+ +#include + +#include +#include +#include +#include +#include +#include +#include + +#define CUDA_CHECK(expression) \ +{ \ + const CUresult err = expression; \ + if (err != CUDA_SUCCESS) \ + { \ + const char* err_str{nullptr}; \ + cuGetErrorString(err, &err_str); \ + std::cerr << "CUDA Error: " << err_str \ + << " at line " << __LINE__ << std::endl; \ + std::exit(EXIT_FAILURE); \ + } \ +} + +void* populate_data_pointer() +{ + auto filename = std::string{"myKernel.ptx"}; + std::fstream file{filename, std::ios::in | std::ios::binary | std::ios::ate}; + if(!file.is_open()) + { + std::cerr << "Error opening file " << filename << std::endl; + std::exit(EXIT_FAILURE); + } + + auto filesize = file.tellg(); + auto storage = new char[filesize]; + + file.seekg(0, std::ios::beg); + file.read(storage, filesize); + + return storage; +} + +int main() +{ + std::size_t elements = 64*1024; + std::size_t size_bytes = elements * sizeof(float); + + std::vector A(elements), B(elements); + + // On NVIDIA platforms the driver runtime needs to be initiated + cuInit(0); + CUdevice device; + CUcontext context; + CUDA_CHECK(cuDeviceGet(&device, 0)); + CUDA_CHECK(cuCtxCreate(&context, 0, device)); + + // Allocate device memory + CUdeviceptr d_A, d_B; + CUDA_CHECK(cuMemAlloc(&d_A, size_bytes)); + CUDA_CHECK(cuMemAlloc(&d_B, size_bytes)); + + // Copy data to device + CUDA_CHECK(cuMemcpyHtoD(d_A, A.data(), size_bytes)); + CUDA_CHECK(cuMemcpyHtoD(d_B, B.data(), size_bytes)); + + // Load module + + // For NVIDIA the module file has to contain PTX, found in e.g. 
"myKernel.ptx"
+    // [sphinx-start]
+    CUmodule module;
+    void* imagePtr = populate_data_pointer();
+
+    const int numOptions = 1;
+    CUjit_option options[numOptions];
+    void *optionValues[numOptions];
+
+    options[0] = CU_JIT_MAX_REGISTERS;
+    unsigned maxRegs = 15;
+    optionValues[0] = (void *)(&maxRegs);
+
+    cuModuleLoadDataEx(&module, imagePtr, numOptions, options, optionValues);
+
+    CUfunction k;
+    cuModuleGetFunction(&k, module, "myKernel");
+    // [sphinx-end]
+
+    // Create buffer for kernel arguments
+    std::vector argBuffer{&d_A, &d_B};
+    std::size_t arg_size_bytes = argBuffer.size() * sizeof(void*);
+
+    // Create configuration passed to the kernel as arguments
+    void* config[] = {CU_LAUNCH_PARAM_BUFFER_POINTER, argBuffer.data(),
+                      CU_LAUNCH_PARAM_BUFFER_SIZE, &arg_size_bytes, CU_LAUNCH_PARAM_END};
+
+    int threads_per_block = 128;
+    int blocks = (elements + threads_per_block - 1) / threads_per_block;
+
+    // Actually launch kernel
+    CUDA_CHECK(cuLaunchKernel(k, blocks, 1, 1, threads_per_block, 1, 1, 0, 0, NULL, config));
+
+    CUDA_CHECK(cuMemcpyDtoH(A.data(), d_A, size_bytes));
+    CUDA_CHECK(cuMemcpyDtoH(B.data(), d_B, size_bytes));
+
+    CUDA_CHECK(cuMemFree(d_A));
+    CUDA_CHECK(cuMemFree(d_B));
+
+    CUDA_CHECK(cuCtxDestroy(context));
+
+    delete[] static_cast<char*>(imagePtr);
+
+    return EXIT_SUCCESS;
+}
diff --git a/docs/tools/example_codes/low_precision_float_fp16.hip b/docs/tools/example_codes/low_precision_float_fp16.hip
new file mode 100644
index 0000000000..249fa470ff
--- /dev/null
+++ b/docs/tools/example_codes/low_precision_float_fp16.hip
@@ -0,0 +1,111 @@
+// MIT License
+//
+// Copyright (c) 2025 Advanced Micro Devices, Inc. All rights reserved.
+// +// Permission is hereby granted, free of charge, to any person obtaining a copy +// of this software and associated documentation files (the "Software"), to deal +// in the Software without restriction, including without limitation the rights +// to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +// copies of the Software, and to permit persons to whom the Software is +// furnished to do so, subject to the following conditions: +// +// The above copyright notice and this permission notice shall be included in all +// copies or substantial portions of the Software. +// +// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +// SOFTWARE. 
+
+#include
+#include
+#include
+#include
+
+#define hip_check(hip_call) \
+{ \
+    auto hip_res = hip_call; \
+    if (hip_res != hipSuccess) { \
+        std::cerr << "Failed in HIP call: " << #hip_call \
+                  << " at " << __FILE__ << ":" << __LINE__ \
+                  << " with error: " << hipGetErrorString(hip_res) << std::endl; \
+        std::abort(); \
+    } \
+}
+
+__global__ void add_half_precision(__half* in1, __half* in2, float* out, size_t size)
+{
+    int idx = threadIdx.x;
+    if (idx < size)
+    {
+        // Add in half precision, then convert the sum to float and store it
+        float sum = __half2float(in1[idx] + in2[idx]);
+        out[idx] = sum;
+    }
+}
+
+int main()
+{
+    constexpr size_t size = 32;
+    constexpr float tolerance = 1e-1f; // Allowable numerical difference
+
+    // Initialize input vectors as floats
+    std::vector<float> in1(size), in2(size);
+    for (size_t i = 0; i < size; i++) {
+        in1[i] = i + 1.1f;
+        in2[i] = i + 2.2f;
+    }
+
+    // Compute expected results in full precision on CPU
+    std::vector<float> cpu_out(size);
+    for (size_t i = 0; i < size; i++) {
+        cpu_out[i] = in1[i] + in2[i]; // Direct float addition
+    }
+
+    // Allocate device memory (store input as half, output as float)
+    __half *d_in1, *d_in2;
+    float *d_out;
+    hip_check(hipMalloc(&d_in1, sizeof(__half) * size));
+    hip_check(hipMalloc(&d_in2, sizeof(__half) * size));
+    hip_check(hipMalloc(&d_out, sizeof(float) * size));
+
+    // Convert input to half and copy to device
+    std::vector<__half> in1_half(size), in2_half(size);
+    for (size_t i = 0; i < size; i++)
+    {
+        in1_half[i] = __float2half(in1[i]);
+        in2_half[i] = __float2half(in2[i]);
+    }
+
+    hip_check(hipMemcpy(d_in1, in1_half.data(), sizeof(__half) * size, hipMemcpyHostToDevice));
+    hip_check(hipMemcpy(d_in2, in2_half.data(), sizeof(__half) * size, hipMemcpyHostToDevice));
+
+    // Launch kernel
+    add_half_precision<<<1, size>>>(d_in1, d_in2, d_out, size);
+
+    // Copy result back to host
+    std::vector<float> gpu_out(size, 0.0f);
+    hip_check(hipMemcpy(gpu_out.data(), d_out, sizeof(float) * size,
hipMemcpyDeviceToHost)); + + // Free device memory + hip_check(hipFree(d_in1)); + hip_check(hipFree(d_in2)); + hip_check(hipFree(d_out)); + + // Validation with tolerance + for (size_t i = 0; i < size; i++) + { + if (std::fabs(cpu_out[i] - gpu_out[i]) > tolerance) + { + std::cerr << "Mismatch at index " << i + << ": CPU result = " << cpu_out[i] + << ", GPU result = " << gpu_out[i] << std::endl; + std::abort(); + } + } + + std::cout << "Success: CPU and GPU half-precision addition match within tolerance!" << std::endl; +} diff --git a/docs/tools/example_codes/low_precision_float_fp8.hip b/docs/tools/example_codes/low_precision_float_fp8.hip new file mode 100644 index 0000000000..8a98bd67f1 --- /dev/null +++ b/docs/tools/example_codes/low_precision_float_fp8.hip @@ -0,0 +1,130 @@ +// MIT License +// +// Copyright (c) 2025 Advanced Micro Devices, Inc. All rights reserved. +// +// Permission is hereby granted, free of charge, to any person obtaining a copy +// of this software and associated documentation files (the "Software"), to deal +// in the Software without restriction, including without limitation the rights +// to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +// copies of the Software, and to permit persons to whom the Software is +// furnished to do so, subject to the following conditions: +// +// The above copyright notice and this permission notice shall be included in all +// copies or substantial portions of the Software. +// +// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +// SOFTWARE. 
+ +// [sphinx-start] +#include +#include + +#include +#include +#include + +#define hip_check(hip_call) \ +{ \ + auto hip_res = hip_call; \ + if (hip_res != hipSuccess) \ + { \ + std::cerr << "Failed in HIP call: " << #hip_call \ + << " at " << __FILE__ << ":" << __LINE__ \ + << " with error: " << hipGetErrorString(hip_res) << std::endl; \ + std::exit(EXIT_FAILURE); \ + } \ +} + +__device__ __hip_fp8_storage_t d_convert_float_to_fp8(float in, __hip_fp8_interpretation_t interpret, __hip_saturation_t sat) +{ + return __hip_cvt_float_to_fp8(in, sat, interpret); +} + +__global__ void float_to_fp8_to_float(float *in, __hip_fp8_interpretation_t interpret, __hip_saturation_t sat, float *out, size_t size) +{ + int i = threadIdx.x; + if (i < size) + { + auto fp8 = d_convert_float_to_fp8(in[i], interpret, sat); + // Implicit conversion from fp8 to float is defined by HIP runtime + out[i] = fp8; + } +} + +__hip_fp8_storage_t convert_float_to_fp8(float in, /* Input val */ + __hip_fp8_interpretation_t interpret, /* interpretation of number E4M3/E5M2 */ + __hip_saturation_t sat /* Saturation behavior */ + ) +{ + return __hip_cvt_float_to_fp8(in, sat, interpret); +} + +int main() +{ + constexpr size_t size = 32; + hipDeviceProp_t prop; + hip_check(hipGetDeviceProperties(&prop, 0)); + bool is_supported = (std::string(prop.gcnArchName).find("gfx94") != std::string::npos); // gfx94x + if(!is_supported) + { + std::cerr << "Need a gfx94x, but found: " << prop.gcnArchName << std::endl; + std::cerr << "No device conversions are supported, only host conversions are supported." << std::endl; + return EXIT_SUCCESS; + } + + const __hip_fp8_interpretation_t interpret = (std::string(prop.gcnArchName).find("gfx94") != std::string::npos) + ? __HIP_E4M3_FNUZ // gfx94x + : __HIP_E4M3; + constexpr __hip_saturation_t sat = __HIP_SATFINITE; + + std::vector in; + in.reserve(size); + for (size_t i = 0; i < size; i++) + in.push_back(i + 1.1f); + + std::cout << "Converting float to fp8 and back..." 
<< std::endl; + // CPU convert + std::vector<float> cpu_out; + cpu_out.reserve(size); + for (const auto &fval : in) + { + auto fp8 = convert_float_to_fp8(fval, interpret, sat); + // Implicit conversion from fp8 to float is defined by HIP runtime + cpu_out.push_back(fp8); + } + + // GPU convert + float *d_in, *d_out; + hip_check(hipMalloc(&d_in, sizeof(float) * size)); + hip_check(hipMalloc(&d_out, sizeof(float) * size)); + + hip_check(hipMemcpy(d_in, in.data(), sizeof(float) * in.size(), hipMemcpyHostToDevice)); + + float_to_fp8_to_float<<<1, size>>>(d_in, interpret, sat, d_out, size); + + std::vector<float> gpu_out(size, 0.0f); + hip_check(hipMemcpy(gpu_out.data(), d_out, sizeof(float) * gpu_out.size(), hipMemcpyDeviceToHost)); + + hip_check(hipFree(d_in)); + hip_check(hipFree(d_out)); + + // Validation + for (size_t i = 0; i < size; i++) + { + if (cpu_out[i] != gpu_out[i]) + { + std::cerr << "cpu round trip result: " << cpu_out[i] + << " - gpu round trip result: " << gpu_out[i] << std::endl; + return EXIT_FAILURE; + } + } + std::cout << "...CPU and GPU round trip convert matches." << std::endl; + + return EXIT_SUCCESS; +} +// [sphinx-end] diff --git a/docs/tools/example_codes/lowered_names.cpp b/docs/tools/example_codes/lowered_names.cpp new file mode 100644 index 0000000000..e14db8555f --- /dev/null +++ b/docs/tools/example_codes/lowered_names.cpp @@ -0,0 +1,202 @@ +// MIT License +// +// Copyright (c) 2025 Advanced Micro Devices, Inc. All rights reserved. 
+// +// Permission is hereby granted, free of charge, to any person obtaining a copy +// of this software and associated documentation files (the "Software"), to deal +// in the Software without restriction, including without limitation the rights +// to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +// copies of the Software, and to permit persons to whom the Software is +// furnished to do so, subject to the following conditions: +// +// The above copyright notice and this permission notice shall be included in all +// copies or substantial portions of the Software. +// +// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +// SOFTWARE. 
+ +#include +#include + +#include +#include +#include +#include +#include + +#define CHECK_RET_CODE(call, ret_code) \ +{ \ + if ((call) != ret_code) \ + { \ + std::cout << "Failed in call: " << #call << std::endl; \ + std::abort(); \ + } \ +} +#define HIP_CHECK(call) CHECK_RET_CODE(call, hipSuccess) +#define HIPRTC_CHECK(call) CHECK_RET_CODE(call, HIPRTC_SUCCESS) + +// [sphinx-source-start] +static constexpr const char gpu_program[] { +R"( + __device__ int V1; // set from host code + static __global__ void f1(int *result) + { + *result = V1 + 10; + } + + namespace N1 + { + namespace N2 + { + __constant__ int V2; // set from host code + __global__ void f2(int *result) + { + *result = V2 + 20; + } + } + } + + template + __global__ void f3(int *result) + { + *result = sizeof(T); + } +)"}; +// [sphinx-source-end] + +int main() +{ + using namespace std::string_literals; + + hiprtcProgram prog; + HIPRTC_CHECK(hiprtcCreateProgram(&prog, gpu_program, "gpu_source.cpp", 0, nullptr, nullptr)); + + std::vector kernel_names; + std::vector variable_names; + std::vector initial_values; + std::vector expected_results; + initial_values.emplace_back(100); + initial_values.emplace_back(200); + expected_results.emplace_back(110); + expected_results.emplace_back(220); + expected_results.emplace_back(static_cast(sizeof(int))); + + // [sphinx-add-expression-start] + kernel_names.emplace_back("&f1"s); + kernel_names.emplace_back("N1::N2::f2"s); + kernel_names.emplace_back("f3"s); + for(auto&& name : kernel_names) + HIPRTC_CHECK(hiprtcAddNameExpression(prog, name.c_str())); + + variable_names.emplace_back("&V1"s); + variable_names.emplace_back("&N1::N2::V2"); + for(auto&& name : variable_names) + HIPRTC_CHECK(hiprtcAddNameExpression(prog, name.c_str())); + // [sphinx-add-expression-end] + + hipDeviceProp_t props; + int device = 0; + HIP_CHECK(hipGetDeviceProperties(&props, device)); + auto sarg = std::string{"--gpu-architecture="} + props.gcnArchName; // device for which binary is to be 
generated + + const char* options[] = {sarg.c_str()}; + + HIPRTC_CHECK(hiprtcCompileProgram(prog, 1, options)); + + std::size_t logSize; + HIPRTC_CHECK(hiprtcGetProgramLogSize(prog, &logSize)); + if (logSize) + { + std::string log(logSize, '\0'); + HIPRTC_CHECK(hiprtcGetProgramLog(prog, &log[0])); + std::cerr << "Compilation failed or produced warnings: " << log << std::endl; + std::abort(); + } + + std::size_t codeSize; + HIPRTC_CHECK(hiprtcGetCodeSize(prog, &codeSize)); + + std::vector kernel_binary(codeSize); + HIPRTC_CHECK(hiprtcGetCode(prog, kernel_binary.data())); + + std::vector lowered_kernel_names; + std::vector lowered_variable_names; + // [sphinx-get-kernel-name-start] + for(auto&& name : kernel_names) + { + const char* lowered_name = nullptr; + HIPRTC_CHECK(hiprtcGetLoweredName(prog, name.c_str(), &lowered_name)); + lowered_kernel_names.emplace_back(lowered_name); + } + // [sphinx-get-kernel-name-end] + // [sphinx-get-variable-name-start] + for(auto&& name : variable_names) + { + const char* lowered_name = nullptr; + HIPRTC_CHECK(hiprtcGetLoweredName(prog, name.c_str(), &lowered_name)); + lowered_variable_names.emplace_back(lowered_name); + } + // [sphinx-get-variable-name-end] + + HIPRTC_CHECK(hiprtcDestroyProgram(&prog)); + + hipModule_t module; + + HIP_CHECK(hipModuleLoadData(&module, kernel_binary.data())); + + for(auto i = std::size_t{0}; i < initial_values.size(); ++i) + { + auto name = lowered_variable_names.at(i); + auto initial_value = initial_values.at(i); + + // [sphinx-update-variable-start] + hipDeviceptr_t variable_addr; + std::size_t bytes{}; + HIP_CHECK(hipModuleGetGlobal(&variable_addr, &bytes, module, name.c_str())); + HIP_CHECK(hipMemcpyHtoD(variable_addr, &initial_value, sizeof(initial_value))); + // [sphinx-update-variable-end] + } + + hipDeviceptr_t d_result; + auto h_result = 0; + HIP_CHECK(hipMalloc(reinterpret_cast(&d_result), sizeof(h_result))); + HIP_CHECK(hipMemcpyHtoD(d_result, &h_result, sizeof(h_result))); + + struct + { + 
hipDeviceptr_t ptr; + } args{d_result}; + auto args_size = sizeof(args); + + void* config[] = {HIP_LAUNCH_PARAM_BUFFER_POINTER, &args, + HIP_LAUNCH_PARAM_BUFFER_SIZE, &args_size, + HIP_LAUNCH_PARAM_END}; + + for(auto i = std::size_t{0}; i < lowered_kernel_names.size(); ++i) + { + auto name = lowered_kernel_names.at(i); + auto expected = expected_results.at(i); + // [sphinx-launch-kernel-start] + hipFunction_t kernel; + HIP_CHECK(hipModuleGetFunction(&kernel, module, name.c_str())); + HIP_CHECK(hipModuleLaunchKernel(kernel, 1, 1, 1, 1, 1, 1, 0, nullptr, nullptr, config)); + // [sphinx-launch-kernel-end] + HIP_CHECK(hipMemcpyDtoH(&h_result, d_result, sizeof(h_result))); + if(expected != h_result) + { + std::cerr << "Validation failed. expected = " << expected << ", h_result = " << h_result << std::endl; + return EXIT_FAILURE; + } + } + + std::cout << "Validation passed." << std::endl; + + HIP_CHECK(hipFree(reinterpret_cast(d_result))); + + return EXIT_SUCCESS; +} diff --git a/docs/tools/example_codes/math.hip b/docs/tools/example_codes/math.hip new file mode 100644 index 0000000000..ae5df3cd4c --- /dev/null +++ b/docs/tools/example_codes/math.hip @@ -0,0 +1,118 @@ +// MIT License +// +// Copyright (c) 2025 Advanced Micro Devices, Inc. All rights reserved. +// +// Permission is hereby granted, free of charge, to any person obtaining a copy +// of this software and associated documentation files (the "Software"), to deal +// in the Software without restriction, including without limitation the rights +// to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +// copies of the Software, and to permit persons to whom the Software is +// furnished to do so, subject to the following conditions: +// +// The above copyright notice and this permission notice shall be included in all +// copies or substantial portions of the Software. 
+// +// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +// SOFTWARE. + +// [sphinx-start] +#include + +#include +#include +#include +#include + +#define HIP_CHECK(expression) \ + { \ + const hipError_t err = expression; \ + if (err != hipSuccess) \ + { \ + std::cerr << "HIP error: " \ + << hipGetErrorString(err) \ + << " at " << __LINE__ << "\n"; \ + exit(EXIT_FAILURE); \ + } \ + } + +// Simple ULP difference calculator +int64_t ulp_diff(float a, float b) +{ + if (a == b) + return 0; + + union + { + float f; + int32_t i; + } ua{a}, ub{b}; + + // For negative values, convert to a positive-based representation + if (ua.i < 0) ua.i = std::numeric_limits<int32_t>::max() - ua.i; + if (ub.i < 0) ub.i = std::numeric_limits<int32_t>::max() - ub.i; + + return std::abs((int64_t)ua.i - (int64_t)ub.i); +} + +// Test kernel +__global__ void test_sin(float* out, int n) +{ + int i = blockIdx.x * blockDim.x + threadIdx.x; + if (i < n) + { + float x = -M_PI + (2.0f * M_PI * i) / (n - 1); + out[i] = sinf(x); + } +} + +int main() +{ + const int n = 1000000; + const int blocksize = 256; + std::vector<float> outputs(n); + float* d_out; + + HIP_CHECK(hipMalloc(&d_out, n * sizeof(float))); + dim3 threads(blocksize); + dim3 blocks((n + blocksize - 1) / blocksize); // Fixed grid calculation + test_sin<<<blocks, threads>>>(d_out, n); + HIP_CHECK(hipPeekAtLastError()); + HIP_CHECK(hipMemcpy(outputs.data(), d_out, n * sizeof(float), hipMemcpyDeviceToHost)); + + // Step 1: Find the maximum absolute error + double max_abs_error = 0.0; + float max_error_output = 0.0; + float max_error_expected = 0.0; + + for (int i = 0; 
i < n; i++) + { + float x = -M_PI + (2.0f * M_PI * i) / (n - 1); + float expected = std::sin(x); + double abs_error = std::abs(outputs[i] - expected); + + if (abs_error > max_abs_error) + { + max_abs_error = abs_error; + max_error_output = outputs[i]; + max_error_expected = expected; + } + } + + // Step 2: Compute ULP difference based on the max absolute error pair + int64_t max_ulp = ulp_diff(max_error_output, max_error_expected); + + // Output results + std::cout << "Max Absolute Error: " << max_abs_error << std::endl; + std::cout << "Max ULP Difference: " << max_ulp << std::endl; + std::cout << "Max Error Values -> Got: " << max_error_output + << ", Expected: " << max_error_expected << std::endl; + + HIP_CHECK(hipFree(d_out)); + return 0; +} +// [sphinx-end] diff --git a/docs/tools/example_codes/memory_pool.hip b/docs/tools/example_codes/memory_pool.hip new file mode 100644 index 0000000000..ac4ff84f5e --- /dev/null +++ b/docs/tools/example_codes/memory_pool.hip @@ -0,0 +1,109 @@ +// MIT License +// +// Copyright (c) 2025 Advanced Micro Devices, Inc. All rights reserved. +// +// Permission is hereby granted, free of charge, to any person obtaining a copy +// of this software and associated documentation files (the "Software"), to deal +// in the Software without restriction, including without limitation the rights +// to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +// copies of the Software, and to permit persons to whom the Software is +// furnished to do so, subject to the following conditions: +// +// The above copyright notice and this permission notice shall be included in all +// copies or substantial portions of the Software. +// +// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. 
IN NO EVENT SHALL THE +// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +// SOFTWARE. + +// [sphinx-start] +#include + +#include +#include + +#define HIP_CHECK(expression) \ +{ \ + const hipError_t status = expression; \ + if (status != hipSuccess) \ + { \ + std::cerr << "HIP error " << status \ + << ": " << hipGetErrorString(status) \ + << " at " << __FILE__ << ":" \ + << __LINE__ << std::endl; \ + std::exit(EXIT_FAILURE); \ + } \ +} + +// Kernel to perform some computation on allocated memory. +__global__ void myKernel(int* data, std::size_t numElements) +{ + int tid = threadIdx.x + blockIdx.x * blockDim.x; + if (tid < numElements) + { + data[tid] = tid * 2; + } +} + +int main() +{ + // Create a stream. + hipStream_t stream; + HIP_CHECK(hipStreamCreate(&stream)); + + // Create a memory pool with default properties. + hipMemPoolProps poolProps = {}; + poolProps.allocType = hipMemAllocationTypePinned; + poolProps.handleTypes = hipMemHandleTypePosixFileDescriptor; + poolProps.location.type = hipMemLocationTypeDevice; + poolProps.location.id = 0; // Assuming device 0. + + hipMemPool_t memPool; + HIP_CHECK(hipMemPoolCreate(&memPool, &poolProps)); + + // Allocate memory from the pool asynchronously. + constexpr std::size_t numElements = 1024; + int* devData = nullptr; + HIP_CHECK(hipMallocFromPoolAsync(reinterpret_cast(&devData), + numElements * sizeof(*devData), + memPool, + stream)); + + // Define grid and block sizes. + dim3 blockSize(256); + dim3 gridSize((numElements + blockSize.x - 1) / blockSize.x); + + // Launch the kernel to perform computation. + myKernel<<>>(devData, numElements); + + // Synchronize the stream. + HIP_CHECK(hipStreamSynchronize(stream)); + + // Copy data back to host. 
+ int* hostData = new int[numElements]; + HIP_CHECK(hipMemcpy(hostData, devData, numElements * sizeof(*devData), hipMemcpyDeviceToHost)); + + // Print the array. + for (std::size_t i = 0; i < numElements; ++i) + std::cout << "Element " << i << ": " << hostData[i] << std::endl; + + // Free the allocated memory. + HIP_CHECK(hipFreeAsync(devData, stream)); + + // Synchronize the stream again to ensure all operations are complete. + HIP_CHECK(hipStreamSynchronize(stream)); + + // Destroy the memory pool and stream. + HIP_CHECK(hipMemPoolDestroy(memPool)); + HIP_CHECK(hipStreamDestroy(stream)); + + // Free host memory. + delete[] hostData; + + return EXIT_SUCCESS; +} +// [sphinx-end] diff --git a/docs/tools/example_codes/memory_pool_resource_usage_statistics.cpp b/docs/tools/example_codes/memory_pool_resource_usage_statistics.cpp new file mode 100644 index 0000000000..306d61a8ea --- /dev/null +++ b/docs/tools/example_codes/memory_pool_resource_usage_statistics.cpp @@ -0,0 +1,115 @@ +// MIT License +// +// Copyright (c) 2025 Advanced Micro Devices, Inc. All rights reserved. +// +// Permission is hereby granted, free of charge, to any person obtaining a copy +// of this software and associated documentation files (the "Software"), to deal +// in the Software without restriction, including without limitation the rights +// to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +// copies of the Software, and to permit persons to whom the Software is +// furnished to do so, subject to the following conditions: +// +// The above copyright notice and this permission notice shall be included in all +// copies or substantial portions of the Software. +// +// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. 
IN NO EVENT SHALL THE +// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +// SOFTWARE. + +// [sphinx-start] +#include + +#include +#include +#include +#include + +#define HIP_CHECK(expression) \ +{ \ + const hipError_t status = expression; \ + if (status != hipSuccess) \ + { \ + std::cerr << "HIP error " << status \ + << ": " << hipGetErrorString(status) \ + << " at " << __FILE__ << ":" \ + << __LINE__ << std::endl; \ + std::exit(EXIT_FAILURE); \ + } \ +} + +// Sample helper functions for getting the usage statistics in bulk. +struct usageStatistics +{ + std::uint64_t reservedMemCurrent; + std::uint64_t reservedMemHigh; + std::uint64_t usedMemCurrent; + std::uint64_t usedMemHigh; +}; + +void getUsageStatistics(hipMemPool_t memPool, struct usageStatistics *statistics) +{ + HIP_CHECK(hipMemPoolGetAttribute(memPool, hipMemPoolAttrReservedMemCurrent, &statistics->reservedMemCurrent)); + HIP_CHECK(hipMemPoolGetAttribute(memPool, hipMemPoolAttrReservedMemHigh, &statistics->reservedMemHigh)); + HIP_CHECK(hipMemPoolGetAttribute(memPool, hipMemPoolAttrUsedMemCurrent, &statistics->usedMemCurrent)); + HIP_CHECK(hipMemPoolGetAttribute(memPool, hipMemPoolAttrUsedMemHigh, &statistics->usedMemHigh)); +} + +// Resetting the watermarks resets them to the current value. +void resetStatistics(hipMemPool_t memPool) +{ + std::uint64_t value = 0; + HIP_CHECK(hipMemPoolSetAttribute(memPool, hipMemPoolAttrReservedMemHigh, &value)); + HIP_CHECK(hipMemPoolSetAttribute(memPool, hipMemPoolAttrUsedMemHigh, &value)); +} + +int main() +{ + hipMemPool_t memPool; + hipDevice_t device = 0; // Specify the device index. + + // Initialize the device. + HIP_CHECK(hipSetDevice(device)); + + // Get the default memory pool for the device. 
+ HIP_CHECK(hipDeviceGetDefaultMemPool(&memPool, device)); + + // Allocate memory from the pool (e.g., 1 MB). + std::size_t allocSize = 1 * 1024 * 1024; + void* ptr; + HIP_CHECK(hipMalloc(&ptr, allocSize)); + + // Free the allocated memory. + HIP_CHECK(hipFree(ptr)); + + // Trim the memory pool to a specific size (e.g., 512 KB). + std::size_t newSize = 512 * 1024; + HIP_CHECK(hipMemPoolTrimTo(memPool, newSize)); + + // Get and print usage statistics before resetting. + usageStatistics statsBefore; + getUsageStatistics(memPool, &statsBefore); + std::cout << "Before resetting statistics:" << std::endl; + std::cout << "Reserved Memory Current: " << statsBefore.reservedMemCurrent << " bytes" << std::endl; + std::cout << "Reserved Memory High: " << statsBefore.reservedMemHigh << " bytes" << std::endl; + std::cout << "Used Memory Current: " << statsBefore.usedMemCurrent << " bytes" << std::endl; + std::cout << "Used Memory High: " << statsBefore.usedMemHigh << " bytes" << std::endl; + + // Reset the statistics. + resetStatistics(memPool); + + // Get and print usage statistics after resetting. + usageStatistics statsAfter; + getUsageStatistics(memPool, &statsAfter); + std::cout << "After resetting statistics:" << std::endl; + std::cout << "Reserved Memory Current: " << statsAfter.reservedMemCurrent << " bytes" << std::endl; + std::cout << "Reserved Memory High: " << statsAfter.reservedMemHigh << " bytes" << std::endl; + std::cout << "Used Memory Current: " << statsAfter.usedMemCurrent << " bytes" << std::endl; + std::cout << "Used Memory High: " << statsAfter.usedMemHigh << " bytes" << std::endl; + + return EXIT_SUCCESS; +} +// [sphinx-end] diff --git a/docs/tools/example_codes/memory_pool_threshold.hip b/docs/tools/example_codes/memory_pool_threshold.hip new file mode 100644 index 0000000000..daaec4ace0 --- /dev/null +++ b/docs/tools/example_codes/memory_pool_threshold.hip @@ -0,0 +1,115 @@ +// MIT License +// +// Copyright (c) 2025 Advanced Micro Devices, Inc. 
All rights reserved. +// +// Permission is hereby granted, free of charge, to any person obtaining a copy +// of this software and associated documentation files (the "Software"), to deal +// in the Software without restriction, including without limitation the rights +// to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +// copies of the Software, and to permit persons to whom the Software is +// furnished to do so, subject to the following conditions: +// +// The above copyright notice and this permission notice shall be included in all +// copies or substantial portions of the Software. +// +// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +// SOFTWARE. + +#include + +#include +#include +#include +#include +#include + +#define HIP_CHECK(expression) \ +{ \ + const hipError_t status = expression; \ + if (status != hipSuccess) \ + { \ + std::cerr << "HIP error " << status \ + << ": " << hipGetErrorString(status) \ + << " at " << __FILE__ << ":" \ + << __LINE__ << std::endl; \ + std::exit(EXIT_FAILURE); \ + } \ +} + +// Kernel to perform some computation on allocated memory. +__global__ void myKernel(int* data, std::size_t numElements) +{ + int tid = threadIdx.x + blockIdx.x * blockDim.x; + if (tid < numElements) + { + data[tid] = tid * 2; + } +} + +int main() +{ + // Create a stream. + hipStream_t stream; + HIP_CHECK(hipStreamCreate(&stream)); + + // Create a memory pool with default properties. 
+ hipMemPoolProps poolProps = {}; + poolProps.allocType = hipMemAllocationTypePinned; + poolProps.handleTypes = hipMemHandleTypePosixFileDescriptor; + poolProps.location.type = hipMemLocationTypeDevice; + poolProps.location.id = 0; // Assuming device 0. + + hipMemPool_t memPool; + HIP_CHECK(hipMemPoolCreate(&memPool, &poolProps)); + + // [sphinx-start] + std::uint64_t threshold = std::numeric_limits::max(); + HIP_CHECK(hipMemPoolSetAttribute(memPool, hipMemPoolAttrReleaseThreshold, &threshold)); + // [sphinx-end] + + // Allocate memory from the pool asynchronously. + constexpr std::size_t numElements = 1024; + int* devData = nullptr; + HIP_CHECK(hipMallocFromPoolAsync(reinterpret_cast(&devData), + numElements * sizeof(*devData), + memPool, + stream)); + + // Define grid and block sizes. + dim3 blockSize(256); + dim3 gridSize((numElements + blockSize.x - 1) / blockSize.x); + + // Launch the kernel to perform computation. + myKernel<<>>(devData, numElements); + + // Synchronize the stream. + HIP_CHECK(hipStreamSynchronize(stream)); + + // Copy data back to host. + int* hostData = new int[numElements]; + HIP_CHECK(hipMemcpy(hostData, devData, numElements * sizeof(*devData), hipMemcpyDeviceToHost)); + + // Print the array. + for (std::size_t i = 0; i < numElements; ++i) + std::cout << "Element " << i << ": " << hostData[i] << std::endl; + + // Free the allocated memory. + HIP_CHECK(hipFreeAsync(devData, stream)); + + // Synchronize the stream again to ensure all operations are complete. + HIP_CHECK(hipStreamSynchronize(stream)); + + // Destroy the memory pool and stream. + HIP_CHECK(hipMemPoolDestroy(memPool)); + HIP_CHECK(hipStreamDestroy(stream)); + + // Free host memory. 
+ delete[] hostData; + + return 0; +} diff --git a/docs/tools/example_codes/memory_pool_trim.cpp b/docs/tools/example_codes/memory_pool_trim.cpp new file mode 100644 index 0000000000..c398acfebc --- /dev/null +++ b/docs/tools/example_codes/memory_pool_trim.cpp @@ -0,0 +1,69 @@ +// MIT License +// +// Copyright (c) 2025 Advanced Micro Devices, Inc. All rights reserved. +// +// Permission is hereby granted, free of charge, to any person obtaining a copy +// of this software and associated documentation files (the "Software"), to deal +// in the Software without restriction, including without limitation the rights +// to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +// copies of the Software, and to permit persons to whom the Software is +// furnished to do so, subject to the following conditions: +// +// The above copyright notice and this permission notice shall be included in all +// copies or substantial portions of the Software. +// +// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +// SOFTWARE. + +// [sphinx-start] +#include + +#include +#include +#include + +#define HIP_CHECK(expression) \ +{ \ + const hipError_t status = expression; \ + if (status != hipSuccess) \ + { \ + std::cerr << "HIP error " << status \ + << ": " << hipGetErrorString(status) \ + << " at " << __FILE__ << ":" \ + << __LINE__ << std::endl; \ + std::exit(EXIT_FAILURE); \ + } \ +} + +int main() +{ + hipMemPool_t memPool; + hipDevice_t device = 0; // Specify the device index. + + // Initialize the device. 
+ HIP_CHECK(hipSetDevice(device)); + + // Get the default memory pool for the device. + HIP_CHECK(hipDeviceGetDefaultMemPool(&memPool, device)); + + // Allocate memory from the pool (e.g., 1 MB). + std::size_t allocSize = 1 * 1024 * 1024; + void* ptr; + HIP_CHECK(hipMalloc(&ptr, allocSize)); + + // Free the allocated memory. + HIP_CHECK(hipFree(ptr)); + + // Trim the memory pool to a specific size (e.g., 512 KB). + std::size_t newSize = 512 * 1024; + HIP_CHECK(hipMemPoolTrimTo(memPool, newSize)); + + std::cout << "Memory pool trimmed to " << newSize << " bytes." << std::endl; + return EXIT_SUCCESS; +} +// [sphinx-end] diff --git a/docs/tools/example_codes/memory_range_attributes.hip b/docs/tools/example_codes/memory_range_attributes.hip new file mode 100644 index 0000000000..227bbe8620 --- /dev/null +++ b/docs/tools/example_codes/memory_range_attributes.hip @@ -0,0 +1,90 @@ +// MIT License +// +// Copyright (c) 2025 Advanced Micro Devices, Inc. All rights reserved. +// +// Permission is hereby granted, free of charge, to any person obtaining a copy +// of this software and associated documentation files (the "Software"), to deal +// in the Software without restriction, including without limitation the rights +// to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +// copies of the Software, and to permit persons to whom the Software is +// furnished to do so, subject to the following conditions: +// +// The above copyright notice and this permission notice shall be included in all +// copies or substantial portions of the Software. +// +// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. 
IN NO EVENT SHALL THE +// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +// SOFTWARE. + +// [sphinx-start] +#include + +#include +#include + +#define HIP_CHECK(expression) \ +{ \ + const hipError_t err = expression; \ + if(err != hipSuccess) \ + { \ + std::cerr << "HIP error: " \ + << hipGetErrorString(err) \ + << " at " << __LINE__ << "\n"; \ + } \ +} + +// Addition of two values. +__global__ void add(int *a, int *b, int *c) +{ + *c = *a + *b; +} + +int main() +{ + int *a, *b, *c; + unsigned int attributeValue; + constexpr std::size_t attributeSize = sizeof(attributeValue); + + int deviceId; + HIP_CHECK(hipGetDevice(&deviceId)); + + // Allocate memory for a, b and c that is accessible to both device and host codes. + HIP_CHECK(hipMallocManaged(&a, sizeof(*a))); + HIP_CHECK(hipMallocManaged(&b, sizeof(*b))); + HIP_CHECK(hipMallocManaged(&c, sizeof(*c))); + + // Setup input values. + *a = 1; + *b = 2; + + HIP_CHECK(hipMemAdvise(a, sizeof(*a), hipMemAdviseSetReadMostly, deviceId)); + + // Launch add() kernel on GPU. + add<<<1, 1>>>(a, b, c); + + // Wait for GPU to finish before accessing on host. + HIP_CHECK(hipDeviceSynchronize()); + + // Query an attribute of the memory range. + HIP_CHECK(hipMemRangeGetAttribute(&attributeValue, + attributeSize, + hipMemRangeAttributeReadMostly, + a, + sizeof(*a))); + + // Prints the result. + std::cout << *a << " + " << *b << " = " << *c << std::endl; + std::cout << "The array a is" << (attributeValue == 1 ? "" : " NOT") << " set to hipMemRangeAttributeReadMostly" << std::endl; + + // Cleanup allocated memory. 
+ HIP_CHECK(hipFree(a)); + HIP_CHECK(hipFree(b)); + HIP_CHECK(hipFree(c)); + + return 0; +} +// [sphinx-end] diff --git a/docs/tools/example_codes/multi_device_synchronization.hip b/docs/tools/example_codes/multi_device_synchronization.hip new file mode 100644 index 0000000000..c16714ee13 --- /dev/null +++ b/docs/tools/example_codes/multi_device_synchronization.hip @@ -0,0 +1,133 @@ +// MIT License +// +// Copyright (c) 2025 Advanced Micro Devices, Inc. All rights reserved. +// +// Permission is hereby granted, free of charge, to any person obtaining a copy +// of this software and associated documentation files (the "Software"), to deal +// in the Software without restriction, including without limitation the rights +// to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +// copies of the Software, and to permit persons to whom the Software is +// furnished to do so, subject to the following conditions: +// +// The above copyright notice and this permission notice shall be included in all +// copies or substantial portions of the Software. +// +// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +// SOFTWARE. 
+ +// [sphinx-start] +#include + +#include +#include +#include + +#define HIP_CHECK(expression) \ +{ \ + const hipError_t status = expression; \ + if (status != hipSuccess) \ + { \ + std::cerr << "HIP error " << status \ + << ": " << hipGetErrorString(status) \ + << " at " << __FILE__ << ":" \ + << __LINE__ << std::endl; \ + std::exit(EXIT_FAILURE); \ + } \ +} + +__global__ void simpleKernel(double *data) +{ + int idx = blockIdx.x * blockDim.x + threadIdx.x; + data[idx] = idx * 2.0; +} + +int main() +{ + int numDevices; + HIP_CHECK(hipGetDeviceCount(&numDevices)); + + if (numDevices < 2) + { + std::cout << "This example requires at least two HIP devices." << std::endl; + return EXIT_SUCCESS; + } + + double *deviceData0, *deviceData1; + std::size_t size = 1000 * 128 * sizeof(*deviceData0); // One element per launched thread (1000 blocks x 128 threads) + + // Create streams and events for each device + hipStream_t stream0, stream1; + hipEvent_t startEvent0, stopEvent0, startEvent1, stopEvent1; + + // Initialize device 0 + HIP_CHECK(hipSetDevice(0)); + HIP_CHECK(hipStreamCreate(&stream0)); + HIP_CHECK(hipEventCreate(&startEvent0)); + HIP_CHECK(hipEventCreate(&stopEvent0)); + HIP_CHECK(hipMalloc(&deviceData0, size)); + + // Initialize device 1 + HIP_CHECK(hipSetDevice(1)); + HIP_CHECK(hipStreamCreate(&stream1)); + HIP_CHECK(hipEventCreate(&startEvent1)); + HIP_CHECK(hipEventCreate(&stopEvent1)); + HIP_CHECK(hipMalloc(&deviceData1, size)); + + // Record the start event on device 0 + HIP_CHECK(hipSetDevice(0)); + HIP_CHECK(hipEventRecord(startEvent0, stream0)); + + // Launch the kernel asynchronously on device 0 + simpleKernel<<<1000, 128, 0, stream0>>>(deviceData0); + + // Record the stop event on device 0 + HIP_CHECK(hipEventRecord(stopEvent0, stream0)); + + // Wait for the stop event on device 0 to complete + HIP_CHECK(hipEventSynchronize(stopEvent0)); + + // Record the start event on device 1 + HIP_CHECK(hipSetDevice(1)); + HIP_CHECK(hipEventRecord(startEvent1, stream1)); + + // Launch the kernel asynchronously on device 1 + 
simpleKernel<<<8, 128, 0, stream1>>>(deviceData1); + + // Record the stop event on device 1 + HIP_CHECK(hipEventRecord(stopEvent1, stream1)); + + // Wait for the stop event on device 1 to complete + HIP_CHECK(hipEventSynchronize(stopEvent1)); + + // Calculate elapsed time between the events for both devices + float milliseconds0 = 0, milliseconds1 = 0; + HIP_CHECK(hipEventElapsedTime(&milliseconds0, startEvent0, stopEvent0)); + HIP_CHECK(hipEventElapsedTime(&milliseconds1, startEvent1, stopEvent1)); + + std::cout << "Elapsed time on GPU 0: " << milliseconds0 << " ms" << std::endl; + std::cout << "Elapsed time on GPU 1: " << milliseconds1 << " ms" << std::endl; + + // Cleanup for device 0 + HIP_CHECK(hipSetDevice(0)); + HIP_CHECK(hipEventDestroy(startEvent0)); + HIP_CHECK(hipEventDestroy(stopEvent0)); + HIP_CHECK(hipStreamSynchronize(stream0)); + HIP_CHECK(hipStreamDestroy(stream0)); + HIP_CHECK(hipFree(deviceData0)); + + // Cleanup for device 1 + HIP_CHECK(hipSetDevice(1)); + HIP_CHECK(hipEventDestroy(startEvent1)); + HIP_CHECK(hipEventDestroy(stopEvent1)); + HIP_CHECK(hipStreamSynchronize(stream1)); + HIP_CHECK(hipStreamDestroy(stream1)); + HIP_CHECK(hipFree(deviceData1)); + + return EXIT_SUCCESS; +} +// [sphinx-end] diff --git a/docs/tools/example_codes/ordinary_memory_allocation.hip b/docs/tools/example_codes/ordinary_memory_allocation.hip new file mode 100644 index 0000000000..d3c59dc93b --- /dev/null +++ b/docs/tools/example_codes/ordinary_memory_allocation.hip @@ -0,0 +1,81 @@ +// MIT License +// +// Copyright (c) 2025 Advanced Micro Devices, Inc. All rights reserved. 
+// +// Permission is hereby granted, free of charge, to any person obtaining a copy +// of this software and associated documentation files (the "Software"), to deal +// in the Software without restriction, including without limitation the rights +// to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +// copies of the Software, and to permit persons to whom the Software is +// furnished to do so, subject to the following conditions: +// +// The above copyright notice and this permission notice shall be included in all +// copies or substantial portions of the Software. +// +// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +// SOFTWARE. + +// [sphinx-start] +#include + +#include +#include + +#define HIP_CHECK(expression) \ +{ \ + const hipError_t status = expression; \ + if (status != hipSuccess) \ + { \ + std::cerr << "HIP error " << status \ + << ": " << hipGetErrorString(status) \ + << " at " << __FILE__ << ":" \ + << __LINE__ << std::endl; \ + std::exit(EXIT_FAILURE); \ + } \ +} + +// Kernel to perform some computation on allocated memory. +__global__ void myKernel(int* data, std::size_t numElements) +{ + int tid = threadIdx.x + blockIdx.x * blockDim.x; + if (tid < numElements) + { + data[tid] = tid * 2; + } +} + +int main() +{ + // Allocate memory. + constexpr std::size_t numElements = 1024; + int* devData; + HIP_CHECK(hipMalloc(&devData, numElements * sizeof(*devData))); + + // Launch the kernel to perform computation. 
+ dim3 blockSize(256); + dim3 gridSize((numElements + blockSize.x - 1) / blockSize.x); + myKernel<<<gridSize, blockSize>>>(devData, numElements); + + // Copy data back to host. + int* hostData = new int[numElements]; + HIP_CHECK(hipMemcpy(hostData, devData, numElements * sizeof(*devData), hipMemcpyDeviceToHost)); + + // Print the array. + for (std::size_t i = 0; i < numElements; ++i) + std::cout << "Element " << i << ": " << hostData[i] << std::endl; + + // Free memory. + HIP_CHECK(hipFree(devData)); + delete[] hostData; + + // Synchronize to ensure completion. + HIP_CHECK(hipDeviceSynchronize()); + + return EXIT_SUCCESS; +} +// [sphinx-end] diff --git a/docs/tools/example_codes/p2p_memory_access.hip b/docs/tools/example_codes/p2p_memory_access.hip new file mode 100644 index 0000000000..832c1ad46b --- /dev/null +++ b/docs/tools/example_codes/p2p_memory_access.hip @@ -0,0 +1,112 @@ +// MIT License +// +// Copyright (c) 2025 Advanced Micro Devices, Inc. All rights reserved. +// +// Permission is hereby granted, free of charge, to any person obtaining a copy +// of this software and associated documentation files (the "Software"), to deal +// in the Software without restriction, including without limitation the rights +// to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +// copies of the Software, and to permit persons to whom the Software is +// furnished to do so, subject to the following conditions: +// +// The above copyright notice and this permission notice shall be included in all +// copies or substantial portions of the Software. +// +// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. 
IN NO EVENT SHALL THE +// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +// SOFTWARE. + +// [sphinx-start] +#include + +#include +#include +#include + +#define HIP_CHECK(expression) \ +{ \ + const hipError_t status = expression; \ + if (status != hipSuccess) \ + { \ + std::cerr << "HIP error " << status \ + << ": " << hipGetErrorString(status) \ + << " at " << __FILE__ << ":" \ + << __LINE__ << std::endl; \ + std::exit(EXIT_FAILURE); \ + } \ +} + +__global__ void simpleKernel(double *data) +{ + int idx = blockIdx.x * blockDim.x + threadIdx.x; + data[idx] = idx * 2.0; +} + +int main() +{ + int deviceCount; + HIP_CHECK(hipGetDeviceCount(&deviceCount)); + if(deviceCount < 2) + { + std::cout << "This example requires at least two HIP devices." << std::endl; + return EXIT_SUCCESS; + } + + double* deviceData0; + double* deviceData1; + std::size_t size = 1024 * sizeof(*deviceData0); + + int deviceId0 = 0; + int deviceId1 = 1; + + // Enable peer access to the memory (allocated and future) on the peer device. + // Ensure the device is active before enabling peer access. 
+ HIP_CHECK(hipSetDevice(deviceId0)); + HIP_CHECK(hipDeviceEnablePeerAccess(deviceId1, 0)); + + HIP_CHECK(hipSetDevice(deviceId1)); + HIP_CHECK(hipDeviceEnablePeerAccess(deviceId0, 0)); + + // Set device 0 and perform operations + HIP_CHECK(hipSetDevice(deviceId0)); // Set device 0 as current + HIP_CHECK(hipMalloc(&deviceData0, size)); // Allocate memory on device 0 + simpleKernel<<<8, 128>>>(deviceData0); // Launch kernel on device 0 (8 x 128 = 1024 threads) + HIP_CHECK(hipDeviceSynchronize()); + + // Set device 1 and perform operations + HIP_CHECK(hipSetDevice(deviceId1)); // Set device 1 as current + HIP_CHECK(hipMalloc(&deviceData1, size)); // Allocate memory on device 1 + simpleKernel<<<8, 128>>>(deviceData1); // Launch kernel on device 1 (8 x 128 = 1024 threads) + HIP_CHECK(hipDeviceSynchronize()); + + // Use peer-to-peer access + HIP_CHECK(hipSetDevice(deviceId0)); + + // Now device 0 can access memory allocated on device 1 + HIP_CHECK(hipMemcpy(deviceData0, deviceData1, size, hipMemcpyDeviceToDevice)); + + // Copy result from device 0 + double hostData0[1024]; + HIP_CHECK(hipSetDevice(deviceId0)); + HIP_CHECK(hipMemcpy(hostData0, deviceData0, size, hipMemcpyDeviceToHost)); + + // Copy result from device 1 + double hostData1[1024]; + HIP_CHECK(hipSetDevice(deviceId1)); + HIP_CHECK(hipMemcpy(hostData1, deviceData1, size, hipMemcpyDeviceToHost)); + + // Display results from both devices + std::cout << "Device 0 data: " << hostData0[0] << std::endl; + std::cout << "Device 1 data: " << hostData1[0] << std::endl; + + // Free device memory + HIP_CHECK(hipFree(deviceData0)); + HIP_CHECK(hipFree(deviceData1)); + + return EXIT_SUCCESS; +} +// [sphinx-end] diff --git a/docs/tools/example_codes/p2p_memory_access_failed.hip b/docs/tools/example_codes/p2p_memory_access_failed.hip new file mode 100644 index 0000000000..e56038ba71 --- /dev/null +++ b/docs/tools/example_codes/p2p_memory_access_failed.hip @@ -0,0 +1,106 @@ +// MIT License +// +// Copyright (c) 2025 Advanced Micro Devices, Inc. All rights reserved. 
+// +// Permission is hereby granted, free of charge, to any person obtaining a copy +// of this software and associated documentation files (the "Software"), to deal +// in the Software without restriction, including without limitation the rights +// to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +// copies of the Software, and to permit persons to whom the Software is +// furnished to do so, subject to the following conditions: +// +// The above copyright notice and this permission notice shall be included in all +// copies or substantial portions of the Software. +// +// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +// SOFTWARE. + +// [sphinx-start] +#include + +#include +#include +#include + +#define HIP_CHECK(expression) \ +{ \ + const hipError_t status = expression; \ + if (status != hipSuccess) \ + { \ + std::cerr << "HIP error " << status \ + << ": " << hipGetErrorString(status) \ + << " at " << __FILE__ << ":" \ + << __LINE__ << std::endl; \ + std::exit(EXIT_FAILURE); \ + } \ +} + +__global__ void simpleKernel(double *data) +{ + int idx = blockIdx.x * blockDim.x + threadIdx.x; + data[idx] = idx * 2.0; +} + +int main() +{ + int deviceCount; + HIP_CHECK(hipGetDeviceCount(&deviceCount)); + if(deviceCount < 2) + { + std::cout << "This example requires at least two HIP devices." 
<< std::endl; + return EXIT_FAILURE; + } + + double* deviceData0; + double* deviceData1; + std::size_t size = 1024 * sizeof(*deviceData0); + + int deviceId0 = 0; + int deviceId1 = 1; + + // Set device 0 and perform operations + HIP_CHECK(hipSetDevice(deviceId0)); // Set device 0 as current + HIP_CHECK(hipMalloc(&deviceData0, size)); // Allocate memory on device 0 + simpleKernel<<<8, 128>>>(deviceData0); // Launch kernel on device 0 (8 x 128 = 1024 threads) + HIP_CHECK(hipDeviceSynchronize()); + + // Set device 1 and perform operations + HIP_CHECK(hipSetDevice(deviceId1)); // Set device 1 as current + HIP_CHECK(hipMalloc(&deviceData1, size)); // Allocate memory on device 1 + simpleKernel<<<8, 128>>>(deviceData1); // Launch kernel on device 1 (8 x 128 = 1024 threads) + HIP_CHECK(hipDeviceSynchronize()); + + // Attempt to use deviceData0 on device 1 (This will not work as deviceData0 is allocated on device 0) + HIP_CHECK(hipSetDevice(deviceId1)); + hipError_t err = hipMemcpy(deviceData1, deviceData0, size, hipMemcpyDeviceToDevice); // This should fail + if (err != hipSuccess) + { + std::cout << "Error: Cannot access deviceData0 from device 1, deviceData0 is on device 0" << std::endl; + } + + // Copy result from device 0 + double hostData0[1024]; + HIP_CHECK(hipSetDevice(deviceId0)); + HIP_CHECK(hipMemcpy(hostData0, deviceData0, size, hipMemcpyDeviceToHost)); + + // Copy result from device 1 + double hostData1[1024]; + HIP_CHECK(hipSetDevice(deviceId1)); + HIP_CHECK(hipMemcpy(hostData1, deviceData1, size, hipMemcpyDeviceToHost)); + + // Display results from both devices + std::cout << "Device 0 data: " << hostData0[0] << std::endl; + std::cout << "Device 1 data: " << hostData1[0] << std::endl; + + // Free device memory + HIP_CHECK(hipFree(deviceData0)); + HIP_CHECK(hipFree(deviceData1)); + + return EXIT_SUCCESS; +} +// [sphinx-end] diff --git a/docs/tools/example_codes/pageable_host_memory.cpp b/docs/tools/example_codes/pageable_host_memory.cpp new file mode 100644 index 0000000000..a3ee956e6f --- /dev/null +++ 
b/docs/tools/example_codes/pageable_host_memory.cpp @@ -0,0 +1,80 @@ +// MIT License +// +// Copyright (c) 2025 Advanced Micro Devices, Inc. All rights reserved. +// +// Permission is hereby granted, free of charge, to any person obtaining a copy +// of this software and associated documentation files (the "Software"), to deal +// in the Software without restriction, including without limitation the rights +// to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +// copies of the Software, and to permit persons to whom the Software is +// furnished to do so, subject to the following conditions: +// +// The above copyright notice and this permission notice shall be included in all +// copies or substantial portions of the Software. +// +// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +// SOFTWARE. 
+ +// [sphinx-start] +#include + +#include +#include + +#define HIP_CHECK(expression) \ +{ \ + const hipError_t status = expression; \ + if(status != hipSuccess) \ + { \ + std::cerr << "HIP error " \ + << status << ": " \ + << hipGetErrorString(status) \ + << " at " << __FILE__ << ":" \ + << __LINE__ << std::endl; \ + } \ +} + +int main() +{ + const int element_number = 100; + + int *host_input, *host_output; + // Host allocation + host_input = new int[element_number]; + host_output = new int[element_number]; + + // Host data preparation + for (int i = 0; i < element_number; i++) { + host_input[i] = i; + } + std::memset(host_output, 0, element_number * sizeof(int)); + + int *device_input, *device_output; + + // Device allocation + HIP_CHECK(hipMalloc((int **)&device_input, element_number * sizeof(int))); + HIP_CHECK(hipMalloc((int **)&device_output, element_number * sizeof(int))); + + // Device data preparation + HIP_CHECK(hipMemcpy(device_input, host_input, element_number * sizeof(int), hipMemcpyHostToDevice)); + HIP_CHECK(hipMemset(device_output, 0, element_number * sizeof(int))); + + // Run the kernel + // ... + + HIP_CHECK(hipMemcpy(device_input, host_input, element_number * sizeof(int), hipMemcpyHostToDevice)); + + // Free host memory + delete[] host_input; + delete[] host_output; + + // Free device memory + HIP_CHECK(hipFree(device_input)); + HIP_CHECK(hipFree(device_output)); +} +// [sphinx-end] diff --git a/docs/tools/example_codes/per_thread_default_stream.cpp b/docs/tools/example_codes/per_thread_default_stream.cpp new file mode 100644 index 0000000000..57c88dcf7d --- /dev/null +++ b/docs/tools/example_codes/per_thread_default_stream.cpp @@ -0,0 +1,78 @@ +// MIT License +// +// Copyright (c) 2025 Advanced Micro Devices, Inc. All rights reserved. 
+// +// Permission is hereby granted, free of charge, to any person obtaining a copy +// of this software and associated documentation files (the "Software"), to deal +// in the Software without restriction, including without limitation the rights +// to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +// copies of the Software, and to permit persons to whom the Software is +// furnished to do so, subject to the following conditions: +// +// The above copyright notice and this permission notice shall be included in all +// copies or substantial portions of the Software. +// +// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +// SOFTWARE. + +// [sphinx-start] +#include + +#include +#include +#include + +int main() +{ + // Initialize the HIP runtime + if (auto err = hipInit(0); err != hipSuccess) + { + std::cerr << "Failed to initialize HIP runtime." << std::endl; + return EXIT_FAILURE; + } + + // Get the per-thread default stream + hipStream_t stream = hipStreamPerThread; + + // Use the stream for some operation + // For example, allocate memory on the device + void* d_ptr; + std::size_t size = 1024; + if (auto err = hipMalloc(&d_ptr, size); err != hipSuccess) + { + std::cerr << "Failed to allocate memory." << std::endl; + return EXIT_FAILURE; + } + + // Perform some operation using the stream + // For example, set memory on the device + if (auto err = hipMemsetAsync(d_ptr, 0, size, stream); err != hipSuccess) + { + std::cerr << "Failed to set memory." 
<< std::endl; + return EXIT_FAILURE; + } + + // Synchronize the stream + if (auto err = hipStreamSynchronize(stream); err != hipSuccess) + { + std::cerr << "Failed to synchronize stream." << std::endl; + return EXIT_FAILURE; + } + + // Free the allocated memory + if(auto err = hipFree(d_ptr); err != hipSuccess) + { + std::cerr << "Failed to free memory." << std::endl; + return EXIT_FAILURE; + } + + std::cout << "Operation completed successfully using per-thread default stream." << std::endl; + + return EXIT_SUCCESS; +} +// [sphinx-end] diff --git a/docs/tools/example_codes/pinned_host_memory.cpp b/docs/tools/example_codes/pinned_host_memory.cpp new file mode 100644 index 0000000000..65c38673c0 --- /dev/null +++ b/docs/tools/example_codes/pinned_host_memory.cpp @@ -0,0 +1,81 @@ +// MIT License +// +// Copyright (c) 2025 Advanced Micro Devices, Inc. All rights reserved. +// +// Permission is hereby granted, free of charge, to any person obtaining a copy +// of this software and associated documentation files (the "Software"), to deal +// in the Software without restriction, including without limitation the rights +// to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +// copies of the Software, and to permit persons to whom the Software is +// furnished to do so, subject to the following conditions: +// +// The above copyright notice and this permission notice shall be included in all +// copies or substantial portions of the Software. +// +// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +// SOFTWARE. 
+ +// [sphinx-start] +#include + +#include +#include + +#define HIP_CHECK(expression) \ +{ \ + const hipError_t status = expression; \ + if(status != hipSuccess) \ + { \ + std::cerr << "HIP error " \ + << status << ": " \ + << hipGetErrorString(status) \ + << " at " << __FILE__ << ":" \ + << __LINE__ << std::endl; \ + } \ +} + +int main() +{ + const int element_number = 100; + + int *host_input, *host_output; + // Host allocation + HIP_CHECK(hipHostMalloc(&host_input, element_number * sizeof(int))); + HIP_CHECK(hipHostMalloc(&host_output, element_number * sizeof(int))); + + // Host data preparation + for (int i = 0; i < element_number; i++) + { + host_input[i] = i; + } + std::memset(host_output, 0, element_number * sizeof(int)); + + int *device_input, *device_output; + + // Device allocation + HIP_CHECK(hipMalloc(&device_input, element_number * sizeof(int))); + HIP_CHECK(hipMalloc(&device_output, element_number * sizeof(int))); + + // Device data preparation + HIP_CHECK(hipMemcpy(device_input, host_input, element_number * sizeof(int), hipMemcpyHostToDevice)); + HIP_CHECK(hipMemset(device_output, 0, element_number * sizeof(int))); + + // Run the kernel + // ... + + HIP_CHECK(hipMemcpy(device_input, host_input, element_number * sizeof(int), hipMemcpyHostToDevice)); + + // Free host memory + HIP_CHECK(hipFreeHost(host_input)); + HIP_CHECK(hipFreeHost(host_output)); + + // Free device memory + HIP_CHECK(hipFree(device_input)); + HIP_CHECK(hipFree(device_output)); +} +// [sphinx-end] diff --git a/docs/tools/example_codes/pointer_memory_type.cpp b/docs/tools/example_codes/pointer_memory_type.cpp new file mode 100644 index 0000000000..dca0505f3d --- /dev/null +++ b/docs/tools/example_codes/pointer_memory_type.cpp @@ -0,0 +1,61 @@ +// MIT License +// +// Copyright (c) 2025 Advanced Micro Devices, Inc. All rights reserved. 
+// +// Permission is hereby granted, free of charge, to any person obtaining a copy +// of this software and associated documentation files (the "Software"), to deal +// in the Software without restriction, including without limitation the rights +// to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +// copies of the Software, and to permit persons to whom the Software is +// furnished to do so, subject to the following conditions: +// +// The above copyright notice and this permission notice shall be included in all +// copies or substantial portions of the Software. +// +// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +// SOFTWARE. 
+ +#include + +#include +#include + +#define HIP_CHECK(expression) \ +{ \ + const hipError_t err = expression; \ + if (err != hipSuccess) \ + { \ + std::cout << "HIP Error: " << hipGetErrorString(err) \ + << " at line " << __LINE__ << std::endl; \ + std::exit(EXIT_FAILURE); \ + } \ +} + +int main() +{ + // [sphinx-start] + double * ptr; + HIP_CHECK(hipMalloc(&ptr, sizeof(double))); + hipPointerAttribute_t attr; + HIP_CHECK(hipPointerGetAttributes(&attr, ptr)); /*attr.type is hipMemoryTypeDevice*/ + if(attr.type == hipMemoryTypeDevice) + std::cout << "ptr is of type hipMemoryTypeDevice" << std::endl; + + double* ptrHost; + HIP_CHECK(hipHostMalloc(&ptrHost, sizeof(double))); + hipPointerAttribute_t attrHost; + HIP_CHECK(hipPointerGetAttributes(&attrHost, ptrHost)); /*attr.type is hipMemoryTypeHost*/ + if(attrHost.type == hipMemoryTypeHost) + std::cout << "ptrHost is of type hipMemoryTypeHost" << std::endl; + // [sphinx-end] + + HIP_CHECK(hipFreeHost(ptrHost)); + HIP_CHECK(hipFree(ptr)); + + return EXIT_SUCCESS; +} diff --git a/docs/tools/example_codes/rtc_error_handling.cpp b/docs/tools/example_codes/rtc_error_handling.cpp new file mode 100644 index 0000000000..9279c55683 --- /dev/null +++ b/docs/tools/example_codes/rtc_error_handling.cpp @@ -0,0 +1,79 @@ +// MIT License +// +// Copyright (c) 2025 Advanced Micro Devices, Inc. All rights reserved. +// +// Permission is hereby granted, free of charge, to any person obtaining a copy +// of this software and associated documentation files (the "Software"), to deal +// in the Software without restriction, including without limitation the rights +// to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +// copies of the Software, and to permit persons to whom the Software is +// furnished to do so, subject to the following conditions: +// +// The above copyright notice and this permission notice shall be included in all +// copies or substantial portions of the Software. 
+// +// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +// SOFTWARE. + +#include +#include + +#include +#include +#include +#include +#include + +#define CHECK_RET_CODE(call, ret_code) \ +{ \ + if ((call) != ret_code) \ + { \ + std::cout << "Failed in call: " << #call << std::endl; \ + std::abort(); \ + } \ +} +#define HIP_CHECK(call) CHECK_RET_CODE(call, hipSuccess) +#define HIPRTC_CHECK(call) CHECK_RET_CODE(call, HIPRTC_SUCCESS) + +int main() +{ + const char* kernel_source = "adafsfgadascvsfgsadfbdt"; + hiprtcProgram prog; + auto rtc_ret_code = hiprtcCreateProgram(&prog, // HIPRTC program handle + kernel_source, // kernel source string + "vector_add.cpp", // Name of the file + 0, // Number of headers + nullptr, // Header sources + nullptr); // Name of header file + + if (rtc_ret_code != HIPRTC_SUCCESS) + { + std::cerr << "Failed to create program" << std::endl; + std::abort(); + } + + hipDeviceProp_t props; + int device = 0; + HIP_CHECK(hipGetDeviceProperties(&props, device)); + auto sarg = std::string{"--gpu-architecture="} + props.gcnArchName; // device for which binary is to be generated + + const char* opts[] = {sarg.c_str()}; + + // [sphinx-start] + hiprtcResult result; + result = hiprtcCompileProgram(prog, 1, opts); + if (result != HIPRTC_SUCCESS) + { + std::cout << "hiprtcCompileProgram fails with error " << hiprtcGetErrorString(result); + } + // [sphinx-end] + + HIPRTC_CHECK(hiprtcDestroyProgram(&prog)); + + return EXIT_SUCCESS; +} diff --git a/docs/tools/example_codes/sequential_kernel_execution.hip 
b/docs/tools/example_codes/sequential_kernel_execution.hip new file mode 100644 index 0000000000..4d6711d8b6 --- /dev/null +++ b/docs/tools/example_codes/sequential_kernel_execution.hip @@ -0,0 +1,131 @@ +// MIT License +// +// Copyright (c) 2025 Advanced Micro Devices, Inc. All rights reserved. +// +// Permission is hereby granted, free of charge, to any person obtaining a copy +// of this software and associated documentation files (the "Software"), to deal +// in the Software without restriction, including without limitation the rights +// to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +// copies of the Software, and to permit persons to whom the Software is +// furnished to do so, subject to the following conditions: +// +// The above copyright notice and this permission notice shall be included in all +// copies or substantial portions of the Software. +// +// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. 
IN NO EVENT SHALL THE +// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +// SOFTWARE + +// [sphinx-start] +#include + +#include +#include +#include +#include + +#define HIP_CHECK(expression) \ +{ \ + const hipError_t status = expression; \ + if(status != hipSuccess) \ + { \ + std::cerr << "HIP error " \ + << status << ": " \ + << hipGetErrorString(status) \ + << " at " << __FILE__ << ":" \ + << __LINE__ << std::endl; \ + } \ +} + +// GPU Kernels +__global__ void kernelA(double* arrayA, std::size_t size) +{ + const std::size_t x = threadIdx.x + blockDim.x * blockIdx.x; + if(x < size) + { + arrayA[x] += 1.0; + } +} + +__global__ void kernelB(double* arrayA, double* arrayB, std::size_t size) +{ + const std::size_t x = threadIdx.x + blockDim.x * blockIdx.x; + if(x < size) + { + arrayB[x] += arrayA[x] + 3.0; + } +} + +int main() +{ + constexpr int numOfBlocks = 1 << 20; + constexpr int threadsPerBlock = 1024; + constexpr int numberOfIterations = 50; + // The array size is kept smaller so that the kernel launch is not relatively short compared to the memory copies + constexpr std::size_t arraySize = 1U << 25; + double *d_dataA; + double *d_dataB; + + double initValueA = 0.0; + double initValueB = 2.0; + + std::vector<double> vectorA(arraySize, initValueA); + std::vector<double> vectorB(arraySize, initValueB); + // Allocate device memory + HIP_CHECK(hipMalloc(&d_dataA, arraySize * sizeof(*d_dataA))); + HIP_CHECK(hipMalloc(&d_dataB, arraySize * sizeof(*d_dataB))); + for(int iteration = 0; iteration < numberOfIterations; iteration++) + { + // Host to Device copies + HIP_CHECK(hipMemcpy(d_dataA, vectorA.data(), arraySize * sizeof(*d_dataA), hipMemcpyHostToDevice)); + HIP_CHECK(hipMemcpy(d_dataB, vectorB.data(), arraySize * sizeof(*d_dataB), hipMemcpyHostToDevice)); + // Launch the GPU kernels + kernelA<<<numOfBlocks, threadsPerBlock>>>(d_dataA, 
arraySize); + kernelB<<<numOfBlocks, threadsPerBlock>>>(d_dataA, d_dataB, arraySize); + // Device to Host copies + HIP_CHECK(hipMemcpy(vectorA.data(), d_dataA, arraySize * sizeof(*vectorA.data()), hipMemcpyDeviceToHost)); + HIP_CHECK(hipMemcpy(vectorB.data(), d_dataB, arraySize * sizeof(*vectorB.data()), hipMemcpyDeviceToHost)); + } + // Wait for all operations to complete + HIP_CHECK(hipDeviceSynchronize()); + + // Verify results + const double expectedA = (double)numberOfIterations; + const double expectedB = initValueB + (3.0 * numberOfIterations) + (expectedA * (expectedA + 1.0)) / 2.0; + bool passed = true; + for(std::size_t i = 0; i < arraySize; ++i) + { + if(vectorA[i] != expectedA) + { + passed = false; + std::cerr << "Validation failed! Expected " << expectedA << " got " << vectorA[i] << " at index: " << i << std::endl; + break; + } + if(vectorB[i] != expectedB) + { + passed = false; + std::cerr << "Validation failed! Expected " << expectedB << " got " << vectorB[i] << " at index: " << i << std::endl; + break; + } + } + + if(passed) + { + std::cout << "Sequential execution completed successfully." << std::endl; + } + else + { + std::cerr << "Sequential execution failed." << std::endl; + } + + // Cleanup + HIP_CHECK(hipFree(d_dataA)); + HIP_CHECK(hipFree(d_dataB)); + + return EXIT_SUCCESS; +} +// [sphinx-end] diff --git a/docs/tools/example_codes/set_constant_memory.hip b/docs/tools/example_codes/set_constant_memory.hip new file mode 100644 index 0000000000..ad7174a1d0 --- /dev/null +++ b/docs/tools/example_codes/set_constant_memory.hip @@ -0,0 +1,47 @@ +// MIT License +// +// Copyright (c) 2025 Advanced Micro Devices, Inc. All rights reserved. 
+// +// Permission is hereby granted, free of charge, to any person obtaining a copy +// of this software and associated documentation files (the "Software"), to deal +// in the Software without restriction, including without limitation the rights +// to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +// copies of the Software, and to permit persons to whom the Software is +// furnished to do so, subject to the following conditions: +// +// The above copyright notice and this permission notice shall be included in all +// copies or substantial portions of the Software. +// +// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +// SOFTWARE. + +#include + +#include +#include + +// [sphinx-start] +__constant__ int const_array[8]; + +void set_constant_memory() +{ + int host_data[8] {1,2,3,4,5,6,7,8}; + + if(auto err = hipMemcpyToSymbol(const_array, host_data, sizeof(int) * 8); err != hipSuccess) + std::cerr << "HIP error " << err << ": " << hipGetErrorString(err) << std::endl; + + // call kernel that accesses const_array +} +// [sphinx-end] + +int main() +{ + set_constant_memory(); + std::cout << "Success!" << std::endl; + return EXIT_SUCCESS; +} diff --git a/docs/tools/example_codes/simple_device_query.cpp b/docs/tools/example_codes/simple_device_query.cpp new file mode 100644 index 0000000000..63c8740077 --- /dev/null +++ b/docs/tools/example_codes/simple_device_query.cpp @@ -0,0 +1,42 @@ +// MIT License +// +// Copyright (c) 2025 Advanced Micro Devices, Inc. All rights reserved. 
+// +// Permission is hereby granted, free of charge, to any person obtaining a copy +// of this software and associated documentation files (the "Software"), to deal +// in the Software without restriction, including without limitation the rights +// to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +// copies of the Software, and to permit persons to whom the Software is +// furnished to do so, subject to the following conditions: +// +// The above copyright notice and this permission notice shall be included in all +// copies or substantial portions of the Software. +// +// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +// SOFTWARE. + +// [sphinx-start] +#include +#include + +int main() +{ + int deviceCount; + if (hipGetDeviceCount(&deviceCount) == hipSuccess) + { + for (int i = 0; i < deviceCount; ++i) + { + hipDeviceProp_t prop; + if (hipGetDeviceProperties(&prop, i) == hipSuccess) + std::cout << "Device " << i << ": " << prop.name << std::endl; + } + } + + return 0; +} +// [sphinx-end] \ No newline at end of file diff --git a/docs/tools/example_codes/standard_unified_memory.hip b/docs/tools/example_codes/standard_unified_memory.hip new file mode 100644 index 0000000000..5028101b7a --- /dev/null +++ b/docs/tools/example_codes/standard_unified_memory.hip @@ -0,0 +1,73 @@ +// MIT License +// +// Copyright (c) 2025 Advanced Micro Devices, Inc. All rights reserved. 
+// +// Permission is hereby granted, free of charge, to any person obtaining a copy +// of this software and associated documentation files (the "Software"), to deal +// in the Software without restriction, including without limitation the rights +// to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +// copies of the Software, and to permit persons to whom the Software is +// furnished to do so, subject to the following conditions: +// +// The above copyright notice and this permission notice shall be included in all +// copies or substantial portions of the Software. +// +// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +// SOFTWARE. + +// [sphinx-start] +#include + +#include + +#define HIP_CHECK(expression) \ +{ \ + const hipError_t err = expression; \ + if(err != hipSuccess) \ + { \ + std::cerr << "HIP error: " \ + << hipGetErrorString(err) \ + << " at " << __LINE__ << "\n"; \ + } \ +} + +// Addition of two values. +__global__ void add(int *a, int *b, int *c) +{ + *c = *a + *b; +} + +// This example requires HMM support and the environment variable HSA_XNACK needs to be set to 1 +int main() +{ + // Allocate memory for a, b, and c. + int *a = new int[1]; + int *b = new int[1]; + int *c = new int[1]; + + // Setup input values. + *a = 1; + *b = 2; + + // Launch add() kernel on GPU. + add<<<1, 1>>>(a, b, c); + + // Wait for GPU to finish before accessing on host. + HIP_CHECK(hipDeviceSynchronize()); + + // Print the result. + std::cout << *a << " + " << *b << " = " << *c << std::endl; + + // Cleanup allocated memory. 
+ delete[] c; + delete[] b; + delete[] a; + + return 0; +} +// [sphinx-end] diff --git a/docs/tools/example_codes/static_shared_memory_device.hip b/docs/tools/example_codes/static_shared_memory_device.hip new file mode 100644 index 0000000000..0de0890a1f --- /dev/null +++ b/docs/tools/example_codes/static_shared_memory_device.hip @@ -0,0 +1,46 @@ +// MIT License +// +// Copyright (c) 2025 Advanced Micro Devices, Inc. All rights reserved. +// +// Permission is hereby granted, free of charge, to any person obtaining a copy +// of this software and associated documentation files (the "Software"), to deal +// in the Software without restriction, including without limitation the rights +// to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +// copies of the Software, and to permit persons to whom the Software is +// furnished to do so, subject to the following conditions: +// +// The above copyright notice and this permission notice shall be included in all +// copies or substantial portions of the Software. +// +// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +// SOFTWARE. + +#include "example_utils.hpp" + +#include + +#include +#include + +// [sphinx-start] +__global__ void kernel() +{ + __shared__ int array[128]; + __shared__ double result; +} +// [sphinx-end] + +int main() +{ + kernel<<<64, 512>>>(); + HIP_CHECK(hipPeekAtLastError()); + HIP_CHECK(hipDeviceSynchronize()); + + std::cout << "Success!" 
<< std::endl; + return EXIT_SUCCESS; +} diff --git a/docs/tools/example_codes/static_unified_memory.hip b/docs/tools/example_codes/static_unified_memory.hip new file mode 100644 index 0000000000..e7fb92de12 --- /dev/null +++ b/docs/tools/example_codes/static_unified_memory.hip @@ -0,0 +1,65 @@ +// MIT License +// +// Copyright (c) 2025 Advanced Micro Devices, Inc. All rights reserved. +// +// Permission is hereby granted, free of charge, to any person obtaining a copy +// of this software and associated documentation files (the "Software"), to deal +// in the Software without restriction, including without limitation the rights +// to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +// copies of the Software, and to permit persons to whom the Software is +// furnished to do so, subject to the following conditions: +// +// The above copyright notice and this permission notice shall be included in all +// copies or substantial portions of the Software. +// +// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +// SOFTWARE. + +// [sphinx-start] +#include + +#include + +#define HIP_CHECK(expression) \ +{ \ + const hipError_t err = expression; \ + if(err != hipSuccess) \ + { \ + std::cerr << "HIP error: " \ + << hipGetErrorString(err) \ + << " at " << __LINE__ << "\n"; \ + } \ +} + +// Addition of two values. +__global__ void add(int *a, int *b, int *c) +{ + *c = *a + *b; +} + +// Declare a, b and c as static variables. +__managed__ int a, b, c; + +int main() +{ + // Setup input values. 
+ a = 1; + b = 2; + + // Launch add() kernel on GPU. + add<<<1, 1>>>(&a, &b, &c); + + // Wait for GPU to finish before accessing on host. + HIP_CHECK(hipDeviceSynchronize()); + + // Print the result. + std::cout << a << " + " << b << " = " << c << std::endl; + + return 0; +} +// [sphinx-end] diff --git a/docs/tools/example_codes/stream_ordered_memory_allocation.hip b/docs/tools/example_codes/stream_ordered_memory_allocation.hip new file mode 100644 index 0000000000..5c63cf90dc --- /dev/null +++ b/docs/tools/example_codes/stream_ordered_memory_allocation.hip @@ -0,0 +1,85 @@ +// MIT License +// +// Copyright (c) 2025 Advanced Micro Devices, Inc. All rights reserved. +// +// Permission is hereby granted, free of charge, to any person obtaining a copy +// of this software and associated documentation files (the "Software"), to deal +// in the Software without restriction, including without limitation the rights +// to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +// copies of the Software, and to permit persons to whom the Software is +// furnished to do so, subject to the following conditions: +// +// The above copyright notice and this permission notice shall be included in all +// copies or substantial portions of the Software. +// +// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +// SOFTWARE. 
+ +// [sphinx-start] +#include + +#include +#include +#include + +#define HIP_CHECK(expression) \ +{ \ + const hipError_t status = expression; \ + if (status != hipSuccess) \ + { \ + std::cerr << "HIP error " << status \ + << ": " << hipGetErrorString(status) \ + << " at " << __FILE__ << ":" \ + << __LINE__ << std::endl; \ + std::exit(EXIT_FAILURE); \ + } \ +} + +// Kernel to perform some computation on allocated memory. +__global__ void myKernel(int* data, std::size_t numElements) +{ + int tid = threadIdx.x + blockIdx.x * blockDim.x; + if (tid < numElements) + { + data[tid] = tid * 2; + } +} + +int main() +{ + // Stream 0. + constexpr hipStream_t streamId = 0; + + // Allocate memory with stream ordered semantics. + constexpr std::size_t numElements = 1024; + int* devData; + HIP_CHECK(hipMallocAsync(reinterpret_cast<void**>(&devData), numElements * sizeof(*devData), streamId)); + + // Launch the kernel to perform computation. + dim3 blockSize(256); + dim3 gridSize((numElements + blockSize.x - 1) / blockSize.x); + myKernel<<>>(devData, numElements); + + // Copy data back to host. + int* hostData = new int[numElements]; + HIP_CHECK(hipMemcpy(hostData, devData, numElements * sizeof(*devData), hipMemcpyDeviceToHost)); + + // Print the array. + for (std::size_t i = 0; i < numElements; ++i) + std::cout << "Element " << i << ": " << hostData[i] << std::endl; + + // Free memory with stream ordered semantics. + HIP_CHECK(hipFreeAsync(devData, streamId)); + delete[] hostData; + + // Synchronize to ensure completion. 
+ HIP_CHECK(hipDeviceSynchronize()); + + return EXIT_SUCCESS; +} +// [sphinx-end] diff --git a/docs/tools/example_codes/template_warp_size_reduction.hip b/docs/tools/example_codes/template_warp_size_reduction.hip index 2d265080d9..81bd2e20bd 100644 --- a/docs/tools/example_codes/template_warp_size_reduction.hip +++ b/docs/tools/example_codes/template_warp_size_reduction.hip @@ -20,16 +20,23 @@ // OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE // SOFTWARE. +#include "popcount.hpp" + #include -#include + +#include +#include +#include #include -#include #include +#include +#include #define HIP_CHECK(expression) \ { \ const hipError_t status = expression; \ - if(status != hipSuccess){ \ + if(status != hipSuccess) \ + { \ std::cerr << "HIP error " \ << status << ": " \ << hipGetErrorString(status) \ @@ -39,169 +46,185 @@ } // [Sphinx template warp size block reduction kernel start] -template -using lane_mask_t = typename std::conditional::type; - -template -__global__ void block_reduce(int* input, lane_mask_t* mask, int* output, size_t size) { - extern __shared__ int shared[]; - - // Read of input with bounds check - auto read_global_safe = [&](const uint32_t i, const uint32_t lane_id, const uint32_t mask_id) - { - lane_mask_t warp_mask = lane_mask_t(1) << lane_id; - return (i < size) && (mask[mask_id] & warp_mask) ? 
input[i] : 0; - }; - - const uint32_t tid = threadIdx.x, - lid = threadIdx.x % WarpSize, - wid = threadIdx.x / WarpSize, - bid = blockIdx.x, - gid = bid * blockDim.x + tid; - - // Read input buffer to shared - shared[tid] = read_global_safe(gid, lid, bid * (blockDim.x / WarpSize) + wid); - __syncthreads(); - - // Shared reduction - for (uint32_t i = blockDim.x / 2; i >= WarpSize; i /= 2) - { - if (tid < i) - shared[tid] = shared[tid] + shared[tid + i]; +template +using lane_mask_t = typename std::conditional::type; + +template +__global__ void block_reduce(int* input, lane_mask_t* mask, int* output, size_t size) +{ + extern __shared__ int shared[]; + + // Read of input with bounds check + auto read_global_safe = [&](const std::uint32_t i, const std::uint32_t lane_id, const std::uint32_t mask_id) + { + lane_mask_t warp_mask = lane_mask_t(1) << lane_id; + return (i < size) && (mask[mask_id] & warp_mask) ? input[i] : 0; + }; + + const std::uint32_t tid = threadIdx.x, + lid = threadIdx.x % WarpSize, + wid = threadIdx.x / WarpSize, + bid = blockIdx.x, + gid = bid * blockDim.x + tid; + + // Read input buffer to shared + shared[tid] = read_global_safe(gid, lid, bid * (blockDim.x / WarpSize) + wid); __syncthreads(); - } - - // Use local variable in warp reduction - int result = shared[tid]; - __syncthreads(); - - // This loop would be unrolled the same with the runtime warpSize. - #pragma unroll - for (uint32_t i = WarpSize/2; i >= 1; i /= 2) { - result = result + __shfl_down(result, i); - } - - // Write result to output buffer - if (tid == 0) - output[bid] = result; -}; + + // Shared reduction + for (std::uint32_t i = blockDim.x / 2; i >= WarpSize; i /= 2) + { + if (tid < i) + shared[tid] = shared[tid] + shared[tid + i]; + __syncthreads(); + } + + // Use local variable in warp reduction + int result = shared[tid]; + __syncthreads(); + + // This loop would be unrolled the same with the runtime warpSize. 
+ #pragma unroll + for (std::uint32_t i = WarpSize/2; i >= 1; i /= 2) + { + result = result + __shfl_down(result, i); + } + + // Write result to output buffer + if (tid == 0) + output[bid] = result; +} // [Sphinx template warp size block reduction kernel end] // [Sphinx template warp size mask generation start] -template +template void generate_and_copy_mask( - void *d_mask, - std::vector& vectorExpected, - int numOfBlocks, - int numberOfWarp, - int mask_size, - int mask_element_size) { - - std::random_device rd; - std::mt19937_64 eng(rd()); - - // Host side mask vector - std::vector> mask(mask_size); - // Define uniform unsigned int distribution - std::uniform_int_distribution> distr; - // Fill up the mask - for(int i=0; i < numOfBlocks; i++) { - int count = 0; - for(int j=0; j < numberOfWarp; j++) { - int mask_index = i * numberOfWarp + j; - mask[mask_index] = distr(eng); - if constexpr(WarpSize == 32) - count += __builtin_popcount(mask[mask_index]); - else - count += __builtin_popcountll(mask[mask_index]); + void *d_mask, + std::vector& vectorExpected, + int numOfBlocks, + int numberOfWarp, + int mask_size, + int mask_element_size) +{ + std::random_device rd; + std::mt19937_64 eng(rd()); + + // Host side mask vector + std::vector> mask(mask_size); + // Define uniform unsigned int distribution + std::uniform_int_distribution> distr; + // Fill up the mask + for(int i=0; i < numOfBlocks; i++) + { + int count = 0; + for(int j=0; j < numberOfWarp; j++) + { + int mask_index = i * numberOfWarp + j; + mask[mask_index] = distr(eng); + if constexpr(WarpSize == 32) + count += popcount(static_cast(mask[mask_index])); + else + count += popcount(mask[mask_index]); + } + vectorExpected[i]= count; } - vectorExpected[i]= count; - } - // Copy the mask array - HIP_CHECK(hipMemcpy(d_mask, mask.data(), mask_size * mask_element_size, hipMemcpyHostToDevice)); + // Copy the mask array + HIP_CHECK(hipMemcpy(d_mask, mask.data(), mask_size * mask_element_size, hipMemcpyHostToDevice)); } 
// [Sphinx template warp size mask generation end] -int main() { - - int deviceId = 0; - int warpSizeHost; - HIP_CHECK(hipDeviceGetAttribute(&warpSizeHost, hipDeviceAttributeWarpSize, deviceId)); - std::cout << "Warp size: " << warpSizeHost << std::endl; - - constexpr int numOfBlocks = 16; - constexpr int threadsPerBlock = 1024; - const int numberOfWarp = threadsPerBlock / warpSizeHost; - const int mask_element_size = warpSizeHost == 32 ? sizeof(uint32_t) : sizeof(uint64_t); - const int mask_size = numOfBlocks * numberOfWarp; - constexpr size_t arraySize = numOfBlocks * threadsPerBlock; - - int *d_data, *d_results; - void *d_mask; - int initValue = 1; - std::vector vectorInput(arraySize, initValue); - std::vector vectorOutput(numOfBlocks); - std::vector vectorExpected(numOfBlocks); - // Allocate device memory - HIP_CHECK(hipMalloc(&d_data, arraySize * sizeof(*d_data))); - HIP_CHECK(hipMalloc(&d_mask, mask_size * mask_element_size)); - HIP_CHECK(hipMalloc(&d_results, numOfBlocks * sizeof(*d_results))); - // Host to Device copy of the input array - HIP_CHECK(hipMemcpy(d_data, vectorInput.data(), arraySize * sizeof(*d_data), hipMemcpyHostToDevice)); +int main() +{ + int deviceId = 0; + int warpSizeHost; + HIP_CHECK(hipDeviceGetAttribute(&warpSizeHost, hipDeviceAttributeWarpSize, deviceId)); + std::cout << "Warp size: " << warpSizeHost << std::endl; + + constexpr int numOfBlocks = 16; + constexpr int threadsPerBlock = 1024; + const int numberOfWarp = threadsPerBlock / warpSizeHost; + const int mask_element_size = warpSizeHost == 32 ? 
sizeof(std::uint32_t) : sizeof(std::uint64_t); + const int mask_size = numOfBlocks * numberOfWarp; + constexpr std::size_t arraySize = numOfBlocks * threadsPerBlock; + + int *d_data, *d_results; + void *d_mask; + int initValue = 1; + std::vector vectorInput(arraySize, initValue); + std::vector vectorOutput(numOfBlocks); + std::vector vectorExpected(numOfBlocks); + // Allocate device memory + HIP_CHECK(hipMalloc(&d_data, arraySize * sizeof(*d_data))); + HIP_CHECK(hipMalloc(&d_mask, mask_size * mask_element_size)); + HIP_CHECK(hipMalloc(&d_results, numOfBlocks * sizeof(*d_results))); + // Host to Device copy of the input array + HIP_CHECK(hipMemcpy(d_data, vectorInput.data(), arraySize * sizeof(*d_data), hipMemcpyHostToDevice)); - // [Sphinx template warp size select kernel start] - // Fill up the mask variable, copy to device and select the right kernel. - if(warpSizeHost == 32) { - // Generate and copy mask arrays - generate_and_copy_mask<32>(d_mask, vectorExpected, numOfBlocks, numberOfWarp, mask_size, mask_element_size); - - // Start the kernel - block_reduce<32><<>>( - d_data, - static_cast(d_mask), - d_results, - arraySize); - } else if(warpSizeHost == 64) { - // Generate and copy mask arrays - generate_and_copy_mask<64>(d_mask, vectorExpected, numOfBlocks, numberOfWarp, mask_size, mask_element_size); - - // Start the kernel - block_reduce<64><<>>( - d_data, - static_cast(d_mask), - d_results, - arraySize); - } else { - std::cerr << "Unsupported warp size." 
<< std::endl; - return 0; - } - // [Sphinx template warp size select kernel end] - - // Check the kernel launch - HIP_CHECK(hipGetLastError()); - // Check for kernel execution error - HIP_CHECK(hipDeviceSynchronize()); - // Device to Host copy of the result - HIP_CHECK(hipMemcpy(vectorOutput.data(), d_results, numOfBlocks * sizeof(*d_results), hipMemcpyDeviceToHost)); - - // Verify results - bool passed = true; - for(size_t i = 0; i < numOfBlocks; ++i) { - if(vectorOutput[i] != vectorExpected[i]) { - passed = false; - std::cerr << "Validation failed! Expected " << vectorExpected[i] << " got " << vectorOutput[i] << " at index: " << i << std::endl; + // [Sphinx template warp size select kernel start] + // Fill up the mask variable, copy to device and select the right kernel. + if(warpSizeHost == 32) + { + // Generate and copy mask arrays + generate_and_copy_mask<32>(d_mask, vectorExpected, numOfBlocks, numberOfWarp, mask_size, mask_element_size); + + // Start the kernel + block_reduce<32><<>>( + d_data, + static_cast(d_mask), + d_results, + arraySize); + } + else if(warpSizeHost == 64) + { + // Generate and copy mask arrays + generate_and_copy_mask<64>(d_mask, vectorExpected, numOfBlocks, numberOfWarp, mask_size, mask_element_size); + + // Start the kernel + block_reduce<64><<>>( + d_data, + static_cast(d_mask), + d_results, + arraySize); + } + else + { + std::cerr << "Unsupported warp size." << std::endl; + return EXIT_FAILURE; + } + // [Sphinx template warp size select kernel end] + + // Check the kernel launch + HIP_CHECK(hipGetLastError()); + // Check for kernel execution error + HIP_CHECK(hipDeviceSynchronize()); + // Device to Host copy of the result + HIP_CHECK(hipMemcpy(vectorOutput.data(), d_results, numOfBlocks * sizeof(*d_results), hipMemcpyDeviceToHost)); + + // Verify results + bool passed = true; + for(std::size_t i = 0; i < numOfBlocks; ++i) + { + if(vectorOutput[i] != vectorExpected[i]) + { + passed = false; + std::cerr << "Validation failed! 
Expected " << vectorExpected[i] + << " got " << vectorOutput[i] << " at index: " << i << std::endl; + } } - } - if(passed){ - std::cout << "Execution completed successfully." << std::endl; - }else{ - std::cerr << "Execution failed." << std::endl; - } - - // Cleanup - HIP_CHECK(hipFree(d_data)); - HIP_CHECK(hipFree(d_mask)); - HIP_CHECK(hipFree(d_results)); - return 0; -} \ No newline at end of file + + if(passed) + { + std::cout << "Execution completed successfully." << std::endl; + } + else + { + std::cerr << "Execution failed." << std::endl; + } + + // Cleanup + HIP_CHECK(hipFree(d_data)); + HIP_CHECK(hipFree(d_mask)); + HIP_CHECK(hipFree(d_results)); + return EXIT_SUCCESS; +} diff --git a/docs/tools/example_codes/timer.hip b/docs/tools/example_codes/timer.hip new file mode 100644 index 0000000000..5ed2af535b --- /dev/null +++ b/docs/tools/example_codes/timer.hip @@ -0,0 +1,66 @@ +// MIT License +// +// Copyright (c) 2025 Advanced Micro Devices, Inc. All rights reserved. +// +// Permission is hereby granted, free of charge, to any person obtaining a copy +// of this software and associated documentation files (the "Software"), to deal +// in the Software without restriction, including without limitation the rights +// to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +// copies of the Software, and to permit persons to whom the Software is +// furnished to do so, subject to the following conditions: +// +// The above copyright notice and this permission notice shall be included in all +// copies or substantial portions of the Software. +// +// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. 
IN NO EVENT SHALL THE +// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +// SOFTWARE. + +#include + +#include +#include + +#define HIP_CHECK(expression) \ +{ \ + const hipError_t status = expression; \ + if(status != hipSuccess) \ + { \ + std::cerr << "HIP error " \ + << status << ": " \ + << hipGetErrorString(status) \ + << " at " << __FILE__ << ":" \ + << __LINE__ << std::endl; \ + } \ +} + +// [sphinx-kernel-start] +__global__ void kernel() +{ + long long int start = clock64(); + // kernel code + long long int stop = clock64(); + long long int cycles = stop - start; +} +// [sphinx-kernel-end] + +int main() +{ + int deviceId = 0; + + // [sphinx-query-start] + int wallClkRate = 0; //in kilohertz + HIP_CHECK(hipDeviceGetAttribute(&wallClkRate, hipDeviceAttributeWallClockRate, deviceId)); + // [sphinx-query-end] + + kernel<<>>(); + HIP_CHECK(hipDeviceSynchronize()); + + std::cout << "Device's wall clock rate is " << wallClkRate << " kHz." << std::endl; + + return EXIT_SUCCESS; +} diff --git a/docs/tools/example_codes/unified_memory_advice.hip b/docs/tools/example_codes/unified_memory_advice.hip new file mode 100644 index 0000000000..f2ef8549b6 --- /dev/null +++ b/docs/tools/example_codes/unified_memory_advice.hip @@ -0,0 +1,89 @@ +// MIT License +// +// Copyright (c) 2025 Advanced Micro Devices, Inc. All rights reserved. 
+// +// Permission is hereby granted, free of charge, to any person obtaining a copy +// of this software and associated documentation files (the "Software"), to deal +// in the Software without restriction, including without limitation the rights +// to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +// copies of the Software, and to permit persons to whom the Software is +// furnished to do so, subject to the following conditions: +// +// The above copyright notice and this permission notice shall be included in all +// copies or substantial portions of the Software. +// +// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +// SOFTWARE. + +// [sphinx-start] +#include +#include + +#define HIP_CHECK(expression) \ +{ \ + const hipError_t err = expression; \ + if(err != hipSuccess) \ + { \ + std::cerr << "HIP error: " \ + << hipGetErrorString(err) \ + << " at " << __LINE__ << "\n"; \ + } \ +} + +// Addition of two values. +__global__ void add(int *a, int *b, int *c) +{ + *c = *a + *b; +} + +int main() +{ + int deviceId; + HIP_CHECK(hipGetDevice(&deviceId)); + int *a, *b, *c; + + // Allocate memory for a, b, and c accessible to both device and host codes. + HIP_CHECK(hipMallocManaged(&a, sizeof(*a))); + HIP_CHECK(hipMallocManaged(&b, sizeof(*b))); + HIP_CHECK(hipMallocManaged(&c, sizeof(*c))); + + // Set memory advice for a and b to be read, located on and accessed by the GPU. 
+ HIP_CHECK(hipMemAdvise(a, sizeof(*a), hipMemAdviseSetPreferredLocation, deviceId)); + HIP_CHECK(hipMemAdvise(a, sizeof(*a), hipMemAdviseSetAccessedBy, deviceId)); + HIP_CHECK(hipMemAdvise(a, sizeof(*a), hipMemAdviseSetReadMostly, deviceId)); + + HIP_CHECK(hipMemAdvise(b, sizeof(*b), hipMemAdviseSetPreferredLocation, deviceId)); + HIP_CHECK(hipMemAdvise(b, sizeof(*b), hipMemAdviseSetAccessedBy, deviceId)); + HIP_CHECK(hipMemAdvise(b, sizeof(*b), hipMemAdviseSetReadMostly, deviceId)); + + // Set memory advice for c to be read, located on and accessed by the CPU. + HIP_CHECK(hipMemAdvise(c, sizeof(*c), hipMemAdviseSetPreferredLocation, hipCpuDeviceId)); + HIP_CHECK(hipMemAdvise(c, sizeof(*c), hipMemAdviseSetAccessedBy, hipCpuDeviceId)); + HIP_CHECK(hipMemAdvise(c, sizeof(*c), hipMemAdviseSetReadMostly, hipCpuDeviceId)); + + // Setup input values. + *a = 1; + *b = 2; + + // Launch add() kernel on GPU. + add<<<1, 1>>>(a, b, c); + + // Wait for GPU to finish before accessing on host. + HIP_CHECK(hipDeviceSynchronize()); + + // Prints the result. + std::cout << *a << " + " << *b << " = " << *c << std::endl; + + // Cleanup allocated memory. + HIP_CHECK(hipFree(a)); + HIP_CHECK(hipFree(b)); + HIP_CHECK(hipFree(c)); + + return 0; +} +// [sphinx-end] diff --git a/docs/tools/example_codes/warp_size_reduction.hip b/docs/tools/example_codes/warp_size_reduction.hip index 0be830ff0e..7ed03a6595 100644 --- a/docs/tools/example_codes/warp_size_reduction.hip +++ b/docs/tools/example_codes/warp_size_reduction.hip @@ -20,16 +20,23 @@ // OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE // SOFTWARE. 
+#include "popcount.hpp" + #include -#include + +#include +#include +#include #include -#include #include +#include +#include #define HIP_CHECK(expression) \ { \ const hipError_t status = expression; \ - if(status != hipSuccess){ \ + if(status != hipSuccess) \ + { \ std::cerr << "HIP error " \ << status << ": " \ << hipGetErrorString(status) \ @@ -39,146 +46,164 @@ } // [Sphinx HIP warp size block reduction kernel start] -__global__ void block_reduce(int* input, uint64_t* mask, int* output, size_t size){ - extern __shared__ int shared[]; - // Read of input with bounds check - auto read_global_safe = [&](const uint32_t i, const uint32_t lane_id, const uint32_t mask_id) - { - uint64_t warp_mask = 1ull << lane_id; - return (i < size) && (mask[mask_id] & warp_mask) ? input[i] : 0; - }; - const uint32_t tid = threadIdx.x, - lid = threadIdx.x % warpSize, - wid = threadIdx.x / warpSize, - bid = blockIdx.x, - gid = bid * blockDim.x + tid; - // Read input buffer to shared - shared[tid] = read_global_safe(gid, lid, bid * (blockDim.x / warpSize) + wid); - __syncthreads(); - // Shared reduction - for (uint32_t i = blockDim.x / 2; i >= warpSize; i /= 2) - { - if (tid < i) - shared[tid] = shared[tid] + shared[tid + i]; +__global__ void block_reduce(int* input, std::uint64_t* mask, int* output, std::size_t size) +{ + extern __shared__ int shared[]; + // Read of input with bounds check + auto read_global_safe = [&](const std::uint32_t i, const std::uint32_t lane_id, const std::uint32_t mask_id) + { + std::uint64_t warp_mask = 1ull << lane_id; + return (i < size) && (mask[mask_id] & warp_mask) ? 
input[i] : 0; + }; + + const std::uint32_t tid = threadIdx.x, + lid = threadIdx.x % warpSize, + wid = threadIdx.x / warpSize, + bid = blockIdx.x, + gid = bid * blockDim.x + tid; + + // Read input buffer to shared + shared[tid] = read_global_safe(gid, lid, bid * (blockDim.x / warpSize) + wid); __syncthreads(); - } - - // Use local variable in warp reduction - int result = shared[tid]; - __syncthreads(); - - // This loop would be unrolled the same with the compile-time WarpSize. - #pragma unroll - for (uint32_t i = warpSize/2; i >= 1; i /= 2) { - result = result + __shfl_down(result, i); - } - - // Write result to output buffer - if (tid == 0) - output[bid] = result; -}; + + // Shared reduction + for (std::uint32_t i = blockDim.x / 2; i >= warpSize; i /= 2) + { + if (tid < i) + shared[tid] = shared[tid] + shared[tid + i]; + __syncthreads(); + } + + // Use local variable in warp reduction + int result = shared[tid]; + __syncthreads(); + + // This loop would be unrolled the same with the compile-time WarpSize. 
+    #pragma unroll
+    for (std::uint32_t i = warpSize/2; i >= 1; i /= 2) {
+        result = result + __shfl_down(result, i);
+    }
+
+    // Write result to output buffer
+    if (tid == 0)
+        output[bid] = result;
+}
 // [Sphinx HIP warp size block reduction kernel end]
 
 // [Sphinx HIP warp size mask generation start]
 void generate_and_copy_mask(
-  uint64_t *d_mask,
-  std::vector& vectorExpected,
-  int warpSizeHost,
-  int numOfBlocks,
-  int numberOfWarp,
-  int mask_size,
-  int mask_element_size) {
-
-  std::random_device rd;
-  std::mt19937_64 eng(rd());
-
-  // Host side mask vector
-  std::vector mask(mask_size);
-  // Define uniform unsigned int distribution
-  std::uniform_int_distribution distr;
-  // Fill up the mask
-  for(int i=0; i < numOfBlocks; i++) {
-    int count = 0;
-    for(int j=0; j < numberOfWarp; j++) {
-      int mask_index = i * numberOfWarp + j;
-      mask[mask_index] = distr(eng);
-      if(warpSizeHost == 32)
-        count += __builtin_popcount(mask[mask_index]);
-      else
-        count += __builtin_popcountll(mask[mask_index]);
+    std::uint64_t *d_mask,
+    std::vector& vectorExpected,
+    int warpSizeHost,
+    int numOfBlocks,
+    int numberOfWarp,
+    int mask_size,
+    int mask_element_size)
+{
+    std::random_device rd;
+    std::mt19937_64 eng(rd());
+
+    // Host side mask vector
+    std::vector mask(mask_size);
+    // Define uniform unsigned int distribution
+    std::uniform_int_distribution distr;
+    // Fill up the mask
+    for(int i=0; i < numOfBlocks; i++)
+    {
+        int count = 0;
+        for(int j=0; j < numberOfWarp; j++)
+        {
+            int mask_index = i * numberOfWarp + j;
+            mask[mask_index] = distr(eng);
+            if(warpSizeHost == 32)
+                count += popcount(static_cast(mask[mask_index]));
+            else
+                count += popcount(mask[mask_index]);
+        }
+        vectorExpected[i]= count;
     }
-    vectorExpected[i]= count;
-  }
-  // Copy the mask array
-  HIP_CHECK(hipMemcpy(d_mask, mask.data(), mask_size * mask_element_size, hipMemcpyHostToDevice));
+    // Copy the mask array
+    HIP_CHECK(hipMemcpy(d_mask, mask.data(), mask_size * mask_element_size, hipMemcpyHostToDevice));
 }
 // [Sphinx HIP warp size mask generation end]
 
-int main() {
-  int deviceId = 0;
-  int warpSizeHost;
-  HIP_CHECK(hipDeviceGetAttribute(&warpSizeHost, hipDeviceAttributeWarpSize, deviceId));
-  std::cout << "Warp size: " << warpSizeHost << std::endl;
-  constexpr int numOfBlocks = 16;
-  constexpr int threadsPerBlock = 1024;
-  const int numberOfWarp = threadsPerBlock / warpSizeHost;
-  const int mask_element_size = sizeof(uint64_t);
-  const int mask_size = numOfBlocks * numberOfWarp;
-  constexpr size_t arraySize = numOfBlocks * threadsPerBlock;
-  int *d_data, *d_results;
-  uint64_t *d_mask;
-  int initValue = 1;
-  std::vector vectorInput(arraySize, initValue);
-  std::vector vectorOutput(numOfBlocks);
-  std::vector vectorExpected(numOfBlocks);
-  // Allocate device memory
-  HIP_CHECK(hipMalloc(&d_data, arraySize * sizeof(*d_data)));
-  HIP_CHECK(hipMalloc(&d_mask, mask_size * mask_element_size));
-  HIP_CHECK(hipMalloc(&d_results, numOfBlocks * sizeof(*d_results)));
-  // Host to Device copy of the input array
-  HIP_CHECK(hipMemcpy(d_data, vectorInput.data(), arraySize * sizeof(*d_data), hipMemcpyHostToDevice));
+int main()
+{
+    int deviceId = 0;
+    int warpSizeHost;
+    HIP_CHECK(hipDeviceGetAttribute(&warpSizeHost, hipDeviceAttributeWarpSize, deviceId));
+    std::cout << "Warp size: " << warpSizeHost << std::endl;
+
+    constexpr int numOfBlocks = 16;
+    constexpr int threadsPerBlock = 1024;
+    const int numberOfWarp = threadsPerBlock / warpSizeHost;
+    const int mask_element_size = sizeof(std::uint64_t);
+    const int mask_size = numOfBlocks * numberOfWarp;
+    constexpr std::size_t arraySize = numOfBlocks * threadsPerBlock;
+
+    int *d_data, *d_results;
+    std::uint64_t *d_mask;
+    int initValue = 1;
+    std::vector vectorInput(arraySize, initValue);
+    std::vector vectorOutput(numOfBlocks);
+    std::vector vectorExpected(numOfBlocks);
+    // Allocate device memory
+    HIP_CHECK(hipMalloc(&d_data, arraySize * sizeof(*d_data)));
+    HIP_CHECK(hipMalloc(&d_mask, mask_size * mask_element_size));
+    HIP_CHECK(hipMalloc(&d_results, numOfBlocks * sizeof(*d_results)));
+    // Host to Device copy of the input array
+    HIP_CHECK(hipMemcpy(d_data, vectorInput.data(), arraySize * sizeof(*d_data), hipMemcpyHostToDevice));
 
-  // [Sphinx HIP warp size select kernel start]
-  // Generate and copy mask arrays
-  generate_and_copy_mask(
-    d_mask,
-    vectorExpected,
-    warpSizeHost,
-    numOfBlocks,
-    numberOfWarp,
-    mask_size,
-    mask_element_size);
-
-  // Start the kernel
-  block_reduce<<>>(
-    d_data,
-    d_mask,
-    d_results,
-    arraySize);
-  // [Sphinx HIP warp size select kernel end]
-
-  // Check the kernel launch
-  HIP_CHECK(hipGetLastError());
-  // Check for kernel execution error
-  HIP_CHECK(hipDeviceSynchronize());
-  // Device to Host copy of the result
-  HIP_CHECK(hipMemcpy(vectorOutput.data(), d_results, numOfBlocks * sizeof(*d_results), hipMemcpyDeviceToHost));
-  // Verify results
-  bool passed = true;
-  for(size_t i = 0; i < numOfBlocks; ++i) {
-    if(vectorOutput[i] != vectorExpected[i]) {
-      passed = false;
-      std::cerr << "Validation failed! Expected " << vectorExpected[i] << " got " << vectorOutput[i] << " at index: " << i << std::endl;
+    // [Sphinx HIP warp size select kernel start]
+    // Generate and copy mask arrays
+    generate_and_copy_mask(
+        d_mask,
+        vectorExpected,
+        warpSizeHost,
+        numOfBlocks,
+        numberOfWarp,
+        mask_size,
+        mask_element_size);
+
+    // Start the kernel
+    block_reduce<<>>(
+        d_data,
+        d_mask,
+        d_results,
+        arraySize);
+    // [Sphinx HIP warp size select kernel end]
+
+    // Check the kernel launch
+    HIP_CHECK(hipGetLastError());
+    // Check for kernel execution error
+    HIP_CHECK(hipDeviceSynchronize());
+    // Device to Host copy of the result
+    HIP_CHECK(hipMemcpy(vectorOutput.data(), d_results, numOfBlocks * sizeof(*d_results), hipMemcpyDeviceToHost));
+
+    // Verify results
+    bool passed = true;
+    for(std::size_t i = 0; i < numOfBlocks; ++i)
+    {
+        if(vectorOutput[i] != vectorExpected[i])
+        {
+            passed = false;
+            std::cerr << "Validation failed! Expected " << vectorExpected[i]
+                      << " got " << vectorOutput[i] << " at index: " << i << std::endl;
+        }
+    }
+
+    if(passed)
+    {
+        std::cout << "Execution completed successfully." << std::endl;
     }
-  }
-  if(passed){
-    std::cout << "Execution completed successfully." << std::endl;
-  }else{
-    std::cerr << "Execution failed." << std::endl;
-  }
-  // Cleanup
-  HIP_CHECK(hipFree(d_data));
-  HIP_CHECK(hipFree(d_mask));
-  HIP_CHECK(hipFree(d_results));
-  return 0;
-}
\ No newline at end of file
+    else
+    {
+        std::cerr << "Execution failed." << std::endl;
+    }
+
+    // Cleanup
+    HIP_CHECK(hipFree(d_data));
+    HIP_CHECK(hipFree(d_mask));
+    HIP_CHECK(hipFree(d_results));
+    return EXIT_SUCCESS;
+}
diff --git a/docs/tools/update_example_codes.py b/docs/tools/update_example_codes.py
index 0b8bacf7c3..480fe12f5e 100644
--- a/docs/tools/update_example_codes.py
+++ b/docs/tools/update_example_codes.py
@@ -21,5 +21,300 @@
 
 import urllib.request
 
-urllib.request.urlretrieve("https://raw.githubusercontent.com/ROCm/rocm-examples/refs/heads/develop/HIP-Basic/opengl_interop/main.hip", "docs/tools/example_codes/opengl_interop.hip")
-urllib.request.urlretrieve("https://raw.githubusercontent.com/ROCm/rocm-examples/refs/heads/develop/HIP-Basic/vulkan_interop/main.hip", "docs/tools/example_codes/external_interop.hip")
+urllib.request.urlretrieve(
+    "https://raw.githubusercontent.com/ROCm/rocm-examples/refs/heads/amd-staging/HIP-Basic/opengl_interop/main.hip",
+    "docs/tools/example_codes/opengl_interop.hip"
+)
+urllib.request.urlretrieve(
+    "https://raw.githubusercontent.com/ROCm/rocm-examples/refs/heads/amd-staging/HIP-Basic/vulkan_interop/main.hip",
+    "docs/tools/example_codes/external_interop.hip"
+)
+
+# HIP-C%2B%2B-Language-Extensions
+urllib.request.urlretrieve(
+    "https://raw.githubusercontent.com/ROCm/rocm-examples/refs/heads/amd-staging/HIP-Doc/Programming-Guide/HIP-C%2B%2B-Language-Extensions/calling_global_functions/main.hip",
+    "docs/tools/example_codes/calling_global_functions.hip"
+)
+urllib.request.urlretrieve(
+    "https://raw.githubusercontent.com/ROCm/rocm-examples/refs/heads/amd-staging/HIP-Doc/Programming-Guide/HIP-C%2B%2B-Language-Extensions/extern_shared_memory/main.hip",
+    "docs/tools/example_codes/extern_shared_memory.hip"
+)
+urllib.request.urlretrieve(
+    "https://raw.githubusercontent.com/ROCm/rocm-examples/refs/heads/amd-staging/HIP-Doc/Programming-Guide/HIP-C%2B%2B-Language-Extensions/launch_bounds/main.hip",
+    "docs/tools/example_codes/launch_bounds.hip"
+)
+urllib.request.urlretrieve(
+    "https://raw.githubusercontent.com/ROCm/rocm-examples/refs/heads/amd-staging/HIP-Doc/Programming-Guide/HIP-C%2B%2B-Language-Extensions/set_constant_memory/main.hip",
+    "docs/tools/example_codes/set_constant_memory.hip"
+)
+urllib.request.urlretrieve(
+    "https://raw.githubusercontent.com/ROCm/rocm-examples/refs/heads/amd-staging/HIP-Doc/Programming-Guide/HIP-C%2B%2B-Language-Extensions/template_warp_size_reduction/main.hip",
+    "docs/tools/example_codes/template_warp_size_reduction.hip"
+)
+urllib.request.urlretrieve(
+    "https://raw.githubusercontent.com/ROCm/rocm-examples/refs/heads/amd-staging/HIP-Doc/Programming-Guide/HIP-C%2B%2B-Language-Extensions/timer/main.hip",
+    "docs/tools/example_codes/timer.hip"
+)
+urllib.request.urlretrieve(
+    "https://raw.githubusercontent.com/ROCm/rocm-examples/refs/heads/amd-staging/HIP-Doc/Programming-Guide/HIP-C%2B%2B-Language-Extensions/warp_size_reduction/main.hip",
+    "docs/tools/example_codes/warp_size_reduction.hip"
+)
+
+# HIP-Porting-Guide
+urllib.request.urlretrieve(
+    "https://raw.githubusercontent.com/ROCm/rocm-examples/refs/heads/amd-staging/HIP-Doc/Programming-Guide/HIP-Porting-Guide/device_code_feature_identification/main.hip",
+    "docs/tools/example_codes/device_code_feature_identification.hip"
+)
+urllib.request.urlretrieve(
+    "https://raw.githubusercontent.com/ROCm/rocm-examples/refs/heads/amd-staging/HIP-Doc/Programming-Guide/HIP-Porting-Guide/host_code_feature_identification/main.cpp",
+    "docs/tools/example_codes/host_code_feature_identification.cpp"
+)
+urllib.request.urlretrieve(
+    "https://raw.githubusercontent.com/ROCm/rocm-examples/refs/heads/amd-staging/HIP-Doc/Programming-Guide/HIP-Porting-Guide/identifying_compilation_target_platform/main.cpp",
+    "docs/tools/example_codes/identifying_compilation_target_platform.cpp"
+)
+urllib.request.urlretrieve(
+    "https://raw.githubusercontent.com/ROCm/rocm-examples/refs/heads/amd-staging/HIP-Doc/Programming-Guide/HIP-Porting-Guide/identifying_host_device_compilation_pass/main.hip",
+    "docs/tools/example_codes/identifying_host_device_compilation_pass.hip"
+)
+
+# Introduction-to-the-HIP-Programming-Model
+urllib.request.urlretrieve(
+    "https://raw.githubusercontent.com/ROCm/rocm-examples/refs/heads/amd-staging/HIP-Doc/Programming-Guide/Introduction-to-the-HIP-Programming-Model/add_kernel/main.hip",
+    "docs/tools/example_codes/add_kernel.hip"
+)
+
+# Porting-CUDA-Driver-API
+urllib.request.urlretrieve(
+    "https://raw.githubusercontent.com/ROCm/rocm-examples/refs/heads/amd-staging/HIP-Doc/Programming-Guide/Porting-CUDA-Driver-API/load_module/main.cpp",
+    "docs/tools/example_codes/load_module.cpp"
+)
+urllib.request.urlretrieve(
+    "https://raw.githubusercontent.com/ROCm/rocm-examples/refs/heads/amd-staging/HIP-Doc/Programming-Guide/Porting-CUDA-Driver-API/load_module_ex/main.cpp",
+    "docs/tools/example_codes/load_module_ex.cpp"
+)
+urllib.request.urlretrieve(
+    "https://raw.githubusercontent.com/ROCm/rocm-examples/refs/heads/amd-staging/HIP-Doc/Programming-Guide/Porting-CUDA-Driver-API/load_module_ex_cuda/main.cpp",
+    "docs/tools/example_codes/load_module_ex_cuda.cpp"
+)
+urllib.request.urlretrieve(
+    "https://raw.githubusercontent.com/ROCm/rocm-examples/refs/heads/amd-staging/HIP-Doc/Programming-Guide/Porting-CUDA-Driver-API/per_thread_default_stream/main.cpp",
+    "docs/tools/example_codes/per_thread_default_stream.cpp"
+)
+urllib.request.urlretrieve(
+    "https://raw.githubusercontent.com/ROCm/rocm-examples/refs/heads/amd-staging/HIP-Doc/Programming-Guide/Porting-CUDA-Driver-API/pointer_memory_type/main.cpp",
+    "docs/tools/example_codes/pointer_memory_type.cpp"
+)
+
+# Programming-for-HIP-Runtime-Compiler
+urllib.request.urlretrieve(
+    "https://raw.githubusercontent.com/ROCm/rocm-examples/refs/heads/amd-staging/HIP-Doc/Programming-Guide/Programming-for-HIP-Runtime-Compiler/compilation_apis/main.cpp",
+    "docs/tools/example_codes/compilation_apis.cpp"
+)
+urllib.request.urlretrieve(
+    "https://raw.githubusercontent.com/ROCm/rocm-examples/refs/heads/amd-staging/HIP-Doc/Programming-Guide/Programming-for-HIP-Runtime-Compiler/linker_apis/main.cpp",
+    "docs/tools/example_codes/linker_apis.cpp"
+)
+urllib.request.urlretrieve(
+    "https://raw.githubusercontent.com/ROCm/rocm-examples/refs/heads/amd-staging/HIP-Doc/Programming-Guide/Programming-for-HIP-Runtime-Compiler/linker_apis_file/main.cpp",
+    "docs/tools/example_codes/linker_apis_file.cpp"
+)
+urllib.request.urlretrieve(
+    "https://raw.githubusercontent.com/ROCm/rocm-examples/refs/heads/amd-staging/HIP-Doc/Programming-Guide/Programming-for-HIP-Runtime-Compiler/linker_apis_options/main.cpp",
+    "docs/tools/example_codes/linker_apis_options.cpp"
+)
+urllib.request.urlretrieve(
+    "https://raw.githubusercontent.com/ROCm/rocm-examples/refs/heads/amd-staging/HIP-Doc/Programming-Guide/Programming-for-HIP-Runtime-Compiler/lowered_names/main.cpp",
+    "docs/tools/example_codes/lowered_names.cpp"
+)
+urllib.request.urlretrieve(
+    "https://raw.githubusercontent.com/ROCm/rocm-examples/refs/heads/amd-staging/HIP-Doc/Programming-Guide/Programming-for-HIP-Runtime-Compiler/rtc_error_handling/main.cpp",
+    "docs/tools/example_codes/rtc_error_handling.cpp"
+)
+
+# Using-HIP-Runtime-API
+# Using-HIP-Runtime-API/Asynchronous-Concurrent-Execution
+urllib.request.urlretrieve(
+    "https://raw.githubusercontent.com/ROCm/rocm-examples/refs/heads/amd-staging/HIP-Doc/Programming-Guide/Using-HIP-Runtime-API/Asynchronous-Concurrent-Execution/async_kernel_execution/main.hip",
+    "docs/tools/example_codes/async_kernel_execution.hip"
+)
+urllib.request.urlretrieve(
+    "https://raw.githubusercontent.com/ROCm/rocm-examples/refs/heads/amd-staging/HIP-Doc/Programming-Guide/Using-HIP-Runtime-API/Asynchronous-Concurrent-Execution/event_based_synchronization/main.hip",
+    "docs/tools/example_codes/event_based_synchronization.hip"
+)
+urllib.request.urlretrieve(
+    "https://raw.githubusercontent.com/ROCm/rocm-examples/refs/heads/amd-staging/HIP-Doc/Programming-Guide/Using-HIP-Runtime-API/Asynchronous-Concurrent-Execution/sequential_kernel_execution/main.hip",
+    "docs/tools/example_codes/sequential_kernel_execution.hip"
+)
+
+# Using-HIP-Runtime-API / Call-Stack
+urllib.request.urlretrieve(
+    "https://raw.githubusercontent.com/ROCm/rocm-examples/amd-staging/HIP-Doc/Programming-Guide/Using-HIP-Runtime-API/Call-Stack/call_stack_management/main.cpp",
+    "docs/tools/example_codes/call_stack_management.cpp"
+)
+urllib.request.urlretrieve(
+    "https://raw.githubusercontent.com/ROCm/rocm-examples/amd-staging/HIP-Doc/Programming-Guide/Using-HIP-Runtime-API/Call-Stack/device_recursion/main.hip",
+    "docs/tools/example_codes/device_recursion.hip"
+)
+
+# Using-HIP-Runtime-API / Error-Handling
+urllib.request.urlretrieve(
+    "https://raw.githubusercontent.com/ROCm/rocm-examples/amd-staging/HIP-Doc/Programming-Guide/Using-HIP-Runtime-API/Error-Handling/error_handling/main.hip",
+    "docs/tools/example_codes/error_handling.hip"
+)
+
+# Using-HIP-Runtime-API / HIP-Graphs
+urllib.request.urlretrieve(
+    "https://raw.githubusercontent.com/ROCm/rocm-examples/amd-staging/HIP-Doc/Programming-Guide/Using-HIP-Runtime-API/HIP-Graphs/graph_capture/main.hip",
+    "docs/tools/example_codes/graph_capture.hip"
+)
+urllib.request.urlretrieve(
+    "https://raw.githubusercontent.com/ROCm/rocm-examples/amd-staging/HIP-Doc/Programming-Guide/Using-HIP-Runtime-API/HIP-Graphs/graph_creation/main.hip",
+    "docs/tools/example_codes/graph_creation.hip"
+)
+
+# Using-HIP-Runtime-API / Initialization
+urllib.request.urlretrieve(
+    "https://raw.githubusercontent.com/ROCm/rocm-examples/amd-staging/HIP-Doc/Programming-Guide/Using-HIP-Runtime-API/Initialization/simple_device_query/main.cpp",
+    "docs/tools/example_codes/simple_device_query.cpp"
+)
+
+# Using-HIP-Runtime-API / Memory-Management / Device-Memory
+urllib.request.urlretrieve(
+    "https://raw.githubusercontent.com/ROCm/rocm-examples/amd-staging/HIP-Doc/Programming-Guide/Using-HIP-Runtime-API/Memory-Management/Device-Memory/constant_memory/main.hip",
+    "docs/tools/example_codes/constant_memory_device.hip"
+)
+urllib.request.urlretrieve(
+    "https://raw.githubusercontent.com/ROCm/rocm-examples/amd-staging/HIP-Doc/Programming-Guide/Using-HIP-Runtime-API/Memory-Management/Device-Memory/dynamic_shared_memory/main.hip",
+    "docs/tools/example_codes/dynamic_shared_memory_device.hip"
+)
+urllib.request.urlretrieve(
+    "https://raw.githubusercontent.com/ROCm/rocm-examples/amd-staging/HIP-Doc/Programming-Guide/Using-HIP-Runtime-API/Memory-Management/Device-Memory/explicit_copy/main.cpp",
+    "docs/tools/example_codes/explicit_copy.cpp"
+)
+urllib.request.urlretrieve(
+    "https://raw.githubusercontent.com/ROCm/rocm-examples/amd-staging/HIP-Doc/Programming-Guide/Using-HIP-Runtime-API/Memory-Management/Device-Memory/kernel_memory_allocation/main.hip",
+    "docs/tools/example_codes/kernel_memory_allocation.hip"
+)
+urllib.request.urlretrieve(
+    "https://raw.githubusercontent.com/ROCm/rocm-examples/amd-staging/HIP-Doc/Programming-Guide/Using-HIP-Runtime-API/Memory-Management/Device-Memory/static_shared_memory/main.hip",
+    "docs/tools/example_codes/static_shared_memory_device.hip"
+)
+
+# Using-HIP-Runtime-API / Memory-Management / Host-Memory
+urllib.request.urlretrieve(
+    "https://raw.githubusercontent.com/ROCm/rocm-examples/amd-staging/HIP-Doc/Programming-Guide/Using-HIP-Runtime-API/Memory-Management/Host-Memory/pageable_host_memory/main.cpp",
+    "docs/tools/example_codes/pageable_host_memory.cpp"
+)
+urllib.request.urlretrieve(
+    "https://raw.githubusercontent.com/ROCm/rocm-examples/amd-staging/HIP-Doc/Programming-Guide/Using-HIP-Runtime-API/Memory-Management/Host-Memory/pinned_host_memory/main.cpp",
+    "docs/tools/example_codes/pinned_host_memory.cpp"
+)
+
+# Using-HIP-Runtime-API / Memory-Management / SOMA
+urllib.request.urlretrieve(
+    "https://raw.githubusercontent.com/ROCm/rocm-examples/amd-staging/HIP-Doc/Programming-Guide/Using-HIP-Runtime-API/Memory-Management/SOMA/stream_ordered_memory_allocation/main.hip",
+    "docs/tools/example_codes/stream_ordered_memory_allocation.hip"
+)
+urllib.request.urlretrieve(
+    "https://raw.githubusercontent.com/ROCm/rocm-examples/amd-staging/HIP-Doc/Programming-Guide/Using-HIP-Runtime-API/Memory-Management/SOMA/ordinary_memory_allocation/main.hip",
+    "docs/tools/example_codes/ordinary_memory_allocation.hip"
+)
+urllib.request.urlretrieve(
+    "https://raw.githubusercontent.com/ROCm/rocm-examples/amd-staging/HIP-Doc/Programming-Guide/Using-HIP-Runtime-API/Memory-Management/SOMA/memory_pool/main.hip",
+    "docs/tools/example_codes/memory_pool.hip"
+)
+urllib.request.urlretrieve(
+    "https://raw.githubusercontent.com/ROCm/rocm-examples/amd-staging/HIP-Doc/Programming-Guide/Using-HIP-Runtime-API/Memory-Management/SOMA/memory_pool_resource_usage_statistics/main.cpp",
+    "docs/tools/example_codes/memory_pool_resource_usage_statistics.cpp"
+)
+urllib.request.urlretrieve(
+    "https://raw.githubusercontent.com/ROCm/rocm-examples/amd-staging/HIP-Doc/Programming-Guide/Using-HIP-Runtime-API/Memory-Management/SOMA/memory_pool_threshold/main.hip",
+    "docs/tools/example_codes/memory_pool_threshold.hip"
+)
+urllib.request.urlretrieve(
+    "https://raw.githubusercontent.com/ROCm/rocm-examples/amd-staging/HIP-Doc/Programming-Guide/Using-HIP-Runtime-API/Memory-Management/SOMA/memory_pool_trim/main.cpp",
+    "docs/tools/example_codes/memory_pool_trim.cpp"
+)
+
+# Using-HIP-Runtime-API / Memory-Management / Unified-Memory-Management
+urllib.request.urlretrieve(
+    "https://raw.githubusercontent.com/ROCm/rocm-examples/amd-staging/HIP-Doc/Programming-Guide/Using-HIP-Runtime-API/Memory-Management/Unified-Memory-Management/data_prefetching/main.hip",
+    "docs/tools/example_codes/data_prefetching.hip"
+)
+urllib.request.urlretrieve(
+    "https://raw.githubusercontent.com/ROCm/rocm-examples/amd-staging/HIP-Doc/Programming-Guide/Using-HIP-Runtime-API/Memory-Management/Unified-Memory-Management/dynamic_unified_memory/main.hip",
+    "docs/tools/example_codes/dynamic_unified_memory.hip"
+)
+urllib.request.urlretrieve(
+    "https://raw.githubusercontent.com/ROCm/rocm-examples/amd-staging/HIP-Doc/Programming-Guide/Using-HIP-Runtime-API/Memory-Management/Unified-Memory-Management/explicit_memory/main.hip",
+    "docs/tools/example_codes/explicit_memory.hip"
+)
+urllib.request.urlretrieve(
+    "https://raw.githubusercontent.com/ROCm/rocm-examples/amd-staging/HIP-Doc/Programming-Guide/Using-HIP-Runtime-API/Memory-Management/Unified-Memory-Management/memory_range_attributes/main.hip",
+    "docs/tools/example_codes/memory_range_attributes.hip"
+)
+urllib.request.urlretrieve(
+    "https://raw.githubusercontent.com/ROCm/rocm-examples/amd-staging/HIP-Doc/Programming-Guide/Using-HIP-Runtime-API/Memory-Management/Unified-Memory-Management/standard_unified_memory/main.hip",
+    "docs/tools/example_codes/standard_unified_memory.hip"
+)
+urllib.request.urlretrieve(
+    "https://raw.githubusercontent.com/ROCm/rocm-examples/amd-staging/HIP-Doc/Programming-Guide/Using-HIP-Runtime-API/Memory-Management/Unified-Memory-Management/static_unified_memory/main.hip",
+    "docs/tools/example_codes/static_unified_memory.hip"
+)
+urllib.request.urlretrieve(
+    "https://raw.githubusercontent.com/ROCm/rocm-examples/amd-staging/HIP-Doc/Programming-Guide/Using-HIP-Runtime-API/Memory-Management/Unified-Memory-Management/unified_memory_advice/main.hip",
+    "docs/tools/example_codes/unified_memory_advice.hip"
+)
+
+# Using-HIP-Runtime-API / Multi-Device-Management
+urllib.request.urlretrieve(
+    "https://raw.githubusercontent.com/ROCm/rocm-examples/amd-staging/HIP-Doc/Programming-Guide/Using-HIP-Runtime-API/Multi-Device-Management/device_enumeration/main.cpp",
+    "docs/tools/example_codes/device_enumeration.cpp"
+)
+urllib.request.urlretrieve(
+    "https://raw.githubusercontent.com/ROCm/rocm-examples/amd-staging/HIP-Doc/Programming-Guide/Using-HIP-Runtime-API/Multi-Device-Management/device_selection/main.hip",
+    "docs/tools/example_codes/device_selection.hip"
+)
+urllib.request.urlretrieve(
+    "https://raw.githubusercontent.com/ROCm/rocm-examples/amd-staging/HIP-Doc/Programming-Guide/Using-HIP-Runtime-API/Multi-Device-Management/multi_device_synchronization/main.hip",
+    "docs/tools/example_codes/multi_device_synchronization.hip"
+)
+urllib.request.urlretrieve(
+    "https://raw.githubusercontent.com/ROCm/rocm-examples/amd-staging/HIP-Doc/Programming-Guide/Using-HIP-Runtime-API/Multi-Device-Management/p2p_memory_access/main.hip",
+    "docs/tools/example_codes/p2p_memory_access.hip"
+)
+urllib.request.urlretrieve(
+    "https://raw.githubusercontent.com/ROCm/rocm-examples/amd-staging/HIP-Doc/Programming-Guide/Using-HIP-Runtime-API/Multi-Device-Management/p2p_memory_access_failed/main.hip",
+    "docs/tools/example_codes/p2p_memory_access_failed.hip"
+)
+
+# Reference examples from HIP-Doc / Reference
+
+# CUDA-to-HIP-API-Function-Comparison
+urllib.request.urlretrieve(
+    "https://raw.githubusercontent.com/ROCm/rocm-examples/amd-staging/HIP-Doc/Reference/CUDA-to-HIP-API-Function-Comparison/block_reduction/main.cu",
+    "docs/tools/example_codes/block_reduction.cu"
+)
+
+# HIP-Complex-Math-API
+urllib.request.urlretrieve(
+    "https://raw.githubusercontent.com/ROCm/rocm-examples/amd-staging/HIP-Doc/Reference/HIP-Complex-Math-API/complex_math/main.hip",
+    "docs/tools/example_codes/complex_math.hip"
+)
+
+# HIP-Math-API
+urllib.request.urlretrieve(
+    "https://raw.githubusercontent.com/ROCm/rocm-examples/amd-staging/HIP-Doc/Reference/HIP-Math-API/math/main.hip",
"docs/tools/example_codes/math.hip" +) + +# Low-Precision-Floating-Point-Types +urllib.request.urlretrieve( + "https://raw.githubusercontent.com/ROCm/rocm-examples/amd-staging/HIP-Doc/Reference/Low-Precision-Floating-Point-Types/low_precision_float_fp8/main.hip", + "docs/tools/example_codes/low_precision_float_fp8.hip" +) +urllib.request.urlretrieve( + "https://raw.githubusercontent.com/ROCm/rocm-examples/amd-staging/HIP-Doc/Reference/Low-Precision-Floating-Point-Types/low_precision_float_fp16/main.hip", + "docs/tools/example_codes/low_precision_float_fp16.hip" +) From 7206a3398430490fb1837bc782e757737ed4184e Mon Sep 17 00:00:00 2001 From: Jan Stephan Date: Tue, 23 Sep 2025 14:53:55 +0200 Subject: [PATCH 3/4] Correct P2P memory access section Signed-off-by: Jan Stephan --- docs/how-to/hip_runtime_api/multi_device.rst | 11 +++++++---- ..._failed.hip => p2p_memory_access_host_staging.hip} | 10 +++------- docs/tools/update_example_codes.py | 4 ++-- 3 files changed, 12 insertions(+), 13 deletions(-) rename docs/tools/example_codes/{p2p_memory_access_failed.hip => p2p_memory_access_host_staging.hip} (90%) diff --git a/docs/how-to/hip_runtime_api/multi_device.rst b/docs/how-to/hip_runtime_api/multi_device.rst index 3facb80f65..b8ee0afb8e 100644 --- a/docs/how-to/hip_runtime_api/multi_device.rst +++ b/docs/how-to/hip_runtime_api/multi_device.rst @@ -73,7 +73,10 @@ applications that require frequent data exchange between GPUs, as it eliminates the need to transfer data through the host memory. By adding peer-to-peer access to the example referenced in -:ref:`multi_device_selection`, data can be copied between devices: +:ref:`multi_device_selection`, data can be efficiently copied between devices. +If peer-to-peer access is not activated, the call to :cpp:func:`hipMemcpy` +still works but internally uses a staging buffer in host memory, which incurs a +performance penalty. .. tab-set:: @@ -82,13 +85,13 @@ By adding peer-to-peer access to the example referenced in .. 
literalinclude:: ../../tools/example_codes/p2p_memory_access.hip :start-after: // [sphinx-start] :end-before: // [sphinx-end] - :emphasize-lines: 31-37, 51-55 + :emphasize-lines: 43-49, 63-67 :language: cpp .. tab-item:: without peer-to-peer - .. literalinclude:: ../../tools/example_codes/p2p_memory_access.hip + .. literalinclude:: ../../tools/example_codes/p2p_memory_access_host_staging.hip :start-after: // [sphinx-start] :end-before: // [sphinx-end] - :emphasize-lines: 43-49, 53, 58 + :emphasize-lines: 55-57 :language: cpp diff --git a/docs/tools/example_codes/p2p_memory_access_failed.hip b/docs/tools/example_codes/p2p_memory_access_host_staging.hip similarity index 90% rename from docs/tools/example_codes/p2p_memory_access_failed.hip rename to docs/tools/example_codes/p2p_memory_access_host_staging.hip index e56038ba71..e844c90a35 100644 --- a/docs/tools/example_codes/p2p_memory_access_failed.hip +++ b/docs/tools/example_codes/p2p_memory_access_host_staging.hip @@ -53,7 +53,7 @@ int main() if(deviceCount < 2) { std::cout << "This example requires at least two HIP devices." << std::endl; - return EXIT_FAILURE; + return EXIT_SUCCESS; } double* deviceData0; @@ -75,13 +75,9 @@ int main() simpleKernel<<<1000, 128>>>(deviceData1); // Launch kernel on device 1 HIP_CHECK(hipDeviceSynchronize()); - // Attempt to use deviceData0 on device 1 (This will not work as deviceData0 is allocated on device 0) + // Use deviceData0 on device 1. This works but incurs a performance penalty. 
     HIP_CHECK(hipSetDevice(deviceId1));
-    hipError_t err = hipMemcpy(deviceData1, deviceData0, size, hipMemcpyDeviceToDevice); // This should fail
-    if (err != hipSuccess)
-    {
-        std::cout << "Error: Cannot access deviceData0 from device 1, deviceData0 is on device 0" << std::endl;
-    }
+    HIP_CHECK(hipMemcpy(deviceData1, deviceData0, size, hipMemcpyDeviceToDevice));
 
     // Copy result from device 0
     double hostData0[1024];
diff --git a/docs/tools/update_example_codes.py b/docs/tools/update_example_codes.py
index 480fe12f5e..278020d1a7 100644
--- a/docs/tools/update_example_codes.py
+++ b/docs/tools/update_example_codes.py
@@ -285,8 +285,8 @@
     "docs/tools/example_codes/p2p_memory_access.hip"
 )
 urllib.request.urlretrieve(
-    "https://raw.githubusercontent.com/ROCm/rocm-examples/amd-staging/HIP-Doc/Programming-Guide/Using-HIP-Runtime-API/Multi-Device-Management/p2p_memory_access_failed/main.hip",
-    "docs/tools/example_codes/p2p_memory_access_failed.hip"
+    "https://raw.githubusercontent.com/ROCm/rocm-examples/amd-staging/HIP-Doc/Programming-Guide/Using-HIP-Runtime-API/Multi-Device-Management/p2p_memory_access_host_staging/main.hip",
+    "docs/tools/example_codes/p2p_memory_access_host_staging.hip"
 )
 
 # Reference examples from HIP-Doc / Reference

From 07e77e942fffedd98f74599e8a42e1da8c013917 Mon Sep 17 00:00:00 2001
From: Istvan Kiss
Date: Thu, 9 Oct 2025 15:43:08 +0200
Subject: [PATCH 4/4] WIP

---
 .github/workflows/linting.yml | 2 +-
 .gitignore | 1 -
 .vale.ini | 58 +++++++++++++++++++++++++++++++++++
 3 files changed, 59 insertions(+), 2 deletions(-)
 create mode 100644 .vale.ini

diff --git a/.github/workflows/linting.yml b/.github/workflows/linting.yml
index 46cf268483..c5582cb01b 100644
--- a/.github/workflows/linting.yml
+++ b/.github/workflows/linting.yml
@@ -17,4 +17,4 @@ on:
 jobs:
   call-workflow-passing-data:
     name: Documentation
-    uses: ROCm/rocm-docs-core/.github/workflows/linting.yml@develop
+    uses: ROCm/rocm-docs-core/.github/workflows/linting.yml@vale_check
diff --git a/.gitignore b/.gitignore
index ffb0b5f8c0..95b93388d8 100644
--- a/.gitignore
+++ b/.gitignore
@@ -1,4 +1,3 @@
-.*
 !.gitignore
 !.spellcheck.local.yaml
 *.o
diff --git a/.vale.ini b/.vale.ini
new file mode 100644
index 0000000000..06ac8fb096
--- /dev/null
+++ b/.vale.ini
@@ -0,0 +1,58 @@
+# ==========================================
+# Vale configuration for Markdown + RST
+# ==========================================
+
+# Path to custom or downloaded style packages
+# You can point to `.github/styles` or another shared directory
+StylesPath = .github/styles
+
+# The minimum alert level to display
+# (suggestion, warning, or error)
+MinAlertLevel = suggestion
+
+# By default, Vale will lint all recognized file types.
+# You can override or specify formats here.
+[*.{md,rst}]
+BasedOnStyles = Vale, Google, Microsoft
+
+# ==========================================
+# Markdown-specific rules
+# ==========================================
+[*.md]
+# You can disable or tweak specific Markdown rules
+# Examples:
+TokenIgnores = (\{\{.*\}\}) # Ignore templating syntax
+BlockIgnores = (?s)```.*?``` # Ignore fenced code blocks
+
+# Customize rules if needed
+# Example: disable long sentence warnings
+Vale.Terms = YES
+Google.Headings = YES
+Google.FirstPerson = NO
+Google.We = NO
+Google.Passive = NO
+
+# ==========================================
+# RST-specific rules
+# ==========================================
+[*.rst]
+# Ensure docutils is installed for parsing in CI
+# Disable Markdown-specific rules if they trigger false positives
+TokenIgnores = (:ref:`.*`|:doc:`.*`|``.*``)
+BlockIgnores = (?s)\.\..*::.*\n(?:[ \t]+.*\n)*
+
+BasedOnStyles = Vale, Google, Microsoft
+Google.Headings = NO # RST doesn't use Markdown-style headings
+Google.We = NO
+Google.Passive = YES
+Microsoft.Spacing = YES
+Microsoft.Acronyms = YES
+
+# ==========================================
+# File-specific exclusions (optional)
+# ==========================================
+[CHANGELOG.md]
+BasedOnStyles = Vale # Skip strict style rules for changelogs
+
+[README.md]
+BasedOnStyles = Vale, Google
\ No newline at end of file