[BUG] Error profiling CUDA 13 #715

@gianfi12

Describe the bug
I tried to use LIKWID to profile a CUDA executable. However, recent CUDA versions (>=13.0) seem to break LIKWID. I tried both the latest Fedora 43 with CUDA 13.1 and driver version 580, and CentOS 10 with CUDA 13.1 and driver version 590.
In both cases I get the same error:
DEBUG - [nvmon_perfworks_addEventSet:1929] Add events to GPU device 0 with context 0xf782cc0
DEBUG - [perfworks_check_nv_context:881] Current context 0xf782cc0 DevContext 0xf782cc0
DEBUG - [perfworks_check_nv_context:908] Context 0xf782cc0 fits for device 0
DEBUG - [nvmon_perfworks_addEventSet:1958] SMSP_SASS_THREAD_INST_EXECUTED_OP_FADD_PRED_ON_SUM
DEBUG - [nvmon_perfworks_addEventSet:1964] Adding real event smsp__sass_thread_inst_executed_op_fadd_pred_on.sum
DEBUG - [nvmon_perfworks_addEventSet:1958] SMSP_SASS_THREAD_INST_EXECUTED_OP_FMUL_PRED_ON_SUM
DEBUG - [nvmon_perfworks_addEventSet:1964] Adding real event smsp__sass_thread_inst_executed_op_fmul_pred_on.sum
DEBUG - [nvmon_perfworks_addEventSet:1958] SMSP_SASS_THREAD_INST_EXECUTED_OP_FFMA_PRED_ON_SUM
DEBUG - [nvmon_perfworks_addEventSet:1964] Adding real event smsp__sass_thread_inst_executed_op_ffma_pred_on.sum
DEBUG - [nvmon_perfworks_addEventSet:1986] Increase size of eventSet space on device 0
DEBUG - [nvmon_perfworks_addEventSet:1999] Filling eventset 1 on device 0
ERROR - [./src/includes/nvmon_perfworks.h:nvmon_perfworks_addEventSet:2020] Success.
Error: function cuptiProfilerGetCounterAvailability failed with error: 'CUPTI_ERROR_INVALID_PARAMETER' (CUptiResult=1)
ERROR - [/opt/packages/likwid/src/nvmon.c:nvmon_addEventSet:665] Operation not permitted. Failed to add event set for GPU 0
Error setting up GPU Marker API.
DEBUG - [perfworks_check_nv_context:881] Current context 0xf782cc0 DevContext 0xf782cc0
DEBUG - [perfworks_check_nv_context:908] Context 0xf782cc0 fits for device 0
DEBUG - [nvmon_perfworks_startCounters:2399] Start Counters on device 0 (Eventset 0)
DEBUG - [nvmon_perfworks_startCounters:2415] (START)counterDataImageSize 9899
DEBUG - [nvmon_perfworks_startCounters:2417] (START)counterDataScratchBufferSize 40
ERROR - [./src/includes/nvmon_perfworks.h:nvmon_perfworks_startCounters:2419] Operation not permitted.
Error: function cuptiProfilerBeginSession failed with error: 'CUPTI_ERROR_INVALID_PARAMETER' (CUptiResult=1)
I have also noticed that some CUPTI functions, such as cuptiProfilerGetCounterAvailability, changed their signature and now take a new boolean argument, which I think requires adding
#include <stdbool.h>
given the required C version.
Either way, I wasn't able to find the root cause of the problem. I have also tried to initialize the new boolean bAllowDeviceLevelCounters directly, but I get the same error.
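For illustration, the kind of explicit initialization I mean can be sketched as below. This is only a mock, not the real CUPTI header: the struct MockAvailabilityParams, the function mock_get_counter_availability, the helper mock_make_params, and the validation rules are all hypothetical stand-ins. Only the field name bAllowDeviceLevelCounters comes from the CUDA 13 headers as described above, and its uint8_t type here is an assumption. The sketch shows a size-versioned parameter struct being zeroed before use, so that a newly added field never carries stack garbage.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Mock of a size-versioned parameter struct. NOT the real CUPTI definition. */
typedef struct {
    size_t  structSize;                /* size the caller was compiled against */
    void   *pPriv;                     /* reserved, expected to be NULL */
    void   *ctx;                       /* stand-in for a CUDA context handle */
    uint8_t bAllowDeviceLevelCounters; /* name from the issue text; type assumed */
} MockAvailabilityParams;

enum { MOCK_SUCCESS = 0, MOCK_ERROR_INVALID_PARAMETER = 1 };

/* Hypothetical callee: rejects a params struct whose size does not match,
 * whose reserved pointer is set, or whose flag is not 0 or 1. The real
 * library's checks are not documented here and may differ. */
static int mock_get_counter_availability(const MockAvailabilityParams *p)
{
    if (p == NULL || p->structSize != sizeof(*p) || p->pPriv != NULL)
        return MOCK_ERROR_INVALID_PARAMETER;
    if (p->bAllowDeviceLevelCounters > 1)
        return MOCK_ERROR_INVALID_PARAMETER;
    return MOCK_SUCCESS;
}

/* Hypothetical helper: zero the whole struct first, then set every field
 * the callee reads, including the newly added flag. */
static MockAvailabilityParams mock_make_params(void *ctx, bool allowDeviceLevel)
{
    MockAvailabilityParams p;
    memset(&p, 0, sizeof(p));
    p.structSize = sizeof(p);
    p.ctx = ctx;
    p.bAllowDeviceLevelCounters = allowDeviceLevel ? 1 : 0;
    return p;
}
```

With this mock, a struct built by mock_make_params passes validation, while one with a stale structSize or an uninitialized flag is rejected with the mock's invalid-parameter code. As noted, the equivalent change in LIKWID did not make the real error go away for me.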
I think it may be related to the new contextless loading introduced by CUDA 13 (link), which may require new context management in src/includes/nvmon_perfworks.h.

To Reproduce
It is sufficient to profile any CUDA executable that uses the latest version of LIKWID with the CUDA Marker API and is compiled with NVMON_PERF to enable the marker API.

To Reproduce with a LIKWID command
likwid-perfctr -V 3 -G 0 -W FLOPS_SP -m
