
[CUDA] Max local mem size check should return OUT_OF_RESOURCES #1322

Open
rafbiels opened this issue Feb 8, 2024 · 1 comment
Labels
cuda CUDA adapter specific issues


rafbiels commented Feb 8, 2024

Building on top of intel/llvm#12604 + #1318, which add handleOutOfResources to dpcpp and return UR_RESULT_ERROR_OUT_OF_RESOURCES, the local memory size check:

if (LocalSize > static_cast<uint32_t>(Device->getMaxCapacityLocalMem())) {
  setErrorMessage("Excessive allocation of local memory on the device",
                  UR_RESULT_ERROR_ADAPTER_SPECIFIC);
  return UR_RESULT_ERROR_ADAPTER_SPECIFIC;
}

should also return UR_RESULT_ERROR_OUT_OF_RESOURCES and have a dedicated error-handling case added in handleOutOfResources.

Right now submitting a kernel with too large local mem size results in:

Native API failed. Native API returns: -996 (The plugin has emitted a backend specific error)
Excessive allocation of local memory on the device
 -996 (The plugin has emitted a backend specific error)

which does contain a helpful exception message, but wraps it in generic and confusing "backend specific error" messages and the unhelpful code -996. Having this return ERROR_OUT_OF_RESOURCES would make it easier for us to cover in the troubleshooting guide, and for users to find with web search engines.

@kbenzie kbenzie added the cuda CUDA adapter specific issues label Feb 15, 2024

kbenzie commented Feb 15, 2024

@GeorgeWeb I've assigned this to you since it's building on top of your PRs.
