Level Zero Device #483

Closed
wants to merge 15 commits into from

therault
Contributor

@therault commented on Feb 3, 2023

This branch implements a Level Zero device to support Intel (and other oneAPI) GPUs.

@therault requested a review from a team as a code owner on February 3, 2023, 18:54
common_gpu

Add a new level_zero device (WIP)

 - copy device_cuda in device_level_zero and rename things
 - module_init and module_fini for level_zero

Need to factorize a little bit more.

Factorizing (need to do it in base)

Port on top of the new common code
  - Add multiple CMake logic files and commands
  - jdf2c.c now generates DPC++ output files when needed
  - Make DEV_DPCPP an alias of DEV_LEVEL_ZERO
  - Command lists for I/O (streams with id 0 and 1) are still immediate
  - Command lists for computations (streams with id >= 2) are now regular lists connected to a queue;
    that queue exists both as a compute Level Zero queue and as a DPC++ queue
  - Still missing: the compilation logic that compiles the generated DPC++ code and links it with the target binary

Risk: it is unclear whether the user can still push commands or events into a command list after it has been closed,
yet closing it is necessary to force the commands to be pushed to the queue. I might need to create a
new command list after each close, and attach the closed command list to the event for garbage collection.

Adapt findlevel-zero.cmake to support systems where pkg-config is broken
…el Zero update

use_cuda / use_cuda_index have been renamed to follow the proper naming scheme; do the same for level_zero
…eated immediate (and they cannot be immediate if we want to get their Command Queue, which is necessary for the DPC++ interface)

Typo fix and multiple CMake fixes so that CMake links the DPCPP-generated files
The buffer interface is not required: we can use the USM oneMKL interface, which seems to work fine. Performance still needs to be checked.

Apparently we cannot mix immediate and non-immediate command lists, or at least doing so makes the passing of command queues unreliable.

There is an exception in data.c in how we handle GPU copies; it must be ported to Level Zero too.

The Level Zero runtime has an atexit procedure that deletes command queues, and this seems to conflict with our own code that deletes the command queues...
NULL is not a valid MPI datatype when compiling with a clone of MPICH. The value doesn't matter in this case, just cast
Some fixes in device level_zero

Temporary fix for termination detection: the tag size must be made portable. TODO!
Fix the subsystem test. Need to backport fixes in the MCA device

Fully functional sketch for level zero
…, because command lists (or work) submitted to the command queues by SYCL (typically oneMKL) can complete in parallel with events belonging to other command lists.
…avoid polluting their namespace; cleanup some unused variables
…e LevelZero library as at compile time in PaRSECConfig.cmake
@therault
Contributor Author

Superseded by #486

@therault closed this on Mar 24, 2023