Remove cu::Context class #316

csbnw · 2025-02-05T09:28:48Z

Remove the cu::Context class. Context management is now done in the constructor of the cu::Device class.
When a Primary Context was already active, e.g. when combining cudawrappers with the NVIDIA Runtime API, that context is retained. If not, a new Context is created.
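
The retain-or-create decision described above can be sketched roughly as follows. This is a minimal mock that runs without a GPU, not the code from the pull request: `sketch::Driver`, its counters, and the `int` placeholder for the context handle are all hypothetical stand-ins for the driver state that cuDevicePrimaryCtxGetState would report.

```cpp
#include <cassert>
#include <memory>

namespace sketch {

// Mock driver state, standing in for the real CUDA driver so this runs
// without a GPU. All names here are hypothetical.
struct Driver {
  bool primary_active = false;  // as cuDevicePrimaryCtxGetState would report
  int retained = 0;             // outstanding retains of the primary context
  int created = 0;              // contexts created and not yet destroyed
};

class Device {
 public:
  explicit Device(Driver *driver) {
    if (driver->primary_active) {
      // A primary context is already active: retain it, release it later.
      ++driver->retained;
      _context_manager = std::shared_ptr<int>(new int(0), [driver](int *ctx) {
        --driver->retained;
        delete ctx;
      });
    } else {
      // No active primary context: create a new one, destroy it later.
      ++driver->created;
      _context_manager = std::shared_ptr<int>(new int(0), [driver](int *ctx) {
        --driver->created;
        delete ctx;
      });
    }
  }

 private:
  std::shared_ptr<int> _context_manager;  // placeholder for the context handle
};

}  // namespace sketch
```

In this sketch, copies of a Device share one manager, so the release (or destroy) step runs exactly once, when the last copy goes out of scope.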

Some tests had to be removed, while others were adapted slightly. When building in HIP mode, the Context code is disabled. The code was tested locally on NVIDIA and AMD GPUs, and all tests pass.

john-romein · 2025-02-05T20:48:44Z

I do not think that I like it. It is a major deviation from the idea to stay close to the CUDA driver API. Also, it breaks basically all the libraries and applications that we have.

csbnw · 2025-02-06T07:51:38Z

Going forward, I see a couple of options. In no particular order:

1. Don't merge the suggested changes
2. Keep the cu::Context, as is
3. Keep the cu::Context, but make all functions no-ops
4. Proceed with merging this code

For options 1 and 2, we would also have to revert the deprecation message to improve the user experience.

I would be strongly in favour of option 3 or 4. Option 3 maintains compatibility with existing code, but could also confuse the user. People arguably shouldn't use the main branch and should instead use one of the released versions. If they stick to an older version, API changes are not really an issue.

There is one thing that may still need to be addressed for option 3 or 4: cu::Context::setCurrent. Some codes need it. In the case of option 3, this won't work, as there will be no CUcontext left. We would have to add something like cu::Device::setCurrentContext.

include/cudawrappers/cu.hpp

matmanc · 2025-02-06T08:49:03Z

If we leave it as is we should print something during compilation when using HIP.

The context functions are doing nothing...

john-romein · 2025-02-06T09:07:01Z

Can't we just make them no-ops for AMD only? Can a Context::setCurrent() be safely ignored on AMD GPUs?

To this end, even when the primary context is retained, the returned CUcontext object must be stored in the _context_manager.

With HIP on an AMD GPU, memory allocated as CU_MEMORYTYPE_UNIFIED is reported as CU_MEMORYTYPE_HOST, causing the check to fail. It is odd that this didn't cause problems before the Context class was changed. Alternatively, we could query and store the memory type in the constructor.

csbnw · 2025-02-06T15:11:18Z

@matmanc

> If we leave it as is we should print something during compilation when using HIP.
>
> The context functions are doing nothing...

and @john-romein

> Can't we just make them no-ops for AMD only? Can a Context::setCurrent() be safely ignored on AMD GPUs?

Can you check the latest code to see how I solved it? There is no need for no-ops. Both in CUDA and HIP mode, you can use the same cu::Device interface.

The only difference between the two is that CUDA will complain about an invalid device context when you try to use the device without calling cuCtxSetCurrent first. This can now be done with the new Device::ctxSetCurrent function. With HIP, nothing happens (the code will just work). I added test cases to test_cu for these scenarios.
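
The difference between the two modes can be illustrated with a small mock. This is only a sketch of the behaviour described above, assuming a hypothetical `Backend` enum and a hypothetical `useDevice` method in place of real driver calls; only the name ctxSetCurrent is taken from the pull request.

```cpp
#include <cassert>
#include <stdexcept>

namespace sketch {

enum class Backend { CUDA, HIP };  // hypothetical switch for the build mode

class Device {
 public:
  explicit Device(Backend backend) : _backend(backend) {}

  // Makes this device's context current (mirrors Device::ctxSetCurrent).
  void ctxSetCurrent() { _current = true; }

  // Hypothetical stand-in for any call that needs a current context.
  void useDevice() const {
    if (_backend == Backend::CUDA && !_current) {
      throw std::runtime_error("invalid device context");
    }
    // In HIP mode there is nothing to check, so the call just works.
  }

 private:
  Backend _backend;
  bool _current = false;
};

// Returns true when the device can be used without calling ctxSetCurrent.
inline bool worksWithoutSetCurrent(Backend backend) {
  Device device(backend);
  try {
    device.useDevice();
    return true;
  } catch (const std::runtime_error &) {
    return false;
  }
}

}  // namespace sketch
```

Calling ctxSetCurrent first is harmless in both modes, so portable code can call it unconditionally.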

wvbbreu

I have a couple of minor code suggestions. I agree that the flood of deprecated messages is frustrating, so I support finding a better solution as soon as possible.

I do agree with @john-romein that these changes will have a significant impact, maybe even warranting a release bump to 1.x.x. Based on that, I have two suggestions:

Delay the changes to a later version of cudawrappers and first release 0.9.0, either with or without the deprecated messages.
Implement the changes now in cudawrappers version 0.9.0, but temporarily include an empty cu::Context to maintain backward compatibility. Internally, it may forward each method call to the corresponding cu::Device method. This would give cudawrappers users a grace period to adapt.
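
The second suggestion could look roughly like the sketch below. It is an illustration only: the `Device` here is a mock that counts calls, and the `Context(int flags, Device &device)` constructor signature is an assumption about the existing interface, not something confirmed in this thread.

```cpp
#include <cassert>

namespace sketch {

// Mock Device that only records how often its context was made current.
class Device {
 public:
  void ctxSetCurrent() { ++_set_current_calls; }
  int setCurrentCalls() const { return _set_current_calls; }

 private:
  int _set_current_calls = 0;
};

// Compatibility shim: owns no CUcontext and forwards to the Device.
// A [[deprecated("use cu::Device instead")]] attribute could be added to the
// class once users have had time to migrate.
class Context {
 public:
  // Assumed constructor shape; the flags are accepted but ignored.
  Context(int /*flags*/, Device &device) : _device(device) {}

  // Forwards to the Device, which now owns the context management.
  void setCurrent() const { _device.ctxSetCurrent(); }

 private:
  Device &_device;
};

}  // namespace sketch
```

Existing code that constructs a Context and calls setCurrent would keep compiling, while the actual work happens in the Device.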

wvbbreu · 2025-02-10T12:44:59Z

include/cudawrappers/cu.hpp

  int _ordinal;
 };

-class Context : public Wrapper<CUcontext> {


Shouldn't some of these functions be implemented inside the Device class, for example setLimit and getLimit? If they will not be included, I would suggest mentioning it in the changelog.

csbnw · 2025-02-11

I don't know if anyone is really using these functions. They are not available with HIP, are they? I think it's sufficient to inform the user in the changelog that cu::Context has been removed.

wvbbreu · 2025-02-10T12:46:34Z

include/cudawrappers/cu.hpp

@@ -208,134 +230,10 @@ class Device : public Wrapper<CUdevice> {
  int getOrdinal() const { return _ordinal; }

 private:
+  std::shared_ptr<CUcontext> _context_manager;


Why a std::shared_ptr? A std::unique_ptr should be sufficient here, especially as Device will always maintain ownership. The only use case I see is when a Device instance is passed by copy instead of by reference, but I don't know if this is done in practice. In my view, a reference should be preferred.

csbnw · 2025-02-11

This was done for consistency with the Wrapper class, which also uses std::shared_ptr. I think that this is the 'safe' option, but if you insist I am also fine with changing it to std::unique_ptr.
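
The practical difference between the two choices is whether a Device can be copied. The sketch below uses hypothetical `SharedDevice` and `UniqueDevice` classes with an `int` placeholder for the context handle; it is not the cudawrappers code.

```cpp
#include <cassert>
#include <memory>
#include <type_traits>
#include <utility>

namespace sketch {

// Deleter that counts how often the (placeholder) context is released.
struct Release {
  int *counter;
  void operator()(int *ctx) const {
    ++*counter;
    delete ctx;
  }
};

// shared_ptr member: the class stays copyable, copies share one context.
class SharedDevice {
 public:
  explicit SharedDevice(int *released)
      : _context_manager(new int(0), Release{released}) {}

 private:
  std::shared_ptr<int> _context_manager;
};

// unique_ptr member: the class becomes move-only.
class UniqueDevice {
 public:
  explicit UniqueDevice(int *released)
      : _context_manager(new int(0), Release{released}) {}

 private:
  std::unique_ptr<int, Release> _context_manager;
};

static_assert(std::is_copy_constructible<SharedDevice>::value,
              "shared_ptr keeps the class copyable");
static_assert(!std::is_copy_constructible<UniqueDevice>::value,
              "unique_ptr disables copying");
static_assert(std::is_move_constructible<UniqueDevice>::value,
              "unique_ptr still allows moving");

}  // namespace sketch
```

With std::unique_ptr, any existing code that passes a Device by value would stop compiling, which may be the compatibility concern behind the 'safe' option.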

wvbbreu · 2025-02-10T12:48:23Z

include/cudawrappers/cu.hpp

+    checkCudaCall(cuDevicePrimaryCtxGetState(_obj, &flags, &active));
+    if (active) {
+      manager =
+          std::shared_ptr<CUdevice>(new CUdevice(_obj), [](CUdevice *ptr) {


Minor suggestion: use the shorthand std::make_shared (or std::make_unique, see my other comment) to omit the explicit template arguments.
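
One caveat: std::make_shared and std::make_unique have no overload that accepts a custom deleter, so code like the snippet above cannot use them directly. A small helper can still hide the template arguments. The `makeManaged` function below is a hypothetical sketch, not part of cudawrappers.

```cpp
#include <cassert>
#include <memory>
#include <utility>

namespace sketch {

// Hypothetical helper: builds a shared_ptr<T> whose deleter first calls
// `release` on the stored value and then frees the storage.
template <typename T, typename Release>
std::shared_ptr<T> makeManaged(T value, Release release) {
  return std::shared_ptr<T>(new T(std::move(value)), [release](T *ptr) {
    release(*ptr);
    delete ptr;
  });
}

}  // namespace sketch
```

The template arguments are deduced at the call site, for example `auto manager = sketch::makeManaged(handle, releaseFn);`, where `handle` and `releaseFn` are placeholders for the wrapped value and its release function.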

wvbbreu · 2025-02-10T12:50:06Z

tests/test_cu.cpp

+  thread.join();
+}
+
+TEST_CASE("Test DeviceMemory with Device::ctxSetCurrent", "[context]") {


Good test case, I think this one deals with the issue we were experiencing with dedisp.

csbnw requested review from john-romein and loostrum February 5, 2025 09:29

csbnw requested a review from matmanc February 6, 2025 07:51

matmanc reviewed Feb 6, 2025

csbnw added 11 commits February 6, 2025 16:07

Remove cu::Context class

cad3d6a

The Context Management is now done in the constructor of the cu::Device class.

Update CHANGELOG

f39d85a

Remove forward declaration of Context

f4a6c2b

Release primary context or destroy context upon Device destruction

0229590

Add Device::ctxSetCurrent

37edb29

To this end, even when the primary context is retained, the returned CUcontext object must be stored in the _context_manager.

Remove unneeded #if guards

2addcf5

Move Context retain and create inside constructor of manager

4b55c39

Add tests for context

65f81f7

Omitting Device::ctxSetCurrent doesn't throw with HIP

8b9674e

Remove unneeded #if guard in ctxSetCurrent

6e62efb

csbnw force-pushed the remove-context branch from cc859b3 to 6e62efb Compare February 6, 2025 15:08

[pre-commit.ci] auto fixes from pre-commit.com hooks

8d7ecbe

for more information, see https://pre-commit.ci

csbnw added 2 commits February 6, 2025 16:13

Update CHANGELOG

2edf1b0

Fix test_graph

fca9487

wvbbreu reviewed Feb 11, 2025

loostrum mentioned this pull request Feb 14, 2025

HIP mode always uses device zero #320

Open
