Enable Cuda in Graphics Implementation for TensorRT backend #100
Conversation
@nv-kmcgill53 Please review this as discussed
CMakeLists.txt
Outdated
@@ -269,6 +269,7 @@ target_link_libraries(
  triton-tensorrt-backend
  PRIVATE
    CUDA::cudart
    CUDA::cuda_driver
@nv-kmcgill53, @mc-nv, @tanmayv25 - any issues with this dependency?
What is the context behind adding this dependency?
From the documentation I see:
CUDA Driver Library
The CUDA Driver library (cuda) is used by applications that use calls such as cuMemAlloc and cuMemFree.
Targets Created:
CUDA::cuda_driver
Isn't this dependency a requisite of TensorRT itself?
I thought that by default our product expects the driver to be installed and, if GPU capability is given, it is available for usage, including driver targets and binaries.
Adding this dependency should be fine. Ashish is linking correctly according to the CUDA documentation. As it states:
Context management can be done through the driver API, but is not exposed in the runtime API
So they will need to link against the driver instead of just linking against the cuda runtime.
So they will need to link against the driver instead of just linking against the cuda runtime.
I don't agree with this statement; the current linkage doesn't explain why the user would want to add it explicitly.
The CMake documentation isn't exhaustive when it mentions cuMemAlloc and cuMemFree. The user in this case is using the Driver API to set/pass the cuda context around in the backend, rather than letting the core take care of this. This is the reason for adding the CUDA::cuda_driver lib to the linking path. This PR necessarily makes use of functions in the driver where the trt_backend didn't before.
The Triton TensorRT backend is unable to work without CUDA, Triton Inference Server, and a TensorRT installation.
The current change, per my understanding, uses only cudaSetDevice (CUDA::cudart) and cudaGetErrorString (CUDA runtime API), and those dependencies are satisfied. That's why I don't see any reason to link against CUDA::cuda_driver.
The additional dependency is required for cuCtxPushCurrent() / cuCtxPopCurrent() @ https://github.com/triton-inference-server/tensorrt_backend/pull/100/files#diff-3137e95db7c97b9ddd71fa1b600e3dd646025c5a872ad94d4d09f04313fe3fcdR66
The reason we need this dependency is that we are using a special context called the CUDA in Graphics (CiG) context, which has to work with the CUDA driver DLL for its operations.
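To illustrate why the driver library is needed, here is a minimal sketch of the kind of driver-API usage involved (function and variable names are illustrative, not the PR's exact code): the externally created CiG context is pushed onto the calling thread before inference work and popped afterwards, and cuCtxPushCurrent / cuCtxPopCurrent are only provided through CUDA::cuda_driver.

#include <cuda.h>

// Make the shared CiG context current for the duration of the inference call.
// Error handling is omitted for brevity.
void RunWithSharedContext(CUcontext cig_ctx)
{
  cuCtxPushCurrent(cig_ctx);   // make the application's context current on this thread
  // ... enqueue TensorRT inference work here ...
  CUcontext previous = nullptr;
  cuCtxPopCurrent(&previous);  // restore whatever context was current before
}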
@ashishk98 - can you install and run pre-commit checks locally?
@nnshah1 fixed pre-commit
src/tensorrt_model.h
Outdated
//! for gaming use cases. Creating a shared context reduces context-switching
//! overhead and leads to better performance of model execution alongside the
//! graphics workload.
CUcontext GetCiGContext() { return cig_ctx_; }
@ashishk98 question: is this specific to CiG, or could it be applied to any application-provided cuda context?
CMakeLists.txt
Outdated
target_compile_definitions(
  triton-tensorrt-backend
  PRIVATE TRITON_ENABLE_CIG
)
target_link_libraries(
  triton-tensorrt-backend
  PRIVATE
    CUDA::cuda_driver
)
These settings could be achieved with a generator expression, couldn't they?
What is a generator expression?
src/tensorrt_model.cc
Outdated
triton::common::TritonJson::Value value;
std::string ptr_value;
if (parameters.Find("CIG_CONTEXT_PTR", &value)) {
  RETURN_IF_ERROR(value.MemberAsString("string_value", &ptr_value));
@ashishk98 instead of directly converting here as a special case, I would prefer to use something similar to what is done in the trt-llm backend:
In that case there is a template method to convert from a parameter to a value - I think the code will be a little clearer to follow.
Also - can we convert to and from a 64-bit integer? So something like:
model_state->GetParameter<uint64>("CUDA_CONTEXT");
It also strikes me that although we use value.MemberAsString(), we could use value.MemberAsUint("string_value", &ptr_value) instead.
So three things: 1) add a templated GetParameter() method, 2) use MemberAsUint for the uint64 template, and 3) officially transfer uint64 values and convert them to and from the context.
I have added a GetParameter call for std::string instead of UINT64. This is because when we add the parameter to the model config it is directly converted into a hex string rather than a numeric string, so MemberAsUint fails while parsing the pointer because it receives a hex string.
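For reference, a minimal sketch of how such a hex-string parameter could be converted back into a context handle (the helper name and the example value are assumptions for illustration, not the PR's actual code):

#include <cuda.h>
#include <cstdint>
#include <string>

// Parse a pointer serialized as a hex string (e.g. "0x7f3a2c000000") back
// into a CUcontext handle. Base 16 accepts the value with or without the
// "0x" prefix, which is why a decimal parser such as MemberAsUint fails here.
CUcontext
ContextFromHexString(const std::string& ptr_value)
{
  uint64_t raw = std::stoull(ptr_value, nullptr, 16);
  return reinterpret_cast<CUcontext>(raw);
}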
src/model_state.cc
Outdated
  .c_str());
#ifdef TRITON_ENABLE_CIG
  // Set device if CiG is disabled
  if (!isCiGEnabled())
@tanmayv25, @ashishk98 - is there a way to have a single scoped object, ScopedCudaDeviceContext, that internally checks whether there is an application_context and, if there is, uses push / pop - and if not, uses cudaSetDevice?
We don't currently use them in the same locations - but I am wondering if that would be possible. I think it would be cleaner logically - basically an 'application_context' takes the place of the 'device', but otherwise the logic remains the same; see the sketch below.
ScopedObject(Device);
ScopedObject(Context);
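A minimal sketch of what such a scoped helper could look like (the class and member names are assumed for illustration, not the PR's actual implementation): push the application-provided context if one exists, otherwise fall back to cudaSetDevice, and undo the push when the scope ends.

#include <cuda.h>
#include <cuda_runtime_api.h>

class ScopedCudaDeviceContext {
 public:
  ScopedCudaDeviceContext(CUcontext app_ctx, int device_id) : app_ctx_(app_ctx)
  {
    // Error handling omitted for brevity.
    if (app_ctx_ != nullptr) {
      cuCtxPushCurrent(app_ctx_);  // use the shared application context
    } else {
      cudaSetDevice(device_id);    // default path: select the configured device
    }
  }
  ~ScopedCudaDeviceContext()
  {
    if (app_ctx_ != nullptr) {
      CUcontext previous = nullptr;
      cuCtxPopCurrent(&previous);  // restore the previously current context
    }
  }

 private:
  CUcontext app_ctx_;
};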
We can take a look at this in the next iteration
src/model_state.cc
Outdated
@@ -175,7 +175,13 @@ ModelState::ModelState(TRITONBACKEND_Model* triton_model)
ModelState::~ModelState()
{
  for (auto& device_engine : device_engines_) {
    cudaSetDevice(device_engine.first.first);
#ifdef TRITON_ENABLE_CIG
I know I had asked for looking at macros to enable this, but I would like to avoid this kind of guard - if we can use a single method and then have two different implementations of that method / object, I would prefer that to having the macros embedded in the functions / methods.
Fixed
I would like to see:
  if (!model_state_->isCiGEnabled())
#endif  // TRITON_ENABLE_CIG
  {
    cudaSetDevice(DeviceId());
Do you mind sharing the reasoning for avoiding the set-device calls? Wouldn't that cause the issue of the model not being placed / executed on the selected device (based on the model config)?
- The intended use of cuda context sharing targets only single-GPU (RTX end-user) systems. I wanted to avoid complications beyond this use case.
- When we call cudaSetDevice(), the cuda runtime resets the thread to using the default cuda context.
How will model instances on multiple GPUs be handled? AFAIK a cuda context is per device. If we have more than one GPU device, then we should pass a cuda context handle for each GPU device. Or am I missing something here?
@ashishk98 I believe we still need to raise an error if someone tries to use a pre-built cuda context in a multi-GPU environment, right?
Can the stakeholders provide another round of reviews on this PR? We'd like to get these changes into a release asset this week.
I still don't see a guard preventing the use of a pre-built cuda context on a multi-GPU system. We are skipping the cudaSetDevice() call when using the CiG context, and since we are passing only a single CiG context, all the model instances will hit the same GPU device. We need to add a check for Triton loading a model instance on a different device and throw a meaningful error.
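A sketch of the kind of guard being requested (the function and parameter names here are assumptions, not the PR's final code): refuse to load a model instance on any device other than the default one whenever a shared CUDA context has been supplied, since a CUcontext is bound to a single device.

#include "triton/core/tritonserver.h"

// Return an error if context sharing is enabled but the instance targets a
// non-default GPU; otherwise return nullptr (no error).
TRITONSERVER_Error*
CheckContextSharingDevice(bool context_sharing_enabled, int device_id)
{
  if (context_sharing_enabled && device_id != 0) {
    return TRITONSERVER_ErrorNew(
        TRITONSERVER_ERROR_INVALID_ARG,
        "CUDA context sharing is only supported on single-GPU systems; "
        "refusing to load a model instance on a non-default device");
  }
  return nullptr;
}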
@tanmayv25 please review the latest commit
Some minor fixes, otherwise LGTM.
Resolved minor fixes as well
@nvda-mesharma did you verify these changes in CI? The pipeline I launched yesterday seems to be failing with this error:
https://gitlab-master.nvidia.com/dl/dgx/tritonserver/-/jobs/137650575 @ashishk98 for viz.
* Enable CiG support in Tensorrt backend
* Creating scoped runtime context structure for better management of CiG context
* instance_state null check
* Minor bug fix
* pre-commit fixes
* Added new cmake flag TRITON_ENABLE_CIG to make the CiG support build conditional
* pre-commit fixes
* CiG->Cuda. Making the changes more generic to cuda context sharing + hidden ifdefs
* remove todo
* Add GetParameter to fetch string params
* Cig->Cuda + comment updates
* Use RETURN_IF_ERROR macro
* Handle Multi-GPU failure case
* typo + styling fixes

Co-authored-by: Ashish Karale <akarale@nvidia.com>
Add CUDA context sharing support for the TensorRT backend to reduce context-switching overhead when a graphics workload is running in parallel.