Can I run Triton Inference Server using multiple MIG instances? #6468
tunahanertekin started this conversation in General
Replies: 2 comments
-
Any updates?
-
Hi @tunahanertekin, it is currently a limitation of MIG/CUDA that multiple MIG instances cannot be enumerated by the same process or container: https://docs.nvidia.com/datacenter/tesla/mig-user-guide/index.html#cuda-visible-devices.
The blog post achieves scaling across multiple MIG instances by assigning each container/pod exactly one instance; Triton is simply limited by CUDA functionality in this case.
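For illustration, here is a minimal sketch of that one-instance-per-pod pattern (not taken from the blog post). It assumes the NVIDIA device plugin runs with the mixed MIG strategy, so 1g.10gb slices are exposed as the extended resource nvidia.com/mig-1g.10gb; the image tag, names, and the model-repository volume (omitted) are placeholders:

```yaml
# Sketch: scale Triton across MIG slices by giving each pod exactly one slice.
# Assumes the device plugin's "mixed" MIG strategy exposes nvidia.com/mig-1g.10gb.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: triton-mig
spec:
  replicas: 4                          # one pod per 1g.10gb MIG slice
  selector:
    matchLabels:
      app: triton-mig
  template:
    metadata:
      labels:
        app: triton-mig
    spec:
      containers:
      - name: triton
        image: nvcr.io/nvidia/tritonserver:23.10-py3   # placeholder tag
        args: ["tritonserver", "--model-repository=/models"]
        resources:
          limits:
            nvidia.com/mig-1g.10gb: 1  # exactly one MIG instance per pod
```

A Service (or the inference gateway from the manual being followed) in front of these replicas then spreads requests across all four MIG instances, which is how scaling is achieved when a single Triton process cannot see more than one instance.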
- Original question:
Hi,
When I started Triton Inference Server with Docker on a machine that has 4 V100 GPUs, it distributed the load across the GPUs (I passed the device IDs as --gpus '"device=0,1,2"').

Then I aimed to do the same thing with multiple MIG instances on an A100 instance. I used this manual to deploy Triton Inference Server to my Kubernetes cluster. I have an A100 80GB GPU with 7 MIG instances of type 1g.10gb. However, Triton Inference Server does not seem to distribute the load after I assigned 4 MIG instances (1g.10gb, instead of 1) to the container; it only uses the GPU with ID 0. Is there a way to use multiple MIG instances with Triton Inference Server? Any kind of help is appreciated.
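For context, the setup described above presumably corresponds to a single pod requesting several MIG slices, roughly along these lines (a hypothetical reconstruction; the resource name assumes the mixed MIG strategy and the image tag is a placeholder). Because CUDA cannot enumerate more than one MIG instance in the same process, Triton in such a pod uses only the first slice:

```yaml
# Illustrative only: one pod requesting four 1g.10gb slices.
# Triton in this pod still uses a single slice, since CUDA cannot
# enumerate more than one MIG instance in the same process.
apiVersion: v1
kind: Pod
metadata:
  name: triton-four-slices
spec:
  containers:
  - name: triton
    image: nvcr.io/nvidia/tritonserver:23.10-py3   # placeholder tag
    args: ["tritonserver", "--model-repository=/models"]
    resources:
      limits:
        nvidia.com/mig-1g.10gb: 4   # four slices allocated, only one usable
```

Splitting this into four single-slice pods, as in the reply above, is the working pattern.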