Can I run Triton Inference Server using multiple MIG instances? #6468
tunahanertekin started this conversation in General
Replies: 2 comments
-
Any updates?
-
Hi @tunahanertekin, it is currently a limitation of MIG/CUDA that multiple MIG instances cannot be enumerated by the same process or container: https://docs.nvidia.com/datacenter/tesla/mig-user-guide/index.html#cuda-visible-devices.
The blog post achieves scaling across multiple MIG instances by assigning each container/pod exactly one instance; Triton is simply limited by CUDA functionality in this case.
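For illustration, here is a minimal sketch of that one-instance-per-pod pattern (not taken from the blog post). It assumes the NVIDIA device plugin runs with the mixed MIG strategy, so 1g.10gb slices are exposed as the extended resource nvidia.com/mig-1g.10gb; the image tag, names, and the model-repository volume (omitted) are placeholders:

```yaml
# Sketch: scale Triton across MIG slices by giving each pod exactly one slice.
# Assumes the device plugin's "mixed" MIG strategy exposes nvidia.com/mig-1g.10gb.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: triton-mig
spec:
  replicas: 4                          # one pod per 1g.10gb MIG slice
  selector:
    matchLabels:
      app: triton-mig
  template:
    metadata:
      labels:
        app: triton-mig
    spec:
      containers:
      - name: triton
        image: nvcr.io/nvidia/tritonserver:23.10-py3   # placeholder tag
        args: ["tritonserver", "--model-repository=/models"]
        resources:
          limits:
            nvidia.com/mig-1g.10gb: 1  # exactly one MIG instance per pod
```

A Service (or the inference gateway from the manual being followed) in front of these replicas then spreads requests across all four MIG instances, which is how scaling is achieved when a single Triton process cannot see more than one instance.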
- Original question:
Hi,
When I started Triton Inference Server with Docker on a machine that has 4 V100 GPUs, it distributed the load across the GPUs (I passed the device IDs as --gpus '"device=0,1,2"').

Then I aimed to do the same thing with multiple MIG instances on an A100 instance. I used this manual to deploy Triton Inference Server to my Kubernetes cluster. I have an A100 80GB GPU with 7 MIG instances of type 1g.10gb. However, Triton Inference Server does not seem to distribute the load after I assigned 4 MIG instances (1g.10gb, instead of 1) to the container; it only uses the GPU with ID 0. Is there a way to use multiple MIG instances with Triton Inference Server? Any kind of help is appreciated.
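For context, the setup described above presumably corresponds to a single pod requesting several MIG slices, roughly along these lines (a hypothetical reconstruction; the resource name assumes the mixed MIG strategy and the image tag is a placeholder). Because CUDA cannot enumerate more than one MIG instance in the same process, Triton in such a pod uses only the first slice:

```yaml
# Illustrative only: one pod requesting four 1g.10gb slices.
# Triton in this pod still uses a single slice, since CUDA cannot
# enumerate more than one MIG instance in the same process.
apiVersion: v1
kind: Pod
metadata:
  name: triton-four-slices
spec:
  containers:
  - name: triton
    image: nvcr.io/nvidia/tritonserver:23.10-py3   # placeholder tag
    args: ["tritonserver", "--model-repository=/models"]
    resources:
      limits:
        nvidia.com/mig-1g.10gb: 4   # four slices allocated, only one usable
```

Splitting this into four single-slice pods, as in the reply above, is the working pattern.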