Existing Scaling Benchmarks and Model Grouping Pattern #6831
Unanswered
nastrofaction asked this question in Q&A
Replies: 0 comments
We have 100+ models and are trying to figure out the best deployment model for them using Triton. We are going to run performance tests on latency and resource utilization using the perf tool and then try to come up with a pattern for grouping models together. Before doing that, I wanted to understand whether any existing stress benchmarks are available for Triton. I am looking for answers to questions like:
a) The maximum number of model endpoints that are supported by a single container while maintaining relative performance
b) The maximum resource allotment supported by Triton
Right now, this is for CPU-only cases. If anyone has other recommendations on how we should approach the model grouping, please share.
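To make the grouping question concrete, here is a rough sketch of the kind of pattern we have in mind. It assumes each model already has a measured CPU and memory footprint from the perf tests, and it packs models into containers with a simple first-fit-decreasing heuristic. Everything in it (`ModelProfile`, `Container`, `group_models`, the capacity numbers) is a hypothetical name or value chosen for illustration, not anything from Triton itself.

```python
from dataclasses import dataclass, field


@dataclass
class ModelProfile:
    """Hypothetical per-model measurements taken from our own perf tests."""
    name: str
    cpu_cores: float   # assumed: peak CPU cores used under target load
    memory_gb: float   # assumed: resident memory with the model loaded


@dataclass
class Container:
    """One server container with a fixed CPU and memory budget."""
    cpu_capacity: float
    memory_capacity: float
    models: list = field(default_factory=list)
    cpu_used: float = 0.0
    memory_used: float = 0.0

    def fits(self, model: ModelProfile) -> bool:
        return (self.cpu_used + model.cpu_cores <= self.cpu_capacity
                and self.memory_used + model.memory_gb <= self.memory_capacity)

    def add(self, model: ModelProfile) -> None:
        self.models.append(model.name)
        self.cpu_used += model.cpu_cores
        self.memory_used += model.memory_gb


def group_models(profiles, cpu_capacity, memory_capacity):
    """First-fit decreasing: place the most CPU-hungry models first,
    each into the first container that still has room for it."""
    containers = []
    for model in sorted(profiles, key=lambda p: p.cpu_cores, reverse=True):
        if model.cpu_cores > cpu_capacity or model.memory_gb > memory_capacity:
            raise ValueError(f"{model.name} does not fit in an empty container")
        target = next((c for c in containers if c.fits(model)), None)
        if target is None:
            target = Container(cpu_capacity, memory_capacity)
            containers.append(target)
        target.add(model)
    return containers


if __name__ == "__main__":
    # Made-up numbers, only to show the shape of the input.
    profiles = [
        ModelProfile("a", 3.0, 2.0),
        ModelProfile("b", 2.0, 2.0),
        ModelProfile("c", 2.0, 1.0),
        ModelProfile("d", 1.0, 1.0),
    ]
    for i, c in enumerate(group_models(profiles, 4.0, 8.0)):
        print(i, c.models, c.cpu_used, c.memory_used)
```

This only accounts for static footprints. It ignores contention between co-located models, which is exactly the part we were hoping existing benchmarks could tell us about.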