Existing Scaling Benchmarks and Model Grouping Pattern #6831

nastrofaction · 2024-01-25T13:54:47Z

We have 100+ models and are trying to work out the best way to deploy them with Triton. We plan to run performance tests on latency and resource utilization using the perf tool, and then derive a pattern for grouping models together. Before we do that, I wanted to ask whether any existing stress benchmarks are available for Triton. I am looking for answers to questions like:
a) The maximum number of model endpoints that are supported by a single container while maintaining relative performance
b) The maximum resource allotment supported by Triton

Right now, this is for CPU-only cases. If anyone has other recommendations on how we should approach the model grouping, please share.
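
To make the grouping question concrete, here is a rough sketch of one way it could be framed once per-model measurements exist: treat it as bin packing and assign models to containers with a first-fit-decreasing heuristic. This is only an illustration, not something Triton provides. The names `Container` and `group_models`, the per-model CPU and memory figures, and the container capacities are all hypothetical placeholders for whatever the perf tests actually report.

```python
from dataclasses import dataclass, field


@dataclass
class Container:
    """A hypothetical serving container with a fixed CPU/memory budget."""
    cpu_capacity: float
    mem_capacity: float
    models: list = field(default_factory=list)
    cpu_used: float = 0.0
    mem_used: float = 0.0

    def fits(self, cpu: float, mem: float) -> bool:
        return (self.cpu_used + cpu <= self.cpu_capacity
                and self.mem_used + mem <= self.mem_capacity)

    def add(self, name: str, cpu: float, mem: float) -> None:
        self.models.append(name)
        self.cpu_used += cpu
        self.mem_used += mem


def group_models(profiles, cpu_capacity, mem_capacity):
    """First-fit-decreasing grouping.

    profiles maps model name -> (cpu_cores, mem_gb), both assumed to come
    from your own measurements under the target load.
    """
    # Place the most demanding models first (largest share of either budget).
    ordered = sorted(
        profiles.items(),
        key=lambda kv: max(kv[1][0] / cpu_capacity, kv[1][1] / mem_capacity),
        reverse=True,
    )
    containers = []
    for name, (cpu, mem) in ordered:
        if cpu > cpu_capacity or mem > mem_capacity:
            raise ValueError(f"{name} does not fit in a single container")
        for container in containers:
            if container.fits(cpu, mem):
                container.add(name, cpu, mem)
                break
        else:
            # No existing container has room, so open a new one.
            container = Container(cpu_capacity, mem_capacity)
            container.add(name, cpu, mem)
            containers.append(container)
    return containers


# Made-up numbers purely for illustration.
example = {"a": (2, 4), "b": (1, 2), "c": (3, 1), "d": (1, 1)}
for i, c in enumerate(group_models(example, cpu_capacity=4, mem_capacity=8)):
    print(i, c.models, c.cpu_used, c.mem_used)
```

A heuristic like this only accounts for steady-state resource totals. It does not capture contention between co-located models or latency targets, which is why benchmark data on co-hosting many models in one container would still be useful.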


Replies: 0 comments
