Low RPS of SeamlessM4T models during load test #468
Unanswered
gupta9ankit5 asked this question in Q&A
Replies: 0 comments
Has anyone conducted a load test on Seamless models for the S2TT task?
The models are deployed in a Docker container as a Flask application served by Gunicorn, on an AWS EC2 r5.8xlarge instance.
The resource constraints are configured as follows:
- CPU: Minimum of 8 vCPUs, maximum of 16 vCPUs.
- Memory: Minimum of 8 GiB, maximum of 60 GiB.
Locust is used for load testing. My input consists of audio files, each 1 to 5 seconds long.
Throughout the load test, requests per second (RPS) stays below 2.
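For context on what an RPS below 2 implies, here is a rough back-of-the-envelope sketch. It assumes Gunicorn is running synchronous workers that each handle one request at a time, in which case throughput cannot exceed the worker count divided by the mean time per request. The worker counts below are hypothetical, since the post does not state how many workers are configured.

```python
def max_rps(workers: int, seconds_per_request: float) -> float:
    """Upper bound on throughput when each worker serves one request at a time."""
    if workers <= 0 or seconds_per_request <= 0:
        raise ValueError("workers and seconds_per_request must be positive")
    return workers / seconds_per_request


def implied_latency(workers: int, observed_rps: float) -> float:
    """Mean seconds per request implied by an observed RPS at full saturation."""
    if workers <= 0 or observed_rps <= 0:
        raise ValueError("workers and observed_rps must be positive")
    return workers / observed_rps


if __name__ == "__main__":
    observed_rps = 2.0  # ceiling reported in the post
    # Hypothetical worker counts; the actual Gunicorn setting is not given.
    for workers in (1, 2, 4, 8):
        latency = implied_latency(workers, observed_rps)
        print(f"{workers} worker(s) at {observed_rps} RPS -> ~{latency:.2f} s per request")
```

Under that assumption, comparing the implied per-request time with the latency of a single isolated request would indicate whether the ceiling comes from the worker count or from model inference itself.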
Can someone provide any assistance or share any benchmark load test performance data for this setup?
Here is the Locust script. The audio files are base64-encoded and sent in a JSON request.
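As a minimal sketch of just the encoding step described above (this is not the original script, and the field names `audio` and `model_type` are assumptions made for illustration), the request body could be built with the standard library like this:

```python
import base64
import json


def build_payload(audio_bytes: bytes, model_type: str = "seamless") -> str:
    """Base64-encode raw audio bytes and wrap them in a JSON request body.

    The keys "audio" and "model_type" are hypothetical, not taken from the post.
    """
    encoded = base64.b64encode(audio_bytes).decode("ascii")
    return json.dumps({"audio": encoded, "model_type": model_type})


if __name__ == "__main__":
    # Stand-in bytes instead of a real audio file.
    fake_audio = b"\x00\x01" * 8000
    body = build_payload(fake_audio)
    print(f"raw: {len(fake_audio)} bytes, JSON body: {len(body)} characters")
```

Base64 inflates the payload by roughly a third, which is usually small next to inference time for clips of a few seconds, but it is worth keeping in mind when reading the load test numbers.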
And this is the code that handles the request: it decodes the audio, transcribes it with the model selected by MODEL_TYPE (seamless here), and sends back the response.
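The flow described above (decode, pick a model by `MODEL_TYPE`, respond) can be sketched without Flask or the model itself. This is an illustration under stated assumptions, not the original handler: the `audio` and `text` keys, the `TRANSCRIBERS` registry, and `transcribe_placeholder` are all hypothetical, and the placeholder stands in for the real model call.

```python
import base64
import json
from typing import Callable, Dict

MODEL_TYPE = "seamless"  # selected model, as in the post


def transcribe_placeholder(audio_bytes: bytes) -> str:
    """Hypothetical stand-in for the real model inference call."""
    return f"<{len(audio_bytes)} bytes of audio>"


# Hypothetical registry mapping a model type to its transcription function.
TRANSCRIBERS: Dict[str, Callable[[bytes], str]] = {
    "seamless": transcribe_placeholder,
}


def handle_request(body: str, model_type: str = MODEL_TYPE) -> str:
    """Decode a JSON body carrying base64 audio and return a JSON response."""
    data = json.loads(body)
    audio_bytes = base64.b64decode(data["audio"])  # "audio" key is an assumption
    transcriber = TRANSCRIBERS.get(model_type)
    if transcriber is None:
        raise ValueError(f"unknown model type: {model_type}")
    return json.dumps({"text": transcriber(audio_bytes)})


if __name__ == "__main__":
    request_body = json.dumps({"audio": base64.b64encode(b"abc").decode("ascii")})
    print(handle_request(request_body))
```

Keeping the decode step separate from the model call like this makes it straightforward to time each part on its own and see where the per-request time goes.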