Replies: 1 comment 2 replies
-
note that The throughput per individual request is 6-7 tokens/s. |
Beta Was this translation helpful? Give feedback.
2 replies
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
-
I deployed the bf16 DeepSeek-R1 model using 32 A800 GPUs with the following command:

However, the token usage seems consistently low (around 0.02) with a throughput of approximately 18 tokens/s. The GPU utilization remains at 100%, which suggests not just partial parameter activation per token.
Could you advise if there are specific hyperparameters I should configure to address this issue?
Beta Was this translation helpful? Give feedback.
All reactions