small change
snova-rodrigom committed Sep 27, 2024
1 parent 48965c8 commit 8ee8d83
Showing 1 changed file with 55 additions and 55 deletions.
110 changes: 55 additions & 55 deletions benchmarking/README.md
@@ -167,62 +167,62 @@ This option allows you to evaluate the performance of the selected LLM on synthe
- ```LLM requests across time```: This Gantt plot shows the TTFT and end-to-end latency of each request on a timeline. Expect latencies to be considerably greater than TTFTs, and expect as many bars to start at the same time as the number of concurrent requests specified. In addition, if the endpoint supports dynamic batching, bars may appear grouped according to the supported batch sizes.
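To make the reading of this plot concrete, here is a minimal Python sketch of how per-request timings could be turned into Gantt bar segments, and how one might count requests that start together. The `RequestTiming` record, its field names, and the `tol_s` bucketing tolerance are assumptions made for illustration; they are not taken from the benchmarking kit's code.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass


@dataclass
class RequestTiming:
    """Hypothetical per-request record; field names are illustrative only."""
    start_s: float    # when the request was sent, in seconds from run start
    ttft_s: float     # time to first token
    latency_s: float  # end-to-end latency, which includes the TTFT


def gantt_segments(t: RequestTiming) -> tuple[tuple[float, float], tuple[float, float]]:
    """Return (left edge, width) for the TTFT bar and for the rest of the request."""
    if t.latency_s < t.ttft_s:
        raise ValueError("end-to-end latency cannot be shorter than TTFT")
    ttft_bar = (t.start_s, t.ttft_s)
    rest_bar = (t.start_s + t.ttft_s, t.latency_s - t.ttft_s)
    return ttft_bar, rest_bar


def simultaneous_starts(timings: list[RequestTiming], tol_s: float = 0.05) -> Counter:
    """Count requests whose start times fall into the same tol_s-wide bucket."""
    return Counter(round(t.start_s / tol_s) for t in timings)


# Made-up timings: three requests sent together, one sent later.
sample = [
    RequestTiming(0.0, 0.5, 3.0),
    RequestTiming(0.01, 0.5, 3.0),
    RequestTiming(0.02, 0.5, 3.0),
    RequestTiming(3.0, 0.5, 3.0),
]
largest_group = max(simultaneous_starts(sample).values())
```

With the number of concurrent requests set to N, the largest group of simultaneous starts would be expected to be about N; a consistently smaller value may suggest the client or endpoint is throttling.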

5. Synthetic Performance common examples
1. Dynamic batching speed

Parameters:
- Model name: COE/Meta-Llama-3.1-70B-Instruct
- API type: SambaStudio
- Number of input tokens: 250
- Number of output tokens: 250
- Number of total requests: 80
- Number of concurrent requests: 20
- Timeout: 600

Results:

![dynamic_batching_speed-ttft](./imgs/dynamic_batching_speed-ttft.png)
![dynamic_batching_speed-latency](./imgs/dynamic_batching_speed-latency.png)
![dynamic_batching_speed-output_throughput](./imgs/dynamic_batching_speed-output_throughput.png)
![dynamic_batching_speed-total_output_throughput](./imgs/dynamic_batching_speed-total_output_throughput.png)
![dynamic_batching_speed-gantt](./imgs/dynamic_batching_speed-gantt.png)

2. High input and output tokens

Parameters:
- Model name: COE/Meta-Llama-3.1-70B-Instruct
- API type: SambaStudio
- Number of input tokens: 2000
- Number of output tokens: 2000
- Number of total requests: 80
- Number of concurrent requests: 25
- Timeout: 600

Results:

![high_input_output_tokens-ttft](./imgs/high_input_output_tokens-ttft.png)
![high_input_output_tokens-latency](./imgs/high_input_output_tokens-latency.png)
![high_input_output_tokens-output_throughput](./imgs/high_input_output_tokens-output_throughput.png)
![high_input_output_tokens-total_output_throughput](./imgs/high_input_output_tokens-total_output_throughput.png)
![high_input_output_tokens-gantt](./imgs/high_input_output_tokens-gantt.png)

3. Running multiple concurrent requests

Parameters:
- Model name: COE/Meta-Llama-3.1-70B-Instruct
- API type: SambaStudio
- Number of input tokens: 250
- Number of output tokens: 250
- Number of total requests: 1000
- Number of concurrent requests: 100
- Timeout: 600

Results:

![running_multiple_concurrent_requests-ttft](./imgs/running_multiple_concurrent_requests-ttft.png)
![running_multiple_concurrent_requests-latency](./imgs/running_multiple_concurrent_requests-latency.png)
![running_multiple_concurrent_requests-output_throughput](./imgs/running_multiple_concurrent_requests-output_throughput.png)
![running_multiple_concurrent_requests-total_output_throughput](./imgs/running_multiple_concurrent_requests-total_output_throughput.png)
![running_multiple_concurrent_requests-gantt](./imgs/running_multiple_concurrent_requests-gantt.png)
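For a quick comparison of the three parameter sets above, the following Python sketch records them and derives two rough quantities: the minimum number of fully concurrent rounds needed to send all requests, and the total number of output tokens requested. The `SyntheticRun` class and its methods are hypothetical helpers, not part of the benchmarking kit. The round count is an idealization that assumes every round is filled and completes together, and the timeout unit is not stated in the examples.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class SyntheticRun:
    """Hypothetical container for one example's parameters."""
    name: str
    input_tokens: int
    output_tokens: int
    total_requests: int
    concurrent_requests: int
    timeout: int = 600  # unit not stated in the examples

    def rounds(self) -> int:
        """Minimum number of rounds if each round runs fully concurrent."""
        return math.ceil(self.total_requests / self.concurrent_requests)

    def requested_output_tokens(self) -> int:
        """Output tokens requested across the whole run."""
        return self.total_requests * self.output_tokens


EXAMPLES = [
    SyntheticRun("Dynamic batching speed", 250, 250, 80, 20),
    SyntheticRun("High input and output tokens", 2000, 2000, 80, 25),
    SyntheticRun("Running multiple concurrent requests", 250, 250, 1000, 100),
]

for run in EXAMPLES:
    print(f"{run.name}: {run.rounds()} rounds, "
          f"{run.requested_output_tokens():,} output tokens requested")
```

Under these assumptions, the second example does not divide evenly (80 requests at 25 concurrent), so its last round would carry only 5 requests, which may show up as a shorter final group of bars in the Gantt plot.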

</details>
