small change

snova-rodrigom · Sep 27, 2024 · 8ee8d83
1 parent 48965c8
Showing 1 changed file with 55 additions and 55 deletions.
diff --git a/benchmarking/README.md b/benchmarking/README.md
@@ -167,62 +167,62 @@ This option allows you to evaluate the performance of the selected LLM on synthe
     - ```LLM requests across time```: This gantt plot shows the duration of the TTFT and end-to-end latency per request in a timeline. One should expect latencies considerably greater than TTFTs, and multiple bars starting at the same time as number of concurrent requests specified. In addition, if the endpoint allows dynamic batching, one could see grouped bars according to the batch sizes supported.
 
 5. Synthetic Performance common examples
-  1. Dynamic batching speed  
-
-      Parameters:
-      - Model name: COE/Meta-Llama-3.1-70B-Instruct
-      - API type: SambaStudio
-      - Number of input tokens: 250
-      - Number of output tokens: 250
-      - Number of total requests: 80
-      - Number of concurrent requests: 20
-      - Timeout: 600
-
-      Results:
-
-      ![dynamic_batching_speed-ttft](./imgs/dynamic_batching_speed-ttft.png)
-      ![dynamic_batching_speed-latency](./imgs/dynamic_batching_speed-latency.png)
-      ![dynamic_batching_speed-output_throughput](./imgs/dynamic_batching_speed-output_throughput.png)
-      ![dynamic_batching_speed-total_output_throughput](./imgs/dynamic_batching_speed-total_output_throughput.png)
-      ![dynamic_batching_speed-gantt](./imgs/dynamic_batching_speed-gantt.png)
-
-  2. High input and ouput tokens
-
-      Parameters:
-      - Model name: COE/Meta-Llama-3.1-70B-Instruct
-      - API type: SambaStudio
-      - Number of input tokens: 2000
-      - Number of output tokens: 2000
-      - Number of total requests: 80
-      - Number of concurrent requests: 25
-      - Timeout: 600
-
-      Results:
-
-      ![high_input_output_tokens-ttft](./imgs/high_input_output_tokens-ttft.png)
-      ![high_input_output_tokens-latency](./imgs/high_input_output_tokens-latency.png)
-      ![high_input_output_tokens-output_throughput](./imgs/high_input_output_tokens-output_throughput.png)
-      ![high_input_output_tokens-total_output_throughput](./imgs/high_input_output_tokens-total_output_throughput.png)
-      ![high_input_output_tokens-gantt](./imgs/high_input_output_tokens-gantt.png)
-
-  3. Running multiple concurrent requests
-
-      Parameters:
-      - Model name: COE/Meta-Llama-3.1-70B-Instruct
-      - API type: SambaStudio
-      - Number of input tokens: 250
-      - Number of output tokens: 250
-      - Number of total requests: 1000
-      - Number of concurrent requests: 100
-      - Timeout: 600
-
-      Results:
+    1. Dynamic batching speed  
+
+        Parameters:
+        - Model name: COE/Meta-Llama-3.1-70B-Instruct
+        - API type: SambaStudio
+        - Number of input tokens: 250
+        - Number of output tokens: 250
+        - Number of total requests: 80
+        - Number of concurrent requests: 20
+        - Timeout: 600
+
+        Results:
+
+        ![dynamic_batching_speed-ttft](./imgs/dynamic_batching_speed-ttft.png)
+        ![dynamic_batching_speed-latency](./imgs/dynamic_batching_speed-latency.png)
+        ![dynamic_batching_speed-output_throughput](./imgs/dynamic_batching_speed-output_throughput.png)
+        ![dynamic_batching_speed-total_output_throughput](./imgs/dynamic_batching_speed-total_output_throughput.png)
+        ![dynamic_batching_speed-gantt](./imgs/dynamic_batching_speed-gantt.png)
 
-      ![running_multiple_concurrent_requests-ttft](./imgs/running_multiple_concurrent_requests-ttft.png)
-      ![running_multiple_concurrent_requests-latency](./imgs/running_multiple_concurrent_requests-latency.png)
-      ![running_multiple_concurrent_requests-output_throughput](./imgs/running_multiple_concurrent_requests-output_throughput.png)
-      ![running_multiple_concurrent_requests-total_output_throughput](./imgs/running_multiple_concurrent_requests-total_output_throughput.png)
-      ![running_multiple_concurrent_requests-gantt](./imgs/running_multiple_concurrent_requests-gantt.png)
+    2. High input and ouput tokens
+
+        Parameters:
+        - Model name: COE/Meta-Llama-3.1-70B-Instruct
+        - API type: SambaStudio
+        - Number of input tokens: 2000
+        - Number of output tokens: 2000
+        - Number of total requests: 80
+        - Number of concurrent requests: 25
+        - Timeout: 600
+
+        Results:
+
+        ![high_input_output_tokens-ttft](./imgs/high_input_output_tokens-ttft.png)
+        ![high_input_output_tokens-latency](./imgs/high_input_output_tokens-latency.png)
+        ![high_input_output_tokens-output_throughput](./imgs/high_input_output_tokens-output_throughput.png)
+        ![high_input_output_tokens-total_output_throughput](./imgs/high_input_output_tokens-total_output_throughput.png)
+        ![high_input_output_tokens-gantt](./imgs/high_input_output_tokens-gantt.png)
+
+    3. Running multiple concurrent requests
+
+        Parameters:
+        - Model name: COE/Meta-Llama-3.1-70B-Instruct
+        - API type: SambaStudio
+        - Number of input tokens: 250
+        - Number of output tokens: 250
+        - Number of total requests: 1000
+        - Number of concurrent requests: 100
+        - Timeout: 600
+
+        Results:
+
+        ![running_multiple_concurrent_requests-ttft](./imgs/running_multiple_concurrent_requests-ttft.png)
+        ![running_multiple_concurrent_requests-latency](./imgs/running_multiple_concurrent_requests-latency.png)
+        ![running_multiple_concurrent_requests-output_throughput](./imgs/running_multiple_concurrent_requests-output_throughput.png)
+        ![running_multiple_concurrent_requests-total_output_throughput](./imgs/running_multiple_concurrent_requests-total_output_throughput.png)
+        ![running_multiple_concurrent_requests-gantt](./imgs/running_multiple_concurrent_requests-gantt.png)
 
   </details>
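
Reviewer note: the three parameter sets that this diff re-indents can be compared side by side with a small sketch. It is not part of the commit or of the benchmarking kit. The `Scenario` dataclass, its field names, and both helper methods are hypothetical, and the "waves" figure assumes requests are dispatched in strict groups of the concurrency size, which the real tool may not do.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Scenario:
    """Hypothetical container for one example's parameters from the README."""
    name: str
    input_tokens: int
    output_tokens: int
    total_requests: int
    concurrent_requests: int
    timeout: int = 600  # unit is not stated in the README

    def waves(self) -> int:
        # Assumption: requests run in strict groups of `concurrent_requests`.
        return math.ceil(self.total_requests / self.concurrent_requests)

    def requested_output_tokens(self) -> int:
        # Output tokens requested across all requests, not tokens actually generated.
        return self.output_tokens * self.total_requests


# Values copied from the three examples in the diff.
SCENARIOS = [
    Scenario("Dynamic batching speed", 250, 250, 80, 20),
    Scenario("High input and output tokens", 2000, 2000, 80, 25),
    Scenario("Running multiple concurrent requests", 250, 250, 1000, 100),
]

if __name__ == "__main__":
    for s in SCENARIOS:
        print(f"{s.name}: {s.waves()} waves, "
              f"{s.requested_output_tokens()} requested output tokens")
```

Under that strict-grouping assumption, the first two examples need the same number of waves even though their concurrency differs, because 80 requests at 25 concurrent leaves a partial final group.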