Corrected latency statistics when batchsize is greater than 1 #336

wgzintel · 2024-04-01T15:21:08Z

Example log output with batch size = 2:
[ INFO ] [1] Batch_size=2, all input token size after padding: 32 * 2, all max_output_token_size: 128 * 2
[ INFO ] [1] Input token size: 64, Output size: 256, Infer count: 128, Tokenization Time: 0.70ms, Detokenization Time: 0.64ms, Generation Time: 38.15s, Latency: 298.04 ms/2tokens
[ INFO ] [1] First token latency: 540.01 ms/2tokens, other tokens latency: 296.11 ms/2tokens, len of tokens: 128 * 2
[ INFO ] [1] First infer latency: 538.62 ms/infer, other infers latency: 294.77 ms/infer, inference count: 128
[ INFO ] [1] Result MD5:['251ac2d9f383877765d02205c9ab9e39', '251ac2d9f383877765d02205c9ab9e39']
[ INFO ] [1] Batch_size=2, all input token size after padding: 1024 * 2, all max_output_token_size: 128 * 2
[ INFO ] [1] Input token size: 2048, Output size: 24, Infer count: 128, Tokenization Time: 4.74ms, Detokenization Time: 3.35ms, Generation Time: 17.36s, Latency: 1446.61 ms/2tokens
[ INFO ] [1] First token latency: 12741.76 ms/2tokens, other tokens latency: 419.68 ms/2tokens, len of tokens: 12 * 2
[ INFO ] [1] First infer latency: 12740.33 ms/infer, other infers latency: 418.07 ms/infer, inference count: 12
[ INFO ] [1] Result MD5:['6b78eddc9fc612253bd4696c3b46ecf8', '6b78eddc9fc612253bd4696c3b46ecf8']
[ INFO ] [2] Batch_size=2, all input token size after padding: 32 * 2, all max_output_token_size: 128 * 2
[ INFO ] [2] Input token size: 64, Output size: 256, Infer count: 128, Tokenization Time: 0.78ms, Detokenization Time: 0.35ms, Generation Time: 36.13s, Latency: 282.23 ms/2tokens
[ INFO ] [2] First token latency: 470.72 ms/2tokens, other tokens latency: 280.72 ms/2tokens, len of tokens: 128 * 2
[ INFO ] [2] First infer latency: 469.27 ms/infer, other infers latency: 279.52 ms/infer, inference count: 128
[ INFO ] [2] Result MD5:['251ac2d9f383877765d02205c9ab9e39', '251ac2d9f383877765d02205c9ab9e39']
[ INFO ] [2] Batch_size=2, all input token size after padding: 1024 * 2, all max_output_token_size: 128 * 2
[ INFO ] [2] Input token size: 2048, Output size: 24, Infer count: 128, Tokenization Time: 4.35ms, Detokenization Time: 1.34ms, Generation Time: 17.26s, Latency: 1438.58 ms/2tokens
[ INFO ] [2] First token latency: 12699.77 ms/2tokens, other tokens latency: 414.76 ms/2tokens, len of tokens: 12 * 2
[ INFO ] [2] First infer latency: 12698.76 ms/infer, other infers latency: 413.80 ms/infer, inference count: 12
[ INFO ] [2] Result MD5:['6b78eddc9fc612253bd4696c3b46ecf8', '6b78eddc9fc612253bd4696c3b46ecf8']
[ INFO ] <<< Warm-up iteration is excluded. >>>
[ INFO ] [Total] Iterations: 4
[ INFO ] [Average] Prompt[0] Input token size: 64, 1st token lantency: 505.36 ms/2tokens, 2nd tokens latency: 288.42 ms/2tokens, 2nd tokens throughput: 6.93 tokens/s
[ INFO ] [Average] Prompt[1] Input token size: 2048, 1st token lantency: 12720.76 ms/2tokens, 2nd tokens latency: 417.22 ms/2tokens, 2nd tokens throughput: 4.79 tokens/s
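
For readers checking the numbers: the `[Average]` lines appear consistent with averaging the per-iteration "other tokens latency" values and then converting to throughput as `batch_size / latency`. The sketch below reproduces that arithmetic from the figures logged above. It is an illustration only, not the llm_bench implementation; the helper names `tokens_per_second` and `mean` are hypothetical.

```python
def tokens_per_second(latency_ms_per_step: float, batch_size: int) -> float:
    """Convert a per-step latency (ms per `batch_size` tokens) to tokens/s.

    Assumption: each inference step emits one token per sequence in the
    batch, so one step yields `batch_size` tokens.
    """
    if latency_ms_per_step <= 0:
        raise ValueError("latency must be positive")
    return batch_size * 1000.0 / latency_ms_per_step


def mean(values):
    """Arithmetic mean of a non-empty sequence of numbers."""
    return sum(values) / len(values)


# "other tokens latency" from iterations [1] and [2] in the log (ms/2tokens)
prompt0 = mean([296.11, 280.72])  # input token size 64
prompt1 = mean([419.68, 414.76])  # input token size 2048

print(f"{tokens_per_second(prompt0, 2):.2f} tokens/s")  # 6.93 tokens/s
print(f"{tokens_per_second(prompt1, 2):.2f} tokens/s")  # 4.79 tokens/s
```

Both results match the "2nd tokens throughput" values in the `[Average]` lines, which suggests the reported latency is per batch step (ms/2tokens) while throughput counts every token in the batch.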

…genai into guozhong/corrected_lantency_statistics

Corrected lantency statistics when batchsize is greater than 1

bafd11c

github-actions bot added the category: llm_bench label Apr 1, 2024

wgzintel added 3 commits April 2, 2024 18:57

Merge branch 'master' of https://github.com/openvinotoolkit/openvino.…

63a8ecd

…genai into guozhong/corrected_lantency_statistics

output 1st token and 2nd tokens avg latency and tput

d5e671e

format code

b87b36d

wgzintel requested a review from peterchen-intel April 3, 2024 15:19

ilya-lavrenov assigned peterchen-intel Apr 5, 2024

Merge branch 'master' into guozhong/corrected_lantency_statistics

63e0357

peterchen-intel changed the title ~~Corrected lantency statistics when batchsize is greater than 1~~ Corrected latency statistics when batchsize is greater than 1 Apr 8, 2024

peterchen-intel approved these changes Apr 8, 2024

peterchen-intel assigned eaidova Apr 8, 2024

Merge branch 'master' into guozhong/corrected_lantency_statistics

e318633

peterchen-intel merged commit 28286d4 into openvinotoolkit:master Apr 11, 2024
12 checks passed
