Corrected latency statistics when batchsize is greater than 1 #336

Conversation

wgzintel (Contributor) commented on Apr 1, 2024:

Sample output with batch size = 2 after the fix. Latency is now reported per generation step, i.e. per batch of batch_size tokens (shown as ms/2tokens), and throughput counts all tokens produced in each step:
[ INFO ] [1] Batch_size=2, all input token size after padding: 32 * 2, all max_output_token_size: 128 * 2
[ INFO ] [1] Input token size: 64, Output size: 256, Infer count: 128, Tokenization Time: 0.70ms, Detokenization Time: 0.64ms, Generation Time: 38.15s, Latency: 298.04 ms/2tokens
[ INFO ] [1] First token latency: 540.01 ms/2tokens, other tokens latency: 296.11 ms/2tokens, len of tokens: 128 * 2
[ INFO ] [1] First infer latency: 538.62 ms/infer, other infers latency: 294.77 ms/infer, inference count: 128
[ INFO ] [1] Result MD5:['251ac2d9f383877765d02205c9ab9e39', '251ac2d9f383877765d02205c9ab9e39']
[ INFO ] [1] Batch_size=2, all input token size after padding: 1024 * 2, all max_output_token_size: 128 * 2
[ INFO ] [1] Input token size: 2048, Output size: 24, Infer count: 128, Tokenization Time: 4.74ms, Detokenization Time: 3.35ms, Generation Time: 17.36s, Latency: 1446.61 ms/2tokens
[ INFO ] [1] First token latency: 12741.76 ms/2tokens, other tokens latency: 419.68 ms/2tokens, len of tokens: 12 * 2
[ INFO ] [1] First infer latency: 12740.33 ms/infer, other infers latency: 418.07 ms/infer, inference count: 12
[ INFO ] [1] Result MD5:['6b78eddc9fc612253bd4696c3b46ecf8', '6b78eddc9fc612253bd4696c3b46ecf8']
[ INFO ] [2] Batch_size=2, all input token size after padding: 32 * 2, all max_output_token_size: 128 * 2
[ INFO ] [2] Input token size: 64, Output size: 256, Infer count: 128, Tokenization Time: 0.78ms, Detokenization Time: 0.35ms, Generation Time: 36.13s, Latency: 282.23 ms/2tokens
[ INFO ] [2] First token latency: 470.72 ms/2tokens, other tokens latency: 280.72 ms/2tokens, len of tokens: 128 * 2
[ INFO ] [2] First infer latency: 469.27 ms/infer, other infers latency: 279.52 ms/infer, inference count: 128
[ INFO ] [2] Result MD5:['251ac2d9f383877765d02205c9ab9e39', '251ac2d9f383877765d02205c9ab9e39']
[ INFO ] [2] Batch_size=2, all input token size after padding: 1024 * 2, all max_output_token_size: 128 * 2
[ INFO ] [2] Input token size: 2048, Output size: 24, Infer count: 128, Tokenization Time: 4.35ms, Detokenization Time: 1.34ms, Generation Time: 17.26s, Latency: 1438.58 ms/2tokens
[ INFO ] [2] First token latency: 12699.77 ms/2tokens, other tokens latency: 414.76 ms/2tokens, len of tokens: 12 * 2
[ INFO ] [2] First infer latency: 12698.76 ms/infer, other infers latency: 413.80 ms/infer, inference count: 12
[ INFO ] [2] Result MD5:['6b78eddc9fc612253bd4696c3b46ecf8', '6b78eddc9fc612253bd4696c3b46ecf8']
[ INFO ] <<< Warm-up iteration is excluded. >>>
[ INFO ] [Total] Iterations: 4
[ INFO ] [Average] Prompt[0] Input token size: 64, 1st token latency: 505.36 ms/2tokens, 2nd tokens latency: 288.42 ms/2tokens, 2nd tokens throughput: 6.93 tokens/s
[ INFO ] [Average] Prompt[1] Input token size: 2048, 1st token latency: 12720.76 ms/2tokens, 2nd tokens latency: 417.22 ms/2tokens, 2nd tokens throughput: 4.79 tokens/s
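The averaged numbers above follow directly from the per-iteration lines: latency is averaged over the non-warm-up iterations, and throughput is batch_size divided by the average per-step latency. Below is a minimal, hypothetical sketch (not the actual llm_bench code) that reproduces the "[Average]" lines from the per-iteration values in this log; the variable names are illustrative only:

```python
# Hypothetical sketch: reproduce the averaged statistics printed above.
# With batch_size > 1 each generation step emits batch_size tokens, so
# latency is reported per "batch_size tokens" and throughput is
# batch_size / latency.
batch_size = 2

# "other tokens latency" per iteration (ms), copied from the log above.
second_token_latency_ms = {
    0: [296.11, 280.72],  # Prompt[0], iterations [1] and [2]
    1: [419.68, 414.76],  # Prompt[1], iterations [1] and [2]
}

for prompt_idx, latencies in second_token_latency_ms.items():
    avg_ms = sum(latencies) / len(latencies)
    throughput = batch_size / (avg_ms / 1000.0)  # tokens per second
    print(f"Prompt[{prompt_idx}] 2nd tokens latency: {avg_ms:.2f} "
          f"ms/{batch_size}tokens, 2nd tokens throughput: "
          f"{throughput:.2f} tokens/s")
```

Running this yields 288.42 ms/2tokens and 6.93 tokens/s for Prompt[0], and 417.22 ms/2tokens and 4.79 tokens/s for Prompt[1], matching the "[Average]" lines in the log.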

@github-actions bot added the category: llm_bench label (tool/llm_bench folder) on Apr 1, 2024.
@peterchen-intel changed the title from "Corrected lantency statistics when batchsize is greater than 1" to "Corrected latency statistics when batchsize is greater than 1" on Apr 8, 2024.
@peterchen-intel merged commit 28286d4 into openvinotoolkit:master on Apr 11, 2024 (12 checks passed).