Corrected latency statistics when batchsize is greater than 1 #336
batch size = 2
[ INFO ] [1] Batch_size=2, all input token size after padding: 32 * 2, all max_output_token_size: 128 * 2
[ INFO ] [1] Input token size: 64, Output size: 256, Infer count: 128, Tokenization Time: 0.70ms, Detokenization Time: 0.64ms, Generation Time: 38.15s, Latency: 298.04 ms/2tokens
[ INFO ] [1] First token latency: 540.01 ms/2tokens, other tokens latency: 296.11 ms/2tokens, len of tokens: 128 * 2
[ INFO ] [1] First infer latency: 538.62 ms/infer, other infers latency: 294.77 ms/infer, inference count: 128
[ INFO ] [1] Result MD5:['251ac2d9f383877765d02205c9ab9e39', '251ac2d9f383877765d02205c9ab9e39']
[ INFO ] [1] Batch_size=2, all input token size after padding: 1024 * 2, all max_output_token_size: 128 * 2
[ INFO ] [1] Input token size: 2048, Output size: 24, Infer count: 128, Tokenization Time: 4.74ms, Detokenization Time: 3.35ms, Generation Time: 17.36s, Latency: 1446.61 ms/2tokens
[ INFO ] [1] First token latency: 12741.76 ms/2tokens, other tokens latency: 419.68 ms/2tokens, len of tokens: 12 * 2
[ INFO ] [1] First infer latency: 12740.33 ms/infer, other infers latency: 418.07 ms/infer, inference count: 12
[ INFO ] [1] Result MD5:['6b78eddc9fc612253bd4696c3b46ecf8', '6b78eddc9fc612253bd4696c3b46ecf8']
[ INFO ] [2] Batch_size=2, all input token size after padding: 32 * 2, all max_output_token_size: 128 * 2
[ INFO ] [2] Input token size: 64, Output size: 256, Infer count: 128, Tokenization Time: 0.78ms, Detokenization Time: 0.35ms, Generation Time: 36.13s, Latency: 282.23 ms/2tokens
[ INFO ] [2] First token latency: 470.72 ms/2tokens, other tokens latency: 280.72 ms/2tokens, len of tokens: 128 * 2
[ INFO ] [2] First infer latency: 469.27 ms/infer, other infers latency: 279.52 ms/infer, inference count: 128
[ INFO ] [2] Result MD5:['251ac2d9f383877765d02205c9ab9e39', '251ac2d9f383877765d02205c9ab9e39']
[ INFO ] [2] Batch_size=2, all input token size after padding: 1024 * 2, all max_output_token_size: 128 * 2
[ INFO ] [2] Input token size: 2048, Output size: 24, Infer count: 128, Tokenization Time: 4.35ms, Detokenization Time: 1.34ms, Generation Time: 17.26s, Latency: 1438.58 ms/2tokens
[ INFO ] [2] First token latency: 12699.77 ms/2tokens, other tokens latency: 414.76 ms/2tokens, len of tokens: 12 * 2
[ INFO ] [2] First infer latency: 12698.76 ms/infer, other infers latency: 413.80 ms/infer, inference count: 12
[ INFO ] [2] Result MD5:['6b78eddc9fc612253bd4696c3b46ecf8', '6b78eddc9fc612253bd4696c3b46ecf8']
[ INFO ] <<< Warm-up iteration is excluded. >>>
[ INFO ] [Total] Iterations: 4
[ INFO ] [Average] Prompt[0] Input token size: 64, 1st token lantency: 505.36 ms/2tokens, 2nd tokens latency: 288.42 ms/2tokens, 2nd tokens throughput: 6.93 tokens/s
[ INFO ] [Average] Prompt[1] Input token size: 2048, 1st token lantency: 12720.76 ms/2tokens, 2nd tokens latency: 417.22 ms/2tokens, 2nd tokens throughput: 4.79 tokens/s
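
As a sanity check on the numbers above, the `[Average]` lines appear to be the mean of the per-iteration latencies for each prompt, with throughput computed as batch size divided by the averaged per-step latency. The sketch below reproduces that arithmetic from the logged values only. The names `first_token_ms`, `other_tokens_ms`, and `summarize` are hypothetical and are not taken from the benchmark's source, and the formula is inferred from the log rather than confirmed against the code in this PR.

```python
from statistics import mean

BATCH_SIZE = 2  # from the "Batch_size=2" log lines

# Per-iteration latencies in ms per batch step, copied from the log above.
# Keys are prompt indices; each list holds iterations [1] and [2].
first_token_ms = {0: [540.01, 470.72], 1: [12741.76, 12699.77]}
other_tokens_ms = {0: [296.11, 280.72], 1: [419.68, 414.76]}


def summarize(prompt_idx):
    """Return (avg first-token ms, avg other-token ms, tokens/s) for a prompt.

    Assumption: each generation step emits BATCH_SIZE tokens, so throughput
    is BATCH_SIZE divided by the averaged step latency in seconds.
    """
    first = mean(first_token_ms[prompt_idx])
    other = mean(other_tokens_ms[prompt_idx])
    throughput = BATCH_SIZE / (other / 1000.0)
    return first, other, throughput


for idx in (0, 1):
    first, other, tput = summarize(idx)
    print(
        f"Prompt[{idx}] 1st token: {first:.2f} ms/{BATCH_SIZE}tokens, "
        f"other tokens: {other:.2f} ms/{BATCH_SIZE}tokens, "
        f"throughput: {tput:.2f} tokens/s"
    )
```

The results agree with the logged averages to within rounding in the last printed digit. Multiplying by the batch size is what makes the throughput reflect both sequences in the batch instead of only one.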