使用模型 Yi-34B-Chat
机器配置 A40 24GB
- 零上下文时生成 1024 Tokens(禁用 EOS)
- 生成三轮取平均值
- 第一次生成前会进行 256 Tokens 的 Warm-up,记录热身前后显存变化
llama_print_timings: load time = 72.86 ms
llama_print_timings: sample time = 4516.47 ms / 1023 runs ( 4.41 ms per token, 226.50 tokens per second)
llama_print_timings: prompt eval time = 0.00 ms / 1 tokens ( 0.00 ms per token, inf tokens per second)
llama_print_timings: eval time = 39367.24 ms / 1023 runs ( 38.48 ms per token, 25.99 tokens per second)
llama_print_timings: total time = 48351.40 ms / 1024 tokens
Output generated in 48.90 seconds (21.25 tokens/s, 1039 tokens, context 1, seed 1683353357)
llama_print_timings: load time = 72.86 ms
llama_print_timings: sample time = 4532.99 ms / 1023 runs ( 4.43 ms per token, 225.68 tokens per second)
llama_print_timings: prompt eval time = 0.00 ms / 1 tokens ( 0.00 ms per token, inf tokens per second)
llama_print_timings: eval time = 39227.25 ms / 1023 runs ( 38.35 ms per token, 26.08 tokens per second)
llama_print_timings: total time = 47776.10 ms / 1024 tokens
Output generated in 48.31 seconds (21.84 tokens/s, 1055 tokens, context 1, seed 1864893420)
llama_print_timings: load time = 72.86 ms
llama_print_timings: sample time = 4649.50 ms / 1023 runs ( 4.54 ms per token, 220.02 tokens per second)
llama_print_timings: prompt eval time = 0.00 ms / 1 tokens ( 0.00 ms per token, inf tokens per second)
llama_print_timings: eval time = 39227.13 ms / 1023 runs ( 38.35 ms per token, 26.08 tokens per second)
llama_print_timings: total time = 47838.45 ms / 1024 tokens
Output generated in 48.40 seconds (21.26 tokens/s, 1029 tokens, context 1, seed 314197904)
Average: 21.45 tokens/s
- 加载后:
21208 MiB