Skip to content

Commit

Permalink
llama : update llama_timings.n_p_eval setting (ggerganov#7160)
Browse files Browse the repository at this point in the history
This commit changes the value assigned to llama_timings.n_p_eval when
ctx->n_p_eval is 0 to be 1 instead of 1 which is the current value.

The motivation for this change is that if session caching is enabled,
for example using the `--prompt-cache main-session.txt` command line
argument for the main example, and if the same prompt is used then on
subsequent runs, the prompt tokens will not actually be passed to
llama_decode, and n_p_eval will not be updated by llama_synchoronize.

But the value of n_p_eval will be set 1 by llama_get_timings because
ctx->n_p_eval will be 0. This could be interpreted as 1 token was
evaluated for the prompt which could be misleading for applications
using this value.

Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>
  • Loading branch information
danbev authored May 9, 2024
1 parent 2284216 commit fd9f92b
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion llama.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -17879,7 +17879,7 @@ struct llama_timings llama_get_timings(struct llama_context * ctx) {
/*.t_eval_ms =*/ 1e-3 * ctx->t_eval_us,

/*.n_sample =*/ std::max(1, ctx->n_sample),
/*.n_p_eval =*/ std::max(1, ctx->n_p_eval),
/*.n_p_eval =*/ std::max(0, ctx->n_p_eval),
/*.n_eval =*/ std::max(1, ctx->n_eval),
};

Expand Down

0 comments on commit fd9f92b

Please sign in to comment.