I think this was discussed in the past, but I can't recall what the conclusion was. I suppose that requesting more logits requires more memory to be copied from the GPU, so it's normal for it to cause a slowdown. Though I am not sure the current implementation is optimal. I will likely revisit this logic soon within the context of #11213.
Hi everyone,
I'm experimenting with llama.cpp for a project, and I'd like to get the logits from the GGUF model. However, when I pass logits=True, inference takes almost double the time compared to generating only the tokens.
If anyone could provide suggestions or guidance on optimizing this process or retrieving logits efficiently, I'd greatly appreciate it!
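A back-of-the-envelope sketch of why requesting logits for every position is expensive: each logit row is `n_vocab` floats, so asking for logits at all positions multiplies the amount of data copied off the GPU by the number of tokens in the batch. The vocabulary size and prompt length below are hypothetical examples, not values from this thread:

```python
# Illustrative only: estimates the device-to-host copy volume when
# requesting logits for every token position vs. only the last one.
n_vocab = 32000        # hypothetical LLaMA-style vocabulary size
n_prompt = 512         # hypothetical number of tokens evaluated in one batch
bytes_per_float = 4    # fp32 logits

# Logits for every position: n_prompt rows of n_vocab floats are copied.
all_logits_bytes = n_prompt * n_vocab * bytes_per_float

# Logits for the last position only (the usual default): a single row.
last_logits_bytes = n_vocab * bytes_per_float

print(all_logits_bytes // last_logits_bytes)   # 512x more data moved
print(all_logits_bytes / 1e6)                  # ~65.5 MB per batch
```

If you only need logits for a few positions (e.g. the final token of each sequence), requesting just those rows avoids most of this transfer cost, which matches the slowdown described above.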