It's not clear to me from looking at the code if this library supports the following pattern:
```shell
mlx_lm.cache_prompt --prompt 'Here are 100 examples of how to produce a desired output: {examples}'
# ... cachedprompt.safetensors saved to cwd
```

```python
prompt_template = "Now produce an output from this sentence: {sentence}"
prompts_raw = [prompt_template.format(sentence=sentence) for sentence in sentences]
response = batch_generate(kv_cache_file="cachedprompt.safetensors", prompts=prompts_raw)
# ... batched generations created
```
Is this something the library can do, or could do? I'm interested in providing multi-shot examples without incurring large prompt-processing times from re-encoding the same pre-prompt on every request.
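To make the compute saving I'm asking about concrete, here is a toy sketch of the pattern, independent of mlx-lm: the shared pre-prompt is "encoded" once and reused across a batch, so each prompt only pays for its own suffix. All names here (`PrefixCache`, `generate_batch`) are hypothetical illustrations, not the mlx-lm API.

```python
# Toy illustration of prefix (KV) caching. This is NOT mlx-lm code;
# it only models the amount of per-token "encoding" work saved.

class PrefixCache:
    def __init__(self):
        self.encode_calls = 0  # total per-token encoding work performed

    def encode(self, tokens):
        self.encode_calls += len(tokens)
        return list(tokens)  # stand-in for the resulting KV-cache state


def generate_batch(cache, prefix_tokens, suffixes):
    # Encode the shared multi-shot prefix exactly once...
    prefix_state = cache.encode(prefix_tokens)
    outputs = []
    for suffix in suffixes:
        # ...then each batched prompt only encodes its own suffix.
        suffix_state = cache.encode(suffix)
        outputs.append(prefix_state + suffix_state)
    return outputs


cache = PrefixCache()
prefix = ["ex"] * 100            # long shared pre-prompt (100 tokens)
suffixes = [["a"], ["b"], ["c"]]  # three short per-sentence prompts
outs = generate_batch(cache, prefix, suffixes)
# Work done: 100 (prefix, once) + 3 (one token per suffix) = 103,
# versus 3 * 101 = 303 if the prefix were re-encoded per prompt.
print(cache.encode_calls)  # → 103
```

This is only meant to make the requested pattern concrete; whether `batch_generate` can consume a cache file saved by `mlx_lm.cache_prompt` is exactly the open question here.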