Support for KV-cache loading? #8

@priontific

Description

It's not clear to me from looking at the code if this library supports the following pattern:

```shell
mlx_lm.cache_prompt --prompt 'Here are 100 examples of how to produce a desired output: {examples}'
# ... cachedprompt.safetensors saved to cwd
```

```python
prompt_template = "Now produce an output from this sentence: {sentence}"
prompts_raw = [prompt_template.format(sentence=sentence) for sentence in sentences]
response = batch_generate(kv_cache_file="cachedprompt.safetensors", prompts=prompts_raw)
# ... batched generations created
```

Is this something the library can, or could, do? I'm interested in providing multi-shot examples without incurring huge prompt-processing times from re-encoding the same pre-prompt on every request.
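For reference, here is a minimal, self-contained sketch of the pattern I mean. The names (`PromptCache`, `cache_prompt`, `batch_generate`) and the toy tokenizer are hypothetical stand-ins, not this library's actual API: the point is only that the shared prefix is processed once, and each batched prompt pays only for its suffix.

```python
# Conceptual sketch (hypothetical API, not mlx_lm's): a "prompt cache"
# stores the processed state of a shared prefix so that only each
# per-request suffix needs processing.

class PromptCache:
    """Holds the processed state of a shared prompt prefix."""
    def __init__(self, prefix_tokens):
        self.prefix_tokens = list(prefix_tokens)  # stands in for saved KV state

def encode(text):
    # Toy tokenizer: one "token" per word. Tracks call count so we can
    # verify the prefix is only ever encoded once.
    encode.calls += 1
    return text.split()
encode.calls = 0

def cache_prompt(prefix):
    """Analogue of caching the pre-prompt once up front."""
    return PromptCache(encode(prefix))

def batch_generate(prompt_cache, prompts):
    """Hypothetical batched generation reusing the cached prefix."""
    outputs = []
    for p in prompts:
        suffix_tokens = encode(p)            # only the suffix is processed
        full = prompt_cache.prefix_tokens + suffix_tokens
        outputs.append(len(full))            # stand-in for a real generation
    return outputs

cache = cache_prompt("shared few-shot examples here")
results = batch_generate(cache, ["sentence one", "sentence two", "sentence three"])
# The 4-token prefix was encoded once; the 3 prompts encoded only their suffixes.
assert encode.calls == 4
```

With a real KV cache the saved state would be the transformer's key/value tensors (e.g. the `.safetensors` file above) rather than token lists, but the compute saving is the same shape: prefix work amortized across the whole batch.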
