Conversation

@remi-or (Collaborator) commented on Nov 7, 2025

This PR adds a prefix sharing mechanism to the continuous batching API, similar to the one in vLLM.
It only activates when the model is a full-attention model, matching vLLM's behaviour.

The mechanism has three main components (an illustrative sketch follows the list):

  • block hashing: once a block in the cache is filled up, it is given a hash that depends on all the tokens in the sequence up to and including those in the block
  • prefix detection: when starting prefill for a request, we first look for a prefix whose KV cache has already been computed; if such a prefix is found, we skip the KV computation for it and reference the completed blocks instead, saving compute
  • block de-reference: when a block is given a hash, we check that no other block already carries the same hash, so each piece of shared content is stored in a single block; this keeps the cache size under control

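To make these three steps concrete, here is a minimal, self-contained Python sketch of prefix sharing over a block-based KV cache. All names (`Block`, `PrefixCache`, `BLOCK_SIZE`, `match_prefix`, `register_full_block`) are illustrative only and do not correspond to the classes or methods introduced in this PR; the hashing scheme (a SHA-256 chain over block tokens) is likewise an assumption, not the PR's actual implementation.

```python
import hashlib
from dataclasses import dataclass, field

BLOCK_SIZE = 16  # tokens per KV cache block (illustrative value)


def block_hash(parent_hash: bytes, block_tokens: list[int]) -> bytes:
    """Chained hash of a full block: it depends on the parent block's hash,
    and therefore on every token in the sequence up to and including this block."""
    h = hashlib.sha256(parent_hash)
    h.update(b"".join(t.to_bytes(4, "little") for t in block_tokens))
    return h.digest()


@dataclass
class Block:
    block_id: int
    ref_count: int = 0  # number of sequences currently referencing this block


@dataclass
class PrefixCache:
    # chained block hash -> physical block holding the corresponding KV states
    hash_to_block: dict[bytes, Block] = field(default_factory=dict)

    def match_prefix(self, tokens: list[int]) -> tuple[list[Block], int]:
        """Prefix detection: walk full blocks from the start of the prompt and
        reuse every block whose chained hash is already in the cache.
        Returns the reused blocks and how many prompt tokens can skip prefill."""
        reused: list[Block] = []
        parent, matched = b"", 0
        full_len = len(tokens) - len(tokens) % BLOCK_SIZE
        for start in range(0, full_len, BLOCK_SIZE):
            parent = block_hash(parent, tokens[start:start + BLOCK_SIZE])
            block = self.hash_to_block.get(parent)
            if block is None:
                break
            block.ref_count += 1
            reused.append(block)
            matched += BLOCK_SIZE
        return reused, matched

    def register_full_block(
        self, parent_hash: bytes, block_tokens: list[int], block: Block
    ) -> tuple[bytes, Block]:
        """Block hashing + de-reference: once a block is filled, compute its hash.
        If another block already holds the same content, drop the reference to the
        new block and point the sequence at the existing one instead."""
        h = block_hash(parent_hash, block_tokens)
        existing = self.hash_to_block.get(h)
        if existing is not None and existing is not block:
            existing.ref_count += 1
            block.ref_count -= 1  # caller can free `block` once unreferenced
            return h, existing
        self.hash_to_block[h] = block
        return h, block


# Usage: a second request sharing a 32-token prefix skips KV computation for it.
cache = PrefixCache()
prompt_a = list(range(40))
parent = b""
for i, start in enumerate(range(0, 32, BLOCK_SIZE)):  # pretend prefill filled two blocks
    blk = Block(block_id=i, ref_count=1)
    parent, _ = cache.register_full_block(parent, prompt_a[start:start + BLOCK_SIZE], blk)

prompt_b = list(range(32)) + [99, 100, 101]
reused_blocks, matched = cache.match_prefix(prompt_b)
print(f"reused {len(reused_blocks)} blocks; {matched} prompt tokens skip prefill")
```

In the real continuous batching cache, blocks also carry the actual KV tensors along with allocation and eviction logic, all of which is omitted in this sketch.
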
What is missing from this PR:

  • more documentation
  • another pass over the code
  • edge case: if the prefix is the entire initial request, we still need to run a forward pass on the last token of the request
  • addressing the remaining TODOs

The PR will remain a draft until these are resolved, but early comments are welcome.

@remi-or requested a review from McPatate on November 7, 2025, 15:57
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
