

@KoyamaSohei
Contributor

Hello maintainers, long time no see!

We’ve added a new demo that uses the CxS benchmark to demonstrate the effectiveness of KV tiering. Using DDN EXAScaler as a storage tier, we show an increase in the number of sessions that can be sustained while keeping TTFT under 2 s.
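
To make the metric concrete, here is a minimal sketch of how "sessions sustained with TTFT < 2s" could be computed from a sweep over session counts. It is not code from this PR or from the CxS benchmark. The function name `max_sustained_sessions`, the input shape (session count mapped to a measured TTFT in seconds), and the sample numbers are all hypothetical.

```python
def max_sustained_sessions(ttft_by_sessions: dict[int, float], slo_s: float = 2.0) -> int:
    """Hypothetical helper: return the largest session count n such that every
    measured session count up to and including n kept TTFT below slo_s."""
    best = 0
    # Walk session counts in increasing order and stop at the first SLO violation.
    for n in sorted(ttft_by_sessions):
        if ttft_by_sessions[n] >= slo_s:
            break
        best = n
    return best


# Illustrative numbers only, not measured results.
sweep = {10: 0.8, 20: 1.5, 30: 2.6, 40: 4.1}
print(max_sustained_sessions(sweep))  # prints 20
```

Stopping at the first violation treats "sustained" as "every lower load also met the target", so an isolated lucky measurement at a higher load is not counted.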

vLLM’s prefix caching and chunked prefill are both enabled, as is LMCache’s host memory cache. Even with all of these optimizations in place, the benchmark shows the added value of integrating a storage tier, which is one of the key strengths of the CxS benchmark.

Signed-off-by: Sohei Koyama <skoyama@ddn.com>
@KoyamaSohei
Contributor Author

This benchmark can also be run without DDN EXAScaler; local disks are sufficient. It should also allow fair comparisons with services such as InfiniStore.

@KoyamaSohei force-pushed the patch-3 branch 2 times, most recently from 73ac65d to 4857a88, on June 16, 2025 12:17
Signed-off-by: Sohei Koyama <skoyama@ddn.com>
@KoyamaSohei
Contributor Author

@Siddhant-Ray @Shaoting-Feng
Please take a look when you have time :)

Collaborator

@Shaoting-Feng left a comment


LGTM. The results look interesting.

@Siddhant-Ray merged commit f4abc35 into LMCache:main on Jun 18, 2025
1 check failed