kvcached: Elastic KV cache for dynamic GPU sharing and efficient multi-LLM inference.
serverless inference-engine llm llm-serving vllm llm-inference ollama llm-framework sglang kvcache gpu-sharing kvcached gpu-multiplexing kvcache-optimization elastic-kvcache online-offline-coserve
Updated Sep 20, 2025 - Python