
COMB

COMB is a plug-and-play caching system for long-context LLM serving.

Code Structure

COMB
├── benchmarks                   # For benchmarking
├── comb
│   ├── entrypoints
│   │   ├── api_server.py        # For online server
│   │   └── comb.py              # For offline inference
│   ├── integration
│   │   ├── hf                   # hf transformers backend
│   │   ├── vllm                 # vLLM backend
│   │   └── __init__.py
│   ├── storage
│   │   ├── chunk_processor.py   # For generating PIC
│   │   ├── pic_allocator.py     # For allocating memory
│   │   ├── pic_manager.py       # For managing PIC
│   │   └── pic_utils.py
│   ├── transfer
│   │   └── cuda_ipc_utils.py    # For inter-process communication
│   ├── __init__.py
│   ├── output.py
│   └── supported_models.py
├── data
├── examples                     # Example use cases
├── training                     # For training
├── environment.yml
└── requirements.txt

Getting Started

Run the following commands to prepare the environment. We recommend appending the two export commands to the end of your ~/.bashrc.

export PYTHONPATH=~/Comb:$PYTHONPATH
export TOKENIZERS_PARALLELISM=true
pip install -r requirements.txt

Install vLLM (recommended for efficiency and benchmarking):

pip install vllm

Currently, we only support meta-llama/Llama-3.1-8B-Instruct and deepseek-ai/DeepSeek-V2-Lite-Chat. To use another model, you can train a COMB model yourself by following our instructions.

Usage

You can find examples in the examples folder.

Benchmark

See Instructions.

Demo

In this example, we simulate two requests with different prefixes. The requests contain the same question and retrieved context, enabling the KV cache to be reused through PIC.
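As a rough illustration of why a position-independent cache (PIC) enables this reuse, here is a toy sketch. It is not COMB's actual API (the class, method names, and "KV" payloads below are invented for illustration): the key idea is that cache entries are keyed by chunk content rather than by absolute position, so a chunk shared by two requests with different prefixes still hits the cache.

```python
import hashlib

# Toy illustration (NOT COMB's real implementation): a cache keyed by
# chunk content hash, so a chunk's entry is reusable even when it
# appears at a different position in a new request.
class ToyPICStore:
    def __init__(self):
        self._store = {}  # content hash -> cached "KV" payload

    def _key(self, chunk: str) -> str:
        return hashlib.sha256(chunk.encode()).hexdigest()

    def get_or_compute(self, chunk: str):
        key = self._key(chunk)
        hit = key in self._store
        if not hit:
            # Stand-in for running the model to produce KV entries.
            self._store[key] = f"kv({chunk})"
        return self._store[key], hit

store = ToyPICStore()
# Two requests with different prefixes but the same retrieved
# context and question, as in the demo.
req1 = ["prefix-A", "retrieved context", "question"]
req2 = ["prefix-B", "retrieved context", "question"]

hits1 = [store.get_or_compute(c)[1] for c in req1]
hits2 = [store.get_or_compute(c)[1] for c in req2]
print(hits1)  # [False, False, False] -- all chunks computed fresh
print(hits2)  # [False, True, True]  -- shared chunks reused
```

A purely prefix-based cache would get no hits on the second request, since its prefix differs from the first.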

demo.mp4
