scitix/SiLLM

SiLLM


About

SiLLM is a high-performance asynchronous inference engine that speeds up model execution through two parallelism mechanisms:

  • GPU-CPU Overlapping
    • Fully asynchronous inference scheduling
    • Fully asynchronous input processing
    • Fully asynchronous output processing
  • Sequence-Parallel Sampling
    • Fully parallel sampling across GPUs
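
The GPU-CPU overlapping idea can be illustrated with a small, self-contained simulation. This is only a sketch under stated assumptions, not SiLLM's implementation: `prepare_input`, `run_model`, and `process_output` are hypothetical stand-ins, and `asyncio.sleep` simulates CPU and GPU work. It shows how input preparation for the next step and output processing for the previous step can run while the current model step is in flight.

```python
import asyncio
import time

# Hypothetical stand-ins: none of these names come from SiLLM or vLLM.
async def prepare_input(step: int) -> str:
    await asyncio.sleep(0.01)  # simulated CPU-side input processing
    return f"batch-{step}"

async def run_model(batch: str) -> str:
    await asyncio.sleep(0.02)  # simulated GPU forward pass
    return f"logits({batch})"

async def process_output(result: str, sink: list) -> None:
    await asyncio.sleep(0.01)  # simulated CPU-side output processing
    sink.append(result)

async def serial_loop(num_steps: int) -> list:
    """Baseline: every stage waits for the previous one to finish."""
    outputs = []
    for step in range(num_steps):
        batch = await prepare_input(step)
        result = await run_model(batch)
        await process_output(result, outputs)
    return outputs

async def overlapped_loop(num_steps: int) -> list:
    """Overlap CPU-side stages with the simulated GPU step."""
    outputs = []
    if num_steps == 0:
        return outputs
    pending_output = None
    next_input = asyncio.create_task(prepare_input(0))
    for step in range(num_steps):
        batch = await next_input
        if step + 1 < num_steps:
            # Prepare the next batch while the current model step runs.
            next_input = asyncio.create_task(prepare_input(step + 1))
        result = await run_model(batch)
        if pending_output is not None:
            await pending_output  # keep outputs in step order
        # Process this result while the next model step runs.
        pending_output = asyncio.create_task(process_output(result, outputs))
    await pending_output
    return outputs

if __name__ == "__main__":
    for loop in (serial_loop, overlapped_loop):
        start = time.perf_counter()
        asyncio.run(loop(5))
        print(loop.__name__, f"{time.perf_counter() - start:.3f}s")
```

Both loops produce the same outputs in the same order; the overlapped version simply hides the CPU-side stages behind the simulated model step. A real engine would overlap actual GPU kernels with CPU work rather than sleeping coroutines.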

Getting Started

SiLLM is built on top of vLLM, utilizing vLLM's front end for model loading and leveraging PagedAttention for model execution. Additionally, it integrates custom plugins to enable asynchronous scheduling, asynchronous input/output processing, and parallel sampling.

Step 1: Install vLLM from pip

# Install Python dependencies
pip install openai==1.45.0 gputil aioprometheus psutil transformers termcolor ipywidgets
# Install vLLM
pip install vllm==0.6.0
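
Because the commands above pin exact versions, it can help to confirm what actually got installed before building the plugin. The following is an optional sketch, not part of SiLLM: the helper names `installed_version` and `check_pins` are made up for illustration, and only the standard-library `importlib.metadata` module is used.

```python
from importlib import metadata

# Pins copied from the install commands in Step 1.
EXPECTED = {"vllm": "0.6.0", "openai": "1.45.0"}

def installed_version(package: str):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None

def check_pins(expected: dict) -> dict:
    """Map each mismatched package to a (wanted, found) pair."""
    mismatches = {}
    for package, wanted in expected.items():
        found = installed_version(package)
        if found != wanted:
            mismatches[package] = (wanted, found)
    return mismatches

if __name__ == "__main__":
    problems = check_pins(EXPECTED)
    if problems:
        for package, (wanted, found) in problems.items():
            print(f"{package}: expected {wanted}, found {found}")
    else:
        print("All pinned packages match.")
```

An empty result from `check_pins` means every pinned package is present at the expected version.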

Step 2: Install the Albireo plugin from source

# Install Albireo Plugin
python3 python_only_dev.py
apt-get install libboost-all-dev
cd albireo
pip install -v .

License

This library is licensed under the Apache 2.0 License.
