A simple web interface for LLaSA using ExLlamaV2, with an OpenAI-compatible FastAPI server.
Clone the repo:
git clone https://github.com/zuellni/llasa-webui
cd llasa-webui
Create a conda/mamba/python env:
conda create -n llasa-webui python=3.12
conda activate llasa-webui
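If you'd rather skip conda, a plain virtual environment works just as well. A minimal sketch, assuming python3.12 is on your PATH:
python3.12 -m venv .venv
source .venv/bin/activate  # on Windows: .venv\Scripts\activate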
Install dependencies, ignoring any xcodec2 errors:
pip install -r requirements.txt
pip install xcodec2 --no-deps
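Since --no-deps skips dependency resolution, a quick sanity check that the package still imports doesn't hurt (assuming xcodec2 is the top-level module name):
python -c "import xcodec2"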
If you want to use torch+cu126, keep in mind that you'll need to compile exllamav2 and (optionally) flash-attn, and for python=3.13 you may need to compile sentencepiece.
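One way to build those from source, assuming the CUDA toolkit and a C++ compiler are installed (the URL and flags follow the upstream projects' docs, not this repo):
# exllamav2 from source
git clone https://github.com/turboderp-org/exllamav2
pip install ./exllamav2
# optional: flash-attn, compiled against the installed torch
pip install flash-attn --no-build-isolation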
Start the server:
python server.py --model <path or repo id>
You can use the HF models or EXL2 quants from here. Add --cache q4 --dtype bf16 for less VRAM usage.
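Since the server is OpenAI compatible, any OpenAI client or plain HTTP should work against it. A minimal sketch with curl, assuming the server listens on localhost:8000 and exposes the standard OpenAI speech route; the model and voice values here are hypothetical, so check server.py for the actual host, port, and fields:
# request speech for a short prompt and save the audio
curl http://localhost:8000/v1/audio/speech \
  -H "Content-Type: application/json" \
  -d '{"model": "llasa", "input": "Hello from LLaSA!", "voice": "default"}' \
  -o output.wav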