Running local LLMs is as simple as breathing.
- OpenAI-compatible API
  - Basic features
    - `/v1/completions`
    - `/v1/chat/completions`
    - `/v1/embeddings`
    - `/v1/models`
  - Advanced features
    - logprobs
    - fake multimodal (based on Whisper/Florence ONNX models)
    - prompt cache
    - VRAM manager: auto offload / VRAM limit / CPU limit
- Packaged to a single binary file
- Evaluator
  - Special API provided for model evaluation
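Because the API is OpenAI-compatible, any OpenAI client should work against it. A minimal sketch using plain `fetch`; the base URL, port, and model name `"local-model"` are assumptions (the port is taken from the `boot.config.json` example below):

```typescript
// Hedged sketch: calling the server's OpenAI-compatible /v1/chat/completions
// endpoint. "local-model" and the base URL are placeholders, not names this
// project is known to use.
const body = {
  model: "local-model",
  messages: [{ role: "user", content: "Say hello." }],
};

async function chat(baseUrl = "http://localhost:4567"): Promise<string> {
  const res = await fetch(`${baseUrl}/v1/chat/completions`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(body),
  });
  if (!res.ok) throw new Error(`HTTP ${res.status}`);
  const data: any = await res.json();
  // Standard OpenAI response shape: first choice's message content.
  return data.choices[0].message.content;
}
```

Point an existing OpenAI SDK at the same base URL to reuse tooling you already have.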
WIP

```shell
pnpm install
pnpm dev
```
To run the program, create a configuration file named `boot.config.json` in the working directory:
```json
{
  "server": {
    "port": 4567
  },
  "no_docs": false,
  "model_dirs": ["~/llm_models", "~/ggufs"]
}
```
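A loader for this file might look like the sketch below. The field names match the example above; the defaults (`no_docs: false`, empty `model_dirs`) are assumptions, not documented behavior:

```typescript
// Hedged sketch (not the project's actual loader): validating the
// boot.config.json shape shown above.
interface BootConfig {
  server: { port: number };
  no_docs: boolean;
  model_dirs: string[];
}

function parseBootConfig(raw: string): BootConfig {
  const cfg = JSON.parse(raw);
  if (typeof cfg?.server?.port !== "number") {
    throw new Error("boot.config.json: server.port must be a number");
  }
  return {
    server: { port: cfg.server.port },
    no_docs: cfg.no_docs ?? false, // assumed default: docs enabled
    model_dirs: Array.isArray(cfg.model_dirs) ? cfg.model_dirs : [],
  };
}

const example = parseBootConfig(
  '{"server":{"port":4567},"no_docs":false,"model_dirs":["~/llm_models","~/ggufs"]}'
);
```

In practice you would read the file with `fs.readFileSync("boot.config.json", "utf8")` and pass the string to `parseBootConfig`.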
AGPL-3.0 license