
chatbees-dev-llm

Run small LLMs locally, together with the ChatBees test container, to develop and test LLM applications. This repo describes how to run the small LLMs locally.

The gte-multilingual-base model is used for embedding. Chat completion supports three models: google/gemma-2-2b-it, meta-llama/Llama-3.2-1B-Instruct, and meta-llama/Llama-3.2-3B-Instruct.
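For reference, the embedding model can also be loaded directly; a minimal sketch using sentence-transformers, assuming the Hugging Face repo id Alibaba-NLP/gte-multilingual-base (the server in this repo may load it differently):

from sentence_transformers import SentenceTransformer

# Assumed repo id; the model ships custom code, so trust_remote_code=True is needed.
model = SentenceTransformer("Alibaba-NLP/gte-multilingual-base", trust_remote_code=True)
embeddings = model.encode(["ChatBees runs small LLMs locally."])
print(embeddings.shape)  # one embedding vector per input text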

Simply run python start_server.py to start a simple server that hosts the embedding and completion models. To specify which completion model to use, set the ENV_LOCAL_COMPLETION_MODEL environment variable before running python start_server.py, for example:

  • export ENV_LOCAL_COMPLETION_MODEL=google/gemma-2-2b-it
  • export ENV_LOCAL_COMPLETION_MODEL=meta-llama/Llama-3.2-1B-Instruct
  • export ENV_LOCAL_COMPLETION_MODEL=meta-llama/Llama-3.2-3B-Instruct
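
Equivalently, the model can be selected and the server launched from Python; a minimal sketch (start_server.py and ENV_LOCAL_COMPLETION_MODEL are from this repo, the model choice is one of the three listed above):

import os
import subprocess

# Pick one of the supported completion models, then launch the local server.
os.environ["ENV_LOCAL_COMPLETION_MODEL"] = "meta-llama/Llama-3.2-1B-Instruct"
subprocess.run(["python", "start_server.py"], check=True)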

For the first run, you need a read-only Hugging Face token to download the models to local disk. You can place your Hugging Face token in ~/.cache/huggingface/token, or run the code below.

from huggingface_hub import login

# Create a read-only token at https://huggingface.co/settings/tokens
login(token="<your_hf_read_only_token>")
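
Once the token is in place, the models download on first use. As a rough sketch of exercising one of the completion models directly, assuming the standard transformers text-generation pipeline (start_server.py may serve the model differently):

from transformers import pipeline

# Assumes a recent transformers version that accepts chat-style messages.
pipe = pipeline("text-generation", model="meta-llama/Llama-3.2-1B-Instruct")
messages = [{"role": "user", "content": "Say hello in one sentence."}]
result = pipe(messages, max_new_tokens=64)
print(result[0]["generated_text"][-1]["content"])  # assistant reply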
