LLMux

LLMux is a high-performance LLM gateway written in Go. It runs as a standalone single instance with in-memory state, or in distributed mode with Postgres and Redis for enterprise-grade, multi-tenant governance. A Next.js dashboard provides management and analytics when enabled.

Overview

Unified OpenAI-compatible APIs: chat, responses, embeddings, models
Multi-provider routing with six strategies (simple-shuffle, round-robin, lowest-latency, least-busy, lowest-tpm-rpm, lowest-cost)
Governance: multi-tenant auth, budgets, rate limits, audit logging
Ops-friendly: Prometheus metrics, OpenTelemetry tracing, health checks
Optional Next.js dashboard for management and analytics

Performance: LLMux vs LiteLLM

We benchmarked LLMux (Go) against LiteLLM (Python) on identical hardware (4 CPU cores) using a local mock server with fixed 50ms latency.

Metric	LLMux (Go)	LiteLLM (Python)	Difference
Throughput (RPS)	1943.35	246.52	~8x higher
Mean Latency	51.29 ms	403.94 ms	~8x lower (about 1 ms vs 354 ms of overhead on top of the 50 ms backend)
P99 Latency	91.71 ms	845.37 ms	~9x lower (stable vs jittery)

Benchmark config: 10k requests, 100 concurrency, 4 CPU cores, 50ms backend latency.
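As a rough sanity check on the table above: in a closed-loop load test, throughput is approximately concurrency divided by mean latency (Little's law). The sketch below is not part of the LLMux benchmark tooling, and `expectedRPS` is a hypothetical helper written for this illustration.

```go
package main

import "fmt"

// expectedRPS estimates steady-state throughput for a closed-loop load test
// using Little's law: throughput = in-flight requests / mean latency.
func expectedRPS(concurrency int, meanLatencyMs float64) float64 {
	return float64(concurrency) / (meanLatencyMs / 1000.0)
}

func main() {
	// Ceiling imposed by the 50 ms mock backend alone at 100 concurrency.
	fmt.Printf("ceiling: %.0f RPS\n", expectedRPS(100, 50))
	// Estimates derived from the measured mean latencies in the table.
	fmt.Printf("LLMux:   %.0f RPS\n", expectedRPS(100, 51.29))
	fmt.Printf("LiteLLM: %.0f RPS\n", expectedRPS(100, 403.94))
}
```

The estimates come out at roughly 1950 and 248 RPS, within about 1% of the measured throughputs, so the throughput and latency rows are consistent with each other. The 2000 RPS ceiling also shows that at this concurrency LLMux is close to the limit set by the mock backend.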

Quick Start

Prerequisites

Go 1.23+
Node.js 18+ (dashboard)
Optional: PostgreSQL for auth/usage tracking
Optional: Redis for distributed routing + rate limiting

Build and Run

git clone https://github.com/blueberrycongee/llmux.git
cd llmux

cp .env.example .env
# Edit .env with your API keys

make build
cp config/config.example.yaml config/config.yaml
./bin/llmux --config config/config.yaml

Run Dashboard

cd ui
npm install
npm run dev

Docker

docker build -t llmux .
docker run -p 8080:8080 -v $(pwd)/config:/config llmux

Configuration

Environment Variables

# Provider API Keys
OPENAI_API_KEY=sk-xxx
ANTHROPIC_API_KEY=sk-ant-xxx
GOOGLE_API_KEY=xxx
AZURE_OPENAI_API_KEY=xxx

# Database (optional, enables enterprise features)
DB_HOST=localhost
DB_USER=llmux
DB_PASSWORD=xxx
DB_NAME=llmux

# Redis (optional, for distributed routing, rate limiting, and caching)
REDIS_ADDR=localhost:6379
REDIS_PASSWORD=xxx

# Dashboard
NEXT_PUBLIC_API_URL=http://localhost:8080

config.yaml (minimal)

server:
  port: 8080
  admin_port: 0
  read_timeout: 30s
  write_timeout: 120s

deployment:
  mode: standalone  # standalone, distributed, development

providers:
  - name: openai
    type: openai
    api_key: ${OPENAI_API_KEY}
    base_url: https://api.openai.com/v1
    models:
      - gpt-4o
      - gpt-4o-mini

routing:
  strategy: simple-shuffle
  fallback_enabled: true
  retry_count: 3
  distributed: false

metrics:
  enabled: true
  path: /metrics

Note: auth.enabled defaults to true in every mode except development. For unauthenticated local testing, set deployment.mode to development and auth.enabled to false.
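The `${OPENAI_API_KEY}` placeholder in the config above is filled from the environment. How LLMux performs this substitution is not shown here; the following is a minimal sketch of one way to do it with the standard library's `os.Expand`. The helper name `expandConfig` and the fail-fast handling of unset variables are assumptions for illustration.

```go
package main

import (
	"fmt"
	"os"
)

// expandConfig replaces ${VAR} placeholders in raw config text using lookup.
// It also returns the names of variables that were not set, so the caller
// can fail fast instead of starting with an empty API key.
func expandConfig(raw string, lookup func(string) (string, bool)) (string, []string) {
	var missing []string
	out := os.Expand(raw, func(name string) string {
		v, ok := lookup(name)
		if !ok {
			missing = append(missing, name)
		}
		return v
	})
	return out, missing
}

func main() {
	raw := "api_key: ${OPENAI_API_KEY}\nbase_url: https://api.openai.com/v1\n"
	expanded, missing := expandConfig(raw, os.LookupEnv)
	if len(missing) > 0 {
		fmt.Println("unset variables:", missing)
		return
	}
	fmt.Print(expanded)
}
```

Note that `os.Expand` also treats a bare `$NAME` as a variable reference, so a config value containing a literal `$` would need escaping or a stricter parser.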

Deployment Modes

standalone: in-memory state, intended for single-instance runs.
distributed: requires PostgreSQL for auth/usage state and Redis for routing stats and rate limiting.
development: allows in-memory state even when running multiple instances, for testing only; state is not shared or kept consistent across instances.

Routing Strategies

Strategy	Description
`simple-shuffle`	Random selection with optional weight/rpm/tpm weighting
`round-robin`	Cycles through deployments, Redis-backed when distributed
`lowest-latency`	Selects deployment with lowest average latency (streaming-aware)
`least-busy`	Selects deployment with fewest active requests
`lowest-tpm-rpm`	Selects deployment with lowest TPM/RPM usage
`lowest-cost`	Selects deployment with lowest cost per token
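To make the selection rules in the table concrete, here is a minimal sketch of three of them (least-busy, lowest-latency, round-robin) over an in-memory slice. The `Deployment` type, its fields, and all function names are hypothetical and not taken from the LLMux source; the Redis-backed behavior of distributed mode is not modeled.

```go
package main

import (
	"fmt"
	"sync/atomic"
	"time"
)

// Deployment is a hypothetical view of one upstream target.
type Deployment struct {
	Name       string
	Active     int           // in-flight requests
	AvgLatency time.Duration // rolling average latency
}

// pickBy returns the deployment with the smallest score, or nil for an
// empty slice. Ties go to the earliest entry.
func pickBy(ds []Deployment, score func(Deployment) int64) *Deployment {
	var best *Deployment
	for i := range ds {
		if best == nil || score(ds[i]) < score(*best) {
			best = &ds[i]
		}
	}
	return best
}

// leastBusy selects the deployment with the fewest active requests.
func leastBusy(ds []Deployment) *Deployment {
	return pickBy(ds, func(d Deployment) int64 { return int64(d.Active) })
}

// lowestLatency selects the deployment with the lowest average latency.
func lowestLatency(ds []Deployment) *Deployment {
	return pickBy(ds, func(d Deployment) int64 { return int64(d.AvgLatency) })
}

// roundRobin cycles through deployments using a shared atomic counter,
// which keeps it safe for concurrent callers within one process.
type roundRobin struct{ next atomic.Uint64 }

func (r *roundRobin) pick(ds []Deployment) *Deployment {
	if len(ds) == 0 {
		return nil
	}
	n := r.next.Add(1) - 1
	return &ds[n%uint64(len(ds))]
}

func main() {
	ds := []Deployment{
		{Name: "openai-a", Active: 4, AvgLatency: 320 * time.Millisecond},
		{Name: "openai-b", Active: 1, AvgLatency: 450 * time.Millisecond},
	}
	fmt.Println("least-busy:", leastBusy(ds).Name)
	fmt.Println("lowest-latency:", lowestLatency(ds).Name)
	var rr roundRobin
	fmt.Println("round-robin:", rr.pick(ds).Name, rr.pick(ds).Name, rr.pick(ds).Name)
}
```

With this sample data the two metric-based strategies disagree: least-busy picks openai-b while lowest-latency picks openai-a, which is the kind of trade-off the strategy setting controls.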

API Reference

Chat Completions

curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $API_KEY" \
  -d '{
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Hello!"}],
    "stream": false
  }'
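The same request can be sent from Go with only the standard library. This is a sketch that mirrors the curl call above; the type and function names are illustrative, and the response is decoded into a generic map because the response schema is not described in this document.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"os"
)

type message struct {
	Role    string `json:"role"`
	Content string `json:"content"`
}

type chatRequest struct {
	Model    string    `json:"model"`
	Messages []message `json:"messages"`
	Stream   bool      `json:"stream"`
}

// newChatRequest builds the same POST as the curl example: a JSON body with
// one user message, plus Content-Type and Bearer Authorization headers.
func newChatRequest(baseURL, apiKey, model, prompt string) (*http.Request, error) {
	body, err := json.Marshal(chatRequest{
		Model:    model,
		Messages: []message{{Role: "user", Content: prompt}},
	})
	if err != nil {
		return nil, err
	}
	req, err := http.NewRequest(http.MethodPost, baseURL+"/v1/chat/completions", bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json")
	req.Header.Set("Authorization", "Bearer "+apiKey)
	return req, nil
}

func main() {
	req, err := newChatRequest("http://localhost:8080", os.Getenv("API_KEY"), "gpt-4o", "Hello!")
	if err != nil {
		fmt.Println("build request:", err)
		return
	}
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		fmt.Println("request failed:", err)
		return
	}
	defer resp.Body.Close()

	var out map[string]any
	if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
		fmt.Println("decode:", err)
		return
	}
	fmt.Println(resp.Status, out)
}
```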

Responses

curl http://localhost:8080/v1/responses \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $API_KEY" \
  -d '{
    "model": "gpt-4o",
    "input": "Hello!"
  }'

List Models

curl http://localhost:8080/v1/models

Health Check

curl http://localhost:8080/health/live
curl http://localhost:8080/health/ready
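In scripts and integration tests it is common to wait for the readiness endpoint before sending traffic. The sketch below polls `/health/ready` a bounded number of times. It assumes the endpoint answers 200 OK once the gateway is ready and something else (or a connection error) before that, which this document does not state explicitly; `probe`, `httpProbe`, and `waitReady` are hypothetical names.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
	"time"
)

// probe reports the HTTP status code returned by a health endpoint.
type probe func() (int, error)

// httpProbe returns a probe that issues GET url with a short timeout.
func httpProbe(url string) probe {
	client := &http.Client{Timeout: 2 * time.Second}
	return func() (int, error) {
		resp, err := client.Get(url)
		if err != nil {
			return 0, err
		}
		resp.Body.Close()
		return resp.StatusCode, nil
	}
}

// waitReady calls p up to attempts times, sleeping delay between tries, and
// returns nil as soon as p reports 200 OK.
func waitReady(p probe, attempts int, delay time.Duration) error {
	for i := 0; i < attempts; i++ {
		if code, err := p(); err == nil && code == http.StatusOK {
			return nil
		}
		if i < attempts-1 {
			time.Sleep(delay)
		}
	}
	return errors.New("gateway not ready")
}

func main() {
	err := waitReady(httpProbe("http://localhost:8080/health/ready"), 5, 200*time.Millisecond)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println("ready")
}
```

Separating the probe from the retry loop keeps the loop testable without a running gateway.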

Management API

Management endpoints (and the dashboard UI) are exposed only when server.admin_port is set to a value greater than 0; the minimal config above sets it to 0, which leaves them disabled. Database configuration controls whether management state is persisted (Postgres) or ephemeral (in-memory).

Bootstrap (optional): set auth.bootstrap_token in the config and pass the token in the X-LLMux-Bootstrap-Token request header to access management endpoints during initial setup.

Key categories:

Keys: /key/*
Users: /user/*
Teams: /team/*
Organizations: /organization/*
Spend/usage: /spend/*, /global/*
Audit: /audit/*
Control: /control/*
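A bootstrap call is an ordinary HTTP request to the admin listener with the token header attached. The sketch below only shows how the header is set; the admin base URL and the endpoint path are taken from the command line because concrete paths under the categories above are not listed here, and the environment variable `LLMUX_BOOTSTRAP_TOKEN` is a hypothetical name chosen for this example.

```go
package main

import (
	"fmt"
	"net/http"
	"os"
	"strings"
)

// newManagementRequest builds a request against the admin listener and
// attaches the bootstrap token header described above.
func newManagementRequest(method, adminBaseURL, path, bootstrapToken string) (*http.Request, error) {
	url := strings.TrimRight(adminBaseURL, "/") + "/" + strings.TrimLeft(path, "/")
	req, err := http.NewRequest(method, url, nil)
	if err != nil {
		return nil, err
	}
	req.Header.Set("X-LLMux-Bootstrap-Token", bootstrapToken)
	return req, nil
}

func main() {
	if len(os.Args) != 3 {
		fmt.Println("usage: mgmt <admin-base-url> <path>")
		return
	}
	// LLMUX_BOOTSTRAP_TOKEN is an illustrative variable name, not an LLMux setting.
	req, err := newManagementRequest(http.MethodGet, os.Args[1], os.Args[2], os.Getenv("LLMUX_BOOTSTRAP_TOKEN"))
	if err != nil {
		fmt.Println("build request:", err)
		return
	}
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		fmt.Println("request failed:", err)
		return
	}
	defer resp.Body.Close()
	fmt.Println(resp.Status)
}
```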

Operations and Observability

Metrics: Prometheus at metrics.path (default /metrics)
Tracing: OpenTelemetry exporter configuration via tracing.* settings
Logs: structured JSON logs from the gateway and management APIs
Auditing: append-only audit logs when the audit store is configured

Production Notes

Standalone mode is single-node and uses in-memory state only.
Distributed mode requires Postgres for auth/usage and Redis for routing stats and rate limiting; missing dependencies degrade related features.
/v1/audio/* and /v1/batches currently return invalid_request_error until provider support is implemented.
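Clients that probe the unimplemented endpoints can detect this case by inspecting the error type. The sketch below assumes LLMux wraps errors in the OpenAI-style envelope (`{"error": {"type": ..., "message": ...}}`), which is an assumption based on its OpenAI-compatible API and not something this document confirms. The sample body is written for illustration, not captured from a real gateway.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// apiError models the assumed OpenAI-style error envelope.
type apiError struct {
	Error struct {
		Type    string `json:"type"`
		Message string `json:"message"`
	} `json:"error"`
}

// errorType extracts error.type from a response body. It returns "" when
// the body is not valid JSON or does not carry the assumed envelope.
func errorType(body []byte) string {
	var e apiError
	if err := json.Unmarshal(body, &e); err != nil {
		return ""
	}
	return e.Error.Type
}

func main() {
	body := []byte(`{"error":{"type":"invalid_request_error","message":"not supported"}}`)
	if errorType(body) == "invalid_request_error" {
		fmt.Println("endpoint not supported yet; skipping")
	}
}
```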

Project Structure

llmux/
|-- cmd/server/           # Gateway entry point
|-- config/               # Configuration files
|-- internal/
|   |-- api/              # HTTP handlers & management endpoints
|   |-- auth/             # Authentication, authorization & stores
|   |-- cache/            # Response caching (local/redis/dual)
|   |-- config/           # Configuration loading
|   |-- metrics/          # Prometheus & OpenTelemetry
|   `-- router/           # Request routing strategies
|-- providers/            # LLM provider adapters
|-- pkg/
|   |-- types/            # Shared types
|   `-- errors/           # Error definitions
|-- ui/                   # Next.js Dashboard
|-- deploy/               # Deployment configs
|-- bench/                # Benchmark tools
`-- tests/                # Integration tests

Developer Info

Documentation

docs/DEVELOPMENT.md
docs/PRODUCTION_TEST_GUIDE.md
docs/runbooks/DISTRIBUTED_MODE.md

Development Commands

make test
make coverage
make lint
make fmt
make check

Contributing

See CONTRIBUTING.md for contribution guidelines.

License

MIT License - see LICENSE
