A simple framework for using Claude Code or Codex CLI as the frontend to any cloud or local LLM on Apple Silicon. Connect locally via LiteLLM + MLX or LM Studio, or remotely via Z.AI, Gemini/Google AI Studio, DeepSeek, or OpenRouter.
┌────────────────────────────────────────────────┐
│               AI CLI SWITCHBOARD               │
│   One Interface → Any LLM (Cloud or Local)     │
└────────────────────────────────────────────────┘
  ┌─────────────┐               ┌─────────────┐
  │ Claude Code │               │  Codex CLI  │
  └──┬───────┬──┘               └──────┬──────┘
     │       │                         │
ANTHROPIC_BASE_URL              Codex profiles
+ ANTHROPIC_API_KEY              (config.yaml)
     │       │                         │
     │       └─────────────┬───────────┘
     │                     │
open.bigmodel.cn    localhost:18080
 (Claude only)             │
     │                     ▼
     ▼                ┌──────────┐
┌──────────┐          │ LiteLLM  │
│   Z.AI   │          │  Proxy   │
│  Direct  │          │  :18080  │
│ GLM-4.5  │          └────┬─────┘
└──────────┘               │
             ┌─────────────┼─────────────┐
             │             │             │
             ▼             ▼             ▼
          ┌──────┐  ┌─────────┐   ┌─────────┐
          │ MLX  │  │   LM    │   │ Remote  │
          │18081 │  │ Studio  │   │  APIs   │
          └──┬───┘  └────┬────┘   └────┬────┘
             │           │             │
             ▼           ▼             ▼
          ┌──────┐  ┌─────────┐   ┌─────────┐
          │GLM9B │  │  Llama  │   │DeepSeek │
          │GLM32 │  │  Qwen   │   │ Gemini  │
          └──────┘  └─────────┘   └─────────┘
           Apple      Better        Cloud
           Silicon    Tools          APIs
- macOS with Apple Silicon (M1/M2/M3/M4) for local MLX models
- Python 3.10+
- Claude Code - Anthropic's official CLI (install from https://github.com/anthropics/claude-code)
- LM Studio (optional) - For better local model management and tool calling
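To confirm the hardware and Python baseline before running setup, a quick check from the terminal:
# Should print "arm64" on Apple Silicon and a Python version of 3.10 or newer
uname -m
python3 --version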
# Run the setup script (installs dependencies)
./setup.sh
Manual installation:
# Install LiteLLM (required for all models)
pip3 install 'litellm[proxy]'
# Install MLX (required for MLX local models only)
pip3 install mlx-lm
# Optional: Install LM Studio for enhanced local model support
# Download from: https://lmstudio.ai/
# Set Python path for user installs (improves command availability)
PYTHON_VERSION=$(python3 --version | cut -d' ' -f2 | cut -d'.' -f1-2)
export PATH="$HOME/Library/Python/$PYTHON_VERSION/bin:$PATH"
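To confirm the installs are reachable from your shell, a couple of PATH sanity checks:
# Verify the litellm entry point is on PATH and that mlx-lm imports cleanly
command -v litellm
python3 -c "import mlx_lm; print('mlx-lm OK')"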
Create a .env file for API keys:
# Required for remote models
DEEPSEEK_API_KEY=your_deepseek_key
GEMINI_API_KEY=your_gemini_key
OPENROUTER_API_KEY=your_openrouter_key
# List available configurations
ls configs/
# Remote models via LiteLLM (require API keys)
./scripts/start-remote.sh configs/remote-deepseek.yaml
# Remote models via Z.AI direct (require Z.AI API key)
./scripts/claude-zai.sh # GLM-4.5
# Local models via MLX (no API keys needed, requires model download)
./scripts/start-local.sh configs/local-glm-9b.yaml
# Local models via LM Studio (better tool calling support)
./scripts/start-lmstudio.sh configs/lmstudio-llama-groq-tool.yaml
# See all available models
ls configs/
# Option 1: Set environment variable each time
ANTHROPIC_BASE_URL=http://localhost:18080 ANTHROPIC_API_KEY=dummy-key claude
# Option 2: Use the claudel alias (after setting up aliases)
claudel
Note: When starting a claudel session, you may see this authentication warning:
⚠ Auth conflict: Both a token (claude.ai) and an API key (ANTHROPIC_API_KEY) are set. This may lead to unexpected behavior.
This warning can be safely ignored when using local models via the proxy. The claudel alias is designed to work with this configuration.
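The claudel alias itself is just a shorthand for Option 1; a minimal sketch, assuming the default proxy port (the generated version in ai-aliases.sh may differ slightly):
# Minimal sketch of the claudel alias; see ai-aliases.sh for the generated version
alias claudel='ANTHROPIC_BASE_URL=http://localhost:18080 ANTHROPIC_API_KEY=dummy-key claude'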
# Install Codex CLI if needed
npm install -g @openai/codex
# Step 1: Generate Codex profiles
./scripts/setup-codex.sh
# Step 2: Generate shell aliases (includes codex-* shortcuts)
./scripts/setup-aliases.sh
source ai-aliases.sh
# Now use Codex with any model (starts backend automatically)
codex-models # List all available Codex profiles
codex-local-glm-9b # Start GLM-9B backend + launch Codex
codex-lmstudio-llama-groq # Start Llama Groq backend + launch Codex
# Pass Codex flags as usual
codex-local-glm-9b --sandbox danger-full-access
Codex profiles mirror the Claude aliases. Each codex-* helper first boots the corresponding configuration (calling start-local.sh, start-lmstudio.sh, or start-remote.sh) and then launches Codex pointed at the LiteLLM proxy (http://localhost:18080/v1). The helper also sets LITELLM_API_KEY=${LITELLM_API_KEY:-dummy-key} so it works out of the box; export a different key beforehand if your proxy requires one.
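Conceptually, each helper behaves like the sketch below. The function body, the --profile usage, and the profile name local-glm-9b are illustrative assumptions; the generated code in ai-aliases.sh is the authoritative version.
# Rough sketch of what a codex-* helper does; names and flags may differ from the generated code
codex-local-glm-9b() {
  ./scripts/start-local.sh configs/local-glm-9b.yaml || return
  LITELLM_API_KEY="${LITELLM_API_KEY:-dummy-key}" codex --profile local-glm-9b "$@"
}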
# Start models by type
./scripts/start-remote.sh <config> # Start remote model via LiteLLM
./scripts/start-local.sh <config> # Start local model via MLX
./scripts/start-lmstudio.sh <config> # Start local model via LM Studio
./scripts/claude-zai.sh [--air] # Start GLM model via Z.AI direct
# Universal management
./scripts/stop.sh # Stop all services
./scripts/status.sh # Check server status
The setup script now automatically generates aliases from config metadata - no manual updates needed when adding new models!
./scripts/setup-aliases.sh # Creates ai-aliases.sh from configs
source ai-aliases.sh # Load aliases
# Now use shortcuts like:
claude-remote-deepseek # Start DeepSeek-R1 (example remote)
claude-local-glm-9b # Start local GLM-9B (example local)
claude-lmstudio-llama-groq # Start Llama 3 Groq via LM Studio (example)
claude-stop # Stop services
claude-status # Check status
claude-models # Show all available commands
claudel # Run Claude Code with local proxy
🔧 How it works:
- Each config file has an alias_config section with metadata
- setup-aliases.sh reads all configs and generates aliases automatically
- Adding new configs = new aliases automatically appear
- No more manual alias maintenance!
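As a rough example, a generated entry might look like the following (the alias name and body are assumptions; check ai-aliases.sh for the real output):
# Hypothetical example of a generated alias; the actual definitions live in ai-aliases.sh
alias claude-local-glm-9b='./scripts/start-local.sh configs/local-glm-9b.yaml'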
The system supports four different runner types, each optimized for specific model deployment scenarios:
- Engine: MLX framework for Apple Silicon optimization
- Memory usage: Varies by model (1-30GB)
- Requirements: Apple Silicon Mac, MLX installed
- Example: GLM-4-9B
- Best for: Fast local inference on Apple Silicon
- Engine: LM Studio with enhanced tool calling support
- Memory usage: Varies by model (1-8GB)
- Requirements: LM Studio installed, models downloaded via LM Studio
- Example: Llama 3 Groq Tool Use
- Best for: Models requiring better function calling capabilities
- Engine: LiteLLM proxy for API model abstraction
- Requirements: API keys, internet connection
- Example: DeepSeek-R1
- Best for: Access to state-of-the-art remote models
- Engine: Direct connection to Z.AI endpoints
- Requirements: Z.AI API key, no local proxy needed
- Example: GLM-4.5
- Best for: Direct access to GLM models without proxy overhead
The system supports 20+ models across four categories. Use ls configs/ to see all available configurations.
- Example: DeepSeek-R1 - Advanced reasoning model ($2.19/1M output tokens)
- See configs/remote-*.yaml for all remote options
- Example: GLM-4-9B - Smaller, faster model (~2GB memory)
- See configs/local-*.yaml for all local options
- Example: Llama 3 Groq 8B Tool Use - Specialized for function calling (~5GB memory, 89.06% BFCL)
- See configs/lmstudio-*.yaml for all LM Studio options
To add a new model:
1. Create a config file in configs/ following the naming pattern:
   - remote-{name}.yaml for remote models
   - local-{name}.yaml for MLX local models
   - lmstudio-{name}.yaml for LM Studio models
2. Copy an existing config as a template and modify:
   - Model name and settings
   - Runner type (remote_litellm, local_mlx, or local_lmstudio)
   - Alias configuration for automatic alias generation
3. Regenerate aliases:
   ./scripts/setup-aliases.sh
   source ai-aliases.sh
4. Your new model will automatically appear in the claude-models list! (See the example workflow below.)
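For example, to derive a new local MLX model from an existing config (the name mymodel and the resulting alias are placeholders, not files that ship with the repo):
# Illustrative workflow; "mymodel" is a placeholder name
cp configs/local-glm-9b.yaml configs/local-mymodel.yaml
# edit the model name, runner type, and alias_config in the new file, then:
./scripts/setup-aliases.sh
source ai-aliases.sh
claude-models   # the new alias should now be listed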
- Proxy: LiteLLM on port 18080 (for remote_litellm runner type)
- Local Engines: MLX for Apple Silicon optimization, LM Studio for enhanced tool calling
- Direct Connections: Z.AI direct endpoints (no proxy needed)
- API Compatible: Works seamlessly with Claude Code interface
- Environment: Automatic .env loading for API keys
- Config System: YAML-based model configurations with metadata-driven alias generation
- Service Management: Unified stop/start functionality across all runner types
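Because the proxy exposes LiteLLM's standard OpenAI-compatible API on port 18080, you can verify it is serving after starting any backend:
# Quick check that the LiteLLM proxy is up (lists the models it is configured to serve)
curl -s http://localhost:18080/v1/models -H "Authorization: Bearer ${LITELLM_API_KEY:-dummy-key}"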
- Won't start: Check litellm-proxy.log for errors
- Port conflict: Another service is using port 18080 (see the port check at the end of this section)
- API key errors: Verify keys in the .env file
- Model not found: Models download automatically on first use
- Memory issues: Use GLM-9B for lower memory usage
- MLX errors: Ensure you're on Apple Silicon with MLX installed
- Authentication failed: Check API keys in .env
- Rate limiting: Most services have usage limits
- Billing required: Gemini models require billing setup
Model used by GenerateContent request (models/gemini-2.5-flash-lite) and CachedContent (models/gemini-2.5-pro) has to be the same.
This occurs because Google's Vertex AI service caches content based on message content rather than model names. When switching between different Gemini models, cached content from one model may conflict with another.
Workarounds:
- Use only one Gemini model type per session
- Wait for cached content to expire naturally (typically 1 hour)
This is a known limitation of Google's Vertex AI context caching system when used with LiteLLM.
- litellm command not found: Run pip3 install 'litellm[proxy]' --user and add the Python user bin directory to your PATH (e.g., export PATH="$HOME/Library/Python/$(python3 --version | cut -d' ' -f2 | cut -d'.' -f1-2)/bin:$PATH")
- claude command not found: Install Claude Code from https://github.com/anthropics/claude-code
- MLX installation: Requires an Apple Silicon Mac
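If the proxy will not start because port 18080 is already taken, you can identify the process holding it:
# Show whatever is currently listening on the proxy port
lsof -nP -iTCP:18080 -sTCP:LISTEN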
configs/                         # Model configuration files
├── local-*.yaml                 # Local MLX models
├── lmstudio-*.yaml              # LM Studio models
├── remote-*.yaml                # Remote API models
└── openrouter-*.yaml            # OpenRouter models
scripts/                         # Management scripts
├── start-remote.sh              # Start remote models
├── start-local.sh               # Start local models
├── start-lmstudio.sh            # Start LM Studio models
├── claude-zai.sh                # Start Z.AI models directly
├── stop.sh                      # Stop all services
├── status.sh                    # Check status
├── setup-aliases.sh             # Create convenience aliases
├── setup-codex.sh               # Setup Codex CLI profiles
├── common-utils.sh              # Shared utility functions
└── download-lmstudio-model.sh   # Download LM Studio models
setup.sh                         # Main setup script
.env                             # API keys (create this)
ai-aliases.sh                    # Generated aliases (after setup)
litellm-proxy.log                # LiteLLM proxy logs
mlx-server.log                   # MLX server logs (when applicable)
AGENTS.md                        # Claude Code agents documentation
Note: The claude command by default connects to Anthropic's servers. To use your local models, either set the environment variables manually or use the claudel alias provided in the setup.