Run Docker's AI Model Runner with native GPU acceleration on macOS Apple Silicon — without Docker Desktop.
This project provides an automated setup script that configures Docker Model Runner to use your Mac's GPU (Metal) for AI inference, using Colima instead of Docker Desktop. Perfect for running local LLMs with optimal performance.
Docker has open-sourced its Model Runner and CLI plugin, but they're designed to work exclusively with Docker Desktop. When you try to use `docker model` with open-source alternatives like Colima on macOS, you lose GPU acceleration and fall back to CPU-only inference, resulting in significantly slower performance.
Docker Desktop achieves GPU acceleration by running llama.cpp in an Apple native sandbox with Metal support. However, there was no equivalent solution for users who want a fully open-source Docker setup with Colima.
This project provides a forked model-runner with patches that enable the same GPU-accelerated architecture outside of Docker Desktop:
- Runs llama.cpp in Apple's native sandbox: Same security model as Docker Desktop, with Metal GPU acceleration
- Works seamlessly with Colima: Fully open-source Docker environment
- Drop-in replacement: Uses the same `docker model` CLI commands
- Host-level service: model-runner runs as a macOS LaunchAgent for direct GPU access
- Native GPU Performance: Full Metal acceleration via llama.cpp, just like Docker Desktop
- Fully Open Source: Both Colima and our forked model-runner are open source
- Sandboxed Security: Models run in Apple's sandbox for container-like security isolation
- OpenAI-Compatible API: Drop-in replacement for OpenAI endpoints at `localhost:12434`
- Container Integration: Accessible from Colima containers via `host.lima.internal:12434`
- Automatic Service Management: Runs as a macOS LaunchAgent (starts automatically at login)
- macOS with Apple Silicon (M1/M2/M3/M4)
- Homebrew installed
- Colima installed: `brew install colima`
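If you want to confirm the prerequisites before running the setup script, a minimal check looks like this (`colima version` only needs to print a version string; any recent release should work):

```bash
# Confirm Apple Silicon, Homebrew, and Colima are all in place
uname -m         # should print arm64
brew --version
colima version
```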
```bash
git clone https://github.com/Liquescent-Development/colima-model-runner.git
cd colima-model-runner
./setup-colima-gpu-model-runner.sh
```

The script will:
- Install dependencies (llama.cpp, Docker CLI)
- Download pre-built model-runner binary from our GPU-optimized fork
- Configure model-runner as a macOS LaunchAgent service
- Set up Colima for Docker support
- Test GPU support and connectivity
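Once the script finishes (and your shell has picked up the new environment, as shown in the next step), a quick sanity check might look like this; each command is the same check used later in this README:

```bash
# Is the LaunchAgent loaded?
launchctl list | grep model-runner

# Is the API answering on the host?
curl -s http://localhost:12434/models

# Is the docker model CLI talking to the runner?
docker model ls
```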
Reload your shell so the new environment variables take effect:

```bash
source ~/.zshrc   # or ~/.bashrc if you use bash
```

```bash
# Pull a model
docker model pull ai/smollm2

# List installed models
docker model ls

# Run inference (interactive)
docker model run ai/smollm2

# Run inference (single prompt)
docker model run ai/smollm2 "Explain quantum computing in simple terms"
```

Common commands:

```bash
# Interactive chat
docker model run ai/smollm2

# Single prompt
docker model run ai/smollm2 "Write a haiku about code"

# List models
docker model ls

# Remove a model
docker model rm ai/smollm2
```

Call the OpenAI-compatible API directly:

```bash
curl http://localhost:12434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "ai/smollm2",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
```

```bash
# Use from any container running in Colima
docker run -e OPENAI_API_BASE=http://host.lima.internal:12434/v1 your-app
```
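To verify that containers can actually reach the runner through the Lima host alias, a quick sketch (using `busybox` only because it ships with a small HTTP client; any image with curl or wget works):

```bash
# From inside any Colima container, the runner is reachable via host.lima.internal
docker run --rm busybox wget -qO- http://host.lima.internal:12434/models
```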
From Python, using the OpenAI SDK:

```python
import openai

client = openai.OpenAI(
    base_url="http://localhost:12434/v1",
    api_key="not-needed"
)

response = client.chat.completions.create(
    model="ai/smollm2",
    messages=[{"role": "user", "content": "Hello!"}]
)

print(response.choices[0].message.content)
```

How the pieces fit together:

```
┌───────────────────────────────────────────────────────────────┐
│                          macOS Host                           │
│                                                               │
│  ┌─────────────────────────────────────────────────────────┐  │
│  │ model-runner (LaunchAgent)                              │  │
│  │ • Port: 12434                                           │  │
│  │ • GPU: Metal acceleration via llama.cpp                 │  │
│  │ • API: OpenAI-compatible                                │  │
│  └─────────────────────────────────────────────────────────┘  │
│                               ▲                               │
│                               │ http://localhost:12434        │
│                               │                               │
│  ┌────────────────────────────┴────────────────────────────┐  │
│  │ docker model CLI                                        │  │
│  └─────────────────────────────────────────────────────────┘  │
│                                                               │
│  ┌─────────────────────────────────────────────────────────┐  │
│  │ Colima (Lima VM)                                        │  │
│  │ • Docker daemon                                         │  │
│  │ • Containers access via host.lima.internal:12434        │  │
│  └─────────────────────────────────────────────────────────┘  │
└───────────────────────────────────────────────────────────────┘
```
Check that the service is running:

```bash
launchctl list | grep model-runner
```

View the logs:

```bash
# Standard output
tail -f ~/Library/Logs/model-runner.log

# Errors
tail -f ~/Library/Logs/model-runner.err
```

Restart the service:

```bash
launchctl unload ~/Library/LaunchAgents/com.liquescent.model-runner.plist
launchctl load ~/Library/LaunchAgents/com.liquescent.model-runner.plist
```

Stop the service:

```bash
launchctl unload ~/Library/LaunchAgents/com.liquescent.model-runner.plist
```

Monitor GPU usage during inference:

```bash
sudo powermetrics --samplers gpu_power -i 1000
```

Check that GPU support is enabled:

```bash
grep "gpuSupport=true" ~/Library/Logs/model-runner.log
```

You should see output confirming that GPU support is active.
- Check the logs: `tail -f ~/Library/Logs/model-runner.err`
- Verify the binary exists: `ls -la ~/.local/bin/model-runner`
- Ensure the Xcode Command Line Tools are installed: `xcode-select -p`
- Verify you're on Apple Silicon: `uname -m` (should show `arm64`)
- Check the llama.cpp installation: `which llama-server`
- Review the model-runner logs for Metal initialization (see the sketch below)
```bash
# Ensure environment variable is set
echo $MODEL_RUNNER_HOST   # Should show http://localhost:12434

# Test connectivity
curl http://localhost:12434/models

# Reinstall Docker CLI
brew upgrade docker
```

```bash
# Restart Colima
colima stop
colima start --cpu 4 --memory 8 --disk 60 --vm-type=vz

# Check status
colima status
```

The setup uses the following defaults:

- Service Port: 12434
- Binary Location: `~/.local/bin/model-runner`
- Installation Directory: `~/.local/share/model-runner`
- Logs: `~/Library/Logs/model-runner.{log,err}`
- LaunchAgent: `~/Library/LaunchAgents/com.liquescent.model-runner.plist`
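If you want to see exactly what the installed service runs, macOS's `plutil` can pretty-print the LaunchAgent definition (read-only; shown here purely as a convenience):

```bash
# Dump the LaunchAgent plist installed by the setup script
plutil -p ~/Library/LaunchAgents/com.liquescent.model-runner.plist
```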
Environment variables used by the service:

```bash
MODEL_RUNNER_HOST=http://localhost:12434
MODEL_RUNNER_PORT=12434
LLAMA_SERVER_PATH=/opt/homebrew/bin
```

- Choose the Right Model Size: Start with quantized models (Q4_K_M) for the best speed/quality balance (see the example after this list)
- Monitor GPU Usage: Use `powermetrics` to verify GPU utilization during inference
- Colima Resources: Allocate enough RAM and CPU for your containers
- Model Caching: Pulled models are cached locally for fast reuse
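As an illustration of the model-size tip above: quantized variants are usually published as separate tags on the model repository. The tag below is hypothetical, so check the model's page on Docker Hub for the tags that actually exist:

```bash
# Hypothetical tag shown for illustration; real tag names vary per model
docker model pull ai/smollm2:360M-Q4_K_M
```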
```bash
# Stop and remove service
launchctl unload ~/Library/LaunchAgents/com.liquescent.model-runner.plist
rm ~/Library/LaunchAgents/com.liquescent.model-runner.plist

# Remove binaries and data
rm -rf ~/.local/share/model-runner
rm ~/.local/bin/model-runner

# Remove environment variable from shell profile
# (edit ~/.zshrc or ~/.bashrc and remove the MODEL_RUNNER_HOST export)
```

Contributions are welcome! Please feel free to submit issues or pull requests.
- Docker Model Runner - Official Docker AI documentation
- Liquescent-Development/model-runner - Our GPU-optimized fork
- Colima - Container runtime for macOS
- llama.cpp - LLM inference with Metal support
See LICENSE file for details.
- Issues: GitHub Issues
- Documentation: See CLAUDE.md for development details
- Docker Model Runner Docs: https://docs.docker.com/ai/model-runner/