Deploy a powerful, private AI content engine in under an hour on your favorite cloud provider. This guide provides a full-stack, Docker-based solution optimized for ARM CPUs but easily adaptable for x86 and NVIDIA GPUs.
The Private AI Stack is a self-contained system for running LLM inference, automation workflows, and creative UI experiences on your own hardware, whether cloud or local.
It uses:
- Docker Compose for orchestration
- llama.cpp (Ampere-tuned) for inference
- OpenWebUI for chat + creative UI
- n8n for workflow automation
- PostgreSQL and Redis for persistence
- Traefik for routing and reverse proxy
## Contents

- Quick Start
- Cloud Deployment (OCI, AWS, GCP)
- Local Deployment (Mac, Windows, Linux)
- AI Model Configuration
- Environment and Override Files
- Static Network Configuration
- Verifying Connectivity
- Security Tips
- Next Steps
- License
## Quick Start

Prerequisites:

- Docker + Docker Compose (v2+)
- Git
- 16GB+ RAM recommended (24GB+ optimal)
- Optional: GPU (NVIDIA, AMD ROCm, or Apple Metal)
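A quick way to confirm the prerequisite tooling is in place (`free -h` is Linux-specific; skip it on macOS):

```bash
# Confirm the prerequisite tooling is present
docker --version
docker compose version   # should report Compose v2 or later
git --version
free -h                  # available RAM (16GB+ recommended); Linux only
```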
## Cloud Deployment (OCI, AWS, GCP)

Follow these steps for cloud setup (Oracle Cloud Infrastructure recommended for best ARM performance).
| Provider | Recommended Specs |
|---|---|
| OCI (Ampere) | 4 OCPUs (A1.Flex), 24GB RAM |
| AWS | c7g.4xlarge (Graviton3 ARM) |
| GCP | t2a-standard-8 (ARM-based) |
Use Ubuntu 22.04 LTS and ensure you open these inbound ports (a firewall sketch follows the list):
- 80, 443 (HTTP/HTTPS)
- 8080 (LLM Server)
- 5678 (n8n)
- 3000 (OpenWebUI)
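If you manage the host firewall with ufw, that might look like the sketch below; note that the cloud-level firewall (OCI security lists, AWS security groups, GCP firewall rules) must also allow these ports, or traffic never reaches the host:

```bash
# Open the stack's inbound ports with ufw (Ubuntu)
sudo ufw allow 22/tcp     # keep SSH reachable before enabling ufw
sudo ufw allow 80/tcp
sudo ufw allow 443/tcp
sudo ufw allow 8080/tcp   # LLM server
sudo ufw allow 5678/tcp   # n8n
sudo ufw allow 3000/tcp   # OpenWebUI
sudo ufw enable
```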
```bash
ssh ubuntu@<your_server_ip>
sudo apt update && sudo apt install -y docker.io docker-compose git
git clone https://github.com/pantaleone-ai/private-ai-stack.git
cd private-ai-stack
```
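Optionally, add your user to the docker group so Docker commands work without sudo (a standard Docker post-install step; log out and back in for the new group to apply):

```bash
# Run Docker without sudo (takes effect after you log out and back in)
sudo usermod -aG docker "$USER"
```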
Then configure and launch the stack:

```bash
cp .env.example .env
docker compose up -d
```

## Local Deployment (Mac, Windows, Linux)

| Platform | Requirements |
|---|---|
| Mac M1/M2/M3 | Docker Desktop (Apple Silicon), Metal acceleration supported |
| Windows 11 | Docker Desktop with WSL2 enabled |
| Ubuntu Linux | Docker Engine + Compose Plugin |
```bash
git clone https://github.com/pantaleone-ai/private-ai-stack.git
cd private-ai-stack
cp .env.example .env
```

You can now edit `.env` to match your system specs and choose CPU or GPU mode.
## AI Model Configuration

Set `LLAMA_ACCEL` in `.env` to match your hardware:

- `LLAMA_ACCEL=cpu` (CPU-only, runs anywhere)
- `LLAMA_ACCEL=cuda` (NVIDIA GPUs)
- `LLAMA_ACCEL=metal` (Apple Silicon)
- `LLAMA_ACCEL=rocm` (AMD GPUs)

Then rebuild and restart the inference service:
```bash
docker compose build llama
docker compose up -d
```

The model path used by the stack is `/models/qwen-3-4b-2507.gguf`. Download models from anywhere, such as:
- huggingface.co
- Ampere-optimized models: https://huggingface.co/AmpereComputing

Then place your `.gguf` file in the `/models` directory.
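For example, downloading a GGUF file directly from Hugging Face might look like this. The repository and filename below are illustrative placeholders, and this assumes the project mounts a local `models/` directory into the container at `/models`; check the compose file for the actual volume path:

```bash
# Illustrative only -- substitute a real repository and .gguf filename.
# Hugging Face serves raw files at /resolve/main/<filename>.
wget -P ./models "https://huggingface.co/<org>/<repo>/resolve/main/<model>.gguf"
```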
## Environment and Override Files

This project includes:

- `.env.example`: editable environment settings
- `docker-compose.override.yml`: static IPs, resource controls, and GPU toggles
```bash
cp .env.example .env
docker compose up -d
```

Modify `.env` to customize (a sketch follows the list):
- Model location
- Thread count
- Port mappings
- Auth credentials
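As a sketch of what that might look like: apart from `LLAMA_ACCEL`, every key name below is a hypothetical placeholder, so take the real names from `.env.example`:

```bash
# Hypothetical .env sketch: apart from LLAMA_ACCEL, these key names are
# placeholders, not the project's real variables -- see .env.example.
LLAMA_ACCEL=cpu
MODEL_PATH=/models/qwen-3-4b-2507.gguf  # model location
LLAMA_THREADS=8                         # thread count
LLAMA_PORT=8080                         # port mapping
N8N_USER=admin                          # auth credentials
N8N_PASSWORD=change-me
```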
| Profile | Includes | Start Command |
|---|---|---|
| `observability` | Adds LangFuse + Prometheus | `docker compose --profile observability up -d` |
| `secure-tunnel` | Adds Cloudflared tunnel | `docker compose --profile secure-tunnel up -d` |
These can be added in `docker-compose.override.yml` as modular profiles for clean separation, as sketched below.
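A minimal sketch of what such a profile entry could look like; the service and image here are illustrative examples rather than the project's actual definitions, and the sketch writes to a separate file so the project's own override stays untouched:

```bash
# Illustrative sketch: gate an extra service behind the "observability"
# profile (service/image names are examples, not the project's own).
# Apply with: docker compose -f docker-compose.yml -f profiles.yml up -d
cat > profiles.yml <<'EOF'
services:
  prometheus:
    image: prom/prometheus:latest
    profiles: ["observability"]
    ports:
      - "9090:9090"
EOF
```

Profiled services are skipped by a plain `docker compose up -d` and start only when their profile is requested with `--profile`.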
## Static Network Configuration

Services are pinned to static addresses on the Docker network (a compose sketch follows the table):

| Service | IP | Port |
|---|---|---|
| Postgres | 172.18.0.5 | 5432 |
| Redis | 172.18.0.10 | 6379 |
| n8n | 172.18.0.6 | 5678 |
| OpenWebUI | 172.18.0.7 | 3000 |
| Llama.cpp | 172.18.0.8 | 8080 |
| Traefik | 172.18.0.9 | 80 / 443 |
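A sketch of how an address from this table can be pinned; the network name `ai_net` is an assumption (use whatever network the project's `docker-compose.override.yml` actually defines), and the sketch writes to a separate file applied via `-f`:

```bash
# Illustrative sketch: pin Redis to its static address on a user-defined
# bridge network ("ai_net" is a hypothetical name).
# Apply with: docker compose -f docker-compose.yml -f static-net.yml up -d
cat > static-net.yml <<'EOF'
networks:
  ai_net:
    ipam:
      config:
        - subnet: 172.18.0.0/16
services:
  redis:
    networks:
      ai_net:
        ipv4_address: 172.18.0.10
EOF
```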
## Verifying Connectivity

```bash
curl http://172.18.0.8:8080/models
```

Expected response:

```json
{"models":[{"name":"/models/qwen-3-4b-2507.gguf", "object":"model"}]}
```
Useful Docker Compose commands for managing the stack:

| Action | Command | Description |
|---|---|---|
| View logs (specific service) | `docker compose logs -f [service_name]` | Follow logs for a specific service. |
| Start stack (detached) | `docker compose up -d` | Start services in the background. |
| Stop services | `docker compose stop` | Stop running services without removing containers. |
| Start existing services | `docker compose start` | Start previously stopped services. |
| View running services | `docker compose ps` | List all services and their status. |
| Execute command in service | `docker compose exec [service_name] [command]` | Run a command inside a running service container. |
| Remove stopped containers | `docker compose rm` | Remove stopped service containers. |
| Pull service images | `docker compose pull` | Pull all service images. |
| View configuration | `docker compose config` | Validate and view the Compose file configuration. |
| Scale a service | `docker compose up -d --scale [service_name]=[number]` | Scale a service to a desired number of containers. |
| Force recreate containers | `docker compose up -d --force-recreate` | Recreate all containers, even if their configuration hasn't changed. |
| View resource usage | `docker stats` | Live stream of container resource usage (not Compose-specific, but very useful alongside it). |
## Security Tips

- Change all default passwords in `.env`, and never publicly share `.env` files containing sensitive data
- Use strong encryption keys for n8n (see the example below)
- Restrict ports to trusted networks or enable Traefik HTTPS routing
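For example, n8n's credential encryption key can be generated with OpenSSL. n8n reads `N8N_ENCRYPTION_KEY` from the environment; whether this project's `.env` uses that exact key name is an assumption, so verify against `.env.example`:

```bash
# Generate a random 32-byte hex key and append it to .env.
# N8N_ENCRYPTION_KEY is n8n's standard variable name; confirm it matches
# the key used in this project's .env.example.
echo "N8N_ENCRYPTION_KEY=$(openssl rand -hex 32)" >> .env
```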
## Next Steps

- Integrate n8n workflows for data-driven automation
- Extend OpenWebUI with fine-tuned models
- Add LangFuse for observability
- Use Cloudflared for secure remote access
## License

MIT © 2025 Pantaleone AI, plus various open-source licenses as specified by each modular software or hardware component.

Each service is modular, meaning you can add your own LLMs, APIs, or automation tools. We're excited to see what you build!
**Note:** All code in this repo is provided for example purposes only. It is not intended for use in a production environment and has not been tested for security, reliability, or performance.