Production-focused Docker Swarm control plane for cluster operations, Git-based deployments, observability, and auditability.
Quick Start • Capabilities • Security • Cluster Services • API • Configuration • Development
Scale Swarm turns Docker Swarm operations into a single control plane: bootstrap, node management, project deployment, ingress routing, container management, private registry, monitoring, alerts, and audit logs.
| Area | What you get |
|---|---|
| Infrastructure | Swarm bootstrap, node provisioning, SSH/TLS remote Docker connections |
| Deployment | Git-backed projects (dockerfile or compose), webhook triggers, rollback |
| Multi-Cluster | Cluster records, selector in UI, X-Cluster-Id request scoping |
| Ingress & Routing | Traefik reverse proxy with automatic Let's Encrypt TLS |
| Cluster Services | Portainer CE (container management), private Docker Registry — all with Traefik domain routing |
| Observability | Prometheus + Grafana + Loki stack, external monitoring targets, hardened-by-default monitoring exposure |
| Security & Policy | Project IP allowlists, exposure modes, admin-enforced public port policy, optional node firewall hardening |
| Operations | Alerts, audit log, user/role management, admin settings (GitHub OAuth, security policy, monitoring access) |
Recommended path for a fresh install: run `install.sh`, complete first-run admin setup, then deploy Traefik and configure security policy before exposing services.
Requirements:
- Docker
- Docker Compose plugin (`docker compose`)
```bash
curl -fsSL https://raw.githubusercontent.com/Noqte-AI/scale-swarm/main/install.sh | bash
```

Then open the UI at `http://localhost` (or the port in `SCALE_SWARM_HTTP_PORT`).
- Create the first admin user
- Add or connect a cluster
- Deploy Traefik from Ingress
- Configure Settings → Admin Security Policy (public ports / CIDRs)
- Configure Settings → Monitoring Access if you want Grafana on a domain
- Keep direct public ports disabled unless required
- Use domain routing (Traefik) with `allowed_cidrs` for admin/ops services
- Keep monitoring internal by default (expose Grafana only through domain + allowlist)
- Enable node firewall hardening during provisioning if your environment supports UFW
For expanded install options (panel domain mode, Cloudflare shortcut, manual Docker Compose startup), see Installation Details.
```mermaid
graph TB
    subgraph UI[Frontend - React + Vite]
        DASH[Dashboard / Projects / Nodes]
        ING[Ingress & Cluster Services]
        MON[Monitoring / Alerts / Audit]
    end
    subgraph API[Backend - FastAPI]
        AUTH[Auth + RBAC]
        CL[Cluster Context]
        PROJ[Projects + Git/Webhooks]
        STACKS[Stack Services]
        OPS[Swarm/Nodes/Monitoring/Alerts]
        JOBS[Background Job Worker]
    end
    DB[(PostgreSQL)]
    ANS[Ansible Playbooks]
    GITHUB[GitHub OAuth/API]
    subgraph MC[Managed Clusters]
        subgraph C1[Cluster]
            TRF[Traefik Ingress]
            PORT[Portainer CE]
            REG[Docker Registry]
            PROM[Prometheus + Grafana]
            SVCS[Project Services]
        end
    end
    DASH --> API
    ING --> API
    MON --> API
    API --> DB
    API --> ANS
    PROJ --> GITHUB
    CL --> MC
    JOBS --> MC
    STACKS --> MC
```
All cluster-level services share the `traefik-public` overlay network. Traefik is the foundation — Portainer and Registry depend on it for domain routing and TLS.
```mermaid
graph LR
    subgraph "traefik-public (overlay network)"
        T["Traefik<br/>:80 / :443<br/>reverse proxy + TLS"]
        P["Portainer CE<br/>:9000 internal<br/>container management"]
        R["Docker Registry<br/>:5000 internal<br/>private images"]
        S["Project Services<br/>routed by domain"]
    end
    subgraph "mon_net (overlay network)"
        PR["Prometheus"]
        G["Grafana"]
        L["Loki + Promtail"]
        NE["Node Exporter"]
        CA["cAdvisor"]
    end
    Internet((Internet)) -->|":80/:443"| T
    T -->|"Host(portainer.example.com)"| P
    T -->|"Host(registry.example.com)"| R
    T -->|"Host(app.example.com)"| S
```
```mermaid
graph TD
    TRAEFIK[Traefik Ingress] -->|creates network| NET[traefik-public]
    NET --> PORTAINER[Portainer CE]
    NET --> REGISTRY[Docker Registry]
    NET --> PROJECTS[Project Services]
    PORTAINER -.->|"domain routing<br/>(optional)"| TRAEFIK
    REGISTRY -.->|"domain routing<br/>(optional)"| TRAEFIK
    PROJECTS -.->|"domain routing<br/>(optional)"| TRAEFIK
    style TRAEFIK fill:#3b82f6,color:#fff
    style PORTAINER fill:#06b6d4,color:#fff
    style REGISTRY fill:#9333ea,color:#fff
    style PROJECTS fill:#10b981,color:#fff
```
- Swarm init / remote bootstrap
- Add, update, remove, promote, and demote nodes
- SSH-based and TLS-based remote Docker connectivity
- Cluster connection testing and persistent cluster records
- Create projects from Git repos
- Deploy type: `dockerfile` or `compose`
- Repo branch selection (manual or GitHub OAuth-backed)
- Environment variables, ports, replicas, resource limits
- Domain assignment + HTTPS via Traefik labels
- Project network access controls: `exposure_mode` (`public`/`domain_only`/`internal`)
- Domain IP allowlists via Traefik (`allowed_cidrs`)
- Deploy/redeploy, logs (SSE), status sync, scaling (non-compose)
- Build history and rollback to successful builds
- GitHub OAuth connect/disconnect per user
- List repositories and branches from connected GitHub account
- Admin-managed GitHub OAuth app configuration in Settings
- Webhook endpoint for build triggers with secret-based signature validation
- Central Admin Security Policy for public port governance
- Allow/block public ports (including compose stack published ports)
- Enforce project exposure mode (`public`, `domain_only`, `internal`)
- Require/restrict domain CIDRs across projects
- Deploy-time policy enforcement (create/update/deploy all validated)
- Optional node firewall hardening via Ansible + UFW during provisioning
- Monitoring access controls (Grafana domain + HTTPS + Traefik IP allowlist)
- Portainer domain IP allowlists and policy-aware direct port behavior
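CIDR allowlists show up in several of these controls (project `allowed_cidrs`, Grafana, Portainer). Traefik enforces them at the proxy with its `ipAllowList` middleware; the check itself boils down to a subnet membership test, sketched here in illustrative Python (this is not Scale Swarm's code):

```python
import ipaddress

def ip_allowed(client_ip: str, allowed_cidrs: list[str]) -> bool:
    """Return True if client_ip falls inside any allowed CIDR.

    An empty allowlist means no restriction (allow all).
    """
    if not allowed_cidrs:
        return True
    ip = ipaddress.ip_address(client_ip)
    return any(ip in ipaddress.ip_network(cidr, strict=False) for cidr in allowed_cidrs)

# Office range allowed, everything else rejected
print(ip_allowed("203.0.113.10", ["203.0.113.0/24"]))  # True
print(ip_allowed("198.51.100.7", ["203.0.113.0/24"]))  # False
```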
All cluster services are managed from the Ingress page in the UI. They are deployed as Docker Swarm stacks via SSH to the cluster manager node.
The foundation service. Must be deployed before any other service can use domain routing.
| Feature | Details |
|---|---|
| Image | traefik:v2.11 |
| Ports | 80 (HTTP) and 443 (HTTPS) published on host |
| TLS | Automatic Let's Encrypt certificates via ACME HTTP challenge |
| Discovery | Automatic service discovery via Docker Swarm labels |
| Network | Creates and manages the traefik-public overlay network |
| Placement | Manager node only |
How it works: Traefik watches the Docker Swarm API for services with `traefik.enable=true` labels. When a project or cluster service specifies a domain, Traefik automatically routes traffic and provisions TLS certificates.
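For reference, a domain-routed Swarm service carries labels along these lines (standard Traefik v2 label syntax; the exact label set and certificate resolver name Scale Swarm generates are assumptions here, and the domain/service names are illustrative):

```yaml
services:
  app:
    image: registry.example.com/myapp:latest
    networks:
      - traefik-public
    deploy:
      labels:
        - "traefik.enable=true"
        - "traefik.http.routers.myapp.rule=Host(`app.example.com`)"
        - "traefik.http.routers.myapp.entrypoints=websecure"
        - "traefik.http.routers.myapp.tls.certresolver=letsencrypt"
        - "traefik.http.services.myapp.loadbalancer.server.port=8080"

networks:
  traefik-public:
    external: true
```

Note that in Swarm mode the labels live under `deploy.labels` (service labels), not container labels, or Traefik will not discover them.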
Visual management UI for Docker Swarm. Deployed as a stack behind Traefik.
| Feature | Details |
|---|---|
| Image | portainer/portainer-ce:latest |
| Service Port | 9000 |
| Domain | Optional, routed through Traefik with HTTPS |
| IP Allowlist | Optional Traefik ipAllowList middleware when using domain routing |
| Direct Access | 9000 can be published when no domain is used (subject to admin security policy) |
| Admin Password | Can be set at deploy time via --admin-password-file, or set on first login |
| Volumes | portainer_data:/data + Docker socket mount |
| Placement | Manager node only |
| Dependency | Requires Traefik for domain routing (UI enforces this) |
Password behavior:
- If `admin_password` is provided during deploy, it is passed via a Docker Swarm secret and read by Portainer during bootstrap. This only takes effect on initial setup.
- If left empty, Portainer shows its own password setup screen on first access.
- Password can always be changed later through the Portainer UI.
Private Docker Registry for storing and distributing container images within the cluster.
| Feature | Details |
|---|---|
| Image | registry:2 |
| Service Port | 5000 |
| Domain | Optional, routed through Traefik with HTTPS |
| Direct Access | 5000 can be published when no domain is used |
| Storage | registry_data:/var/lib/registry (local volume) |
| Placement | Manager node only |
| Dependency | Requires Traefik for domain routing (can deploy without, accessible internally) |
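If the registry is reached over plain HTTP (no domain/TLS), standard Docker behavior requires each node that pushes or pulls to trust it explicitly in `/etc/docker/daemon.json` (restart the Docker daemon after editing):

```json
{
  "insecure-registries": ["<manager-ip>:5000"]
}
```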
Usage with domain:

```bash
# Tag and push to your private registry
docker tag myapp:latest registry.example.com/myapp:latest
docker push registry.example.com/myapp:latest
```

Usage without domain (cluster-internal):
```bash
# Push using swarm node IP
docker tag myapp:latest <manager-ip>:5000/myapp:latest
docker push <manager-ip>:5000/myapp:latest
```

Full observability stack deployed via Docker SDK.
| Component | Image | Mode |
|---|---|---|
| Prometheus | `prom/prometheus:v2.54.1` | replicated (1) |
| Grafana | `grafana/grafana:11.2.0` | replicated (1) |
| Loki | `grafana/loki:3.1.1` | replicated (1) |
| Promtail | `grafana/promtail:3.1.1` | global |
| Node Exporter | `prom/node-exporter:v1.8.2` | global |
| cAdvisor | `gcr.io/cadvisor/cadvisor:v0.49.1` | global |
- Runs on isolated `mon_net` overlay network
- Monitoring service ports are not published by default (hardened default)
- Grafana admin password configurable via `SCALE_SWARM_GRAFANA_ADMIN_PASSWORD`
- Optional Grafana domain routing via Traefik (Settings → Monitoring Access)
- Optional Grafana Traefik IP allowlist (`grafana_allowed_cidrs`)
- External monitoring targets can be added with SSH-based exporter installation
- Grafana reset action for fresh redeploy
- Alerts (rules + channels + history)
- Audit log for privileged/operational actions
```mermaid
sequenceDiagram
    actor User
    participant UI as UI
    participant API as FastAPI
    participant DB as PostgreSQL
    participant JQ as Job Queue
    participant GB as Git Builder
    participant SW as Swarm Cluster
    User->>UI: Create project (repo/branch, deploy type)
    UI->>API: POST /api/v1/projects
    API->>DB: Save project + webhook secret
    API->>JQ: enqueue git_build
    JQ->>GB: build_and_deploy(project_id)
    GB->>SW: Build/push/update service(s)
    GB->>DB: Update status + build history
    UI->>API: GET /api/v1/projects/:id + /builds + /logs
```
Project features included in current backend:
- Dockerfile and Compose deploy modes
- Compose validation (`POST /api/v1/projects/validate-compose`)
- Auto-generated webhook URL/secret per project
- Manual deploy trigger and webhook deploy trigger
- Build history pagination and rollback endpoint
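Webhook deploy triggers are validated with a secret-based HMAC signature. As a sketch of how a sender could sign a payload (the exact header name and digest format Scale Swarm expects are assumptions here; check the project's webhook details in the UI):

```python
import hashlib
import hmac

def sign_payload(secret: str, body: bytes) -> str:
    """Compute a GitHub-style HMAC-SHA256 signature for a webhook body."""
    digest = hmac.new(secret.encode(), body, hashlib.sha256).hexdigest()
    return f"sha256={digest}"

def verify(secret: str, body: bytes, received: str) -> bool:
    # compare_digest guards against timing attacks
    return hmac.compare_digest(sign_payload(secret, body), received)

body = b'{"ref": "refs/heads/main"}'
sig = sign_payload("my-webhook-secret", body)
print(verify("my-webhook-secret", body, sig))  # True
```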
This section expands the quick start above with panel-domain mode, Cloudflare automation, and manual Docker Compose setup.
`install.sh` is the recommended bootstrap path for a fresh machine (generates `.env`, runs migrations, starts services).
Requirements:
- Docker
- Docker Compose plugin (`docker compose`)
Run:

```bash
curl -fsSL https://raw.githubusercontent.com/Noqte-AI/scale-swarm/main/install.sh | bash
```

Optional (custom install directory / branch):

```bash
curl -fsSL https://raw.githubusercontent.com/Noqte-AI/scale-swarm/main/install.sh | \
  SCALE_SWARM_INSTALL_DIR=/opt/scale-swarm SCALE_SWARM_REPO_BRANCH=main bash
```

If you've already cloned the repo locally, you can still run:

```bash
./install.sh
```

Optional: panel domain + IP allowlist (self-hosted Scale Swarm UI):

```bash
curl -fsSL https://raw.githubusercontent.com/Noqte-AI/scale-swarm/main/install.sh | \
  SCALE_SWARM_PANEL_DOMAIN=swarm.example.com \
  SCALE_SWARM_PANEL_ALLOWED_CIDRS="203.0.113.10/32,198.51.100.0/24" \
  SCALE_SWARM_PANEL_ACME_EMAIL=ops@example.com bash
```

Optional Cloudflare DNS shortcut (auto create/update A record):

```bash
curl -fsSL https://raw.githubusercontent.com/Noqte-AI/scale-swarm/main/install.sh | \
  SCALE_SWARM_PANEL_DOMAIN=swarm.example.com \
  SCALE_SWARM_CLOUDFLARE_API_TOKEN=cf_token_with_zone_dns_edit \
  SCALE_SWARM_CLOUDFLARE_ZONE_ID=your_zone_id \
  SCALE_SWARM_CLOUDFLARE_PROXIED=false bash
```

Notes:
- Panel domain mode runs a local Caddy reverse proxy for the Scale Swarm UI and can enforce CIDR allowlists.
- In panel domain mode, the app frontend is moved to `127.0.0.1:8080` by default and Caddy binds `80`/`443`.
- If you set `SCALE_SWARM_CLOUDFLARE_PROXIED=true`, the server-side CIDR allowlist will see Cloudflare edge IPs (use Cloudflare Zero Trust/WAF for client-IP restrictions).
- Panel domain mode may conflict with cluster Traefik ingress on the same host if both need ports `80`/`443`.
What it does:
- creates `.env` (if missing) with generated secrets
- starts PostgreSQL
- runs Alembic migrations
- starts backend and frontend containers
Then open the UI at `http://localhost` (or the port in `SCALE_SWARM_HTTP_PORT`) and complete the first-run setup wizard.
- Create env file:

  ```bash
  cp .env.example .env
  ```

- Edit `.env` and set strong values for at least:
  - `POSTGRES_PASSWORD`
  - `SCALE_SWARM_JWT_SECRET`
  - `SCALE_SWARM_ENCRYPTION_KEY`
  - `SCALE_SWARM_GRAFANA_ADMIN_PASSWORD`
- Start database:

  ```bash
  docker compose up -d --build db
  ```

- Run migrations:

  ```bash
  docker compose build backend
  docker compose run --rm -e PYTHONPATH=/app backend alembic upgrade head
  ```

- Start app services:

  ```bash
  docker compose up -d --build backend frontend
  ```

- Open the UI:
  - `http://localhost`
  - or `http://localhost:<SCALE_SWARM_HTTP_PORT>` if you changed the port
| Item | Required | Why |
|---|---|---|
| Docker + `docker compose` | Yes | Runs DB, backend, frontend |
| Strong `JWT_SECRET` | Yes (prod) | JWT token signing |
| `ENCRYPTION_KEY` | Yes (prod) | Encrypts stored secrets (OAuth tokens, webhooks, etc.) |
| Alembic migrations | Yes | Keeps schema aligned with current features |
Scale Swarm uses JWT-based auth by default. Legacy `X-Api-Key` auth is optional and disabled unless explicitly enabled.
Flow:

- `GET /api/v1/auth/setup-status` to check if setup is required
- `POST /api/v1/auth/setup-admin` to create the first admin user
- `POST /api/v1/auth/login` to obtain access/refresh tokens
- Use `Authorization: Bearer <token>` for authenticated requests
Legacy mode (optional):
- Set `SCALE_SWARM_ALLOW_LEGACY_API_KEY_AUTH=true`
- Send `X-Api-Key: <key>`
Notes:
- Some features (for example GitHub connections) require JWT auth and a real user context.
- Admin-only endpoints include audit logs, user management, and GitHub OAuth app settings.
Most operational endpoints are cluster-scoped.
The UI stores the selected cluster and automatically injects `X-Cluster-Id` in API requests.

- The frontend automatically sends `X-Cluster-Id` for the selected cluster.
- The backend resolves the active Docker client using that header.
- Cluster definitions are stored in the database and can be tested/reconnected.
Example:

```bash
curl http://localhost/api/v1/projects \
  -H "Authorization: Bearer <token>" \
  -H "X-Cluster-Id: 1"
```

All endpoints are under `/api/v1`.
- `GET /auth/setup-status`
- `POST /auth/setup-admin`
- `POST /auth/login`
- `POST /auth/refresh`
- `GET /auth/me`
- `POST /auth/users` (admin)
- `GET /auth/users` (admin)
- `DELETE /auth/users/{user_id}` (admin)
- `GET|POST /clusters`
- `GET|PATCH|DELETE /clusters/{cluster_id}`
- `POST /clusters/{cluster_id}/connect`
- `POST /swarm/init`
- `POST /swarm/bootstrap`
- `GET /swarm/status`
- `GET /swarm/tokens`
- `GET|POST /nodes`
- `GET|PATCH|DELETE /nodes/{node_id}`
- `POST /nodes/{node_id}/promote`
- `POST /nodes/{node_id}/demote`
- `GET /nodes/app-public-key`
- `POST /nodes/add-with-ssh-setup`
- `GET|POST /projects`
- `POST /projects/validate-compose`
- `GET|PATCH|DELETE /projects/{project_id}`
- `POST /projects/{project_id}/deploy`
- `POST /projects/{project_id}/deploy/stream` (SSE)
- `POST /projects/{project_id}/scale`
- `GET /projects/{project_id}/webhook`
- `POST /projects/{project_id}/webhook/regenerate`
- `GET /projects/{project_id}/builds`
- `POST /projects/{project_id}/rollback`
- `POST /webhooks/project/{project_id}` (webhook receiver; HMAC validated)
Project access control fields (create/update payloads) include:

- `exposure_mode` (`public`/`domain_only`/`internal`)
- `allowed_cidrs` (Traefik IP allowlist for domain-routed apps)
- `GET /git/connections`
- `GET /git/auth/github`
- `GET /git/auth/github/callback`
- `DELETE /git/connections/{connection_id}`
- `GET /git/repos`
- `GET /git/repos/{owner}/{repo}/branches`
- `POST /ingress/traefik/deploy` — deploy or update Traefik stack (body: `acme_email`)
- `GET /ingress/traefik/status` — deployment status, tasks, network info
- `DELETE /ingress/traefik/remove` — remove Traefik stack
- `POST /portainer/deploy` — deploy Portainer CE (body: `domain?`, `enable_https?`, `admin_password?`, `allowed_cidrs?`)
- `GET /portainer/status` — deployment status, tasks, network info
- `DELETE /portainer/remove` — remove Portainer stack
- `POST /registry/deploy` — deploy private Docker Registry (body: `domain?`, `enable_https?`)
- `GET /registry/status` — deployment status, tasks, network info
- `DELETE /registry/remove` — remove Registry stack
- Monitoring stack: `/monitoring/*` (deploy/status/remove, Grafana reset, targets)
- Cluster metrics overview: `/monitor/*`
- Alerts: `/alerts/*` (channels, rules, history)
- Audit logs: `GET /audit/logs` (admin)
- Admin settings:
  - GitHub OAuth app config: `/settings/github*`
  - Security policy: `/settings/security-policy*`
  - Monitoring access (Grafana domain/CIDRs): `/settings/monitoring-access*`
Environment variables use the `SCALE_SWARM_` prefix.
| Variable | Default | Notes |
|---|---|---|
| `SCALE_SWARM_ENVIRONMENT` | `development` | `production` enforces stricter validation |
| `SCALE_SWARM_DATABASE_URL` | local Postgres DSN | Async SQLAlchemy connection string |
| `SCALE_SWARM_JWT_SECRET` | `change-me-in-production` | Must be strong in production |
| `SCALE_SWARM_ENCRYPTION_KEY` | empty | Required in production for secret encryption |
| `SCALE_SWARM_GRAFANA_ADMIN_PASSWORD` | `change-me` | Set a strong password |
| `SCALE_SWARM_CORS_ORIGINS` | `["http://localhost:5173"]` | JSON array |
| `SCALE_SWARM_ALLOW_LEGACY_API_KEY_AUTH` | `false` | Enables `X-Api-Key` auth |
| `SCALE_SWARM_API_KEY` | `changeme` | Used only if legacy API-key auth is enabled |
| `SCALE_SWARM_ENABLE_ALERT_CHECKER` | `true` | Background alert scheduler |
| `SCALE_SWARM_AUTO_CREATE_TABLES` | `true` (dev) | Prefer Alembic in production |
| Variable | Default | Notes |
|---|---|---|
| `SCALE_SWARM_GRAFANA_ALLOW_ANONYMOUS` | `false` | Keep disabled unless you intentionally expose dashboards |
| `SCALE_SWARM_GRAFANA_ALLOW_EMBEDDING` | `false` | Enable only if embedding Grafana is required |
| `SCALE_SWARM_MONITORING_EXPOSE_INTERNAL_PORTS` | `false` | Publishes internal monitoring ports (Prometheus/Loki/cAdvisor/node_exporter); not recommended publicly |
| `SCALE_SWARM_MONITORING_EXPOSE_GRAFANA_PORT` | `false` | Publishes Grafana direct host port (3001) |
| Variable | Default | Notes |
|---|---|---|
| `SCALE_SWARM_NODE_FIREWALL_MANAGE` | `false` | Enable UFW management during node provisioning |
| `SCALE_SWARM_NODE_FIREWALL_SSH_ALLOWED_CIDRS` | `[]` | Allowed sources for SSH (22/tcp) |
| `SCALE_SWARM_NODE_FIREWALL_NODE_EXPORTER_ALLOWED_CIDRS` | `[]` | Allowed sources for 9100/tcp if used |
| `SCALE_SWARM_NODE_FIREWALL_SWARM_ALLOWED_CIDRS` | `[]` | Allowed sources for Swarm ports (2377, 7946, 4789) |
| `SCALE_SWARM_NODE_FIREWALL_HTTP_HTTPS_ALLOWED_CIDRS` | `[]` | Allowed sources for 80/443 on provisioned nodes |
Configured from the UI (Settings) and stored in the database:
- Admin Security Policy: public port governance, default/required CIDRs, enforced project exposure mode
- Monitoring Access: Grafana domain, HTTPS flag, Traefik IP allowlist
| Variable | Default | Notes |
|---|---|---|
| `SCALE_SWARM_DEPLOY_MODE` | `local` | `local` or `remote` |
| `SCALE_SWARM_MANAGER_IP` | empty | Legacy/default cluster remote manager |
| `SCALE_SWARM_MANAGER_SSH_USER` | `root` | SSH user |
| `SCALE_SWARM_MANAGER_SSH_KEY_PATH` | `~/.ssh/id_rsa` | SSH key path |
| `SCALE_SWARM_DOCKER_TLS_CA` | empty | TLS CA cert path |
| `SCALE_SWARM_DOCKER_TLS_CERT` | empty | TLS client cert path |
| `SCALE_SWARM_DOCKER_TLS_KEY` | empty | TLS client key path |
```bash
cd backend
python -m venv .venv
source .venv/bin/activate
pip install -e .
alembic upgrade head
uvicorn app.main:app --reload
```

Notes:

- Default backend URL: `http://localhost:8000`
- Requires PostgreSQL running (you can use `docker compose up -d db` from the repo root)
```bash
cd frontend
npm install
npm run dev
```

Frontend dev server runs on `http://localhost:5173`.
```
scale-swarm/
├── backend/
│   ├── alembic/                  # DB migrations
│   ├── ansible/                  # Bootstrap/provision/remove playbooks + roles
│   ├── app/
│   │   ├── api/v1/               # FastAPI route modules
│   │   │   ├── ingress.py            # Traefik ingress endpoints
│   │   │   ├── portainer_stack.py    # Portainer endpoints
│   │   │   ├── registry_stack.py     # Registry endpoints
│   │   │   ├── monitoring_stack.py   # Monitoring endpoints
│   │   │   └── ...
│   │   ├── core/                 # config, auth, DB, cluster context, security
│   │   ├── models/               # ORM models (cluster, project, build_history, user, ...)
│   │   ├── schemas/              # Pydantic schemas
│   │   └── services/
│   │       ├── ingress_stack.py      # Traefik stack deploy/status/remove
│   │       ├── portainer_stack.py    # Portainer stack deploy/status/remove
│   │       ├── registry_stack.py     # Registry stack deploy/status/remove
│   │       ├── monitoring_stack.py   # Prometheus+Grafana+Loki stack
│   │       ├── git_builder.py        # Git clone + Docker build pipeline
│   │       ├── remote_exec.py        # SSH command execution
│   │       └── ...
│   └── pyproject.toml
├── frontend/
│   ├── src/
│   │   ├── components/           # Layout, cluster selector, dialogs, GitHub connect, etc.
│   │   ├── context/              # Auth and cluster context providers
│   │   ├── pages/
│   │   │   ├── Ingress.tsx           # Traefik + Portainer + Registry management
│   │   │   ├── Monitoring.tsx        # Prometheus/Grafana stack + targets
│   │   │   └── ...
│   │   └── api/client.ts         # API client + auth/cluster headers
│   ├── package.json
│   └── Dockerfile
├── docker-compose.yml            # Root compose for db + backend + frontend
├── .env.example
└── install.sh                    # Recommended setup script
```
- Production deployments should use Alembic migrations (`alembic upgrade head`) instead of relying on auto table creation.
- Secrets (Git tokens, OAuth tokens, alert channel configs, webhook secrets) are stored encrypted using `SCALE_SWARM_ENCRYPTION_KEY`.
- Some background functionality (builds, alert checks) depends on the internal job worker started by the backend lifespan process.
- Cluster services (Traefik, Portainer, Registry) are deployed via SSH + `docker stack deploy` to the cluster manager node.
- The monitoring stack uses the Docker Python SDK directly for service creation.
No license file is currently included in this repository.