This repository takes inspiration from the Kubernetes in Docker (kind) project. It is intended as a quick and easy Nomad playground for testing, including multi-node integration and failure scenarios. It is a work in progress, and current functionality is very barebones.
Implemented from the stack:
- nomad
- consul
Bits and pieces adapted from:
https://github.com/kubernetes-sigs/kind
https://github.com/multani/docker-nomad
https://github.com/hashicorp/docker-consul
To build the docker containers
make build
To run the stack
make up
To tear down the stack
make down
To scale out the clients
NOMAD_CLIENT_COUNT=n make up # where n is the desired client count
Note: to scale in, you need to docker stop/rm the extra client containers, or run
make down
to reset the stack.
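As a small convenience, the scale-out step can be wrapped in a function that rejects anything other than a positive integer before calling make. This is only a sketch: `is_valid_client_count` and `scale_clients` are hypothetical helper names and are not part of this repository.

```shell
# is_valid_client_count N: succeed only if N is a positive integer.
# (Hypothetical helper, not shipped with the repository.)
is_valid_client_count() {
  case "$1" in
    ''|*[!0-9]*) return 1 ;;   # empty, or contains a non-digit
  esac
  [ "$1" -ge 1 ]
}

# scale_clients N: run `make up` with NOMAD_CLIENT_COUNT set to N.
# This only grows the client pool; scaling in still means removing
# containers by hand or running `make down` first.
scale_clients() {
  if ! is_valid_client_count "$1"; then
    echo "usage: scale_clients <positive integer>" >&2
    return 2
  fi
  NOMAD_CLIENT_COUNT="$1" make up
}
```

After sourcing the functions from the repository root, `scale_clients 3` would be equivalent to `NOMAD_CLIENT_COUNT=3 make up`.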
To connect to the web ui for nomad/consul
# nomad
open http://localhost:4646/ui
# consul
open http://localhost:8500/ui
To run an example job
nomad run jobs/example.hcl
The following setup and information is based on the cosmonic blog post. It hasn't been fully tested, so not everything may work as expected.
To enable cilium, run make up with the CILIUM_ENABLED environment variable set. If the stack is already up and running, you'll need to run make down first so that the nomad client is recreated.
CILIUM_ENABLED=1 make up
Once the stack is up and running, check that cilium is healthy before deploying the netreap service.
# Check cilium health
docker exec hind.nomad.client.01 cilium status
It might take a few minutes (approx. 2-5 min) for Cilium to come up as healthy. When the last line says Cluster health: 1/1 reachable and the remaining health checks are passing, you should be good to move on.
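Rather than re-running the status command by hand, the wait can be scripted. The sketch below polls the same `docker exec hind.nomad.client.01 cilium status` command and looks only at the Cluster health line; it does not inspect the remaining health checks, so a manual look is still worthwhile. The function names and the default of 60 attempts at 5 second intervals are assumptions chosen to cover the 2-5 minute estimate.

```shell
# cilium_is_healthy: read `cilium status` output on stdin and succeed only
# if the "Cluster health" line reports every node reachable (e.g. 1/1).
cilium_is_healthy() {
  awk '
    /Cluster health:/ {
      split($3, parts, "/")            # $3 looks like "1/1"
      found = (parts[1] == parts[2] && parts[1] > 0)
    }
    END { exit(found ? 0 : 1) }
  '
}

# wait_for_cilium [attempts] [delay_seconds]: poll the first client
# container until cilium_is_healthy succeeds or the attempts run out.
wait_for_cilium() {
  attempts="${1:-60}"
  delay="${2:-5}"
  i=0
  while [ "$i" -lt "$attempts" ]; do
    if docker exec hind.nomad.client.01 cilium status 2>/dev/null | cilium_is_healthy; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "cilium did not become healthy after $attempts attempts" >&2
  return 1
}
```

Calling `wait_for_cilium` with no arguments blocks for up to roughly five minutes and returns non-zero if the agent never reports all nodes reachable.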
Once you've confirmed the cilium agent is healthy, you'll need to restart the nomad service.
docker exec hind.nomad.client.01 systemctl restart nomad
You can now run the netreap job.
nomad run cilium/netreap.hcl
Apply a policy
consul kv put netreap.io/policy @cilium/policy-allow-all.json
Run an example job using cilium and test different network policies
nomad run jobs/example-cilium.hcl
Test curl against google and see that we can connect.
nomad exec -i \
-t $(curl localhost:4646/v1/job/example_cilium/allocations 2>/dev/null \
| jq -r '.[0].ID') \
curl google.com -v
Apply deny policy
consul kv put netreap.io/policy @cilium/policy-blocked-egress.json
Test curl again, and now see that the connection is blocked.
nomad exec -i \
-t $(curl localhost:4646/v1/job/example_cilium/allocations 2>/dev/null \
| jq -r '.[0].ID') \
curl google.com -v
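Both curl tests repeat the same allocation lookup, so it may be handy to wrap it. The sketch below uses the same Nomad HTTP endpoint and jq filter as the commands above and adds a guard for the case where the job has no allocations yet. `first_alloc_id` and `curl_from_job` are hypothetical names, and falling back to http://localhost:4646 when NOMAD_ADDR is unset is an assumption about this setup.

```shell
# first_alloc_id: read the JSON array returned by
# /v1/job/<job>/allocations on stdin and print the first allocation ID.
# Prints nothing and fails when there is no allocation.
first_alloc_id() {
  id="$(jq -r '.[0].ID // empty')" || return 1
  [ -n "$id" ] && printf '%s\n' "$id"
}

# curl_from_job JOB URL: run `curl URL -v` inside the job's first allocation.
curl_from_job() {
  alloc="$(curl -s "${NOMAD_ADDR:-http://localhost:4646}/v1/job/$1/allocations" | first_alloc_id)" || {
    echo "no allocation found for job $1" >&2
    return 1
  }
  nomad exec -i -t "$alloc" curl "$2" -v
}
```

With these defined, `curl_from_job example_cilium google.com` can be run once after applying the allow-all policy and again after applying the blocked-egress policy to compare the results.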
The hubble relay job is configured to run as a service job. It lets you interact with hubble using the cli.
To deploy the relay
nomad run cilium/hubble-relay.hcl
To check the deployment health (assumes the job is deployed to the first node)
docker exec hind.nomad.client.01 hubble status
Healthcheck (via localhost:4245): Ok
Current/Max Flows: 485/4,095 (11.84%)
Flows/s: 2.45
Connected Nodes: 1/1
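To make the Current/Max Flows line easier to read: the percentage is simply current flows divided by the buffer size, so 485 / 4095 is about 11.84%. The sketch below recomputes that figure from the status output; `flow_buffer_usage` is a hypothetical helper name, and the parsing assumes the line format shown in the sample output above.

```shell
# flow_buffer_usage: read `hubble status` output on stdin and print the
# flow buffer utilisation as a percentage with two decimals.
flow_buffer_usage() {
  awk -F': *' '
    /Current\/Max Flows:/ {
      gsub(/,/, "", $2)                # "485/4,095 (11.84%)" -> "485/4095 (11.84%)"
      split($2, f, "[/ ]")             # f[1] = current flows, f[2] = max flows
      if (f[2] > 0) printf "%.2f\n", 100 * f[1] / f[2]
    }
  '
}
```

For example, `docker exec hind.nomad.client.01 hubble status | flow_buffer_usage` would print 11.84 for the sample output above.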
This project has mainly been tested using macOS and colima 0.6.x, and requires cgroups v2 to be enabled on the docker host.
- colima
- docker-cli
- buildx
- nomad (used for cli commands)
- consul (used for cli commands)
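A quick preflight check along these lines can confirm the tools above are installed and that the docker host reports cgroups v2. This is a sketch: `missing_tools` and `cgroup_v2_enabled` are hypothetical helper names. buildx is a docker plugin rather than a binary on PATH, so it is better checked separately with `docker buildx version`.

```shell
# missing_tools TOOL...: print each named CLI that is not on PATH,
# one per line. Prints nothing when everything is installed.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# cgroup_v2_enabled: succeed when the docker host reports cgroup version 2
# (the same value shown as "Cgroup Version" in `docker info`).
cgroup_v2_enabled() {
  [ "$(docker info --format '{{.CgroupVersion}}' 2>/dev/null)" = "2" ]
}
```

For example, `missing_tools colima docker nomad consul` lists anything still to be installed, and `cgroup_v2_enabled || echo "cgroups v2 is not enabled"` flags an unsupported docker host.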
System and docker information confirmed working
- Colima, 0.6.7
- Debian GNU/Linux 12 (bookworm)
  Client: Docker Engine - Community
   Version: 24.0.7
   Context: default
   Debug Mode: false
   Plugins:
    buildx: Docker Buildx (Docker Inc.)
     Version: v0.11.2
     Path: /Users/user/.docker/cli-plugins/docker-buildx
  Server:
   Containers: 4
    Running: 4
    Paused: 0
    Stopped: 0
   Images: 11
   Server Version: 25.0.0
   Storage Driver: overlay2
    Backing Filesystem: extfs
    Supports d_type: true
    Using metacopy: false
    Native Overlay Diff: true
    userxattr: false
   Logging Driver: json-file
   Cgroup Driver: systemd
   Cgroup Version: 2
   Plugins:
    Volume: local
    Network: bridge host ipvlan macvlan null overlay
    Log: awslogs fluentd gcplogs gelf journald json-file local splunk syslog
   Swarm: inactive
   Runtimes: io.containerd.runc.v2 runc
   Default Runtime: runc
   Init Binary: docker-init
   containerd version: a1496014c916f9e62104b33d1bb5bd03b0858e59
   runc version: v1.1.11-0-g4bccb38
   init version: de40ad0
   Security Options:
    apparmor
    seccomp
     Profile: builtin
    cgroupns
   Kernel Version: 6.1.0-17-amd64
   Operating System: Debian GNU/Linux 12 (bookworm)
   OSType: linux
   Architecture: x86_64
   CPUs: 2
   Total Memory: 1.921GiB
   Name: debian
   ID: 3eb5fae9-6504-4117-a93d-c099568e79c2
   Docker Root Dir: /var/lib/docker
   Debug Mode: false
   Experimental: false
   Insecure Registries:
    127.0.0.0/8
   Live Restore Enabled: false
- Rocky Linux 9.3 (Blue Onyx)
  Client: Docker Engine - Community
   Version: 24.0.7
   Context: default
   Debug Mode: false
   Plugins:
    buildx: Docker Buildx (Docker Inc.)
     Version: v0.11.2
     Path: /Users/user/.docker/cli-plugins/docker-buildx
  Server:
   Containers: 0
    Running: 0
    Paused: 0
    Stopped: 0
   Images: 0
   Server Version: 24.0.7
   Storage Driver: overlay2
    Backing Filesystem: xfs
    Supports d_type: true
    Using metacopy: false
    Native Overlay Diff: true
    userxattr: false
   Logging Driver: json-file
   Cgroup Driver: systemd
   Cgroup Version: 2
   Plugins:
    Volume: local
    Network: bridge host ipvlan macvlan null overlay
    Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
   Swarm: inactive
   Runtimes: io.containerd.runc.v2 runc
   Default Runtime: runc
   Init Binary: docker-init
   containerd version: a1496014c916f9e62104b33d1bb5bd03b0858e59
   runc version: v1.1.11-0-g4bccb38
   init version: de40ad0
   Security Options:
    seccomp
     Profile: builtin
    selinux
    cgroupns
   Kernel Version: 5.14.0-362.13.1.el9_3.x86_64
   Operating System: Rocky Linux 9.3 (Blue Onyx)
   OSType: linux
   Architecture: x86_64
   CPUs: 2
   Total Memory: 1.722GiB
   Name: default
   ID: 84a6e94a-7b3e-42ff-9370-1d0820695a55
   Docker Root Dir: /var/lib/docker
   Debug Mode: false
   Experimental: false
   Insecure Registries:
    127.0.0.0/8
   Live Restore Enabled: false
There is no client persistence between make down and make up.
- install bpftool?
- improve cluster management tooling
- add ingress load balancer