# itsUP <!-- omit in toc -->

_Lean, secure, automated, zero downtime<sup>\*</sup>, poor man's infra for services running in docker._

<img align="center" src="assets/freight-conductor.png">
<p></p>
<p>
Running a home network? Then you may already have a custom setup, probably built on docker compose. You might enjoy all the maintenance and tinkering, but you are surely aware of the pitfalls and the potential downtime. If you think that is ok, or if you don't want automation, then this stack is probably not for you.
Still interested? Then read on...
</p>

**Table of contents:**

- [Key concepts](#key-concepts)
  - [Single source of truth](#single-source-of-truth)
  - [Managed proxy setup](#managed-proxy-setup)
  - [Managed service deployments \& updates](#managed-service-deployments--updates)
  - [\*Zero downtime?](#zero-downtime)
- [Prerequisites](#prerequisites)
- [Howto](#howto)
  - [Install \& run](#install--run)
  - [Configure services](#configure-services)
  - [Configure plugins](#configure-plugins)
    - [CrowdSec](#crowdsec)
  - [Using the Api \& OpenApi spec](#using-the-api--openapi-spec)
  - [Webhooks](#webhooks)
- [Dev/ops tools](#devops-tools)
  - [Utility functions for dev workflow](#utility-functions-for-dev-workflow)
  - [Utility scripts](#utility-scripts)
- [Questions one might have](#questions-one-might-have)
  - [What about Nginx?](#what-about-nginx)
  - [Does this scale to more machines?](#does-this-scale-to-more-machines)
- [Disclaimer](#disclaimer)

## Key concepts

### Single source of truth

One file (`db.yml`) describes all the infra and workloads that itsUP creates and manages, ensuring a predictable and reliable automated workflow.
This relies on abstractions, which trades some flexibility for reliability, but the stack is easily modified and enhanced to meet your needs. We strive to mirror docker compose functionality, so no concessions are necessary from a docker compose enthusiast's perspective.
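
To give a feel for that file, here is a minimal hypothetical outline (field names are illustrative only; `db.yml.sample` is the authoritative reference):

```
plugins:
  crowdsec:
    enable: false
projects:
  - name: my-project
    # entrypoint, services etc. follow the scenarios in the Howto below
```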

### Managed proxy setup

itsUP generates and manages `proxy/docker-compose.yml`, which runs two proxies in series (tcp -> web), with only the first exposing ports, in order to:

1. do TLS passthrough to existing endpoints (most people have a secure Home Assistant setup already)
2. terminate TLS and forward traffic securely to managed endpoints over an encrypted `proxynet` docker network

### Managed service deployments & updates

itsUP generates and manages `upstream/{project}/docker-compose.yml` files to deploy the container workloads defined as services in `db.yml`.
This centralizes and abstracts away the plethora of custom docker compose setups, which are mostly uniform in their approach anyway, so controlling their artifacts from one source of truth makes a lot of sense.

### <sup>\*</sup>Zero downtime?

As with all docker orchestration platforms (even Kubernetes), this depends on the containers:

- Are healthchecks correctly implemented? (see the sketch below this list)
- Are SIGHUP signals respected to shut down within an acceptable time frame?
- Are the containers stateless?
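
To illustrate the first point: a rollout can only detect a healthy replacement if the service defines a healthcheck. A sketch in standard docker compose syntax (service name, image and probe are hypothetical):

```
services:
  test-web:
    image: nginxdemos/hello # hypothetical image
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:80/"]
      interval: 10s
      timeout: 5s
      retries: 5
```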

itsUP rolls out changes by:

1. bringing up a new container and waiting until it is healthy (max 60s if it has a health check, otherwise it is assumed healthy after 10s)
2. killing the old container, waiting for it to drain, and then removing it
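
This is roughly what invoking the [rollout](https://github.com/Wowu/docker-rollout) plugin by hand against an upstream would do (project and service names are hypothetical):

```
cd upstream/test
docker rollout test-web
```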

_What about stateful services?_

It is surely possible to deploy stateful services, but beware that those might not be good candidates for the `docker rollout` automation. Before updating such a service it is strongly advised to read the upgrade documentation for the newer version and follow the prescribed steps. More mature databases may have integrated those steps into the runtime, but expect that to be the exception. So, to get correct results you are on your own and will have to read up on your chosen solutions.

## Prerequisites

**Tools:**

- [docker](https://www.docker.com) daemon and client
- docker [rollout](https://github.com/Wowu/docker-rollout) plugin

**Infra:**

- Port forwarding of ports `80` and `443` to the machine running this stack. This stack MUST take over whatever routing you have now, but don't worry: it supports your home assistant setup and forwards any traffic meant for it (if you finish the pre-configured `home-assistant` project in `db.yml`).
- A wildcard dns domain like `*.itsup.example.com` that points to your home ip, which lets you choose any subdomain for your services. You may of course choose and manage a dedicated domain in a similar fashion for a public service, but I suggest not going through such trouble for anything private.
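
Such a wildcard is a single record in your DNS zone; for example (placeholder ip):

```
*.itsup.example.com.  3600  IN  A  203.0.113.10
```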

## Howto

### Install & run

These are the scripts to install everything and start the proxy and api, so that we can receive incoming challenge webhooks:

1. `bin/install.sh`: creates a local `.venv` and installs all python project deps.
2. `bin/start-all.sh`: starts the proxy (docker compose) and the api server (uvicorn).
3. `bin/apply.py`: applies all of `db.yml`.
4. `bin/api-logs.sh`: tails the output of the api server.
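
Put together, a first bring-up could look like this:

```
bin/install.sh    # one-time setup of the python venv
bin/start-all.sh  # bring up the proxy stack and the api server
bin/apply.py      # deploy everything described in db.yml
bin/api-logs.sh   # follow the api server logs
```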

But before doing so, please configure your stuff:

### Configure services

1. Copy `.env.sample` to `.env` and set the correct info (the comments should be self-explanatory).
2. Copy `db.yml.sample` to `db.yml` and edit your projects and their services (see the explanations below).

Project and service configuration is explained below with the following scenarios:

**Adding an upstream service that will be deployed and managed:**

1. Edit `db.yml` and add your project with its service(s), and make sure the project has `entrypoint: {the name of your entrypoint svc}`.
2. Run `bin/apply.py` to write all artifacts and deploy/update the relevant docker stacks.
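
A hypothetical example of such a project (names, image and domain are placeholders; check `db.yml.sample` for the exact schema):

```
projects:
  - name: test
    entrypoint: test-web
    services:
      - name: test-web
        image: nginxdemos/hello # hypothetical image
        domain: test.itsup.example.com
```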

**Adding a passthrough endpoint:**

1. Add a project without `entrypoint:` and one service, which now needs `name`, `domain` and `passthrough: true`.
2. Run `bin/apply.py` to roll out the changes.
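
A hypothetical example (the pre-configured `home-assistant` project follows this pattern; names and domain are placeholders):

```
projects:
  - name: home-assistant
    services:
      - name: home-assistant
        domain: ha.itsup.example.com
        passthrough: true
```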

**Adding a local (host) endpoint:**

1. Add a project without `entrypoint:` and one service, which only needs `name` and `domain`.
2. Run `bin/apply.py` to roll out the changes.
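
A hypothetical example (placeholders again; how the host target is resolved follows `db.yml.sample`):

```
projects:
  - name: nas
    services:
      - name: nas
        domain: nas.itsup.example.com
```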

### Configure plugins

You can enable and configure plugins in `db.yml`. Right now we support the following:

#### CrowdSec

[CrowdSec](https://www.crowdsec.net) can run as a container via the [crowdsec-bouncer-traefik-plugin](https://github.com/maxlerebourg/crowdsec-bouncer-traefik-plugin).

**Step 1: generate api key**

First set `enable: true`, run `bin/write-artifacts.py`, and bring up the `crowdsec` container:

```
docker compose up -d crowdsec
```

Now we can execute the command to get the key:

```
docker compose exec crowdsec cscli bouncers add crowdsecBouncer
```

Put the resulting api key in the plugin configuration in `db.yml` and apply it with `bin/apply.py`.
CrowdSec is now running and wired up, but does not use any blocklists yet. Those can be managed manually, but it is preferable to become part of the community by creating an account with CrowdSec: that gives you access to (and lets you contribute to) the community blocklists, and lets you view results in your account's dashboards.

**Step 2: connect your instance with the CrowdSec console**

After creating an account, create a machine instance in the console and register the enrollment key in your stack:

```
docker compose exec crowdsec cscli console enroll ${enrollment key}
```

**Step 3: subscribe to 3rd party blocklists**

Once your instance is enrolled, you can subscribe to third party blocklists in the CrowdSec console, and they will be distributed to your machine.

### Using the Api & OpenApi spec

The API allows openapi compatible clients to do management on this stack (ChatGPT works wonders).
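
As a hypothetical example of a manual call (port, path and auth header are assumptions; consult the served OpenAPI spec and `.env` for the real values):

```
curl http://localhost:8888/openapi.json
curl -H "X-API-Key: $API_KEY" http://localhost:8888/projects
```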

Source `lib/functions.sh` to get:

- `dcp`: run a `docker compose` command targeting the proxy stack (`proxy` + `terminate` services): `dcp logs -f`
- `dcu`: run a `docker compose` command targeting a specific upstream: `dcu test up`
- `dca`: run a `docker compose` command targeting all upstreams: `dca ps`
- `dcpx`: execute a command in one of the proxy containers: `dcpx traefik-web 'rm -rf /etc/acme/acme.json && shutdown' && dcp up`
- `dcux`: execute a command in one of the upstream containers: `dcux test test-informant env`

In effect these wrapper commands achieve the same as going into an `upstream/*` folder and running `docker compose` there.
I don't want to switch folders/terminals all the time and want to keep a "project root" history of my commands, so I chose this approach.
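
Typical usage from the repo root:

```
source lib/functions.sh
dcp ps            # status of the proxy stack
dcu test logs -f  # follow logs of the "test" upstream
```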

### Utility scripts

- `bin/update-certs.py`: pulls certs and reloads the proxy if any certs were created or updated. You could run this in a crontab every week if you want to stay up to date.
- `bin/write-artifacts.py`: after updating `db.yml` you can run this script to generate new artifacts.
- `bin/validate-db.py`: validates `db.yml` (also ran from `bin/write-artifacts.py`).
- `bin/requirements-update.sh`: you may want to update requirements once in a while ;)
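
A weekly crontab entry for the cert refresh could look like this (the install path is a placeholder):

```
0 4 * * 1  cd /opt/itsUP && bin/update-certs.py
```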

## Questions one might have

### What about Nginx?

As you may have noted, there is a lot of Nginx based functionality in this repo. I started out using their proxy, but later ran into the problem of their engine not picking up upstream changes, learning that only the paid NGINX Plus does that. I relied heavily on Kubernetes in the past years, where this was not an issue in the `ingress-nginx` controller. When I found that Traefik does not suffer from this, AND manages letsencrypt certs gracefully, AND gives us label based L7 functionality (like in Kubernetes), I decided to integrate that instead. Wary about its performance though, I intended to keep both approaches side by side. The Nginx part is not working anymore, but I left the code for others to see how one can overcome certain problems in that ecosystem. If you would like to use Nginx for some reason (it is about 40% faster), it is very easy to switch back. But be aware that this implies hooking up the hacky `bin/update-certs.py` script to a crontab for automatic cert rotation.

### Does this scale to more machines?

In the future we might consider expanding this setup to use docker swarm, as it should be easy to do. For now we like to keep it simple.

## Disclaimer

**Don't blame this infra automation tooling for anything going wrong inside your containers!**

I suggest you repeat that mantra now and then and ask yourself when things go wrong: where does the problem lie?