-
Notifications
You must be signed in to change notification settings - Fork 91
2025-03-03 ZeroTier - health checking - alternative proposal #38
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
This PR follows on from the extensive discussion associated with zyclonite#37. Never before have I even *contemplated* submitting a PR covering the same ground as an existing open PR. However, on this occasion I thought it might be useful to have a concrete proposal to compare and contrast with zyclonite#37. I sincerely hope that laying this on the (virtual) table and then minimising further interaction *might* help us converge on a solution. <hr> Changes: * `docker-compose.yml` and `docker-compose-router.yml`: - replaces deprecated `version` statement with `---`. - adds example environment variables. * `Dockerfile` - corrects case of "as" to "AS" (silences build warning). - adds and configures `healthcheck.sh` (as per zyclonite#37). - includes `tzdata` package (moved from `Dockerfile.router`) so messages have local timestamps. * `Dockerfile.router` - removes `tzdata` (moved to `Dockerfile`). * `entrypoint-router.sh`: - code for first launch auto join of listed networks expanded to include additional help material. * `entrypoint.sh`: - "first launch" auto join of listed networks (code copied from `entrypoint-router.sh`, as modified per above). - "self repair" of permissions in persistent store (code copied from `entrypoint-router.sh`). - adds launch-time message to make it clear that the client is launching (complements messages in `entrypoint-router.sh`). - abstracts some common strings to environment variables (opportunistic change). * `README.md`: - updates examples. - describes new environment variables (including move of `ZEROTIER_ONE_NETWORK_IDS` from `README-router.md`. - documents health-checking. * `README-router.md` - updates examples. - explains relationship of router and client. Added: * `healthcheck.sh`, based on original proposal in zyclonite#37 and subsequent suggestions for modification by me. I gave serious consideration to the code for synchronising networks in the entry point scripts. The idea is quite attractive. It is safe to automate joins in a "clean slate" situation. However, a *leave* followed by a *join* is not guaranteed to be idempotent. That's because the *leave* destroys the network-specific configuration options (`allowManaged`, `allowGlobal`, `allowDefault`, `allowDNS`). On balance I think it's better left to users to send explicit *leave* commands via the CLI and take responsibility for restoring lost configuration options on any subsequent *join*. I will post the results of testing this PR separately. Signed-off-by: Phill Kelley <34226495+Paraphraser@users.noreply.github.com>
My humble question for a small use case scenario : |
first i was on vacation and last week a bit downing in work - sorry for the delay regarding buildah version - i fear we have to live with the one provided with the ubuntu 24 gh actions runner (they might upgrade it from time to time) about the sponsored by comment - i fully support individual contributions, so you are free to add your real name but i would like to not go down the route with having companies sponsoring code as this might be tricky from a licensing perspective if not in sync with the individual contributor and so on... |
@zyclonite However, as per my agreement with PMGA Tech LLP, the code cannot be used without attaching the proper licensing text. It is now not possible for me to go back on it. Further, I still do not understand the fuss behind all this as many major MIT projects owned by corporates also include license files (Examples already given above); and also since ZeroTier itself is not MIT but BSL, you can check here: https://github.com/zerotier/ZeroTierOne?tab=License-1-ov-file As I do not want to take this argument further, so its your call on whether to merge the code or cancel it. Please note one more thing, in case you intend to merge this proposal, the proper licensing text needs to be copied from PR #37 . I would suggest you merge PR #37 since this is just a copy of that MR with minor changes as already admitted by the OP in the first post. (attaching screenshot for future reference in case required) |
1. Full rewrite, including copious comments explaining theory of operation. In essence, if a «networkID» is mentioned in (internal path): ``` /var/lib/zerotier-one/networks.d ``` then it should be matched by a route in the host's routing table: * Zero «networkID» = zero routes * One «networkID» = one route * Two «networkID» = two routes * ... Any mismatch causes the container to go unhealthy. From the perspective of the health-checking script, the question of *which* network is immaterial so there is no need to employ techniques such as iterating to discover *which* network is causing a problem. This is because the script has no way of communicating anything other than an exit status. Any `echo` statements it issues will not make it into the container's log. The simple presence/absence of a «networkID» in `networks.d` is taken to indicate which networks the user *intends* the container to join. There is no reliance on environment variables to propagate any health-checking information into the container. No other variables are introduced so the argument about naming conventions goes away. `Dockerfile`: 1. Removes `--start-interval` flag from `HEALTHCHECK` command. The flag was preventing `buildah` builds from succeeding. `README.md`: 1. Removes references to the following environment variables: * `ZEROTIER_ONE_CHK_SPECIFIC_NETWORKS` * `ZEROTIER_ONE_CHK_MIN_ROUTES_FOR_HEALTH` 2. Rewrites explanation of health-checking. `README-router.md`: 1. Removes references to the following environment variables: * `ZEROTIER_ONE_CHK_SPECIFIC_NETWORKS` * `ZEROTIER_ONE_CHK_MIN_ROUTES_FOR_HEALTH` Signed-off-by: Phill Kelley <34226495+Paraphraser@users.noreply.github.com>
Testing (with 2025-04-06 changes)Reference service definition zerotier:
container_name: zerotier
image: "zyclonite/zerotier:local"
restart: unless-stopped
network_mode: host
environment:
- TZ=${TZ:-Etc/UTC}
# - ZEROTIER_ONE_NETWORK_IDS=${ZEROTIER_ONE_NETWORK_IDS}
volumes:
- ./volumes/zerotier-one:/var/lib/zerotier-one
devices:
- "/dev/net/tun:/dev/net/tun"
cap_add:
- NET_ADMIN
- SYS_ADMIN Note:
Test 1 - clean slate
Test 2 - Join first network
Test 3 - interrupt first network
Test 4 - join second network
Test 5 - interrupt one network
Test 6 - down, up the container
|
This PR follows on from the extensive discussion associated with #37.
Never before have I even contemplated submitting a PR covering the same ground as an existing open PR. However, on this occasion I thought it might be useful to have a concrete proposal to compare and contrast with #37.
I sincerely hope that laying this on the (virtual) table and then minimising further interaction might help us converge on a solution.
docker-compose.yml
anddocker-compose-router.yml
:replaces deprecated
version
statement with---
.adds example environment variables.
Dockerfile
corrects case of "as" to "AS" (silences build warning).
adds and configures
healthcheck.sh
(as per [Enhancement] Add Health-Check to DockerFile #37).includes
tzdata
package (moved fromDockerfile.router
) so messages have local timestamps.Dockerfile.router
tzdata
(moved toDockerfile
).entrypoint-router.sh
:entrypoint.sh
:"first launch" auto join of listed networks (code copied from
entrypoint-router.sh
, as modified per above)."self repair" of permissions in persistent store (code copied from
entrypoint-router.sh
).adds launch-time message to make it clear that the client is launching (complements messages in
entrypoint-router.sh
).abstracts some common strings to environment variables (opportunistic change).
README.md
:updates examples.
describes new environment variables (including move of
ZEROTIER_ONE_NETWORK_IDS
fromREADME-router.md
.documents health-checking.
README-router.md
Added:
healthcheck.sh
, based on original proposal in [Enhancement] Add Health-Check to DockerFile #37 and subsequent suggestions for modification by me.I gave serious consideration to the code for synchronising networks in the entry point scripts. The idea is quite attractive. It is safe to automate joins in a "clean slate" situation. However, a leave followed by a join is not guaranteed to be idempotent. That's because the leave destroys the network-specific configuration options (
allowManaged
,allowGlobal
,allowDefault
,allowDNS
).On balance I think it's better left to users to send explicit leave commands via the CLI and take responsibility for restoring lost configuration options on any subsequent join.
I will post the results of testing this PR separately.
Additional changes as at 2025-04-06
healthcheck.sh
:Full rewrite, including copious comments explaining theory of
operation. In essence, if a «networkID» is mentioned in
(internal path):
then it should be matched by a route in the host's routing table:
Any mismatch causes the container to go unhealthy. From the
perspective of the health-checking script, the question of which
network is immaterial so there is no need to employ techniques such
as iterating to discover which network is causing a problem. This
is because the script has no way of communicating anything other
than an exit status. Any
echo
statements it issues will not makeit into the container's log.
The simple presence/absence of a «networkID» in
networks.d
istaken to indicate which networks the user intends the container
to join.
There is no reliance on environment variables to propagate any
health-checking information into the container. No other variables
are introduced so the argument about naming conventions goes away.
Dockerfile
:--start-interval
flag fromHEALTHCHECK
command. The flagwas preventing
buildah
builds from succeeding.README.md
:Removes references to the following environment variables:
ZEROTIER_ONE_CHK_SPECIFIC_NETWORKS
ZEROTIER_ONE_CHK_MIN_ROUTES_FOR_HEALTH
Rewrites explanation of health-checking.
README-router.md
:Removes references to the following environment variables:
ZEROTIER_ONE_CHK_SPECIFIC_NETWORKS
ZEROTIER_ONE_CHK_MIN_ROUTES_FOR_HEALTH