Conversation
Lint & Test Summary
noa-lucent left a comment:
Thanks for the detailed Ziti integration work. I found a couple of blockers:
- The docker-runner ingress proxies to `config.host`, which defaults to `0.0.0.0`, so it can't reach the local Fastify server and the Ziti listener never comes up.
- Controller reconciliation never creates an identity→edge-router policy, so the platform-server and runner identities cannot actually use the router (`NO_EDGE_ROUTERS_AVAILABLE`).
Docs also reference a non-existent controller-init compose service; see inline notes.
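A minimal sketch of what a fix for the first blocker could look like (function and config names here are illustrative assumptions, not the repo's actual code): the ingress should never dial a wildcard bind address, so it can normalize `0.0.0.0` to a routable loopback before proxying.

```typescript
// Hypothetical sketch: normalize a wildcard bind address to a routable
// loopback target before the ingress proxies to the local Fastify server.
function resolveProxyTarget(configHost: string | undefined): string {
  const wildcards = new Set(['0.0.0.0', '::', '[::]']);
  if (!configHost || wildcards.has(configHost)) {
    return '127.0.0.1'; // a loopback address the proxy can actually reach
  }
  return configHost;
}
```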
Lint & Test Summary
noa-lucent left a comment:
Ziti ingress now targets a routable loopback host by default, and controller reconciliation provisions the missing identity→edge-router policy. Docs reference the correct compose service names. Looks good to merge.
Lint, Typecheck & Test Summary
Build, Lint & Test Summary
noa-lucent left a comment:
Re-reviewed the latest commits: the runner/server configs now supply deterministic default Ziti payloads even when env vars are missing, and the management client gained the Response typing fix. No regressions spotted. ✅
Test & Lint Summary
Local Test & Lint Summary
noa-lucent left a comment:
Latest changes introduced a regression in the controller reconciliation: the bind/dial service policies are now flipped (platform-server is granted Dial and docker-runner is granted Bind), so neither identity can actually perform its role. Please restore runner identities on the dial policy and platform identities on the bind policy. See inline comment for details.
```
name: `${profile.serviceName}.dial`,
type: 'Dial',
semantic: 'AllOf',
identityRoles: profile.identities.platform.selectors,
```
[major] This change swaps the dial/bind policies: docker-runner (client) must remain on the dial policy and platform-server (service host) must remain on the bind policy. With this flip the controller now allows platform-server to dial and requires docker-runner to bind, so neither side can connect—the management API will reject both operations as unauthorized. Please revert to the original assignment (runner selectors on the ${profile.serviceName}.dial policy, platform selectors on ${profile.serviceName}.bind).
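The intended mapping can be sketched as a small helper (a hypothetical illustration built from the field names visible in the diff, not the actual reconciliation code):

```typescript
// Illustrative policy shapes matching the quoted controller payloads.
type ServicePolicy = {
  name: string;
  type: 'Dial' | 'Bind';
  semantic: 'AllOf';
  identityRoles: string[];
};

// Runner identities dial the service; platform identities bind (host) it.
function buildServicePolicies(
  serviceName: string,
  runnerSelectors: string[],
  platformSelectors: string[],
): ServicePolicy[] {
  return [
    {
      name: `${serviceName}.dial`,
      type: 'Dial',
      semantic: 'AllOf',
      identityRoles: runnerSelectors, // docker-runner is the client
    },
    {
      name: `${serviceName}.bind`,
      type: 'Bind',
      semantic: 'AllOf',
      identityRoles: platformSelectors, // platform-server hosts the service
    },
  ];
}
```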
Restored the Ziti service policy role mapping so the platform identity binds and the docker-runner identity dials per Issue #1289.

Local Test & Lint Summary
noa-lucent left a comment:
Verified the bind/dial policies are back to the correct identities (runner=dial, platform=bind). No additional regressions spotted.
Updates:
Checks:
noa-lucent left a comment:
Thanks for the docker-compose split—the base stack now holds third-party infra only and the overlay wiring looks sound. I spotted a couple of doc regressions though:
- The OpenZiti README/docs now claim the runner proxy always binds to `0.0.0.0`, but ConfigService still defaults it to `127.0.0.1`. Please clarify that `0.0.0.0` only applies when the docker-compose.dev overlay overrides the host, so local pnpm dev users aren't misled.
- The `.env` example now hardcodes `/opt/app/...` paths that only exist inside the containers. Copying those values into the default host-based `.env` causes the server to fail immediately because the files don't exist. Please keep the host-relative defaults (or clearly separate host vs. docker-compose values) so the setup guide works for both workflows.
Once the docs reflect the actual defaults I can re-approve.
README.md (Outdated)
```
3. Start the controller stack: `docker compose up -d ziti-controller ziti-controller-init ziti-edge-router`.
4. Launch docker-runner and platform-server normally (either via `pnpm dev` or
   `docker compose -f docker-compose.yml -f docker-compose.dev.yml up -d platform-server docker-runner`).
   The server will reconcile the controller, enroll identities, and expose a local proxy on `0.0.0.0:17071`
```
[minor] Config still defaults ZITI_RUNNER_PROXY_HOST to 127.0.0.1 (see ConfigService), so in the standard pnpm dev workflow the proxy continues to bind only to loopback. Saying it always comes up on 0.0.0.0:17071 is misleading unless the docker-compose.dev overlay overrides the host. Can we clarify that 0.0.0.0 applies to the containerized overlay while local processes still listen on 127.0.0.1?
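One way to make the docs unambiguous is to show both defaults side by side (using the `ZITI_RUNNER_PROXY_HOST` variable quoted later in this review; values are the ones discussed above):

```ini
# Host workflow (pnpm dev): ConfigService default, loopback only
ZITI_RUNNER_PROXY_HOST=127.0.0.1

# Containerized workflow: the docker-compose.dev overlay overrides the host
ZITI_RUNNER_PROXY_HOST=0.0.0.0
```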
```
- `packages/platform-server/.env`
```
[major] This .env example now hardcodes /opt/app/... for the identity + tmp paths, but that directory only exists inside the docker-compose.dev containers. When the usual pnpm dev workflow runs on the host, copying these values verbatim points the server at files that do not exist and Ziti bootstrap fails immediately. Please keep the host-relative defaults (./.ziti/...) for local development or clearly split the instructions into host vs. docker-compose variants so each has valid paths.
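A split like the following would keep both workflows valid (variable and file names are taken from the snippets quoted elsewhere in this review; treat the exact values as illustrative):

```ini
# Host workflow (pnpm dev): host-relative default
ZITI_PLATFORM_IDENTITY_FILE=./.ziti/identities/dev.agyn-platform.platform-server.json

# docker-compose.dev workflow: container path (mounted under /opt/app)
ZITI_PLATFORM_IDENTITY_FILE=/opt/app/.ziti/identities/dev.agyn-platform.platform-server.json
```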
docs/containers/ziti.md (Outdated)
```
The local development stack now provisions an OpenZiti controller, initializer, and edge router. The platform-server
reconciles controller state at startup (service, policies, and identities) and stores identity material under
`./.ziti/identities` (mirrored to `/opt/app/.ziti/identities` inside containers). A lightweight local HTTP proxy binds to
```
[minor] The proxy still defaults to 127.0.0.1:17071 in ConfigService; only the docker-compose.dev overlay overrides the host to 0.0.0.0. Could we call out that difference here (and in the runtime flow section) so readers don’t assume the proxy listens on all interfaces when running the usual pnpm dev process on the host?
Docs update per Noa's note:
Checks:
noa-lucent left a comment:
Docs now split host vs. compose workflows and call out the proxy binding difference—thanks. One lingering issue: the host template uses `$(pwd)` for the paths, but `.env` files don't run through a shell, so those variables never expand. Copy/pasting the snippet literally sets the path to the string `$(pwd)/…` and the server still can't read its identity. Please change the examples to literal relative/absolute paths so the instructions work as written.
docs/containers/ziti.md (Outdated)
```
ZITI_ROUTER_NAME=dev-edge-router
ZITI_RUNNER_PROXY_HOST=127.0.0.1
ZITI_RUNNER_PROXY_PORT=17071
ZITI_PLATFORM_IDENTITY_FILE=$(pwd)/.ziti/identities/dev.agyn-platform.platform-server.json
```
[major] .env files aren’t shell scripts—values like ZITI_PLATFORM_IDENTITY_FILE=$(pwd)/.ziti/... will be read literally (the string $(pwd)/…) and the server will still fail to open the identity. The host template needs concrete paths (./.ziti/... or an absolute path you compute once) rather than command substitution syntax. Please replace the $(pwd) examples with actual literal paths so the instructions work when copy/pasted into .env.
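Since `.env` parsers treat values as opaque strings, the corrected host template could look like this (paths taken from the quoted snippet; shown as an illustrative before/after):

```ini
# Wrong: .env values are not shell-expanded; the server receives the literal
# string "$(pwd)/.ziti/...".
# ZITI_PLATFORM_IDENTITY_FILE=$(pwd)/.ziti/identities/dev.agyn-platform.platform-server.json

# Right: a literal relative path (or an absolute path computed once)
ZITI_PLATFORM_IDENTITY_FILE=./.ziti/identities/dev.agyn-platform.platform-server.json
```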
Addressed remaining blocker:
Re-ran checks:
noa-lucent left a comment:
Host examples now use literal absolute paths (with a clear note on deriving them), so Ziti bootstrap instructions are accurate again. No other issues found. ✅
Updated docker-compose defaults to hardcode the controller/router hostnames and network slug; revalidated lint/tests:
Vitest summary: 459 suites passed; 769 tests executed (757 passed / 12 skipped / 0 failed).
noa-lucent left a comment:
Compose changes look good: controller/router now share consistent hostnames/addresses, init job inherits same values, and router no longer attempts tls://ziti-controller:6262. No additional issues spotted.
Latest updates:
Checks run locally:
Vitest: 459 suites passed; 769 tests executed (757 passed / 12 skipped / 0 failed).
noa-lucent left a comment:
Reviewed the latest Ziti bootstrap fixes: README/docs now cover watching router logs + reset steps, the router container defaults to the same router.platform role as the management policy, and controller-init waits longer (with configurable skip) while gracefully handling missing routers. Everything looks good. 👍
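The "waits longer, with configurable skip, while gracefully handling missing routers" behavior can be sketched roughly as follows (a hypothetical shape; the flag name `ZITI_INIT_SKIP_ROUTER_WAIT` is an assumption, not the actual controller-init code):

```typescript
// Illustrative wait loop for controller-init: poll until the edge router is
// registered, time out gracefully, and allow skipping via an env flag.
async function waitForRouter(
  check: () => Promise<boolean>,
  timeoutMs: number,
  pollMs: number,
): Promise<boolean> {
  if (process.env.ZITI_INIT_SKIP_ROUTER_WAIT === 'true') {
    return true; // operator opted out of waiting (hypothetical flag)
  }
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    if (await check()) return true;
    await new Promise((resolve) => setTimeout(resolve, pollMs));
  }
  return false; // missing router: caller logs and continues instead of crashing
}
```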
Test & Lint Summary
- pnpm --filter @agyn/docker-runner run lint
- pnpm --filter @agyn/docker-runner run test

Results:
- Lint: passed with no issues.
- Tests: 12 passed / 0 failed / 0 skipped.
noa-lucent left a comment:
Thanks for the ESM + controller-init hardening work. One blocker left: the Dockerfile now runs the app from `/opt/app`, but pnpm deploy still places the runnable package under `/opt/app/packages/docker-runner`. With the new WORKDIR the container can’t find `dist/service/main.js` and fails to start. Please restore the previous workdir (or adjust the deploy step/entrypoint so `dist/` actually lives at the root).
packages/docker-runner/Dockerfile (Outdated)
```diff
 NODE_OPTIONS=--experimental-specifier-resolution=node

-WORKDIR /opt/app/packages/docker-runner
+WORKDIR /opt/app
```
[major] pnpm deploy --filter @agyn/docker-runner --prod --legacy /opt/app still writes the package under /opt/app/packages/docker-runner (that’s why the previous image set WORKDIR /opt/app/packages/docker-runner). Changing the runtime workdir to /opt/app means the entrypoint now runs node dist/service/main.js from a directory that doesn’t contain dist/..., so the container fails with Cannot find module 'dist/service/main.js'. Either keep WORKDIR /opt/app/packages/docker-runner (and run node packages/docker-runner/dist/...) or flatten the deploy output so dist/ actually lives directly under /opt/app. As written the image cannot start.
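Either option from the comment above could look roughly like this (a sketch; the CMD paths are assumptions based on the entrypoint described in the review, not the repo's actual Dockerfile):

```dockerfile
# Option A: keep the workdir where `pnpm deploy --legacy` places the package
WORKDIR /opt/app/packages/docker-runner
CMD ["node", "dist/service/main.js"]

# Option B: keep WORKDIR /opt/app and point the entrypoint at the nested path
# WORKDIR /opt/app
# CMD ["node", "packages/docker-runner/dist/service/main.js"]
```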
Test & Lint Summary
- pnpm --filter @agyn/docker-runner run lint
- pnpm --filter @agyn/docker-runner run test

Results:
- Lint: passed with no issues.
- Tests: 12 passed / 0 failed / 0 skipped.
noa-lucent left a comment:
Confirmed the Dockerfile now deploys into /opt/app/packages/docker-runner and sets the same WORKDIR, so the runtime can find dist/service/main.js again. ✅
Tests & Lint Summary
- Did you mean "pnpm test"? — pass
- Did you mean "pnpm lint"? — pass
Test & Lint Summary
Lint: ✅ (no issues)
noa-lucent left a comment:
Reviewed the latest updates:
- README/docs now walk through SELinux-friendly volume prep, running the init job with host UID/GID, and the host-mode smoke test sequence.
- The added prepare-volumes script and package-script hook look good.
- Docker runner connectivity probe now retries with configurable attempts/intervals; error logging covers retry vs. fail cases.
- Integration test uses dynamic LiteLLM port so the new probe timing/env changes don’t conflict with other tests.
Everything looks solid—thanks for tightening the bootstrap story. ✅
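The retrying connectivity probe described above can be sketched as follows (a hypothetical shape, assuming the attempts/interval values come from env-driven config; not the actual docker-runner code):

```typescript
// Illustrative retry wrapper: run the probe up to `attempts` times, sleeping
// `intervalMs` between tries, and report whether any attempt succeeded.
async function probeWithRetry(
  probe: () => Promise<boolean>,
  attempts: number,
  intervalMs: number,
): Promise<boolean> {
  for (let attempt = 1; attempt <= attempts; attempt++) {
    if (await probe()) return true; // success: stop retrying
    if (attempt < attempts) {
      await new Promise((resolve) => setTimeout(resolve, intervalMs));
    }
  }
  return false; // every attempt failed; caller decides retry vs. fail logging
}
```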
Test & Lint Summary
Local validation
Test & Lint Summary:
Test & Lint Summary
Summary
Fixes #1289
Testing