feat(bootstrap): add runner health guards #1312
Test & Lint Summary
noa-lucent left a comment:
Nice work wiring the status service through the guards/health endpoints and moving the GC bootstrap out of the provider. A couple of issues need tightening before merge:
- The WebSocket gateway sends a bare 503 status line with no JSON body when the runner is down, but the spec requires the same payload that the REST guard emits so clients can surface the failure reason. Please serialize the JSON payload (with Content-Type and Content-Length headers) before closing the socket.
- Document the new env knob alongside the other GC settings in and the server README so operators can discover it.
Once those are fixed this should be ready.
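A minimal sketch of the serialization requested in the first point, assuming the gateway rejects the upgrade on the raw socket itself (Node's `http.Server` `'upgrade'` event hands the handler a `stream.Duplex`). The helper names and the payload fields `error` and `message` are hypothetical; in the real gateway the body should come from whatever the REST guard already emits.

```typescript
import { STATUS_CODES } from 'node:http';
import type { Duplex } from 'node:stream';

// Hypothetical payload shape; reuse the REST guard's actual body instead.
export interface RunnerUnavailablePayload {
  error: string;
  message: string;
}

// Builds a raw HTTP/1.1 response used to reject a WebSocket upgrade.
export function buildUpgradeRejection(status: number, payload: RunnerUnavailablePayload): string {
  const body = JSON.stringify(payload);
  const reason = STATUS_CODES[status] ?? 'Error';
  return [
    `HTTP/1.1 ${status} ${reason}`,
    'Content-Type: application/json; charset=utf-8',
    // Byte length, not string length, so non-ASCII messages are counted correctly.
    `Content-Length: ${Buffer.byteLength(body, 'utf8')}`,
    'Connection: close',
    '', // blank line separates headers from the body
    body,
  ].join('\r\n');
}

// Writes the full response, then ends the socket once it has been flushed.
export function rejectUpgrade(socket: Duplex, status: number, payload: RunnerUnavailablePayload): void {
  socket.end(buildUpgradeRejection(status, payload));
}
```

Using `socket.end(data)` rather than `socket.destroy()` lets the response reach the client before the connection closes, and `Connection: close` signals that nothing follows the body.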
packages/platform-server/src/infra/container/terminal.gateway.ts
Local validation
noa-lucent left a comment:
Runner-down terminal upgrades now serialize the documented JSON error before closing, and the new GC timeout knob is documented in both the root README and .env example. Looks good to me.
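Wherever the new GC timeout knob is read, a strict parser keeps a malformed `.env` value from quietly changing GC behavior. This is a sketch only: the function name `readTimeoutMs` is hypothetical, and since the thread does not show the knob's actual name or default, none are hard-coded here.

```typescript
// Reads a positive integer millisecond timeout from an env-like map.
// Unset or empty values fall back to the default; malformed values throw.
export function readTimeoutMs(
  env: Readonly<Record<string, string | undefined>>,
  name: string,
  fallbackMs: number,
): number {
  const raw = env[name]?.trim();
  if (raw === undefined || raw === '') return fallbackMs;

  // Plain digits only, so values like "10s" or "1e3" are rejected instead of misread.
  const value = /^\d+$/.test(raw) ? Number(raw) : Number.NaN;
  if (!Number.isSafeInteger(value) || value <= 0) {
    throw new Error(`${name} must be a positive integer number of milliseconds, got "${raw}"`);
  }
  return value;
}
```

Usage with a hypothetical name and default: `const gcTimeoutMs = readTimeoutMs(process.env, 'DOCKER_GC_TIMEOUT_MS', 30_000);`. Throwing rather than falling back makes an operator typo fail at startup instead of silently applying the default.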
CI is green and the review is approved. Requesting that this be enqueued in the merge queue for main via the GitHub UI:
Thanks!
Posting a minimal DI fix plan for the bootstrap failure.
Root cause:
Fix (apply these changes):
```typescript
import { Inject, Injectable } from '@nestjs/common';
import { ConfigService } from '../../core/services/config.service';
// DockerRunnerStatusSnapshot is assumed to be declared in this file or imported alongside it.

@Injectable()
export class DockerRunnerStatusService {
  private snapshot: DockerRunnerStatusSnapshot;

  // Explicit @Inject token: resolution no longer depends on emitted
  // design:paramtypes metadata, which some build/test toolchains omit.
  constructor(@Inject(ConfigService) private readonly configService: ConfigService) {
    // Optional defensive assertion; this line only compiles if ConfigService
    // declares a static assertInitialized(), so drop it otherwise.
    ConfigService.assertInitialized?.(this.configService);
    this.snapshot = {
      status: 'unknown',
      optional: this.configService.getDockerRunnerOptional(),
      baseUrl: this.configService.getDockerRunnerBaseUrl(),
      consecutiveFailures: 0,
    };
  }

  // ... rest of the service unchanged
}
```

Notes:

```typescript
constructor(@Inject(ConfigService) private readonly config: ConfigService, /* other deps */) {}
```
Once the above is committed, CI should go green and the new bootstrap test will guard against DI regressions.
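The fix calls `ConfigService.assertInitialized?.()`, which only does something if such a static exists. A framework-free sketch of one possible shape, assuming config is loaded through a hypothetical `init()` method; `RunnerConfig`, `init()`, and `DockerRunnerStatusServiceSketch` are illustrative names, not the project's API. The point is to fail with a clear message when a dependency arrives as `undefined` (for example, when no constructor parameter types were recorded for DI) instead of crashing later on a property access.

```typescript
// Hypothetical config shape holding only what the sketch needs.
interface RunnerConfig {
  dockerRunnerOptional: boolean;
  dockerRunnerBaseUrl: string;
}

// Hypothetical stand-in for the real ConfigService.
export class ConfigService {
  private config: RunnerConfig | undefined;

  init(config: RunnerConfig): this {
    this.config = config;
    return this;
  }

  // Assertion function: narrows `instance` to ConfigService or throws.
  static assertInitialized(instance: ConfigService | undefined): asserts instance is ConfigService {
    if (instance == null) {
      throw new Error('ConfigService was not injected; check the @Inject(ConfigService) token');
    }
    if (instance.config === undefined) {
      throw new Error('ConfigService used before init()');
    }
  }

  getDockerRunnerOptional(): boolean {
    return this.requireConfig().dockerRunnerOptional;
  }

  getDockerRunnerBaseUrl(): string {
    return this.requireConfig().dockerRunnerBaseUrl;
  }

  private requireConfig(): RunnerConfig {
    if (this.config === undefined) throw new Error('ConfigService used before init()');
    return this.config;
  }
}

// Field names mirror the snapshot in the fix above; other states are omitted here.
interface DockerRunnerStatusSnapshot {
  status: string;
  optional: boolean;
  baseUrl: string;
  consecutiveFailures: number;
}

export class DockerRunnerStatusServiceSketch {
  readonly snapshot: DockerRunnerStatusSnapshot;

  constructor(configService: ConfigService | undefined) {
    // Fails fast here rather than with "cannot read properties of undefined" below.
    ConfigService.assertInitialized(configService);
    this.snapshot = {
      status: 'unknown',
      optional: configService.getDockerRunnerOptional(),
      baseUrl: configService.getDockerRunnerBaseUrl(),
      consecutiveFailures: 0,
    };
  }
}
```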
Maintainer attention requested: please apply the minimal DI fix described in my previous comment to unblock production bootstrap. Summary:
This is a minimal patch; once it is applied, we'll monitor CI and confirm it goes green.
Tests & Lint Summary:
- pnpm --filter @agyn/platform-server lint
- pnpm --filter @agyn/platform-server test e2e/bootstrap.di.test.ts
- pnpm --filter @agyn/platform-server test
- pnpm dev (fails: ERR_PNPM_RECURSIVE_EXEC_FIRST_FAIL Command "dev" not found)

Results:
- Lint passed with regenerated Prisma client (v6.18.0).
- Bootstrap DI test passed after bringing up a local Postgres (127.0.0.1:55432) and a LiteLLM stub (127.0.0.1:4410).
- Full @agyn/platform-server test suite passed (Test Files: 192 passed | 23 skipped; Tests: 771 passed | 12 skipped).
- pnpm dev cannot run in this repo because no root-level "dev" script exists; pnpm exits with ERR_PNPM_RECURSIVE_EXEC_FIRST_FAIL.
… AppModule.onModuleInit
Tests & Lint Summary:
- pnpm --filter @agyn/platform-server lint
- pnpm --filter @agyn/platform-server test e2e/bootstrap.di.test.ts
- pnpm --filter @agyn/platform-server test

Results:
- Lint passed (Prisma Client re-generated v6.18.0).
- Production bootstrap DI test passed (1 test, 1 file).
- Full @agyn/platform-server suite passed (Test Files: 192 passed | 23 skipped; Tests: 771 passed | 12 skipped).
Summary
Testing
Resolves #1303