chore: increase workspace readiness timeout to 60s and add startup log#1010
chore: increase workspace readiness timeout to 60s and add startup log#1010
Conversation
- Double READINESS_PROBE_MAX_RETRIES from 150 to 300 (30s → 60s budget)
to handle loaded CI runners after workspace integration tests
- Redirect BEAM node stderr to {workspace_dir}/startup.log instead of
/dev/null, giving crash reports and OTP error_logger output on failure
- Distinguish crash vs slow-start in the timeout error: check PID liveness
via sysinfo and report actionable guidance + log file path in both cases
Fixes intermittent MCP integration test failures in CI (run 22522874219)
where the 30s readiness budget was exhausted after 12 BEAM nodes were
started/killed by the workspace integration tests running immediately before.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review infoConfiguration used: Organization UI Review profile: CHILL Plan: Pro Disabled knowledge base sources:
📒 Files selected for processing (1)
📝 WalkthroughWalkthroughDoubles the BEAM readiness probe retries (150→300), redirects BEAM stderr to a startup log (with /dev/null fallback), passes the log path into the TCP readiness waiter, and augments readiness-timeout errors to distinguish slow startup vs. crashed process with optional log-path hints. Changes
Sequence Diagram(s)mermaid Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Possibly related PRs
🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches
🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@crates/beamtalk-cli/src/commands/workspace/process.rs`:
- Around line 331-340: The startup-log attempt currently falls back silently to
/dev/null; modify the code around the OpenOptions block that tries to open
startup_log_path so that on Err(_) you print a warning with eprintln! (including
the path and the error) and set a flag (e.g., startup_log_enabled = true/false)
when you successfully call cmd.stderr(Stdio::from(log_file)). Then update the
timeout/error messages that currently unconditionally tell users to inspect
startup_log_path (the messages produced later where you reference the startup
log on failures) to only mention the startup log path when startup_log_enabled
is true; otherwise omit that suggestion. Apply the same pattern to the other
similar blocks referenced in the review (the other startup/log-related message
sites), using the unique names startup_log_path, cmd.stderr(...), and the
timeout/error message locations to find and update the code.
ℹ️ Review info
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (1)
crates/beamtalk-cli/src/commands/workspace/process.rs
Per CodeRabbit review: if the startup log file could not be opened we previously unconditionally told users to check it. Now we track whether the open succeeded (startup_log_enabled) and pass an Option<&Path> to wait_for_tcp_ready. Error messages only include the log-file hint when Some(path) is present; a warning is printed when the file cannot be opened so the fallback is visible. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Summary
Fixes intermittent MCP integration test failures in CI (run 22522874219) where all 18 MCP tests failed because the 30-second workspace readiness budget was exhausted.
Root cause
READINESS_PROBE_MAX_RETRIES = 150× 200ms = 30 seconds — too tight when the workspace integration tests (which start/kill 12 BEAM nodes in ~10s) run immediately before MCP tests, leaving the CI runner under load.The failure signature: ECONNREFUSED on port 46141 for the full 30s after PID + port file were written, suggesting the BEAM node either took >30s to bind or crashed and rolled back before any probe landed.
Changes
READINESS_PROBE_MAX_RETRIES150 → 300 — doubles the readiness budget to 60 seconds to handle loaded CI runnersstartup.log— redirects OTP crash reports / error_logger output to{workspace_dir}/startup.loginstead of/dev/null, giving actionable diagnostics on future failuressysinfoand emits a distinct message for crash (PID gone) vs slow startup (PID alive), including the log file path in both casesTest plan
just build && just clippy && just fmt-check— passesjust test— 1898 Rust + 237 stdlib + 645 BUnit + 2208 runtime tests pass🤖 Generated with Claude Code
Summary by CodeRabbit