Skip to content

chore: increase workspace readiness timeout to 60s and add startup log#1010

Merged
jamesc merged 2 commits intomainfrom
worktree-main-mcp-fails
Feb 28, 2026
Merged

chore: increase workspace readiness timeout to 60s and add startup log#1010
jamesc merged 2 commits intomainfrom
worktree-main-mcp-fails

Conversation

@jamesc
Copy link
Owner

@jamesc jamesc commented Feb 28, 2026

Summary

Fixes intermittent MCP integration test failures in CI (run 22522874219) where all 18 MCP tests failed because the 30-second workspace readiness budget was exhausted.

Root cause

READINESS_PROBE_MAX_RETRIES = 150 × 200ms = 30 seconds — too tight when the workspace integration tests (which start/kill 12 BEAM nodes in ~10s) run immediately before MCP tests, leaving the CI runner under load.

The failure signature: ECONNREFUSED on port 46141 for the full 30s after PID + port file were written, suggesting the BEAM node either took >30s to bind or crashed and rolled back before any probe landed.

Changes

  • READINESS_PROBE_MAX_RETRIES 150 → 300 — doubles the readiness budget to 60 seconds to handle loaded CI runners
  • BEAM node stderr → startup.log — redirects OTP crash reports / error_logger output to {workspace_dir}/startup.log instead of /dev/null, giving actionable diagnostics on future failures
  • Better timeout error — checks PID liveness via sysinfo and emits a distinct message for crash (PID gone) vs slow startup (PID alive), including the log file path in both cases

Test plan

  • just build && just clippy && just fmt-check — passes
  • just test — 1898 Rust + 237 stdlib + 645 BUnit + 2208 runtime tests pass
  • CI MCP tests should no longer time out on loaded runners

🤖 Generated with Claude Code

Summary by CodeRabbit

  • Bug Fixes
    • Increased startup timeout window from 30 to 60 seconds for BEAM node initialization.
    • Redirected node stderr to a startup log (with a non-fatal fallback) to capture crash diagnostics.
    • Added diagnostic-backed messaging to distinguish slow startups from process crashes, including optional log file hints for troubleshooting.

- Double READINESS_PROBE_MAX_RETRIES from 150 to 300 (30s → 60s budget)
  to handle loaded CI runners after workspace integration tests
- Redirect BEAM node stderr to {workspace_dir}/startup.log instead of
  /dev/null, giving crash reports and OTP error_logger output on failure
- Distinguish crash vs slow-start in the timeout error: check PID liveness
  via sysinfo and report actionable guidance + log file path in both cases

Fixes intermittent MCP integration test failures in CI (run 22522874219)
where the 30s readiness budget was exhausted after 12 BEAM nodes were
started/killed by the workspace integration tests running immediately before.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@coderabbitai
Copy link

coderabbitai bot commented Feb 28, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

  • Linear integration is disabled

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between f6b0696 and 8e37dbe.

📒 Files selected for processing (1)
  • crates/beamtalk-cli/src/commands/workspace/process.rs

📝 Walkthrough

Walkthrough

Doubles the BEAM readiness probe retries (150→300), redirects BEAM stderr to a startup log (with /dev/null fallback), passes the log path into the TCP readiness waiter, and augments readiness-timeout errors to distinguish slow startup vs. crashed process with optional log-path hints.

Changes

Cohort / File(s) Summary
BEAM Startup Diagnostics & Readiness Enhancement
crates/beamtalk-cli/src/commands/workspace/process.rs
READINESS_PROBE_MAX_RETRIES increased from 150→300; BEAM stderr redirected to startup.log (fallback to /dev/null on failure); wait_for_tcp_ready signature extended to accept an optional log path and all call sites updated; readiness failure handling now reports alive vs. crashed process and includes log-path diagnostic hints.

Sequence Diagram(s)

mermaid
sequenceDiagram
participant CLI as CLI
participant BEAM as BEAM process
participant Waiter as wait_for_tcp_ready
participant Log as startup.log
CLI->>BEAM: spawn BEAM, redirect stderr -> startup.log (or /dev/null)
CLI->>Waiter: wait_for_tcp_ready(host, port, retries, optional log path)
loop retry until timeout or ready
Waiter->>BEAM: check TCP port
alt TCP open
Waiter-->>CLI: ready
else still closed
Waiter->>BEAM: check process liveness
end
end
alt timeout reached
alt BEAM alive
Waiter-->>CLI: error("slow startup", log path?)
else BEAM crashed
Waiter-->>CLI: error("crashed", include log path)
end
end

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately reflects the main changes: increasing the workspace readiness timeout to 60s and adding startup log functionality for diagnostics.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
  • 📝 Generate docstrings (stacked PR)
  • 📝 Generate docstrings (commit on current branch)
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch worktree-main-mcp-fails

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@crates/beamtalk-cli/src/commands/workspace/process.rs`:
- Around line 331-340: The startup-log attempt currently falls back silently to
/dev/null; modify the code around the OpenOptions block that tries to open
startup_log_path so that on Err(_) you print a warning with eprintln! (including
the path and the error) and set a flag (e.g., startup_log_enabled = true/false)
when you successfully call cmd.stderr(Stdio::from(log_file)). Then update the
timeout/error messages that currently unconditionally tell users to inspect
startup_log_path (the messages produced later where you reference the startup
log on failures) to only mention the startup log path when startup_log_enabled
is true; otherwise omit that suggestion. Apply the same pattern to the other
similar blocks referenced in the review (the other startup/log-related message
sites), using the unique names startup_log_path, cmd.stderr(...), and the
timeout/error message locations to find and update the code.

ℹ️ Review info

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

  • Linear integration is disabled

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between afab809 and f6b0696.

📒 Files selected for processing (1)
  • crates/beamtalk-cli/src/commands/workspace/process.rs

Per CodeRabbit review: if the startup log file could not be opened we
previously unconditionally told users to check it.  Now we track
whether the open succeeded (startup_log_enabled) and pass an
Option<&Path> to wait_for_tcp_ready.  Error messages only include the
log-file hint when Some(path) is present; a warning is printed when the
file cannot be opened so the fallback is visible.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@jamesc jamesc merged commit e3000d3 into main Feb 28, 2026
5 checks passed
@jamesc jamesc deleted the worktree-main-mcp-fails branch February 28, 2026 20:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant