chore: increase workspace readiness timeout to 60s and add startup log by jamesc · Pull Request #1010 · jamesc/beamtalk

jamesc · 2026-02-28T17:26:07Z

Summary

Fixes intermittent MCP integration test failures in CI (run 22522874219) where all 18 MCP tests failed because the 30-second workspace readiness budget was exhausted.

Root cause

READINESS_PROBE_MAX_RETRIES = 150 × 200 ms = 30 seconds. That budget is too tight when the workspace integration tests (which start and kill 12 BEAM nodes in about 10 s) run immediately before the MCP tests and leave the CI runner under load.
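The readiness budget is the retry count multiplied by the probe interval. A minimal sketch of that arithmetic, assuming a fixed 200 ms interval between probes and ignoring the time spent inside each connection attempt; `READINESS_PROBE_INTERVAL` and `readiness_budget` are hypothetical names, not necessarily what `process.rs` uses:

```rust
use std::time::Duration;

/// Retry count after this PR (was 150).
const READINESS_PROBE_MAX_RETRIES: u32 = 300;

/// Hypothetical name for the delay between readiness probes (assumed 200 ms).
const READINESS_PROBE_INTERVAL: Duration = Duration::from_millis(200);

/// Total time the CLI waits for the BEAM node before giving up,
/// ignoring the duration of each individual connect attempt.
fn readiness_budget(max_retries: u32) -> Duration {
    READINESS_PROBE_INTERVAL * max_retries
}
```

With 150 retries this gives 30 s; with 300 it gives 60 s.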

The failure signature was ECONNREFUSED on port 46141 for the full 30 s after the PID and port files were written. This suggests the BEAM node either took more than 30 s to bind, or crashed and rolled back before any probe landed.
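ECONNREFUSED on every attempt is what a polling readiness probe sees while nothing is listening on the port yet. A minimal sketch of such a probe loop using only the standard library; `wait_for_port` is a hypothetical stand-in, and the real `wait_for_tcp_ready` in `process.rs` may differ in signature and behavior:

```rust
use std::io;
use std::net::{SocketAddr, TcpStream};
use std::thread;
use std::time::Duration;

/// Hypothetical probe: tries to connect to `addr` up to `max_retries` times.
/// Each failed attempt (e.g. ECONNREFUSED because the node has not bound the
/// port yet) is followed by `interval` of sleep.
fn wait_for_port(addr: SocketAddr, max_retries: u32, interval: Duration) -> io::Result<()> {
    let mut last_err = None;
    for _ in 0..max_retries {
        match TcpStream::connect_timeout(&addr, interval) {
            // The node is accepting connections; drop the probe socket.
            Ok(_stream) => return Ok(()),
            Err(e) => {
                last_err = Some(e);
                thread::sleep(interval);
            }
        }
    }
    // Budget exhausted: surface the last connection error (or a timeout if
    // no attempt was made at all).
    Err(last_err.unwrap_or_else(|| io::Error::new(io::ErrorKind::TimedOut, "no probes attempted")))
}
```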

Changes

- READINESS_PROBE_MAX_RETRIES 150 → 300: doubles the readiness budget to 60 seconds to handle loaded CI runners.
- BEAM node stderr → startup.log: redirects OTP crash reports and error_logger output to {workspace_dir}/startup.log instead of /dev/null, giving actionable diagnostics on future failures.
- Better timeout error: checks PID liveness via sysinfo and emits a distinct message for a crash (PID gone) versus a slow startup (PID alive), including the log file path in both cases.
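The stderr redirect with a non-fatal fallback can be sketched with `std::process::Command`. This assumes the log is opened in append mode and that the warning is printed with `eprintln!` (the warning was added in the follow-up commit); `redirect_stderr_to_log` is a hypothetical helper, not the PR's actual code:

```rust
use std::fs::OpenOptions;
use std::path::Path;
use std::process::{Command, Stdio};

/// Hypothetical helper: points the child's stderr at `startup_log_path`,
/// falling back to the null device (/dev/null) if the file cannot be opened.
/// Returns true when the startup log is actually being written.
fn redirect_stderr_to_log(cmd: &mut Command, startup_log_path: &Path) -> bool {
    match OpenOptions::new().create(true).append(true).open(startup_log_path) {
        Ok(log_file) => {
            cmd.stderr(Stdio::from(log_file));
            true
        }
        Err(e) => {
            // Make the fallback visible instead of silently discarding stderr.
            eprintln!(
                "warning: could not open {}: {e}; BEAM stderr will be discarded",
                startup_log_path.display()
            );
            cmd.stderr(Stdio::null());
            false
        }
    }
}
```

Returning a flag rather than failing keeps startup working even when the workspace directory is not writable.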

Test plan

- `just build && just clippy && just fmt-check`: passes
- `just test`: 1898 Rust + 237 stdlib + 645 BUnit + 2208 runtime tests pass
- CI MCP tests should no longer time out on loaded runners

🤖 Generated with Claude Code

Summary by CodeRabbit

Bug Fixes
- Increased startup timeout window from 30 to 60 seconds for BEAM node initialization.
- Redirected node stderr to a startup log (with a non-fatal fallback) to capture crash diagnostics.
- Added diagnostic-backed messaging to distinguish slow startups from process crashes, including optional log file hints for troubleshooting.

- Double READINESS_PROBE_MAX_RETRIES from 150 to 300 (30s → 60s budget) to handle loaded CI runners after workspace integration tests
- Redirect BEAM node stderr to {workspace_dir}/startup.log instead of /dev/null, giving crash reports and OTP error_logger output on failure
- Distinguish crash vs. slow start in the timeout error: check PID liveness via sysinfo and report actionable guidance plus the log file path in both cases

Fixes intermittent MCP integration test failures in CI (run 22522874219), where the 30s readiness budget was exhausted after 12 BEAM nodes were started and killed by the workspace integration tests running immediately before.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

coderabbitai · 2026-02-28T17:26:26Z

No actionable comments were generated in the recent review. 🎉


📥 Commits

Reviewing files that changed from the base of the PR and between f6b0696 and 8e37dbe.

📒 Files selected for processing (1)

crates/beamtalk-cli/src/commands/workspace/process.rs

📝 Walkthrough


Doubles the BEAM readiness probe retries (150→300), redirects BEAM stderr to a startup log (with a /dev/null fallback), passes the log path into the TCP readiness waiter, and augments readiness-timeout errors to distinguish a slow startup from a crashed process, with optional log-path hints.
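The crash-versus-slow-startup distinction with an optional log hint can be sketched as a pure message builder. The `pid_alive` input stands in for the sysinfo liveness check, and the function name and message wording are hypothetical, not the PR's actual text:

```rust
use std::path::Path;

/// Hypothetical builder for the readiness-timeout error. `pid_alive` is the
/// result of a liveness check on the node's PID; `startup_log` is Some only
/// when the startup log was successfully opened.
fn readiness_timeout_message(
    port: u16,
    timeout_secs: u64,
    pid_alive: bool,
    startup_log: Option<&Path>,
) -> String {
    let mut msg = if pid_alive {
        format!(
            "BEAM node is still running but did not accept connections on port {port} \
             within {timeout_secs}s (slow startup?)"
        )
    } else {
        format!("BEAM node exited before accepting connections on port {port} (crashed during startup)")
    };
    // Only point the user at the log when it actually exists.
    if let Some(path) = startup_log {
        msg.push_str(&format!("; see {} for details", path.display()));
    }
    msg
}
```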

Changes

Cohort / File(s)	Summary
BEAM Startup Diagnostics & Readiness Enhancement `crates/beamtalk-cli/src/commands/workspace/process.rs`	READINESS_PROBE_MAX_RETRIES increased from 150→300; BEAM stderr redirected to `startup.log` (fallback to `/dev/null` on failure); `wait_for_tcp_ready` signature extended to accept an optional log path and all call sites updated; readiness failure handling now reports alive vs. crashed process and includes log-path diagnostic hints.

Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant CLI as CLI
    participant BEAM as BEAM process
    participant Waiter as wait_for_tcp_ready
    participant Log as startup.log
    CLI->>BEAM: spawn BEAM, redirect stderr -> startup.log (or /dev/null)
    CLI->>Waiter: wait_for_tcp_ready(host, port, retries, optional log path)
    loop retry until timeout or ready
        Waiter->>BEAM: check TCP port
        alt TCP open
            Waiter-->>CLI: ready
        else still closed
            Waiter->>BEAM: check process liveness
        end
    end
    alt timeout reached
        alt BEAM alive
            Waiter-->>CLI: error("slow startup", log path?)
        else BEAM crashed
            Waiter-->>CLI: error("crashed", include log path)
        end
    end
```
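The PR checks liveness by PID through the sysinfo crate. When the caller still holds the `std::process::Child` handle, the same "is it still running?" question can be answered with the standard library alone; this sketch swaps in `Child::try_wait` for illustration and is not how `process.rs` does it. `is_still_running` is a hypothetical name:

```rust
use std::process::Child;

/// Hypothetical liveness check using the std child handle instead of sysinfo.
/// `try_wait` never blocks: Ok(None) means the process has not exited yet,
/// Ok(Some(status)) means it has (for example, it crashed during startup).
fn is_still_running(child: &mut Child) -> bool {
    matches!(child.try_wait(), Ok(None))
}
```

A PID-based check such as sysinfo's is still needed when the CLI no longer owns the child handle.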

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

Use --timeout 300 in MCP test REPL to prevent orphaned workspaces (BT-789) #793: modifies the same readiness-probing logic in crates/beamtalk-cli/src/commands/workspace/process.rs, including an earlier increase of the retry budget and related changes.
Fix flaky workspace integration tests (BT-662) #646: Introduced wait_for_tcp_ready usage in node startup and TCP readiness/retry behavior that this change extends with log-path diagnostics.

🚥 Pre-merge checks | ✅ 3

✅ Passed checks (3 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title accurately reflects the main changes: increasing the workspace readiness timeout to 60s and adding startup log functionality for diagnostics.
Docstring Coverage	✅ Passed	Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.


coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@crates/beamtalk-cli/src/commands/workspace/process.rs`:
- Around line 331-340: The startup-log attempt currently falls back silently to
/dev/null; modify the code around the OpenOptions block that tries to open
startup_log_path so that on Err(_) you print a warning with eprintln! (including
the path and the error) and set a flag (e.g., startup_log_enabled = true/false)
when you successfully call cmd.stderr(Stdio::from(log_file)). Then update the
timeout/error messages that currently unconditionally tell users to inspect
startup_log_path (the messages produced later where you reference the startup
log on failures) to only mention the startup log path when startup_log_enabled
is true; otherwise omit that suggestion. Apply the same pattern to the other
similar blocks referenced in the review (the other startup/log-related message
sites), using the unique names startup_log_path, cmd.stderr(...), and the
timeout/error message locations to find and update the code.


📥 Commits

Reviewing files that changed from the base of the PR and between afab809 and f6b0696.

📒 Files selected for processing (1)

crates/beamtalk-cli/src/commands/workspace/process.rs


Per CodeRabbit review: if the startup log file could not be opened, we previously still told users to check it. Now we track whether the open succeeded (startup_log_enabled) and pass an Option<&Path> to wait_for_tcp_ready. Error messages include the log-file hint only when Some(path) is present, and a warning is printed when the file cannot be opened so the fallback is visible.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
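Turning the `startup_log_enabled` flag into the `Option<&Path>` handed to the waiter is a one-liner with `bool::then_some`. A minimal sketch; `startup_log_hint` is a hypothetical helper name, while `startup_log_enabled` and `startup_log_path` are the names used in the review:

```rust
use std::path::Path;

/// Hypothetical helper: yields the log path only if the log file was
/// successfully opened, so error messages never point at a missing file.
fn startup_log_hint(startup_log_enabled: bool, startup_log_path: &Path) -> Option<&Path> {
    startup_log_enabled.then_some(startup_log_path)
}
```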

coderabbitai bot requested changes Feb 28, 2026


crates/beamtalk-cli/src/commands/workspace/process.rs (outdated, resolved)

coderabbitai bot approved these changes Feb 28, 2026


jamesc merged commit e3000d3 into main Feb 28, 2026
5 checks passed

jamesc deleted the worktree-main-mcp-fails branch February 28, 2026 20:11

jamesc merged 2 commits into main from worktree-main-mcp-fails
