refactor: domain-driven prompt-response correlation via ExecutionEvent by BlueHotDog · Pull Request #794 · frontman-ai/frontman

BlueHotDog · 2026-04-06T14:53:48Z

Summary

Introduce ExecutionEvent domain struct as the ACL boundary between SwarmAi infrastructure events and Frontman domain events
Thread interaction_id (causation) through execution metadata so completion events carry context back to the channel
Replace fragile pending_prompt_id scalar with a correlated pending_prompt map (interaction_id + jsonrpc_id)
Reject concurrent prompts entirely — no message persisted when execution is already running
Extract add_user_message as a domain primitive (persist without execution), compose submit_user_message from it
Rename Execution.handle_swarm_event/3 → Execution.classify_event/1 accepting %ExecutionEvent{} directly

Motivation

Devin review flagged a bug: when a second session/prompt arrives while an agent is running, pending_prompt_id gets overwritten. The first prompt's JSON-RPC response is lost. Root cause: the transport layer was manually mirroring domain execution state via a bare scalar.

Rather than patching the symptom, this PR fixes the architecture: the domain now carries its own correlation context through the execution lifecycle, and the transport maps domain IDs to protocol IDs.

Test plan

825 tests pass, 0 failures
Updated execution_test.exs: {:ok, :already_running} → {:error, :already_running}, verify rejected message is not persisted
Updated task_channel_test.exs: all event builders use ExecutionEvent, pending_prompt_id → pending_prompt
Updated error_propagation_test.exs, execution_sentry_test.exs: {:swarm_event, ...} → {:execution_event, %ExecutionEvent{...}}
Updated tasks_channel_test.exs: use add_user_message for history-only tests
Removed "cancel does not interfere with subsequent prompts" test (premise invalid with reject-entirely semantics)

🤖 Generated with Claude Code

Surface :already_running from submit_user_message when an agent is already executing for the task, and add an integration test that exercises the guard. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Skip title generation and log accurately when submit_user_message returns {:ok, :already_running} instead of falling through to the catch-all {:ok, _interaction} clause. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- execution_test: use chained setup matching main's pattern - tasks_test: use add_agent_response for sequence test to avoid already_running from blocking agent Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…correlation Replace the fragile `pending_prompt_id` scalar in TaskChannel with a correlated `pending_prompt` map that ties JSON-RPC request IDs to domain interaction IDs. **Domain changes:** - Add `ExecutionEvent` struct as the ACL boundary between SwarmAi infrastructure events and Frontman domain events. Carries `caused_by` (the interaction_id of the triggering UserMessage). - Thread `interaction_id` through execution metadata so completion events carry causation context back to the channel. - `submit_user_message` now rejects entirely when an execution is already running — no message persisted, immediate error response. - Extract `add_user_message` as a domain primitive for persisting messages without starting execution. - SwarmDispatcher wraps raw swarm events into `ExecutionEvent` before PubSub broadcast. - Rename `Execution.handle_swarm_event/3` to `classify_event/1`, accepting `%ExecutionEvent{}` directly. **Transport changes:** - Replace `pending_prompt_id` with `pending_prompt` map containing both `interaction_id` and `jsonrpc_id`. - Channel receives `{:execution_event, %ExecutionEvent{}}` instead of `{:swarm_event, {type, payload}}`. - `resolve_pending_prompt` verifies causation match via `caused_by`. Fixes the bug where a second prompt during execution overwrote the pending prompt ID, causing the first prompt's response to be lost. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

The 'crashed agent' and 'failed agent' e2e tests (added in #798) still asserted on the raw {:swarm_event, ...} tuple. After the ExecutionEvent refactor, SwarmDispatcher now broadcasts {:execution_event, %ExecutionEvent{}} instead. Update both assert_receive calls to match the new domain event format.

BlueHotDog · 2026-04-08T16:16:44Z

Fixed caused_by propagation in handle_transient_error — passing event.caused_by through as a required 3rd arg so the exhausted-retry path calls finalize_turn/3 with the correct causation context, enabling the mismatch check in resolve_pending_prompt.

When retries exhaust, finalize_turn was called without caused_by, silently skipping the causation mismatch check in resolve_pending_prompt. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…ng pending prompt Add test verifying that when all retries are exhausted the pending JSON-RPC prompt is resolved with an error response, covering the caused_by propagation path through handle_transient_error. Also remove the caused_by \\ nil default from finalize_turn/3 — the two callsites without an execution event now pass nil explicitly, making the arity mandatory. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

These functions operate solely on ExecutionEvent data — moving them there eliminates Feature Envy and keeps domain semantics co-located with the type. Callers (TaskChannel, SwarmDispatcher) now call ExecutionEvent.classify/1 and ExecutionEvent.classify_error/1 directly.

This comment was marked as resolved.

Sign in to view

BlueHotDog force-pushed the issue-767-test-add-integration-test branch from 2c89b4d to c031117 Compare April 6, 2026 20:07

This comment was marked as resolved.

Sign in to view

BlueHotDog changed the title ~~test: add integration test for concurrent execution prevention~~ refactor: domain-driven prompt-response correlation via ExecutionEvent Apr 8, 2026

BlueHotDog and others added 5 commits April 8, 2026 15:00

fix(test): resolve CI failures after rebase onto main

8fa6bb3

- execution_test: use chained setup matching main's pattern - tasks_test: use add_agent_response for sequence test to avoid already_running from blocking agent Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

BlueHotDog force-pushed the issue-767-test-add-integration-test branch from 3022668 to 0c630f9 Compare April 8, 2026 15:41

This comment was marked as resolved.

Sign in to view

BlueHotDog and others added 3 commits April 8, 2026 18:19

fix: propagate caused_by through handle_transient_error to finalize_turn

0adda4f

When retries exhaust, finalize_turn was called without caused_by, silently skipping the causation mismatch check in resolve_pending_prompt. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

refactor: domain-driven prompt-response correlation via ExecutionEvent#794

refactor: domain-driven prompt-response correlation via ExecutionEvent#794
BlueHotDog wants to merge 8 commits intomainfrom
issue-767-test-add-integration-test

BlueHotDog commented Apr 6, 2026 •

edited

Loading

Uh oh!

This comment was marked as resolved.

Uh oh!

This comment was marked as resolved.

Uh oh!

This comment was marked as resolved.

Uh oh!

BlueHotDog commented Apr 8, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

BlueHotDog commented Apr 6, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Motivation

Test plan

Uh oh!

This comment was marked as resolved.

Uh oh!

This comment was marked as resolved.

Uh oh!

This comment was marked as resolved.

Uh oh!

BlueHotDog commented Apr 8, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

BlueHotDog commented Apr 6, 2026 •

edited

Loading