Description
Context
Nexus PR nexi-lab/nexus#2779 introduced multi-agent orchestration patterns (copilot/worker, agent_step() decomposition, DeliveryPolicy, optimistic locking). A deep comparison against Koi's architecture identified actionable gaps.
Gap 1: No agent_step() — monolithic streamEvents generator (LOW — deferred)
Nexus pattern: Decomposed agent_loop() into agent_step() — a single-turn function returning StepResult(action, messages, turn). The outer loop becomes a trivial while calling step(). Enables fair scheduling, debugger stepping, and per-turn unit testing.
Koi today: streamEvents in packages/kernel/engine/src/koi.ts is a ~580-line async generator. Session init, turn boundaries, forge refresh, inbox drain, idle-wake — all baked into one generator.
Why LOW priority: If agents are treated as async (fire-and-forget, each runs independently), fair scheduling via step() is unnecessary. Koi already has InboxComponent with steer mode + adapter.inject() for mid-run signal injection, and ChildHandle.signal() + CascadingTermination for lifecycle control. The remaining unique value is a stepper/debugger UI (stepping through turns one at a time) — nice-to-have, not blocking.
If we do it later: Add TurnStatus/TurnResult to L0, create createTurnStepper() in L1, wire as optional step() on KoiRuntime, mutually exclusive with run().
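A rough sketch of what such a stepper could look like. The names TurnStatus, TurnResult and createTurnStepper come from the note above, but their shapes here are assumptions, as is the turn_end marker event used to detect turn boundaries. fakeStream is a stand-in for streamEvents, not the real generator.

```typescript
// Hypothetical shapes: the issue names TurnStatus/TurnResult but does not define them.
type TurnStatus = "continue" | "done";

interface TurnResult<E> {
  readonly status: TurnStatus;
  readonly turn: number;
  readonly events: readonly E[];
}

// Assumed event shape; "turn_end" is an illustrative turn-boundary marker.
interface EngineEvent {
  readonly kind: "text" | "turn_end";
  readonly text?: string;
}

// Wraps an event stream so that each step() call drains exactly one turn.
function createTurnStepper(stream: AsyncIterator<EngineEvent>) {
  let turn = 0;
  let finished = false;
  return {
    async step(): Promise<TurnResult<EngineEvent>> {
      if (finished) return { status: "done", turn, events: [] };
      const events: EngineEvent[] = [];
      for (;;) {
        const next = await stream.next();
        if (next.done) {
          finished = true;
          return { status: "done", turn, events };
        }
        events.push(next.value);
        if (next.value.kind === "turn_end") {
          turn += 1;
          return { status: "continue", turn, events };
        }
      }
    },
  };
}

// Stand-in for streamEvents: two turns, then the generator ends.
async function* fakeStream(): AsyncGenerator<EngineEvent> {
  yield { kind: "text", text: "thinking" };
  yield { kind: "turn_end" };
  yield { kind: "text", text: "answer" };
  yield { kind: "turn_end" };
}

// The outer loop reduces to a plain while over step().
async function runAll(): Promise<number[]> {
  const stepper = createTurnStepper(fakeStream());
  const turns: number[] = [];
  let result = await stepper.step();
  while (result.status === "continue") {
    turns.push(result.turn);
    result = await stepper.step();
  }
  return turns;
}
```

A debugger UI would call step() once per user action instead of looping, which is the only capability the monolithic generator does not already offer.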
Gap 2: No delivery policy control on spawn (MED)
Nexus pattern: DeliveryPolicy enum (IMMEDIATE / DEFERRED / ON_DEMAND) on WorkerConfig controls how child results flow to parent. CopilotOrchestrator uses asyncio queues (IMMEDIATE), completion events (DEFERRED), or CAS store (ON_DEMAND).
Koi today:
- InboxComponent modes cover the runtime mechanism: steer = IMMEDIATE, collect/followup = DEFERRED
- ReportStore exists in L0 for structured run reports (ON_DEMAND equivalent)
- DeliveryPolicy and DeliveryOptions types exist in compiled .d.ts output but are NOT in source (likely stubs from a previous attempt)
- SpawnChildOptions has no delivery field
- spawnChildAgent() has no delivery logic: all child events stream inline
- No wiring from spawn → inbox mode selection or report store
What it enables:
- Quiet fan-out: Spawn 5 researchers without their intermediate events flooding parent context. Deferred mode buffers results, pushes summary to parent inbox on completion.
- Async task dispatch: Parent spawns worker with on-demand policy, continues working. Results written to ReportStore, pulled later by parent or another agent.
- Today you must either consume the entire child stream (blocking) or fire-and-forget with no result retrieval.
Proposed fix:
- Add DeliveryPolicy/DeliveryOptions types to L0 source (new delivery.ts, NOT delegation.ts; these are orthogonal concerns)
- Add optional delivery to AgentManifest and SpawnChildOptions
- Create delivery-policy.ts in L1 with policy application functions
- Wire into spawnChildAgent(): wrap runtime.run() based on resolved policy
- Priority: options.delivery > manifest.delivery > { policy: "streaming" }
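The resolution order and the wrapping step could be sketched as follows. This is illustrative only: the policy names "deferred" and "on_demand", the trimmed-down AgentManifest and SpawnChildOptions shapes, and the DeliverySinks interface (a stand-in for the parent inbox and ReportStore) are all assumptions, since the real types are not in source today.

```typescript
// Hypothetical member names; only "streaming" appears in the issue text.
type DeliveryPolicy = "streaming" | "deferred" | "on_demand";

interface DeliveryOptions {
  readonly policy: DeliveryPolicy;
}

// Trimmed-down stand-ins for the real Koi types.
interface AgentManifest {
  readonly name: string;
  readonly delivery?: DeliveryOptions;
}

interface SpawnChildOptions {
  readonly manifest: AgentManifest;
  readonly delivery?: DeliveryOptions;
}

const DEFAULT_DELIVERY: DeliveryOptions = { policy: "streaming" };

// Priority: options.delivery > manifest.delivery > { policy: "streaming" }.
function resolveDelivery(options: SpawnChildOptions): DeliveryOptions {
  return options.delivery ?? options.manifest.delivery ?? DEFAULT_DELIVERY;
}

// Assumed sinks: where non-streaming results go instead of the parent stream.
interface DeliverySinks<E> {
  readonly pushToInbox: (summary: string) => void;
  readonly putReport: (events: readonly E[]) => void;
}

// Wraps a child's event stream according to the resolved policy.
async function* applyDeliveryPolicy<E>(
  child: AsyncIterable<E>,
  delivery: DeliveryOptions,
  sinks: DeliverySinks<E>,
): AsyncGenerator<E> {
  if (delivery.policy === "streaming") {
    yield* child; // current behavior: child events stream inline
    return;
  }
  // Non-streaming policies buffer silently so the parent context stays quiet.
  const buffered: E[] = [];
  for await (const event of child) buffered.push(event);
  if (delivery.policy === "deferred") {
    sinks.pushToInbox(`child finished with ${buffered.length} events`);
  } else {
    sinks.putReport(buffered);
  }
}
```

Because the default resolves to "streaming", existing callers that pass no delivery option would see no change in behavior, which is what makes the change additive.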
Gap 3: No versioned TaskBoard for multi-agent coordination (LOW-MED)
Nexus pattern: A2A Task model carries version: int. TaskManager.update_task_state() passes expected_version to the store, rejects on mismatch with StaleTaskVersionError. Prevents lost updates under concurrent modification.
Koi today — CAS exists in two places, but NOT on TaskBoard:
- ✅ AgentRegistry.transition(): CAS via expectedGeneration on lifecycle state
- ✅ ScratchpadComponent.write(): CAS via expectedGeneration (0 = create-only, >0 = conditional, undefined = unconditional)
- ❌ TaskBoard: immutable DAG coordinator, returns new instances on mutation, but no generation field, no conflict detection for concurrent access
- ❌ InboxComponent: fire-and-forget FIFO queue, no versioning
What it enables:
- Safe concurrent task updates when multiple agents modify shared coordination state (e.g., Agent A marks subtask complete while Agent B marks it failed — conflict detected instead of last-write-wins)
- Reliable delegation chains where sub-workers produce partial results concurrently
Why LOW-MED: The Scratchpad CAS could serve as a workaround for coordination state (agents write versioned entries instead of using TaskBoard). The gap is real but has a workaround path.
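To make the workaround concrete, here is an in-memory stand-in that follows the expectedGeneration semantics described above (0 = create-only, >0 = conditional, undefined = unconditional). The class name, the WriteResult shape and the key format are assumptions for illustration; the real ScratchpadComponent signature is not shown in this issue.

```typescript
// Assumed result shape: success carries the new generation, failure the current one.
type WriteResult =
  | { readonly ok: true; readonly generation: number }
  | { readonly ok: false; readonly currentGeneration: number };

// In-memory stand-in, not the real ScratchpadComponent.
class InMemoryScratchpad {
  private readonly entries = new Map<string, { value: string; generation: number }>();

  write(key: string, value: string, expectedGeneration?: number): WriteResult {
    // An absent key is treated as generation 0, so expectedGeneration = 0
    // succeeds only when the key does not exist yet (create-only).
    const current = this.entries.get(key)?.generation ?? 0;
    if (expectedGeneration !== undefined && expectedGeneration !== current) {
      return { ok: false, currentGeneration: current };
    }
    const generation = current + 1;
    this.entries.set(key, { value, generation });
    return { ok: true, generation };
  }
}

const pad = new InMemoryScratchpad();
// Create-only write: the key is absent, so this succeeds.
const created = pad.write("task:42", "pending", 0);
// Agent A and Agent B both read generation 1, then race to update.
const byAgentA = pad.write("task:42", "completed", 1); // succeeds, bumps the generation
const byAgentB = pad.write("task:42", "failed", 1);    // stale generation, rejected
```

Agent B gets a conflict result instead of silently overwriting Agent A's update, which is the behavior TaskBoard lacks today.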
Proposed fix (future phase): Add generation field to TaskItem, add expectedGeneration to TaskBoard mutation operations, return conflict errors on mismatch.
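One possible shape for that fix, keeping the board immutable. The TaskItem fields, the updateTask function and the error codes are hypothetical; the sketch only assumes that updates reach the coordinator one at a time and are applied against the latest board.

```typescript
type TaskStatus = "pending" | "completed" | "failed";

// Hypothetical TaskItem with the proposed generation field.
interface TaskItem {
  readonly id: string;
  readonly status: TaskStatus;
  readonly generation: number;
}

interface TaskBoard {
  readonly items: ReadonlyMap<string, TaskItem>;
}

type UpdateResult =
  | { readonly ok: true; readonly board: TaskBoard }
  | { readonly ok: false; readonly error: "not_found" }
  | { readonly ok: false; readonly error: "conflict"; readonly currentGeneration: number };

// Returns a new board on success; never mutates the input board.
function updateTask(
  board: TaskBoard,
  id: string,
  status: TaskStatus,
  expectedGeneration: number,
): UpdateResult {
  const item = board.items.get(id);
  if (item === undefined) return { ok: false, error: "not_found" };
  if (item.generation !== expectedGeneration) {
    return { ok: false, error: "conflict", currentGeneration: item.generation };
  }
  const items = new Map(board.items);
  items.set(id, { ...item, status, generation: item.generation + 1 });
  return { ok: true, board: { items } };
}

const initialBoard: TaskBoard = {
  items: new Map<string, TaskItem>([["t1", { id: "t1", status: "pending", generation: 0 }]]),
};

// Agent A and Agent B both observed generation 0. A's update is applied first.
const afterA = updateTask(initialBoard, "t1", "completed", 0);
// B's update arrives second, still carrying generation 0, and is rejected.
const afterB = afterA.ok ? updateTask(afterA.board, "t1", "failed", 0) : afterA;
```

On conflict the caller can re-read the task and decide whether to retry, which replaces last-write-wins with an explicit decision.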
What Koi already covers (no action needed)
| Nexus Pattern | Koi Equivalent | Status |
|---|---|---|
| IMMEDIATE delivery (runtime) | InboxMode.steer + adapter.inject() | ✅ Working |
| DEFERRED delivery (runtime) | InboxMode.collect/followup + turn-boundary drain | ✅ Working |
| Permission inherit-and-restrict | scopeChecker + DelegationComponent + DepthToolRule | ✅ Stronger (crypto proofs, circuit breakers) |
| WorkerConfig (frozen spawn descriptor) | SpawnChildOptions + SpawnInheritanceConfig | ✅ More flexible |
| Process hooks | Middleware hooks + spawn-child wiring + CascadingTermination | ✅ Supervision-aware |
| Lifecycle CAS | AgentRegistry.transition(id, phase, expectedGeneration) | ✅ Working |
| ProcessManager (spawn/kill/wait) | AgentRegistry + SpawnLedger + ChildHandle + CascadingTermination | ✅ ECS decomposition |
| ToolDispatcher (centralized routing) | Middleware = sole interposition layer | ✅ Architecturally cleaner |
| SessionStore (CAS checkpoint) | Engine state is unknown (opaque), adapter's responsibility | ✅ By design |
| Syscall boundary | Not needed: middleware is the sole interposition layer | ✅ Simpler |
Koi advantages over Nexus (preserve)
- ECS composition (Agent = entity, Tool = component, Middleware = system)
- Middleware as sole interposition layer (vs separate hook systems + syscall facade)
- Manifest-driven assembly (YAML IS the agent)
- Forge self-extension (agents create capabilities at runtime)
- OTP-style supervision (one_for_one, one_for_all, rest_for_one)
- Kernel extension system (composable guards)
- Governance controller (cost budgets, error rate thresholds)
- Scratchpad with CAS (group-scoped versioned key-value, Nexus doesn't have this)
Implementation priority
- Gap 2 (DeliveryPolicy) — highest practical value, additive, backward-compatible
- Gap 3 (Versioned TaskBoard) — future phase, Scratchpad CAS is workaround
- Gap 1 (TurnStepper) — deferred, only unique value is stepper/debugger UI
References
- Nexus PR: feat(#2761): multi-agent orchestration — copilot/worker pattern nexi-lab/nexus#2779
- Key Nexus files: contracts/agent_runtime_types.py, system_services/agent_runtime/agent_loop.py, system_services/agent_runtime/copilot_orchestrator.py