feat: close agent runtime gaps inspired by Nexus PR #2779

## Context

Nexus PR [nexi-lab/nexus#2779](https://github.com/nexi-lab/nexus/pull/2779) introduced multi-agent orchestration patterns (copilot/worker, `agent_step()` decomposition, `DeliveryPolicy`, optimistic locking). A deep comparison against Koi's architecture identified actionable gaps.

---

## Gap 1: No `agent_step()` — monolithic `streamEvents` generator (LOW — deferred)

**Nexus pattern**: Decomposed `agent_loop()` into `agent_step()` — a single-turn function returning `StepResult(action, messages, turn)`. The outer loop becomes a trivial `while` calling `step()`. Enables fair scheduling, debugger stepping, and per-turn unit testing.

**Koi today**: `streamEvents` in `packages/kernel/engine/src/koi.ts` is a ~580-line async generator. Session init, turn boundaries, forge refresh, inbox drain, idle-wake — all baked into one generator.

**Why LOW priority**: If agents are treated as async (fire-and-forget, each runs independently), fair scheduling via `step()` is unnecessary. Koi already has `InboxComponent` with `steer` mode + `adapter.inject()` for mid-run signal injection, and `ChildHandle.signal()` + `CascadingTermination` for lifecycle control. The remaining unique value is a **stepper/debugger UI** (stepping through turns one at a time) — nice-to-have, not blocking.

**If we do it later**: Add `TurnStatus`/`TurnResult` to L0, create `createTurnStepper()` in L1, wire as optional `step()` on `KoiRuntime`, mutually exclusive with `run()`.
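
A rough sketch of what the stepper could look like, in case it helps scope the work. Only the names `TurnStatus`, `TurnResult`, `createTurnStepper()` and `step()` come from the plan above; the field shapes, the `TurnFn` callback, `runToCompletion()` and the `maxTurns` cap are assumptions made for illustration. The turn function is synchronous here to keep the example short, whereas a real turn would be async.

```typescript
// Hypothetical shapes: only the type and function names are taken from the plan.
type TurnStatus = "continue" | "done";

interface TurnResult {
  readonly status: TurnStatus;
  readonly turn: number; // 1-based count of turns completed so far
  readonly messages: readonly string[];
}

// Stand-in for one model/tool turn (assumption: the real one would be async).
type TurnFn = (
  turn: number,
  history: readonly string[],
) => { readonly messages: readonly string[]; readonly done: boolean };

interface TurnStepper {
  step(): TurnResult;
}

function createTurnStepper(runTurn: TurnFn, maxTurns = 10): TurnStepper {
  let turn = 0;
  let history: readonly string[] = [];
  let finished = false;

  return {
    step(): TurnResult {
      if (finished) {
        throw new Error("stepper already finished");
      }
      const out = runTurn(turn, history);
      history = [...history, ...out.messages];
      turn += 1;
      // Stop when the turn reports completion or the turn cap is reached.
      finished = out.done || turn >= maxTurns;
      return {
        status: finished ? "done" : "continue",
        turn,
        messages: out.messages,
      };
    },
  };
}

// The outer loop reduces to a plain while over step().
function runToCompletion(stepper: TurnStepper): TurnResult[] {
  const results: TurnResult[] = [];
  let result = stepper.step();
  results.push(result);
  while (result.status === "continue") {
    result = stepper.step();
    results.push(result);
  }
  return results;
}
```

A debugger UI would call `step()` directly and inspect each `TurnResult`, while `run()` would keep its current behavior; that split is what makes the two entry points mutually exclusive.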

---

## Gap 2: No delivery policy control on spawn (MED)

**Nexus pattern**: `DeliveryPolicy` enum (IMMEDIATE / DEFERRED / ON_DEMAND) on `WorkerConfig` controls how child results flow to parent. `CopilotOrchestrator` uses asyncio queues (IMMEDIATE), completion events (DEFERRED), or CAS store (ON_DEMAND).

**Koi today**: 
- `InboxComponent` modes cover the runtime mechanism: `steer` = IMMEDIATE, `collect`/`followup` = DEFERRED
- `ReportStore` exists in L0 for structured run reports (ON_DEMAND equivalent)
- `DeliveryPolicy` and `DeliveryOptions` types exist in compiled `.d.ts` output but are **NOT in source** — likely stubs from a previous attempt
- `SpawnChildOptions` has **no `delivery` field**
- `spawnChildAgent()` has **no delivery logic** — all child events stream inline
- No wiring from spawn → inbox mode selection or report store

**What it enables**:
- **Quiet fan-out**: Spawn 5 researchers without their intermediate events flooding the parent context. Deferred mode buffers results and pushes a summary to the parent inbox on completion.
- **Async task dispatch**: The parent spawns a worker with an on-demand policy and continues working. Results are written to `ReportStore` and pulled later by the parent or another agent.
- **Current limitation**: Today the parent must either consume the entire child stream (blocking) or fire-and-forget with no way to retrieve results.

**Proposed fix**: 
- Add `DeliveryPolicy`/`DeliveryOptions` types to L0 source (new `delivery.ts` — NOT `delegation.ts`, these are orthogonal concerns)
- Add optional `delivery` to `AgentManifest` and `SpawnChildOptions`
- Create `delivery-policy.ts` in L1 with policy application functions
- Wire into `spawnChildAgent()` — wrap `runtime.run()` based on resolved policy
- Resolution order: `options.delivery` > `manifest.delivery` > `{ policy: "streaming" }` (the default keeps today's inline streaming behavior)
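
To make the proposal concrete, here is a minimal sketch of the resolution order and policy application. `DeliveryPolicy`, `DeliveryOptions` and the `"streaming"` default come from the list above; the `"deferred"` and `"on_demand"` values (mirroring the Nexus enum), the `resolveDelivery()` and `deliver()` helpers, the `DeliverySinks` interface and the summary text are all assumptions. Child events are modeled as a plain array of strings, whereas the real wiring would wrap the async stream from `runtime.run()`.

```typescript
// Assumed policy values: only "streaming" appears in the proposal.
type DeliveryPolicy = "streaming" | "deferred" | "on_demand";

interface DeliveryOptions {
  readonly policy: DeliveryPolicy;
}

// Minimal stand-ins for AgentManifest and SpawnChildOptions (hypothetical shapes).
interface ManifestLike {
  readonly delivery?: DeliveryOptions;
}
interface SpawnOptionsLike {
  readonly delivery?: DeliveryOptions;
}

const DEFAULT_DELIVERY: DeliveryOptions = { policy: "streaming" };

// Spawn options win over the manifest, which wins over the default.
function resolveDelivery(
  options: SpawnOptionsLike,
  manifest: ManifestLike,
): DeliveryOptions {
  return options.delivery ?? manifest.delivery ?? DEFAULT_DELIVERY;
}

// Hypothetical sinks standing in for the parent stream, inbox, and report store.
interface DeliverySinks {
  forward(event: string): void;
  pushToInbox(summary: string): void;
  writeReport(events: readonly string[]): void;
}

function deliver(
  events: readonly string[],
  delivery: DeliveryOptions,
  sinks: DeliverySinks,
): void {
  switch (delivery.policy) {
    case "streaming":
      // Current behavior: every child event reaches the parent inline.
      for (const event of events) sinks.forward(event);
      return;
    case "deferred":
      // Buffer everything and push a single summary on completion.
      sinks.pushToInbox(`child finished with ${events.length} events`);
      return;
    case "on_demand":
      // Persist results so the parent (or another agent) can pull them later.
      sinks.writeReport(events);
      return;
  }
}
```

Because the default resolves to `"streaming"`, existing callers that pass no `delivery` field would see no change in behavior.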

---

## Gap 3: No versioned TaskBoard for multi-agent coordination (LOW-MED)

**Nexus pattern**: A2A `Task` model carries `version: int`. `TaskManager.update_task_state()` passes `expected_version` to the store, rejects on mismatch with `StaleTaskVersionError`. Prevents lost updates under concurrent modification.

**Koi today** — CAS exists in two places, but NOT on TaskBoard:
- ✅ `AgentRegistry.transition()` — CAS via `expectedGeneration` on lifecycle state
- ✅ `ScratchpadComponent.write()` — CAS via `expectedGeneration` (0 = create-only, >0 = conditional, undefined = unconditional)
- ❌ `TaskBoard` — immutable DAG coordinator, returns new instances on mutation, but **no generation field**, no conflict detection for concurrent access
- ❌ `InboxComponent` — fire-and-forget FIFO queue, no versioning
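
For reference, the `expectedGeneration` contract described for `ScratchpadComponent.write()` can be modeled in a few lines. This is an illustrative in-memory stand-in, not the actual component: the `InMemoryScratchpad` class, the `WriteResult` shape, and the choice that an absent key has generation 0 and the first write produces generation 1 are all assumptions.

```typescript
interface Entry {
  readonly value: string;
  readonly generation: number;
}

// Hypothetical result shape for this sketch.
type WriteResult =
  | { readonly kind: "written"; readonly generation: number }
  | { readonly kind: "conflict"; readonly currentGeneration: number };

class InMemoryScratchpad {
  private entries: Record<string, Entry> = {};

  // expectedGeneration: 0 = create-only, >0 = conditional, undefined = unconditional.
  write(key: string, value: string, expectedGeneration?: number): WriteResult {
    const current = this.entries[key];
    // Assumption: a missing key counts as generation 0, so "0" means "must not exist".
    const currentGeneration = current === undefined ? 0 : current.generation;

    if (
      expectedGeneration !== undefined &&
      expectedGeneration !== currentGeneration
    ) {
      return { kind: "conflict", currentGeneration };
    }

    const generation = currentGeneration + 1;
    this.entries[key] = { value, generation };
    return { kind: "written", generation };
  }
}
```

Treating a missing key as generation 0 lets one comparison cover all three modes, which is the kind of check `TaskBoard` currently lacks.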

**What it enables**:
- Safe concurrent task updates when multiple agents modify shared coordination state (e.g., Agent A marks subtask complete while Agent B marks it failed — conflict detected instead of last-write-wins)
- Reliable delegation chains where sub-workers produce partial results concurrently

**Why LOW-MED**: The gap is real, but `Scratchpad` CAS offers a workaround: agents can write versioned entries for coordination state instead of using `TaskBoard`.

**Proposed fix (future phase)**: Add `generation` field to `TaskItem`, add `expectedGeneration` to TaskBoard mutation operations, return conflict errors on mismatch.
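
A possible shape for the versioned mutation, keeping the board immutable. The `generation` and `expectedGeneration` names come from the proposed fix; the `TaskStatus` values, the `TaskBoard` record layout, the `updateStatus()` helper and the `UpdateResult` union are hypothetical and only meant to show where the conflict check would sit.

```typescript
// Hypothetical status values for this sketch.
type TaskStatus = "pending" | "completed" | "failed";

interface TaskItem {
  readonly id: string;
  readonly status: TaskStatus;
  readonly generation: number;
}

// Simplified stand-in for the immutable board (the DAG structure is omitted).
interface TaskBoard {
  readonly items: Readonly<Record<string, TaskItem>>;
}

type UpdateResult =
  | { readonly kind: "ok"; readonly board: TaskBoard }
  | { readonly kind: "not_found" }
  | { readonly kind: "conflict"; readonly currentGeneration: number };

function updateStatus(
  board: TaskBoard,
  id: string,
  status: TaskStatus,
  expectedGeneration: number,
): UpdateResult {
  const current = board.items[id];
  if (current === undefined) {
    return { kind: "not_found" };
  }
  // Reject writers whose snapshot is older than the stored item.
  if (current.generation !== expectedGeneration) {
    return { kind: "conflict", currentGeneration: current.generation };
  }
  const next: TaskItem = {
    ...current,
    status,
    generation: current.generation + 1,
  };
  // Mutation returns a new board; the input board is left untouched.
  return { kind: "ok", board: { items: { ...board.items, [id]: next } } };
}
```

In the Agent A / Agent B scenario above, both agents read generation 1; A's update succeeds and bumps the item to generation 2, so B's update with `expectedGeneration = 1` returns a conflict instead of silently overwriting A's result.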

---

## What Koi already covers (no action needed)

| Nexus Pattern | Koi Equivalent | Status |
|---|---|---|
| IMMEDIATE delivery (runtime) | `InboxMode.steer` + `adapter.inject()` | ✅ Working |
| DEFERRED delivery (runtime) | `InboxMode.collect/followup` + turn-boundary drain | ✅ Working |
| Permission inherit-and-restrict | `scopeChecker` + `DelegationComponent` + `DepthToolRule` | ✅ Stronger (crypto proofs, circuit breakers) |
| WorkerConfig (frozen spawn descriptor) | `SpawnChildOptions` + `SpawnInheritanceConfig` | ✅ More flexible |
| Process hooks | Middleware hooks + spawn-child wiring + `CascadingTermination` | ✅ Supervision-aware |
| Lifecycle CAS | `AgentRegistry.transition(id, phase, expectedGeneration)` | ✅ Working |
| ProcessManager (spawn/kill/wait) | `AgentRegistry` + `SpawnLedger` + `ChildHandle` + `CascadingTermination` | ✅ ECS decomposition |
| ToolDispatcher (centralized routing) | Middleware = sole interposition layer | ✅ Architecturally cleaner |
| SessionStore (CAS checkpoint) | Engine state is `unknown` (opaque) — adapter's responsibility | ✅ By design |
| Syscall boundary | Not needed — middleware is the sole interposition layer | ✅ Simpler |

## Koi advantages over Nexus (preserve)

- ECS composition (Agent = entity, Tool = component, Middleware = system)
- Middleware as sole interposition layer (vs separate hook systems + syscall facade)
- Manifest-driven assembly (YAML IS the agent)
- Forge self-extension (agents create capabilities at runtime)
- OTP-style supervision (one_for_one, one_for_all, rest_for_one)
- Kernel extension system (composable guards)
- Governance controller (cost budgets, error rate thresholds)
- Scratchpad with CAS (group-scoped versioned key-value, Nexus doesn't have this)

## Implementation priority

1. **Gap 2 (DeliveryPolicy)** — highest practical value, additive, backward-compatible
2. **Gap 3 (Versioned TaskBoard)** — future phase; Scratchpad CAS serves as a workaround until then
3. **Gap 1 (TurnStepper)** — deferred, only unique value is stepper/debugger UI

## References

- Nexus PR: https://github.com/nexi-lab/nexus/pull/2779
- Key Nexus files: `contracts/agent_runtime_types.py`, `system_services/agent_runtime/agent_loop.py`, `system_services/agent_runtime/copilot_orchestrator.py`
