[FEATURE] Durable Steps/event loop execution (Epic) #757

# Durable Execution for Strands Agents — Tracking Ticket

## Problem

The agent event loop runs entirely in-process, with all state held in memory. If the process crashes mid-loop, all progress is lost. Durability providers (Temporal, Dapr, AWS Lambda Durable) can only wrap the entire `agent("prompt")` call as one opaque unit, so a crash replays everything from scratch, including already-completed model calls and non-idempotent tool calls.

**Goal:** Enable crash-resilient agent execution where completed model calls and tool calls are cached and never re-executed on recovery.

**Design doc:** https://github.com/strands-agents/docs/pull/584

---

## Desired User Experience

```python
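# Python
# Assumption: the proposed `strands-temporal` package exposes a
# `strands_temporal` module; the plugin does not exist yet.
from strands import Agent
from strands.models import BedrockModel
from strands_temporal import TemporalDurabilityPlugin

agent = Agent(
    model=BedrockModel(),
    tools=[search_flights, book_hotel],  # user-defined tool functions
    plugins=[TemporalDurabilityPlugin()],
)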
```

```typescript
// TypeScript — Worker (no sandbox)
const result = await agent.invokeWithCheckpoint(input.prompt, { checkpoint: input.checkpoint })

// TypeScript — Workflow (pure deterministic loop)
let result = await runAgentStep({ prompt })
while (!result.done) {
  result = await runAgentStep({ checkpoint: result.checkpoint })
}
```

On crash and replay, completed steps return their cached results with zero re-execution, and the loop resumes at the first step that has not completed.
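The replay behavior described above can be illustrated with a minimal in-memory sketch. This is not Temporal and not the Strands API: `Journal`, `run_step`, `SimulatedCrash`, and the step names are hypothetical, and a real provider would persist the journal outside the process rather than in a dict.

```python
from typing import Any, Callable, Dict, List, Optional


class SimulatedCrash(RuntimeError):
    """Raised to imitate the process dying mid-loop."""


class Journal:
    """Hypothetical durable log mapping step id -> recorded result."""

    def __init__(self) -> None:
        self.results: Dict[str, Any] = {}

    def run_step(self, step_id: str, fn: Callable[[], Any]) -> Any:
        # A completed step returns its recorded result without running again.
        if step_id in self.results:
            return self.results[step_id]
        result = fn()
        self.results[step_id] = result
        return result


calls: List[str] = []  # records every real execution, for inspection


def make_step(name: str, crash: bool = False) -> Callable[[], str]:
    def fn() -> str:
        if crash:
            raise SimulatedCrash(name)
        calls.append(name)
        return f"{name}-result"

    return fn


def agent_loop(journal: Journal, crash_at: Optional[str] = None) -> List[str]:
    names = ["model-1", "tool-1", "model-2"]
    return [journal.run_step(n, make_step(n, crash=(n == crash_at))) for n in names]


journal = Journal()
try:
    agent_loop(journal, crash_at="model-2")  # first attempt dies on the third step
except SimulatedCrash:
    pass
final = agent_loop(journal)  # replay: the first two steps come from the journal
```

After the replay, `calls` contains each step name exactly once, which is the property the integration tests in this ticket are meant to verify against a real provider.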

---

## Workstreams

### WS1: Python SDK — Step Abstraction + Plugin Approach

Durability as a standard Plugin intercepting hook events. Agent code runs via `Step.execute()`.

| # | Task | Depends On | Description |
|---|---|---|---|
| P1 | `Step` dataclass | — | New file `strands/event_loop/step.py`. Unit of I/O work (callable + args). |
| P2 | Async hooks at model/tool call sites | — | Switch `invoke_callbacks` → `await invoke_callbacks_async` at `BeforeModelCallEvent` and `BeforeToolCallEvent`. |
| P3 | Writable `step` field on hook events | P1 | Add `step: Step \| None` to `BeforeModelCallEvent` and `BeforeToolCallEvent`. |
| P4 | Event loop wiring | P1–P3 | Refactor `_handle_model_execution` and `_handle_tool_execution` to create Step → fire hook → call `event.step.execute()`. Largest core change. All existing tests must pass. |
| P5 | `strands-temporal` Python package scaffold | — | New package with `pyproject.toml`, dependency on `temporalio` + `strands-agents`. |
| P6 | `TemporalDurabilityPlugin` | P4, P5 | Plugin with `@hook` on `BeforeModelCallEvent`/`BeforeToolCallEvent`. Replaces `event.step` with wrapper dispatching to `workflow.execute_activity()`. |
| P7 | Integration test | P6 | Real Bedrock + real Temporal dev server. Crash simulation (`CRASH_AFTER_ACTIVITY=N`); verify that completed steps are not re-executed. |
| P8 | Runnable sample | P6 | Travel planner agent in `strands-agents/samples`. Includes `docker-compose.yml`. |
| P9 | User guide | P8 | "Durable Agents with Temporal" page on strandsagents.com. |
| P10 | Update design doc PR #584 | — (Week 1) | Revise to reflect Step/Plugin approach. Remove old `Durability` ABC proposal. |
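The flow in P1 through P6 (create a Step, fire the hook, call `event.step.execute()`) might look roughly like the sketch below. It is self-contained and does not import Strands or Temporal: the `Step` fields, the stand-in `BeforeToolCallEvent`, `RecordingPlugin`, and `handle_tool_execution` are all assumptions made for illustration, and the real plugin would dispatch to `workflow.execute_activity()` where this one only records the call.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Any, Awaitable, Callable, Dict, List, Optional, Tuple


@dataclass
class Step:
    """Hypothetical unit of I/O work: an async callable plus its arguments."""

    fn: Callable[..., Awaitable[Any]]
    args: Tuple[Any, ...] = ()
    kwargs: Dict[str, Any] = field(default_factory=dict)

    async def execute(self) -> Any:
        return await self.fn(*self.args, **self.kwargs)


@dataclass
class BeforeToolCallEvent:
    """Stand-in for the real hook event; only the proposed `step` field is modeled."""

    step: Optional[Step] = None


class RecordingPlugin:
    """Stand-in durability plugin: replaces the step with a wrapper."""

    def __init__(self) -> None:
        self.dispatched: List[str] = []

    async def on_before_tool_call(self, event: BeforeToolCallEvent) -> None:
        original = event.step
        if original is None:
            return

        async def dispatch() -> Any:
            # A real plugin would hand `original` to its durability provider here.
            self.dispatched.append(original.fn.__name__)
            return await original.execute()

        event.step = Step(fn=dispatch)


async def search_flights(city: str) -> str:
    return f"flights to {city}"


async def handle_tool_execution(plugin: RecordingPlugin) -> str:
    # Create Step -> fire hook -> call event.step.execute()
    event = BeforeToolCallEvent(step=Step(fn=search_flights, args=("Lisbon",)))
    await plugin.on_before_tool_call(event)
    assert event.step is not None
    return await event.step.execute()


plugin = RecordingPlugin()
result = asyncio.run(handle_tool_execution(plugin))
```

Because the event loop only ever calls `event.step.execute()`, it does not need to know whether a plugin swapped the step, which is what keeps the change additive.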

### WS2: TypeScript SDK — `invokeWithCheckpoint()` + Temporal Package

The TypeScript SDK rejects the plugin-hook approach for Temporal because of workflow sandbox constraints. Instead, a new additive Agent method returns checkpoint tokens, and agent code runs in Temporal activities only.

| # | Task | Depends On | Description |
|---|---|---|---|
| T1 | `Checkpoint` / `CheckpointResult` types | — | New file `src/agent/checkpoint.ts`. Per-tool granularity via `nextToolIndex`. |
| T2 | `invokeWithCheckpoint()` on Agent | T1 | New method. One unit of I/O work per call. Deferred message append. |
| T3 | Hook integration | T2 | Existing hooks (`BeforeModelCallEvent`, etc.) still fire in checkpoint mode. |
| T4 | `strands-temporal` TS package scaffold | — | Separate package, no `temporalio` dep in core SDK. |
| T5 | `runAgentStep` activity + per-run-ID registry | T2, T4 | Fix prototype's singleton agent. Key agents by workflow run ID. |
| T6 | `durableAgentWorkflow` | T5 | Pure deterministic loop passing checkpoint tokens. |
| T7 | `StrandsWorker` helper | T5 | Registers activity + workflow. Users bring tools; worker resolves by name. |
| T8 | Integration test + crash simulation | T6 | Per-tool granularity: 3 tools = 3 `ActivityTaskCompleted` entries. Crash after tool 2 → on recovery only tool 3 executes; tools 1 and 2 are not re-run. Concurrent workflow isolation. |
| T9 | Migrate prototype to `examples/temporal/` | T8 | Clean up, update for `nextToolIndex`, add README + `docker-compose.yml`. |
| T10 | User guide | T9 | "Durable Agents with Temporal" page. Architecture diagram, quick start, limitations. |
| T11 | MCP support in durable context | T5 | See "Task 11: MCP Support in Durable Context" below. |
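The per-tool cursor from T1, T2, and T8 can be modeled in a few lines. The sketch below is a Python model of the TypeScript design, not the SDK itself: `next_tool_index` stands in for `nextToolIndex`, the `Checkpoint` fields are guesses, the "model call" is faked, and `history` stands in for the provider's event history.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Checkpoint:
    """Hypothetical serializable token passed between workflow and activity."""

    pending_tools: Tuple[str, ...] = ()  # tools requested by the last model call
    next_tool_index: int = 0             # cursor: first tool that has not run yet
    tool_results: Tuple[str, ...] = ()


@dataclass(frozen=True)
class CheckpointResult:
    done: bool
    checkpoint: Checkpoint


Tools = Dict[str, Callable[[], str]]


def run_agent_step(checkpoint: Optional[Checkpoint], tools: Tools) -> CheckpointResult:
    """Perform at most one unit of I/O work, then return a new token."""
    if checkpoint is None:
        # Stand-in for a model call that requests every tool, in order.
        return CheckpointResult(False, Checkpoint(pending_tools=tuple(tools)))
    i = checkpoint.next_tool_index
    if i < len(checkpoint.pending_tools):
        output = tools[checkpoint.pending_tools[i]]()
        advanced = Checkpoint(
            checkpoint.pending_tools, i + 1, checkpoint.tool_results + (output,)
        )
        return CheckpointResult(False, advanced)
    # Every requested tool has run; a real agent would call the model again here.
    return CheckpointResult(True, checkpoint)


def workflow(
    tools: Tools, history: List[CheckpointResult], crash_after: Optional[int] = None
) -> CheckpointResult:
    """Deterministic loop that resumes from the last recorded result."""
    result = history[-1] if history else None
    new_steps = 0
    while result is None or not result.done:
        if crash_after is not None and new_steps >= crash_after:
            raise RuntimeError("simulated crash")
        previous = result.checkpoint if result is not None else None
        result = run_agent_step(previous, tools)
        history.append(result)
        new_steps += 1
    return result


executed: List[str] = []


def make_tool(name: str) -> Callable[[], str]:
    def tool() -> str:
        executed.append(name)
        return f"{name}-ok"

    return tool


tools = {name: make_tool(name) for name in ("search", "book", "email")}
history: List[CheckpointResult] = []
try:
    workflow(tools, history, crash_after=3)  # model call + two tools, then crash
except RuntimeError:
    pass
executed_before_crash = list(executed)
final = workflow(tools, history)  # resumes at the third tool
```

On the second run only `email` executes, because the recorded checkpoint already has its cursor at index 2.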

### WS3: External Outreach (post-implementation)

- **AWS Lambda Durable team** — File GitHub issue on `aws/aws-durable-execution-sdk-python` requesting async handler/step support. Blocked on sync-only Python SDK.
- **Temporal team** — Share working integration + sample after `strands-temporal` is published.

---

## Task 11: MCP Support in Durable Context

The earlier claim that MCP "cannot cross the activity boundary" was **wrong**. MCP server *config* crosses the boundary, MCP *clients* are reconstructed inside activities, and remote MCP servers work.

### Corrected MCP Gap Analysis

| Gap | Reality |
|---|---|
| MCP cannot cross activity boundary | MCP server **config** crosses the boundary. MCP clients are reconstructed inside activities. Remote MCP servers work. |
| Stdio MCP servers | Not durable — subprocess dies with worker process. Documented limitation. |
| Long-running workflow connection timeouts | Needs reconnect-on-failure in McpClient. Separate issue. |

### Subtasks

- [ ] Add `mcpServers?: McpServerConfig[]` to `run_config`
- [ ] Update `getOrCreate()` to construct `McpClient` instances from config
- [ ] Verify `agent.initialize()` correctly connects and lists tools inside activity
- [ ] Integration test: MCP tool call cached across crash, remote MCP server not re-called
- [ ] Document stdio MCP limitation explicitly
- [ ] Document connection timeout handling requirement for long-running workflows

### Bottom Line

Remote MCP works. Stdio MCP doesn't. The architecture doesn't need to change — just the `run_config` shape and `getOrCreate()` implementation.
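As a rough illustration of that split, the sketch below models it in Python: only plain config crosses the activity boundary (shown here as a JSON round trip), and a per-run-ID `get_or_create()` rebuilds the clients on the worker side. Every name is hypothetical, including `FakeMcpClient`, the `RunConfig` fields, and the example URL; the real `McpClient` and `run_config` shapes may differ.

```python
import json
from dataclasses import asdict, dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class McpServerConfig:
    name: str
    url: str  # remote servers only; stdio subprocesses are not durable


@dataclass(frozen=True)
class RunConfig:
    model_id: str
    mcp_servers: Tuple[McpServerConfig, ...] = ()


class FakeMcpClient:
    """Stand-in for a real MCP client, which would hold a live connection."""

    def __init__(self, config: McpServerConfig) -> None:
        self.config = config
        self.connected = False

    def connect(self) -> None:
        self.connected = True


@dataclass
class AgentEntry:
    run_config: RunConfig
    mcp_clients: List[FakeMcpClient]


_registry: Dict[str, AgentEntry] = {}


def get_or_create(run_id: str, run_config: RunConfig) -> AgentEntry:
    """Return the entry for this workflow run, building clients from config once."""
    entry = _registry.get(run_id)
    if entry is None:
        clients = [FakeMcpClient(c) for c in run_config.mcp_servers]
        for client in clients:
            client.connect()
        entry = AgentEntry(run_config, clients)
        _registry[run_id] = entry
    return entry


def from_json(payload: str) -> RunConfig:
    data = json.loads(payload)
    servers = tuple(McpServerConfig(**s) for s in data["mcp_servers"])
    return RunConfig(model_id=data["model_id"], mcp_servers=servers)


config = RunConfig(
    model_id="example-model",
    mcp_servers=(McpServerConfig(name="docs", url="https://mcp.example.com"),),
)
payload = json.dumps(asdict(config))  # what would cross the activity boundary
rebuilt = from_json(payload)

first = get_or_create("run-1", rebuilt)
again = get_or_create("run-1", rebuilt)
other = get_or_create("run-2", rebuilt)
```

Keying the registry by run ID is what avoids the prototype's singleton agent: `first` and `again` are the same entry, while `other` gets its own clients.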

---

## Key Design Decisions

| Decision | Python | TypeScript |
|---|---|---|
| Durability mechanism | Plugin intercepting hook events | Checkpoint tokens from `invokeWithCheckpoint()` |
| Agent code location | Runs via `Step.execute()` | Runs in Temporal activities only (never in sandbox) |
| Checkpoint granularity | Per I/O call (one model call OR one tool call) | Per I/O call via `nextToolIndex` cursor |
| Breaking changes | None — additive `step` field on events | None — `invoke()` / `stream()` unchanged |
| MCP support | Follow-up | Task 11 — remote MCP works, stdio doesn't |

---

## Known Gaps

| Gap | Impact | Status |
|---|---|---|
| Model objects hold clients that can't serialize | Activity must reconstruct model from config | By design |
| `AgentState` replay may not be deterministic | Tools writing to `agent.state` may diverge | Snapshot session manager addresses this |
| Human-in-the-loop | Strands Interrupt ≠ Temporal Signal | Documented limitation |
| Streaming during replay | No token stream on cached steps — UX gap, not correctness | Document; use `AfterInvocationEvent` for final-result callbacks |
| Stdio MCP servers | Subprocess dies with worker process | Documented limitation; remote MCP works |
| Long-running MCP connection timeouts | Needs reconnect-on-failure in McpClient | Separate issue |

---

## Out of Scope (Follow-up)

- Full state machine refactor with explicit `CycleState` enum / transition table (Step is the first building block)
- `strands-dapr` package (same plugin pattern, lower priority)
- `strands-aws` Lambda Durable package (blocked on async SDK support)
- AgentCore runtime validation
- Streaming callback behavior during replay (UX documentation only)
- McpClient reconnect-on-failure for long-running workflows

---

## Timeline

Each SDK workstream is estimated at 5 weeks; the Python and TypeScript workstreams can run in parallel:

| Week | Python SDK | TypeScript SDK |
|---|---|---|
| Week 1 | P10: Design doc PR #584 merged | T1–T2: Checkpoint types + `invokeWithCheckpoint()` |
| Week 2 | P1–P4: Step + hooks + event loop wiring | T3: Hook integration verified |
| Week 3 | P5–P7: `strands-temporal` + integration test | T4–T6: `strands-temporal` package + activity + workflow |
| Week 4 | P8–P9: Sample + user guide | T7–T8: Worker helper + crash recovery test |
| Week 5 | Buffer — review, polish, outreach | T9–T11: Example + user guide + MCP support |

