
docs(designs): MCP integration beyond tools#718

Open
mkmeral wants to merge 2 commits into strands-agents:main from mkmeral:design/mcp-integration

Conversation

Contributor

@mkmeral mkmeral commented Mar 30, 2026

Description

A design proposal for how to integrate MCP with Strands

Related Issues

strands-agents/sdk-python#1659

Type of Change

Design Doc

Checklist

  • I have read the CONTRIBUTING document
  • My changes follow the project's documentation style
  • I have tested the documentation locally using npm run dev
  • Links in the documentation are valid and working

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

Contributor

github-actions bot commented Mar 30, 2026

Documentation Preview Ready

Your documentation preview has been successfully deployed!

Preview URL: https://d3ehv1nix5p99z.cloudfront.net/pr-cms-718/docs/user-guide/quickstart/overview/

Updated at: 2026-04-07T16:51:40.815Z

…ections

Incremental additions to the MCP design doc:
- Tasks section: current implementation status, spec gaps table, P1/P2 priorities
- Configuration & Auth: env passthrough, transport defaults, bearer token config sugar
- Open question strands-agents#6: model-immediate-response as future concern (async plumbing)

- **Tool list changes go unnoticed.** Some MCP servers dynamically add or remove tools based on context (e.g., auth state, project type). The server sends `notifications/tools/list_changed`, but Strands' message handler only processes exceptions. The notification falls through silently, and the agent keeps using a stale tool list until restart.
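A minimal sketch of what handling this notification could look like; `ToolCache`, `make_message_handler`, and the `.root.method` access are illustrative assumptions (the latter mirrors the mcp Python SDK's notification wrapper), not current Strands APIs:

```python
# Sketch only: react to notifications/tools/list_changed instead of
# dropping it. ToolCache and make_message_handler are hypothetical names.

class ToolCache:
    def __init__(self):
        self.stale = False

    def invalidate(self):
        # A real implementation would re-run tools/list against the session.
        self.stale = True


def make_message_handler(cache: ToolCache):
    async def handle(message) -> None:
        # Transport errors arrive as exceptions; everything else is a
        # JSON-RPC message whose method name we can inspect.
        if isinstance(message, Exception):
            raise message
        method = getattr(getattr(message, "root", message), "method", None)
        if method == "notifications/tools/list_changed":
            cache.invalidate()
    return handle
```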

- **Servers can't request LLM completions.** The MCP spec allows servers to ask the client to generate text via `sampling/createMessage`. No production server uses this today, but the pattern is growing — it enables MCP servers that behave as agents rather than just tool providers.
Member

How is the pattern growing if no production servers use it? Are hobby or dev servers using it?

```python
agent = Agent(
    model=my_model,
    tools=[my_local_tool],  # local tools still go in tools=
    plugins=[plugin],       # MCP integration via plugin
)
```

Need to de-dupe tool/client names being passed across the plugin initialization and agent initialization, otherwise we'll get a ValueError for tool already found when registering

Contributor Author

Need to de-dupe tool/client names being passed across the plugin initialization and agent initialization

We should be adding a prefix to tool names when they come through MCP. If not, I'll add it. But yes, that's a common issue: tool names across local tools and other MCP servers can collide.


There are also two bugs: `_create_call_tool_coroutine()` doesn't forward the `_meta` field from tool call arguments (breaking progress tokens and custom metadata), and `MCPToolResult` discards the `isError` flag from `CallToolResult` (making it impossible to distinguish application errors from protocol errors).

Beyond the callback gaps, there's no integrated story. MCP events don't connect to the Strands hook system. There's no config file loading (every other MCP client supports this). There's no way to map MCP elicitation to Strands interrupts. If one of five MCP servers fails to start, the entire agent crashes.
Member

Are these actually blockers in the current setup? I think it's pretty trivial to add config-based loading in the current MCP client.

Contributor Author

Are these actually blockers in the current setup?

I think they're not really "blockers", otherwise we would have heard people saying it doesn't work.

That said, if you want to implement mcp.json with optional loading, you need to do a bunch of manual work and check the connection before passing it to the agent. See the code here: https://github.com/mkmeral/containerized-strands-agents/blob/main/src/containerized_strands_agents/agent.py#L253

Member

Sorry, trivial was probably the wrong phrase. I meant more "uncontroversial"

Comment on lines +78 to +83
Or from a config file (standard format used by Claude Desktop, Cursor, VS Code):

```python
plugin = MCPPlugin.from_config("mcp.json", fail_open=True)
agent = Agent(model=my_model, plugins=[plugin])
```
Member

I think this is taking away from the doc a bit. Agree this is a gap, but the proposal here is to cover the gaps in the new mcp spec

Contributor Author

What do you mean? I take the task as improving MCP in strands. Definitely agree that adding new spec updates is part of it, but not all imo

Member

I thought the purpose of this doc was to cover the mcp spec updates, not necessarily the mcp config feature request.

I agree that we should also have mcp config, but it is already being tracked as a part of this issue: strands-agents/sdk-python#482

I don't want this design discussion to get caught in the weeds of "if/how should we do mcp config" when we already have an issue tracking it that has been accepted by the team.

- Every user reinvents the same patterns. "Route MCP logs to Python logging" is a ~15-line function everyone will write. "Refresh tool cache when tools change" is another ~20-line function everyone will write.
- The marginal cost per MCP feature is low but constant — each new spec feature means a new `MCPClient.__init__` parameter and documentation.
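For concreteness, that log-routing helper might look roughly like this; the field names (`level`, `logger`, `data`) follow the MCP spec's `notifications/message` params, and the function itself is a sketch, not an existing Strands utility:

```python
import logging

# Map MCP's RFC 5424-style severity strings onto Python logging levels.
_LEVELS = {
    "debug": logging.DEBUG, "info": logging.INFO, "notice": logging.INFO,
    "warning": logging.WARNING, "error": logging.ERROR,
    "critical": logging.CRITICAL, "alert": logging.CRITICAL,
    "emergency": logging.CRITICAL,
}

async def route_mcp_logs(params) -> None:
    # `params.logger` is the server-supplied logger name, if any.
    logger = logging.getLogger(getattr(params, "logger", None) or "mcp.server")
    logger.log(_LEVELS.get(params.level, logging.INFO), "%s", params.data)
```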

**Recommendation:** Ship the wire-through callbacks as part of any option — they're small, useful, and serve as an escape hatch for users who need direct control or want to bypass the plugin.
Contributor

I think it's probably cleaner and more maintainable to carve one clear path for integration. This one looks like it requires a lot of lift on the user, so I'd lean towards not exposing callbacks at all

4. Installs a default `logging_callback` that routes MCP server logs to Python's `logging` module
5. Installs a default `list_roots_callback` that exposes the current working directory

Users who want to react to MCP events subscribe via the hook system they already know:
Member

Does our hooks system allow for customer defined hook events? If not, we should just do that, and have this feature take advantage of that

Contributor Author

I think so; the MCP plugin essentially does that.

You can create any event that extends the base hook event; then the only blocker (not sure if it is one) is calling the invoke callbacks on the agent's hook registry.
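A self-contained sketch of that pattern; `MCPProgressEvent` and the minimal registry below are illustrative stand-ins for the real Strands hook classes:

```python
# Sketch: a custom event type plus a type-keyed registry, mimicking how a
# plugin could publish MCP events through the agent's hook system.
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable

@dataclass
class MCPProgressEvent:  # hypothetical event name from this design
    tool_name: str
    progress: float
    total: float

class HookRegistry:
    """Stand-in for the agent's hook registry: callbacks keyed by event type."""
    def __init__(self):
        self._callbacks = defaultdict(list)

    def add_callback(self, event_type: type, cb: Callable) -> None:
        self._callbacks[event_type].append(cb)

    def invoke_callbacks(self, event) -> None:
        for cb in self._callbacks[type(event)]:
            cb(event)

# Subscribing looks the same as for built-in agent events:
hooks = HookRegistry()
seen = []
hooks.add_callback(MCPProgressEvent, lambda e: seen.append(e.progress))
hooks.invoke_callbacks(MCPProgressEvent("fetch_docs", 3, 10))
```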

Comment on lines +89 to +90
4. Installs a default `logging_callback` that routes MCP server logs to Python's `logging` module
5. Installs a default `list_roots_callback` that exposes the current working directory
Member

Curious what the "secure by default" version of this is? Do we want to allow an mcp server to read a filesystem by default? Or is it an explicit opt in?

Contributor Author

I'll add more explicit wording there. In the MCP plugin, all features are opt-in. I don't want to expose customer data to a bunch of MCP servers :)


---

### Option 2: Wire Through (pass callbacks to MCPClient)
Member

This is what we have today right? A lowlevel client that does some tool specific stuff, and lets the user implement the rest?

Contributor Author

Yep, but only for elicitation. There are more callbacks that we can hook up


These patterns make MCP feel like a natural part of the framework rather than just plumbing. They can be built on top of any option but are easiest with Options 1 or 3 because of hook integration.

### Elicitation as Interrupts
Member

One problem here is that elicitation requires the mcp client connection to remain open. You can't shut down the agent, restore from session, and then respond to interrupts. That is partly why we setup elicitation as a pass through. Not sure if things have changed since.

Contributor Author

That's actually a pretty good call out 😅

I think that limitation still exists. I'll dive a bit deeper


---

### Option 3: Full Integration (first-class `mcp_clients` on Agent)
Member

I personally like this one; it follows the session manager approach of a top-level plugin primitive. MCP is THE industry standard for agentic communication today, so I think it's ubiquitous enough to deserve a top-level primitive spot.

Contributor Author

MCP is THE industry standard for agentic communication today

I wouldn't say agentic communication, but tool proxying, sure. As I see it, the main use case of MCP is tools, and everything else is optional/additional. That's why I did not want to auto-connect everything (sampling, elicitation, etc.) to the agent; then we would have more complexity in the core agent.

I think plugins are a good middle ground


---

## Immediate Improvements (Ship Regardless of Option)
Contributor

Can TS have the same parity?

Contributor Author

Overall feature-wise? Yep. I think it should be part of the MCP project.


**P1 (ship soon):**
- Graceful startup failures (`fail_open`) — 30 lines, one broken server shouldn't crash the agent
- Progress callback passthrough — 20 lines, pass `progress_callback` to `call_tool()`
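A rough sketch of the `fail_open` behavior, with `client.start()` standing in for whatever connect method `MCPClient` exposes; the function name and shape are illustrative:

```python
import logging

# Sketch: start each MCP client, dropping the ones that fail instead of
# crashing the whole agent when fail_open is set.
def start_clients(clients, fail_open: bool = False):
    started, failed = [], []
    for client in clients:
        try:
            client.start()
            started.append(client)
        except Exception as exc:
            if not fail_open:
                raise
            logging.getLogger("mcp").warning("MCP server failed to start: %s", exc)
            failed.append((client, exc))
    return started, failed
```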

Is this just a stopgap solution until the MCPProgressEvent hook is in place? Or do both progress event paths have different purposes?

- **Opt-in via `TasksConfig`**: Pass `TasksConfig()` to `MCPClient` constructor to enable
- **Server capability detection**: Caches `tasks.requests.tools.call` during `session.initialize()`
- **Tool-level negotiation**: Reads `execution.taskSupport` per tool (`required`, `optional`, `forbidden`)
- **Full lifecycle**: `call_tool_as_task` → `poll_task` → `get_task_result` with timeout protection
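The lifecycle above might be driven by a polling loop along these lines; the method names mirror the proposal, and `client` is assumed (not guaranteed) to expose them:

```python
import time

# Sketch of call_tool_as_task -> poll_task -> get_task_result with
# timeout protection. Terminal status strings are illustrative.
def run_tool_as_task(client, name, args, timeout=300.0, interval=1.0):
    task_id = client.call_tool_as_task(name, args)
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status = client.poll_task(task_id)
        if status in ("completed", "failed", "cancelled"):
            return client.get_task_result(task_id)
        time.sleep(interval)
    raise TimeoutError(f"task {task_id} did not finish within {timeout}s")
```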
Contributor

Do we need to consider durable if we want a redesign?

Contributor Author

good question, what would that look like? 🤔


---

## Tasks
Member

@Unshure Unshure Apr 7, 2026

I've seen a few attempts at this, and I think we need this feature regardless of MCP: a way for an LLM to schedule a background task and wait for it to respond at some point in the future. It could be particularly useful for running a background bash command, triggering a research agent, or calling an MCP tool. MCP should use this async task tool for its implementation.

Contributor Author

I implemented the same for MCP Dev Summit https://github.com/agent-of-mkmeral/strands-cli-agent

The main problem is plumbing. We can invoke the agent again with the tool result, but where will the response go?

Additionally, if there is an ongoing conversation, asynchronously injected context can hurt more than it helps. That's why I left this as more of a follow-up for now.

Maybe we should vend both options and let users configure, e.g. default callbacks for task completion? 🤔


---

## Immediate Improvements (Ship Regardless of Option)
Member

I think calling out active bugs takes a bit away from the discussion of a design proposal. A bug is a bug and should be fixed; we don't need a design discussion for that. Let's try to keep these designs focused on new feature proposals.


3. **Include Option 2 (wire-through callbacks) as escape hatches** inside MCPClient. Power users who want raw control or have unusual requirements can bypass the plugin.

4. **Revisit Option 3 (first-class)** once we have adoption data on the plugin. If most users end up using MCPPlugin, promoting it to a native Agent parameter is straightforward.
Contributor

Based on preliminary feature usage data, we might already have enough signal to jump to this option, which exposes the neatest interface to customers.

MCP is among the most popular features we measured internally and on GH.


---

## Willingness to Implement
Member

nit: Do we need this section?


---

## Tasks
Member

@pgrayy pgrayy Apr 7, 2026

Some model providers also support a background mode where you can send a request to the model and receive a response id to use for polling. I think it would actually make sense to exit out of the agent loop under these circumstances to allow the user to poll themselves. Polling internally defeats the purpose as connections remain open for the agent caller. I'd be curious if we could support something similar for background tools. It should work similarly to interrupts. We exit the agent loop and allow the user to reinvoke when ready.

This gets tricky though when executing multiple tools concurrently.

Contributor Author

Instead of polling, you can also use notifications, so the server can send a notification. We need to implement both IMHO, so we can support whatever the MCP server supports.


### Option 2: Wire Through (pass callbacks to MCPClient)

**The idea:** The simplest possible approach. Add the four missing callback parameters to `MCPClient.__init__()`, pass them through to `ClientSession`, and let users handle everything themselves. No hook integration, no auto-wiring, no plugin.
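The shape of that wire-through might look like this sketch; the parameter names follow the mcp Python SDK's `ClientSession`, but the `MCPClient` signature here is illustrative, not the current one:

```python
# Sketch: MCPClient grows the missing callback parameters and simply
# forwards them to the underlying ClientSession when a session opens.
class MCPClient:
    def __init__(self, transport, *, sampling_callback=None,
                 elicitation_callback=None, list_roots_callback=None,
                 logging_callback=None, message_handler=None):
        self._transport = transport
        # Held until session creation, then splatted into ClientSession(...).
        self._session_kwargs = {
            "sampling_callback": sampling_callback,
            "elicitation_callback": elicitation_callback,
            "list_roots_callback": list_roots_callback,
            "logging_callback": logging_callback,
            "message_handler": message_handler,
        }
```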
Contributor

If this is the simplest possible approach, can't we ship this with P0 tasks, then make option 1 (Plugin) as a follow-up?


## Open Questions

1. **Plugin location** — Should MCPPlugin live inside the SDK (`strands.plugins.mcp`) or as a separate package? Inside = better discoverability, separate = faster iteration.
Contributor

If we choose this option and this is the recommended path for MCP, it is much better DX to include it directly in sdk-python

@@ -0,0 +1,429 @@
# MCP Integration Beyond Tools
Contributor

What happens to existing tools=[mcp_client] users?

Contributor Author

It works as is. I don't think we should change that behavior for now.


The `mcpServers` JSON config format we support today handles the basics (`command`, `args`, `url`, `headers`). A few small additions would improve the developer experience:

- **Pass-through environment keys**: Let users specify env var names to forward from the host environment, instead of hardcoding values. Example: `"env": {"passthrough": ["AWS_PROFILE", "DATABASE_URL"]}` forwards those vars from the host into the stdio subprocess without exposing secrets in config files.
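Resolving such a config could be as simple as the following sketch; the `passthrough` key is the proposed addition, not the current format:

```python
import os

# Sketch: copy named variables from the host environment at launch time,
# so secrets never need to be written into mcp.json itself.
def resolve_env(env_config: dict, host_env=None) -> dict:
    host_env = os.environ if host_env is None else host_env
    # Literal key/value pairs are kept as-is.
    resolved = {k: v for k, v in env_config.items() if k != "passthrough"}
    # Passthrough names are looked up on the host; missing ones are skipped.
    for name in env_config.get("passthrough", []):
        if name in host_env:
            resolved[name] = host_env[name]
    return resolved
```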
Contributor

Can you elaborate more about the reference here?

Contributor Author

@mkmeral mkmeral Apr 7, 2026

e.g. https://kiro.dev/docs/mcp/configuration/

The idea is that mcp.json is not a good place for secrets, because it causes you to persist tokens in multiple places. With pass-through env vars, you can just say this variable will come from the environment.


2. **Message handler API** — The plugin monkey-patches `_handle_error_message`. Should we add a public `set_message_handler()` on MCPClient?

3. **Elicitation-as-interrupts timing** — The elicitation callback fires during tool execution (not before). The current interrupt mechanism lives on `BeforeToolCallEvent`. Bridging these needs design work. Worth doing now or deferring?
Member

@pgrayy pgrayy Apr 7, 2026

We support interrupts from within decorator tool definitions as well. The interrupt method is on ToolContext. Just need to support raising interrupts from MCPTool is all. The piping is already in place. But see comment further above regarding why elicitation was setup as a pass through.

**Cons:**

- Requires changes to `Agent.__init__()` — adding a parameter, import paths, and initialization logic. This is a higher-risk change that affects every user, even those who don't use MCP.
- Needs more design work around lifecycle (when do MCP sessions start/stop?), multi-agent sharing (can two agents share an MCPClient?), and backward compatibility (what about existing `tools=[mcp_client]` code?).
Contributor

multi-agent sharing (can two agents share an MCPClient?)

This applies to all three options right? This is kind of a design question?
