@MiguelsPizza MiguelsPizza commented Sep 17, 2025

This draft proposal outlines a declarative WebMCP API that enables web pages to expose tools via HTML, using minimal attributes like tool-name and standard form semantics. Currently, it's a compilation of my notes and ideas from developing this approach, and I'm sharing it to gather feedback before the September 18th working group meeting. I'm particularly interested in your thoughts on the open questions (e.g., JSON vs. HTML responses, elicitation flows), tradeoffs, and overall API design.
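
To make the idea concrete, here is a minimal TypeScript sketch of how a form tagged with `tool-name` might be turned into a tool descriptor from ordinary form semantics. It is not taken from the proposal or the polyfill: the `ToolForm`, `FormField`, and `ToolDescriptor` shapes, the `describeTool` function, and the input-type to JSON-type mapping are all assumptions made for illustration. The form is modeled as plain data so the sketch runs without a DOM.

```typescript
// Hypothetical data model: a plain-object stand-in for a <form tool-name="..."> element.
interface FormField {
  name: string;                          // the input's name attribute
  type: "text" | "number" | "checkbox";  // the input's type attribute (subset)
  required: boolean;                     // whether the input carries `required`
}

interface ToolForm {
  toolName: string | null; // value of the tool-name attribute, or null if absent
  fields: FormField[];
}

type JsonType = "string" | "number" | "boolean";

// Hypothetical descriptor shape, loosely modeled on a JSON-Schema-style input schema.
interface ToolDescriptor {
  name: string;
  inputSchema: {
    type: "object";
    properties: Record<string, { type: JsonType }>;
    required: string[];
  };
}

// Assumed mapping from input types to JSON types; unknown types fall back to "string".
function jsonType(inputType: FormField["type"]): JsonType {
  switch (inputType) {
    case "number":
      return "number";
    case "checkbox":
      return "boolean";
    default:
      return "string";
  }
}

// Returns a descriptor only for forms that opted in with a non-empty tool name.
function describeTool(form: ToolForm): ToolDescriptor | null {
  if (form.toolName === null || form.toolName === "") {
    return null;
  }
  const properties: Record<string, { type: JsonType }> = {};
  const required: string[] = [];
  for (const field of form.fields) {
    properties[field.name] = { type: jsonType(field.type) };
    if (field.required) {
      required.push(field.name);
    }
  }
  return {
    name: form.toolName,
    inputSchema: { type: "object", properties, required },
  };
}
```

Under these assumptions, a form with `tool-name="add_todo"` and a required text input named `title` would yield a descriptor named `add_todo` whose schema requires a string `title`, while a form without `tool-name` yields no tool at all.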

The proposed API was shaped by building a real application and polyfill during the MCP enterprise hackathon, where our team successfully implemented it (and took home the win, which was exciting validation!). You can see a video of a Rails app using declarative WebMCP tools to enable complex browser automation without client-side JavaScript: link.

Based on your feedback, I'll refine this draft to align more closely with the structured format and narrative style of other explainers in the repo.

@MiguelsPizza MiguelsPizza marked this pull request as ready for review September 17, 2025 04:08
@bwalderman
Collaborator

This is great work. I do have one general question. Was reusing ARIA attributes instead of introducing new tool-* attributes considered?

Attributes such as aria-label and aria-description already exist for labelling and describing elements, so it might be helpful to define WebMCP mappings and behaviors for these instead of introducing entirely new HTML attributes.

One benefit of using these existing attributes is that they are also surfaced in native accessibility APIs, so assistive tools that already use these APIs to access the page's accessibility tree would be able to access WebMCP tool declarations as well.

@MiguelsPizza
Author

@bwalderman This is a good idea. I'll put the PR in draft while I re-implement the polyfill on top of ARIA.

The only concern I can think of is that we still need a way to make exposing tools to the agent opt-in (or opt-out).

Maybe we still tag elements with a tool-name to expose them to the agent? This would also help prevent duplicate tool names, which cause errors with most inference providers.
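
As a rough illustration of the duplicate-name concern, the TypeScript sketch below reports every tool name that appears more than once in a list of declared names. The function name `findDuplicateToolNames` and the idea of running such a check before handing tools to a model are assumptions for illustration, not part of the proposal or the polyfill.

```typescript
// Hypothetical helper: returns each tool name that occurs more than once,
// in sorted order, with each duplicate reported a single time.
function findDuplicateToolNames(names: string[]): string[] {
  // Sort a copy so equal names become neighbors and the input stays untouched.
  const sorted = names.slice().sort();
  const duplicates: string[] = [];
  for (let i = 1; i < sorted.length; i++) {
    const repeated = sorted[i] === sorted[i - 1];
    const alreadyReported = duplicates[duplicates.length - 1] === sorted[i];
    if (repeated && !alreadyReported) {
      duplicates.push(sorted[i]);
    }
  }
  return duplicates;
}
```

A page could run a check like this over the `tool-name` values it finds and warn the developer, rather than letting a provider reject the whole tool list.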

@vsakaria

vsakaria commented Nov 7, 2025

The concern with the HTML method is, of course, the iframe. Realistically speaking, it's stood up well for some years now. Would an iframe in a browser be more trustworthy? I would prefer JSON and rendering on the client. I am sure web components can be distributed with framework payloads and CSS, but the build process for this type of architecture would have to change. I would prefer that design.

The trade-off really is that the payload would need to be distributed more frequently.

@anssiko
Member

anssiko commented Nov 11, 2025

@matatk to review for the accessibility group's perspective (aka APA WG).

@anssiko
Member

anssiko commented Nov 25, 2025

A new paper and implementation experience:

https://arxiv.org/abs/2511.11287v1
https://svenschultze.github.io/VOIX/

@svenschultze & team, this W3C community group is developing a WebMCP API that is complemented by the declarative mechanism explored in this PR.

Let’s join forces to explore this space. Here’s how to join:
https://webmachinelearning.github.io/community/#join

@anssiko
Member

anssiko commented Nov 25, 2025

That was fast. I’m excited to welcome @svenschultze to the WebML Community Group! 🎉

@svenschultze

Hi @anssiko, thank you for making me aware of this project! It is great to see the community converging on this. I'm happy to share some insights from our work on VOIX, where we implemented a similar declarative framework and tested it with developers.

  1. We established a more explicit interface where MCP tools are separated from standard UI HTML elements. This ensures the agent only accesses data and actions the developer specifically intended to share. This is also relevant to the discussion about ARIA attributes: I think it is important not to simply reuse ARIA, since this could create a conflict of interest between optimizing for accessibility and optimizing for agents.
  2. Is there an equivalent idea for declarative context/resources in this spec? We found it really helpful to explicitly set agent-only text elements (in our case, specific <context name="mouse_position"> elements). This avoids long context inputs containing the full HTML text, hides potentially sensitive data like credit card numbers, and enables high-fidelity synergetic multimodal interaction where UI hover/selection states can be explicitly exposed to the agent. This way, you can interact with websites using commands like "move this to here" without requiring long chains of tool calls.
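
As a rough sketch of the second point, the TypeScript below builds the text handed to an agent from explicitly named context entries only, instead of from the full page text. It does not reproduce VOIX's implementation: the `ContextEntry` shape, the `buildAgentContext` function, and the `name: text` output format are assumptions for illustration, with each entry standing in for one `<context name="...">` element.

```typescript
// Hypothetical stand-in for one <context name="..."> element and its text content.
interface ContextEntry {
  name: string;
  text: string;
}

// Builds a compact context string from declared entries only.
// Anything the developer did not declare as context never reaches the agent.
function buildAgentContext(entries: ContextEntry[]): string {
  return entries
    .filter((entry) => entry.text.trim() !== "") // skip empty context elements
    .map((entry) => `${entry.name}: ${entry.text.trim()}`)
    .join("\n");
}
```

With this shape, a page that keeps a `mouse_position` entry up to date exposes only that one short line to the agent, which is what allows a command like "move this to here" to be resolved without sending the whole document.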
