InfoQ Architecture·June 13, 2026

WebMCP: Standardizing AI Agent Interaction with Web UIs

WebMCP is a new standard proposal that lets web developers explicitly expose JavaScript functions and HTML forms as "tools" to in-browser AI agents. The aim is more reliable, precise, and token-efficient agentic web actuation, replacing unreliable methods such as DOM scraping and screenshot analysis. The specification provides a Declarative API (HTML attributes) and an Imperative API (JavaScript) for defining agent capabilities, an approach that significantly reduces LLM token usage and improves determinism.


The Web Machine Context Protocol (WebMCP) introduces a standardized way for web applications to communicate their capabilities to in-browser AI agents. It addresses the limitations of earlier approaches, in which agents relied on heuristics such as DOM scraping, screenshot analysis, and simulated clicks. Those heuristics break when the UI changes, consume many LLM tokens, and are often non-deterministic.

Improving Agent Reliability and Efficiency

Traditionally, an AI agent interacting with a web page would read the page's Document Object Model (DOM), analyze its visual elements, and infer where to click or type. This process is fragile: even minor layout shifts or delayed content loading can break the automation. WebMCP sidesteps these issues by letting developers define explicit, machine-readable APIs for agents, so the agent can perform a task with a direct function call. Early benchmarks show up to a 90% reduction in LLM token usage, along with significant improvements in speed and determinism.

Core API Surfaces

WebMCP provides two primary methods for developers to expose tools to AI agents:

  • Declarative API: Leverages custom HTML attributes (e.g., `toolname`, `tooldescription`, `toolautosubmit`) to annotate existing HTML forms, making them discoverable and understandable by agents.
  • Imperative API: Uses the `document.modelContext.registerTool()` JavaScript interface, which allows tools to be registered dynamically. Each tool requires a name, a description, an input schema (JSON Schema), and an `execute` function. The `execute` function handles UI logic and state management, and returns its result directly to the agent.
```html
<form toolname="Search flights" tooldescription="This form searches flights and displays [...]" toolautosubmit>
  <!-- Form fields omitted in this excerpt. -->
</form>
```
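To make the declarative path concrete, here is a minimal sketch of how an annotated form could be read as a tool definition with the same shape as the imperative API. It models the form as a plain object so it runs without a DOM; the `fields` list (`origin`, `destination`, `cabin`) and the `formToTool` helper are hypothetical, and the actual form-to-tool mapping is left to the browser. The `autoSubmit` interpretation of `toolautosubmit` is also an assumption.

```javascript
// Hypothetical, DOM-free model of a declaratively annotated form. The
// attribute names come from the HTML example; the `fields` list is an
// illustrative stand-in for the form's named controls.
const searchFlightsForm = {
  attributes: {
    toolname: 'Search flights',
    tooldescription: 'This form searches flights and displays [...]',
    toolautosubmit: true,
  },
  fields: [
    { name: 'origin', type: 'text', required: true },
    { name: 'destination', type: 'text', required: true },
    { name: 'cabin', type: 'select', options: ['economy', 'business'], required: false },
  ],
};

// Derive a tool definition shaped like the registerTool() argument.
// This mapping is an assumption for illustration, not the spec's algorithm.
function formToTool(form) {
  const properties = {};
  const required = [];
  for (const field of form.fields) {
    // A <select> becomes a string enum; other controls become plain strings.
    properties[field.name] =
      field.type === 'select'
        ? { type: 'string', enum: field.options }
        : { type: 'string' };
    if (field.required) required.push(field.name);
  }
  return {
    name: form.attributes.toolname,
    description: form.attributes.tooldescription,
    // Assumed meaning: the agent-filled form is submitted without a manual click.
    autoSubmit: Boolean(form.attributes.toolautosubmit),
    inputSchema: { type: 'object', properties, required },
  };
}

const tool = formToTool(searchFlightsForm);
console.log(JSON.stringify(tool.inputSchema.required)); // ["origin","destination"]
```

Seen this way, both APIs give the agent the same ingredients: a name, a description, and a JSON Schema for the inputs.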
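```js
// Register a tool imperatively. `toggleLayer` is the page's own helper (not shown).
document.modelContext.registerTool({
  name: 'toggle_layer',
  description: 'Control pizza layers (sauce, cheese). Use "add", "remove", or "toggle".',
  // JSON Schema describing the arguments the agent may pass.
  inputSchema: {
    type: 'object',
    properties: {
      layer: { type: 'string', enum: ['sauce-layer', 'cheese-layer'] },
      action: { type: 'string', enum: ['add', 'remove', 'toggle'] },
    },
    required: ['layer'],
  },
  // `action` is optional in the schema, so default it to 'toggle' here.
  execute: async ({ layer, action = 'toggle' }) => {
    await toggleLayer(layer, action);
    // The returned string goes back to the agent as the tool result.
    return `Performed ${action} on layer: ${layer}`;
  },
});
```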
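From the agent's side, a "direct function call" reduces to a tool name plus JSON arguments that are checked against the tool's `inputSchema` before `execute` runs. The sketch below models that flow under stated assumptions: a plain `Map` is a hypothetical stand-in for the browser's registry, `registerTool`, `validateArgs`, `callTool`, and the in-memory `toggleLayer` are illustrative names, and the validator covers only the JSON Schema subset used in the pizza example (string properties, `enum`, `required`).

```javascript
// Hypothetical stand-in for the browser-provided registry: a plain Map.
const registry = new Map();

function registerTool(tool) {
  if (registry.has(tool.name)) throw new Error(`Duplicate tool: ${tool.name}`);
  registry.set(tool.name, tool);
}

// Validate arguments against a small JSON Schema subset:
// an object with string properties, optional enums, and a `required` list.
function validateArgs(schema, args) {
  const errors = [];
  for (const key of schema.required ?? []) {
    if (!Object.hasOwn(args, key)) errors.push(`missing required "${key}"`);
  }
  for (const [key, value] of Object.entries(args)) {
    const prop = Object.hasOwn(schema.properties, key) ? schema.properties[key] : undefined;
    if (!prop) {
      errors.push(`unknown argument "${key}"`);
      continue;
    }
    if (prop.type === 'string' && typeof value !== 'string') {
      errors.push(`"${key}" must be a string`);
    }
    if (prop.enum && !prop.enum.includes(value)) {
      errors.push(`"${key}" must be one of ${prop.enum.join(', ')}`);
    }
  }
  return errors;
}

// An agent's tool call: look up the tool, validate, then run execute().
async function callTool(name, args) {
  const tool = registry.get(name);
  if (!tool) throw new Error(`Unknown tool: ${name}`);
  const errors = validateArgs(tool.inputSchema, args);
  if (errors.length > 0) throw new Error(errors.join('; '));
  return tool.execute(args);
}

// Hypothetical in-memory page state so the example runs outside a browser.
const layers = new Set();
async function toggleLayer(layer, action = 'toggle') {
  const add = action === 'add' || (action === 'toggle' && !layers.has(layer));
  if (add) layers.add(layer);
  else layers.delete(layer);
}

registerTool({
  name: 'toggle_layer',
  description: 'Control pizza layers (sauce, cheese). Use "add", "remove", or "toggle".',
  inputSchema: {
    type: 'object',
    properties: {
      layer: { type: 'string', enum: ['sauce-layer', 'cheese-layer'] },
      action: { type: 'string', enum: ['add', 'remove', 'toggle'] },
    },
    required: ['layer'],
  },
  execute: async ({ layer, action = 'toggle' }) => {
    await toggleLayer(layer, action);
    return `Performed ${action} on layer: ${layer}`;
  },
});

callTool('toggle_layer', { layer: 'cheese-layer', action: 'add' })
  .then((result) => console.log(result)); // Performed add on layer: cheese-layer
```

Because invalid arguments are rejected before any UI code runs, a malformed call fails with a precise message instead of a half-applied change, which is where the determinism gain over simulated clicks comes from.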

Security and Operational Considerations

Exposing a site's native APIs to AI agents introduces new security and operational risks. Developers must guard against indirect prompt injection, where instructions hidden in third-party content steer the agent, and must manage permission gaps. WebMCP suggests marking externally sourced data payloads with `untrustedContentHint` and non-mutating operations with `readOnlyHint` to guide the agent's decisions. Continuous AI evaluations ("evals") on targeted user journeys are recommended to confirm that agents perform actions correctly and in line with business rules.
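The two hints only help if the agent acts on them. Below is a minimal sketch of one possible agent-side policy, under stated assumptions: the hints sit in an `annotations` object on each tool (the article names the hints but not where they live), the tool names are hypothetical, and wrapping untrusted output in delimiter tags is one common prompt-injection mitigation, not a complete defense.

```javascript
// Hypothetical tool definitions; only the hint names come from the article.
const tools = [
  { name: 'get_order_status', annotations: { readOnlyHint: true } },
  {
    // Reviews are written by third parties and may contain injected instructions.
    name: 'read_product_reviews',
    annotations: { readOnlyHint: true, untrustedContentHint: true },
  },
  { name: 'cancel_order', annotations: { readOnlyHint: false } },
];

// Read-only tools may run without asking; anything else, including tools with
// no hint at all, is treated conservatively as state-changing.
function planCall(tool) {
  const hints = tool.annotations ?? {};
  return {
    needsUserConfirmation: hints.readOnlyHint !== true,
    wrapOutputAsUntrusted: hints.untrustedContentHint === true,
  };
}

// Fence untrusted payloads so the model can treat them as data, not instructions.
function wrapOutput(tool, output) {
  if (!planCall(tool).wrapOutputAsUntrusted) return output;
  return `<untrusted-content source="${tool.name}">\n${output}\n</untrusted-content>`;
}

for (const tool of tools) {
  console.log(tool.name, planCall(tool));
}
```

Defaulting a missing `readOnlyHint` to "needs confirmation" keeps the policy safe when a site omits the hint; evals on real user journeys are still needed to check that the agent behaves as intended.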


Best Practices for Tool Descriptions

Developers are advised to keep tool descriptions concise so they fit within LLM context windows. The recommended character budgets are 30 for names, 500 for tool descriptions, and 150 for parameter descriptions, with a 1,500-character limit on individual tool output. Staying within these budgets reduces token usage and improves the agent's comprehension of each tool.
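These budgets are easy to enforce mechanically, for example in a test or build step. The sketch below is a hypothetical linter: the limit values are the ones recommended above, while the function names and the choice to truncate oversized output with a `[truncated]` marker are assumptions. Note that JavaScript's `length` counts UTF-16 code units, which matches characters for plain ASCII text.

```javascript
// Character budgets recommended in the article.
const BUDGETS = { name: 30, description: 500, paramDescription: 150, output: 1500 };

// Return a list of budget violations for one tool definition.
function lintTool(tool) {
  const problems = [];
  if (tool.name.length > BUDGETS.name) {
    problems.push(`name: ${tool.name.length} chars (max ${BUDGETS.name})`);
  }
  if (tool.description.length > BUDGETS.description) {
    problems.push(`description: ${tool.description.length} chars (max ${BUDGETS.description})`);
  }
  const properties = tool.inputSchema?.properties ?? {};
  for (const [param, spec] of Object.entries(properties)) {
    const text = spec.description ?? '';
    if (text.length > BUDGETS.paramDescription) {
      problems.push(`param "${param}": ${text.length} chars (max ${BUDGETS.paramDescription})`);
    }
  }
  return problems;
}

// Cap tool output at the budget, marking where it was cut.
function capOutput(text, limit = BUDGETS.output) {
  if (text.length <= limit) return text;
  const marker = ' [truncated]';
  return text.slice(0, limit - marker.length) + marker;
}

console.log(lintTool({
  name: 'toggle_layer',
  description: 'Control pizza layers (sauce, cheese). Use "add", "remove", or "toggle".',
})); // []
```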
