WebMCP is a proposed web standard that lets web developers explicitly expose JavaScript functions and HTML forms as "tools" to in-browser AI agents. It aims to make agentic web actuation more reliable, precise, and token-efficient by replacing brittle techniques such as DOM scraping and screenshot analysis. The specification defines both a Declarative (HTML attributes) and an Imperative (JavaScript API) way to describe agent capabilities; the stated benefits are lower LLM token usage and more deterministic behavior.
Read original on InfoQ Architecture
The Web Model Context Protocol (WebMCP) introduces a standardized way for web applications to communicate their capabilities to in-browser AI agents. It addresses significant limitations of earlier approaches, in which agents relied on heuristics such as DOM scraping, screenshot analysis, and simulated clicks. Those heuristics break easily when the UI changes, are expensive in LLM tokens, and are often non-deterministic.
Traditionally, an AI agent attempting to interact with a web page would read the page's Document Object Model (DOM), analyze visual elements, and infer where it could interact. This process is fragile: even minor layout shifts or delayed content loading can break the automation. WebMCP avoids these issues by letting developers define explicit, machine-readable APIs, so an agent performs a task through a direct function call rather than simulated UI input. Early benchmarks reportedly show up to a 90% reduction in LLM token usage, along with significant improvements in speed and determinism.
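To make the "direct function call" idea concrete, here is a minimal sketch of the agent-facing side, assuming a simple in-memory registry. `ToolRegistry`, `listTools`, `callTool`, and the `add_to_cart` tool are hypothetical names invented for illustration; they are not the WebMCP API, only a stand-in for whatever the browser provides.

```javascript
// Hypothetical stand-in for a browser-provided tool registry; NOT the WebMCP API itself.
class ToolRegistry {
  constructor() {
    this.tools = new Map();
  }

  // Store a tool descriptor under its unique name.
  registerTool(tool) {
    if (this.tools.has(tool.name)) {
      throw new Error(`Tool already registered: ${tool.name}`);
    }
    this.tools.set(tool.name, tool);
  }

  // What an agent would read instead of scraping the DOM.
  listTools() {
    return [...this.tools.values()].map(({ name, description, inputSchema }) => ({
      name,
      description,
      inputSchema,
    }));
  }

  // Check required arguments, then call the page's own function directly.
  async callTool(name, args = {}) {
    const tool = this.tools.get(name);
    if (!tool) throw new Error(`Unknown tool: ${name}`);
    const required = tool.inputSchema?.required ?? [];
    for (const key of required) {
      if (!(key in args)) throw new Error(`Missing required argument: ${key}`);
    }
    return tool.execute(args);
  }
}

const registry = new ToolRegistry();
registry.registerTool({
  name: 'add_to_cart', // hypothetical example tool
  description: 'Add a product to the shopping cart.',
  inputSchema: {
    type: 'object',
    properties: { sku: { type: 'string' }, quantity: { type: 'number' } },
    required: ['sku'],
  },
  execute: async ({ sku, quantity = 1 }) => `Added ${quantity} x ${sku}`,
});

registry.callTool('add_to_cart', { sku: 'A-100' }).then(console.log); // logs "Added 1 x A-100"
```

The agent never has to locate a button or wait for a layout to settle: it reads a short list of descriptors and invokes one by name, which is where the token savings and determinism described above would come from.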
WebMCP provides two primary methods for developers to expose tools to AI agents:

Declarative API (HTML attributes):

    <form toolname="Search flights" tooldescription="This form searches flights and displays [...]" toolautosubmit>
    </form>

Imperative API (JavaScript):

    document.modelContext.registerTool({
      name: 'toggle_layer',
      description: 'Control pizza layers (sauce, cheese). Use "add", "remove", or "toggle".',
      inputSchema: {
        type: 'object',
        properties: {
          layer: { type: 'string', enum: ['sauce-layer', 'cheese-layer'] },
          action: { type: 'string', enum: ['add', 'remove', 'toggle'] },
        },
        required: ['layer'],
      },
      execute: async ({ layer, action }) => {
        await toggleLayer(layer, action);
        return `Performed ${action || 'toggle'} on layer: ${layer}`;
      },
    });

Exposing native site APIs to AI agents introduces new security and operational risks. Developers must guard against indirect prompt injection and manage permission gaps. WebMCP suggests using `untrustedContentHint` for externally sourced data payloads and `readOnlyHint` for non-mutating operations to guide agent decision-making. Continuous AI evaluations (AI evals) on targeted user journeys are recommended to ensure agents perform actions correctly and in line with business rules.
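As a rough illustration of how an agent might act on these hints, the sketch below maps them to a handling policy. It assumes the hints are boolean flags inside an `annotations` object on the tool descriptor; that placement, the `decideHandling` function, and both example tools are assumptions made for this example, not confirmed details of the specification.

```javascript
// Assumption: hints are boolean flags in an `annotations` object on the tool descriptor.
// decideHandling is a hypothetical agent-side policy, not part of WebMCP.
function decideHandling(tool) {
  const { readOnlyHint = false, untrustedContentHint = false } = tool.annotations ?? {};
  return {
    // Tools not marked read-only may mutate state, so ask the user first.
    requiresConfirmation: !readOnlyHint,
    // Output that may carry third-party text is treated as data, never as instructions.
    treatOutputAsData: untrustedContentHint,
  };
}

// Hypothetical tool: reads user-generated reviews, so its output is untrusted.
const listReviews = {
  name: 'list_reviews',
  description: 'Return customer reviews for a product.',
  annotations: { readOnlyHint: true, untrustedContentHint: true },
};

// Hypothetical tool: changes server state, so it is not read-only.
const placeOrder = {
  name: 'place_order',
  description: 'Submit the current cart as an order.',
  annotations: { readOnlyHint: false },
};

console.log(decideHandling(listReviews)); // { requiresConfirmation: false, treatOutputAsData: true }
console.log(decideHandling(placeOrder)); // { requiresConfirmation: true, treatOutputAsData: false }
```

A tool with no annotations falls through to the most cautious defaults here (confirmation required). Because the values are hints supplied by the page, a policy like this can guide the agent but should not be the only safeguard against prompt injection.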
Best Practices for Tool Descriptions
Developers are advised to keep tool descriptions concise to fit within LLM context windows. Specific character budgets are recommended: 500 for tool descriptions, 150 for parameter descriptions, and 30 for names, with a 1,500-character limit for individual tool output. This optimizes token usage and agent comprehension.
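These budgets are easy to check automatically, for example in a unit test or build step. The following is a minimal sketch using the numbers quoted above; `checkToolBudget` and `clampOutput` are hypothetical helper names, and applying the 30-character limit to the tool name specifically is an assumption, since the text does not say which names it covers.

```javascript
// Character budgets quoted in the text above.
const BUDGETS = { name: 30, description: 500, paramDescription: 150, output: 1500 };

// Return a list of human-readable budget violations for one tool descriptor.
function checkToolBudget(tool) {
  const problems = [];
  // Assumption: the 30-character budget applies to the tool name.
  if (tool.name.length > BUDGETS.name) {
    problems.push(`name is ${tool.name.length} chars (max ${BUDGETS.name})`);
  }
  const descLength = (tool.description ?? '').length;
  if (descLength > BUDGETS.description) {
    problems.push(`description is ${descLength} chars (max ${BUDGETS.description})`);
  }
  for (const [param, schema] of Object.entries(tool.inputSchema?.properties ?? {})) {
    const len = (schema.description ?? '').length;
    if (len > BUDGETS.paramDescription) {
      problems.push(`parameter "${param}" description is ${len} chars (max ${BUDGETS.paramDescription})`);
    }
  }
  return problems;
}

// Truncate tool output so it never exceeds the output budget.
function clampOutput(text) {
  if (text.length <= BUDGETS.output) return text;
  return text.slice(0, BUDGETS.output - 3) + '...';
}

const tool = {
  name: 'toggle_layer',
  description: 'Control pizza layers (sauce, cheese). Use "add", "remove", or "toggle".',
  inputSchema: {
    type: 'object',
    properties: { layer: { type: 'string', description: 'Which layer to change.' } },
  },
};

console.log(checkToolBudget(tool)); // [] (within all budgets)
console.log(clampOutput('x'.repeat(2000)).length); // 1500
```

Running a check like this over every registered tool catches oversized descriptions before they reach an agent's context window.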