InfoQ Architecture·May 29, 2026

Optimizing LLM Agent Workflow Costs: GitHub's Audit and Pruning Strategy

GitHub cut token usage and costs in its LLM agent workflows, with reductions of up to 62% in Effective Tokens on some production workflows. It did so by running daily audits, pruning unused Model Context Protocol (MCP) tools, and replacing certain MCP calls with GitHub CLI (`gh`) invocations. The approach illustrates a system of continuous optimization for AI-driven automation, which is crucial for keeping the operating costs of LLM-based systems in check.



The Challenge of LLM Agent Costs

Large Language Model (LLM) agents, especially when integrated into continuous integration (CI) pipelines and other automated workflows, can incur substantial and often hidden costs through token consumption. Because LLM APIs are typically billed per token, every call that carries a large prompt payload, such as the schemas of all available tools, translates directly into spend. GitHub's experience shows that without active management these costs accumulate rapidly, which makes cost optimization a critical part of designing and operating systems built on LLMs.

GitHub's Multi-pronged Optimization Strategy

GitHub tackled this problem with a strategy combining observability, automated auditing, and targeted optimization. The approach is particularly relevant for architects designing platforms that depend on third-party APIs whose cost scales directly with usage. The key components are:

Centralized API Proxy & Token Tracking: All agent calls are routed through an API proxy that generates detailed `token-usage.jsonl` artifacts. This provides a normalized view of input, output, and cache tokens across different LLM CLIs (Claude, Copilot, Codex).
Effective Tokens (ET) Metric: To compare costs consistently across LLM models with different pricing tiers, GitHub introduced an 'Effective Tokens' metric. It weights output tokens more heavily (4x) and cache reads less (0.1x) than regular input tokens, then applies a model-specific multiplier. This yields a consistent measure of cost reduction regardless of the underlying model.
Automated Daily Audit Agent: An agent aggregates token consumption by workflow, identifies anomalous runs, and highlights the most expensive jobs. This component acts as a monitoring and alerting system.
Automated Daily Token Optimiser: When the audit agent flags an issue, this optimiser agent reads the workflow source and logs, then opens a GitHub issue proposing specific fixes. This feedback loop, running from detection to a concrete remediation proposal, is a powerful pattern for continuous system improvement.
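The Effective Tokens metric and the daily audit can be sketched together in Python. The sketch computes ET = (input + 4 × output + 0.1 × cache reads) × model multiplier, treating regular input tokens as the 1x baseline; that baseline is an assumption, since the article states only the output and cache-read weights. The `Run` record fields, the model names and values in `MODEL_MULTIPLIERS`, and the anomaly rule (a run costing more than `factor` times its workflow's median ET) are hypothetical choices for illustration, not GitHub's implementation.

```python
from dataclasses import dataclass
from statistics import median

# Weights stated in the article: output tokens count 4x, cache reads 0.1x.
# Regular input tokens are ASSUMED to be the 1x baseline.
OUTPUT_WEIGHT = 4.0
CACHE_READ_WEIGHT = 0.1

# HYPOTHETICAL model names and multipliers; the article gives no values.
MODEL_MULTIPLIERS = {"model-small": 0.5, "model-large": 2.0}


@dataclass
class Run:
    """One agent run as recorded by the proxy (hypothetical field names)."""
    workflow: str
    model: str
    input_tokens: int
    output_tokens: int
    cache_read_tokens: int


def effective_tokens(run: Run) -> float:
    """Weight raw token counts by type, then scale by the model multiplier."""
    weighted = (
        run.input_tokens
        + OUTPUT_WEIGHT * run.output_tokens
        + CACHE_READ_WEIGHT * run.cache_read_tokens
    )
    # Unknown models fall back to a neutral 1.0 multiplier (an assumption).
    return weighted * MODEL_MULTIPLIERS.get(run.model, 1.0)


def audit(runs: list[Run], factor: float = 3.0) -> dict:
    """Rank workflows by total ET and flag runs far above their workflow's median."""
    by_workflow: dict[str, list[float]] = {}
    for run in runs:
        by_workflow.setdefault(run.workflow, []).append(effective_tokens(run))

    # HYPOTHETICAL anomaly rule: ET above `factor` x the workflow's median run.
    anomalies = []
    for workflow, costs in by_workflow.items():
        typical = median(costs)
        anomalies.extend((workflow, et) for et in costs if et > factor * typical)

    # Most expensive workflows first, like the audit's "top jobs" view.
    totals = {wf: sum(costs) for wf, costs in by_workflow.items()}
    ranked = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
    return {"ranked": ranked, "anomalies": anomalies}
```

Using the median rather than the mean as each workflow's baseline keeps a single runaway run from hiding itself by inflating the average it is compared against.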


Architectural Lesson: Observability is Key to Optimization

GitHub's success hinges on robust observability at the API proxy level: tracking token usage on every call gave the team the insight needed to identify inefficiencies. Systems should therefore be designed with comprehensive metrics and logging from the outset, especially in cost-sensitive distributed environments, because any later optimization depends on that data.
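Proxy-level tracking only pays off if records from different agent CLIs land in one comparable shape. A minimal Python sketch of that normalization step, assuming each line of `token-usage.jsonl` is a JSON object with a `workflow` field and some token-count fields. The article does not publish the file's schema, so the field names and the aliases in `FIELD_ALIASES` are hypothetical placeholders.

```python
import json
from collections import defaultdict

# HYPOTHETICAL field aliases: the real token-usage.jsonl schema is not given
# in the article, so these names are placeholders for illustration only.
FIELD_ALIASES = {
    "input": ("input_tokens", "prompt_tokens"),
    "output": ("output_tokens", "completion_tokens"),
    "cache_read": ("cache_read_tokens", "cached_tokens"),
}


def normalize(record: dict) -> dict[str, int]:
    """Map whichever alias a record uses onto one canonical set of keys."""
    return {
        canonical: next((int(record[a]) for a in aliases if a in record), 0)
        for canonical, aliases in FIELD_ALIASES.items()
    }


def summarize(jsonl_text: str) -> dict[str, dict[str, int]]:
    """Sum normalized token counts per workflow, skipping blank or bad lines."""
    totals: dict[str, dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for line in jsonl_text.splitlines():
        if not line.strip():
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # one truncated line should not abort the whole audit
        workflow = record.get("workflow", "unknown")
        for key, value in normalize(record).items():
            totals[workflow][key] += value
    # Convert the nested defaultdicts to plain dicts for the caller.
    return {wf: dict(counts) for wf, counts in totals.items()}
```

Normalizing at ingestion, rather than in every downstream report, means the audit and ET calculations only ever see one set of field names.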

Key Optimization Techniques

MCP Pruning: Unused Model Context Protocol (MCP) tools were a significant source of overhead. Because LLM APIs are stateless, the schemas of every configured tool are typically sent with each request, whether or not the agent uses them. Removing the unused schemas sharply reduced the context sent on every call, and with it token spend.
Replacing MCP with GitHub CLI: For specific tasks such as fetching pull request diffs, GitHub replaced verbose MCP calls with more efficient `gh` CLI commands. This was done either by pre-downloading the data before the agent runs or by using a transparent HTTP proxy to invoke `gh` securely at runtime, so that authentication tokens are never exposed to the LLM agent itself.
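Both techniques can be sketched in Python. `prune_tools` keeps only the tool schemas whose names appeared in past runs' logs, and `schema_bytes` gives a rough, tokenizer-independent measure of what each request carries; the tool dictionaries follow the `name` / `description` / `inputSchema` shape of an MCP tool listing. `prefetch_pr_diff` illustrates the pre-download option by calling the real `gh pr diff` command before the agent starts, assuming it runs inside a checked-out repository where `gh` is already authenticated (for example via `GH_TOKEN`). All three function names are hypothetical, and the transparent-proxy variant is not shown.

```python
import json
import subprocess
from pathlib import Path


def prune_tools(tools: list[dict], used_names: set[str]) -> list[dict]:
    """Keep only tool schemas whose name appeared in past runs' tool calls."""
    return [tool for tool in tools if tool["name"] in used_names]


def schema_bytes(tools: list[dict]) -> int:
    """Compact serialized size of the schemas, a rough proxy for per-call context."""
    return len(json.dumps(tools, separators=(",", ":")).encode("utf-8"))


def prefetch_pr_diff(pr_number: int, dest: Path) -> Path:
    """Download a PR diff with `gh pr diff` before the agent starts.

    The agent then reads a local file instead of calling an MCP tool, and the
    token `gh` authenticates with never enters the agent's context.
    Raises subprocess.CalledProcessError if `gh` exits with an error.
    """
    result = subprocess.run(
        ["gh", "pr", "diff", str(pr_number)],
        check=True,
        capture_output=True,
        text=True,
    )
    dest.write_text(result.stdout, encoding="utf-8")
    return dest
```

Pruning is best driven by observed usage over a reasonably long window of runs; otherwise a tool that is rarely but legitimately needed can silently disappear from the agent's toolset.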

The results were substantial, with reductions of up to 62% in Effective Tokens across various production workflows, demonstrating the effectiveness of their systematic approach to cost management in AI-driven systems.



Architecture Design

Design this yourself

Design a continuous optimization platform for managing costs in an enterprise environment that heavily utilizes LLM agents across various automated workflows. The platform should include a centralized API gateway for LLM interactions, comprehensive token usage tracking, an 'Effective Tokens' metric for normalized cost comparison, automated auditing to detect anomalies, and an optimization agent capable of proposing and initiating fixes for inefficient workflows.


Other design angles

· Design a cost governance system for multi-tenant SaaS applications utilizing LLMs, focusing on per-tenant usage tracking, quota enforcement, and automated budget alerts.
· Design an API proxy service specifically optimized for LLM interactions, including features like prompt caching, dynamic tool schema management, and real-time cost estimation.
· Architect a CI/CD pipeline integrated with an LLM agent optimization framework, where workflow changes are automatically analyzed for potential token cost increases before deployment.