This article highlights a critical gap in the specification-driven development (SDD) approach for Large Language Models (LLMs): the lack of automated testing for specifications. It emphasizes that while defining desired behavior and constraints for LLMs is good practice, these specifications must be encoded into executable tests to effectively enforce the contract and prevent drift, rather than relying solely on documentation.
The adoption of Large Language Models (LLMs) has led to a surge in interest in specification-driven development (SDD). The common advice is to write detailed specifications outlining desired behavior, constraints, and guardrails for LLM agents. This approach aims to provide clarity and direction, much like traditional software specifications.
The Specification Trap
Many developers treat the specification document as the primary safety net, but it's merely a blueprint. Without automated tests, there is no reliable mechanism to detect when an LLM's behavior drifts from its intended contract, so regressions surface only after they reach production.
The crucial next step, often overlooked, is to translate these specifications into automated tests that actively enforce the contract. Just as in traditional software development, a specification document provides the *what*, but a robust test suite provides the *proof* that the *what* is being met. This is particularly vital for LLMs, where outputs can be non-deterministic and prone to 'drift' over time or with new prompts.
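As a minimal sketch of what "translating a specification into a test" might look like, consider a hypothetical contract that says the agent must return valid JSON containing a `summary` field of at most 200 characters and must never emit raw URLs. The field name and limits here are illustrative assumptions, not from the original article:

```python
import json

def check_spec(output: str) -> list[str]:
    """Return a list of contract violations (empty means the spec holds).

    Hypothetical spec: output must be valid JSON with a "summary" field
    under 200 characters, and must not contain raw URLs.
    """
    try:
        data = json.loads(output)
    except json.JSONDecodeError:
        return ["output is not valid JSON"]

    violations = []
    if "summary" not in data:
        violations.append("missing required field: summary")
    elif len(data["summary"]) > 200:
        violations.append("summary exceeds 200 characters")
    if "http://" in output or "https://" in output:
        violations.append("raw URL present in output")
    return violations

# One conforming and one non-conforming output:
assert check_spec('{"summary": "Short and clean."}') == []
assert check_spec("not json") == ["output is not valid JSON"]
```

The key design choice is that the checker returns structured violations rather than a bare boolean, so a failing run reports *which* clause of the contract was broken.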
Implementing a test suite for LLM specifications is analogous to establishing a continuous integration process for code. It provides an immediate feedback loop when changes to prompts, models, or underlying data cause a violation of the specified behavior.
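One way such a feedback loop might be wired up, sketched under the assumption of a stubbed `run_agent` function standing in for the real model call, is a fixture-driven regression suite that a CI job runs on every prompt or model change. The fixtures and invariants below are invented for illustration:

```python
# Each fixture pairs a prompt with an invariant the agent's output must satisfy.
# These prompts and checks are illustrative, not from any real specification.
FIXTURES = [
    ("Summarize the release notes.", lambda out: out.strip() != ""),
    ("List three risks.", lambda out: out.count("\n") >= 2),
]

def run_agent(prompt: str) -> str:
    # Stub: in CI this would call the deployed prompt/model combination.
    if "risks" in prompt:
        return "1. drift\n2. cost\n3. latency"
    return "Release adds spec tests."

def run_suite() -> list[str]:
    """Return the prompts whose outputs violated their invariant."""
    return [
        prompt
        for prompt, invariant in FIXTURES
        if not invariant(run_agent(prompt))
    ]

# A non-empty result would fail the CI job and flag the drift immediately.
assert run_suite() == []
```

Because the suite reruns the same fixtures against every change, a prompt tweak or model upgrade that breaks an invariant is caught at review time rather than in production.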