This article explores the evolving role of Large Language Models (LLMs) and agentic programming in software development, focusing on their architectural implications for areas like legacy system modernization, data transformation, and multi-jurisdictional system design. It discusses the potential to simplify complex processes and challenges developers to rethink traditional architectural patterns in light of new AI capabilities, while also highlighting the importance of human learning and predictable system behavior.
The advent of LLMs introduces a significant shift in how legacy system modernization is approached. Traditionally, "Lift and Shift" (porting a system to a new platform with feature parity) was often dismissed as a missed opportunity, with practitioners advocating instead for re-evaluating user needs and feature sets. However, LLMs can drastically reduce the cost of porting existing code to a new platform (e.g., behaviorally cloning a GnuCOBOL compiler to Rust in days). This makes lift-and-shift a more viable, and often recommended, first step in modernization: it provides a better environment for subsequent, more strategic refactoring and evolution, rather than serving as the final solution.
Designing systems for multiple jurisdictions, each with unique regulatory controls, often leads to significant software complexity in deciding and applying the correct rules. Agentic programming presents a potential paradigm shift: instead of building one complex system to handle all variations, developers might create individual, simpler systems for each jurisdiction. LLMs could then be leveraged to ensure consistency between these systems as product rules change, reducing the inherent complexity of identifying commonalities and differences across contexts, a fundamental challenge in software design.
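A minimal sketch of the per-jurisdiction approach, using hypothetical jurisdictions and illustrative tax rates (the class names, rates, and `tax_for` dispatcher are all assumptions, not from the article): each jurisdiction gets its own small, self-contained rule module instead of one engine full of cross-jurisdiction branches.

```python
from dataclasses import dataclass

@dataclass
class Order:
    amount: float
    product: str

# One simple system per jurisdiction; no shared conditional logic.
class UkRules:
    def tax(self, order: Order) -> float:
        return order.amount * 0.20  # illustrative flat rate

class DeRules:
    def tax(self, order: Order) -> float:
        # Illustrative reduced rate for a product category.
        rate = 0.07 if order.product == "book" else 0.19
        return order.amount * rate

RULES = {"UK": UkRules(), "DE": DeRules()}

def tax_for(jurisdiction: str, order: Order) -> float:
    # Dispatch to the simple per-jurisdiction system. An LLM-assisted
    # process (not shown) would keep these modules consistent as
    # product rules change, instead of humans maintaining one tangle.
    return RULES[jurisdiction].tax(order)
```

The trade-off is duplication across modules, which is exactly the cost the article suggests LLMs could absorb by propagating rule changes consistently.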
A crucial architectural consideration with LLMs is whether to use them as autonomous agents or predictable functions. While agents offer autonomy, they introduce unpredictability, making debugging challenging. For workflows with known sequences of steps where an LLM is involved, using LLMs as functions is often superior. This approach provides predictable composition, faster execution (fewer tokens), and easier failure handling, as the scope of interaction is smaller and more controllable. This suggests that architectural patterns should favor explicit orchestration over relying on agentic autonomy for critical paths.
LLMs: Functions over Agents
When integrating LLMs into a system, consider treating them as well-scoped functions for specific tasks within a defined workflow, rather than relying on their autonomous agent capabilities, especially for critical or complex operations. This improves predictability, testability, and debuggability.
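The pattern above can be sketched as follows. This is a hedged illustration, not the article's code: `call_llm` is a hypothetical stand-in for whatever model client your stack provides, and the injectable `call` parameter is one possible way to keep each step testable.

```python
from typing import Callable

def call_llm(prompt: str) -> str:
    # Stand-in for a real model client; replace with your own.
    raise NotImplementedError

def summarize(text: str, call: Callable[[str], str] = call_llm) -> str:
    """One workflow step: small prompt, small scope, local failure handling."""
    out = call(f"Summarize in one sentence:\n{text}")
    if not out.strip():  # failure is detected where it occurs
        raise ValueError("empty model response")
    return out.strip()

def pipeline(text: str, call: Callable[[str], str] = call_llm) -> str:
    # Explicit orchestration: the sequence of steps is fixed in code,
    # so composition stays predictable and debugging stays tractable.
    summary = summarize(text, call)
    return summary.upper()  # a downstream deterministic step
```

Because each step is an ordinary function, a unit test can substitute a fake `call` and exercise the orchestration without any model in the loop.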
The article emphasizes that focusing on architectural cleanliness and robust "harness engineering" is more effective than accumulating LLM "skills." A clean codebase with clear patterns and a well-defined project configuration minimizes the need for extensive LLM configuration. If an LLM struggles (e.g., at writing good tests), the problem often lies in the underlying code's inconsistency or complicated setup, not in a limitation of the LLM itself. Improving the system's foundational architecture (e.g., cleaning up test files) lets the LLM perform better without explicit "skills," highlighting that architecture should precede configuration when working with AI tools.
The rise of agentic programming forces a renewed focus on non-determinism, a challenge long faced by distributed systems. Concepts like Chaos Engineering, famously used by Netflix to test system resiliency, raise questions about their AI counterparts. A "Chaos Monkey for AI" could intentionally introduce hallucinations into a pipeline to test the robustness of detection and recovery mechanisms. This parallel underscores the need for robust monitoring, validation, and error handling strategies in AI-infused architectures, akin to those developed for distributed systems to manage inherent unpredictability.
Testing AI Resiliency
Consider implementing 'AI Chaos Engineering' by deliberately introducing model failures or subtle data corruptions into your AI pipelines to validate the system's ability to detect, mitigate, and recover from such non-deterministic behaviors.
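A minimal sketch of this idea, under stated assumptions: `chaos_wrap`, `answer_with_recovery`, and the `validate` callback are all hypothetical names invented for illustration, and "hallucination" is simulated by substituting a fabricated string at a configurable rate.

```python
import random

def chaos_wrap(model, failure_rate=0.1, seed=None):
    """Wrap a model callable so it occasionally emits a fabricated answer,
    letting you verify that downstream detection and recovery actually fire."""
    rng = random.Random(seed)
    def wrapped(prompt: str) -> str:
        out = model(prompt)
        if rng.random() < failure_rate:
            return "PLAUSIBLE BUT FABRICATED ANSWER"  # injected hallucination
        return out
    return wrapped

def answer_with_recovery(prompt: str, model, validate) -> str:
    # The pipeline under test: validate output, fall back on failure.
    out = model(prompt)
    if not validate(out):
        return "ESCALATED_TO_HUMAN"  # recovery path
    return out
```

Running the pipeline with `failure_rate=1.0` should always take the recovery path; with `failure_rate=0.0` it should never do so, mirroring how Chaos Engineering experiments bound the blast radius before dialing failures up.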