Software Architecture and System Design News

Latest curated articles from top engineering blogs

Netflix, Uber, Meta, LinkedIn, Spotify, GitHub, Airbnb, Pinterest, Slack, Dropbox, Cloudflare, Stripe, Datadog, Figma, Shopify, AWS, Google Cloud, Azure, Werner Vogels & 15+ more

541 articles

Dev.to #architecture·10d ago

Implementing a Circuit Breaker for AI Tool Calls to Prevent Cascading Failures

This article details the design and implementation of a circuit breaker for MCP (Model Context Protocol) tool calls, built to prevent cascading failures in AI agent workflows. It shows how the circuit breaker pattern, a core distributed systems concept, can isolate flaky external tool calls so that one failing dependency does not take down the whole workflow. The post covers the state machine, failure handling, and configuration needed for robust operation at scale.

Distributed Systems · Performance & Scaling
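
The state machine that summary refers to can be sketched as follows. This is a minimal illustration of the general circuit breaker pattern, not the article's implementation: the class names, the `failure_threshold` and `reset_timeout` parameters, and the injectable `clock` are all assumptions made for the example.

```python
import time
from enum import Enum


class State(Enum):
    CLOSED = "closed"        # calls pass through to the tool
    OPEN = "open"            # calls are rejected without reaching the tool
    HALF_OPEN = "half_open"  # a trial call is allowed through


class CircuitOpenError(RuntimeError):
    """Raised when a call is rejected because the circuit is open."""


class CircuitBreaker:
    # Hypothetical parameters: threshold and timeout values are illustrative.
    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.state = State.CLOSED
        self.failures = 0
        self.opened_at = 0.0

    def call(self, tool, *args, **kwargs):
        if self.state is State.OPEN:
            # Fail fast until the cool-down has elapsed, then allow a trial call.
            if self.clock() - self.opened_at < self.reset_timeout:
                raise CircuitOpenError("tool temporarily disabled")
            self.state = State.HALF_OPEN
        try:
            result = tool(*args, **kwargs)
        except Exception:
            self._on_failure()
            raise
        self._on_success()
        return result

    def _on_failure(self):
        self.failures += 1
        # A failed trial call, or too many consecutive failures, opens the circuit.
        if self.state is State.HALF_OPEN or self.failures >= self.failure_threshold:
            self.state = State.OPEN
            self.opened_at = self.clock()

    def _on_success(self):
        self.failures = 0
        self.state = State.CLOSED
```

Wrapping each external tool in its own breaker instance keeps one flaky tool from blocking the others; a shared breaker would trade that isolation for simpler configuration.
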
Medium #system-design·10d ago

Architectural Strategy for Migrating Legacy Database-Centric Systems with Event Sourcing

This article outlines an architectural strategy for migrating legacy database-centric systems using events and progressive ownership transfer. It focuses on how to incrementally modernize monolithic applications by extracting functionalities and data, leveraging event-driven patterns to decouple services and manage data consistency during the transition.

Microservices · Distributed Systems
Dev.to #architecture·10d ago

Designing RAG Pipelines: Ingestion and Query Shifts

This article provides a detailed breakdown of the two distinct operational shifts in a Retrieval Augmented Generation (RAG) pipeline: ingestion (offline) and query time (live). It emphasizes the architectural decisions and potential failure points within each shift, focusing on critical steps like document parsing, chunking, embedding, and retrieval to ensure accurate and contextually relevant AI responses. Understanding these shifts is crucial for building robust and debuggable RAG systems.

AI & ML Infrastructure · Distributed Systems
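
To make the two shifts concrete, here is a toy sketch that separates the offline ingestion path (chunk, embed, index) from the live query path (embed, retrieve). It is an illustration only: the word-count `embed` function stands in for a real embedding model, and `chunk`, `ingest`, and `query` are hypothetical names, not taken from the article.

```python
import math
from collections import Counter


def chunk(text, size=8):
    # Fixed-size word windows; real pipelines often split on document structure.
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def embed(text):
    # Toy stand-in for an embedding model: lowercase word counts.
    return Counter(text.lower().split())


def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def ingest(documents, size=8):
    # Ingestion shift (offline): parse, chunk, embed, and store every document.
    index = []
    for doc_id, text in documents.items():
        for piece in chunk(text, size):
            index.append((doc_id, piece, embed(piece)))
    return index


def query(index, question, k=1):
    # Query shift (live): embed the question and return the k closest chunks.
    q = embed(question)
    ranked = sorted(index, key=lambda row: cosine(q, row[2]), reverse=True)
    return [(doc_id, piece) for doc_id, piece, _ in ranked[:k]]
```

Keeping the two paths as separate functions means a bad answer can be traced to either a chunking or embedding problem at ingestion time or a retrieval problem at query time.
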
Medium #system-design·10d ago

System Design Glossary: Essential Concepts for Architects

This article is a glossary of fundamental terms and concepts that come up frequently in system design and software architecture. It provides a shared reference for distributed systems, architectural patterns, and scalability considerations, so that terms are used consistently across system design discussions and analyses.

Distributed Systems · Performance & Scaling
Dev.to #architecture·11d ago

Building a Real-time API Proxy for Cross-Platform LLM Tooling Interoperability

This article details the architecture and implementation of a local proxy designed to enable interoperability between Cursor IDE and GitHub Copilot. It explores the challenges of bypassing proprietary routing and transforming API request schemas in real-time to bridge two different AI model ecosystems. The solution highlights practical techniques for HTTP interception, payload manipulation, and AST cleansing within a proxy architecture.

API Design · Distributed Systems
InfoQ Cloud·11d ago

Replacing Database Sequences at Scale: A Distributed ID Generation System

This article details Coupang's journey to replace legacy database sequences with a highly available, low-latency distributed ID generation system without breaking over 100 existing services. The solution leverages local application caching, server-side caching, and DynamoDB as the source of truth, optimizing for performance and availability over strict global ordering and gap-free IDs. It highlights practical design principles for large-scale migrations, emphasizing simplicity and backward compatibility.

Distributed Systems · Databases & Storage
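
One common way to combine a local cache with a central source of truth is block allocation: each application instance reserves a range of IDs in one round trip and then hands them out from memory. The sketch below assumes that technique; the summary does not confirm it is Coupang's exact design. `InMemoryCounterStore` is a hypothetical stand-in for the durable store (DynamoDB in the article), and all names and the block size are illustrative.

```python
import threading


class InMemoryCounterStore:
    """Hypothetical stand-in for the durable source of truth."""

    def __init__(self):
        self._next = {}
        self._lock = threading.Lock()

    def reserve(self, name, block_size):
        # Atomically hand out the half-open range [start, start + block_size).
        with self._lock:
            start = self._next.get(name, 1)
            self._next[name] = start + block_size
            return start


class BlockIdGenerator:
    """Serves IDs from a locally cached block; refills from the store when empty."""

    def __init__(self, store, name, block_size=100):
        self._store = store
        self._name = name
        self._block_size = block_size
        self._next = 0
        self._end = 0  # exclusive bound; an empty block forces a reserve on first use
        self._lock = threading.Lock()

    def next_id(self):
        with self._lock:
            if self._next >= self._end:
                # One call to the store per block, not per ID.
                self._next = self._store.reserve(self._name, self._block_size)
                self._end = self._next + self._block_size
            value = self._next
            self._next += 1
            return value
```

With two generators sharing a store, IDs stay unique but are neither globally ordered nor gap-free: an instance that restarts simply abandons the rest of its block. That matches the trade-off the summary describes, giving up strict ordering for availability and low latency.
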
Dev.to #systemdesign·11d ago

Architectural Deep Dive into Claude Code's LLM Agent Loop

This article dissects the core `while(true)` loop powering Claude Code's AI coding agent, revealing its state machine architecture for managing complex interactions with large language models and tools. It highlights critical design decisions like avoiding recursion for stack overflow prevention and implementing streaming tool execution for significant performance gains, showcasing a robust approach to building interactive AI agents.

AI & ML Infrastructure · Distributed Systems
Medium #system-design·11d ago

Ensuring Data Integrity in Observability Platforms

This article discusses common pitfalls in observability platforms that lead to inaccurate data and offers practical strategies to ensure the integrity and reliability of monitoring and logging systems. It emphasizes the importance of understanding data lifecycles, proper instrumentation, and architectural considerations to prevent 'lying' platforms.

DevOps & SRE · Distributed Systems
InfoQ Architecture·11d ago

Replacing Database Sequences at Scale: A Cached, Distributed ID Generation System

This article details Coupang's journey to replace traditional database sequences with a highly scalable, available, and low-latency distributed ID generation system. It highlights critical design decisions, such as prioritizing eventual consistency and local caching over strict global ordering and network calls, to support over 100 services and facilitate a seamless migration from relational databases to NoSQL.

Distributed Systems · Databases & Storage
Dev.to #systemdesign·11d ago

Scaling Challenges with Misused Vector Databases

This article highlights a common architectural pitfall: a system that broke during scaling not because of a performance bottleneck as such, but because of the wrong database choice. The author had used a vector database for both similarity search and general data storage, which led to poor performance and scalability issues. The fix was a hybrid architecture that uses a vector database for what it does well (semantic search) and a traditional database for what it does well (exact-match queries and structured data storage).

Databases & Storage · Distributed Systems
The New Stack·11d ago

Hybrid Search Architectures for Production RAG Pipelines

This article discusses an architectural problem in RAG (Retrieval-Augmented Generation) pipelines where pure vector similarity falls short in production environments, leading to issues like stale information, security leaks, and incorrect answers. It proposes "hybrid search," which combines vector similarity with structured SQL predicates within a single database query, as a solution. The article highlights how this approach improves retrieval accuracy, enhances security through relational joins, and simplifies operational complexity compared to a "vector sidecar" anti-pattern.

Databases & Storage · AI & ML Infrastructure
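
The idea of combining similarity ranking with SQL predicates in a single query can be illustrated with SQLite and a user-defined function. This is a sketch under stated assumptions, not the article's setup: the `chunks` table, its columns, the sample rows, and the `cosine` function are invented for the example, and a production system would use a database with native vector indexing rather than a Python callback.

```python
import json
import math
import sqlite3


def cosine(a_json, b_json):
    # Embeddings are stored as JSON text purely to keep the example dependency-free.
    a, b = json.loads(a_json), json.loads(b_json)
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


conn = sqlite3.connect(":memory:")
conn.create_function("cosine", 2, cosine)
conn.execute(
    "CREATE TABLE chunks (id INTEGER PRIMARY KEY, tenant TEXT, "
    "updated_at TEXT, body TEXT, embedding TEXT)"
)
rows = [
    (1, "acme", "2024-05-01", "current refund policy", [0.9, 0.1]),
    (2, "acme", "2021-01-01", "old refund policy", [0.95, 0.05]),
    (3, "globex", "2024-06-01", "globex refund policy", [0.92, 0.08]),
]
conn.executemany(
    "INSERT INTO chunks VALUES (?, ?, ?, ?, ?)",
    [(i, t, d, b, json.dumps(e)) for i, t, d, b, e in rows],
)


def hybrid_search(conn, query_embedding, tenant, not_before, k=3):
    # Structured predicates filter first; similarity only ranks what survives.
    sql = """
        SELECT id, body FROM chunks
        WHERE tenant = ? AND updated_at >= ?
        ORDER BY cosine(embedding, ?) DESC
        LIMIT ?
    """
    params = (tenant, not_before, json.dumps(query_embedding), k)
    return conn.execute(sql, params).fetchall()
```

In this sample data the stale row (id 2) is the closest vector match to the query `[1.0, 0.0]`, and row 3 belongs to another tenant. The `WHERE` clause excludes both before ranking, which is the staleness and leakage protection the summary attributes to hybrid search.
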
InfoQ Architecture·11d ago

Communicating Architecture and Decentralized Decision-Making in System Design

This panel discussion from InfoQ explores critical aspects of modern software architecture, focusing on effective communication strategies for architectural concerns to diverse stakeholders and the benefits of decentralized decision-making through Architecture Decision Records (ADRs). Experts share insights on bridging technical and business perspectives to foster a holistic system understanding and improve collaboration.

Distributed Systems · DevOps & SRE