
Software Architecture and System Design News

Latest curated articles from top engineering blogs

Netflix, Uber, Meta, LinkedIn, Spotify, GitHub, Airbnb, Pinterest, Slack, Dropbox, Cloudflare, Stripe, Datadog, Figma, Shopify, AWS, Google Cloud, Azure, Werner Vogels & 15+ more

178 articles

InfoQ Architecture·4h ago

AWS Blocks: AI Agent-Driven Backend Development with Infrastructure as Code

AWS Blocks is an open-source TypeScript framework enabling AI agents to construct backends by bundling application code, local development, and AWS infrastructure. It emphasizes a "local-first" development model and uses built-in steering files to guide AI agents toward correct architectural patterns, abstracting away the complexities of infrastructure tools like AWS CDK while retaining an escape hatch for custom configurations.

Cloud & Infrastructure · AI & ML Infrastructure
Cloudflare Blog·16h ago

Debugging a Race Condition in Cloudflare's Image Service and Hyper HTTP Library

This article details a complex debugging effort at Cloudflare to resolve a subtle race condition within the hyper HTTP library, affecting their Rust-based Images service. The bug caused intermittent truncation of large image responses, despite 200 OK statuses, due to a premature socket shutdown. The incident highlights the challenges of debugging timing-sensitive issues in distributed systems and the importance of deep system observability.
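A 200 OK status alone cannot reveal this kind of truncation; a client has to compare the bytes it received against the declared length. The following is a minimal sketch of such a check, not Cloudflare's fix or hyper's behavior. The function name `looks_truncated` and the dictionary-based header representation are assumptions made for illustration.

```python
def looks_truncated(headers, body):
    """Return True when the body is shorter than the declared Content-Length.

    `headers` is a plain dict and `body` is the bytes actually received;
    both shapes are assumptions for this sketch.
    """
    # Header names are case-insensitive, so normalize before the lookup.
    normalized = {name.lower(): value for name, value in headers.items()}
    declared = normalized.get("content-length")
    if declared is None:
        # Without a declared length (e.g. chunked transfer) this check
        # cannot decide either way.
        return False
    return len(body) < int(declared)
```

A check like this only detects the symptom on the receiving side; it says nothing about where in the stack the socket was closed early.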

Distributed Systems · Performance & Scaling
Meta Engineering·16h ago

Meta's AV1 Adoption for Real-Time Communication: System Design Challenges and Solutions

Meta's adoption of AV1 for real-time communication (RTC) across Messenger and WhatsApp highlights critical system design considerations for integrating new, computationally intensive codecs at scale. The article details challenges in balancing video quality, low latency, power efficiency, and binary size, especially for a diverse range of mobile devices. It showcases architectural solutions including custom low-complexity encoders, optimized decoder selection, and ML-based device eligibility frameworks to ensure broad and reliable AV1 deployment.

Distributed Systems · Performance & Scaling
Dev.to #architecture·16h ago

.NET Feature Module Engine for Configurable Application Behavior

This article introduces PowerCSharp Features, a .NET engine designed to manage application features as self-contained modules, enabling dynamic configuration and conditional behavior based on feature flags. It addresses the common pain point of monolithic library dependencies by allowing consumers to selectively enable or disable features at runtime across different environments, promoting a more flexible and modular application architecture.
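The idea of features as self-contained modules gated by flags can be sketched in a few lines. This is a Python stand-in for illustration only, not the PowerCSharp Features API; the class `FeatureEngine` and its methods are hypothetical names.

```python
class FeatureEngine:
    """Hypothetical registry that runs a feature module only when its flag is on."""

    def __init__(self, flags):
        # Flags would typically come from per-environment configuration.
        self._flags = dict(flags)
        self._modules = {}

    def register(self, name, handler):
        """Attach a callable that implements the named feature."""
        self._modules[name] = handler

    def set_flag(self, name, enabled):
        """Flip a feature at runtime without touching the registered module."""
        self._flags[name] = enabled

    def is_enabled(self, name):
        # Unknown features default to off.
        return self._flags.get(name, False)

    def run(self, name, *args, default=None):
        """Invoke the feature if registered and enabled, else return `default`."""
        if name in self._modules and self.is_enabled(name):
            return self._modules[name](*args)
        return default
```

A consumer could register every module once and let each environment's flag set decide which ones actually execute.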

Microservices · Tools & Frameworks
InfoQ Architecture·3d ago

Block's Monorepo Migration: Tackling Dependency Drift in JVM Ecosystems

Block, Inc. migrated approximately 450 JVM repositories into a monorepo for Cash App and Square to address significant dependency management and coordination challenges inherent in a polyrepo architecture. This shift aimed to simplify cross-service development, improve dependency visibility, and reduce operational friction, ultimately enhancing developer experience and CI/CD efficiency for their distributed systems. The article details the motivations, implementation strategies, and resulting benefits and challenges of this large-scale architectural change.

DevOps & SRE · Microservices
Cloudflare Blog·3d ago

Designing Frictionless Deployment Workflows for AI Agents

This article introduces Cloudflare's Temporary Accounts feature, enabling AI agents to deploy web applications and APIs without manual sign-up or authentication. It highlights the architectural considerations for creating frictionless, programmatic access to cloud resources, addressing challenges like human-centric OAuth flows and the need for rapid iteration in agentic development. The system facilitates a "write "

Cloud & Infrastructure · API Design
The New Stack·4d ago

Reimagining Code Hosting for the AI Agent Era: Architectural Shifts

This article explores the architectural challenges faced by existing code hosting platforms like GitHub due to the explosion of AI-generated code and the emergence of AI agents. It highlights new projects like Origin, Project Switch (GitLab), and DeltaDB (Zed) that are rethinking the underlying infrastructure of version control to handle high-velocity, agent-driven workflows, focusing on distributed systems, data models, and performance at scale.

Distributed Systems · Tools & Frameworks
ByteByteGo·4d ago

Observability Fundamentals: Logs, Metrics, and Traces in System Design

This article introduces the foundational concepts of observability: logs, metrics, and traces. It explains how these three telemetry types provide different perspectives on events generated by a running service, enabling engineers to understand system behavior, diagnose issues, and make informed architectural decisions. Understanding these primitives is crucial for designing resilient and maintainable distributed systems.
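To make the three telemetry types concrete, here is a minimal sketch that emits a log line, a counter metric, and a two-span trace for one request, using only the Python standard library. The names `handle_request`, `span`, `request_count`, and `spans` are hypothetical; a real service would more likely use a telemetry SDK than hand-rolled structures.

```python
import logging
import time
import uuid
from collections import Counter
from contextlib import contextmanager

log = logging.getLogger("checkout")  # logs: discrete, timestamped events
request_count = Counter()            # metrics: aggregated numeric values
spans = []                           # traces: timed spans linked by ids


@contextmanager
def span(name, trace_id, parent_id=None):
    """Record how long a unit of work took and where it sits in the trace."""
    span_id = uuid.uuid4().hex[:8]
    start = time.perf_counter()
    try:
        yield span_id
    finally:
        spans.append({
            "trace_id": trace_id,
            "span_id": span_id,
            "parent_id": parent_id,
            "name": name,
            "duration_s": time.perf_counter() - start,
        })


def handle_request(order_id):
    """Handle one request and emit all three telemetry types for it."""
    trace_id = uuid.uuid4().hex
    with span("handle_request", trace_id) as root_id:
        log.info("order received order_id=%s trace_id=%s", order_id, trace_id)
        with span("charge_card", trace_id, parent_id=root_id):
            pass  # placeholder for the downstream call
        request_count["checkout.requests"] += 1
    return trace_id
```

Including the trace id in the log line is what lets an engineer jump from a single log event to the full request path.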

DevOps & SRE · Distributed Systems
DZone Microservices·5d ago

Externalizing Business Logic with Expression Languages for Dynamic Systems

This article explores an architectural approach to externalize business rules from application code into a database using expression languages like MVEL. This strategy enhances system flexibility by enabling dynamic updates to logic (e.g., discount rates, loyalty points) without requiring code deployments. It highlights the benefits of separating data from code, focusing on reducing maintenance burdens and accelerating responsiveness to business changes.
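The pattern can be sketched as rule text stored as data and evaluated at request time. The example below uses Python's `ast` module as a stand-in for an expression language such as MVEL, and a dict as a stand-in for the rules table; the names `RULES` and `evaluate`, and the rule expressions themselves, are assumptions for illustration.

```python
import ast
import operator

_BIN_OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
            ast.Mult: operator.mul, ast.Div: operator.truediv}
_CMP_OPS = {ast.Gt: operator.gt, ast.GtE: operator.ge,
            ast.Lt: operator.lt, ast.LtE: operator.le, ast.Eq: operator.eq}


def evaluate(expr, variables):
    """Evaluate a small arithmetic/conditional expression against `variables`.

    Only numbers, names, + - * /, single comparisons, and `a if c else b`
    are allowed; anything else (calls, attributes) is rejected.
    """
    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.Name):
            return variables[node.id]
        if isinstance(node, ast.BinOp) and type(node.op) in _BIN_OPS:
            return _BIN_OPS[type(node.op)](walk(node.left), walk(node.right))
        if (isinstance(node, ast.Compare) and len(node.ops) == 1
                and type(node.ops[0]) in _CMP_OPS):
            return _CMP_OPS[type(node.ops[0])](
                walk(node.left), walk(node.comparators[0]))
        if isinstance(node, ast.IfExp):
            return walk(node.body) if walk(node.test) else walk(node.orelse)
        raise ValueError("unsupported expression element: " + type(node).__name__)

    return walk(ast.parse(expr, mode="eval"))


# Stand-in for rows in a rules table; editing a row changes behavior
# without a code deployment.
RULES = {
    "discount_rate": "0.10 if order_total >= 100 else 0.0",
    "loyalty_points": "order_total * 2",
}
```

Restricting the evaluator to a small whitelist of node types is a deliberate choice here: rules loaded from a database are untrusted input, so the sketch refuses function calls and attribute access outright.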

Microservices · Performance & Scaling
Hacker News·5d ago

Lore: A Scalable, Content-Addressed Version Control System

Lore is an open-source, centralized version control system designed by Epic Games for extreme scalability, particularly for projects combining code with large binary assets. Its architecture leverages content-addressed storage using Merkle trees and an immutable revision chain for data integrity and efficient deduplication. Key design choices focus on optimizing for large teams and high-throughput scenarios, enabling on-demand data hydration and lightweight branching.
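The three ideas named here (content addressing, a Merkle tree, and a revision chain) can be illustrated with a toy model. This is a sketch under assumptions, not Lore's actual storage format or hashing scheme; `ContentStore`, `merkle_root`, and `make_revision` are hypothetical names.

```python
import hashlib
from typing import List, Optional


class ContentStore:
    """Toy blob store whose keys are the SHA-256 digests of the stored bytes."""

    def __init__(self):
        self.blobs = {}

    def put(self, data: bytes) -> str:
        digest = hashlib.sha256(data).hexdigest()
        # Identical content maps to the same key, so it is kept only once.
        self.blobs.setdefault(digest, data)
        return digest

    def get(self, digest: str) -> bytes:
        return self.blobs[digest]


def merkle_root(leaf_hashes: List[str]) -> str:
    """Fold hex leaf hashes pairwise into one root hash."""
    if not leaf_hashes:
        return hashlib.sha256(b"").hexdigest()
    level = list(leaf_hashes)
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # duplicate the last node on odd levels
        level = [
            hashlib.sha256((level[i] + level[i + 1]).encode()).hexdigest()
            for i in range(0, len(level), 2)
        ]
    return level[0]


def make_revision(parent: Optional[str], root: str) -> str:
    """Derive a revision id that commits to both its parent and its tree root."""
    return hashlib.sha256(((parent or "") + ":" + root).encode()).hexdigest()
```

Because each revision id depends on its parent's id, altering any earlier revision or blob would change every later id, which is what makes such a chain tamper-evident.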

Distributed Systems · Databases & Storage
InfoQ Architecture·10d ago

WebMCP: Standardizing AI Agent Interaction with Web UIs

WebMCP is a proposed standard that lets web developers explicitly expose JavaScript functions and HTML forms as "tools" to in-browser AI agents. It aims to make agentic web actuation more reliable, precise, and token-efficient than approaches such as DOM scraping and screenshot analysis. The specification defines both Declarative (HTML attributes) and Imperative (JavaScript API) methods for describing agent capabilities, which significantly reduces LLM token usage and improves determinism.

API Design · AI & ML Infrastructure
Slack Engineering·11d ago

Evaluating Agent-Driven E2E Testing Architectures: Trade-offs in Reliability, Speed, and Cost

This article explores the architectural considerations and trade-offs of integrating agent-driven end-to-end (E2E) testing into existing development workflows. It details an experiment comparing three execution models (Playwright MCP, Playwright CLI, and Generated Tests) on reliability, speed, and cost, and shows how context management and the execution environment affect performance and resource consumption. The findings indicate where agentic testing fits within a comprehensive testing strategy, pointing to exploratory testing as its best role given its flexibility and higher cost.

Distributed Systems · Tools & Frameworks