Software Architecture and System Design News

Latest curated articles from top engineering blogs

Netflix, Uber, Meta, LinkedIn, Spotify, GitHub, Airbnb, Pinterest, Slack, Dropbox, Cloudflare, Stripe, Datadog, Figma, Shopify, AWS, Google Cloud, Azure, Werner Vogels & 15+ more

489 articles

Cloudflare Blog·4h ago

Optimizing Bare-Metal Server Boot Times in Large-Scale Infrastructure

Boot times for Cloudflare's core units escalated from minutes to hours, an incident that shows what can go wrong when managing bare-metal infrastructure at scale. The root cause was inefficient network boot processes combined with firmware quirks, which created substantial operational overhead. This case study details their methodical approach to diagnosing and resolving the problem, with insights into automation, vendor collaboration, and UEFI intricacies for keeping a fleet efficient.

Cloud & Infrastructure · DevOps & SRE
Dev.to #systemdesign·4h ago

Offloading Image Processing to Dedicated Microservices

This article highlights common pitfalls of handling image processing directly within a main application, such as dependency bloat, performance bottlenecks, and resource contention. It advocates for an architectural pattern where image manipulation tasks are offloaded to dedicated microservices or external APIs to improve scalability, maintainability, and resource efficiency. This approach aligns with microservices principles by isolating complex, resource-intensive operations.

Microservices · Performance & Scaling
Dev.to #systemdesign·16h ago

Architecting High-Concurrency Fintech Systems for Emerging Markets

This article explores architectural principles for building resilient, high-concurrency fintech infrastructure, specifically addressing challenges in the West African market. It emphasizes event-driven microservices, idempotency, and smart payment routing to handle transient network failures, transaction spikes, and complex third-party integrations.

Distributed Systems · Performance & Scaling
Airbnb Engineering·16h ago

Geographic Prior Propagation for Robust Demand Forecasting in Dynamic Environments

This article details Airbnb's approach to robust demand forecasting during periods of unprecedented change, like the COVID-19 pandemic. Faced with unreliable historical data, they developed a system that leverages sequential geographic recovery signals and prior propagation. This allowed them to generate timely and reliable corridor-level forecasts by borrowing information from structurally similar markets that experienced changes earlier, overcoming data scarcity in newly affected regions.

AI & ML Infrastructure · Distributed Systems
The New Stack·1d ago

JetBrains Mellum2: A Specialized 12B-Parameter MoE Model for AI Agent Infrastructure

JetBrains has open-sourced Mellum2, a 12B-parameter Mixture-of-Experts (MoE) coding model optimized for infrastructure-layer AI agent tasks like routing, retrieval pipelines, and sub-agent coordination. Designed for speed and efficient inference in production environments, Mellum2 offers an alternative to proprietary models, allowing for private on-premises deployment and greater operational control, particularly relevant for enterprises building their own AI infrastructure.

AI & ML Infrastructure · Distributed Systems
Martin Fowler·1d ago

Impact of AI on Software Development Workflows and Technical Debt

This article surveys perspectives on integrating AI into software development: the difficulty of measuring AI productivity, how automation is changing jobs, and AI's impact on security and technical debt. It notes that AI can introduce 'generative debt' by perpetuating bad code while also significantly accelerating bug detection and remediation, changing development workflows and shifting the focus toward human-orchestrated agent systems.

AI & ML Infrastructure · DevOps & SRE
Dev.to #systemdesign·1d ago

Frontend System Design Interview Preparation Guide

This article provides a comprehensive guide to preparing for frontend system design interviews, emphasizing that these interviews assess a senior engineer's ability to architect complex frontend applications at scale. It outlines a structured five-step approach, covering requirements gathering, high-level architecture, data modeling, API design, and cross-cutting concerns like performance and security.

API Design · Performance & Scaling
Dev.to #systemdesign·2d ago

Architecting a High-Fidelity Financial Market Simulator: A 4-Part Series Overview

This article introduces a four-part technical series detailing the system design and architectural trade-offs involved in building VTrade, a high-fidelity paper trading simulator. It highlights the complexities of replicating real-world financial markets, emphasizing an event-driven approach to handle execution, portfolio intelligence, AI integration, and gamified distributed systems. The series promises deep dives into core execution architecture, real-time analytics pipelines, LLM integration, and scalable state-tracking backends.

Distributed Systems · Performance & Scaling
The New Stack·2d ago

Architecting AI Retrieval Systems for Scale and Performance

This article discusses the evolution of AI retrieval from simple vector search to complex, integrated systems combining keyword matching, semantic retrieval, ranking, and real-time signals. It highlights that building scalable AI retrieval is primarily a system design challenge, not just a tooling problem, emphasizing the operational overhead and architectural trade-offs of fragmented retrieval pipelines. The report advocates for platform convergence to improve latency, data freshness, and experimentation while acknowledging the complexities of migration.

AI & ML Infrastructure · Distributed Systems
Dev.to #architecture·2d ago

Choosing Between Strong and Eventual Consistency in Distributed Systems

This article explores the fundamental differences between strong and eventual consistency, providing practical insights into when to choose each for distributed systems. It highlights the trade-offs in terms of data accuracy, performance, and architectural complexity, drawing from real-world project experiences in banking and manufacturing ERP systems.

Distributed Systems · Databases & Storage
InfoQ Architecture·2d ago

Building Highly Customizable Software with Theme Systems at Scale: Shopify's Liquid Case Study

This article explores the architectural challenges and solutions for building highly customizable software systems that must also perform under massive traffic, using Shopify's Liquid theme system as a case study. It delves into the design of a secure domain-specific language (DSL) for templating, mechanisms for integrating native code extensions, and the developer tooling necessary to support such a platform. Key insights include balancing flexibility for non-technical users with strict security and performance requirements for third-party code.

Performance & Scaling · API Design
Dev.to #architecture·2d ago

Scaling a Staff Management System: From Monolith to Microservices with Eventual Consistency

This article details a system re-architecture to scale a staff management system named Veltrix, highlighting the importance of correct service boundaries and consistency models. It describes the shift from a monolithic architecture with strong consistency to a microservices-based approach leveraging eventual consistency, Apache Kafka, and Apache Cassandra to achieve significant performance improvements and resilience.

Microservices · Performance & Scaling