Software Architecture and System Design News

Latest curated articles from top engineering blogs

Netflix, Uber, Meta, LinkedIn, Spotify, GitHub, Airbnb, Pinterest, Slack, Dropbox, Cloudflare, Stripe, Datadog, Figma, Shopify, AWS, Google Cloud, Azure, Werner Vogels & 15+ more

42 articles

📐ByteByteGo·9h ago

X (Twitter) 'For You' Feed Recommendation System Architecture

This article dissects the architecture of the 'For You' feed recommendation system at X (formerly Twitter), highlighting how it uses a Grok-based transformer model to personalize content. It details the system's four core components: Home Mixer for orchestration, Thunder for real-time in-network post storage, Phoenix for ML-driven retrieval and ranking of out-of-network content, and the Candidate Pipeline framework for modularity. The piece emphasizes architectural choices that enable scalability, real-time performance, and a nuanced understanding of user engagement.

AI & ML Infrastructure · Distributed Systems
45
☁️Cloudflare Blog·9h ago

Reimagining Next.js Architecture with Vite and AI for Serverless Environments

This article discusses Cloudflare's project, Vinext, a re-implementation of the Next.js API surface directly on Vite, aimed at improving deployment to serverless platforms like Cloudflare Workers. It highlights architectural challenges with traditional Next.js deployments in serverless environments and proposes a new approach leveraging Vite's ecosystem and AI for rapid development and optimized performance.

Cloud & Infrastructure · Microservices
9
👩‍💻Dev.to #systemdesign·15h ago

Dependency Injection for Scalable and Maintainable Systems

This article explores Dependency Injection (DI) as a technique for building scalable, maintainable large-scale applications and as a practical way to apply the Dependency Inversion Principle (DIP). It argues that DI goes beyond object creation: careful lifecycle management affects performance and reduces memory pressure from excessive object instantiation. Understanding DI is key to designing modular, testable software architectures.
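As an illustration of the two ideas in this summary (depending on an abstraction, and controlling object lifetimes), here is a minimal sketch. It is not taken from the article: `Clock`, `FixedClock`, `ReportService`, and the hand-rolled `Container` are hypothetical names, and real projects would normally use an established DI framework.

```python
from typing import Protocol


class Clock(Protocol):
    """Abstraction that high-level code depends on (hypothetical)."""

    def now(self) -> float: ...


class FixedClock:
    """Simple implementation, also usable as a test double."""

    def __init__(self, value: float) -> None:
        self.value = value

    def now(self) -> float:
        return self.value


class ReportService:
    """Receives its Clock through the constructor instead of creating one."""

    def __init__(self, clock: Clock) -> None:
        self._clock = clock

    def stamp(self, text: str) -> str:
        return f"[{self._clock.now():.0f}] {text}"


class Container:
    """Tiny container with two lifetimes: singleton and transient."""

    def __init__(self) -> None:
        self._factories = {}
        self._singletons = {}

    def register(self, key, factory, singleton=False):
        self._factories[key] = (factory, singleton)

    def resolve(self, key):
        factory, singleton = self._factories[key]
        if not singleton:
            return factory(self)  # transient: new object on every resolve
        if key not in self._singletons:
            self._singletons[key] = factory(self)  # built once, then reused
        return self._singletons[key]


container = Container()
container.register("clock", lambda c: FixedClock(100.0), singleton=True)
container.register("report", lambda c: ReportService(c.resolve("clock")))

first = container.resolve("report")
second = container.resolve("report")
print(first.stamp("hello"))
```

Registering the clock as a singleton means both `ReportService` instances share one clock object, which is the kind of lifetime decision that limits repeated instantiation.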

Microservices · API Design
38
📦Dropbox Tech·15h ago

Low-Bit Inference for Efficient AI Model Deployment at Scale

This article from Dropbox Tech explores low-bit inference techniques, specifically quantization, as a critical strategy for making large AI models more efficient, faster, and cheaper to run in production. It delves into how reducing numerical precision impacts memory, compute, and energy, and the architectural considerations for deploying these optimized models on modern hardware like GPUs, addressing latency and throughput constraints for real-world AI applications such as Dropbox Dash.
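To make "reducing numerical precision" concrete, the sketch below shows symmetric per-tensor int8 quantization in plain Python. It is a generic textbook scheme chosen for illustration, not the method described in the Dropbox article; the function names and sample weights are assumptions.

```python
def quantize_int8(values):
    """Map floats to integers in [-127, 127] using one shared scale."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate floats from the integers and the scale."""
    return [q * scale for q in quantized]


weights = [0.5, -1.27, 0.0, 1.0]  # made-up example values
q_weights, q_scale = quantize_int8(weights)
restored = dequantize(q_weights, q_scale)

# Rounding error per value is bounded by half a quantization step.
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(q_weights, q_scale, max_error)
```

Each stored value then needs 1 byte instead of the 4 bytes of a float32, at the cost of a rounding error of at most half the scale per value.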

AI & ML Infrastructure · Performance & Scaling
32
📐ByteByteGo·21h ago

Understanding Core ML Concepts for LLM Architecture

This article delves into the foundational mathematical concepts underpinning Large Language Models (LLMs), focusing on how they learn and generate text. It explains loss functions, gradient descent, and next-token prediction, providing insights into the inherent capabilities and limitations that architects should consider when designing and deploying LLM-powered applications.
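The three concepts named here can be shown together in a few lines. The sketch below is a toy, not the article's material: it treats the logits over a three-token vocabulary as the parameters directly, whereas a real LLM updates network weights through backpropagation. All names and numbers are assumptions.

```python
import math


def softmax(logits):
    """Convert raw scores into a probability distribution over tokens."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, target):
    """Next-token loss: negative log-probability of the correct token."""
    return -math.log(softmax(logits)[target])


def gradient_step(logits, target, learning_rate):
    """One gradient descent step on the logits.

    For softmax plus cross-entropy, d(loss)/d(logit_i) = p_i - 1[i == target].
    """
    probs = softmax(logits)
    grads = [p - (1.0 if i == target else 0.0) for i, p in enumerate(probs)]
    return [x - learning_rate * g for x, g in zip(logits, grads)]


logits = [0.0, 0.0, 0.0]  # toy vocabulary of three tokens
target_token = 1          # index of the token that actually came next

loss_before = cross_entropy(logits, target_token)
updated = gradient_step(logits, target_token, learning_rate=0.5)
loss_after = cross_entropy(updated, target_token)
print(loss_before, loss_after)
```

With uniform starting logits the loss equals ln(3), and a single step raises the probability of the correct token, so the loss drops.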

AI & ML Infrastructure · Distributed Systems
30
📌Pinterest Engineering·21h ago

GPU-Serving for Ads Engagement Prediction with MMOE-DCN Architecture

Pinterest engineered a significant upgrade to their ads lightweight ranking system by migrating two-tower models to GPU serving. This shift enabled the adoption of a more complex MMOE-DCN architecture, improving prediction accuracy and efficiency. The article details the architectural evolution, optimizations for GPU training, and the observed performance gains in both offline and online metrics.

AI & ML Infrastructure · Performance & Scaling
15
🔵Meta Engineering·1d ago

Optimizing GPU Communications for AI Workloads with RCCLX

Meta's RCCLX project, an open-source enhancement of RCCL, focuses on optimizing GPU communication for AI models on AMD platforms. It introduces features like Direct Data Access (DDA) and Low Precision Collectives to reduce latency and increase throughput, addressing critical bottlenecks in large language model inference and training. The article details architectural innovations for efficient inter-GPU communication.

AI & ML Infrastructure · Performance & Scaling
48
👩‍💻Dev.to #architecture·1d ago

Choosing ORMs for Performance and Developer Velocity in .NET Systems

This article discusses the architectural considerations for choosing between ORMs like EF Core and micro-ORMs like Dapper in modern .NET applications. It argues that the performance gap has narrowed, making developer velocity and architectural clarity more critical factors than raw ORM speed in most production systems. The author highlights that network latency and I/O often dominate request costs, diminishing the impact of micro-optimizations within the ORM layer.

Databases & Storage · Performance & Scaling
11
📰InfoQ Cloud·1d ago

Proactive Autoscaling for Latency-Sensitive Edge Applications in Kubernetes

This article discusses the limitations of Kubernetes Horizontal Pod Autoscaler (HPA) for dynamic, latency-sensitive edge workloads and proposes a custom autoscaler (CPA) solution. It highlights how HPA's reactive nature and rigid algorithm lead to inefficiencies at the edge, advocating for a more proactive, multi-signal approach incorporating CPU headroom, latency SLOs, and pod startup compensation to ensure stable performance and efficient resource utilization in constrained edge environments.
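For context, the Kubernetes documentation gives HPA's core calculation as desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue). The sketch below implements that formula and then, purely as an assumption, shows one way a multi-signal rule could combine CPU headroom with a latency SLO. `multi_signal_replicas` and its parameters are hypothetical, it is not the algorithm from the InfoQ article, and it does not model pod startup compensation.

```python
import math


def hpa_desired_replicas(current_replicas, current_metric, target_metric):
    """Reactive HPA rule as documented by Kubernetes."""
    return math.ceil(current_replicas * current_metric / target_metric)


def multi_signal_replicas(current_replicas, cpu_util, cpu_target,
                          p95_latency_ms, latency_slo_ms, headroom=0.2):
    """Hypothetical rule: take the largest replica count any signal asks for.

    Lowering the CPU target by `headroom` makes scaling start earlier,
    before utilization reaches the nominal target.
    """
    cpu_based = hpa_desired_replicas(
        current_replicas, cpu_util, cpu_target * (1 - headroom))
    latency_based = hpa_desired_replicas(
        current_replicas, p95_latency_ms, latency_slo_ms)
    return max(cpu_based, latency_based, 1)


# Plain HPA: 4 pods at 90% CPU against a 60% target.
print(hpa_desired_replicas(4, 90, 60))

# CPU is under target, but headroom already asks for one more pod.
print(multi_signal_replicas(4, 50, 60, 100, 200))

# Latency breaches the SLO, so the latency signal dominates.
print(multi_signal_replicas(4, 50, 60, 300, 200))
```

Taking the maximum across signals is a conservative design choice: any single signal under pressure is enough to trigger a scale-up.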

Performance & Scaling · Distributed Systems
21
🐶Datadog Blog·1d ago

Monitoring Mobile Application Startup Performance

This article discusses the importance of monitoring mobile application startup performance using Real User Monitoring (RUM) tools. It highlights key metrics and context for identifying performance bottlenecks in iOS and Android app launches, which is crucial for maintaining a good user experience and keeping apps responsive at launch.

Performance & Scaling · Tools & Frameworks
32
💳Stripe Blog·1d ago

Building a Flexible and Scalable Billing System: Lessons from Stripe's Acquisition of Metronome

Stripe's acquisition of Metronome aims to enhance its billing platform, particularly for complex usage-based models, by integrating Metronome's capabilities. This move highlights the architectural challenges in designing flexible monetization infrastructure that can support diverse business models, from simple subscriptions to multi-dimensional metering and sales-led contracts at global scale. The integration focuses on creating a unified platform for payments, analytics, revenue recognition, and tax, emphasizing system consolidation and extensibility.

Distributed Systems · API Design
128
📝Medium #system-design·1d ago

Scaling IP Geolocation with Redis Sorted Sets and Pipelining

This article details the architectural decisions and implementation strategies behind a high-scale IP geolocation service. It focuses on leveraging Redis's partitioned sorted sets and pipelining capabilities to achieve sub-millisecond enrichment for millions of events, addressing challenges like data freshness, query performance, and operational efficiency.
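The lookup pattern usually paired with sorted sets for IP ranges is: store each range's start address as the score, then fetch the entry with the greatest score not exceeding the queried IP and check the range end. The sketch below emulates that with an in-memory index so it runs without a Redis server. `RangeIndex`, `partition_key`, and the first-octet partitioning are illustrative assumptions, not the article's implementation.

```python
import bisect
import ipaddress


def partition_key(ip):
    """Hypothetical partitioning: one sorted set per first octet."""
    return f"geo:{int(ipaddress.IPv4Address(ip)) >> 24}"


class RangeIndex:
    """In-memory stand-in for one sorted set (score = range start)."""

    def __init__(self):
        self._starts = []   # sorted range-start integers, like scores
        self._entries = []  # (range_end, label), aligned with _starts

    def add(self, start_ip, end_ip, label):
        start = int(ipaddress.IPv4Address(start_ip))
        end = int(ipaddress.IPv4Address(end_ip))
        pos = bisect.bisect_left(self._starts, start)
        self._starts.insert(pos, start)
        self._entries.insert(pos, (end, label))

    def lookup(self, ip):
        # In Redis this step would roughly correspond to
        # ZREVRANGEBYSCORE key <ip> -inf LIMIT 0 1.
        value = int(ipaddress.IPv4Address(ip))
        pos = bisect.bisect_right(self._starts, value) - 1
        if pos < 0:
            return None  # IP sits below every known range
        end, label = self._entries[pos]
        return label if value <= end else None  # reject gaps between ranges


index = RangeIndex()
index.add("10.0.0.0", "10.0.0.255", "region-a")  # made-up sample data
index.add("10.0.1.0", "10.0.1.255", "region-b")

for address in ["10.0.0.7", "10.0.1.200", "10.0.2.1"]:
    print(address, partition_key(address), index.lookup(address))
```

Against a real Redis deployment, many such lookups could be batched into one pipeline to cut network round trips, which is where the pipelining mentioned in the summary would come in.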

Distributed Systems · Databases & Storage
128