
Software Architecture and System Design News

Latest curated articles from top engineering blogs

Netflix, Uber, Meta, LinkedIn, Spotify, GitHub, Airbnb, Pinterest, Slack, Dropbox, Cloudflare, Stripe, Datadog, Figma, Shopify, AWS, Google Cloud, Azure, Werner Vogels & 15+ more

172 articles

InfoQ Architecture·1h ago

Meta's Petabyte-Scale Data Ingestion Migration

Meta successfully migrated its petabyte-scale MySQL social graph data ingestion platform to a centralized, self-managed warehouse service, significantly improving reliability and operational efficiency. The transition involved techniques like staged migrations, reverse shadowing, and continuous checksum monitoring to ensure zero downtime and data consistency for thousands of pipelines supporting analytics and machine learning workloads.

Databases & Storage · Distributed Systems
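To make "continuous checksum monitoring" concrete, here is a minimal Python sketch of one common approach. It is not Meta's implementation, and the helpers `row_digest`, `table_checksum`, and `partitions_match` are hypothetical. Each row is hashed to a 64-bit integer and the digests are summed modulo 2^64 alongside a row count, so the checksum does not depend on row order and a source partition can be compared with its migrated copy.

```python
import hashlib
from typing import Iterable, Sequence

MOD = 2 ** 64


def row_digest(row: Sequence[object]) -> int:
    """Hash one row to a 64-bit integer (hypothetical encoding: repr of each column)."""
    encoded = "\x1f".join(repr(col) for col in row).encode("utf-8")
    return int.from_bytes(hashlib.sha256(encoded).digest()[:8], "big")


def table_checksum(rows: Iterable[Sequence[object]]) -> tuple[int, int]:
    """Order-independent checksum: (row count, sum of row digests mod 2**64)."""
    count, total = 0, 0
    for row in rows:
        count += 1
        total = (total + row_digest(row)) % MOD
    return count, total


def partitions_match(source_rows, target_rows) -> bool:
    """Compare a source partition with its migrated copy without sorting either side."""
    return table_checksum(source_rows) == table_checksum(target_rows)


source = [(1, "alice", 10), (2, "bob", 20)]
migrated = [(2, "bob", 20), (1, "alice", 10)]  # same rows, different order
print(partitions_match(source, migrated))                             # True
print(partitions_match(source, [(1, "alice", 10), (2, "bob", 21)]))  # False
```

A modular sum is used rather than XOR so that duplicated rows do not cancel each other out.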
Medium #system-design·1d ago

Choosing Databases Based on Core Data Structures

This article highlights that effective database selection should be driven by understanding the underlying data structures and their operational characteristics rather than marketing hype. It emphasizes that databases are essentially optimized implementations of fundamental data structures, influencing their performance, scalability, and suitability for various use cases.

Databases & Storage · Distributed Systems
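As an illustration of the article's thesis (not code from it), the toy Python sketch below contrasts two index structures with hypothetical class names. A hash index serves point lookups but must scan every entry for a range query, while a sorted index, standing in for a B-tree or the sorted runs of an LSM tree, answers ranges with a binary search followed by a contiguous read.

```python
import bisect


class HashIndex:
    """Point lookups only, like a hash-based key-value store."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def get(self, key):
        return self._data.get(key)

    def range(self, lo, hi):
        # No key order is kept: every entry must be inspected, then sorted.
        return sorted((k, v) for k, v in self._data.items() if lo <= k < hi)


class SortedIndex:
    """Keeps keys ordered. list.insert is O(n); a real B-tree insert is O(log n)."""

    def __init__(self):
        self._keys = []
        self._values = []

    def put(self, key, value):
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            self._values[i] = value  # overwrite existing key
        else:
            self._keys.insert(i, key)
            self._values.insert(i, value)

    def get(self, key):
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            return self._values[i]
        return None

    def range(self, lo, hi):
        # Binary-search the boundaries, then read contiguous keys: O(log n + k).
        start = bisect.bisect_left(self._keys, lo)
        end = bisect.bisect_left(self._keys, hi)
        return list(zip(self._keys[start:end], self._values[start:end]))


h, s = HashIndex(), SortedIndex()
for ts in [30, 10, 20, 40]:
    h.put(ts, f"event-{ts}")
    s.put(ts, f"event-{ts}")
print(s.range(15, 35))  # [(20, 'event-20'), (30, 'event-30')]
```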
Dev.to #architecture·1d ago

Avoiding Kafka Over-Reliance: Lessons from a Treasure Hunt Engine

This article details the architectural evolution of a treasure hunt engine that initially struggled due to an over-reliance on Kafka for all event processing. It highlights the challenges of using a single Kafka topic for diverse events, leading to bottlenecks and consistency issues. The solution involved introducing an event store (EventStoreDB) to decouple event production from consumption, improving performance, reliability, and auditability.

Distributed Systems · Databases & Storage
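The decoupling of event production from consumption can be sketched with a minimal in-memory event store in Python. This is an illustration under stated assumptions, not the EventStoreDB client API: `InMemoryEventStore`, `append`, `read_from`, and `commit` are hypothetical names. Producers append to per-stream logs with an expected version (optimistic concurrency), and each consumer keeps its own checkpoint, so a slow projection never blocks writers or other readers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    position: int      # global position in the store's log
    stream: str        # hypothetical stream naming, e.g. "hunt-42"
    version: int       # 0-based position within its own stream
    event_type: str
    data: dict


class ConcurrencyError(Exception):
    """Raised when a writer's expected stream version is stale."""


class InMemoryEventStore:
    def __init__(self) -> None:
        self._log: list[Event] = []                 # append-only global log
        self._stream_versions: dict[str, int] = {}  # stream -> latest version
        self._checkpoints: dict[str, int] = {}      # consumer -> next position

    def append(self, stream: str, event_type: str, data: dict, expected_version: int) -> Event:
        current = self._stream_versions.get(stream, -1)  # -1 means "stream is empty"
        if current != expected_version:
            raise ConcurrencyError(f"{stream}: expected {expected_version}, found {current}")
        event = Event(len(self._log), stream, current + 1, event_type, data)
        self._log.append(event)
        self._stream_versions[stream] = event.version
        return event

    def read_stream(self, stream: str) -> list[Event]:
        # Replay one stream, e.g. to rebuild or audit a single hunt.
        return [e for e in self._log if e.stream == stream]

    def read_from(self, consumer: str, max_events: int = 100) -> list[Event]:
        # Each consumer reads from its own checkpoint, independent of producers.
        start = self._checkpoints.get(consumer, 0)
        return self._log[start:start + max_events]

    def commit(self, consumer: str, last_position: int) -> None:
        # Advance the checkpoint only after the batch has been processed.
        self._checkpoints[consumer] = last_position + 1


store = InMemoryEventStore()
store.append("hunt-42", "ClueFound", {"player": "p1"}, expected_version=-1)
store.append("hunt-42", "ClueFound", {"player": "p2"}, expected_version=0)
batch = store.read_from("leaderboard")
store.commit("leaderboard", batch[-1].position)
print(len(batch), len(store.read_from("leaderboard")), len(store.read_from("audit")))  # 2 0 2
```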
Cloudflare Blog·2d ago

Building Cloudflare's Unified Data Lakehouse and AI Data Agent

Cloudflare tackled data sprawl by creating Town Lake, a unified data lakehouse built on Apache Trino and Iceberg on R2, providing a single SQL interface for diverse data sources. They also developed Skipper, an AI data agent for natural language querying that emphasizes governed access and PII detection, with its infrastructure running on Cloudflare's own platform services. This architecture addresses challenges like disparate data systems, sampling issues, and tribal knowledge, enabling comprehensive and secure data insights.

Databases & Storage · Distributed Systems
Dev.to #architecture·3d ago

Scaling Real-Time Treasure Hunts: Solving Unbounded State with a Two-Tier Architecture

This article details Veltrix's architectural evolution to support 20,000 concurrent players in a real-time treasure hunt, focusing on overcoming unbounded state issues from long-lived WebSocket connections. The solution involved a two-tier architecture, separating ephemeral WebSocket handling from stateful processing using Rust, Kafka, and RocksDB to significantly reduce memory footprint and improve stability.

Distributed Systems · Performance & Scaling
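A minimal Python sketch of the two-tier split, under stated assumptions: the article's system uses Rust, Kafka, and RocksDB, whereas here a direct method call stands in for the Kafka hop and an in-memory dict stands in for RocksDB. `Gateway`, `StateWorker`, and `partition_for` are hypothetical. The gateway tier only maps connections to players, so a dropped connection frees almost nothing, while per-player state lives in the worker that owns the player's partition.

```python
import zlib
from collections import defaultdict

NUM_PARTITIONS = 4  # hypothetical partition count


def partition_for(key: str) -> int:
    # Stable hash so the same player always lands on the same partition,
    # analogous to keyed partitioning (Kafka's default partitioner uses murmur2 instead).
    return zlib.crc32(key.encode("utf-8")) % NUM_PARTITIONS


class StateWorker:
    """Stateful tier: owns the state for the keys in one partition."""

    def __init__(self):
        self.scores = defaultdict(int)  # stand-in for a RocksDB-backed store

    def handle(self, event):
        self.scores[event["player"]] += event["points"]


class Gateway:
    """Ephemeral tier: tracks open connections only, never game state."""

    def __init__(self, workers):
        self.connections = {}  # conn_id -> player_id
        self.workers = workers

    def on_connect(self, conn_id, player_id):
        self.connections[conn_id] = player_id

    def on_disconnect(self, conn_id):
        self.connections.pop(conn_id, None)  # nothing else to clean up

    def on_message(self, conn_id, points):
        player = self.connections[conn_id]
        event = {"player": player, "points": points}
        # In the described system this hop would be a Kafka produce.
        self.workers[partition_for(player)].handle(event)


workers = [StateWorker() for _ in range(NUM_PARTITIONS)]
gw = Gateway(workers)
gw.on_connect("c1", "alice")
gw.on_message("c1", 5)
gw.on_disconnect("c1")
gw.on_connect("c2", "alice")  # reconnect; state survives in the worker tier
gw.on_message("c2", 3)
print(workers[partition_for("alice")].scores["alice"])  # 8
```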
ByteByteGo·3d ago

Airtable's Semantic Search Architecture for AI Features

Airtable engineered a scalable and performant semantic search system to power its AI features, focusing on handling diverse customer database sizes and multi-tenancy. The architecture leverages Milvus for vector storage and search, with critical design decisions made around data partitioning, index selection, and managing hot/cold data to meet strict latency, throughput, and privacy requirements.

Databases & Storage · Distributed Systems
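The sketch below illustrates per-tenant partitioning for semantic search in plain Python. It does not use the Milvus API, and `TenantVectorIndex` is a hypothetical name. Each tenant's vectors sit in their own partition, and a query scans only the caller's partition using cosine similarity, which bounds search cost by tenant size and keeps results isolated between tenants.

```python
import math
from collections import defaultdict


class TenantVectorIndex:
    def __init__(self):
        # tenant_id -> list of (doc_id, unit-length vector)
        self._partitions = defaultdict(list)

    @staticmethod
    def _normalize(vec):
        norm = math.sqrt(sum(x * x for x in vec))
        if norm == 0:
            raise ValueError("cannot normalize a zero vector")
        return [x / norm for x in vec]

    def insert(self, tenant_id, doc_id, vec):
        self._partitions[tenant_id].append((doc_id, self._normalize(vec)))

    def search(self, tenant_id, query, k=3):
        q = self._normalize(query)
        # Only this tenant's partition is scanned; on unit vectors the dot
        # product equals cosine similarity.
        scored = [
            (sum(a * b for a, b in zip(q, v)), doc_id)
            for doc_id, v in self._partitions.get(tenant_id, [])
        ]
        scored.sort(reverse=True)
        return [doc_id for _, doc_id in scored[:k]]


idx = TenantVectorIndex()
idx.insert("base-a", "rec1", [1.0, 0.0])
idx.insert("base-a", "rec2", [0.0, 1.0])
idx.insert("base-b", "rec3", [1.0, 0.1])
print(idx.search("base-a", [0.9, 0.1], k=1))  # ['rec1']
```

A production system would replace the linear scan with an approximate nearest-neighbor index such as HNSW or IVF, both of which Milvus supports.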
The New Stack·3d ago

Snowflake's Strategic Cloud Infrastructure Investment for AI Expansion

Snowflake's $6 billion commitment to AWS for Graviton and GPU instances signals a major strategic shift towards AI, focusing on leveraging cost-efficient compute for data warehousing and high-performance resources for AI model training and inference. This investment highlights critical architectural considerations for large-scale data platforms expanding into AI, particularly around cloud vendor strategy, infrastructure cost optimization, and data residency.

Cloud & Infrastructure · AI & ML Infrastructure
Dev.to #architecture·3d ago

Rethinking Event Sourcing: Consolidating Events and Aggregates in PostgreSQL

This article presents a crucial system design lesson learned from a CQRS implementation where events and aggregate roots were stored in separate systems (Kafka and PostgreSQL). The initial distributed architecture led to severe performance issues and operational overhead. The authors describe their journey to consolidate events and aggregates into a single PostgreSQL database, leveraging logical replication as an event bus, dramatically improving latency and reducing costs.

Distributed Systems · Databases & Storage
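A minimal sketch of the consolidated design, assuming Python's built-in `sqlite3` as a stand-in for PostgreSQL so the example runs without a server; the table names and the `deposit` helper are hypothetical. The event row and the aggregate row are written in one transaction, and the primary key on `(aggregate_id, version)` acts as an optimistic-concurrency check.

```python
import json
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS events (
    aggregate_id TEXT    NOT NULL,
    version      INTEGER NOT NULL,
    event_type   TEXT    NOT NULL,
    payload      TEXT    NOT NULL,
    PRIMARY KEY (aggregate_id, version)  -- rejects two writers at the same version
);
CREATE TABLE IF NOT EXISTS accounts (
    id      TEXT PRIMARY KEY,
    balance INTEGER NOT NULL,
    version INTEGER NOT NULL
);
"""


def deposit(conn, account_id, amount, expected_version):
    """Append an event and update the aggregate in a single transaction."""
    with conn:  # commits on success, rolls back if anything raises
        new_version = expected_version + 1
        conn.execute(
            "INSERT INTO events VALUES (?, ?, ?, ?)",
            (account_id, new_version, "Deposited", json.dumps({"amount": amount})),
        )
        if expected_version == 0:
            conn.execute(
                "INSERT INTO accounts VALUES (?, ?, ?)", (account_id, amount, new_version)
            )
        else:
            cur = conn.execute(
                "UPDATE accounts SET balance = balance + ?, version = ? "
                "WHERE id = ? AND version = ?",
                (amount, new_version, account_id, expected_version),
            )
            if cur.rowcount != 1:
                raise RuntimeError("concurrent modification detected")


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
deposit(conn, "acc-1", 100, expected_version=0)
deposit(conn, "acc-1", 50, expected_version=1)
try:
    deposit(conn, "acc-1", 10, expected_version=1)  # stale version: rolled back
except sqlite3.IntegrityError:
    pass
print(conn.execute("SELECT balance, version FROM accounts").fetchone())  # (150, 2)
print(conn.execute("SELECT COUNT(*) FROM events").fetchone())            # (2,)
```

In PostgreSQL, a publication on the `events` table consumed through logical replication could then act as the event bus the article describes; that part is not shown here.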
Dev.to #architecture·3d ago

Application-Level Envelope Encryption for SOC 2 Compliance

This article details an architectural strategy for implementing application-level envelope encryption to achieve robust data security and SOC 2 compliance, moving beyond basic RBAC and database encryption. It outlines a hybrid cryptographic solution using AES for content and RSA for key wrapping, and presents the data modeling and service contracts necessary for a Symfony application. The focus is on cryptographic isolation at the record level and secure handling of encryption keys.

Security · Distributed Systems
DZone Microservices·4d ago

Liquid Clustering: An Adaptive Data Layout for Delta Lake

This article explores Databricks Liquid Clustering, a data layout strategy in Delta Lake 3.0 that replaces traditional partitioning and Z-Ordering. It introduces a self-tuning, flexible approach to organizing data, particularly for Unity Catalog managed tables, to improve query performance and reduce maintenance overhead. The core idea is to dynamically cluster data based on specified keys, adapting to evolving query patterns without rigid partitions or costly data rewrites.

Databases & Storage · Performance & Scaling
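Liquid Clustering's own algorithm is not modeled here. As a hedged illustration of why clustering by a key speeds up queries, the Python sketch below (hypothetical names `write_files` and `files_to_scan`) writes rows into files, records per-file min/max statistics for the key, and skips files whose range cannot match a filter, the same data-skipping idea Delta Lake applies with file-level statistics.

```python
from dataclasses import dataclass


@dataclass
class DataFile:
    rows: list
    min_key: int
    max_key: int


def write_files(rows, key, rows_per_file, clustered):
    """Split rows into files; optionally sort by the clustering key first."""
    ordered = sorted(rows, key=lambda r: r[key]) if clustered else list(rows)
    files = []
    for i in range(0, len(ordered), rows_per_file):
        chunk = ordered[i:i + rows_per_file]
        keys = [r[key] for r in chunk]
        files.append(DataFile(chunk, min(keys), max(keys)))  # per-file statistics
    return files


def files_to_scan(files, lo, hi):
    """Data skipping: keep only files whose [min, max] overlaps [lo, hi]."""
    return [f for f in files if f.max_key >= lo and f.min_key <= hi]


# 100 rows whose customer_id values arrive in scrambled order.
rows = [{"customer_id": (i * 37) % 100, "amount": i} for i in range(100)]
unclustered = write_files(rows, "customer_id", 10, clustered=False)
clustered = write_files(rows, "customer_id", 10, clustered=True)
print(len(files_to_scan(unclustered, 42, 42)), len(files_to_scan(clustered, 42, 42)))  # 10 1
```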
Dev.to #architecture·5d ago

Scaling an In-Memory Metadata Layer: Lessons from Veltrix Feature Store

This article details the challenges and solutions encountered while scaling an in-memory metadata layer for the Veltrix feature store, highlighting critical performance bottlenecks related to garbage collection and disk I/O with RocksDB. It presents a successful architectural pivot to a custom mmap-based sharded hash map, showcasing specific optimizations for latency, memory management, and NUMA awareness to achieve high throughput and low latency.

Performance & Scaling · Distributed Systems
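The mmap and NUMA specifics are beyond a short example, but the sharding idea can be sketched in Python: keys hash to one of N shards, each guarded by its own lock, so writers touching different shards do not contend on a single global lock. `ShardedMap` is a hypothetical name, and in CPython the GIL still serializes bytecode execution, so this shows the locking layout rather than a speedup.

```python
import threading
import zlib


class ShardedMap:
    def __init__(self, num_shards=16):
        self._shards = [{} for _ in range(num_shards)]
        self._locks = [threading.Lock() for _ in range(num_shards)]

    def _shard(self, key: str) -> int:
        # Stable hash picks the shard; each shard has an independent lock.
        return zlib.crc32(key.encode("utf-8")) % len(self._shards)

    def put(self, key, value):
        i = self._shard(key)
        with self._locks[i]:
            self._shards[i][key] = value

    def get(self, key, default=None):
        i = self._shard(key)
        with self._locks[i]:
            return self._shards[i].get(key, default)

    def __len__(self):
        # Reads shard sizes without locking: approximate under concurrent writes.
        return sum(len(s) for s in self._shards)


m = ShardedMap(num_shards=8)


def writer(prefix):
    for i in range(1000):
        m.put(f"{prefix}:{i}", i)


threads = [threading.Thread(target=writer, args=(f"feature{t}",)) for t in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(m), m.get("feature2:999"))  # 4000 999
```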
ByteByteGo·5d ago

Building Vector Indexing in a Distributed SQL Database: CockroachDB's C-SPANN

This article details how CockroachDB integrated vector indexing directly into its distributed SQL database by developing C-SPANN. It highlights the architectural constraints faced by a distributed, transactional database when adding a new feature like vector search, emphasizing the need for no central coordinator, real-time updates, sharding compatibility, and hot spot avoidance. The solution treats the vector index as ordinary table data, leveraging CockroachDB's existing distributed mechanisms for scalability and reliability.

Databases & Storage · Distributed Systems
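A simplified Python sketch of storing a vector index as ordinary keyed rows, in the spirit of partition-based (IVF or SPANN-style) indexes. It is not CockroachDB's C-SPANN, whose details the summary does not describe, and `PartitionedVectorIndex` is a hypothetical name. Each vector is assigned to its nearest centroid and stored under the key `(partition_id, vector_id)`, so a search probes only the closest partitions; in a distributed SQL store each partition would be a contiguous key range handled by the same sharding and replication as any other table data.

```python
def dist2(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


class PartitionedVectorIndex:
    """Vectors stored as rows keyed by (partition_id, vector_id)."""

    def __init__(self, centroids):
        self.centroids = centroids
        self.rows = {}  # (partition_id, vector_id) -> vector

    def nearest_partitions(self, vec, n):
        order = sorted(range(len(self.centroids)), key=lambda p: dist2(vec, self.centroids[p]))
        return order[:n]

    def insert(self, vector_id, vec):
        # Real-time update: the row is written under its nearest centroid.
        p = self.nearest_partitions(vec, 1)[0]
        self.rows[(p, vector_id)] = vec

    def search(self, query, k=1, probes=1):
        candidates = []
        for p in self.nearest_partitions(query, probes):
            # A sorted key-value store would do a prefix scan over partition p;
            # this dict-based sketch filters all rows instead.
            for (pid, vid), vec in self.rows.items():
                if pid == p:
                    candidates.append((dist2(query, vec), vid))
        candidates.sort()
        return [vid for _, vid in candidates[:k]]


idx = PartitionedVectorIndex(centroids=[[0.0, 0.0], [10.0, 10.0]])
idx.insert("a", [1.0, 1.0])
idx.insert("b", [9.0, 9.5])
idx.insert("c", [0.5, -0.5])
print(idx.search([8.0, 8.0], k=1))  # ['b']
print(sorted(idx.rows))             # [(0, 'a'), (0, 'c'), (1, 'b')]
```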