Latest curated articles from top engineering blogs
9 articles
This article details Meta's strategy and solutions for migrating its petabyte-scale data ingestion system, which powers social graph analytics and ML. It highlights the architectural shift from customer-owned pipelines to a self-managed data warehouse service, emphasizing the rigorous migration lifecycle, data quality validation, and robust rollback mechanisms crucial for ensuring reliability during such a massive transition.
This article draws parallels between the behavior of the Nigerian stock market during a crash and recovery, and fundamental concepts in distributed system design. It highlights single points of failure, cascading failures, eventual consistency, redundancy, and decentralized architecture using real-world market dynamics as examples. The author emphasizes that understanding system behavior is crucial for designing resilient software.
This article provides a post-mortem on the challenges faced when scaling a WebSocket-based live commentary platform from 100,000 to 1 million concurrent users. It details how an initially simple fan-out architecture led to out-of-memory (OOM) kills caused by slow consumers and backpressure, and outlines the architectural changes implemented to achieve resilience and scalability, including ruthless message dropping and coalescing.
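As a rough illustration of the dropping and coalescing idea mentioned in this summary, the sketch below keeps a bounded per-connection buffer in plain Python. It is not the article's implementation: the class name `CoalescingOutbox`, the `max_pending` limit, and the per-message `key` are all assumptions made for the example.

```python
from collections import OrderedDict


class CoalescingOutbox:
    """Bounded buffer for one slow client (hypothetical helper).

    A newer message with the same key replaces the older one (coalescing),
    and when the buffer is full the oldest pending message is dropped, so
    memory per connection stays capped.
    """

    def __init__(self, max_pending: int) -> None:
        if max_pending < 1:
            raise ValueError("max_pending must be at least 1")
        self.max_pending = max_pending
        self._pending = OrderedDict()  # key -> latest payload, oldest first
        self.dropped = 0  # number of messages evicted because the buffer was full

    def offer(self, key: str, payload: str) -> None:
        if key in self._pending:
            # Coalesce: discard the stale payload; the new one goes to the end.
            del self._pending[key]
        elif len(self._pending) >= self.max_pending:
            # Buffer full: evict the oldest pending message.
            self._pending.popitem(last=False)
            self.dropped += 1
        self._pending[key] = payload

    def drain(self) -> list:
        """Return all pending (key, payload) pairs in order and clear the buffer."""
        items = list(self._pending.items())
        self._pending.clear()
        return items


box = CoalescingOutbox(max_pending=2)
box.offer("score", "1-0")
box.offer("score", "2-0")      # replaces the stale score update
box.offer("clock", "45:00")
box.offer("comment", "Goal!")  # buffer full, the oldest entry is evicted
print(box.drain(), box.dropped)
```

The trade-off in a design like this is that a slow consumer may miss intermediate updates, which is usually acceptable for live commentary where only the latest state matters.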
This article details Apollo Hospitals' successful migration from a legacy Oracle database system to Azure Database for PostgreSQL. It highlights the architectural and operational benefits of moving to a cloud-native, open-source database platform, including significant improvements in performance, scalability, and cost efficiency. The piece also introduces AI-assisted tooling designed to streamline complex Oracle-to-PostgreSQL migrations.
This article details the pragmatic evolution of Netflix's billing and payment systems, showcasing how architectural assumptions shifted from a simple, US-centric DVD rental model to a complex global streaming platform. It highlights key challenges and architectural decisions made to adapt to asynchronous payments, international regulatory differences, and fluctuating demand patterns.
This article explores the 10-year evolution of Stripe's Payments API, detailing the architectural challenges faced in unifying diverse payment methods globally. It highlights the progression from simple synchronous credit card processing to more complex asynchronous methods like ACH and Bitcoin, culminating in the design of the flexible PaymentIntents and PaymentMethods abstractions. The narrative provides valuable insights into API design, state management, and handling distributed transaction complexities in a rapidly expanding fintech platform.
This article conducts a forensic architectural analysis of BTDUex, a fake cryptocurrency exchange, highlighting critical system design red flags that expose its fraudulent nature. It deconstructs the backend architecture, examining state management, wallet topology, and withdrawal logic, to demonstrate how a seemingly legitimate frontend can mask a deceptive, ingress-only system designed for asset extraction rather than secure financial operations. The analysis offers valuable insights for builders on identifying scam architectures through deep inspection of data flow and backend integrity.
This article details the architectural migration of Pixelsurf, a web-based generative AI game engine, to a native mobile application, Plutusgg. The shift was driven by performance bottlenecks on mobile browsers, clunky user-generated content (UGC) distribution, and poor 'cold start' times. The re-architecture leveraged native asset caching and a unified database schema to improve latency, UGC management, and overall user experience.
This article details Hotstar's journey in building an in-house, scalable system for real-time emoji reactions and live voting, moving away from a third-party service. It highlights architectural decisions around asynchronous processing, message queuing with Kafka, and stream processing with Spark to handle billions of user interactions during live events.
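To make the stream-processing step in this summary more concrete, here is a minimal sketch of counting reactions in fixed time windows before publishing them. It uses plain Python as a stand-in for the Kafka and Spark stages; the function name `aggregate_reactions`, the `(timestamp, emoji)` event shape, and the two-second window are assumptions for illustration, not details from the article.

```python
from collections import Counter, defaultdict


def aggregate_reactions(events, window_seconds=2):
    """Group (timestamp_seconds, emoji) events into tumbling windows.

    Returns a dict mapping each window's start time to a Counter of emoji,
    so one aggregated message per window can be sent instead of one per tap.
    """
    windows = defaultdict(Counter)
    for timestamp, emoji in events:
        # Round the timestamp down to the start of its window.
        window_start = int(timestamp // window_seconds) * window_seconds
        windows[window_start][emoji] += 1
    return dict(windows)


events = [(0.1, "clap"), (1.9, "clap"), (2.0, "fire"), (3.5, "clap")]
for start, counts in sorted(aggregate_reactions(events).items()):
    print(start, dict(counts))
```

Aggregating before fan-out means the volume sent to viewers grows with the number of windows and distinct emoji rather than with the number of individual reactions.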