High Scalability·December 2, 2022

Scalability Insights: Cloud Costs, Serverless, and Distributed Clock Synchronization

This article, a compilation of various quotes and statistics, offers diverse insights into system design challenges and solutions. It touches on aspects like the efficiency of serverless computing, the complexities of cloud egress costs, the critical role of precise clock synchronization in distributed databases, and strategic decisions in technology stacks.

The article presents a collection of observations and data points from various industry experts and events, highlighting contemporary issues and advancements in scalable system design. It covers a broad spectrum, from specific performance metrics of large-scale systems to philosophical discussions around serverless architectures and hiring practices.

Key Takeaways for System Design

Performance & Scale Metrics: Examples like Twitter's GraphQL API handling 1.5 billion fields/second and Roblox's caching system managing 1 billion requests/second illustrate the extreme scale modern systems operate at.
Cloud Cost Optimization: Discussions of AWS egress fees, and of the savings from moving specific workloads from the cloud to on-premises infrastructure (Stanford's Hack Lab), emphasize the importance of cost analysis in architecture.
Serverless Evolution: The debate over what 'serverless' means and whom it benefits, that is, whether it removes burdens from developers or from the business, highlights the ongoing evolution and trade-offs in serverless adoption.
Distributed System Challenges: The deep dive into monotonic clocks and PTP (Precision Time Protocol) versus NTP (Network Time Protocol) for snapshot reads in databases underscores the critical role of accurate time synchronization for performance and consistency in distributed environments.
Technology Stack Trends: The observation of a 'TypeScript stack' (Node.js, React, GraphQL) becoming a common choice for startups points to emerging industry standards for developer productivity and hiring.

The Importance of Accurate Clock Synchronization in Distributed Databases

Snapshot Reads and Monotonic Clocks

Highly performant snapshot reads in a distributed database depend on accurate, monotonic clocks across all machines. Without precise time synchronization, a system has to add artificial delays that cover the worst-case clock error, which significantly reduces overall capacity and increases the number of servers required. The article notes that PTP can offer a 100x performance improvement over state-of-the-art NTP implementations in such scenarios.

This is a subtle yet powerful optimization in distributed database design. The smaller the clock skew between machines, the shorter the compensating delay each operation needs, so the same hardware delivers more throughput and infrastructure costs fall. The trade-off is clear: investing in highly accurate clock synchronization hardware and software can yield substantial operational savings and performance gains.
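The capacity argument above can be sketched with a little arithmetic. The following Python sketch is a simplified model, not the article's method: it assumes each worker handles one operation at a time and blocks for the full worst-case clock error before serving it. The names `ClockBound` and `workers_needed` and all numbers (error bounds, service time, target load) are hypothetical and chosen only for illustration; they are not measurements of PTP or NTP.

```python
from dataclasses import dataclass

US_PER_SECOND = 1_000_000


@dataclass(frozen=True)
class ClockBound:
    """Worst-case clock error assumed for a synchronization setup (hypothetical)."""
    name: str
    max_error_us: int  # microseconds


def workers_needed(target_ops_per_s: int, service_time_us: int, clock: ClockBound) -> int:
    """Workers required when every operation also waits out the clock error bound."""
    # Time one worker is occupied per operation: real work plus the artificial delay.
    busy_us_per_op = service_time_us + clock.max_error_us
    total_busy_us = target_ops_per_s * busy_us_per_op
    # Integer ceiling division avoids floating-point rounding surprises.
    return (total_busy_us + US_PER_SECOND - 1) // US_PER_SECOND


# Illustrative bounds only: one loose, one 100x tighter.
loose = ClockBound("loosely synchronized (hypothetical)", max_error_us=10_000)
tight = ClockBound("tightly synchronized (hypothetical)", max_error_us=100)

for clock in (loose, tight):
    count = workers_needed(target_ops_per_s=100_000, service_time_us=1_000, clock=clock)
    print(f"{clock.name}: {count} workers")
```

Under these assumed numbers the compensating delay dominates the real work, so tightening the error bound cuts the worker count by roughly a factor of ten. How large the effect is in a real system depends on how much of each operation's latency is waiting versus work.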

Autoscaling as an Anti-Pattern?

The article also presents the view that autoscaling can be an anti-pattern for complex services. The argument is that such services do not scale horizontally in a predictable way, and the factors that drive their load are too complex to measure reliably. The proposed alternative is to slightly under-provision some instances so that failure points show up early, and to over-provision others so that they absorb the extra traffic. This challenges the conventional wisdom of dynamic scaling and suggests a more deterministic provisioning approach for certain types of workloads.
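One way to picture the proposed alternative is a fixed fleet with deliberately uneven capacity. The Python sketch below is an illustration under stated assumptions, not a method described in the article: the load balancer is assumed to split traffic evenly, and overflow from saturated instances is moved to whichever instance has the most headroom. The `Instance` type, the `route` function, the instance names, and the capacity figures are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instance:
    name: str
    capacity_rps: int  # requests/second the instance can sustain (hypothetical)


def route(total_rps: int, fleet: list[Instance]) -> tuple[dict[str, int], list[str], int]:
    """Split traffic evenly, then spill overflow onto instances with headroom.

    Returns (load per instance, instances that hit their limit at the even
    share, requests/second that no instance could absorb).
    """
    share = total_rps // len(fleet)
    load = {inst.name: min(share, inst.capacity_rps) for inst in fleet}
    # Under-provisioned instances reach their limit first: the early warning.
    early_warnings = [inst.name for inst in fleet if share >= inst.capacity_rps]
    overflow = total_rps - sum(load.values())
    # Over-provisioned instances have the most headroom, so they absorb first.
    by_headroom = sorted(fleet, key=lambda i: i.capacity_rps - load[i.name], reverse=True)
    for inst in by_headroom:
        if overflow == 0:
            break
        extra = min(inst.capacity_rps - load[inst.name], overflow)
        load[inst.name] += extra
        overflow -= extra
    return load, early_warnings, overflow


fleet = [
    Instance("canary-1", 800),    # slightly under-provisioned
    Instance("standard-1", 1000),
    Instance("standard-2", 1000),
    Instance("buffer-1", 1500),   # over-provisioned
]
load, early_warnings, dropped = route(3600, fleet)
print(load, early_warnings, dropped)
```

With these assumed figures, the under-provisioned instance saturates while the fleet as a whole still has spare capacity, which gives operators a signal to add capacity deliberately instead of relying on an automatic scaling rule.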

Architecture Design

Design this yourself

Design a highly available and performant distributed database system that supports snapshot isolation, focusing on the architectural considerations for precise clock synchronization (e.g., PTP vs. NTP, hardware clocks) and its impact on transaction latency and system capacity.

Focus: distributed clock synchronization, autoscaling strategies, cloud cost management

Other design angles

· Design a real-time analytics platform with high data ingestion rates, considering how to manage cloud egress costs and choose between serverless functions and always-on instances for different processing stages.
· Design an API gateway for a microservices architecture, implementing strategies for autoscaling that balance cost-efficiency with predictable performance for complex, non-linearly scaling backend services.