Cloudflare Blog·March 23, 2026

Cloudflare's Gen 13 Server Design: Optimizing Hardware for High-Performance Distributed Systems

This article details Cloudflare's Gen 13 server hardware design and shows how the choice of CPU, memory, and storage directly supports the company's Rust-based FL2 request handling layer. It covers the architectural decisions and trade-offs behind the gains in throughput, performance per watt, and operational efficiency that matter for a global distributed network.


Cloudflare's transition to its Rust-based FL2 request handling layer prompted a complete refresh of its server hardware, culminating in the Gen 13 design. The redesign shows how software architecture shapes hardware choices in large-scale distributed systems, driving optimizations for throughput, efficiency, and operational simplicity. The core principle was to match hardware capabilities to the demands of the software that runs on them.

Key Hardware Design Principles

  • Workload Alignment: Hardware choices were driven by the characteristics of the FL2 software stack, which has low L3 cache dependency and scales linearly with core count.
  • Performance per TCO: Component choices aim to maximize aggregate requests per second and performance per watt, both of which directly affect data center expansion costs and rack-level economics.
  • Operational Simplicity: Cloudflare kept its strong preference for fewer, higher-density servers, which reduces provisioning, patching, and monitoring overhead across its global network.
  • Future Compatibility: Selecting components that support the latest standards (DDR5-6400, PCIe Gen 5.0, CXL 2.0) is meant to keep the platform relevant for longer and to extend its security support window.
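
The "fewer, higher-density servers" and performance-per-watt principles above come down to simple arithmetic. The following is a minimal sketch, not Cloudflare's sizing method: the function names and every number in the usage lines are hypothetical and chosen only to show the shape of the calculation.

```python
import math


def servers_needed(target_rps: float, rps_per_server: float) -> int:
    """Number of servers required to serve target_rps, rounded up."""
    if rps_per_server <= 0:
        raise ValueError("rps_per_server must be positive")
    return math.ceil(target_rps / rps_per_server)


def perf_per_watt(rps_per_server: float, watts_per_server: float) -> float:
    """Requests per second delivered per watt of server power."""
    if watts_per_server <= 0:
        raise ValueError("watts_per_server must be positive")
    return rps_per_server / watts_per_server


if __name__ == "__main__":
    # Hypothetical figures for illustration only; none come from the article.
    target = 1_000_000
    print(servers_needed(target, rps_per_server=50_000))
    print(servers_needed(target, rps_per_server=100_000))
    print(perf_per_watt(100_000, watts_per_server=500))
```

Doubling per-server throughput halves the number of machines to provision, patch, and monitor for the same aggregate load, which is the operational argument the list makes.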

CPU Selection: AMD EPYC™ Turin 9965

The choice of the 192-core AMD EPYC™ Turin 9965 over other Turin candidates reflects a deliberate architectural trade-off. The 9965 has less L3 cache per core than previous generations, but because the FL2 workload has low L3 cache dependency and scales well with core count, it was the best fit for raising throughput and improving performance per watt. The decision underscores that raw specifications matter less than how well they match the characteristics of the software workload.

Feature | Gen 12 (AMD Genoa-X 9684X) | Gen 13 (AMD Turin 9965)

Software-Hardware Co-Design

The alignment between software (FL2's Rust-based architecture) and hardware (high core count, lower L3 cache per core) is a crucial lesson. System designers must understand workload profiles to make informed hardware decisions, rather than blindly pursuing higher individual component specs.
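
One way to reason about this trade-off is a toy model in which throughput scales linearly with core count and is discounted according to how much the workload depends on L3 cache. The sketch below is an illustration under stated assumptions, not Cloudflare's evaluation method: `cache_sensitivity` is a hypothetical parameter, the Gen 12 core count is inferred from the 4 GB/core ratio, and the L3-per-core figures are assumptions derived from AMD's published totals that should be verified before use.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CpuCandidate:
    name: str
    cores: int
    l3_mb_per_core: float


def relative_throughput(cpu: CpuCandidate, baseline: CpuCandidate,
                        cache_sensitivity: float) -> float:
    """Toy estimate of cpu's throughput relative to baseline.

    cache_sensitivity = 0.0: the workload ignores L3 size entirely.
    cache_sensitivity = 1.0: per-core throughput falls in proportion
    to the loss of L3 per core.
    """
    if not 0.0 <= cache_sensitivity <= 1.0:
        raise ValueError("cache_sensitivity must be in [0, 1]")
    core_ratio = cpu.cores / baseline.cores
    # Extra cache beyond the baseline is not credited in this model.
    l3_ratio = min(1.0, cpu.l3_mb_per_core / baseline.l3_mb_per_core)
    per_core = 1.0 - cache_sensitivity * (1.0 - l3_ratio)
    return core_ratio * per_core


# Assumed figures (verify against vendor spec sheets before relying on them).
gen12 = CpuCandidate("Genoa-X 9684X", cores=96, l3_mb_per_core=12.0)
gen13 = CpuCandidate("Turin 9965", cores=192, l3_mb_per_core=2.0)

if __name__ == "__main__":
    for s in (0.0, 0.25, 0.5, 1.0):
        print(s, relative_throughput(gen13, gen12, s))
```

In this model a cache-insensitive workload sees the full benefit of the doubled core count, while a strongly cache-bound one would end up slower than on the older part. That is why profiling the workload comes before picking the CPU.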

Memory and Storage Optimizations

  • Memory: Gen 13 doubles memory capacity to 768 GB (4 GB/core) using 12x 64 GB DDR5-6400 DIMMs. Populating all 12 channels with one DIMM per channel (1DPC) maximizes memory bandwidth so the cores are not starved for data. The 4 GB/core ratio, carried over from Gen 12, balances capacity for core-scaling workloads against future growth headroom without overprovisioning.
  • Storage: Moving to PCIe Gen 5.0 NVMe drives increases bandwidth and reduces latency. Expanding from two to three E1.S NVMe drives (16 TB to 24 TB total) supports growing CDN cache demands and services like Durable Objects. A new front drive bay for up to ten U.2 drives adds flexibility for future storage-heavy use cases and reflects a modular design approach.
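
The memory and storage figures above can be checked with a few lines of arithmetic. This sketch uses the numbers stated in the text. The peak-bandwidth calculation assumes the standard 64-bit (8-byte) data path per DDR5 channel and gives a theoretical maximum, not a measured result; the function names are illustrative.

```python
def memory_capacity_gb(dimm_count: int, dimm_gb: int) -> int:
    """Total installed memory in GB."""
    return dimm_count * dimm_gb


def gb_per_core(capacity_gb: float, cores: int) -> float:
    """Memory available per CPU core in GB."""
    return capacity_gb / cores


def peak_bandwidth_gbs(mega_transfers_per_s: int, channels: int,
                       bytes_per_transfer: int = 8) -> float:
    """Theoretical peak memory bandwidth in GB/s (decimal units).

    MT/s times bytes per transfer gives MB/s per channel.
    """
    return mega_transfers_per_s * bytes_per_transfer * channels / 1000


def storage_capacity_tb(drive_count: int, drive_tb: int) -> int:
    """Total drive capacity in TB."""
    return drive_count * drive_tb


if __name__ == "__main__":
    capacity = memory_capacity_gb(dimm_count=12, dimm_gb=64)
    print(capacity)                    # 768
    print(gb_per_core(capacity, 192))  # 4.0
    print(peak_bandwidth_gbs(6400, channels=12))
    # 8 TB per drive is implied by the 16 TB and 24 TB totals in the text.
    print(storage_capacity_tb(2, 8), storage_capacity_tb(3, 8))
```

Leaving any of the 12 channels unpopulated would lower the bandwidth ceiling in proportion, which is why the 1DPC layout fills all of them.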
