Cloudflare Blog·March 23, 2026

Cloudflare's FL2 Rewrite: Optimizing Edge Compute for Core-Heavy Architectures

Cloudflare describes how it optimized its edge compute infrastructure for its new Gen 13 servers, built on AMD EPYC Turin processors. The new CPUs trade L3 cache per core for higher core counts, and the legacy request handling layer (FL1) initially showed significant latency regressions on them. The fix was FL2, a complete rewrite of the request handling layer in Rust, which depends less on cache and so makes efficient use of core-heavy CPUs, doubling throughput and improving power efficiency.

Read original on Cloudflare Blog

Cloudflare's blog post highlights a recurring system design challenge: adapting software to evolving hardware architectures. As CPU designs shift, software has to evolve with them to benefit from the new capabilities without accepting trade-offs such as higher latency in exchange for higher throughput.

The Hardware Dilemma: Cache vs. Cores

Cloudflare's Gen 12 servers, built on AMD EPYC Genoa-X, relied on large 3D V-Cache, which suited the cache-sensitive FL1 request handling layer. The AMD EPYC Turin processors in Gen 13 offer significantly more cores (up to 192 vs 96) and better IPC, but with drastically less L3 cache per core (2MB vs 12MB). That is a problem for workloads that depend heavily on cache locality. Initial tests showed that Turin delivered higher throughput under FL1, but latency rose by more than 50% at high utilization because of more L3 cache misses and the resulting DRAM accesses.
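The per-core figures above also imply a drop in total L3 per socket. A quick back-of-the-envelope check, assuming both parts run at the maximum core counts quoted (the function name `total_l3_mb` is just an illustrative helper):

```rust
/// Total L3 cache in MB for a socket, given its core count and the
/// amount of L3 available per core.
fn total_l3_mb(cores: u32, l3_per_core_mb: u32) -> u32 {
    cores * l3_per_core_mb
}

fn main() {
    // Figures quoted in the text: 96 cores at 12 MB/core (Gen 12)
    // versus 192 cores at 2 MB/core (Gen 13).
    let gen12 = total_l3_mb(96, 12);
    let gen13 = total_l3_mb(192, 2);

    println!("Gen 12 total L3: {gen12} MB");
    println!("Gen 13 total L3: {gen13} MB");
    println!(
        "L3 per core shrinks {}x; total L3 shrinks {}x",
        12 / 2,
        gen12 / gen13
    );
}
```

So doubling the core count while cutting per-core L3 by 6x still leaves the whole socket with a third of the cache, which is why a cache-sensitive workload feels it.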

Metric | Gen 12 (FL1) | Gen 13 - AMD Turin 9965 (FL1)
Metric | Gen 12 (FL1) | Gen 13 - AMD Turin 9965 (FL2)

FL2: A Software Rewrite for Hardware Synergy

To address the cache bottleneck and fully utilize Gen 13's core density, Cloudflare turned to FL2, a complete rewrite of their request handling layer in Rust, built on their Pingora and Oxy frameworks. This new stack replaced an older NGINX and LuaJIT-based implementation. FL2's primary drivers were security, development velocity, and general performance, but its cleaner architecture, with improved memory access patterns and less dynamic allocation, also made it less dependent on large L3 caches as a side effect. This allowed FL2 to eliminate the cache bottleneck on Gen 13 servers, achieving a 2x throughput increase and a 50% boost in performance per watt compared to Gen 12, all while maintaining strict latency SLAs.
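To make "reduced dynamic allocation" concrete, here is a minimal sketch of the general technique of reusing one scratch buffer across requests instead of allocating per request. This is not Cloudflare's code and does not use the Pingora or Oxy APIs; `normalize_alloc` and `normalize_into` are hypothetical helpers invented for illustration.

```rust
/// Allocating version: returns a fresh String on every call, so each
/// request touches newly allocated memory.
fn normalize_alloc(input: &str) -> String {
    input.to_ascii_lowercase()
}

/// Reusing version: writes into a caller-owned buffer. `clear` resets the
/// length but keeps the capacity, so the same memory is reused on each call.
fn normalize_into(input: &str, out: &mut String) {
    out.clear();
    out.extend(input.chars().map(|c| c.to_ascii_lowercase()));
}

fn main() {
    // Hypothetical header names standing in for per-request work.
    let headers = ["Content-Type", "ACCEPT", "X-Request-Id"];

    // One buffer, allocated once and reused for every header.
    let mut scratch = String::with_capacity(64);
    for h in headers {
        normalize_into(h, &mut scratch);
        assert_eq!(scratch, normalize_alloc(h));
        println!("{scratch}");
    }
}
```

The reused buffer keeps the working set small and stable, which is the kind of property that tends to reduce pressure on a shrinking L3 cache.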

Hardware-Software Co-Design: This case study vividly illustrates the importance of hardware-software co-design. To extract maximum performance from new hardware, especially at the edge where efficiency is paramount, the software stack often needs fundamental architectural changes, not just incremental tuning. This involves understanding low-level hardware characteristics like cache hierarchy and optimizing memory access patterns.

Key System Design Takeaways

  • Performance Profiling: Deep dives into CPU performance counters (like L3 cache miss rates and memory fetch latency) are crucial for diagnosing bottlenecks when introducing new hardware.
  • Architectural Adaptability: Software architectures, especially for critical path components like edge request handling, should anticipate or be adaptable to future hardware trends. Dependence on specific hardware characteristics (like large caches) can become a liability.
  • Language Choice & Memory Management: Languages like Rust, with their emphasis on memory safety and explicit control over memory allocation, can lead to more cache-friendly and performant code, which becomes critical in core-heavy, cache-constrained environments.
  • Cost of Abstraction: While high-level languages and frameworks offer productivity, they can introduce overhead (e.g., dynamic allocations) that becomes prohibitive in extreme performance scenarios, necessitating lower-level rewrites.
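As a small illustration of the profiling takeaway, the sketch below turns two raw counter readings into an L3 miss rate and compares two runs. The `CacheCounters` type and all numbers are made up for illustration; real values would come from a profiler or the CPU's performance counters, and nothing here reflects Cloudflare's measurements.

```rust
/// Hypothetical snapshot of two hardware counters for one benchmark run.
struct CacheCounters {
    l3_accesses: u64,
    l3_misses: u64,
}

impl CacheCounters {
    /// Fraction of L3 accesses that missed, or None if nothing was sampled.
    fn miss_rate(&self) -> Option<f64> {
        if self.l3_accesses == 0 {
            None
        } else {
            Some(self.l3_misses as f64 / self.l3_accesses as f64)
        }
    }
}

fn main() {
    // Made-up readings for the same workload on two hardware generations.
    let old_hw = CacheCounters { l3_accesses: 1_000_000, l3_misses: 50_000 };
    let new_hw = CacheCounters { l3_accesses: 1_000_000, l3_misses: 250_000 };

    if let (Some(a), Some(b)) = (old_hw.miss_rate(), new_hw.miss_rate()) {
        println!("old miss rate: {:.1}%", a * 100.0);
        println!("new miss rate: {:.1}%", b * 100.0);
        println!("ratio (new / old): {:.1}x", b / a);
    }
}
```

Comparing a ratio like this across hardware generations, alongside memory fetch latency, helps separate a cache problem from a plain CPU-bound one.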
Tags: edge computing, hardware-software co-design, performance optimization, Rust, AMD EPYC, CDN, network architecture, latency
