Menu
InfoQ Architecture·June 6, 2026

Optimizing ClickHouse Query Performance: Cloudflare's Journey with Lock Contention and Partitioning

Cloudflare encountered a significant performance bottleneck in their ClickHouse billing pipeline, which was traced to query planning inefficiencies caused by lock contention on the table's parts list. Their solution involved patching ClickHouse to replace an exclusive lock with a shared lock, optimizing part filtering, and re-evaluating partitioning strategies to handle increased data parts and metadata load. This case study highlights the importance of deep profiling and understanding low-level database internals in large-scale distributed systems.

Read original on InfoQ Architecture

Cloudflare experienced a critical slowdown in their ClickHouse-based billing and fraud detection systems. The performance degradation was counter-intuitive, as standard metrics like I/O and memory usage appeared normal. The root cause was identified as severe lock contention during the query planning phase within ClickHouse, specifically when accessing the table's list of data parts.

The Problem: Query Planning Bottleneck

Cloudflare's ClickHouse instance, managing hundreds of petabytes of data, used a custom retention system based on daily partitions. A migration introduced customer namespaces and significantly increased the number of data parts. This change, while improving per-tenant data retention, exposed a critical flaw in ClickHouse's query planning mechanism: a single mutex (MergeTreeData) protected the entire table's list of parts. As the number of parts grew, queries spent over half their duration waiting to acquire this exclusive lock, rather than on actual computation.

ℹ️

Key System Design Takeaway

Even with high-level telemetry showing healthy resource utilization, bottlenecks can emerge at lower levels, especially in coordination layers or database internals under high concurrency. Deep profiling is essential for diagnosing such issues.

Cloudflare's Solution and Architectural Changes

  • Replaced Exclusive Lock with Shared Lock: This allowed multiple queries to read the list of parts concurrently, drastically reducing contention.
  • Removed Per-Query Parts List Copy: Eliminating the need to copy the entire parts list for each query reduced memory overhead and processing time during planning.
  • Improved Part Filtering: Optimized how ClickHouse filters data parts to avoid scanning the entire list, further accelerating the planning phase.

These patches, contributed back to the ClickHouse project, led to a 50% reduction in query durations and decoupled query performance from the growing number of data parts. However, the article also notes that the increasing metadata load has impacted ZooKeeper, which manages ClickHouse cluster coordination, raising questions about the long-term sustainability of the current partitioning architecture.

⚠️

While immediate performance issues can be resolved with targeted optimizations, it's crucial to continuously evaluate underlying architectural decisions (like partitioning schemes) as systems scale to avoid future operational problems and ensure long-term sustainability.

ClickHouseDatabase OptimizationQuery PlanningLock ContentionData PartitioningPerformance BottleneckDistributed SystemsCloudflare

Comments

Loading comments...