InfoQ Architecture·June 6, 2026

Optimizing ClickHouse Query Performance: Cloudflare's Journey with Lock Contention and Partitioning

Cloudflare encountered a significant performance bottleneck in their ClickHouse billing pipeline, which was traced to query planning inefficiencies caused by lock contention on the table's parts list. Their solution involved patching ClickHouse to replace an exclusive lock with a shared lock, optimizing part filtering, and re-evaluating partitioning strategies to handle increased data parts and metadata load. This case study highlights the importance of deep profiling and understanding low-level database internals in large-scale distributed systems.

Databases & Storage Distributed Systems Performance & Scaling

Read original on InfoQ Architecture

Cloudflare experienced a critical slowdown in their ClickHouse-based billing and fraud detection systems. The performance degradation was counter-intuitive, as standard metrics like I/O and memory usage appeared normal. The root cause was identified as severe lock contention during the query planning phase within ClickHouse, specifically when accessing the table's list of data parts.

The Problem: Query Planning Bottleneck

Cloudflare's ClickHouse instance, managing hundreds of petabytes of data, used a custom retention system based on daily partitions. A migration introduced customer namespaces and significantly increased the number of data parts. This change, while improving per-tenant data retention, exposed a critical flaw in ClickHouse's query planning mechanism: a single mutex (MergeTreeData) protected the entire table's list of parts. As the number of parts grew, queries spent over half their duration waiting to acquire this exclusive lock, rather than on actual computation.

ℹ️

Key System Design Takeaway

Even with high-level telemetry showing healthy resource utilization, bottlenecks can emerge at lower levels, especially in coordination layers or database internals under high concurrency. Deep profiling is essential for diagnosing such issues.

Cloudflare's Solution and Architectural Changes

Replaced Exclusive Lock with Shared Lock: This allowed multiple queries to read the list of parts concurrently, drastically reducing contention.
Removed Per-Query Parts List Copy: Eliminating the need to copy the entire parts list for each query reduced memory overhead and processing time during planning.
Improved Part Filtering: Optimized how ClickHouse filters data parts to avoid scanning the entire list, further accelerating the planning phase.

These patches, contributed back to the ClickHouse project, led to a 50% reduction in query durations and decoupled query performance from the growing number of data parts. However, the article also notes that the increasing metadata load has impacted ZooKeeper, which manages ClickHouse cluster coordination, raising questions about the long-term sustainability of the current partitioning architecture.

⚠️

While immediate performance issues can be resolved with targeted optimizations, it's crucial to continuously evaluate underlying architectural decisions (like partitioning schemes) as systems scale to avoid future operational problems and ensure long-term sustainability.

ClickHouseDatabase OptimizationQuery PlanningLock ContentionData PartitioningPerformance BottleneckDistributed SystemsCloudflare

Comments

Loading comments...

Architecture Design

Design this yourself

Design a large-scale analytical data pipeline using a columnar database like ClickHouse to handle petabytes of data with high ingestion rates. Focus on strategies to prevent and mitigate query planning bottlenecks, manage data partitioning for multi-tenancy and efficient retention, and ensure robust cluster coordination, drawing lessons from Cloudflare's experience with lock contention and metadata management challenges.

Practice Interview

Other design angles

· Design a data retention system for a multi-tenant analytical database, considering the trade-offs between partitioning schemes, query performance, and metadata overhead.· Develop a monitoring and profiling strategy for a distributed analytical database to identify and diagnose low-level performance bottlenecks like lock contention.· Architect a highly scalable billing data pipeline using ClickHouse, ensuring fast query performance for aggregation jobs while managing hundreds of petabytes of data and millions of rows per second ingestion.

Optimizing ClickHouse Query Performance: Cloudflare's Journey with Lock Contention and Partitioning

The Problem: Query Planning Bottleneck

Cloudflare's Solution and Architectural Changes

Comments

Architecture Design

Related Lessons