Dev.to #architecture · May 30, 2026

Deconstructing a Legacy Search Engine: A Case Study in Performance Optimization

This article presents a detailed case study on optimizing a legacy search engine that exhibited significant performance degradation due to in-memory aggregation buffer thrashing. It highlights the process of diagnosing the root causes, exploring initial failed attempts at resolution, and ultimately redesigning a critical data path to stabilize performance and improve data consistency. The core architectural decision involved extracting the aggregation logic into a new microservice, demonstrating a practical application of the Lambda architecture pattern.

The Challenge: Undocumented Performance Bottlenecks

The initial problem stemmed from a third-party "Treasure Hunt Engine" that, despite documentation claims of high indexing rates, experienced severe search latency spikes as data volume grew. The core issue was an undocumented behavior where the engine's in-memory aggregation buffers would thrash when the working set exceeded 70% of available RAM (48 GB per node). This led to query timeouts and a significant degradation in user experience. This highlights the critical importance of understanding not just stated capacities, but also the underlying operational characteristics and resource consumption of system components, especially when integrating vendor solutions.
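The threshold above can be turned into a simple pre-flight check. This is a minimal sketch, assuming the 48 GB figure is the total RAM per node and that thrashing begins once the working set passes 70% of it. The function name and defaults are hypothetical and are not part of the engine's API.

```python
def thrash_risk(working_set_gb: float,
                node_ram_gb: float = 48.0,
                threshold: float = 0.70) -> bool:
    """Return True when the working set exceeds the RAM fraction at which
    thrashing was observed (assumed: 70% of a 48 GB node)."""
    return working_set_gb > node_ram_gb * threshold


# Under the stated assumption the limit is roughly 33.6 GB per node.
limit_gb = 48.0 * 0.70

print(thrash_risk(30.0))  # working set comfortably below the limit
print(thrash_risk(36.0))  # working set above the limit
```

A check like this is only as good as the working-set estimate fed into it, which is exactly the number the vendor documentation did not help with.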

Initial Attempts and Their Limitations

Upsizing Brokers and JVM Tuning: Increasing Kafka broker sizes (from i3.2xlarge to i3.4xlarge) and the JVM heap (from 12 GB to 24 GB) with G1GC provided only a temporary reprieve. The larger heap led to prohibitive G1 evacuation pauses (reported as 4 GB/min of evacuation), indicating that the fundamental design could not scale for the aggregation workload.
Offloading to Flink SQL with RocksDB: An attempt to move aggregation into Flink SQL with a RocksDB state backend aimed to keep the hot path in memory. It failed for two reasons: the local NVMe drives were too small (200 GB of capacity against 350 GB of state), and the compaction settings led to duplicate keys and the loss of exactly-once guarantees. This illustrates the complexities of stateful stream processing and the need for adequate storage provisioning and careful consideration of the consistency model.
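The storage shortfall in the second attempt is simple arithmetic, and it is worth checking before provisioning. The sketch below is illustrative only: the function name is hypothetical, and the `compaction_overhead` factor is an assumed allowance for the temporary extra space that compaction needs while rewriting files, not a figure from the article.

```python
def storage_headroom_gb(disk_gb: float,
                        state_gb: float,
                        compaction_overhead: float = 0.0) -> float:
    """Free space left after placing the state on local disk.

    compaction_overhead is a hypothetical fraction of extra space reserved
    for compaction; a negative result means the state does not fit.
    """
    return disk_gb - state_gb * (1 + compaction_overhead)


# Figures from the case study: 200 GB of NVMe against 350 GB of state.
print(storage_headroom_gb(200, 350))        # negative: 150 GB short
print(storage_headroom_gb(200, 350, 0.25))  # worse once overhead is assumed
```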

The Architectural Solution: Decomposing the Workload

The key to solving the problem was to stop trying to force the monolithic search engine to handle both indexing and aggregation. A new service boundary was created: the Search Aggregator Microservice. This service's responsibilities include:

Consuming raw events from a dedicated compacted Kafka topic.
Maintaining 5-minute tumbling windows in a local, in-memory Caffeine cache (chosen for its low latency and high hit rate with a small footprint; note that Caffeine's eviction policy is Window TinyLFU rather than strict LRU).
Publishing pre-aggregated deltas to a second Kafka topic, which the Treasure Hunt Engine then consumes. This effectively separates the 'hot' aggregation path from the 'cold' indexing path, aligning with a Lambda architecture pattern.
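The responsibilities listed above can be sketched in a few lines. This is a simplified stand-in, not the article's implementation: a plain dictionary replaces the Caffeine cache, method calls replace the Kafka consumer and producer, and the class and method names are hypothetical. Timestamps are assumed to be epoch seconds.

```python
from collections import defaultdict

WINDOW_SECONDS = 5 * 60  # 5-minute tumbling windows, as in the case study


def window_start(event_ts: int) -> int:
    """Map an event timestamp to the start of its tumbling window."""
    return event_ts - (event_ts % WINDOW_SECONDS)


class TumblingAggregator:
    """In-memory stand-in for the aggregator's window state."""

    def __init__(self) -> None:
        # window start -> key -> running total
        self._open = defaultdict(lambda: defaultdict(int))

    def add(self, key: str, event_ts: int, value: int = 1) -> None:
        """Fold one raw event into the window it belongs to."""
        self._open[window_start(event_ts)][key] += value

    def close_windows(self, watermark_ts: int) -> list:
        """Emit (window_start, key, total) for every window that has ended
        at or before the watermark, and drop that window's state."""
        closed = sorted(w for w in self._open
                        if w + WINDOW_SECONDS <= watermark_ts)
        deltas = []
        for w in closed:
            for key, total in sorted(self._open.pop(w).items()):
                deltas.append((w, key, total))
        return deltas


agg = TumblingAggregator()
agg.add("q:shoes", 10)
agg.add("q:shoes", 299)
agg.add("q:hats", 300)         # falls into the next window
print(agg.close_windows(300))  # only the first window has ended
```

In the real service the emitted tuples would be published to the second Kafka topic, and state size would be bounded by the cache rather than by an unbounded dictionary.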

Idempotency and Consistency

The Search Aggregator was designed to be idempotent by using event offsets as Kafka keys and emitting tombstones on window closure. This ensures that late data can be safely dropped by downstream consumers without violating the desired exactly-once consistency model, a crucial aspect for reliable data processing.
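One way to read this scheme from the consumer's side is sketched below. It is an interpretation under stated assumptions, not the article's code: each delta carries a record id (standing in for the source offset used as the Kafka key), a tombstone marks a window as closed, and anything arriving for a closed window is treated as late and dropped. The class and method names are hypothetical.

```python
class DeltaSink:
    """Downstream state that tolerates replays and late records."""

    def __init__(self) -> None:
        self.totals = {}      # (window_start, key) -> aggregated total
        self._seen = set()    # record ids that were already applied
        self._closed = set()  # windows for which a tombstone arrived

    def apply(self, record_id: int, window_start: int,
              key: str, total: int) -> bool:
        """Apply one delta; return True only if state changed."""
        if window_start in self._closed:
            return False      # late data for a closed window: drop
        if record_id in self._seen:
            return False      # replayed record: applying again is a no-op
        self._seen.add(record_id)
        self.totals[(window_start, key)] = total
        return True

    def tombstone(self, window_start: int) -> None:
        """Mark a window as closed so later records for it are ignored."""
        self._closed.add(window_start)


sink = DeltaSink()
sink.apply(101, 0, "q:shoes", 2)
sink.apply(101, 0, "q:shoes", 2)  # redelivery, ignored
sink.tombstone(0)
sink.apply(102, 0, "q:shoes", 3)  # late, ignored
print(sink.totals)
```

A production consumer would also need to bound the set of seen ids, for example by discarding ids that belong to windows that have already been closed.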

Impact and Lessons Learned

The architectural change significantly improved performance (p95 latency from 1.2 s to 240 ms) and stability (JVM heap stabilized at 14 GB, GC pauses under 20 ms). It also reduced infrastructure costs by enabling the use of smaller nodes. Importantly, search result quality improved due to the elimination of duplicates from late-arriving data. The primary lesson was the importance of early and clear service boundary definition, especially when dealing with vendor solutions or workloads that naturally fit patterns like Lambda architecture, which separate real-time and batch processing. Additionally, comprehensive end-to-end latency testing under synthetic load is crucial to uncover performance bottlenecks that simple throughput tests might miss.
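To illustrate why tail latency needs its own measurement, here is a minimal sketch of a nearest-rank percentile over a synthetic sample. The latency values are invented for illustration and are not the article's measurements. The helper name is hypothetical.

```python
import math


def percentile(samples: list, pct: int) -> float:
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("percentile of an empty sample is undefined")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


# Synthetic load: 90 fast requests and 10 slow ones, in milliseconds.
latencies_ms = [100] * 90 + [1200] * 10

mean_ms = sum(latencies_ms) / len(latencies_ms)
p95_ms = percentile(latencies_ms, 95)

print(mean_ms)  # the average looks acceptable
print(p95_ms)   # the tail does not
```

The average stays low while one request in ten is slow, which is the kind of gap that a throughput-only test does not surface.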

Kafka, JVM, Caffeine Cache, Flink, RocksDB, Latency Optimization, Microservices Architecture, Lambda Architecture

Architecture Design

Design this yourself

Design a highly available and scalable real-time analytics and search platform that processes millions of events per second, focusing on how to separate real-time aggregation from historical indexing to maintain low search latency and high data consistency. Include considerations for handling late-arriving data, managing stateful stream processing, and optimizing resource utilization in a microservices-based architecture.

Other design angles

· Design a system that uses a "speed layer" for real-time aggregation and a "batch layer" for comprehensive indexing, ensuring data consistency across both paths.
· Design a stream processing pipeline for event aggregation that can scale horizontally, provides exactly-once processing semantics, and integrates with a search engine for querying aggregated data.
· Focus on designing only the "Search Aggregator Microservice" described in the article, detailing its internal components, data flow, caching strategy, and how it interacts with Kafka and the main search engine.