Swiggy enhanced its search autocomplete system by transitioning from heuristic-based ranking to a real-time machine learning ranking architecture. This system leverages OpenSearch for candidate retrieval, integrates feature stores, and uses learning-to-rank models directly within OpenSearch to meet strict low-latency requirements. The design separates candidate generation from ranking, employing a continuous feedback loop to adapt models to user behavior while improving relevance and maintaining responsiveness.
Read original on InfoQ Architecture
Autocomplete suggestions are crucial for user experience but are highly sensitive to latency, as every keystroke can trigger a new query. Traditional systems often prioritize speed using lexical matching and static rules, which can limit relevance. Swiggy faced the challenge of integrating more intelligent, learned ranking without compromising the stringent latency requirements of an interactive search component.
Swiggy's improved architecture separates the autocomplete workflow into two primary stages: candidate generation and ranking. This modularity allows for independent optimization of each stage, balancing recall and precision with performance.
Design Principle: Separation of Concerns
Separating candidate generation from ranking is a common pattern in search and recommendation systems. It allows the initial retrieval phase to focus purely on speed and broad coverage (recall), while the subsequent ranking phase can apply more complex, computationally intensive logic to improve precision and personalization, often using ML models.
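The two-stage pattern can be illustrated with a minimal Python sketch. It is not Swiggy's implementation: the `Candidate` fields, the prefix-match retrieval, and the linear scoring weights are all hypothetical stand-ins, chosen only to show how a cheap recall-oriented stage feeds a more expensive precision-oriented one.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    text: str
    popularity: float      # hypothetical feature: global popularity in [0, 1]
    user_affinity: float   # hypothetical feature: per-user signal in [0, 1]


def generate_candidates(prefix, catalog, limit=50):
    """Stage 1: cheap, recall-oriented retrieval via case-insensitive prefix match."""
    p = prefix.lower()
    return [c for c in catalog if c.text.lower().startswith(p)][:limit]


def rank(candidates, weights, top_k=5):
    """Stage 2: precision-oriented scoring. A linear model stands in for a learned ranker."""
    def score(c):
        return (weights["popularity"] * c.popularity
                + weights["user_affinity"] * c.user_affinity)
    return sorted(candidates, key=score, reverse=True)[:top_k]


def autocomplete(prefix, catalog, weights):
    """Run retrieval, then ranking, and return the suggestion strings."""
    return [c.text for c in rank(generate_candidates(prefix, catalog), weights)]


# Illustrative data only; none of these values come from the source.
catalog = [
    Candidate("pizza", 0.9, 0.1),
    Candidate("pizza hut", 0.6, 0.8),
    Candidate("pasta", 0.7, 0.2),
    Candidate("biryani", 0.95, 0.9),
]
weights = {"popularity": 0.4, "user_affinity": 0.6}
suggestions = autocomplete("pi", catalog, weights)
print(suggestions)
```

Because the stages only share a list of candidates, either one can be swapped out (for example, replacing the prefix match with an index lookup, or the linear scorer with a trained model) without touching the other.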
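The system relies on several key components to achieve its goals: OpenSearch for candidate retrieval, feature stores that supply ranking signals, learning-to-rank models that run inside OpenSearch, and a continuous feedback loop that adapts those models to user behavior.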
A crucial design decision was to run the learned ranking model *directly inside OpenSearch*. Doing so avoids extra network hops and additional services, which is critical for meeting the low-latency demands of autocomplete, where every millisecond counts.
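One way to picture in-engine ranking is a single search request that retrieves candidates and rescores the top of that list with a stored model. The sketch below assumes the OpenSearch Learning to Rank plugin and its `sltr` query; the source does not say which mechanism Swiggy uses. The field name `suggestion`, the model name `autocomplete_ltr_v1`, and the `keywords` parameter are hypothetical.

```python
import json


def build_autocomplete_request(prefix, model_name="autocomplete_ltr_v1",
                               window_size=50, size=5):
    """Build a search body that retrieves by prefix, then rescores with an LTR model.

    Assumes the Learning to Rank plugin is installed and that a model with
    this (hypothetical) name has already been uploaded to its feature store.
    """
    return {
        "size": size,
        # Stage 1: fast lexical candidate retrieval on a hypothetical field.
        "query": {"match_phrase_prefix": {"suggestion": {"query": prefix}}},
        # Stage 2: rescore only the top `window_size` hits with the stored model,
        # inside the same request, so no extra service call is needed.
        "rescore": {
            "window_size": window_size,
            "query": {
                "rescore_query": {
                    "sltr": {
                        "params": {"keywords": prefix},
                        "model": model_name,
                    }
                }
            },
        },
    }


request_body = build_autocomplete_request("bir")
print(json.dumps(request_body, indent=2))
```

Keeping `window_size` small bounds how many candidates the model scores per keystroke, which is the main lever for trading ranking quality against latency in this setup.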