InfoQ Architecture·May 18, 2026

Swiggy's Real-time ML Ranking for Search Autocomplete

Swiggy improved its search autocomplete by moving from heuristic-based ranking to a real-time machine learning ranking architecture. The system uses OpenSearch for candidate retrieval, draws ranking features from a feature store, and runs learning-to-rank models directly within OpenSearch to meet strict low-latency requirements. The design separates candidate generation from ranking, and a continuous feedback loop retrains the models on user behavior so that relevance improves without sacrificing responsiveness.


The Challenge of Autocomplete Ranking

Autocomplete suggestions are crucial for user experience but are highly sensitive to latency, as every keystroke can trigger a new query. Traditional systems often prioritize speed using lexical matching and static rules, which can limit relevance. Swiggy faced the challenge of integrating more intelligent, learned ranking without compromising the stringent latency requirements of an interactive search component.

Architectural Overview: Candidate Generation and ML Ranking

Swiggy's improved architecture separates the autocomplete workflow into two primary stages: candidate generation and ranking. This modularity allows for independent optimization of each stage, balancing recall and precision with performance.

  1. Candidate Generation: Uses OpenSearch for fast lexical retrieval combined with embedding-based similarity search to fetch a broad set of potential suggestions. This stage is optimized for high recall and rapid response times.
  2. Ranking Layer: Offline-trained machine learning models reorder the candidate suggestions by predicted relevance. At serving time, the models score each candidate using real-time signals such as user interaction history, click behavior, query context, and item popularity.
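
The two stages above can be sketched in a few lines of Python. This is a minimal illustration, not Swiggy's implementation: the `Suggestion` type, the toy two-dimensional embeddings, the `popularity` feature, and the hand-picked weights in `rank` are all assumptions made for the example, and a simple linear scorer stands in for the learned model.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass(frozen=True)
class Suggestion:
    text: str
    embedding: tuple   # toy vector; a real system would use a learned encoder
    popularity: float  # hypothetical ranking feature in [0, 1]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def generate_candidates(prefix, query_vec, catalog, k_semantic=2):
    """Stage 1 (recall): lexical prefix matches plus the nearest embeddings."""
    lexical = [s for s in catalog if s.text.startswith(prefix.lower())]
    semantic = sorted(
        catalog, key=lambda s: cosine(query_vec, s.embedding), reverse=True
    )[:k_semantic]
    # Union of both sources, keeping first-seen order and dropping duplicates.
    seen, merged = set(), []
    for s in lexical + semantic:
        if s.text not in seen:
            seen.add(s.text)
            merged.append(s)
    return merged


def rank(candidates, query_vec, recent_clicks):
    """Stage 2 (precision): reorder candidates by a relevance score.

    The weights below are illustrative assumptions; in the architecture
    described here, a trained learning-to-rank model produces the score.
    """
    def score(s):
        clicked = 1.0 if s.text in recent_clicks else 0.0
        return 0.5 * cosine(query_vec, s.embedding) + 0.3 * s.popularity + 0.2 * clicked

    return sorted(candidates, key=score, reverse=True)


catalog = [
    Suggestion("pizza", (1.0, 0.0), 0.9),
    Suggestion("pita bread", (0.6, 0.8), 0.3),
    Suggestion("pasta", (0.8, 0.6), 0.5),
    Suggestion("biryani", (0.0, 1.0), 0.95),
]
query_vec = (1.0, 0.0)

candidates = generate_candidates("pi", query_vec, catalog)
ranked = rank(candidates, query_vec, recent_clicks={"pita bread"})
print([s.text for s in ranked])
```

Note that "pasta" enters the candidate set only through the embedding path, since it does not share the typed prefix. Whether it outranks "pita bread" depends on the user's recent clicks, which is the kind of personalization the ranking layer adds.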

Design Principle: Separation of Concerns

Separating candidate generation from ranking is a common pattern in search and recommendation systems. It allows the initial retrieval phase to focus purely on speed and broad coverage (recall), while the subsequent ranking phase can apply more complex, computationally intensive logic to improve precision and personalization, often using ML models.

Key Components and Technologies

The system leverages several key components to achieve its goals:

  • OpenSearch: Used both for fast candidate retrieval and for hosting the learning-to-rank models directly, which reduces latency by avoiding additional service calls.
  • Feature Store: Serves both precomputed and streaming features to the ranking models. This lets the system react to recent user behavior without expensive computation at query time, and it keeps features consistent between training and serving.
  • Learning-to-Rank (LTR) Frameworks: Integrated with OpenSearch (e.g., OpenSearch LTR) to deploy relevance-scoring models trained with libraries such as RankLib or XGBoost (gradient boosted trees).
  • Continuous Feedback Loop: User interactions (click-through rates, conversions, ordering behavior) are streamed into offline training pipelines. Updated models are generated and stored in a model registry for continuous deployment, enabling the system to adapt to evolving trends automatically.
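
As a rough sketch of the first step in the feedback loop, the snippet below aggregates raw interaction events into smoothed click-through rates per (prefix, suggestion) pair, which could serve as training labels or as a precomputed feature. The event schema (`type`, `prefix`, `suggestion`) and the smoothing priors are assumptions made for this example; the source does not describe Swiggy's actual log format or labeling scheme.

```python
from collections import defaultdict


def aggregate_feedback(events, prior_clicks=1.0, prior_impressions=20.0):
    """Convert interaction events into smoothed CTR per (prefix, suggestion).

    Additive smoothing keeps rarely shown suggestions from getting extreme
    rates after only a handful of impressions. The prior values are arbitrary.
    """
    impressions = defaultdict(int)
    clicks = defaultdict(int)
    for event in events:
        key = (event["prefix"], event["suggestion"])
        if event["type"] == "impression":
            impressions[key] += 1
        elif event["type"] == "click":
            clicks[key] += 1
    return {
        key: (clicks[key] + prior_clicks) / (shown + prior_impressions)
        for key, shown in impressions.items()
    }


# Hypothetical event stream: each suggestion was shown 4 times for prefix "pi".
events = (
    [{"type": "impression", "prefix": "pi", "suggestion": "pizza"}] * 4
    + [{"type": "click", "prefix": "pi", "suggestion": "pizza"}] * 2
    + [{"type": "impression", "prefix": "pi", "suggestion": "pita bread"}] * 4
)

ctr = aggregate_feedback(events)
for key, value in sorted(ctr.items()):
    print(key, round(value, 4))
```

Using one shared function like this in both the training pipeline and the feature store's serving path is one way to get the training/serving consistency mentioned above.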

A crucial design decision was to run the learned ranking model *directly inside OpenSearch*. This avoids extra network hops and calls to a separate model-serving service, which matters for autocomplete because every keystroke can trigger a query and every millisecond counts.
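
To illustrate what in-engine ranking can look like, the sketch below builds a search request body that retrieves candidates with a prefix query and then rescores the top results with the `sltr` query provided by the OpenSearch Learning to Rank plugin. The field name `suggestion_text`, the model name `autocomplete_ltr_v1`, the `keywords` parameter, and the window size are hypothetical, and the source does not say that Swiggy uses this exact request shape.

```python
import json


def build_autocomplete_query(prefix, model_name="autocomplete_ltr_v1",
                             window_size=50, size=10):
    """Build a search body: lexical retrieval first, LTR rescoring second.

    Both stages run in one request, so no separate ranking service is called.
    """
    return {
        "size": size,
        # Stage 1: cheap lexical retrieval on a hypothetical text field.
        "query": {
            "match_phrase_prefix": {"suggestion_text": {"query": prefix}}
        },
        # Stage 2: rescore only the top `window_size` hits per shard
        # with a stored learning-to-rank model.
        "rescore": {
            "window_size": window_size,
            "query": {
                "rescore_query": {
                    "sltr": {
                        # Values passed to the model's feature templates.
                        "params": {"keywords": prefix},
                        "model": model_name,
                    }
                }
            },
        },
    }


body = build_autocomplete_query("pi")
print(json.dumps(body, indent=2))
```

Limiting the rescore window keeps the cost of model inference bounded: the model scores only the best lexical matches rather than every document that matches the prefix.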

