InfoQ Architecture · May 18, 2026

Swiggy's Real-time ML Ranking for Search Autocomplete

Swiggy enhanced its search autocomplete system by moving from heuristic-based ranking to a real-time machine learning ranking architecture. The system uses OpenSearch for candidate retrieval, integrates a feature store, and runs learning-to-rank models directly within OpenSearch to meet strict low-latency requirements. The design separates candidate generation from ranking and uses a continuous feedback loop to adapt models to user behavior, improving relevance while maintaining responsiveness.

The Challenge of Autocomplete Ranking

Autocomplete suggestions are crucial for user experience but are highly sensitive to latency, as every keystroke can trigger a new query. Traditional systems often prioritize speed using lexical matching and static rules, which can limit relevance. Swiggy faced the challenge of integrating more intelligent, learned ranking without compromising the stringent latency requirements of an interactive search component.

Architectural Overview: Candidate Generation and ML Ranking

Swiggy's improved architecture separates the autocomplete workflow into two primary stages: candidate generation and ranking. This modularity allows for independent optimization of each stage, balancing recall and precision with performance.

Candidate Generation: Uses OpenSearch for fast lexical retrieval combined with embedding-based similarity search to fetch a broad set of potential suggestions. This stage is optimized for high recall and rapid response times.
Ranking Layer: Machine learning models reorder the candidate suggestions based on predicted relevance. This layer incorporates real-time signals like user interaction history, click behavior, query context, and item popularity, alongside offline-trained models.
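The two stages above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Swiggy's implementation: the in-memory `SUGGESTIONS` list stands in for the OpenSearch index, the `FEATURES` table stands in for feature-store signals, and the hand-picked weights in `rank` stand in for a trained model. All names and values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    text: str
    lexical_score: float  # score assigned by the retrieval stage


# Hypothetical in-memory stand-in for the suggestion index.
SUGGESTIONS = ["pizza", "pizza hut", "paneer tikka", "pasta", "pav bhaji"]

# Hypothetical per-suggestion signals: (popularity, recent click rate).
FEATURES = {
    "pizza": (0.9, 0.30),
    "pizza hut": (0.7, 0.55),
    "paneer tikka": (0.6, 0.10),
    "pasta": (0.5, 0.20),
    "pav bhaji": (0.4, 0.05),
}


def generate_candidates(prefix: str, limit: int = 50) -> list[Candidate]:
    """Stage 1: cheap, recall-oriented retrieval (plain prefix match here)."""
    prefix = prefix.lower()
    matches = [s for s in SUGGESTIONS if s.startswith(prefix)]
    # Shorter completions get a slightly higher lexical score.
    return [Candidate(s, len(prefix) / len(s)) for s in matches[:limit]]


def rank(candidates: list[Candidate], top_k: int = 3) -> list[str]:
    """Stage 2: reorder candidates with a toy relevance score."""
    def score(c: Candidate) -> float:
        popularity, click_rate = FEATURES.get(c.text, (0.0, 0.0))
        # Hand-picked weights stand in for a trained ranking model.
        return 0.2 * c.lexical_score + 0.3 * popularity + 0.5 * click_rate

    ranked = sorted(candidates, key=score, reverse=True)
    return [c.text for c in ranked[:top_k]]


def autocomplete(prefix: str) -> list[str]:
    """Run retrieval first, then ranking, on every keystroke."""
    return rank(generate_candidates(prefix))
```

With these made-up signals, `autocomplete("pi")` places "pizza hut" ahead of "pizza" even though "pizza" is the closer lexical match, because the ranking stage weighs the recent click rate more heavily than the retrieval score.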

Design Principle: Separation of Concerns

Separating candidate generation from ranking is a common pattern in search and recommendation systems. It allows the initial retrieval phase to focus purely on speed and broad coverage (recall), while the subsequent ranking phase can apply more complex, computationally intensive logic to improve precision and personalization, often using ML models.

Key Components and Technologies

The system relies on four main components:

OpenSearch: Utilized for both fast candidate retrieval and for hosting the learning-to-rank models directly, reducing latency by avoiding additional service calls.
Feature Store: Serves both precomputed and streaming features to the ranking models. Reading features from the store lets the ranker react to recent user behavior without expensive request-time computation, and it keeps feature definitions consistent between training and serving.
Learning-to-Rank (LTR) Frameworks: Integrated with OpenSearch (e.g., OpenSearch LTR) to deploy relevance-scoring models trained with libraries such as RankLib or with gradient boosted tree toolkits such as XGBoost.
Continuous Feedback Loop: User interactions (click-through rates, conversions, ordering behavior) are streamed into offline training pipelines. Updated models are generated and stored in a model registry for continuous deployment, enabling the system to adapt to evolving trends automatically.
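One step of such a feedback loop, turning logged interactions into training data, might look like the sketch below. It assumes events arrive as hypothetical `(prefix, suggestion, clicked)` tuples and writes rows in the LETOR-style text format (`<label> qid:<id> <index>:<value> ... # comment`) that RankLib reads. The click-through-rate thresholds in `grade` are illustrative assumptions, not values from the article.

```python
from collections import defaultdict


def grade(impressions: int, clicks: int) -> int:
    """Map a click-through rate to a coarse relevance grade (assumed thresholds)."""
    ctr = clicks / impressions if impressions else 0.0
    if ctr >= 0.30:
        return 2
    if ctr >= 0.10:
        return 1
    return 0


def to_training_rows(events, features):
    """Aggregate (prefix, suggestion, clicked) events into LETOR-style rows."""
    # (prefix, suggestion) -> [impressions, clicks]
    counts = defaultdict(lambda: [0, 0])
    for prefix, suggestion, clicked in events:
        counts[(prefix, suggestion)][0] += 1
        counts[(prefix, suggestion)][1] += int(clicked)

    qids = {}  # one query id per distinct prefix
    rows = []
    # Sorting keeps rows for the same prefix (query) adjacent.
    for (prefix, suggestion), (impressions, clicks) in sorted(counts.items()):
        qid = qids.setdefault(prefix, len(qids) + 1)
        feats = " ".join(
            f"{i}:{v}" for i, v in enumerate(features[suggestion], start=1)
        )
        rows.append(f"{grade(impressions, clicks)} qid:{qid} {feats} # {suggestion}")
    return rows
```

In a production pipeline the aggregation would typically run in a stream or batch job, and the feature values would come from the same feature store used at serving time so that training and serving see identical inputs.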

A crucial design decision was to run the learned ranking model *directly inside OpenSearch*. Doing so avoids extra network hops and additional services, which matters for autocomplete because every millisecond of latency counts.
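As a rough sketch of what in-engine ranking can look like, the function below builds a search request whose `rescore` phase applies a stored LTR model through the plugin's `sltr` query, so retrieval and reranking happen in one round trip. The field name `suggestion`, the model name, and the `keywords` parameter are hypothetical; the parameter names must match whatever the stored feature set expects, and the exact request shape should be checked against the LTR plugin documentation for the version in use.

```python
import json


def build_autocomplete_query(prefix: str, model_name: str, window_size: int = 50) -> dict:
    """Build one request that retrieves candidates and reranks them in-engine."""
    return {
        "size": 10,
        # Stage 1: lexical candidate retrieval on a hypothetical "suggestion" field.
        "query": {"match_phrase_prefix": {"suggestion": prefix}},
        # Stage 2: rerank only the top `window_size` hits with the stored model.
        "rescore": {
            "window_size": window_size,
            "query": {
                "rescore_query": {
                    "sltr": {
                        # Parameter names are assumptions; they must match
                        # the templates in the stored feature set.
                        "params": {"keywords": prefix},
                        "model": model_name,
                    }
                },
                # Let the model score fully replace the lexical score.
                "query_weight": 0.0,
                "rescore_query_weight": 1.0,
            },
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_autocomplete_query("piz", "autocomplete_ltr_v1"), indent=2))
```

Limiting the model to a rescore window keeps the cost of model evaluation bounded per keystroke, which is one way to trade a little recall in the reranked set for predictable latency.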

Architecture Design

Design a real-time, low-latency search autocomplete system for an e-commerce platform that incorporates machine learning ranking to improve relevance. The system should separate candidate generation from ranking, utilize OpenSearch for efficient retrieval, integrate a feature store for real-time signals, and implement a continuous feedback loop for model improvement.

Practice Interview

Focus: real-time machine learning ranking system for search autocomplete

Other design angles

· Design a generic real-time ML ranking service that can be plugged into various search components, detailing its API and data contracts.
· Design the data pipelines and infrastructure for a continuous ML feedback loop in a search system, focusing on feature engineering, model training, and deployment.
· Architect the feature store specifically for a real-time ML inference system, discussing trade-offs between freshness, consistency, and query latency.