InfoQ Architecture·June 8, 2026

AWS OpenSearch Serverless NextGen Architecture for Scalable Search and AI Workloads

AWS has released the next generation of Amazon OpenSearch Serverless. The redesigned architecture speeds up resource provisioning, offers true scale-to-zero, and reduces costs. The update positions OpenSearch Serverless as a building block for agentic AI applications, with decoupled compute and storage as the basis for its scalability and efficiency.

Evolution to NextGen OpenSearch Serverless

The latest iteration of Amazon OpenSearch Serverless, dubbed NextGen, is an architectural overhaul of its predecessor, the Classic architecture. The redesign targets pain points experienced by users, particularly slow resource provisioning and poor cost efficiency for varying workloads. Its goals are faster scaling, lower idle costs, and better support for emerging AI-driven search patterns.

Key Architectural Improvements

The core of NextGen's improvements is the decoupling of compute from storage. In traditional designs the two are tightly coupled (e.g., data lives on local disks attached to compute instances); NextGen instead keeps data in a shared storage layer. This change has several consequences for system design:

Stateless Compute Units (OCUs): OpenSearch Capacity Units (OCUs) become stateless. Data no longer resides on the OCU itself but in the shared storage layer.
Faster Provisioning: OCUs can be provisioned in seconds because they no longer need to bootstrap local disks or replicate data. They simply mount the shared storage and begin serving requests.
Efficient Scale-to-Zero: Idle compute capacity can be released without data loss, enabling true scale-to-zero capabilities. This is crucial for cost optimization during periods of low or no traffic.
Cost Reduction: Costs can be up to 60% lower than with provisioned clusters, especially for peak loads, thanks to efficient scaling and scale-to-zero.
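
To make the scale-to-zero point concrete, here is a rough back-of-the-envelope sketch. The capacity floor, the OCU counts, and the traffic pattern are all hypothetical values chosen for illustration, not AWS pricing or billing rules; the script only compares the OCU-hours that would accrue with a fixed minimum capacity against a floor of zero.

```bash
#!/usr/bin/env bash
# Toy comparison of OCU-hours with and without scale-to-zero.
# All numbers are hypothetical; this is not AWS billing logic.

# billed_ocu_hours FLOOR IDLE_HOURS BUSY_HOURS BUSY_OCUS
# Idle hours are charged at FLOOR OCUs; busy hours are charged at
# BUSY_OCUS, or at FLOOR if the floor is higher.
billed_ocu_hours() {
  local floor idle_hours busy_hours busy_ocus billed_busy
  floor=$1 idle_hours=$2 busy_hours=$3 busy_ocus=$4
  billed_busy=$busy_ocus
  if [ "$billed_busy" -lt "$floor" ]; then
    billed_busy=$floor
  fi
  echo $(( idle_hours * floor + busy_hours * billed_busy ))
}

# Hypothetical day: 20 idle hours plus 4 busy hours needing 4 OCUs each.
with_floor=$(billed_ocu_hours 2 20 4 4)   # capacity never drops below 2 OCUs
to_zero=$(billed_ocu_hours 0 20 4 4)      # idle hours accrue nothing

echo "fixed floor of 2 OCUs: ${with_floor} OCU-hours"
echo "scale-to-zero:         ${to_zero} OCU-hours"
```

In this made-up scenario most of the OCU-hours under a fixed floor come from idle time, which is the portion that releasing stateless compute is meant to eliminate.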

Network Access and Management

The NextGen architecture also introduces new endpoint formats for improved network resource management. While per-collection endpoints ( `.aoss..on.aws` ) still exist, a new per-account regional endpoint ( `.aoss..on.aws` ) allows access to all collections within an account via a single hostname. This can simplify client-side connection management by enabling single connection pools and TLS sessions across multiple collections.

Design Consideration: Multi-Tenancy and Cost Optimization

The introduction of Collection Groups in NextGen is a significant feature for multi-tenant architectures. By sharing compute capacity across multiple collections within a group, organizations can achieve greater cost reductions, especially for smaller workloads that might otherwise incur higher costs if each had dedicated compute. This design choice highlights a trade-off between strict isolation and resource efficiency.
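
A rough sketch of the sharing arithmetic follows. It assumes, and this is not stated in the source, that a standalone collection holds at least some minimum compute of its own while a group only needs enough capacity for the combined load, rounded up to whole OCUs. The per-collection minimum, the rounding rule, and the load figures are hypothetical.

```bash
#!/usr/bin/env bash
# Toy model: OCUs needed with dedicated compute per collection versus
# compute shared across a group. Loads are given in tenths of an OCU.
# Hypothetical numbers; real sizing and billing may differ.

# ceil_div A B: integer ceiling of A / B for positive B.
ceil_div() {
  echo $(( ($1 + $2 - 1) / $2 ))
}

# dedicated_ocus MIN LOAD...
# Each collection rounds its own load up and never gets less than MIN.
dedicated_ocus() {
  local min total tenths ocus
  min=$1; shift
  total=0
  for tenths in "$@"; do
    ocus=$(ceil_div "$tenths" 10)
    if [ "$ocus" -lt "$min" ]; then
      ocus=$min
    fi
    total=$(( total + ocus ))
  done
  echo "$total"
}

# shared_ocus MIN LOAD...
# Loads are summed first, then rounded up once for the whole group.
shared_ocus() {
  local min sum tenths ocus
  min=$1; shift
  sum=0
  for tenths in "$@"; do
    sum=$(( sum + tenths ))
  done
  ocus=$(ceil_div "$sum" 10)
  if [ "$ocus" -lt "$min" ]; then
    ocus=$min
  fi
  echo "$ocus"
}

# Five small tenants, each needing 0.3 OCU at peak (hypothetical).
echo "dedicated: $(dedicated_ocus 1 3 3 3 3 3) OCUs"
echo "shared:    $(shared_ocus 1 3 3 3 3 3) OCUs"
```

The shared figure is lower because rounding and minimums apply once per group rather than once per tenant. The price of that saving is the isolation trade-off noted above: tenants in a group compete for the same capacity.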

```bash
# Create a collection group whose collections share compute capacity.
# Each minimum must not exceed its matching maximum.
aws opensearchserverless create-collection-group \
  --name articles-cg \
  --generation NEXTGEN \
  --standby-replicas ENABLED \
  --capacity-limits "minIndexCapacityInOCU=0,maxIndexCapacityInOCU=4,minSearchCapacityInOCU=2,maxSearchCapacityInOCU=4"

# Create a vector search collection inside that group.
aws opensearchserverless create-collection \
  --name articles-vectors \
  --type VECTORSEARCH \
  --collection-group-name articles-cg
```
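
A minimum that exceeds its maximum is easy to mistype in the `--capacity-limits` string, so a small pre-flight check can catch it before the CLI call. The helpers below are a hypothetical sketch, not part of the AWS CLI; they assume the comma-separated `key=value` format and the key names used in the command above, with whole-number OCU values.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight check for a capacity-limits string such as
# "minIndexCapacityInOCU=0,maxIndexCapacityInOCU=4,...".

# limit_value SPEC KEY: print the value stored under KEY, or nothing.
limit_value() {
  printf '%s\n' "$1" | tr ',' '\n' | sed -n "s/^$2=//p"
}

# check_capacity_limits SPEC
# Returns 0 when each minimum is <= its matching maximum, 1 otherwise.
check_capacity_limits() {
  local spec kind min_value max_value
  spec=$1
  for kind in Index Search; do
    min_value=$(limit_value "$spec" "min${kind}CapacityInOCU")
    max_value=$(limit_value "$spec" "max${kind}CapacityInOCU")
    # Skip a pair unless both bounds are present.
    if [ -z "$min_value" ] || [ -z "$max_value" ]; then
      continue
    fi
    if [ "$min_value" -gt "$max_value" ]; then
      echo "invalid: min${kind}CapacityInOCU=${min_value} exceeds max${kind}CapacityInOCU=${max_value}" >&2
      return 1
    fi
  done
  return 0
}

# Example with a deliberately mistyped search range (min 4, max 2).
limits="minIndexCapacityInOCU=0,maxIndexCapacityInOCU=4,minSearchCapacityInOCU=4,maxSearchCapacityInOCU=2"
if check_capacity_limits "$limits"; then
  echo "limits look consistent"
else
  echo "fix the limits before calling the CLI"
fi
```

The check only compares bounds within the string; it does not know which ranges the service itself accepts.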
