AWS Architecture Blog·June 22, 2026

Secure Multi-Tenant RAG Architecture with Fine-Grained Authorization in AWS

This article presents an architecture for implementing secure, multi-tenant Retrieval Augmented Generation (RAG) applications within an enterprise, focusing on fine-grained access control. It demonstrates how to use a single Amazon Bedrock Knowledge Base instance to serve multiple departments while isolating document access through metadata filtering and dynamic policy evaluation with Amazon Verified Permissions. The core design is a two-layer defense-in-depth authorization strategy. Because the access rules live in externally managed policies rather than in application code, they can be updated without redeploying code.

AI & ML Infrastructure Security Distributed Systems

The Challenge of Multi-Tenant RAG Authorization

Large organizations often struggle to provide internal generative AI applications such as RAG while keeping strict control over which teams and roles can access which documents. Duplicating infrastructure per group is costly and complex. The goal is a single RAG application that serves multiple departments, ensures employees only access material they are authorized to see, and still accommodates executives who need cross-departmental access. This architecture targets logical isolation between departments inside one organization, not hard multi-tenant isolation between separate customers.

Core Architectural Pattern: Metadata Filtering with Externalized Policies

The solution leverages a single, shared Amazon Bedrock Knowledge Base. Document isolation is achieved by tagging documents with department-specific metadata during ingestion. Instead of embedding filter selection logic directly in application code, authorization decisions are externalized to Amazon Verified Permissions using Cedar policies. This allows dynamic, runtime-evaluated authorization, enabling access rule changes without code redeployment and providing a detailed audit trail.
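As an illustration of the tagging step, here is a minimal sketch of how a department tag can be attached to a document. It relies on the Bedrock Knowledge Bases convention for S3 data sources, where metadata for `<key>` is read from a companion object named `<key>.metadata.json` containing a `metadataAttributes` map. The attribute name `department`, the helper name `sidecar_for`, and the sample key are assumptions made for this example, not taken from the article.

```python
import json


def sidecar_for(document_key: str, department: str) -> tuple[str, str]:
    """Return (sidecar_key, sidecar_body) for a document stored in S3."""
    # Knowledge Bases pair "<key>" with "<key>.metadata.json" in the same location.
    sidecar_key = f"{document_key}.metadata.json"
    # "department" is an assumed attribute name; any consistent key would work.
    body = json.dumps({"metadataAttributes": {"department": department}})
    return sidecar_key, body


key, body = sidecar_for("finance/q3-forecast.pdf", "finance")
print(key)
print(body)
```

The ingestion function would write `body` to `key` in the same bucket before the data source is synced, so the tag is present when the document is indexed.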

ℹ️

Defense-in-Depth Authorization

The architecture implements a two-layer defense-in-depth authorization strategy:
1. API Access (Layer 1): An AWS Lambda Authorizer on Amazon API Gateway calls Verified Permissions to decide if a user can invoke the API at all.
2. Document Access (Layer 2): A middleware Lambda function, orchestrating calls to the Knowledge Base, also queries Verified Permissions to determine which resources (document tags) the user is permitted to query, constructing a metadata filter for the `RetrieveAndGenerate` API.
Each layer operates independently, providing resilience if one layer fails or is bypassed.
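Layer 1 could be sketched roughly as follows. The function asks Verified Permissions for a decision and turns it into the IAM policy document that an API Gateway Lambda authorizer returns. The entity types (`RagApp::User`, `RagApp::Action`, `RagApp::Api`), the action and resource IDs, and the policy store ID are hypothetical. `StubAvp` is a stand-in so the sketch runs offline; in a real deployment the client would be `boto3.client("verifiedpermissions")`.

```python
from typing import Any


class StubAvp:
    """Offline stand-in for boto3.client("verifiedpermissions")."""

    def __init__(self, allowed_users: set[str]):
        self.allowed_users = allowed_users

    def is_authorized(self, **request: Any) -> dict:
        user = request["principal"]["entityId"]
        return {"decision": "ALLOW" if user in self.allowed_users else "DENY"}


def authorize_api_call(avp: Any, policy_store_id: str,
                       user_id: str, method_arn: str) -> dict:
    """Layer 1: decide whether the user may invoke the API at all."""
    response = avp.is_authorized(
        policyStoreId=policy_store_id,
        # Entity types and IDs below are hypothetical schema names.
        principal={"entityType": "RagApp::User", "entityId": user_id},
        action={"actionType": "RagApp::Action", "actionId": "InvokeApi"},
        resource={"entityType": "RagApp::Api", "entityId": "rag-query"},
    )
    # Fail closed: anything other than an explicit ALLOW becomes a Deny.
    effect = "Allow" if response.get("decision") == "ALLOW" else "Deny"
    return {
        "principalId": user_id,
        "policyDocument": {
            "Version": "2012-10-17",
            "Statement": [{
                "Action": "execute-api:Invoke",
                "Effect": effect,
                "Resource": method_arn,
            }],
        },
    }


arn = "arn:aws:execute-api:us-east-1:123456789012:abc123/prod/POST/query"
avp = StubAvp({"alice"})
alice_policy = authorize_api_call(avp, "ps-example", "alice", arn)
bob_policy = authorize_api_call(avp, "ps-example", "bob", arn)
```

Treating every non-`ALLOW` response as a deny keeps this layer fail-closed even if the response shape is unexpected.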

Ingestion and Query Flow

Ingestion Pipeline: Documents uploaded to Amazon S3 trigger an EventBridge event that is routed through SQS to a Lambda function. The function processes each document, extracts its metadata, and stores that metadata in a sidecar (e.g., DynamoDB) alongside the document in S3, so that documents carry the appropriate department metadata before Bedrock indexes them.
Query Flow: A client makes a request to the API Gateway. The Lambda Authorizer (Layer 1) checks basic API access via Verified Permissions. If authorized, the request proceeds to a middleware Lambda (Layer 2). This middleware Lambda uses the user's identity (e.g., from Cognito JWT) to query Verified Permissions, which evaluates Cedar policies to determine authorized document tags. The middleware then constructs a metadata filter based on these tags and passes it to the Bedrock `RetrieveAndGenerate` API. Only documents matching the filter are used for RAG.
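The Layer 2 step of the query flow can be sketched as two small functions: one asks Verified Permissions which department tags the user may query, the other turns those tags into a Knowledge Base metadata filter. The `in` operator and the `vectorSearchConfiguration.filter` shape follow the Bedrock retrieval filter format. The attribute name `department`, the entity types, the `Query` action, and the candidate tag list are assumptions for this example, and `StubAvp` replaces the real `boto3.client("verifiedpermissions")` so the sketch runs offline.

```python
from typing import Any

DEPARTMENT_KEY = "department"  # assumed metadata attribute name


class StubAvp:
    """Offline stand-in for boto3.client("verifiedpermissions")."""

    def __init__(self, grants: set[tuple[str, str]]):
        self.grants = grants

    def is_authorized(self, **request: Any) -> dict:
        pair = (request["principal"]["entityId"], request["resource"]["entityId"])
        return {"decision": "ALLOW" if pair in self.grants else "DENY"}


def authorized_tags(avp: Any, policy_store_id: str,
                    user_id: str, candidates: list[str]) -> list[str]:
    """Return the candidate department tags the user is allowed to query."""
    allowed = []
    for dept in candidates:
        response = avp.is_authorized(
            policyStoreId=policy_store_id,
            # Hypothetical schema names, one decision per department tag.
            principal={"entityType": "RagApp::User", "entityId": user_id},
            action={"actionType": "RagApp::Action", "actionId": "Query"},
            resource={"entityType": "RagApp::Department", "entityId": dept},
        )
        if response.get("decision") == "ALLOW":
            allowed.append(dept)
    return allowed


def build_retrieval_config(departments: list[str]) -> dict:
    """Build the retrievalConfiguration restricting results to the given tags."""
    if not departments:
        # Fail closed: never fall back to an unfiltered query.
        raise PermissionError("no authorized departments for this user")
    return {
        "vectorSearchConfiguration": {
            "filter": {"in": {"key": DEPARTMENT_KEY, "value": list(departments)}}
        }
    }


avp = StubAvp({("alice", "finance"), ("alice", "legal")})
tags = authorized_tags(avp, "ps-example", "alice", ["finance", "hr", "legal"])
config = build_retrieval_config(tags)
print(config)
```

The resulting dictionary would be passed as `retrievalConfiguration` inside the `knowledgeBaseConfiguration` of the `RetrieveAndGenerate` request. Raising on an empty tag list is a deliberate choice here: a user with no grants gets an error instead of an unfiltered search.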

⚠️

Isolation Model and Residual Risk

This pattern provides filter-level (logical) isolation, not IAM-enforced (infrastructure) isolation. The underlying Knowledge Base is a shared resource. A failure in the middleware logic constructing the filter could expose documents from other groups. For hard cross-tenant isolation where compliance mandates infrastructure separation, a dedicated Knowledge Base per tenant with IAM boundaries is recommended, with this pattern layered on top for finer-grained control within each tenant.
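One way to reduce the residual risk described above is to re-check the response before returning it. The article does not describe such a step; this is an illustrative sketch. It walks the `citations[].retrievedReferences[]` entries of a `RetrieveAndGenerate` response and flags any reference whose metadata tag falls outside the user's authorized set. The attribute name `department` and the sample response are assumptions.

```python
def unauthorized_references(response: dict, allowed: set[str],
                            key: str = "department") -> list[dict]:
    """Return retrieved references whose tag is missing or not in `allowed`."""
    leaked = []
    for citation in response.get("citations", []):
        for ref in citation.get("retrievedReferences", []):
            # A reference without the tag is treated as unauthorized (fail closed).
            if ref.get("metadata", {}).get(key) not in allowed:
                leaked.append(ref)
    return leaked


# Hand-written sample in the shape of a RetrieveAndGenerate response.
sample = {
    "citations": [
        {"retrievedReferences": [
            {"metadata": {"department": "finance"}},
            {"metadata": {"department": "hr"}},
        ]},
        {"retrievedReferences": [{"metadata": {}}]},
    ]
}
flagged = unauthorized_references(sample, {"finance"})
print(len(flagged))
```

If the list is non-empty, the middleware could discard the answer and log the event, which turns a silent filter bug into a visible failure.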

The architecture promotes efficiency by avoiding separate Knowledge Base instances per department, reducing cost and operational overhead. The dynamic policy evaluation through Amazon Verified Permissions significantly improves agility in managing access rules, eliminating the need for code changes and deployments for policy updates.

RAG · Multi-tenancy · Authorization · AWS Bedrock · AWS Verified Permissions · Metadata Filtering · Serverless · Access Control

Architecture Design

Design this yourself

Design an enterprise RAG system that supports multiple departments within a single organization, each requiring fine-grained, document-level access control. The system must use a shared knowledge base to optimize costs and operational overhead, dynamically evaluate authorization policies at runtime to filter content, and allow access rules to be updated without code deployments. Detail the authorization architecture, including how documents are tagged, policies are managed, and requests are processed to enforce security.

Practice Interview

Focus: fine-grained authorization for multi-tenant RAG using externalized policies and metadata filtering

Other design angles

· Design a multi-tenant SaaS RAG platform where each customer requires strict IAM-level isolation, and then integrate fine-grained authorization for internal teams within each customer's tenant.
· Focus on designing the policy management and evaluation service for a RAG system, detailing the schema for policies (e.g., Cedar) and the interaction with a data store to fetch authorized metadata filters.
· Design the ingestion pipeline for a secure RAG system, specifically focusing on how documents are processed, enriched with metadata for authorization, and indexed while preventing accidental exposure of sensitive data before policies are applied.