AWS Architecture Blog·December 11, 2025

Architecting a Generative AI-Powered Observability Assistant for Kubernetes

This article outlines an architecture for building a generative AI-powered troubleshooting assistant for Kubernetes environments. It addresses the observability challenges of distributed microservices by combining Retrieval-Augmented Generation (RAG) with large language models (LLMs) to process telemetry data and provide self-service diagnostics, with the goal of reducing Mean Time to Recovery (MTTR) and operational overhead.


Modern cloud applications, often built as microservices on platforms like Kubernetes, introduce significant challenges for observability and troubleshooting due to their distributed nature. Engineers frequently spend considerable time manually correlating logs, metrics, and events across various layers, leading to increased Mean Time to Recovery (MTTR) and requiring deep domain expertise. This complexity highlights a critical need for more efficient diagnostic tools.

Challenges of Observability in Distributed Systems

Distributed systems, while offering flexibility and scalability, complicate troubleshooting. Kubernetes, for instance, involves multiple abstraction layers (pods, nodes, networking) and generates vast amounts of telemetry data (kubelet logs, application logs, events, metrics). Making sense of this data requires both system and application knowledge, leading to skill gaps and prolonged resolution times. Industry surveys cite lack of team knowledge as a leading observability challenge, and reported MTTR has been trending upward.

Generative AI-Powered Troubleshooting Assistant Architecture

The proposed solution is an AI assistant that combines LLM-driven analysis with existing telemetry. The architecture primarily consists of three parts: deployment approach, telemetry collection and storage, and an interactive troubleshooting interface. The article details a RAG-based chatbot approach, which can be extended to other compute services beyond Amazon EKS.

Telemetry Collection and Storage Pipeline

A crucial first step is establishing a robust pipeline for collecting, processing, and storing telemetry. Fluent Bit is used to stream telemetry (application logs, kubelet logs, Kubernetes events) into Amazon Kinesis Data Streams. AWS Lambda functions normalize this data, Amazon Bedrock generates vector embeddings, and OpenSearch Serverless stores these embeddings for efficient semantic retrieval. This serverless approach minimizes infrastructure management overhead.
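The ingestion step described above can be sketched as a Lambda handler that decodes Kinesis records, requests embeddings from Amazon Bedrock, and prepares documents for OpenSearch Serverless. This is a minimal illustration, not the article's implementation: the field names in `normalize_record`, the Titan embedding model ID, and the index step are assumptions.

```python
import base64
import json

def normalize_record(kinesis_record):
    """Decode one raw Kinesis record into a flat telemetry document.

    Field names (``time``, ``source``, ``log``) are assumed for this
    sketch; a real pipeline would match the Fluent Bit output schema.
    """
    payload = base64.b64decode(kinesis_record["kinesis"]["data"])
    doc = json.loads(payload)
    return {
        "timestamp": doc.get("time"),
        "source": doc.get("source", "unknown"),  # e.g. kubelet, app, k8s event
        "message": doc.get("log", json.dumps(doc)),
    }

def handler(event, context):
    # Clients would normally be created once outside the handler;
    # shown inline here for readability (boto3 + bedrock-runtime).
    import boto3
    bedrock = boto3.client("bedrock-runtime")

    docs = [normalize_record(r) for r in event["Records"]]
    for doc in docs:
        resp = bedrock.invoke_model(
            modelId="amazon.titan-embed-text-v2:0",  # assumed embedding model
            body=json.dumps({"inputText": doc["message"]}),
        )
        doc["embedding"] = json.loads(resp["body"].read())["embedding"]

    # ...bulk-index `docs` into the OpenSearch Serverless collection here,
    # e.g. via opensearch-py's bulk helper.
    return {"indexed": len(docs)}
```

The serverless pieces (Kinesis trigger, IAM permissions for Bedrock and OpenSearch) are wired up outside this code, in the function's configuration.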

💡

Batching for Efficiency

For enhanced performance and cost-efficiency, ensure your Lambda functions utilize batching when ingesting data from Kinesis, generating embeddings, and storing them in OpenSearch. This reduces the number of invocations and optimizes resource utilization.
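A minimal sketch of the batching pattern: grouping documents lets one Lambda invocation amortize client setup and issue a single OpenSearch bulk request per batch instead of one index call per record. The batch size of 25 and the `embed`/`bulk_index` helper names are illustrative assumptions.

```python
def batched(items, batch_size):
    """Yield successive fixed-size batches from a list."""
    for i in range(0, len(items), batch_size):
        yield items[i:i + batch_size]

# Usage sketch: embed and bulk-index 25 documents at a time.
# for batch in batched(docs, 25):
#     embeddings = [embed(d["message"]) for d in batch]  # Bedrock calls
#     bulk_index(batch, embeddings)                       # one _bulk request
```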

Iterative Troubleshooting with RAG Chatbot

  1. Engineer queries the chatbot (e.g., "Pod stuck in pending state.")
  2. Chatbot sends query to Bedrock for vector embeddings.
  3. Chatbot retrieves semantically matching telemetry from OpenSearch.
  4. Augmented prompt (query + telemetry) is sent to LLM, which suggests `kubectl` commands.
  5. Chatbot forwards commands to a read-only troubleshooting assistant in the EKS cluster; output is returned.
  6. LLM evaluates output, decides to continue investigation or provide resolution.
  7. Final resolution, including query, telemetry, and investigation results, is composed by the chatbot and returned to the engineer.
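Step 5 above depends on the in-cluster assistant executing only read-only commands. One way to enforce that, sketched below under assumptions not stated in the article, is an allowlist guard applied to each LLM-suggested command before execution; the verb set here is a hypothetical choice.

```python
import shlex

# Assumed allowlist of non-mutating kubectl verbs for the troubleshooting
# assistant; anything outside this set is rejected before execution.
READ_ONLY_VERBS = {"get", "describe", "logs", "top", "explain", "events"}

def is_read_only(command: str) -> bool:
    """Return True only for kubectl commands that use an allowed verb."""
    parts = shlex.split(command)
    return len(parts) >= 2 and parts[0] == "kubectl" and parts[1] in READ_ONLY_VERBS
```

Pairing a guard like this with a read-only Kubernetes RBAC role gives defense in depth: even if the filter misses a case, the assistant's service account cannot mutate cluster state.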
Tags: Kubernetes, Observability, Troubleshooting, Generative AI, LLM, RAG, Telemetry, Distributed Tracing, AWS
