InfoQ Architecture · June 25, 2026

Grab's Secure Agentic AI Workload Platform (Palana)

Grab developed Palana, a Kubernetes-native platform, to securely run autonomous AI agent workloads. This platform addresses the inherent security risks of non-deterministic, model-driven applications by providing isolated runtime environments, robust secrets management, and centralized egress control. It leverages Kubernetes custom resources for scalable and auditable management of agent lifecycles.

Security · AI & ML Infrastructure · Cloud & Infrastructure

The rise of autonomous AI agents, capable of executing arbitrary tools and making API calls, introduces significant security challenges like prompt injection and logic hijacking. Grab's platform engineering and cybersecurity teams tackled these by building Palana, a proprietary, Kubernetes-native secure execution platform. Palana's core purpose is to provide a secure, isolated runtime environment for these non-deterministic, model-driven applications.

Isolation as the Primary Unit of Trust

Palana implements a zero-trust model in which isolation is the fundamental security principle. Each AI agent is assigned its own dedicated Kubernetes namespace. That namespace is configured with restrictive Role-Based Access Control (RBAC), custom network policies, and an isolated service account, so that a compromised agent cannot move laterally or affect other workloads. Agents also receive persistent, local storage so that state survives container restarts during long-running workflows.
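
As a rough illustration of per-agent isolation, the sketch below builds the three kinds of manifests the paragraph mentions as plain Python dictionaries. It is not Palana's actual configuration: the function name isolation_manifests, the "agent-" namespace prefix, and the "egress-proxy" namespace are assumptions made for the example. The manifest field names themselves are standard Kubernetes API fields.

```python
import json


def isolation_manifests(agent_name: str, proxy_namespace: str = "egress-proxy") -> list[dict]:
    """Build per-agent Namespace, ServiceAccount, and NetworkPolicy manifests."""
    namespace = f"agent-{agent_name}"  # hypothetical naming scheme
    return [
        {
            "apiVersion": "v1",
            "kind": "Namespace",
            "metadata": {"name": namespace},
        },
        {
            "apiVersion": "v1",
            "kind": "ServiceAccount",
            "metadata": {"name": agent_name, "namespace": namespace},
            # Do not mount a Kubernetes API token into the agent's pods.
            "automountServiceAccountToken": False,
        },
        {
            "apiVersion": "networking.k8s.io/v1",
            "kind": "NetworkPolicy",
            "metadata": {"name": "agent-isolation", "namespace": namespace},
            "spec": {
                "podSelector": {},  # empty selector matches every pod in the namespace
                "policyTypes": ["Ingress", "Egress"],
                "ingress": [],  # no ingress rules: all inbound traffic is denied
                "egress": [
                    {
                        # Only the (assumed) proxy namespace is reachable.
                        "to": [
                            {
                                "namespaceSelector": {
                                    "matchLabels": {
                                        "kubernetes.io/metadata.name": proxy_namespace
                                    }
                                }
                            }
                        ]
                    }
                ],
            },
        },
    ]


if __name__ == "__main__":
    print(json.dumps(isolation_manifests("invoice-bot"), indent=2))
```

A real policy would probably also need to allow DNS lookups and whatever ingress path the platform exposes; those are left out to keep the sketch short.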

Advanced Secrets Management and Egress Control

Traditional secrets management methods (e.g., environment variables) are too risky for autonomous agents, because an agent that can execute arbitrary tools can also read and leak anything in its own environment. Palana splits secrets into two classes: agent-readable credentials and proxy-only secrets. Sensitive credentials reside in HashiCorp Vault, and agents receive only abstract placeholder tokens. Outbound API calls are intercepted by a secure intermediate proxy that validates the destination and dynamically replaces placeholders with real secrets, so raw secrets never touch the agent container's environment or memory.
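
The placeholder mechanism can be sketched in a few lines. This is an illustration under stated assumptions, not Palana's proxy: the "{{secret:name}}" token syntax, the bindings structure, and the names substitute_placeholders and EgressDenied are all hypothetical, and a real proxy would fetch values from the secret store rather than from an in-memory dictionary.

```python
import re

# Hypothetical placeholder syntax; the article does not specify the real token format.
PLACEHOLDER = re.compile(r"\{\{secret:([A-Za-z0-9_-]+)\}\}")


class EgressDenied(Exception):
    """Raised when a placeholder is used against a destination it is not bound to."""


def substitute_placeholders(
    host: str, headers: dict[str, str], bindings: dict[str, dict]
) -> dict[str, str]:
    """Return a copy of headers with placeholders replaced by real secrets.

    bindings maps a placeholder name to {"value": <secret>, "hosts": <allowed hosts>}.
    """

    def resolve(match: re.Match) -> str:
        name = match.group(1)
        binding = bindings.get(name)
        # Validate the destination before releasing the secret.
        if binding is None or host not in binding["hosts"]:
            raise EgressDenied(f"placeholder {name!r} is not allowed for {host}")
        return binding["value"]

    return {key: PLACEHOLDER.sub(resolve, value) for key, value in headers.items()}


if __name__ == "__main__":
    demo_bindings = {
        "payments_api": {"value": "s3cr3t", "hosts": {"api.payments.example"}}
    }
    request_headers = {"Authorization": "Bearer {{secret:payments_api}}"}
    print(substitute_placeholders("api.payments.example", request_headers, demo_bindings))
```

Binding each placeholder to a set of allowed hosts means that an agent tricked into sending its token to an unexpected destination gets a denial rather than a leaked credential.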

Egress Control Mechanism

All outbound HTTP/HTTPS traffic is routed through an Envoy proxy and an external authorization service running Open Policy Agent (OPA) rules. This setup enables real-time traffic decryption (using Man-in-the-Middle CA termination), header evaluation, endpoint validation, and token substitution, all while generating detailed audit trails. This centralized control point is crucial for monitoring and securing agent communications.
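
To make the authorization step concrete, here is a minimal sketch of an allow/deny decision that also emits an audit record. It uses Python as a stand-in for the OPA rules the article describes; it is not Rego and not Envoy's external authorization API. The policy shape (per-agent rules with host, methods, and path_prefix) and the names Decision, authorize, and audit_line are assumptions for illustration.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class Decision:
    agent: str
    method: str
    host: str
    path: str
    allowed: bool
    reason: str


def authorize(agent: str, method: str, host: str, path: str, policy: dict) -> Decision:
    """Evaluate one outbound request against a per-agent allowlist (hypothetical policy shape)."""
    for rule in policy.get(agent, []):
        if (
            host == rule["host"]
            and method in rule["methods"]
            and path.startswith(rule["path_prefix"])
        ):
            return Decision(agent, method, host, path, True, "matched allow rule")
    # Default deny: anything not explicitly allowed is rejected.
    return Decision(agent, method, host, path, False, "no matching allow rule")


def audit_line(decision: Decision) -> str:
    """Serialize a decision as one JSON line for the audit trail."""
    return json.dumps(asdict(decision), sort_keys=True)


if __name__ == "__main__":
    demo_policy = {
        "invoice-bot": [
            {"host": "api.payments.example", "methods": {"GET", "POST"}, "path_prefix": "/v1/"}
        ]
    }
    print(audit_line(authorize("invoice-bot", "POST", "api.payments.example", "/v1/charges", demo_policy)))
    print(audit_line(authorize("invoice-bot", "POST", "evil.example", "/v1/charges", demo_policy)))
```

Logging denied requests alongside allowed ones gives the audit trail a record of what an agent attempted, not only what it was permitted to do.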

Kubernetes-Native Operational Model

Custom Resource Definition (CRD): Each agent is modeled as a Kubernetes Custom Resource.
Custom Operator: A Kubernetes operator reconciles these custom resources, dynamically provisioning necessary infrastructure components like namespaces, storage, network policies, and ingress paths.
Separation of Concerns: This design provides a simplified user interface/CLI for developers and a robust, standard Kubernetes layer for systems engineers, facilitating programmatic auditing, updates, and lifecycle management of hundreds of concurrent agent workloads.
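
The operator pattern described in this list can be sketched as a pure function that compares desired and actual state. This is a simplified illustration, not Palana's operator: the shape of the agent resource, the "storage" field, the "Kind/name" keys, and the function names desired_children and reconcile are assumptions. A production operator would typically be built on a Kubernetes client library and would apply the resulting steps against the cluster.

```python
def desired_children(agent: dict) -> dict[str, dict]:
    """Map a hypothetical agent custom resource to the child resources it should own."""
    name = agent["metadata"]["name"]
    spec = agent["spec"]
    namespace = f"agent-{name}"  # hypothetical naming scheme
    return {
        f"Namespace/{namespace}": {"kind": "Namespace", "name": namespace},
        f"NetworkPolicy/{namespace}": {"kind": "NetworkPolicy", "namespace": namespace},
        f"PersistentVolumeClaim/{namespace}": {
            "kind": "PersistentVolumeClaim",
            "namespace": namespace,
            "size": spec.get("storage", "1Gi"),
        },
    }


def reconcile(agent: dict, actual: dict[str, dict]) -> list[tuple[str, str]]:
    """Return the (action, key) steps needed to move actual state to desired state."""
    desired = desired_children(agent)
    steps = []
    for key, resource in desired.items():
        if key not in actual:
            steps.append(("create", key))
        elif actual[key] != resource:
            steps.append(("update", key))
    # Anything that exists but is no longer desired gets removed.
    for key in actual:
        if key not in desired:
            steps.append(("delete", key))
    return steps


if __name__ == "__main__":
    demo_agent = {"metadata": {"name": "invoice-bot"}, "spec": {"storage": "5Gi"}}
    for action, key in reconcile(demo_agent, actual={}):
        print(action, key)
```

Because the function depends only on the declared resource and the observed state, running it repeatedly is safe: once the two match, it returns no steps.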

Palana keeps critical operational controls, such as network-level kill switches and idle shutdowns, entirely outside the agent's execution runtime. A compromised agent therefore cannot block its own termination, which reinforces the platform's security posture.
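
A minimal sketch of such an external control is shown below. It assumes the platform, not the agent, records the last observed activity (for example, at the egress proxy) and evaluates the decision in a separate controller. The function name shutdown_reason and its parameters are hypothetical and are not taken from Palana.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def shutdown_reason(
    last_activity: datetime,
    now: datetime,
    idle_timeout: timedelta,
    kill_switch: bool,
) -> Optional[str]:
    """Decide, from outside the agent, whether its workload should be stopped.

    Returns a human-readable reason, or None if the agent may keep running.
    """
    # The kill switch takes priority over everything else.
    if kill_switch:
        return "kill switch engaged"
    if now - last_activity >= idle_timeout:
        return "idle timeout exceeded"
    return None


if __name__ == "__main__":
    current = datetime.now(timezone.utc)
    print(shutdown_reason(current - timedelta(minutes=45), current, timedelta(minutes=30), False))
```

The inputs are deliberately limited to values the platform observes itself, so nothing the agent reports about its own state can influence the outcome.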

Kubernetes · AI Agents · Zero Trust · Secrets Management · Envoy · Open Policy Agent · Custom Resources · Platform Engineering
