Cloudflare Blog·March 26, 2026

Optimizing Kubernetes Persistent Volume Restarts for Large File Systems

This article details a Cloudflare case study in which slow Kubernetes StatefulSet restarts, caused by recursive file ownership changes on a large Persistent Volume, blocked a significant amount of engineering time. It walks through the debugging process that identified the root cause (the default Kubernetes `fsGroupChangePolicy` behavior) and the one-line fix that dramatically reduced restart times and improved operational efficiency.


The Challenge: Slow Kubernetes StatefulSet Restarts

Cloudflare encountered a recurring issue where restarting their Atlantis Kubernetes StatefulSet, responsible for managing Terraform changes, took approximately 30 minutes. With around 100 restarts per month for credential rotations and onboarding, this amounted to over 50 hours of blocked engineering time monthly. The problem stemmed from a Kubernetes default behavior interacting inefficiently with a PersistentVolume containing millions of files.
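Using the approximate figures above (about 100 restarts per month at roughly 30 minutes each), the monthly cost works out as follows:

$$
100\ \tfrac{\text{restarts}}{\text{month}} \times 30\ \tfrac{\text{min}}{\text{restart}} = 3000\ \tfrac{\text{min}}{\text{month}} = 50\ \tfrac{\text{h}}{\text{month}}
$$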

Debugging the Bottleneck: A Deep Dive into Kubelet Logs

Initial investigations using `kubectl events` provided limited insight, only showing the pod waiting for an init container. To uncover the true bottleneck, the team analyzed `kubelet` logs on the affected node. This revealed a significant delay between the Persistent Volume being mounted and the pod actually starting, accompanied by `Error syncing pod` messages related to unmounted volumes and context deadlines.

Debugging Kubernetes Deep Dives

When Kubernetes events and basic pod descriptions don't reveal the problem, checking the `kubelet` logs on the node where the pod is scheduled can provide crucial low-level insights into volume mounting, container runtime issues, and other host-level interactions.

Identifying the `fsGroupChangePolicy` Culprit

Filtering the logs further for the Persistent Volume's name exposed a critical message: `Setting volume ownership for ... and fsGroup set. If the volume has a lot of files then setting volume ownership could be slow`. This pointed directly at the volume ownership change governed by `fsGroupChangePolicy`. When `fsGroupChangePolicy` is not set, Kubernetes behaves as if it were `Always`: every time the volume is mounted, the kubelet recursively changes the group ownership (and permissions) of every file and directory on it to match the `fsGroup` specified in the pod's `securityContext`.
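The following hypothetical StatefulSet fragment shows the combination that triggers this behavior: `fsGroup` is set but `fsGroupChangePolicy` is not. The group ID `1000`, the container name, and the volume name `atlantis-data` are placeholders and are not taken from Cloudflare's manifest.

```yaml
# Hypothetical StatefulSet fragment; IDs and names are placeholders.
spec:
  template:
    spec:
      securityContext:
        fsGroup: 1000  # hypothetical group ID
        # fsGroupChangePolicy is not set, so the default ("Always") applies:
        # on every mount, the kubelet recursively changes group ownership and
        # permissions of every file and directory on the volume.
      containers:
        - name: atlantis
          volumeMounts:
            - name: atlantis-data  # hypothetical volume name
              mountPath: /atlantis-data
  volumeClaimTemplates:
    - metadata:
        name: atlantis-data
```

On a claim holding millions of files, this walk has to finish before the pod can start, which is why each restart stalled between the volume mount and pod startup.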

The Solution: A One-Line Configuration Change

The fix was to change `fsGroupChangePolicy` from its default, `Always`, to `OnRootMismatch` in the pod's `securityContext`. With this setting (beta and enabled by default since Kubernetes v1.20, GA since v1.23), the kubelet changes ownership and permissions only if the root directory of the volume does not already match the expected `fsGroup` ownership and permissions, so it skips the recursive traversal of millions of files on each restart. This one-line change reduced Atlantis restart times from about 30 minutes to about 30 seconds, saving Cloudflare roughly 600 engineering hours annually.

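```yaml
spec:
  template:
    spec:
      securityContext:
        # Only takes effect when fsGroup is also set in this securityContext.
        # OnRootMismatch skips the recursive ownership/permission change when
        # the volume root already matches the expected fsGroup and permissions.
        fsGroupChangePolicy: OnRootMismatch
```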
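Using the approximate figures reported (about 100 restarts per month, each dropping from roughly 30 minutes to roughly 30 seconds), the annual savings work out as follows:

$$
(30 - 0.5)\ \tfrac{\text{min}}{\text{restart}} \times 100\ \tfrac{\text{restarts}}{\text{month}} \times 12\ \text{months} = 35{,}400\ \text{min} = 590\ \text{h} \approx 600\ \tfrac{\text{h}}{\text{year}}
$$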
