Cloudflare Blog·March 26, 2026

Optimizing Kubernetes Persistent Volume Restarts for Large File Systems

This article summarizes a Cloudflare case study in which slow Kubernetes StatefulSet restarts, caused by recursive file ownership changes on a large Persistent Volume, cost engineers a significant amount of blocked time. It walks through the debugging process that identified the root cause, the default Kubernetes `fsGroupChangePolicy`, and the one-line fix that dramatically reduced restart times.

Read original on Cloudflare Blog

The Challenge: Slow Kubernetes StatefulSet Restarts

Cloudflare encountered a recurring issue where restarting their Atlantis Kubernetes StatefulSet, which manages Terraform changes, took approximately 30 minutes. With around 100 restarts per month for credential rotations and onboarding, this amounted to roughly 50 hours of blocked engineering time every month. The problem stemmed from a Kubernetes default behavior that scales poorly on a PersistentVolume containing millions of files.
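The monthly figure follows directly from the numbers quoted above. As a quick back-of-the-envelope check (the function name is illustrative and not from the original post):

```python
def blocked_hours(restarts: int, minutes_per_restart: float) -> float:
    """Total waiting time in hours for a given number of restarts."""
    return restarts * minutes_per_restart / 60


# Figures quoted in the article: ~100 restarts a month at ~30 minutes each.
monthly_hours = blocked_hours(restarts=100, minutes_per_restart=30)
annual_hours = monthly_hours * 12

print(f"{monthly_hours:.0f} hours per month, {annual_hours:.0f} hours per year")
```

This prints `50 hours per month, 600 hours per year`, which lines up with the annual savings cited later in the article.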

Debugging the Bottleneck: A Deep Dive into Kubelet Logs

Initial investigations using `kubectl events` provided limited insight, only showing the pod waiting for an init container. To uncover the true bottleneck, the team analyzed `kubelet` logs on the affected node. This revealed a significant delay between the Persistent Volume being mounted and the pod actually starting, accompanied by `Error syncing pod` messages related to unmounted volumes and context deadlines.

💡

Debugging Kubernetes Deep Dives

When Kubernetes events and basic pod descriptions don't reveal the problem, checking the `kubelet` logs on the node where the pod is scheduled can provide crucial low-level insights into volume mounting, container runtime issues, and other host-level interactions.
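Kubelet logs are noisy, so it helps to narrow them to lines that mention a single volume. The following is a minimal sketch of such a filter. It assumes the log has already been captured as text; the function name is hypothetical, and nothing here depends on a particular kubelet log format.

```python
from typing import Iterable, Iterator


def lines_mentioning(lines: Iterable[str], needle: str) -> Iterator[str]:
    """Yield only the log lines that contain the given substring.

    `needle` would typically be the Persistent Volume name.
    """
    for line in lines:
        if needle in line:
            # Strip the trailing newline so callers can print lines directly.
            yield line.rstrip("\n")
```

On a node where the kubelet runs under systemd (an assumption; setups differ), the output of `journalctl -u kubelet` could be piped into a script that calls `lines_mentioning(sys.stdin, "<pv-name>")` and prints each result.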

Identifying the `fsGroupChangePolicy` Culprit

Further log analysis, filtered on the Persistent Volume name, exposed a critical log message: `Setting volume ownership for ... and fsGroup set. If the volume has a lot of files then setting volume ownership could be slow`. This pointed directly to `fsGroupChangePolicy` as the issue. When a pod's `securityContext` specifies an `fsGroup`, the default policy, `fsGroupChangePolicy: Always`, recursively changes the ownership and permissions of every file and directory on the volume to match that `fsGroup` each time the volume is mounted.
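The cost of a full recursive pass grows with the number of entries on the volume, because each one has to be visited. A minimal Python sketch of that scaling follows. It assumes a plain directory tree stands in for the mounted volume, uses a hypothetical function name, and only counts entries rather than changing ownership; it is not the kubelet's implementation.

```python
import os


def entries_visited_by_full_pass(volume_root: str) -> int:
    """Count the entries a recursive ownership pass would have to visit.

    Dry run only: nothing is modified. Symlinked directories are counted
    but not followed, which is os.walk's default behavior.
    """
    visited = 1  # the volume root itself
    for _dirpath, dirnames, filenames in os.walk(volume_root):
        visited += len(dirnames) + len(filenames)
    return visited
```

On a volume with millions of files, the return value is in the millions, and a real pass would perform at least one ownership change per entry on every mount.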

The Solution: A One-Line Configuration Change

The fix involved changing the `fsGroupChangePolicy` from its default `Always` to `OnRootMismatch` within the pod's `securityContext`. This setting, enabled by default since Kubernetes v1.20 (and stable since v1.23), changes ownership and permissions only if the root directory of the PV does not already match what is expected, which avoids a recursive traversal of millions of files. This simple modification reduced Atlantis restart times from about 30 minutes to approximately 30 seconds, saving Cloudflare roughly 600 engineering hours annually.

```yaml
spec:
  template:
    spec:
      securityContext:
        # Skip the recursive ownership pass when the volume root already matches fsGroup.
        fsGroupChangePolicy: OnRootMismatch
```
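The difference between the two policies can be modeled as a single check on the volume root. The sketch below is a simplified illustration in Python, not the kubelet's code: the function names are hypothetical, and `required_mode` (group read, write, and execute) is an assumption, since the article does not spell out which permission bits are compared.

```python
import os
import stat


def root_matches(volume_root: str, fs_group: int, required_mode: int = 0o070) -> bool:
    """Return True if the volume root already has the expected group and mode bits.

    required_mode is an illustrative assumption (group rwx).
    """
    info = os.stat(volume_root)
    group_ok = info.st_gid == fs_group
    mode_ok = (stat.S_IMODE(info.st_mode) & required_mode) == required_mode
    return group_ok and mode_ok


def needs_recursive_pass(volume_root: str, fs_group: int, policy: str = "Always") -> bool:
    """Decide whether a full recursive ownership pass would run under a policy."""
    if policy == "OnRootMismatch" and root_matches(volume_root, fs_group):
        return False  # one stat call on the root, no tree walk
    return True  # "Always", or the root does not match
```

One consequence of this design is that `OnRootMismatch` trusts the root directory: if files deeper in the tree have the wrong ownership while the root looks correct, they are left untouched.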
