InfoQ Cloud·June 9, 2026

Automating Code Changes Across Diverse Software Fleets at Scale

Netflix developed an event-driven orchestration platform to automate code changes and migrations across its vast and diverse software fleet, aiming to reduce migration times from months to days. This platform uses composable, 'Lego-like' steps, integrates automated canary validation, and incorporates compliance checks to ensure safety and confidence in large-scale changes. The core architectural challenge was to balance flexibility for unique migrations with the need for standardized, repeatable processes for common updates.


The Challenge: Managing Code Migrations in a Diverse Fleet

Netflix faced a significant challenge with software migrations: because updates were adopted slowly, libraries often had many active versions at once, leaving a "long tail" of maintenance. Critical vulnerabilities such as the Log4j vulnerability highlighted the need for rapid, fleet-wide code changes. The goal was to complete any automated code change across the fleet within a week, and fixes for critical vulnerabilities within two days, with minimal effort for both platform teams and software owners. Key requirements included handling the diverse characteristics of the fleet (languages, security, business units, monorepos vs. microservices) and ensuring changes were applied safely without breaking production systems.

Architecting an Event-Driven Orchestration Platform

Netflix's solution is a fleet-wide automation platform centered on an event-driven orchestration engine. Platform teams create "campaigns" that update "targets" (software units) along a defined "path" of automated steps. The architecture decouples the state machine from the event consumer, so events can originate from various internal and external systems. As a result, the system can react to diverse triggers and progress changes asynchronously.

  • Campaigns: Initiated by platform teams to drive a migration.
  • Targets: Individual software units undergoing the migration.
  • Path: A sequence of automated, composable steps.
  • Rollout: The orchestration and progression through the steps.
  • Deployment: The actual delivery of the change to infrastructure.
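The vocabulary above, and the split between event consumer and state machine, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Netflix's implementation: the class names, the `advance` function, and the `step_succeeded` event kind are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass(frozen=True)
class Event:
    source: str      # where the event came from, e.g. "ci" or "canary" (illustrative)
    target_id: str   # which target the event is about
    kind: str        # hypothetical event kind, e.g. "step_succeeded"


@dataclass
class Target:
    target_id: str
    step_index: int = 0  # position of this target along the campaign's path


@dataclass
class Campaign:
    name: str
    path: list[str]  # ordered step names
    targets: dict[str, Target] = field(default_factory=dict)


def advance(campaign: Campaign, event: Event) -> str:
    """State machine: transition logic only, unaware of how events arrive."""
    target = campaign.targets[event.target_id]
    if event.kind == "step_succeeded" and target.step_index < len(campaign.path):
        target.step_index += 1
    if target.step_index == len(campaign.path):
        return "done"
    return campaign.path[target.step_index]


class EventConsumer:
    """Accepts events from any source and forwards them to the state machine."""

    def __init__(self, campaign: Campaign,
                 transition: Callable[[Campaign, Event], str]):
        self.campaign = campaign
        self.transition = transition

    def consume(self, event: Event) -> str:
        return self.transition(self.campaign, event)
```

Because the consumer only forwards events, a new event source can be added without touching the transition logic, which is the point of the decoupling described above.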

Composable Steps and Workflow Management

At the heart of the platform are composable, predefined units of automation, likened to Lego bricks. Each step has its own state, allowing for flexible path creation to accommodate unique migration requirements while also offering pre-configured paths for common updates (e.g., dependency updates). The state machine processes incoming events, determines the next step, updates step states, launches child workflows (step handlers) for specific automation tasks, and manages edge cases like pausing, resuming, and failure handling.
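A rough sketch of such a state machine follows. It assumes a linear path, one running step at a time, and plain callables standing in for child workflows; the `Rollout` class, the event names, and the `StepState` values are hypothetical and chosen only for illustration.

```python
from enum import Enum
from typing import Callable, Optional


class StepState(Enum):
    PENDING = "pending"
    RUNNING = "running"
    SUCCEEDED = "succeeded"
    FAILED = "failed"


class Rollout:
    """Tracks per-step state and launches the handler for the next step."""

    def __init__(self, path: list[str], handlers: dict[str, Callable[[], None]]):
        self.path = path
        self.handlers = handlers  # stand-ins for child workflows
        self.states = {name: StepState.PENDING for name in path}
        self.paused = False

    def _launch_next(self) -> None:
        current = self.states.values()
        # Do nothing while paused, after a failure, or while a step is running.
        if self.paused or StepState.FAILED in current or StepState.RUNNING in current:
            return
        for name in self.path:
            if self.states[name] is StepState.PENDING:
                self.states[name] = StepState.RUNNING
                self.handlers[name]()
                return

    def handle(self, event: str, step: Optional[str] = None) -> None:
        if event == "start":
            self._launch_next()
        elif event == "pause":
            self.paused = True
        elif event == "resume":
            self.paused = False
            self._launch_next()
        elif event == "step_succeeded" and step is not None:
            self.states[step] = StepState.SUCCEEDED
            self._launch_next()
        elif event == "step_failed" and step is not None:
            self.states[step] = StepState.FAILED  # rollout halts here
```

In this sketch a pause does not cancel the running step; it only prevents the next one from launching, and a later `resume` event picks up where the path left off.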

Building Confidence Through Validation and Phased Rollouts

Safety First: Automated Canary and Compliance Checks

To ensure safety, the platform implements several checks:

  • Draft Pull Requests: Changes are initially made in draft PRs, awaiting all PR checks to pass.
  • Automated Canary Validation: Integration with resilience teams enables canary deployments. If a canary fails, the rollout stops, preventing broader impact.
  • Phased Rollouts: Changes are rolled out by criticality, allowing early detection of issues in lower-risk applications.
  • Compliance Checks: Ensures changes align with team preferences and security requirements.
  • Easy Interventions: Provides a 'big red stop button' for manual pauses at any point.
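How phased rollout, canary gating, and the stop button interact can be illustrated with a short sketch. It assumes criticality is a simple integer (lower means less critical) and that the canary verdict and the stop signal are supplied as callables; `phased_rollout` and its parameters are hypothetical names, not part of the platform described here.

```python
from typing import Callable


def phased_rollout(
    targets: dict[str, int],
    canary_passes: Callable[[str], bool],
    stop_requested: Callable[[], bool] = lambda: False,
) -> tuple[list[str], str]:
    """Roll out to targets in order of increasing criticality.

    `targets` maps a target name to its criticality (lower goes first).
    Returns the targets that were updated and the reason the rollout ended.
    """
    updated: list[str] = []
    # Sort by criticality, then by name so the order is deterministic.
    for name in sorted(targets, key=lambda n: (targets[n], n)):
        if stop_requested():
            return updated, "stopped by operator"
        if not canary_passes(name):
            # A failed canary halts the rollout before more critical targets.
            return updated, f"canary failed on {name}"
        updated.append(name)
    return updated, "completed"
```

With a fleet such as `{"billing": 3, "internal-tool": 1, "search": 2}`, a canary failure on `search` stops the rollout after `internal-tool`, so `billing` is never touched.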

The typical migration path starts with a code transform step (using custom scripts, GenAI-prompted containers, or pre-configured codemods), followed by draft pull request creation and then an extensive validation step. Validation relies on automated canaries, which test a change on a small subset of production before widespread deployment and significantly boost confidence in automated changes.
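The transform, draft PR, and validation sequence can be modeled as a list of interchangeable step functions. The sketch below is illustrative only: the `Change` record, the step factories, and `run_path` are hypothetical, the codemod is a plain string function, and no real pull request or canary system is called.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Change:
    target: str
    source: str                      # stand-in for the target's code
    log: list[str] = field(default_factory=list)
    draft_pr_open: bool = False
    validated: bool = False


Step = Callable[[Change], Change]


def transform_with(codemod: Callable[[str], str]) -> Step:
    """Build a transform step from any codemod (script, codemod, or other tool)."""
    def step(change: Change) -> Change:
        change.source = codemod(change.source)
        change.log.append("transform")
        return change
    return step


def open_draft_pr(change: Change) -> Change:
    """Pretend to open a draft pull request for the transformed code."""
    change.draft_pr_open = True
    change.log.append("draft_pr")
    return change


def validate_with(canary: Callable[[Change], bool]) -> Step:
    """Build a validation step from a canary check supplied by the caller."""
    def step(change: Change) -> Change:
        if not change.draft_pr_open:
            raise RuntimeError("validation requires a draft PR")
        change.validated = canary(change)
        change.log.append("validate")
        return change
    return step


def run_path(path: list[Step], change: Change) -> Change:
    """Run each step of the path in order."""
    for step in path:
        change = step(change)
    return change
```

Swapping the codemod passed to `transform_with` changes the migration while the rest of the path stays the same, which is one way to read the "Lego-like" composition the article describes.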

Tags: automation, code migration, fleet management, event-driven, orchestration, canary deployments, DevOps, Netflix
