This article discusses Netflix's Data Projects, a system designed to address the challenges of managing data asset permissions and workload identities at Netflix's scale. It introduces a "project" as a logical container for related data assets, providing a durable, synthetic identity for scheduled workloads and simplifying access control through role-based grants, thus solving issues caused by fine-grained ACLs and human-tied identities in a dynamic organization.
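Read original on Netflix Tech Blog
At Netflix's immense scale, managing millions of tables and tens of thousands of scheduled workloads presented significant challenges in identity and access management. The traditional approach relied on fine-grained Access Control Lists (ACLs) on each individual asset, with workloads running under human identities. This model proved unsustainable and led to two main problems: permissions had to be maintained asset by asset, which does not scale to millions of tables, and workloads tied to human identities became fragile whenever their owners changed teams or left the company.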
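Netflix's solution, Data Projects, introduces a new abstraction layer for managing data assets and identities. A Data Project serves two primary functions: it acts as a logical container that groups related data assets so that access is granted through project-level roles rather than per-asset ACLs, and it provides a durable, synthetic identity under which scheduled workloads run.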
Key Architectural Concept: Hoisting Granularity
The core architectural shift is hoisting the granularity of management from individual assets to a logical container, the project. Instead of maintaining hundreds of individual ACLs, owners manage a single set of project-level roles and grants that apply to every asset in the project. The same pattern fits many system design problems where fine-grained, per-item management becomes unwieldy at scale.
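To make the hoisting idea concrete, here is a minimal Python sketch, assuming a simple three-role model. The names `Project`, `Catalog`, `ROLE_PERMISSIONS`, and the `reader`/`writer`/`owner` roles are hypothetical illustrations, not Netflix's actual API. An access check resolves an asset to its project and then consults that project's role grants, so in the usage below two grants cover 300 tables.

```python
from dataclasses import dataclass, field

# Hypothetical role -> permission mapping; the source does not list Netflix's roles.
ROLE_PERMISSIONS: dict[str, set[str]] = {
    "reader": {"read"},
    "writer": {"read", "write"},
    "owner": {"read", "write", "grant"},
}


@dataclass
class Project:
    name: str
    assets: set[str] = field(default_factory=set)
    # principal -> role; one grant here applies to every asset in the project
    grants: dict[str, str] = field(default_factory=dict)


class Catalog:
    """Resolves access by hoisting: asset -> project -> project role grants."""

    def __init__(self) -> None:
        self.projects: dict[str, Project] = {}
        self.asset_to_project: dict[str, str] = {}

    def add_project(self, project: Project) -> None:
        self.projects[project.name] = project
        for asset in project.assets:
            self.asset_to_project[asset] = project.name

    def can(self, principal: str, action: str, asset: str) -> bool:
        project_name = self.asset_to_project.get(asset)
        if project_name is None:
            return False  # asset not in any project: deny by default
        role = self.projects[project_name].grants.get(principal)
        return role is not None and action in ROLE_PERMISSIONS[role]


# Usage: 300 tables, 2 teams.
tables = {f"recs.table_{i}" for i in range(300)}
proj = Project("recs", assets=tables,
               grants={"team:recs-eng": "writer", "team:analytics": "reader"})
catalog = Catalog()
catalog.add_project(proj)

# 2 project-level grants stand in for 300 * 2 = 600 per-table ACL entries.
print(catalog.can("team:analytics", "read", "recs.table_42"))   # True
print(catalog.can("team:analytics", "write", "recs.table_42"))  # False
```

The design choice is that per-asset state shrinks to a single project membership, while all authorization decisions live on the project.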
A crucial feature is "gravity": when a workload running under a project's identity creates new assets (e.g., tables), those assets are automatically added to the project. Because membership follows the creating identity, no manual configuration is needed: new assets inherit the project's access controls and stay organized without extra effort, which keeps access consistent and reduces operational burden.
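A minimal sketch of gravity, under stated assumptions: `Project`, `Catalog`, `create_table`, and the `project:<name>` identity format are hypothetical. The source does not say what happens to tables created by non-project identities, so this sketch simply leaves them outside any project.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Project:
    name: str
    identity: str  # hypothetical synthetic principal, e.g. "project:recs"
    assets: set[str] = field(default_factory=set)


class Catalog:
    def __init__(self) -> None:
        self.projects_by_identity: dict[str, Project] = {}
        self.tables: dict[str, Optional[str]] = {}  # table -> owning project name

    def register(self, project: Project) -> None:
        self.projects_by_identity[project.identity] = project

    def create_table(self, caller_identity: str, table: str) -> Optional[str]:
        """Create a table and return the name of the project it joined, if any."""
        if table in self.tables:
            raise ValueError(f"table {table!r} already exists")
        project = self.projects_by_identity.get(caller_identity)
        if project is None:
            # Assumption: a non-project caller (e.g. a human running ad hoc SQL)
            # creates the table outside any project.
            self.tables[table] = None
            return None
        # "Gravity": the new table joins the creating project automatically,
        # so the project's existing role grants already cover it.
        project.assets.add(table)
        self.tables[table] = project.name
        return project.name


catalog = Catalog()
recs = Project(name="recs", identity="project:recs")
catalog.register(recs)
catalog.create_table("project:recs", "recs.daily_scores")  # e.g. a scheduled job's output
catalog.create_table("user:alice", "scratch.alice_tmp")
print(sorted(recs.assets))  # ['recs.daily_scores']
```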
For workflow orchestrators such as Netflix's Maestro, Data Projects address the fragility of user-tied identities. Workflows now run under the project's durable application identity, which, unlike a person, does not change teams or leave the company. As a result, permissions stay stable and auditable across organizational changes. The same identity also gives consistent access management for the assets a workflow creates and for scoped secrets, making data pipelines more resilient.
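The sketch below contrasts the two models. `Workflow`, `Directory`, and `can_run` are hypothetical stand-ins, not Maestro's API, and it assumes a scheduled run is authorized only while its effective identity is still active.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Workflow:
    name: str
    owner: str             # human who maintains the workflow
    run_as: Optional[str]  # project identity; None = legacy "run as owner" model

    def effective_identity(self) -> str:
        """Identity the scheduler uses when executing the workflow."""
        return self.run_as if self.run_as is not None else self.owner


class Directory:
    """Hypothetical registry of principals that are currently valid."""

    def __init__(self) -> None:
        self.active: set[str] = set()

    def offboard(self, principal: str) -> None:
        self.active.discard(principal)


def can_run(workflow: Workflow, directory: Directory) -> bool:
    # Assumption: a run is only authorized while its effective identity is active.
    return workflow.effective_identity() in directory.active


directory = Directory()
directory.active.update({"user:alice", "project:recs"})
legacy = Workflow("daily_scores_v1", owner="user:alice", run_as=None)
modern = Workflow("daily_scores_v2", owner="user:alice", run_as="project:recs")

directory.offboard("user:alice")  # Alice leaves the company
print(can_run(legacy, directory))  # False: the run identity left with her
print(can_run(modern, directory))  # True: the project identity is unaffected
```

In this model, maintenance ownership (`owner`) can be reassigned to another engineer without touching `run_as`, which is what keeps the workflow's permissions stable through reorganizations.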