This article summarizes a discussion with Robert Erez, a principal engineer at Octopus Deploy, about CI/CD, deployment systems, and software delivery. It covers best practices, common pitfalls, and future trends, including the impact of AI, and offers lessons for designing robust, efficient deployment pipelines and platform engineering initiatives. The discussion emphasizes the trade-offs between deployment approaches, from GitOps to progressive delivery, and highlights the continuing relevance of on-premises solutions for specific industries.
Read the original on The Pragmatic Engineer.
A fundamental principle for highly stateful systems, especially those backed by databases, is to prefer rolling forward over rolling back. If a new version (v2) fails after it has changed the database schema, redeploying the previous version (v1) does not undo those changes: v1 code may query columns that no longer exist or misread data written in v2's format, producing schema mismatches and data inconsistencies. The recommended strategy is instead to quickly prepare and deploy a v3 that contains the fix and works with the schema v2 left behind, which avoids complex state reconciliation. The emphasis is on rapid iteration and recovery rather than reverting to a potentially incompatible state.
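The failure mode can be sketched with Python's built-in `sqlite3` module. This is a minimal illustration under assumed names: the `users` table, the rename of `name` to `full_name` standing in for v2's migration, and the `v1_read_names`/`v3_read_names` functions are all hypothetical. Redeploying v1 code leaves the migrated schema in place, so v1's query fails, while a v3 written against the current schema works.

```python
import sqlite3


def create_v1_database() -> sqlite3.Connection:
    # Hypothetical starting schema that v1 was written against.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("INSERT INTO users (name) VALUES ('Ada'), ('Linus')")
    return conn


def migrate_v2(conn: sqlite3.Connection) -> None:
    # Hypothetical v2 migration: rename a column (needs SQLite 3.25+).
    conn.execute("ALTER TABLE users RENAME COLUMN name TO full_name")


def v1_read_names(conn: sqlite3.Connection) -> list[str]:
    # v1 application code still expects the original column name.
    return [row[0] for row in conn.execute("SELECT name FROM users ORDER BY id")]


def v3_read_names(conn: sqlite3.Connection) -> list[str]:
    # v3 (the roll-forward fix) targets the schema that v2 left behind.
    return [row[0] for row in conn.execute("SELECT full_name FROM users ORDER BY id")]


def try_rollback_to_v1(conn: sqlite3.Connection):
    # Rolling back the application does not roll back the schema.
    try:
        return v1_read_names(conn)
    except sqlite3.OperationalError as exc:
        return f"rollback failed: {exc}"


if __name__ == "__main__":
    conn = create_v1_database()
    migrate_v2(conn)
    print(try_rollback_to_v1(conn))  # v1 can no longer find its column
    print(v3_read_names(conn))       # the forward fix reads the data
```

A real migration is rarely this blunt, but the asymmetry is the same: code versions can be swapped in seconds, while schema and data changes persist.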
Feature Flags as a Safety Net
Feature toggles provide a better safety net than traditional rollbacks. When a production issue appears, turning off a feature flag immediately disables the problematic functionality, "stopping the bleeding" without a full redeployment. Because the damage is contained first, the team can diagnose and fix the underlying problem calmly, which makes incident response far less nerve-wracking.
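A minimal sketch of a flag acting as a kill switch, in Python. The `FlagStore` class, the `new_pricing_engine` flag, and both pricing functions are hypothetical; a production flag store would usually be backed by a flag service or runtime configuration, which is what lets the flag change without a redeploy. The code path is chosen on every call, so flipping the flag takes effect on the next request.

```python
import threading


class FlagStore:
    """Hypothetical in-memory flag store that can be changed at runtime."""

    def __init__(self, defaults: dict[str, bool]):
        self._flags = dict(defaults)
        self._lock = threading.Lock()

    def is_enabled(self, name: str) -> bool:
        # Unknown flags default to off, so a missing flag is the safe path.
        with self._lock:
            return self._flags.get(name, False)

    def set(self, name: str, enabled: bool) -> None:
        with self._lock:
            self._flags[name] = enabled


def price_with_new_engine(amount_cents: int) -> int:
    # Hypothetical new behavior: applies a 10% discount.
    return amount_cents * 9 // 10


def price_with_legacy_engine(amount_cents: int) -> int:
    # Hypothetical known-good behavior.
    return amount_cents


def calculate_price_cents(flags: FlagStore, amount_cents: int) -> int:
    # The flag is checked per call, so disabling it needs no redeployment.
    if flags.is_enabled("new_pricing_engine"):
        return price_with_new_engine(amount_cents)
    return price_with_legacy_engine(amount_cents)


if __name__ == "__main__":
    flags = FlagStore({"new_pricing_engine": True})
    print(calculate_price_cents(flags, 1000))  # new engine
    flags.set("new_pricing_engine", False)     # incident: stop the bleeding
    print(calculate_price_cents(flags, 1000))  # legacy engine
```

Defaulting unknown flags to off is a design choice in this sketch: if the flag store is misconfigured, the system falls back to the known-good path.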
Although widely adopted, the term "GitOps" can be misleading. Its four core pillars (declarative configuration, versioned and immutable artifacts, pull-based deployments, and continuous reconciliation) do not inherently require Git. A dogmatic focus on Git can lead to anti-patterns, such as storing sensitive information like secrets in repositories. At extreme scale, with thousands of independent Kubernetes clusters each pulling from a central Git repository, that repository becomes a bottleneck: the Git host throttles the requests, and teams must build complex workarounds. Pull-based GitOps is powerful, but it does not scale indefinitely without careful architectural consideration.
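The four pillars can be sketched in Python without any reference to Git. This is a minimal illustration, not how any particular GitOps tool is built: `DesiredState`, `StateSource`, `InMemorySource`, `Cluster`, and the digest-style version string are all hypothetical. The agent pulls a declarative, versioned desired state from an abstract source, which could be backed by Git, an artifact registry, or an API, and repeatedly reconciles the cluster toward it, which also corrects manual drift.

```python
import time
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class DesiredState:
    version: str   # immutable artifact identifier (hypothetical digest format)
    replicas: int  # declarative: what should exist, not how to get there


class StateSource(Protocol):
    def fetch(self) -> DesiredState: ...


class InMemorySource:
    """Hypothetical source; Git is only one possible backing store."""

    def __init__(self, state: DesiredState):
        self.state = state

    def fetch(self) -> DesiredState:
        return self.state


@dataclass
class Cluster:
    version: str = ""
    replicas: int = 0


def reconcile_once(source: StateSource, cluster: Cluster) -> bool:
    # Pull-based: the agent asks for desired state; nothing pushes to it.
    desired = source.fetch()
    if (cluster.version, cluster.replicas) != (desired.version, desired.replicas):
        cluster.version = desired.version
        cluster.replicas = desired.replicas
        return True  # a change was applied
    return False     # already converged


def run_reconcile_loop(source: StateSource, cluster: Cluster,
                       interval_s: float, max_iterations: int) -> int:
    # Continuous reconciliation; bounded here only so it can be demonstrated.
    changes = 0
    for _ in range(max_iterations):
        if reconcile_once(source, cluster):
            changes += 1
        time.sleep(interval_s)
    return changes


def polling_load_per_second(clusters: int, interval_s: float) -> float:
    # Every agent fetches once per interval, so load on the source grows linearly.
    return clusters / interval_s
```

The last function shows why a central repository becomes the bottleneck: N clusters polling every T seconds generate roughly N/T fetches per second. A hypothetical 3,000 clusters polling every 60 seconds would produce about 50 requests per second against a single source, the kind of load that can trigger rate limiting on a centralized Git host.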
The article also covers the enduring need for on-premises solutions in highly regulated industries such as finance and government, which demand full control over their hardware and upgrades. It also discusses the rise of platform engineering teams in larger organizations, which standardize and streamline development workflows to bring consistency and focus across many projects and teams.