GitHub engineered a novel solution using eBPF to prevent circular dependencies during critical deployments, ensuring system stability even during outages. By selectively monitoring and blocking network egress from deployment scripts, they mitigate risks where deployment tools or services might depend on GitHub.com itself, which could be unavailable. This approach enhances deployment safety and reduces recovery times during incidents.
GitHub's core challenge in deployments is a potential circular dependency: if GitHub.com is down, the tools and processes used to fix it might depend on GitHub.com. This article details their solution using eBPF to create a robust, host-based deployment system that can operate independently of GitHub's availability.
The article outlines several types of circular dependencies that can cripple deployments during an incident, particularly if the incident affects GitHub itself.
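To solve this, GitHub adopted eBPF, which allows loading custom programs into the Linux kernel to hook into core system primitives like networking. Specifically, they combined cgroup-attached eBPF programs (a `cgroup/connect4` hook and a `CGROUP_SKB` program) with a userspace DNS proxy.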
The userspace DNS proxy evaluates requested domains against a blocklist, using eBPF Maps to communicate with the `CGROUP_SKB` program to allow or deny requests. This allows for dynamic, domain-based filtering rather than static IP blocking, which is impractical for a system as dynamic as GitHub's.
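The domain check itself can be sketched in plain userspace C. This is an illustrative sketch only, not GitHub's proxy: the function names `domain_matches` and `domain_is_blocked` and the entries in `BLOCKED_DOMAINS` are hypothetical, and the eBPF Map exchange with the `CGROUP_SKB` program is left out. It shows suffix matching on label boundaries, so that a blocked domain also covers its subdomains without catching unrelated names that merely end in the same characters.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical blocklist entries; the real list is not given in the article. */
static const char *const BLOCKED_DOMAINS[] = { "github.com", "blocked.example" };

/* Compare the first n bytes of a and b, ignoring ASCII case. */
static bool equal_ignore_case(const char *a, const char *b, size_t n) {
    for (size_t i = 0; i < n; i++) {
        if (tolower((unsigned char)a[i]) != tolower((unsigned char)b[i]))
            return false;
    }
    return true;
}

/* True when name equals suffix or is a subdomain of it. */
bool domain_matches(const char *name, const char *suffix) {
    size_t nlen = strlen(name);
    size_t slen = strlen(suffix);

    /* Ignore a trailing root dot, as in "github.com." */
    if (nlen > 0 && name[nlen - 1] == '.')
        nlen--;
    if (slen == 0 || nlen < slen)
        return false;
    if (!equal_ignore_case(name + (nlen - slen), suffix, slen))
        return false;
    /* Require a label boundary so "notgithub.com" does not match "github.com". */
    return nlen == slen || name[nlen - slen - 1] == '.';
}

/* True when name falls under any entry of the (hypothetical) blocklist. */
bool domain_is_blocked(const char *name) {
    size_t count = sizeof BLOCKED_DOMAINS / sizeof BLOCKED_DOMAINS[0];
    for (size_t i = 0; i < count; i++) {
        if (domain_matches(name, BLOCKED_DOMAINS[i]))
            return true;
    }
    return false;
}
```

With this sketch, `domain_is_blocked("api.GitHub.com.")` is true while `domain_is_blocked("notgithub.com")` is false. Matching on names rather than addresses is what lets the policy survive IP changes behind a domain.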
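The following excerpt, a `cgroup/connect4` program, redirects outbound DNS connections (port 53) to the local proxy:

```c
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_endian.h>

/* This is the hexadecimal representation of 127.0.0.1 address */
const __u32 ADDRESS_LOCALHOST_NETBYTEORDER = bpf_htonl(0x7f000001);

/* Not shown in this excerpt; assumed here to be supplied by the loader. */
const volatile __u32 const_mitm_proxy_address = 0; /* network byte order */
const volatile __u16 const_dns_proxy_port = 0;     /* host byte order */

SEC("cgroup/connect4")
int connect4(struct bpf_sock_addr *ctx) {
  /* Original destination, captured before any rewrite (unused in this excerpt) */
  __be32 original_ip = ctx->user_ip4;
  __u16 original_port = bpf_ntohs(ctx->user_port);

  if (ctx->user_port == bpf_htons(53)) {
    /* For DNS Query (*:53) rewire service to backend
     * 127.0.0.1:const_dns_proxy_port */
    ctx->user_ip4 = const_mitm_proxy_address;
    ctx->user_port = bpf_htons(const_dns_proxy_port);
  }

  return 1; /* 1 lets the connect() call proceed */
}
```

Beyond blocking, eBPF enabled GitHub to correlate blocked DNS requests back to the specific command or process that triggered them. By tracking DNS transaction IDs and Process IDs (PIDs) in eBPF Maps, they can log detailed warnings that include the command line that initiated the problematic request, which significantly aids debugging and remediation.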
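The correlation idea can be pictured in userspace C. This is a minimal sketch under stated assumptions, not GitHub's implementation: a fixed-size array stands in for the eBPF Map, and the names `dns_txid`, `track_record`, `track_lookup`, and `TRACK_SLOTS` are hypothetical. The only protocol fact it relies on is that a DNS message begins with a 12-byte header whose first two bytes carry the transaction ID in network byte order.

```c
#include <stddef.h>
#include <stdint.h>

#define TRACK_SLOTS 256 /* hypothetical table size */

/* One remembered (transaction ID -> PID) pair. */
struct dns_track_entry {
    uint16_t txid;
    uint32_t pid;
    int used;
};

/* Stand-in for the eBPF Map shared between kernel and userspace. */
struct dns_track_table {
    struct dns_track_entry slots[TRACK_SLOTS];
};

/* Read the transaction ID from the start of a raw DNS message.
 * Returns 0 on success, -1 if the buffer is too short for a DNS header. */
int dns_txid(const uint8_t *msg, size_t len, uint16_t *txid) {
    if (msg == NULL || txid == NULL || len < 12)
        return -1;
    *txid = (uint16_t)((msg[0] << 8) | msg[1]);
    return 0;
}

/* Remember which PID sent the query with this transaction ID.
 * A colliding entry is simply overwritten in this sketch. */
void track_record(struct dns_track_table *t, uint16_t txid, uint32_t pid) {
    struct dns_track_entry *e = &t->slots[txid % TRACK_SLOTS];
    e->txid = txid;
    e->pid = pid;
    e->used = 1;
}

/* Look up the PID for a transaction ID. Returns 0 if found, -1 otherwise. */
int track_lookup(const struct dns_track_table *t, uint16_t txid, uint32_t *pid) {
    const struct dns_track_entry *e = &t->slots[txid % TRACK_SLOTS];
    if (!e->used || e->txid != txid)
        return -1;
    *pid = e->pid;
    return 0;
}
```

In this picture, the side that sees the outgoing query calls `track_record` with the sender's PID, and the side that decides to block a domain calls `track_lookup` with the same transaction ID to find out which process to name in the warning.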
Key Takeaways for System Design
eBPF provides powerful, low-level programmability within the Linux kernel, making it ideal for granular control over network traffic and process behavior. It's a critical tool for building resilient systems, especially in scenarios requiring isolation and secure execution environments for critical operations like deployments. Consider its use for network filtering, observability, and security enforcement at the kernel level.