Split-Brain Problem
When network partitions divide a cluster: split-brain scenarios, quorum-based prevention, fencing tokens, and STONITH strategies.
What Is Split-Brain?
Split-brain occurs when a network partition divides a cluster into two or more groups and each group, unable to see the others, independently elects its own leader. Both leaders then accept writes, and when the partition heals the groups hold divergent state that may be impossible to reconcile automatically. It is one of the most dangerous failure modes in distributed databases.
For example: a 5-node database cluster partitions into a group of 3 and a group of 2. The group of 2, not knowing about the network partition, elects a new leader and begins accepting writes. The original group of 3 also has a leader accepting writes. When the partition heals, both groups have made conflicting changes to the same data.
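The divergence in this example can be sketched with a toy in-memory model (plain Python; the key and values are illustrative, not from any real system):

```python
# Toy model: after a partition, each side applies writes independently.
majority_side = {"balance": 100}   # group of 3, keeps its leader
minority_side = {"balance": 100}   # group of 2, elected its own leader

# During the partition, each leader accepts a conflicting write.
majority_side["balance"] = 150     # deposit applied on one side
minority_side["balance"] = 40      # withdrawal applied on the other

# When the partition heals, the replicas disagree, and nothing in the
# data itself says which history is correct.
diverged = majority_side["balance"] != minority_side["balance"]
```

Both writes succeeded from the clients' point of view, which is exactly why reconciliation after the fact is so hard.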
Prevention: Quorum-Based Consensus
The primary defense against split-brain is quorum-based consensus: a leader can be elected, and can commit writes, only with the acknowledgment of a strict majority (⌊N/2⌋ + 1) of nodes. Because two disjoint majorities cannot exist in the same cluster, at most one side of any partition can make progress, so split-brain is prevented.
- In a 5-node cluster, majority = 3. The partition of 2 cannot elect a leader or commit writes.
- In a 3-node cluster, majority = 2. A partition of 1 cannot lead; a partition of 2 can still elect a leader.
- This is why odd-numbered cluster sizes are strongly preferred — even-numbered clusters waste one vote for no extra fault tolerance (4-node ≈ 3-node in practice).
| Cluster Size | Majority Required | Nodes That Can Fail | Notes |
|---|---|---|---|
| 3 | 2 | 1 | Minimum for fault tolerance |
| 5 | 3 | 2 | Common production choice |
| 7 | 4 | 3 | Large, high-availability clusters |
| 4 | 3 | 1 | No better than 3-node cluster |
| 6 | 4 | 2 | No better than 5-node cluster |
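The quorum arithmetic behind the table above can be sketched in a few lines of Python:

```python
# Quorum arithmetic for an n-node cluster.
def majority(n: int) -> int:
    """Smallest group that can elect a leader: floor(n/2) + 1."""
    return n // 2 + 1

def tolerable_failures(n: int) -> int:
    """Nodes that can fail while a quorum still exists."""
    return n - majority(n)

# A 4-node cluster tolerates no more failures than a 3-node one,
# and 6 no more than 5: this is why odd sizes are preferred.
```

For example, `tolerable_failures(4)` and `tolerable_failures(3)` both return 1: the fourth node raises the quorum threshold without adding fault tolerance.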
Fencing Tokens
Even with quorum, a subtle variant of split-brain can occur due to process pauses: a new leader is elected with a higher term, but the old leader, paused by a long GC cycle or OS scheduling, resumes and still believes it is leader, having had no chance to learn of the new election. This is mitigated with fencing tokens: a monotonically increasing number issued with each leadership grant.
Every write the leader makes must include its fencing token. Storage systems reject any write with a fencing token lower than the highest seen. When a stale leader resumes and tries to write with its old (lower) token, the storage rejects it — preventing the stale leader from corrupting data.
# Storage system with fencing-token enforcement (Python)
class FencedStorage:
    def __init__(self):
        self.highest_seen_token = 0
        self.data = {}

    def write(self, key, value, fencing_token):
        # A token lower than the highest seen means the writer is a
        # stale leader that was deposed while paused: reject the write.
        if fencing_token < self.highest_seen_token:
            return "ERROR_STALE_LEADER"
        self.highest_seen_token = fencing_token
        self.data[key] = value
        return "SUCCESS"

# A leader acquires its token when elected, e.g. the creation
# transaction id (czxid) of its ZooKeeper lock znode, which is
# monotonically increasing. Every write then carries that token.
STONITH: Shoot The Other Node In The Head
STONITH (Shoot The Other Node In The Head) is a hardware-level fencing strategy used in high-availability clustering. When a node is suspected of being a rogue leader, the cluster forcefully powers it off via an out-of-band management interface (IPMI, iDRAC, AWS EC2 terminate API). This is more reliable than software-level fencing because a node that is alive but partitioned cannot defend itself from a power-off command.
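As a concrete sketch, a STONITH agent often shells out to `ipmitool` to cut power over the management network. The host, user, and password below are illustrative placeholders; real agents also verify the power state afterward:

```python
import subprocess

def stonith_power_off(bmc_host: str, user: str, password: str,
                      dry_run: bool = True):
    """Fence a suspected rogue node by powering it off via its BMC.

    Issues an out-of-band `ipmitool chassis power off`; because the
    command goes to the management controller, a node that is alive
    but network-partitioned cannot refuse it.
    """
    cmd = ["ipmitool", "-I", "lanplus",
           "-H", bmc_host, "-U", user, "-P", password,
           "chassis", "power", "off"]
    if dry_run:
        return cmd  # inspect the command without actually fencing
    subprocess.run(cmd, check=True)
    return cmd
```

In production this call would typically be wrapped by a cluster manager (e.g. a Pacemaker fence agent) rather than invoked directly.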
Split-brain in practice: MongoDB replica sets
A MongoDB replica set partitions: primary (old) is isolated, secondary + arbiter form a quorum of 2 (from 3) and elect a new primary. Writes to the old primary succeed locally but will be rolled back when the partition heals. MongoDB handles this with its oplog rollback mechanism — but data written to the old primary during the partition is lost. This is why `writeConcern: majority` is critical for durability guarantees.
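The rollback behavior can be illustrated with a toy log model (plain Python, not MongoDB's actual oplog machinery): on rejoining, the old primary truncates back to the last entry it shares with the new primary's history, and everything after that point is lost.

```python
# Toy model of rollback after a partition heals: writes the old primary
# accepted locally but never replicated to a majority are discarded.
committed_log = [("set", "a", 1), ("set", "b", 2)]    # majority-replicated
old_primary_log = committed_log + [("set", "b", 99)]  # local-only write

def heal(old_primary_log, committed_log):
    """Truncate the old primary back to the common prefix and adopt
    the new primary's history; the divergent suffix is rolled back."""
    common = 0
    while (common < len(old_primary_log) and common < len(committed_log)
           and old_primary_log[common] == committed_log[common]):
        common += 1
    rolled_back = old_primary_log[common:]
    return committed_log, rolled_back

log, lost = heal(old_primary_log, committed_log)
```

Here `("set", "b", 99)` is exactly the kind of write that `writeConcern: majority` would have refused to acknowledge, which is why that setting prevents silent data loss.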
Interview Tip
When discussing replication or database clusters in interviews, proactively raise split-brain. Explain quorum (majority) as the prevention mechanism, fencing tokens as the backstop against stale leaders, and note that odd cluster sizes are not arbitrary — they maximize fault tolerance per node. If asked about MongoDB, describe `writeConcern: majority` as the safeguard that ensures writes are durable even if the primary fails.