Making Democracy Work: Fixing and Simplifying Egalitarian Paxos (Extended Version)
Pith reviewed 2026-05-18 00:41 UTC · model grok-4.3
The pith
EPaxos* replaces the complex recovery in leaderless Paxos with a simpler proved-correct version and generalizes it to optimal failure thresholds.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central claim is that Egalitarian Paxos admits a simpler failure-recovery algorithm that has been rigorously proved correct, and that the protocol can be generalized to the entire spectrum of failure thresholds f and e satisfying n greater than or equal to the maximum of 2e plus f minus 1 and 2f plus 1, a number the authors show to be optimal.
What carries the argument
The redesigned failure-recovery algorithm that restores safety and liveness after crashes without a leader while supporting the generalized f and e thresholds.
Load-bearing premise
The model assumes only crash-stop failures and reliable delivery of messages between correct processes.
What would settle it
An execution with crashes that violates safety or liveness under the new recovery rules, or a working protocol that meets the fast-path and throughput goals with fewer than the claimed optimal number of processes.
Figures
read the original abstract
Classical state-machine replication protocols, such as Paxos, rely on a distinguished leader process to order commands. Unfortunately, this approach makes the leader a single point of failure and increases the latency for clients that are not co-located with it. As a response to these drawbacks, Egalitarian Paxos introduced an alternative, leaderless approach, that allows replicas to order commands collaboratively. Not relying on a single leader allows the protocol to maintain non-zero throughput with up to $f$ crashes of any processes out of a total of $n = 2f+1$. The protocol furthermore allows any process to execute a command $c$ fast, in $2$ message delays, provided no more than $e = \lceil\frac{f+1}{2}\rceil$ other processes fail, and all concurrently submitted commands commute with $c$; the latter condition is often satisfied in practical systems. Egalitarian Paxos has served as a foundation for many other replication protocols. But unfortunately, the protocol is very complex, ambiguously specified and suffers from nontrivial bugs. In this paper, we present EPaxos* -- a simpler and correct variant of Egalitarian Paxos. Our key technical contribution is a simpler failure-recovery algorithm, which we have rigorously proved correct. Our protocol also generalizes Egalitarian Paxos to cover the whole spectrum of failure thresholds $f$ and $e$ such that $n \ge \max\{2e+f-1, 2f+1\}$ -- the number of processes that we show to be optimal.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper presents EPaxos*, a simplified and corrected variant of Egalitarian Paxos for leaderless state-machine replication. Its central claims are a simpler failure-recovery algorithm that has been rigorously proved correct, together with a generalization of the protocol to arbitrary f and e such that n ≥ max{2e+f-1, 2f+1}, which the authors show is optimal.
Significance. If the results hold, the work is significant for distributed computing. Egalitarian Paxos is foundational for many leaderless replication protocols; fixing its bugs and complexity while adding an optimal generalization with a rigorous proof supplies a cleaner, more reliable base for future systems. The rigorous proof of the recovery algorithm is a clear strength that the manuscript ships.
minor comments (2)
- [Abstract] Abstract: the claim of a 'rigorous proof' of the recovery algorithm would be strengthened by a one-sentence high-level sketch or list of key invariants even in the abstract or introduction.
- [Introduction] The optimality argument for n ≥ max{2e+f-1, 2f+1} is stated clearly but would benefit from an explicit sentence contrasting it with the original Egalitarian Paxos bounds to make the improvement immediate.
Simulated Author's Rebuttal
We thank the referee for their positive summary of our work, recognition of its significance for distributed computing, and recommendation of minor revision. We are pleased that the referee highlights the value of the simpler failure-recovery algorithm with its rigorous proof and the optimal generalization of the failure thresholds.
Circularity Check
No significant circularity; derivation is self-contained
full rationale
The paper defines EPaxos* directly via message-exchange rules and a failure-recovery procedure, then supplies an independent rigorous proof of correctness for the recovery algorithm under standard crash-stop assumptions with reliable delivery between correct processes. The generalization to thresholds satisfying n ≥ max{2e+f-1, 2f+1} is presented as a mathematical bound that the authors separately establish as optimal; neither the protocol rules nor the optimality claim reduces to a fitted parameter, self-referential definition, or load-bearing self-citation. The central claims rest on explicit protocol text and external verification rather than any construction that equates output to input by definition.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption Crash-stop failure model with reliable message delivery between correct processes in an asynchronous network.
Lean theorems connected to this paper
-
IndisputableMonolith/Foundation/RealityFromDistinction.leanreality_from_one_distinction unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
Our protocol also generalizes Egalitarian Paxos to cover the whole spectrum of failure thresholds f and e such that n ≥ max{2e+f-1, 2f+1} — the number of processes that we show to be optimal.
-
IndisputableMonolith/Cost/FunctionalEquation.leanwashburn_uniqueness_aczel unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
The main technical novelty of EPaxos* is its failure recovery procedure, which significantly simplifies the one in the original EPaxos
What do these tags mean?
- matches
- The paper's claim is directly supported by a theorem in the formal canon.
- supports
- The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends
- The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses
- The paper appears to rely on the theorem as machinery.
- contradicts
- The paper's claim conflicts with a theorem or certificate in the canon.
- unclear
- Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
-
[1]
For every valuev and process p∈Π\E, there exists anE-faulty synchronous run in which onlypcallspropose(), the proposed value isv, andpdecides a value by time2∆
-
[2]
For every valuev and process p∈Π\E, there exists anE-faulty synchronous run in which all processes inΠ\E call propose(v)at the beginning of the first round, andp decides a value by time2∆. Fix a subsetE⊆Πof size e. Assume some valuev, a processp∈Π\E, and a run of the above construction in whichp calls propose(v)at the start of the first round and the othe...
work page 2025
-
[3]
Each command is pre-accepted at most once by each process
-
[4]
Once a process moves a command to theaccepted or committed phase, it remains in one of these phases for the entire execution
-
[5]
At every process and for every id, if abal[id] > 0, then phase[id] ∈ {accepted,committed}
-
[6]
Assume that at timet a process p pre-accepts or validates a commandid with a payloadc (i.e., p executes line 13 or line 83). At any moment aftert, the processp has cmd[id] = c orcmd[id] =Nop, with the latter possible only ifphase[id]̸=preaccepted
-
[7]
Assume that a process sendsCommit(0,id, c, D)on the fast path (line 23). If a process executes line 57 (or line 57t), then the condition in this line evaluates to true, and only for the dependency setD
-
[8]
If a process sends PreAcceptOK(id, D[,_])and PreAcceptOK(id′, D′[,_])for two com- mands with conflicting initial payloads, thenid∈D′orid ′∈D
-
[9]
Assume that a quorum of processes has receivedAccept(b,id, c, D)and replied with AcceptOK(b,id). a.For anyAccept(b ′,id, c′, D′)sent, ifb ′≥b, thenc′=candD ′=D; b.For anyCommit(b ′,id, c′, D′)sent, ifb ′≥b, thenc′=candD ′=D
-
[10]
a.For anyAccept(_,id, c′, D′)sent,c ′=candD ′=D; b.For anyCommit(_,id, c′, D′)sent,c ′=candD ′=D
Assume that a process sendsCommit(0,id, c, D)at line 23. a.For anyAccept(_,id, c′, D′)sent,c ′=candD ′=D; b.For anyCommit(_,id, c′, D′)sent,c ′=candD ′=D. Figure 9Additional invariants ofEPaxos*. All invariants hold for all three versions ofEPaxos*. and therefore initCoord(id) /∈R. Hence, every process inQ that pre-accepted id did so with the dependency s...
work page 2025
-
[11]
there exists a quorum Q such that every process q∈Q previously sent a PreAcceptOK(id, Dq[,_])message with ⋃ q∈QDq =D; or 2.a process previously sent anAccept(_,id, c, D)message from line 64 or line 73. Proof. By induction on the length of the execution. It is easy to see that the proposition holds initially. For the induction step, suppose a processp eith...
-
[12]
there exists a quorum Q such that every process q∈Q previously sent a PreAcceptOK(id, Dq[,_])message with ⋃ q∈QDq =D; or 2.a process previously sent anAccept(_,id, c, D)message from line 64 or line 73. Proof. By induction on the length of the execution. It is easy to see that the proposition holds initially. For the induction step, suppose a processp eith...
work page 2025
-
[13]
Assume that both commands satisfy condition 1 of Proposition 18. Then there are two quorums Q and Q′such that every process q∈Q (respectively, q∈Q′) has sent a PreAcceptOK(id, Dq[,_])message (respectively, PreAcceptOK(id′, D′ q[,_])) with⋃ q∈QDq = D (respectively, ⋃ q∈Q′D′ q = D′). Since Q∩Q′̸= ∅, there exists a process that sends bothPreAcceptOK(id, Dq[,...
-
[14]
Assume that id′satisfies condition 1 of Proposition 18, whileid satisfies condition 2 (the symmetric case is analogous). Then there exists a quorumQ′such that every process q∈Q′has sent aPreAcceptOK(id′, D′ q[,_])message with ⋃ q∈Q′D′ q = D′. Furthermore, a process p has sent anAccept(_,id, c, D)from line 64 or line 73. LetQ be a quorum that enables the c...
-
[15]
In this case, they both validate their dependencies at their recovery quorums at lines 60–61
Assume that both commands satisfy condition 2 of Proposition 18. In this case, they both validate their dependencies at their recovery quorums at lines 60–61. Thus, there exists a process that sendsValidateOK(_,id, I)and ValidateOK(_,id′, I′). Assume that it first sendsValidateOK(_,id′, I′)(the symmetric case is analogous). By line 84, at the F. Ryabinin,...
work page 2025
-
[16]
8.Processesp 1,p 2,p 3, andp 9 fail
Process p2 successfully commits id′with dep[id′] ={id}and seq[id′] = 1on the slow path using a quorum{p2, p3, p4, p8, p9}. 8.Processesp 1,p 2,p 3, andp 9 fail. 9.Processp 4 starts recoveringid. a. It broadcasts Prepare and receives f + 1 = 5replies from{p4, p5, p6, p7, p8}(page 41, step 1). b.Steps 2 on page 41 and 3–4 on page 42 are not applicable. c. Ho...
-
[17]
Process p4 repeatedly attempts to recoverid without success (i.e., it indefinitely repeats item 9)
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.