pith. sign in

arxiv: 2511.02743 · v3 · submitted 2025-11-04 · 💻 cs.DC

Making Democracy Work: Fixing and Simplifying Egalitarian Paxos (Extended Version)

Pith reviewed 2026-05-18 00:41 UTC · model grok-4.3

classification 💻 cs.DC
keywords state machine replicationleaderless consensusPaxosfailure recoverydistributed systemscrash faultsEgalitarian Paxos
0
0 comments X

The pith

EPaxos* replaces the complex recovery in leaderless Paxos with a simpler proved-correct version and generalizes it to optimal failure thresholds.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper introduces EPaxos*, a corrected and simplified variant of Egalitarian Paxos for collaborative command ordering in distributed replication. Egalitarian Paxos avoids a single leader to prevent bottlenecks and keep throughput under crashes, while allowing fast execution when few failures occur and commands commute. The authors supply a simpler failure-recovery algorithm with a rigorous correctness proof and extend the protocol to all pairs of thresholds f and e where the replica count n meets or exceeds the maximum of 2e plus f minus 1 and 2f plus 1. This matters because the original protocol underpins many modern systems yet was hard to implement due to complexity and bugs.

Core claim

The central claim is that Egalitarian Paxos admits a simpler failure-recovery algorithm that has been rigorously proved correct, and that the protocol can be generalized to the entire spectrum of failure thresholds f and e satisfying n greater than or equal to the maximum of 2e plus f minus 1 and 2f plus 1, a number the authors show to be optimal.

What carries the argument

The redesigned failure-recovery algorithm that restores safety and liveness after crashes without a leader while supporting the generalized f and e thresholds.

Load-bearing premise

The model assumes only crash-stop failures and reliable delivery of messages between correct processes.

What would settle it

An execution with crashes that violates safety or liveness under the new recovery rules, or a working protocol that meets the fast-path and throughput goals with fewer than the claimed optimal number of processes.

Figures

Figures reproduced from arXiv: 2511.02743 by Alexey Gotsman, Fedor Ryabinin, Pierre Sutra.

Figure 5
Figure 5. Figure 5: First, before attempting to recover a fast path decision for a command [PITH_FULL_IMAGE:figures/full_fig_p014_5.png] view at source ↗
read the original abstract

Classical state-machine replication protocols, such as Paxos, rely on a distinguished leader process to order commands. Unfortunately, this approach makes the leader a single point of failure and increases the latency for clients that are not co-located with it. As a response to these drawbacks, Egalitarian Paxos introduced an alternative, leaderless approach, that allows replicas to order commands collaboratively. Not relying on a single leader allows the protocol to maintain non-zero throughput with up to $f$ crashes of any processes out of a total of $n = 2f+1$. The protocol furthermore allows any process to execute a command $c$ fast, in $2$ message delays, provided no more than $e = \lceil\frac{f+1}{2}\rceil$ other processes fail, and all concurrently submitted commands commute with $c$; the latter condition is often satisfied in practical systems. Egalitarian Paxos has served as a foundation for many other replication protocols. But unfortunately, the protocol is very complex, ambiguously specified and suffers from nontrivial bugs. In this paper, we present EPaxos* -- a simpler and correct variant of Egalitarian Paxos. Our key technical contribution is a simpler failure-recovery algorithm, which we have rigorously proved correct. Our protocol also generalizes Egalitarian Paxos to cover the whole spectrum of failure thresholds $f$ and $e$ such that $n \ge \max\{2e+f-1, 2f+1\}$ -- the number of processes that we show to be optimal.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 2 minor

Summary. The paper presents EPaxos*, a simplified and corrected variant of Egalitarian Paxos for leaderless state-machine replication. Its central claims are a simpler failure-recovery algorithm that has been rigorously proved correct, together with a generalization of the protocol to arbitrary f and e such that n ≥ max{2e+f-1, 2f+1}, which the authors show is optimal.

Significance. If the results hold, the work is significant for distributed computing. Egalitarian Paxos is foundational for many leaderless replication protocols; fixing its bugs and complexity while adding an optimal generalization with a rigorous proof supplies a cleaner, more reliable base for future systems. The rigorous proof of the recovery algorithm is a clear strength that the manuscript ships.

minor comments (2)
  1. [Abstract] Abstract: the claim of a 'rigorous proof' of the recovery algorithm would be strengthened by a one-sentence high-level sketch or list of key invariants even in the abstract or introduction.
  2. [Introduction] The optimality argument for n ≥ max{2e+f-1, 2f+1} is stated clearly but would benefit from an explicit sentence contrasting it with the original Egalitarian Paxos bounds to make the improvement immediate.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for their positive summary of our work, recognition of its significance for distributed computing, and recommendation of minor revision. We are pleased that the referee highlights the value of the simpler failure-recovery algorithm with its rigorous proof and the optimal generalization of the failure thresholds.

Circularity Check

0 steps flagged

No significant circularity; derivation is self-contained

full rationale

The paper defines EPaxos* directly via message-exchange rules and a failure-recovery procedure, then supplies an independent rigorous proof of correctness for the recovery algorithm under standard crash-stop assumptions with reliable delivery between correct processes. The generalization to thresholds satisfying n ≥ max{2e+f-1, 2f+1} is presented as a mathematical bound that the authors separately establish as optimal; neither the protocol rules nor the optimality claim reduces to a fitted parameter, self-referential definition, or load-bearing self-citation. The central claims rest on explicit protocol text and external verification rather than any construction that equates output to input by definition.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The paper relies on standard distributed-systems assumptions rather than introducing new free parameters or invented entities; the contribution lies in the protocol construction and its proof.

axioms (1)
  • domain assumption Crash-stop failure model with reliable message delivery between correct processes in an asynchronous network.
    Invoked implicitly throughout the description of Paxos-style replication and the recovery procedure.

pith-pipeline@v0.9.0 · 5824 in / 1330 out tokens · 36634 ms · 2026-05-18T00:41:04.585917+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

17 extracted references · 17 canonical work pages

  1. [1]

    For every valuev and process p∈Π\E, there exists anE-faulty synchronous run in which onlypcallspropose(), the proposed value isv, andpdecides a value by time2∆

  2. [2]

    Fix a subsetE⊆Πof size e

    For every valuev and process p∈Π\E, there exists anE-faulty synchronous run in which all processes inΠ\E call propose(v)at the beginning of the first round, andp decides a value by time2∆. Fix a subsetE⊆Πof size e. Assume some valuev, a processp∈Π\E, and a run of the above construction in whichp calls propose(v)at the start of the first round and the othe...

  3. [3]

    Each command is pre-accepted at most once by each process

  4. [4]

    Once a process moves a command to theaccepted or committed phase, it remains in one of these phases for the entire execution

  5. [5]

    At every process and for every id, if abal[id] > 0, then phase[id] ∈ {accepted,committed}

  6. [6]

    At any moment aftert, the processp has cmd[id] = c orcmd[id] =Nop, with the latter possible only ifphase[id]̸=preaccepted

    Assume that at timet a process p pre-accepts or validates a commandid with a payloadc (i.e., p executes line 13 or line 83). At any moment aftert, the processp has cmd[id] = c orcmd[id] =Nop, with the latter possible only ifphase[id]̸=preaccepted

  7. [7]

    If a process executes line 57 (or line 57t), then the condition in this line evaluates to true, and only for the dependency setD

    Assume that a process sendsCommit(0,id, c, D)on the fast path (line 23). If a process executes line 57 (or line 57t), then the condition in this line evaluates to true, and only for the dependency setD

  8. [8]

    If a process sends PreAcceptOK(id, D[,_])and PreAcceptOK(id′, D′[,_])for two com- mands with conflicting initial payloads, thenid∈D′orid ′∈D

  9. [9]

    a.For anyAccept(b ′,id, c′, D′)sent, ifb ′≥b, thenc′=candD ′=D; b.For anyCommit(b ′,id, c′, D′)sent, ifb ′≥b, thenc′=candD ′=D

    Assume that a quorum of processes has receivedAccept(b,id, c, D)and replied with AcceptOK(b,id). a.For anyAccept(b ′,id, c′, D′)sent, ifb ′≥b, thenc′=candD ′=D; b.For anyCommit(b ′,id, c′, D′)sent, ifb ′≥b, thenc′=candD ′=D

  10. [10]

    a.For anyAccept(_,id, c′, D′)sent,c ′=candD ′=D; b.For anyCommit(_,id, c′, D′)sent,c ′=candD ′=D

    Assume that a process sendsCommit(0,id, c, D)at line 23. a.For anyAccept(_,id, c′, D′)sent,c ′=candD ′=D; b.For anyCommit(_,id, c′, D′)sent,c ′=candD ′=D. Figure 9Additional invariants ofEPaxos*. All invariants hold for all three versions ofEPaxos*. and therefore initCoord(id) /∈R. Hence, every process inQ that pre-accepted id did so with the dependency s...

  11. [11]

    there exists a quorum Q such that every process q∈Q previously sent a PreAcceptOK(id, Dq[,_])message with ⋃ q∈QDq =D; or 2.a process previously sent anAccept(_,id, c, D)message from line 64 or line 73. Proof. By induction on the length of the execution. It is easy to see that the proposition holds initially. For the induction step, suppose a processp eith...

  12. [12]

    there exists a quorum Q such that every process q∈Q previously sent a PreAcceptOK(id, Dq[,_])message with ⋃ q∈QDq =D; or 2.a process previously sent anAccept(_,id, c, D)message from line 64 or line 73. Proof. By induction on the length of the execution. It is easy to see that the proposition holds initially. For the induction step, suppose a processp eith...

  13. [13]

    Assume that both commands satisfy condition 1 of Proposition 18. Then there are two quorums Q and Q′such that every process q∈Q (respectively, q∈Q′) has sent a PreAcceptOK(id, Dq[,_])message (respectively, PreAcceptOK(id′, D′ q[,_])) with⋃ q∈QDq = D (respectively, ⋃ q∈Q′D′ q = D′). Since Q∩Q′̸= ∅, there exists a process that sends bothPreAcceptOK(id, Dq[,...

  14. [14]

    Then there exists a quorumQ′such that every process q∈Q′has sent aPreAcceptOK(id′, D′ q[,_])message with ⋃ q∈Q′D′ q = D′

    Assume that id′satisfies condition 1 of Proposition 18, whileid satisfies condition 2 (the symmetric case is analogous). Then there exists a quorumQ′such that every process q∈Q′has sent aPreAcceptOK(id′, D′ q[,_])message with ⋃ q∈Q′D′ q = D′. Furthermore, a process p has sent anAccept(_,id, c, D)from line 64 or line 73. LetQ be a quorum that enables the c...

  15. [15]

    In this case, they both validate their dependencies at their recovery quorums at lines 60–61

    Assume that both commands satisfy condition 2 of Proposition 18. In this case, they both validate their dependencies at their recovery quorums at lines 60–61. Thus, there exists a process that sendsValidateOK(_,id, I)and ValidateOK(_,id′, I′). Assume that it first sendsValidateOK(_,id′, I′)(the symmetric case is analogous). By line 84, at the F. Ryabinin,...

  16. [16]

    8.Processesp 1,p 2,p 3, andp 9 fail

    Process p2 successfully commits id′with dep[id′] ={id}and seq[id′] = 1on the slow path using a quorum{p2, p3, p4, p8, p9}. 8.Processesp 1,p 2,p 3, andp 9 fail. 9.Processp 4 starts recoveringid. a. It broadcasts Prepare and receives f + 1 = 5replies from{p4, p5, p6, p7, p8}(page 41, step 1). b.Steps 2 on page 41 and 3–4 on page 42 are not applicable. c. Ho...

  17. [17]

    Process p4 repeatedly attempts to recoverid without success (i.e., it indefinitely repeats item 9)