pith. sign in

arxiv: 2606.25099 · v1 · pith:RKVEF4TMnew · submitted 2026-06-23 · 💻 cs.DC

Ambulance: saving BFT through racing

Pith reviewed 2026-06-25 21:49 UTC · model grok-4.3

classification 💻 cs.DC
keywords Byzantine fault tolerancestate machine replicationBFT consensustimeoutsleader changeslowdown recoveryreplica races
0
0 comments X

The pith

Ambulance achieves both high performance and robustness in BFT replication by replacing timeouts with races among replicas executing protocol steps.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents Ambulance, a Byzantine fault tolerant state machine replication protocol designed to handle replica slowdowns without the drawbacks of timeouts. It claims that by structuring races where replicas compete to advance protocol steps, the system can recover quickly while maintaining common-case throughput and latency similar to optimized timeout-based protocols. A sympathetic reader would care because existing BFT deployments either trigger unnecessary leader changes with aggressive timeouts or suffer idle periods and inflated latency with conservative ones, and alternatives like hedging or fully cooperative protocols each sacrifice one side of the trade-off. Ambulance positions its races as a way to combine the strengths of both approaches.

Core claim

Ambulance is a BFT state machine replication protocol that sidesteps the performance-robustness trade-off through protocol-rigged races, where replicas race against each other by executing protocol steps rather than against the clock. This enables high throughput and low latency comparable to state-of-the-art timeout-based BFT while matching the robustness of cooperative asynchronous approaches.

What carries the argument

Protocol-rigged races, in which replicas advance by competing to execute successive protocol steps to recover from slowdowns.

If this is right

  • Ambulance matches the common-case throughput and latency of state-of-the-art timeout-based BFT protocols.
  • Ambulance recovers from slowdowns with the speed of cooperative asynchronous protocols.
  • The protocol avoids both spurious leader changes from aggressive timeouts and idle time from conservative timeouts.
  • Replicas advance through direct competition on protocol steps rather than waiting on timers.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Designers of other leader-based systems might adapt similar step-based competition to reduce timeout tuning needs.
  • The approach could extend to settings with heterogeneous replica speeds if the race rules preserve ordering invariants.
  • Measurements under mixed failure and slowdown scenarios would test whether the performance-robustness combination holds beyond the paper's evaluated cases.

Load-bearing premise

That races among replicas can be structured to recover from slowdowns without creating new failure modes, excessive communication overhead, or correctness problems.

What would settle it

A workload with induced slowdowns on one replica where Ambulance either shows latency higher than a tuned timeout-based baseline or requires more messages than a cooperative protocol to make progress.

Figures

Figures reproduced from arXiv: 2606.25099 by Benjamin Marsh, Grzegorz Prusak, Hein Meling, Kartik Nayak, Lorenzo Alvisi, Natacha Crooks, Neil Giridharan, Shubham Mishra.

Figure 2
Figure 2. Figure 2: Replica lane protocol pattern [PITH_FULL_IMAGE:figures/full_fig_p006_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: Leader beating the cutoff [PITH_FULL_IMAGE:figures/full_fig_p007_3.png] view at source ↗
Figure 6
Figure 6. Figure 6: Production Latency CDF [PITH_FULL_IMAGE:figures/full_fig_p010_6.png] view at source ↗
Figure 7
Figure 7. Figure 7: 1s slowdown 0 2 4 6 8 10 12 14 Time (s) 0 250 500 750 1000 1250 1500 1750 Latency (ms) Ambulance (p50) Autobahn-1s (p50) Autobahn-5s (p50) SMVBA (p50) ParBFT2 (p50) [PITH_FULL_IMAGE:figures/full_fig_p011_7.png] view at source ↗
Figure 9
Figure 9. Figure 9: 5s slowdown 0 2 4 6 8 10 12 14 Time (s) 0 1000 2000 3000 4000 5000 6000 7000 Latency (ms) Ambulance (p50) Autobahn-5s (p50) Autobahn-10s (p50) SMVBA (p50) ParBFT2 (p50) [PITH_FULL_IMAGE:figures/full_fig_p011_9.png] view at source ↗
read the original abstract

Today's practical Byzantine Fault Tolerant (BFT) state machine replication deployments are vulnerable to slowdowns. The main culprit is timeouts. Aggressive timeouts spuriously trigger expensive leader changes, while conservative timeouts leave the system idle and let slowdowns severely inflate latency. Two main alternatives exist: hedging, which improves recovery from slow leaders but still incurs a time-based hedging delay, and cooperative asynchronous protocols, which recover quickly from slowdowns but suffer from high common-case latency and low throughput. This paper presents Ambulance: a BFT state machine replication protocol that sidesteps this trade-off through protocol-rigged races, where replicas, rather than race against the clock, race against each other by executing protocol steps. This enables Ambulance to achieve high throughput and low latency comparable to state-of-the-art timeout-based BFT, while matching the robustness of cooperative approaches.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 0 minor

Summary. The manuscript introduces Ambulance, a BFT state machine replication protocol that replaces clock-based timeouts with protocol-rigged races in which replicas compete by executing protocol steps. It claims this design delivers throughput and latency comparable to state-of-the-art timeout-based BFT while matching the slowdown robustness of cooperative asynchronous protocols, thereby avoiding the latency/throughput penalties of hedging and the common-case overhead of fully asynchronous designs.

Significance. If the protocol mechanics, safety/liveness arguments, and evaluation data substantiate the claims, the work would address a practically important trade-off in deployed BFT systems. The approach is presented as parameter-free with respect to timeout tuning and would constitute a concrete advance over both timeout-based and cooperative baselines.

major comments (3)
  1. No protocol description, pseudocode, message patterns, or quorum definitions are supplied anywhere in the manuscript. Without these, the central claim that protocol-rigged races recover from slowdowns without new failure modes, excessive overhead, or correctness violations cannot be assessed.
  2. No safety or liveness arguments, view-change mechanics, or handling of concurrent races appear in the text. These are load-bearing for any BFT claim and must be provided before the performance-robustness combination can be evaluated.
  3. The manuscript contains no experimental results, throughput/latency numbers, or comparison against the cited state-of-the-art timeout-based and cooperative protocols. The abstract's performance assertions therefore remain untestable.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the detailed review and for identifying the key elements required to evaluate the protocol. We agree that the submitted manuscript is incomplete in several respects and will revise it to include the missing components.

read point-by-point responses
  1. Referee: No protocol description, pseudocode, message patterns, or quorum definitions are supplied anywhere in the manuscript. Without these, the central claim that protocol-rigged races recover from slowdowns without new failure modes, excessive overhead, or correctness violations cannot be assessed.

    Authors: We agree that the current manuscript provides only a high-level overview. The revised version will contain a complete protocol description, including pseudocode, message patterns, and quorum definitions, so that the recovery mechanism and overhead claims can be directly assessed. revision: yes

  2. Referee: No safety or liveness arguments, view-change mechanics, or handling of concurrent races appear in the text. These are load-bearing for any BFT claim and must be provided before the performance-robustness combination can be evaluated.

    Authors: We acknowledge that safety and liveness arguments, view-change mechanics, and concurrent-race handling are absent from the submitted draft. The revision will add these arguments and mechanics to substantiate the BFT guarantees. revision: yes

  3. Referee: The manuscript contains no experimental results, throughput/latency numbers, or comparison against the cited state-of-the-art timeout-based and cooperative protocols. The abstract's performance assertions therefore remain untestable.

    Authors: The submitted manuscript is a conceptual short version and indeed contains no experimental data. The revised manuscript will include a full evaluation section with throughput and latency measurements together with direct comparisons to the referenced timeout-based and cooperative protocols. revision: yes

Circularity Check

0 steps flagged

No significant circularity detected

full rationale

The provided abstract and context describe a new BFT protocol design (Ambulance) that introduces protocol-rigged races as an alternative to timeouts or hedging. No equations, fitted parameters, self-citations as load-bearing premises, or derivations are present in the text. The central claim is a protocol-level architectural choice whose correctness and performance would need to be established via standard safety/liveness proofs and evaluation, none of which reduce to the inputs by construction. This is the common case of a self-contained protocol paper with no detectable circular steps.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Only abstract available; no free parameters, axioms, or invented entities can be identified from the provided text.

pith-pipeline@v0.9.1-grok · 5691 in / 1007 out tokens · 20192 ms · 2026-06-25T21:49:35.129356+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

104 extracted references · 27 canonical work pages

  1. [1]

    [n. d.]. Amazon EC2 M6a Instances. https: //aws.amazon.com/ec2/instance-types/m6a/ (last accessed on 12/09/25)

  2. [2]

    [n. d.]. Amazon EC2 M6i Instances. https: //aws.amazon.com/ec2/instance-types/m6i/ (last accessed on 12/09/25)

  3. [3]

    [n. d.]. CockroachDB Replication Layer. https://www.cockroachlabs.com/docs/stable/ architecture/replication-layer (last accessed on 12/09/25)

  4. [4]

    d.].Confidential Consortium Framework, Microsoft

    [n. d.].Confidential Consortium Framework, Microsoft. https://ccf.microsoft.com/ (last accessed on 09/23/24)

  5. [5]

    [n. d.]. Consider rolling the WAL if the HDFS write pipeline is slow. https://issues.apache. org/jira/browse/HBASE-22301 (last accessed on 12/09/25)

  6. [6]

    [n. d.]. Dalek elliptic curve cryptography. https://github.com/dalek-cryptography/ ed25519-dalek(last accessed on 09/23/24)

  7. [7]

    [n. d.]. Delayed heartbeat from etcd leader. https: //github.com/etcd-io/etcd/issues/7312 (last accessed on 12/09/25)

  8. [8]

    [n. d.]. Digital Euro. https://www.ecb.europa. eu/euro/digital_euro/html/index.en.html (last accessed on 09/23/24)

  9. [9]

    [n. d.]. etcd Tuning. https://etcd.io/docs/v3.4/ tuning/(last accessed on 12/09/25)

  10. [10]

    [n. d.]. Microsoft CCF Configuration. https: //microsoft.github.io/CCF/main/operations/ configuration.html(last accessed on 12/09/25)

  11. [11]

    [n. d.]. minimum master nodes does not prevent split-brain if splits are intersecting. https://github. com/elastic/elasticsearch/issues/2488 (last accessed on 12/09/25)

  12. [12]

    [n. d.]. Private communications with engineers at the blockchain company, Espresso, running HotStuff in production. March 2025

  13. [13]

    [n. d.]. Private communications with researchers at Mysten Labs, a leading blockchain company), and formerly of Facebook Novi. March 2024

  14. [14]

    [n. d.]. RocksDB, version 0.16.0. https: //rocksdb.org/(last accessed on 09/23/24)

  15. [15]

    [n. d.]. Sui Blockchain. https://sui.io/ (last accessed on 09/23/24)

  16. [16]

    [n. d.]. TiKV Config. https://tikv.org/docs/6. 1/deploy/configure/tikv-configuration-file/ (last accessed on 12/09/25)

  17. [17]

    [n. d.]. Tokio, version 1.5.0. https://tokio.rs/ (last accessed on 09/23/24)

  18. [18]

    Ittai Abraham, Naama Ben-David, and Sravya Yan- damuri. 2022. Efficient and Adaptively Secure Asynchronous Binary Agreement via Binding Crusader Agreement. InProceedings of the 2022 ACM Sympo- sium on Principles of Distributed Computing. 381–391

  19. [19]

    Ittai Abraham, Dahlia Malkhi, and Alexander Spiegel- man. 2019. Asymptotically Optimal Validated Asynchronous Byzantine Agreement. InProceedings of the 2019 ACM Symposium on Principles of Distributed Computing(Toronto ON, Canada)(PODC ’19). Association for Computing Machinery, New York, NY , USA, 337–346. doi:10.1145/3293611.3331612

  20. [20]

    Ittai Abraham, Kartik Nayak, Ling Ren, and Zhuolun Xiang. 2021. Good-Case Latency of Byzantine Broadcast: A Complete Categorization. InProceedings of the 2021 ACM Symposium on Principles of Dis- tributed Computing(Virtual Event, Italy)(PODC’21). Association for Computing Machinery, New York, NY , USA, 331–341. doi:10.1145/3465084.3467899

  21. [21]

    Aguilera and Michael Walfish

    Marcos K. Aguilera and Michael Walfish. 2009. No time for asynchrony. InProceedings of the 12th Conference on Hot Topics in Operating Systems(Monte Verità, Switzerland)(HotOS’09). USENIX Association, USA, 3

  22. [22]

    Mohammed Alfatafta, Basil Alkhatib, Ahmed Alquraan, and Samer Al-Kiswany. 2020. Toward a Generic Fault Tolerance Technique for Partial Network Partitioning. In14th USENIX Symposium on Operating Systems Design and Implementation (OSDI 20). USENIX Association, 351–368. https://www.usenix.org/ conference/osdi20/presentation/alfatafta

  23. [23]

    Ahmed Alquraan, Hatem Takruri, Mohammed Alfatafta, and Samer Al-Kiswany. 2018. An Analysis of Network-Partitioning Failures in Cloud Systems. In 13th USENIX Symposium on Operating Systems Design and Implementation (OSDI 18). USENIX Association, Carlsbad, CA, 51–68. https://www.usenix.org/ conference/osdi18/presentation/alquraan

  24. [24]

    Antunes, Afonso N

    Diogo S. Antunes, Afonso N. Oliveira, André Breda, Matheus Guilherme Franco, Henrique Moniz, and Rodrigo Rodrigues. 2024. Alea-BFT: Practical Asynchronous Byzantine Fault Tolerance. In21st USENIX Symposium on Networked Systems Design and Implementation (NSDI 24). USENIX Association, Santa Clara, CA, 313–328. https://www.usenix. org/conference/nsdi24/prese...

  25. [25]

    Balaji Arun, Zekun Li, Florian Suri-Payer, Sourav Das, and Alexander Spiegelman. 2025. Shoal++: High Throughput DAG BFT Can Be Fast and Ro- bust!. In22nd USENIX Symposium on Networked Systems Design and Implementation (NSDI 25). USENIX Association, Philadelphia, PA, 813–826. https://www.usenix.org/conference/nsdi25/ presentation/arun

  26. [26]

    Kushal Babel, Andrey Chursin, George Danezis, Lefteris Kokoris-Kogias, and Alberto Sonnino. 2023. Mysticeti: Low-Latency DAG Consensus with Fast Commit Path.arXiv preprint arXiv:2310.14821(2023)

  27. [27]

    Mathieu Baudet, Avery Ching, Andrey Chursin, George Danezis, François Garillot, Zekun Li, Dahlia Malkhi, Oded Naor, Dmitri Perelman, and Alberto Sonnino

  28. [28]

    The Libra Association Technical Report(2019)

    State machine replication in the Libra Blockchain. The Libra Association Technical Report(2019)

  29. [29]

    Erica Blum, Jonathan Katz, Julian Loss, Kartik Nayak, and Simon Ochsenreither. 2023. Abraxas: Throughput-Efficient Hybrid Asynchronous Consensus. InProceedings of the 2023 ACM SIGSAC Confer- ence on Computer and Communications Security (Copenhagen, Denmark)(CCS ’23). Association for Computing Machinery, New York, NY , USA, 519–533. doi:10.1145/3576915.3623191

  30. [30]

    Gabriel Bracha and Sam Toueg. 1985. Asynchronous consensus and broadcast protocols.Journal of the ACM (JACM)32, 4 (1985), 824–840

  31. [31]

    Miguel Castro and Barbara Liskov. 1999. Practical Byzantine Fault Tolerance. InProceedings of the Third Symposium on Operating Systems Design and Implementation(New Orleans, Louisiana, USA)(OSDI ’99). USENIX Association, USA, 173–186

  32. [32]

    Chan and Rafael Pass

    Benjamin Y . Chan and Rafael Pass. 2023. Simplex Consensus: A Simple and Fast Consensus Proto- col. InTheory of Cryptography: 21st International Conference, TCC 2023, Taipei, Taiwan, November 29–December 2, 2023, Proceedings, Part IV(Taipei, Taiwan). Springer-Verlag, Berlin, Heidelberg, 452–479. doi:10.1007/978-3-031-48624-1_17

  33. [33]

    Tushar Deepak Chandra, Vassos Hadzilacos, and Sam Toueg. 1996. The weakest failure detector for solving consensus.J. ACM43, 4 (July 1996), 685–722. doi:10.1145/234533.234549

  34. [34]

    Tushar Deepak Chandra and Sam Toueg. 1996. Unreliable failure detectors for reliable distributed systems.J. ACM43, 2 (March 1996), 225–267. doi:10.1145/226643.226647

  35. [35]

    Allen Clement, Edmund Wong, Lorenzo Alvisi, Mike Dahlin, and Mirco Marchetti. 2009. Making Byzantine Fault Tolerant Systems Tolerate Byzantine Faults. In Proceedings of the 6th USENIX Symposium on Net- worked Systems Design and Implementation(Boston, Massachusetts)(NSDI’09). USENIX Association, USA, 153–168

  36. [36]

    Graeme Connell, Vivian Fang, Rolfe Schmidt, Emma Dauterman, and Raluca Ada Popa. 2024. Secret Key Recovery in a Global-Scale End-to-End Encryption System. In18th USENIX Symposium on Operating Systems Design and Implementation (OSDI 24). USENIX Association, Santa Clara, CA, 703–719. https://www.usenix.org/conference/osdi24/ presentation/connell

  37. [37]

    Xiaohai Dai, Chaozheng Ding, Hai Jin, Julian Loss, and Ling Ren. 2024. Ipotane: Balanc- ing the Good and Bad Cases of Asynchronous BFT. Cryptology ePrint Archive, Paper 2024/653. doi:10.14722/ndss.2026.230003

  38. [38]

    Xiaohai Dai, Bolin Zhang, Hai Jin, and Ling Ren. 2023. ParBFT: Faster Asynchronous BFT Consensus with a Parallel Optimistic Path. InProceedings of the 2023 ACM SIGSAC Conference on Computer and Commu- nications Security(Copenhagen, Denmark)(CCS ’23). Association for Computing Machinery, New York, NY , USA, 504–518. doi:10.1145/3576915.3623101

  39. [39]

    George Danezis, Lefteris Kokoris-Kogias, Alberto Sonnino, and Alexander Spiegelman. 2022. Narwhal and Tusk: a DAG-based mempool and efficient BFT consensus. InProceedings of the Seventeenth European Conference on Computer Systems. 34–50

  40. [40]

    Vitor Enes, Carlos Baquero, Tuanir França Rezende, Alexey Gotsman, Matthieu Perrin, and Pierre Sutra

  41. [41]

    InProceedings of the Fifteenth European Conference on Computer Systems(Heraklion, Greece) (EuroSys ’20)

    State-machine replication for planet-scale systems. InProceedings of the Fifteenth European Conference on Computer Systems(Heraklion, Greece) (EuroSys ’20). Association for Computing Machin- ery, New York, NY , USA, Article 24, 15 pages. doi:10.1145/3342195.3387543

  42. [42]

    Novi Facebook Research. [n.d.]. Hot- stuff Implementation. https:// github.com/asonnino/hotstuff/commit/ d771d4868db301bcb5e3deaa915b5017220463f6 (last accessed on 09/10/24)

  43. [43]

    Novi Facebook Research. [n.d.]. Narwahl and Bullshark implementation. https://github.com/asonnino/ narwhal(last accessed on 09/23/24)

  44. [44]

    Yingzi Gao, Yuan Lu, Zhenliang Lu, Qiang Tang, Jing Xu, and Zhenfeng Zhang. 2022. Dumbo-ng: Fast asynchronous bft consensus with throughput-oblivious latency. InProceedings of the 2022 ACM SIGSAC Conference on Computer and Communications Security. 1187–1201

  45. [45]

    Rati Gelashvili, Lefteris Kokoris-Kogias, Alberto Sonnino, Alexander Spiegelman, and Zhuolun Xi- ang. 2022. Jolteon and Ditto: Network-Adaptive Efficient Consensus with Asynchronous Fallback. InFinancial Cryptography and Data Security: 26th International Conference, FC 2022, Grenada, May 2–6, 2022, Revised Selected Papers(Grenada, Grenada). Springer-Verla...

  46. [46]

    Neil Giridharan, Heidi Howard, Ittai Abraham, Natacha Crooks, and Alin Tomescu. 2021. No-commit proofs: Defeating livelock in bft.Cryptology ePrint Archive (2021)

  47. [47]

    Neil Giridharan, Florian Suri-Payer, Ittai Abraham, Lorenzo Alvisi, and Natacha Crooks. 2024. Autobahn: Seamless high speed BFT. InProceedings of the ACM SIGOPS 30th Symposium on Operating Systems Principles(Austin, TX, USA)(SOSP ’24). Association for Computing Machinery, New York, NY , USA, 1–23. doi:10.1145/3694715.3695942

  48. [48]

    Neil Giridharan, Florian Suri-Payer, Matthew Ding, Heidi Howard, Ittai Abraham, and Natacha Crooks

  49. [49]

    In Proceedings of the 2023 ACM Symposium on Principles of Distributed Computing

    BeeGees: stayin’alive in chained BFT. In Proceedings of the 2023 ACM Symposium on Principles of Distributed Computing. 233–243

  50. [50]

    Guy Golan Gueta, Ittai Abraham, Shelly Grossman, Dahlia Malkhi, Benny Pinkas, Michael Reiter, Dragos- Adrian Seredinschi, Orr Tamir, and Alin Tomescu. 2019. SBFT: A Scalable and Decentralized Trust Infrastruc- ture. In2019 49th Annual IEEE/IFIP International Con- ference on Dependable Systems and Networks (DSN). IEEE, USA, 568–580. doi: 10.1109/DSN.2019.00063

  51. [51]

    Gunawi, Riza O

    Haryadi S. Gunawi, Riza O. Suminto, Russell Sears, Casey Golliher, Swaminathan Sundararaman, Xing Lin, Tim Emami, Weiguang Sheng, Nematollah Bidokhti, Caitie McCaffrey, Gary Grider, Parks M. Fields, Kevin Harms, Robert B. Ross, Andree Jacobson, Robert Ricci, Kirk Webb, Peter Alvaro, H. Birali Runesha, Mingzhe Hao, and Huaicheng Li. 2018. Fail-Slow at Scal...

  52. [52]

    Bingyong Guo, Yuan Lu, Zhenliang Lu, Qiang Tang, Jing Xu, and Zhenfeng Zhang. 2022. Speeding Dumbo: Pushing Asynchronous BFT Closer to Practice. Cryptology ePrint Archive, Paper 2022/027. https://eprint.iacr.org/2022/027

  53. [53]

    Suyash Gupta, Jelle Hellings, and Mohammad Sadoghi

  54. [54]

    In2021 IEEE 37th International Conference on Data Engineering (ICDE)

    RCC: resilient concurrent consensus for high- throughput secure transaction processing. In2021 IEEE 37th International Conference on Data Engineering (ICDE). IEEE, 1392–1403

  55. [55]

    Andreas Haeberlen, Petr Kouznetsov, and Peter Dr- uschel. 2007. PeerReview: practical accountability for distributed systems.SIGOPS Oper. Syst. Rev.41, 6 (Oct. 2007), 175–188. doi:10.1145/1323293.1294279

  56. [56]

    Lorch, Lidong Zhou, and Yingnong Dang

    Peng Huang, Chuanxiong Guo, Jacob R. Lorch, Lidong Zhou, and Yingnong Dang. 2018. Capturing and En- hancing In Situ System Observability for Failure Detec- tion. In13th USENIX Symposium on Operating Systems Design and Implementation (OSDI 18). USENIX Asso- ciation, Carlsbad, CA, 1–16. https://www.usenix. org/conference/osdi18/presentation/huang

  57. [57]

    In: Proceedings of the 16th Workshop on Hot Topics in Operating Systems, pp

    Peng Huang, Chuanxiong Guo, Lidong Zhou, Jacob R. Lorch, Yingnong Dang, Murali Chintalapati, and Randolph Yao. 2017. Gray Failure: The Achilles’ Heel of Cloud-Scale Systems. InProceedings of the 16th Workshop on Hot Topics in Operating Systems (Whistler, BC, Canada)(HotOS ’17). Association for Computing Machinery, New York, NY , USA, 150–155. doi:10.1145/...

  58. [58]

    Philipp Jovanovic, Lefteris Kokoris Kogias, Bryan Kumara, Alberto Sonnino, Pasindu Tennage, and Igor Zablotchi. 2024. Mahi- Mahi: Low-Latency Asynchronous BFT DAG- Based Consensus. arXiv:2410.08670 [cs.DC] https://arxiv.org/abs/2410.08670

  59. [59]

    Idit Keidar and Alexander Shraer. 2006. Timeli- ness, failure-detectors, and consensus performance. InProceedings of the Twenty-Fifth Annual ACM Symposium on Principles of Distributed Computing (Denver, Colorado, USA)(PODC ’06). Association for Computing Machinery, New York, NY , USA, 169–178. doi:10.1145/1146381.1146408

  60. [60]

    Ramakrishna Kotla, Lorenzo Alvisi, Mike Dahlin, Allen Clement, and Edmund Wong. 2010. Zyzzyva: Specu- lative Byzantine Fault Tolerance.ACM Transactions on Computer Systems (TOCS)27, 4, Article 7 (Jan. 2010), 39 pages. doi:10.1145/1658357.1658358

  61. [61]

    S Krishnapriya and Greeshma Sarath. 2020. Securing Land Registration using Blockchain.Procedia computer science171 (2020), 1708–1715

  62. [62]

    Leners, Hao Wu, Wei-Lun Hung, Marcos K

    Joshua B. Leners, Hao Wu, Wei-Lun Hung, Marcos K. Aguilera, and Michael Walfish. 2011. Detecting failures in distributed systems with the Falcon spy network. In Proceedings of the Twenty-Third ACM Symposium on Operating Systems Principles(Cascais, Portugal)(SOSP ’11). Association for Computing Machinery, New York, NY , USA, 279–294. doi:10.1145/2043556.2043583

  63. [63]

    Tom Lianza and Chris Snook. [n. d.]. A Byzantine failure in the real world. https://blog.cloudflare. com/a-byzantine-failure-in-the-real-world/

  64. [64]

    Shengyun Liu, Wenbo Xu, Chen Shan, Xiaofeng Yan, Tianjing Xu, Bo Wang, Lei Fan, Fuxi Deng, Ying Yan, and Hui Zhang. 2023. Flexible Advancement in Asyn- chronous BFT Consensus. InProceedings of the 29th Symposium on Operating Systems Principles. 264–280

  65. [65]

    Chang Lou, Peng Huang, and Scott Smith. 2019. Comprehensive and Efficient Runtime Checking in System Software through Watchdogs. InProceedings of the Workshop on Hot Topics in Operating Sys- tems(Bertinoro, Italy)(HotOS ’19). Association for Computing Machinery, New York, NY , USA, 51–57. doi:10.1145/3317550.3321440

  66. [66]

    Ruiming Lu, Yunchi Lu, Yuxuan Jiang, Guangtao Xue, and Peng Huang. 2025. One-size-fits-none: understanding and enhancing slow-fault tolerance in modern distributed systems. InProceedings of the 22nd USENIX Symposium on Networked Systems Design and Implementation(Philadelphia, PA, USA)(NSDI ’25). USENIX Association, USA, Article 20, 20 pages

  67. [67]

    Ruiming Lu, Erci Xu, Yiming Zhang, Zhaosheng Zhu, Mengtian Wang, Zongpeng Zhu, Guangtao Xue, Minglu Li, and Jiesheng Wu. 2022. NVMe SSD Failures in the Field: the Fail-Stop and the Fail-Slow. In2022 USENIX Annual Technical Conference (USENIX ATC 22). USENIX Association, Carlsbad, CA, 1005–1020. https://www.usenix.org/conference/atc22/ presentation/lu

  68. [68]

    Yuan Lu, Zhenliang Lu, and Qiang Tang. 2022. Bolt-Dumbo Transformer: Asynchronous Consensus As Fast As the Pipelined BFT. InProceedings of the 2022 ACM SIGSAC Conference on Computer and Commu- nications Security(Los Angeles, CA, USA)(CCS ’22). Association for Computing Machinery, New York, NY , USA, 2159–2173. doi:10.1145/3548606.3559346

  69. [69]

    Benjamin Marsh, Steven Landers, and Jayendra Jog. 2025. Sei Giga. arXiv:2505.14914 [cs.DC] https://arxiv.org/abs/2505.14914

  70. [70]

    Andrew Miller, Yu Xia, Kyle Croman, Elaine Shi, and Dawn Song. 2016. The Honey Badger of BFT Protocols. InProceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Se- curity(Vienna, Austria)(CCS ’16). Association for Computing Machinery, New York, NY , USA, 31–42. doi:10.1145/2976749.2978399

  71. [71]

    Andersen, and Michael Kamin- sky

    Iulian Moraru, David G. Andersen, and Michael Kamin- sky. 2013. There is more consensus in Egalitarian parliaments. InProceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles (Farminton, Pennsylvania)(SOSP ’13). Association for Computing Machinery, New York, NY , USA, 358–372. doi:10.1145/2517349.2517350

  72. [72]

    Ray Neiheiser, Miguel Matos, and Luís Rodrigues

  73. [73]

    https://doi.org/10.1145/3477132.3483580

    Kauri: Scalable BFT Consensus with Pipelined Tree-Based Dissemination and Aggregation. InProceed- ings of the ACM SIGOPS 28th Symposium on Operating Systems Principles(Virtual Event, Germany)(SOSP ’21). Association for Computing Machinery, New York, NY , USA, 35–48. doi:10.1145/3477132.3483584

  74. [74]

    Neo4j. [n. d.]. Miti- gating Causal Cluster Re- elections. https:// neo4j.com/developer/ kb/mitigating-causal-cluster-re-elections-caused-by-high-gcs/ Last accessed: 2025-12-09

  75. [75]

    Khiem Ngo, Siddhartha Sen, and Wyatt Lloyd

  76. [76]

    In14th USENIX Symposium on Operating Systems Design and Implementa- tion (OSDI 20)

    Tolerating Slowdowns in Replicated State Machines using Copilots. In14th USENIX Symposium on Operating Systems Design and Implementa- tion (OSDI 20). USENIX Association, 583–598. https://www.usenix.org/conference/osdi20/ presentation/ngo

  77. [77]

    Daniel Porto, João Leitão, Cheng Li, Allen Clement, Aniket Kate, Flavio Junqueira, and Rodrigo Rodrigues

  78. [78]

    InProceedings of the Tenth European Conference on Computer Systems (Bordeaux, France)(EuroSys ’15)

    Visigoth fault tolerance. InProceedings of the Tenth European Conference on Computer Systems (Bordeaux, France)(EuroSys ’15). Association for Computing Machinery, New York, NY , USA, Article 8, 14 pages. doi:10.1145/2741948.2741979

  79. [79]

    Ramasamy and Christian Cachin

    HariGovind V . Ramasamy and Christian Cachin. 2006. Parsimonious Asynchronous Byzantine-Fault-Tolerant Atomic Broadcast. InPrinciples of Distributed Systems, James H. Anderson, Giuseppe Prencipe, and Roger Wattenhofer (Eds.). Springer Berlin Heidelberg, Berlin, Heidelberg, 88–102

  80. [80]

    Matthieu Rambaud. 2024. Faster Asyn- chronous Blockchain Consensus and MVBA. Cryptology ePrint Archive, Paper 2024/1108. https://eprint.iacr.org/2024/1108

Showing first 80 references.