Ambulance: saving BFT through racing

Benjamin Marsh; Grzegorz Prusak; Hein Meling; Kartik Nayak; Lorenzo Alvisi; Natacha Crooks; Neil Giridharan; Shubham Mishra

arxiv: 2606.25099 · v1 · pith:RKVEF4TMnew · submitted 2026-06-23 · 💻 cs.DC


Pith reviewed 2026-06-25 21:49 UTC · model grok-4.3

classification 💻 cs.DC

keywords Byzantine fault tolerance · state machine replication · BFT consensus · timeouts · leader change · slowdown recovery · replica races


The pith

Ambulance achieves both high performance and robustness in BFT replication by replacing timeouts with races among replicas executing protocol steps.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents Ambulance, a Byzantine fault tolerant state machine replication protocol designed to handle replica slowdowns without the drawbacks of timeouts. It claims that by structuring races where replicas compete to advance protocol steps, the system can recover quickly while maintaining common-case throughput and latency similar to optimized timeout-based protocols. A sympathetic reader would care because existing BFT deployments either trigger unnecessary leader changes with aggressive timeouts or suffer idle periods and inflated latency with conservative ones, and alternatives like hedging or fully cooperative protocols each sacrifice one side of the trade-off. Ambulance positions its races as a way to combine the strengths of both approaches.

Core claim

Ambulance is a BFT state machine replication protocol that sidesteps the performance-robustness trade-off through protocol-rigged races, where replicas race against each other by executing protocol steps rather than against the clock. This enables high throughput and low latency comparable to state-of-the-art timeout-based BFT while matching the robustness of cooperative asynchronous approaches.

What carries the argument

Protocol-rigged races, in which replicas advance by competing to execute successive protocol steps to recover from slowdowns.
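
The contrast between racing the clock and racing other replicas can be illustrated with a toy model. This is a sketch under stated assumptions, not the paper's protocol: the function names, the quorum rule, and all delays are hypothetical, chosen only to show how a cutoff defined by replicas finishing their own steps moves with network conditions while a fixed timer does not.

```python
def clock_cutoff_winner(leader_arrival_ms, timeout_ms):
    """Timer-based rule: the leader wins only if it beats a fixed deadline."""
    return "leader" if leader_arrival_ms <= timeout_ms else "fallback"


def race_cutoff_winner(leader_arrival_ms, replica_step_delays_ms, quorum):
    """Race-based rule (toy, hypothetical): the cutoff is the moment `quorum`
    replicas have finished their own steps, so it tracks real network speed."""
    finish_times = sorted(sum(steps) for steps in replica_step_delays_ms)
    cutoff_ms = finish_times[quorum - 1]  # finish time of the quorum-th fastest replica
    winner = "leader" if leader_arrival_ms <= cutoff_ms else "fallback"
    return winner, cutoff_ms


# Hypothetical numbers: 4 replicas, quorum of 3, two steps per replica.
fast_network = [[20, 20]] * 4      # every step takes 20 ms
slow_network = [[200, 200]] * 4    # the whole network is 10x slower

fast_result = race_cutoff_winner(30, fast_network, quorum=3)
slow_result = race_cutoff_winner(300, slow_network, quorum=3)
timer_result = clock_cutoff_winner(300, timeout_ms=100)
```

In the slow-network case the fixed 100 ms timer gives up on a leader that is no slower than everyone else, while the event-defined cutoff stretches to 400 ms and keeps it. Whether Ambulance's races behave this way depends on protocol details not reproduced on this page.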

If this is right

Ambulance matches the common-case throughput and latency of state-of-the-art timeout-based BFT protocols.
Ambulance recovers from slowdowns with the speed of cooperative asynchronous protocols.
The protocol avoids both spurious leader changes from aggressive timeouts and idle time from conservative timeouts.
Replicas advance through direct competition on protocol steps rather than waiting on timers.
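
The timeout trade-off behind the third point can be made concrete with a small model. The sketch below is illustrative only: the per-round delays, the 200 ms leader-change cost, and both timeout values are assumptions, not figures from the paper.

```python
def timeout_outcome(leader_delay_ms, timeout_ms, view_change_cost_ms):
    """One round under a timer-based leader-change rule (toy model).

    If the leader responds within the timeout, the round ends when it responds.
    Otherwise replicas wait out the full timeout, then pay for a leader change.
    Returns (latency_ms, leader_changed).
    """
    if leader_delay_ms <= timeout_ms:
        return leader_delay_ms, False
    return timeout_ms + view_change_cost_ms, True


def summarize(delays_ms, timeout_ms, view_change_cost_ms=200):
    """Total latency and number of leader changes over a sequence of rounds."""
    results = [timeout_outcome(d, timeout_ms, view_change_cost_ms) for d in delays_ms]
    total_latency = sum(latency for latency, _ in results)
    leader_changes = sum(1 for _, changed in results if changed)
    return total_latency, leader_changes


# Hypothetical leader delays: mostly fast, one mild hiccup (150 ms), one real stall (10 s).
delays = [50, 50, 150, 50, 10_000]
aggressive = summarize(delays, timeout_ms=100)      # (750, 2)
conservative = summarize(delays, timeout_ms=5_000)  # (5500, 1)
```

With the 100 ms timeout, the 150 ms hiccup triggers a leader change that costs more than simply waiting would have. With the 5 s timeout, that spurious change disappears, but the stalled round leaves the system idle for the full 5 s before recovery begins.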

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

Designers of other leader-based systems might adapt similar step-based competition to reduce timeout tuning needs.
The approach could extend to settings with heterogeneous replica speeds if the race rules preserve ordering invariants.
Measurements under mixed failure and slowdown scenarios would test whether the performance-robustness combination holds beyond the paper's evaluated cases.

Load-bearing premise

That races among replicas can be structured to recover from slowdowns without creating new failure modes, excessive communication overhead, or correctness problems.

What would settle it

A workload with induced slowdowns on one replica where Ambulance either shows latency higher than a tuned timeout-based baseline or requires more messages than a cooperative protocol to make progress.
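
The criterion above can be written down as a simple check. This is a minimal sketch: the `RunStats` fields and the function name are hypothetical, and the example values are placeholders rather than measurements from the paper or any real system.

```python
from dataclasses import dataclass


@dataclass
class RunStats:
    """Summary of one run under an induced single-replica slowdown (hypothetical fields)."""
    p50_latency_ms: float
    messages_per_commit: float


def claim_is_challenged(ambulance, tuned_timeout_baseline, cooperative_baseline):
    """True if either condition from the criterion holds."""
    latency_worse = ambulance.p50_latency_ms > tuned_timeout_baseline.p50_latency_ms
    messages_worse = ambulance.messages_per_commit > cooperative_baseline.messages_per_commit
    return latency_worse or messages_worse


# Placeholder values only, to show how the check is called.
example = claim_is_challenged(
    ambulance=RunStats(p50_latency_ms=1.0, messages_per_commit=1.0),
    tuned_timeout_baseline=RunStats(p50_latency_ms=2.0, messages_per_commit=1.0),
    cooperative_baseline=RunStats(p50_latency_ms=3.0, messages_per_commit=2.0),
)
```

The check treats the two conditions as independent: losing on either latency against the tuned timeout baseline or message count against the cooperative baseline is enough to challenge the claim.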

Figures

Figures reproduced from arXiv: 2606.25099 by Benjamin Marsh, Grzegorz Prusak, Hein Meling, Kartik Nayak, Lorenzo Alvisi, Natacha Crooks, Neil Giridharan, Shubham Mishra.

**Figure 3.** Leader beating the cutoff

**Figure 6.** Production Latency CDF

**Figure 7.** 1s slowdown. Latency (ms) against time (s), p50 series for Ambulance, Autobahn-1s, Autobahn-5s, SMVBA, and ParBFT2.

**Figure 9.** 5s slowdown. Latency (ms) against time (s), p50 series for Ambulance, Autobahn-5s, Autobahn-10s, SMVBA, and ParBFT2.

Original abstract

Today's practical Byzantine Fault Tolerant (BFT) state machine replication deployments are vulnerable to slowdowns. The main culprit is timeouts. Aggressive timeouts spuriously trigger expensive leader changes, while conservative timeouts leave the system idle and let slowdowns severely inflate latency. Two main alternatives exist: hedging, which improves recovery from slow leaders but still incurs a time-based hedging delay, and cooperative asynchronous protocols, which recover quickly from slowdowns but suffer from high common-case latency and low throughput. This paper presents Ambulance: a BFT state machine replication protocol that sidesteps this trade-off through protocol-rigged races, where replicas, rather than race against the clock, race against each other by executing protocol steps. This enables Ambulance to achieve high throughput and low latency comparable to state-of-the-art timeout-based BFT, while matching the robustness of cooperative approaches.
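
The abstract's remark that hedging "still incurs a time-based hedging delay" can be seen in a one-function model of a generic hedged operation. This is a sketch of the general hedging idea, not of any specific BFT protocol; the function name and all numbers are assumptions.

```python
def hedged_latency(primary_ms, backup_ms, hedge_delay_ms):
    """Completion time when a backup attempt starts only after hedge_delay_ms."""
    if primary_ms <= hedge_delay_ms:
        return primary_ms  # primary finished first; the backup never starts
    # The backup starts at hedge_delay_ms; whichever attempt finishes first wins.
    return min(primary_ms, hedge_delay_ms + backup_ms)


# Hypothetical numbers: a healthy primary, then a stalled one.
healthy = hedged_latency(primary_ms=40, backup_ms=60, hedge_delay_ms=100)
stalled = hedged_latency(primary_ms=5_000, backup_ms=60, hedge_delay_ms=100)
```

In the stalled case the operation completes at 160 ms, of which 100 ms is spent waiting for the hedge timer to fire. Shrinking the hedge delay reduces that wait but launches more redundant backups in the healthy case, which is the time-based cost the abstract refers to.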

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

Ambulance claims protocol-rigged races let BFT replicas recover from slowdowns without timeouts or hedging, but the abstract gives no design or evidence to check if it works.


The one thing to know is that Ambulance introduces a BFT protocol where replicas race each other by advancing through protocol steps to recover from slowdowns, aiming to combine the performance of timeout-based systems with the robustness of cooperative ones.

The paper does a good job spelling out why current solutions fall short. Timeouts are tricky: set them too low and they cause unnecessary leader changes; set them too high and slowdowns drag everything down. Hedging helps with slow leaders but still waits on a timer. Cooperative protocols avoid timeouts but end up with worse common-case numbers. Framing the alternative as racing via protocol steps rather than time is a distinct angle.

What is new here is this specific mechanism of rigging the protocol itself to create races between replicas. It positions the work as sidestepping the performance-robustness trade-off.

If the full design holds up, it could be relevant for systems that need both high throughput and quick recovery.

The main soft spot is that the provided text is just the abstract with no protocol description, no correctness arguments, and no experimental results. This makes it impossible to evaluate whether the races can be set up without extra overhead, new failure modes, or breaking safety. The assumption that protocol steps can be structured to enable clean racing needs concrete support from the paper.

This work is aimed at distributed systems researchers focused on BFT consensus and practical deployments. A reader looking for ideas on improving real-world BFT performance might get some value from the high-level concept, but the lack of details limits how much can be taken away right now.

It deserves serious peer review because the problem it targets is a genuine practical concern in BFT, and the proposed direction is different enough from existing work to warrant referees taking a look at the details once provided.

Referee Report

3 major / 0 minor

Summary. The manuscript introduces Ambulance, a BFT state machine replication protocol that replaces clock-based timeouts with protocol-rigged races in which replicas compete by executing protocol steps. It claims this design delivers throughput and latency comparable to state-of-the-art timeout-based BFT while matching the slowdown robustness of cooperative asynchronous protocols, thereby avoiding the latency/throughput penalties of hedging and the common-case overhead of fully asynchronous designs.

Significance. If the protocol mechanics, safety/liveness arguments, and evaluation data substantiate the claims, the work would address a practically important trade-off in deployed BFT systems. The approach is presented as parameter-free with respect to timeout tuning and would constitute a concrete advance over both timeout-based and cooperative baselines.

major comments (3)

No protocol description, pseudocode, message patterns, or quorum definitions are supplied anywhere in the manuscript. Without these, the central claim that protocol-rigged races recover from slowdowns without new failure modes, excessive overhead, or correctness violations cannot be assessed.
No safety or liveness arguments, view-change mechanics, or handling of concurrent races appear in the text. These are load-bearing for any BFT claim and must be provided before the performance-robustness combination can be evaluated.
The manuscript contains no experimental results, throughput/latency numbers, or comparison against the cited state-of-the-art timeout-based and cooperative protocols. The abstract's performance assertions therefore remain untestable.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the detailed review and for identifying the key elements required to evaluate the protocol. We agree that the submitted manuscript is incomplete in several respects and will revise it to include the missing components.


Referee: No protocol description, pseudocode, message patterns, or quorum definitions are supplied anywhere in the manuscript. Without these, the central claim that protocol-rigged races recover from slowdowns without new failure modes, excessive overhead, or correctness violations cannot be assessed.

Authors: We agree that the current manuscript provides only a high-level overview. The revised version will contain a complete protocol description, including pseudocode, message patterns, and quorum definitions, so that the recovery mechanism and overhead claims can be directly assessed.

Revision: yes

Referee: No safety or liveness arguments, view-change mechanics, or handling of concurrent races appear in the text. These are load-bearing for any BFT claim and must be provided before the performance-robustness combination can be evaluated.

Authors: We acknowledge that safety and liveness arguments, view-change mechanics, and concurrent-race handling are absent from the submitted draft. The revision will add these arguments and mechanics to substantiate the BFT guarantees.

Revision: yes

Referee: The manuscript contains no experimental results, throughput/latency numbers, or comparison against the cited state-of-the-art timeout-based and cooperative protocols. The abstract's performance assertions therefore remain untestable.

Authors: The submitted manuscript is a conceptual short version and indeed contains no experimental data. The revised manuscript will include a full evaluation section with throughput and latency measurements together with direct comparisons to the referenced timeout-based and cooperative protocols.

Revision: yes

Circularity Check

0 steps flagged

No significant circularity detected


The provided abstract and context describe a new BFT protocol design (Ambulance) that introduces protocol-rigged races as an alternative to timeouts or hedging. No equations, fitted parameters, self-citations as load-bearing premises, or derivations are present in the text. The central claim is a protocol-level architectural choice whose correctness and performance would need to be established via standard safety/liveness proofs and evaluation, none of which reduce to the inputs by construction. This is the common case of a self-contained protocol paper with no detectable circular steps.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Only abstract available; no free parameters, axioms, or invented entities can be identified from the provided text.

pith-pipeline@v0.9.1-grok · 5691 in / 1007 out tokens · 20192 ms · 2026-06-25T21:49:35.129356+00:00 · methodology


Reference graph

Works this paper leans on

104 extracted references · 27 canonical work pages

[1]

[n. d.]. Amazon EC2 M6a Instances. https: //aws.amazon.com/ec2/instance-types/m6a/ (last accessed on 12/09/25)
[2]

[n. d.]. Amazon EC2 M6i Instances. https: //aws.amazon.com/ec2/instance-types/m6i/ (last accessed on 12/09/25)
[3]

[n. d.]. CockroachDB Replication Layer. https://www.cockroachlabs.com/docs/stable/ architecture/replication-layer (last accessed on 12/09/25)
[4]

d.].Confidential Consortium Framework, Microsoft

[n. d.].Confidential Consortium Framework, Microsoft. https://ccf.microsoft.com/ (last accessed on 09/23/24)
[5]

[n. d.]. Consider rolling the WAL if the HDFS write pipeline is slow. https://issues.apache. org/jira/browse/HBASE-22301 (last accessed on 12/09/25)
[6]

[n. d.]. Dalek elliptic curve cryptography. https://github.com/dalek-cryptography/ ed25519-dalek(last accessed on 09/23/24)
[7]

[n. d.]. Delayed heartbeat from etcd leader. https: //github.com/etcd-io/etcd/issues/7312 (last accessed on 12/09/25)
[8]

[n. d.]. Digital Euro. https://www.ecb.europa. eu/euro/digital_euro/html/index.en.html (last accessed on 09/23/24)
[9]

[n. d.]. etcd Tuning. https://etcd.io/docs/v3.4/ tuning/(last accessed on 12/09/25)
[10]

[n. d.]. Microsoft CCF Configuration. https: //microsoft.github.io/CCF/main/operations/ configuration.html(last accessed on 12/09/25)
[11]

[n. d.]. minimum master nodes does not prevent split-brain if splits are intersecting. https://github. com/elastic/elasticsearch/issues/2488 (last accessed on 12/09/25)
[12]

[n. d.]. Private communications with engineers at the blockchain company, Espresso, running HotStuff in production. March 2025

2025
[13]

[n. d.]. Private communications with researchers at Mysten Labs, a leading blockchain company), and formerly of Facebook Novi. March 2024

2024
[14]

[n. d.]. RocksDB, version 0.16.0. https: //rocksdb.org/(last accessed on 09/23/24)
[15]

[n. d.]. Sui Blockchain. https://sui.io/ (last accessed on 09/23/24)
[16]

[n. d.]. TiKV Config. https://tikv.org/docs/6. 1/deploy/configure/tikv-configuration-file/ (last accessed on 12/09/25)
[17]

[n. d.]. Tokio, version 1.5.0. https://tokio.rs/ (last accessed on 09/23/24)
[18]

Ittai Abraham, Naama Ben-David, and Sravya Yan- damuri. 2022. Efficient and Adaptively Secure Asynchronous Binary Agreement via Binding Crusader Agreement. InProceedings of the 2022 ACM Sympo- sium on Principles of Distributed Computing. 381–391

2022
[19]

Ittai Abraham, Dahlia Malkhi, and Alexander Spiegel- man. 2019. Asymptotically Optimal Validated Asynchronous Byzantine Agreement. InProceedings of the 2019 ACM Symposium on Principles of Distributed Computing(Toronto ON, Canada)(PODC ’19). Association for Computing Machinery, New York, NY , USA, 337–346. doi:10.1145/3293611.3331612

work page doi:10.1145/3293611.3331612 2019
[20]

Ittai Abraham, Kartik Nayak, Ling Ren, and Zhuolun Xiang. 2021. Good-Case Latency of Byzantine Broadcast: A Complete Categorization. InProceedings of the 2021 ACM Symposium on Principles of Dis- tributed Computing(Virtual Event, Italy)(PODC’21). Association for Computing Machinery, New York, NY , USA, 331–341. doi:10.1145/3465084.3467899

work page doi:10.1145/3465084.3467899 2021
[21]

Aguilera and Michael Walfish

Marcos K. Aguilera and Michael Walfish. 2009. No time for asynchrony. InProceedings of the 12th Conference on Hot Topics in Operating Systems(Monte Verità, Switzerland)(HotOS’09). USENIX Association, USA, 3

2009
[22]

Mohammed Alfatafta, Basil Alkhatib, Ahmed Alquraan, and Samer Al-Kiswany. 2020. Toward a Generic Fault Tolerance Technique for Partial Network Partitioning. In14th USENIX Symposium on Operating Systems Design and Implementation (OSDI 20). USENIX Association, 351–368. https://www.usenix.org/ conference/osdi20/presentation/alfatafta

2020
[23]

Ahmed Alquraan, Hatem Takruri, Mohammed Alfatafta, and Samer Al-Kiswany. 2018. An Analysis of Network-Partitioning Failures in Cloud Systems. In 13th USENIX Symposium on Operating Systems Design and Implementation (OSDI 18). USENIX Association, Carlsbad, CA, 51–68. https://www.usenix.org/ conference/osdi18/presentation/alquraan

2018
[24]

Antunes, Afonso N

Diogo S. Antunes, Afonso N. Oliveira, André Breda, Matheus Guilherme Franco, Henrique Moniz, and Rodrigo Rodrigues. 2024. Alea-BFT: Practical Asynchronous Byzantine Fault Tolerance. In21st USENIX Symposium on Networked Systems Design and Implementation (NSDI 24). USENIX Association, Santa Clara, CA, 313–328. https://www.usenix. org/conference/nsdi24/prese...

2024
[25]

Balaji Arun, Zekun Li, Florian Suri-Payer, Sourav Das, and Alexander Spiegelman. 2025. Shoal++: High Throughput DAG BFT Can Be Fast and Ro- bust!. In22nd USENIX Symposium on Networked Systems Design and Implementation (NSDI 25). USENIX Association, Philadelphia, PA, 813–826. https://www.usenix.org/conference/nsdi25/ presentation/arun

2025
[26]

Kushal Babel, Andrey Chursin, George Danezis, Lefteris Kokoris-Kogias, and Alberto Sonnino. 2023. Mysticeti: Low-Latency DAG Consensus with Fast Commit Path.arXiv preprint arXiv:2310.14821(2023)

arXiv 2023
[27]

Mathieu Baudet, Avery Ching, Andrey Chursin, George Danezis, François Garillot, Zekun Li, Dahlia Malkhi, Oded Naor, Dmitri Perelman, and Alberto Sonnino
[28]

The Libra Association Technical Report(2019)

State machine replication in the Libra Blockchain. The Libra Association Technical Report(2019)

2019
[29]

Erica Blum, Jonathan Katz, Julian Loss, Kartik Nayak, and Simon Ochsenreither. 2023. Abraxas: Throughput-Efficient Hybrid Asynchronous Consensus. InProceedings of the 2023 ACM SIGSAC Confer- ence on Computer and Communications Security (Copenhagen, Denmark)(CCS ’23). Association for Computing Machinery, New York, NY , USA, 519–533. doi:10.1145/3576915.3623191

work page doi:10.1145/3576915.3623191 2023
[30]

Gabriel Bracha and Sam Toueg. 1985. Asynchronous consensus and broadcast protocols.Journal of the ACM (JACM)32, 4 (1985), 824–840

1985
[31]

Miguel Castro and Barbara Liskov. 1999. Practical Byzantine Fault Tolerance. InProceedings of the Third Symposium on Operating Systems Design and Implementation(New Orleans, Louisiana, USA)(OSDI ’99). USENIX Association, USA, 173–186

1999
[32]

Chan and Rafael Pass

Benjamin Y . Chan and Rafael Pass. 2023. Simplex Consensus: A Simple and Fast Consensus Proto- col. InTheory of Cryptography: 21st International Conference, TCC 2023, Taipei, Taiwan, November 29–December 2, 2023, Proceedings, Part IV(Taipei, Taiwan). Springer-Verlag, Berlin, Heidelberg, 452–479. doi:10.1007/978-3-031-48624-1_17

work page doi:10.1007/978-3-031-48624-1_17 2023
[33]

Tushar Deepak Chandra, Vassos Hadzilacos, and Sam Toueg. 1996. The weakest failure detector for solving consensus.J. ACM43, 4 (July 1996), 685–722. doi:10.1145/234533.234549

work page doi:10.1145/234533.234549 1996
[34]

Tushar Deepak Chandra and Sam Toueg. 1996. Unreliable failure detectors for reliable distributed systems.J. ACM43, 2 (March 1996), 225–267. doi:10.1145/226643.226647

work page doi:10.1145/226643.226647 1996
[35]

Allen Clement, Edmund Wong, Lorenzo Alvisi, Mike Dahlin, and Mirco Marchetti. 2009. Making Byzantine Fault Tolerant Systems Tolerate Byzantine Faults. In Proceedings of the 6th USENIX Symposium on Net- worked Systems Design and Implementation(Boston, Massachusetts)(NSDI’09). USENIX Association, USA, 153–168

2009
[36]

Graeme Connell, Vivian Fang, Rolfe Schmidt, Emma Dauterman, and Raluca Ada Popa. 2024. Secret Key Recovery in a Global-Scale End-to-End Encryption System. In18th USENIX Symposium on Operating Systems Design and Implementation (OSDI 24). USENIX Association, Santa Clara, CA, 703–719. https://www.usenix.org/conference/osdi24/ presentation/connell

2024
[37]

Xiaohai Dai, Chaozheng Ding, Hai Jin, Julian Loss, and Ling Ren. 2024. Ipotane: Balanc- ing the Good and Bad Cases of Asynchronous BFT. Cryptology ePrint Archive, Paper 2024/653. doi:10.14722/ndss.2026.230003

work page doi:10.14722/ndss.2026.230003 2024
[38]

Xiaohai Dai, Bolin Zhang, Hai Jin, and Ling Ren. 2023. ParBFT: Faster Asynchronous BFT Consensus with a Parallel Optimistic Path. InProceedings of the 2023 ACM SIGSAC Conference on Computer and Commu- nications Security(Copenhagen, Denmark)(CCS ’23). Association for Computing Machinery, New York, NY , USA, 504–518. doi:10.1145/3576915.3623101

work page doi:10.1145/3576915.3623101 2023
[39]

George Danezis, Lefteris Kokoris-Kogias, Alberto Sonnino, and Alexander Spiegelman. 2022. Narwhal and Tusk: a DAG-based mempool and efficient BFT consensus. InProceedings of the Seventeenth European Conference on Computer Systems. 34–50

2022
[40]

Vitor Enes, Carlos Baquero, Tuanir França Rezende, Alexey Gotsman, Matthieu Perrin, and Pierre Sutra
[41]

InProceedings of the Fifteenth European Conference on Computer Systems(Heraklion, Greece) (EuroSys ’20)

State-machine replication for planet-scale systems. InProceedings of the Fifteenth European Conference on Computer Systems(Heraklion, Greece) (EuroSys ’20). Association for Computing Machin- ery, New York, NY , USA, Article 24, 15 pages. doi:10.1145/3342195.3387543

work page doi:10.1145/3342195.3387543
[42]

Novi Facebook Research. [n.d.]. Hot- stuff Implementation. https:// github.com/asonnino/hotstuff/commit/ d771d4868db301bcb5e3deaa915b5017220463f6 (last accessed on 09/10/24)
[43]

Novi Facebook Research. [n.d.]. Narwahl and Bullshark implementation. https://github.com/asonnino/ narwhal(last accessed on 09/23/24)
[44]

Yingzi Gao, Yuan Lu, Zhenliang Lu, Qiang Tang, Jing Xu, and Zhenfeng Zhang. 2022. Dumbo-ng: Fast asynchronous bft consensus with throughput-oblivious latency. InProceedings of the 2022 ACM SIGSAC Conference on Computer and Communications Security. 1187–1201

2022
[45]

Rati Gelashvili, Lefteris Kokoris-Kogias, Alberto Sonnino, Alexander Spiegelman, and Zhuolun Xi- ang. 2022. Jolteon and Ditto: Network-Adaptive Efficient Consensus with Asynchronous Fallback. InFinancial Cryptography and Data Security: 26th International Conference, FC 2022, Grenada, May 2–6, 2022, Revised Selected Papers(Grenada, Grenada). Springer-Verla...

work page doi:10.1007/978-3-031-18283-9_14 2022
[46]

Neil Giridharan, Heidi Howard, Ittai Abraham, Natacha Crooks, and Alin Tomescu. 2021. No-commit proofs: Defeating livelock in bft.Cryptology ePrint Archive (2021)

2021
[47]

Neil Giridharan, Florian Suri-Payer, Ittai Abraham, Lorenzo Alvisi, and Natacha Crooks. 2024. Autobahn: Seamless high speed BFT. InProceedings of the ACM SIGOPS 30th Symposium on Operating Systems Principles(Austin, TX, USA)(SOSP ’24). Association for Computing Machinery, New York, NY , USA, 1–23. doi:10.1145/3694715.3695942

work page doi:10.1145/3694715.3695942 2024
[48]

Neil Giridharan, Florian Suri-Payer, Matthew Ding, Heidi Howard, Ittai Abraham, and Natacha Crooks
[49]

In Proceedings of the 2023 ACM Symposium on Principles of Distributed Computing

BeeGees: stayin’alive in chained BFT. In Proceedings of the 2023 ACM Symposium on Principles of Distributed Computing. 233–243

2023
[50]

Guy Golan Gueta, Ittai Abraham, Shelly Grossman, Dahlia Malkhi, Benny Pinkas, Michael Reiter, Dragos- Adrian Seredinschi, Orr Tamir, and Alin Tomescu. 2019. SBFT: A Scalable and Decentralized Trust Infrastruc- ture. In2019 49th Annual IEEE/IFIP International Con- ference on Dependable Systems and Networks (DSN). IEEE, USA, 568–580. doi: 10.1109/DSN.2019.00063

work page doi:10.1109/dsn.2019.00063 2019
[51]

Gunawi, Riza O

Haryadi S. Gunawi, Riza O. Suminto, Russell Sears, Casey Golliher, Swaminathan Sundararaman, Xing Lin, Tim Emami, Weiguang Sheng, Nematollah Bidokhti, Caitie McCaffrey, Gary Grider, Parks M. Fields, Kevin Harms, Robert B. Ross, Andree Jacobson, Robert Ricci, Kirk Webb, Peter Alvaro, H. Birali Runesha, Mingzhe Hao, and Huaicheng Li. 2018. Fail-Slow at Scal...

2018
[52]

Bingyong Guo, Yuan Lu, Zhenliang Lu, Qiang Tang, Jing Xu, and Zhenfeng Zhang. 2022. Speeding Dumbo: Pushing Asynchronous BFT Closer to Practice. Cryptology ePrint Archive, Paper 2022/027. https://eprint.iacr.org/2022/027

2022
[53]

Suyash Gupta, Jelle Hellings, and Mohammad Sadoghi
[54]

In2021 IEEE 37th International Conference on Data Engineering (ICDE)

RCC: resilient concurrent consensus for high- throughput secure transaction processing. In2021 IEEE 37th International Conference on Data Engineering (ICDE). IEEE, 1392–1403
[55]

Andreas Haeberlen, Petr Kouznetsov, and Peter Dr- uschel. 2007. PeerReview: practical accountability for distributed systems.SIGOPS Oper. Syst. Rev.41, 6 (Oct. 2007), 175–188. doi:10.1145/1323293.1294279

work page doi:10.1145/1323293.1294279 2007
[56]

Lorch, Lidong Zhou, and Yingnong Dang

Peng Huang, Chuanxiong Guo, Jacob R. Lorch, Lidong Zhou, and Yingnong Dang. 2018. Capturing and En- hancing In Situ System Observability for Failure Detec- tion. In13th USENIX Symposium on Operating Systems Design and Implementation (OSDI 18). USENIX Asso- ciation, Carlsbad, CA, 1–16. https://www.usenix. org/conference/osdi18/presentation/huang

2018
[57]

In: Proceedings of the 16th Workshop on Hot Topics in Operating Systems, pp

Peng Huang, Chuanxiong Guo, Lidong Zhou, Jacob R. Lorch, Yingnong Dang, Murali Chintalapati, and Randolph Yao. 2017. Gray Failure: The Achilles’ Heel of Cloud-Scale Systems. InProceedings of the 16th Workshop on Hot Topics in Operating Systems (Whistler, BC, Canada)(HotOS ’17). Association for Computing Machinery, New York, NY , USA, 150–155. doi:10.1145/...

work page doi:10.1145/3102980.3103005 2017
[58]

Philipp Jovanovic, Lefteris Kokoris Kogias, Bryan Kumara, Alberto Sonnino, Pasindu Tennage, and Igor Zablotchi. 2024. Mahi- Mahi: Low-Latency Asynchronous BFT DAG- Based Consensus. arXiv:2410.08670 [cs.DC] https://arxiv.org/abs/2410.08670

arXiv 2024
[59]

Idit Keidar and Alexander Shraer. 2006. Timeli- ness, failure-detectors, and consensus performance. InProceedings of the Twenty-Fifth Annual ACM Symposium on Principles of Distributed Computing (Denver, Colorado, USA)(PODC ’06). Association for Computing Machinery, New York, NY , USA, 169–178. doi:10.1145/1146381.1146408

work page doi:10.1145/1146381.1146408 2006
[60]

Ramakrishna Kotla, Lorenzo Alvisi, Mike Dahlin, Allen Clement, and Edmund Wong. 2010. Zyzzyva: Specu- lative Byzantine Fault Tolerance.ACM Transactions on Computer Systems (TOCS)27, 4, Article 7 (Jan. 2010), 39 pages. doi:10.1145/1658357.1658358

work page doi:10.1145/1658357.1658358 2010
[61]

S Krishnapriya and Greeshma Sarath. 2020. Securing Land Registration using Blockchain.Procedia computer science171 (2020), 1708–1715

2020
[62]

Leners, Hao Wu, Wei-Lun Hung, Marcos K

Joshua B. Leners, Hao Wu, Wei-Lun Hung, Marcos K. Aguilera, and Michael Walfish. 2011. Detecting failures in distributed systems with the Falcon spy network. In Proceedings of the Twenty-Third ACM Symposium on Operating Systems Principles(Cascais, Portugal)(SOSP ’11). Association for Computing Machinery, New York, NY , USA, 279–294. doi:10.1145/2043556.2043583

work page doi:10.1145/2043556.2043583 2011
[63]

Tom Lianza and Chris Snook. [n. d.]. A Byzantine failure in the real world. https://blog.cloudflare. com/a-byzantine-failure-in-the-real-world/
[64]

Shengyun Liu, Wenbo Xu, Chen Shan, Xiaofeng Yan, Tianjing Xu, Bo Wang, Lei Fan, Fuxi Deng, Ying Yan, and Hui Zhang. 2023. Flexible Advancement in Asyn- chronous BFT Consensus. InProceedings of the 29th Symposium on Operating Systems Principles. 264–280

2023
[65]

Chang Lou, Peng Huang, and Scott Smith. 2019. Comprehensive and Efficient Runtime Checking in System Software through Watchdogs. InProceedings of the Workshop on Hot Topics in Operating Sys- tems(Bertinoro, Italy)(HotOS ’19). Association for Computing Machinery, New York, NY , USA, 51–57. doi:10.1145/3317550.3321440

work page doi:10.1145/3317550.3321440 2019
[66]

Ruiming Lu, Yunchi Lu, Yuxuan Jiang, Guangtao Xue, and Peng Huang. 2025. One-size-fits-none: understanding and enhancing slow-fault tolerance in modern distributed systems. InProceedings of the 22nd USENIX Symposium on Networked Systems Design and Implementation(Philadelphia, PA, USA)(NSDI ’25). USENIX Association, USA, Article 20, 20 pages

2025
[67]

Ruiming Lu, Erci Xu, Yiming Zhang, Zhaosheng Zhu, Mengtian Wang, Zongpeng Zhu, Guangtao Xue, Minglu Li, and Jiesheng Wu. 2022. NVMe SSD Failures in the Field: the Fail-Stop and the Fail-Slow. In2022 USENIX Annual Technical Conference (USENIX ATC 22). USENIX Association, Carlsbad, CA, 1005–1020. https://www.usenix.org/conference/atc22/ presentation/lu

2022
[68]

Yuan Lu, Zhenliang Lu, and Qiang Tang. 2022. Bolt-Dumbo Transformer: Asynchronous Consensus As Fast As the Pipelined BFT. InProceedings of the 2022 ACM SIGSAC Conference on Computer and Commu- nications Security(Los Angeles, CA, USA)(CCS ’22). Association for Computing Machinery, New York, NY , USA, 2159–2173. doi:10.1145/3548606.3559346

work page doi:10.1145/3548606.3559346 2022
[69]

Benjamin Marsh, Steven Landers, and Jayendra Jog. 2025. Sei Giga. arXiv:2505.14914 [cs.DC] https://arxiv.org/abs/2505.14914

arXiv 2025
[70]



[1] [n. d.]. Amazon EC2 M6a Instances. https://aws.amazon.com/ec2/instance-types/m6a/ (last accessed on 12/09/25)

[2] [n. d.]. Amazon EC2 M6i Instances. https://aws.amazon.com/ec2/instance-types/m6i/ (last accessed on 12/09/25)

[3] [n. d.]. CockroachDB Replication Layer. https://www.cockroachlabs.com/docs/stable/architecture/replication-layer (last accessed on 12/09/25)

[4] [n. d.]. Confidential Consortium Framework, Microsoft. https://ccf.microsoft.com/ (last accessed on 09/23/24)

[5] [n. d.]. Consider rolling the WAL if the HDFS write pipeline is slow. https://issues.apache.org/jira/browse/HBASE-22301 (last accessed on 12/09/25)

[6] [n. d.]. Dalek elliptic curve cryptography. https://github.com/dalek-cryptography/ed25519-dalek (last accessed on 09/23/24)

[7] [n. d.]. Delayed heartbeat from etcd leader. https://github.com/etcd-io/etcd/issues/7312 (last accessed on 12/09/25)

[8] [n. d.]. Digital Euro. https://www.ecb.europa.eu/euro/digital_euro/html/index.en.html (last accessed on 09/23/24)

[9] [n. d.]. etcd Tuning. https://etcd.io/docs/v3.4/tuning/ (last accessed on 12/09/25)

[10] [n. d.]. Microsoft CCF Configuration. https://microsoft.github.io/CCF/main/operations/configuration.html (last accessed on 12/09/25)

[11] [n. d.]. minimum master nodes does not prevent split-brain if splits are intersecting. https://github.com/elastic/elasticsearch/issues/2488 (last accessed on 12/09/25)

[12] [n. d.]. Private communications with engineers at the blockchain company, Espresso, running HotStuff in production. March 2025

[13] [n. d.]. Private communications with researchers at Mysten Labs (a leading blockchain company), and formerly of Facebook Novi. March 2024

[14] [n. d.]. RocksDB, version 0.16.0. https://rocksdb.org/ (last accessed on 09/23/24)

[15] [n. d.]. Sui Blockchain. https://sui.io/ (last accessed on 09/23/24)

[16] [n. d.]. TiKV Config. https://tikv.org/docs/6.1/deploy/configure/tikv-configuration-file/ (last accessed on 12/09/25)

[17] [n. d.]. Tokio, version 1.5.0. https://tokio.rs/ (last accessed on 09/23/24)

[18] Ittai Abraham, Naama Ben-David, and Sravya Yandamuri. 2022. Efficient and Adaptively Secure Asynchronous Binary Agreement via Binding Crusader Agreement. In Proceedings of the 2022 ACM Symposium on Principles of Distributed Computing. 381–391

[19] Ittai Abraham, Dahlia Malkhi, and Alexander Spiegelman. 2019. Asymptotically Optimal Validated Asynchronous Byzantine Agreement. In Proceedings of the 2019 ACM Symposium on Principles of Distributed Computing (Toronto ON, Canada) (PODC ’19). Association for Computing Machinery, New York, NY, USA, 337–346. doi:10.1145/3293611.3331612

[20] Ittai Abraham, Kartik Nayak, Ling Ren, and Zhuolun Xiang. 2021. Good-Case Latency of Byzantine Broadcast: A Complete Categorization. In Proceedings of the 2021 ACM Symposium on Principles of Distributed Computing (Virtual Event, Italy) (PODC ’21). Association for Computing Machinery, New York, NY, USA, 331–341. doi:10.1145/3465084.3467899

[21] Marcos K. Aguilera and Michael Walfish. 2009. No time for asynchrony. In Proceedings of the 12th Conference on Hot Topics in Operating Systems (Monte Verità, Switzerland) (HotOS ’09). USENIX Association, USA, 3

[22] Mohammed Alfatafta, Basil Alkhatib, Ahmed Alquraan, and Samer Al-Kiswany. 2020. Toward a Generic Fault Tolerance Technique for Partial Network Partitioning. In 14th USENIX Symposium on Operating Systems Design and Implementation (OSDI 20). USENIX Association, 351–368. https://www.usenix.org/conference/osdi20/presentation/alfatafta

[23] Ahmed Alquraan, Hatem Takruri, Mohammed Alfatafta, and Samer Al-Kiswany. 2018. An Analysis of Network-Partitioning Failures in Cloud Systems. In 13th USENIX Symposium on Operating Systems Design and Implementation (OSDI 18). USENIX Association, Carlsbad, CA, 51–68. https://www.usenix.org/conference/osdi18/presentation/alquraan

[24] Diogo S. Antunes, Afonso N. Oliveira, André Breda, Matheus Guilherme Franco, Henrique Moniz, and Rodrigo Rodrigues. 2024. Alea-BFT: Practical Asynchronous Byzantine Fault Tolerance. In 21st USENIX Symposium on Networked Systems Design and Implementation (NSDI 24). USENIX Association, Santa Clara, CA, 313–328. https://www.usenix.org/conference/nsdi24/prese...

[25] Balaji Arun, Zekun Li, Florian Suri-Payer, Sourav Das, and Alexander Spiegelman. 2025. Shoal++: High Throughput DAG BFT Can Be Fast and Robust!. In 22nd USENIX Symposium on Networked Systems Design and Implementation (NSDI 25). USENIX Association, Philadelphia, PA, 813–826. https://www.usenix.org/conference/nsdi25/presentation/arun

[26] Kushal Babel, Andrey Chursin, George Danezis, Lefteris Kokoris-Kogias, and Alberto Sonnino. 2023. Mysticeti: Low-Latency DAG Consensus with Fast Commit Path. arXiv preprint arXiv:2310.14821 (2023)

[27] Mathieu Baudet, Avery Ching, Andrey Chursin, George Danezis, François Garillot, Zekun Li, Dahlia Malkhi, Oded Naor, Dmitri Perelman, and Alberto Sonnino. 2019. State machine replication in the Libra Blockchain. The Libra Association Technical Report (2019)

[29] Erica Blum, Jonathan Katz, Julian Loss, Kartik Nayak, and Simon Ochsenreither. 2023. Abraxas: Throughput-Efficient Hybrid Asynchronous Consensus. In Proceedings of the 2023 ACM SIGSAC Conference on Computer and Communications Security (Copenhagen, Denmark) (CCS ’23). Association for Computing Machinery, New York, NY, USA, 519–533. doi:10.1145/3576915.3623191

[30] Gabriel Bracha and Sam Toueg. 1985. Asynchronous consensus and broadcast protocols. Journal of the ACM (JACM) 32, 4 (1985), 824–840

[31] Miguel Castro and Barbara Liskov. 1999. Practical Byzantine Fault Tolerance. In Proceedings of the Third Symposium on Operating Systems Design and Implementation (New Orleans, Louisiana, USA) (OSDI ’99). USENIX Association, USA, 173–186

[32] Benjamin Y. Chan and Rafael Pass. 2023. Simplex Consensus: A Simple and Fast Consensus Protocol. In Theory of Cryptography: 21st International Conference, TCC 2023, Taipei, Taiwan, November 29–December 2, 2023, Proceedings, Part IV (Taipei, Taiwan). Springer-Verlag, Berlin, Heidelberg, 452–479. doi:10.1007/978-3-031-48624-1_17

[33] Tushar Deepak Chandra, Vassos Hadzilacos, and Sam Toueg. 1996. The weakest failure detector for solving consensus. J. ACM 43, 4 (July 1996), 685–722. doi:10.1145/234533.234549

[34] Tushar Deepak Chandra and Sam Toueg. 1996. Unreliable failure detectors for reliable distributed systems. J. ACM 43, 2 (March 1996), 225–267. doi:10.1145/226643.226647

[35] Allen Clement, Edmund Wong, Lorenzo Alvisi, Mike Dahlin, and Mirco Marchetti. 2009. Making Byzantine Fault Tolerant Systems Tolerate Byzantine Faults. In Proceedings of the 6th USENIX Symposium on Networked Systems Design and Implementation (Boston, Massachusetts) (NSDI ’09). USENIX Association, USA, 153–168

[36] Graeme Connell, Vivian Fang, Rolfe Schmidt, Emma Dauterman, and Raluca Ada Popa. 2024. Secret Key Recovery in a Global-Scale End-to-End Encryption System. In 18th USENIX Symposium on Operating Systems Design and Implementation (OSDI 24). USENIX Association, Santa Clara, CA, 703–719. https://www.usenix.org/conference/osdi24/presentation/connell

[37] Xiaohai Dai, Chaozheng Ding, Hai Jin, Julian Loss, and Ling Ren. 2024. Ipotane: Balancing the Good and Bad Cases of Asynchronous BFT. Cryptology ePrint Archive, Paper 2024/653. doi:10.14722/ndss.2026.230003

[38] Xiaohai Dai, Bolin Zhang, Hai Jin, and Ling Ren. 2023. ParBFT: Faster Asynchronous BFT Consensus with a Parallel Optimistic Path. In Proceedings of the 2023 ACM SIGSAC Conference on Computer and Communications Security (Copenhagen, Denmark) (CCS ’23). Association for Computing Machinery, New York, NY, USA, 504–518. doi:10.1145/3576915.3623101

[39] George Danezis, Lefteris Kokoris-Kogias, Alberto Sonnino, and Alexander Spiegelman. 2022. Narwhal and Tusk: a DAG-based mempool and efficient BFT consensus. In Proceedings of the Seventeenth European Conference on Computer Systems. 34–50

[40] Vitor Enes, Carlos Baquero, Tuanir França Rezende, Alexey Gotsman, Matthieu Perrin, and Pierre Sutra. 2020. State-machine replication for planet-scale systems. In Proceedings of the Fifteenth European Conference on Computer Systems (Heraklion, Greece) (EuroSys ’20). Association for Computing Machinery, New York, NY, USA, Article 24, 15 pages. doi:10.1145/3342195.3387543

[42] Novi Facebook Research. [n. d.]. Hotstuff Implementation. https://github.com/asonnino/hotstuff/commit/d771d4868db301bcb5e3deaa915b5017220463f6 (last accessed on 09/10/24)

[43] Novi Facebook Research. [n. d.]. Narwhal and Bullshark implementation. https://github.com/asonnino/narwhal (last accessed on 09/23/24)

[44] Yingzi Gao, Yuan Lu, Zhenliang Lu, Qiang Tang, Jing Xu, and Zhenfeng Zhang. 2022. Dumbo-ng: Fast asynchronous bft consensus with throughput-oblivious latency. In Proceedings of the 2022 ACM SIGSAC Conference on Computer and Communications Security. 1187–1201

[45] Rati Gelashvili, Lefteris Kokoris-Kogias, Alberto Sonnino, Alexander Spiegelman, and Zhuolun Xiang. 2022. Jolteon and Ditto: Network-Adaptive Efficient Consensus with Asynchronous Fallback. In Financial Cryptography and Data Security: 26th International Conference, FC 2022, Grenada, May 2–6, 2022, Revised Selected Papers (Grenada, Grenada). Springer-Verla... doi:10.1007/978-3-031-18283-9_14

[46] Neil Giridharan, Heidi Howard, Ittai Abraham, Natacha Crooks, and Alin Tomescu. 2021. No-commit proofs: Defeating livelock in BFT. Cryptology ePrint Archive (2021)

[47] Neil Giridharan, Florian Suri-Payer, Ittai Abraham, Lorenzo Alvisi, and Natacha Crooks. 2024. Autobahn: Seamless high speed BFT. In Proceedings of the ACM SIGOPS 30th Symposium on Operating Systems Principles (Austin, TX, USA) (SOSP ’24). Association for Computing Machinery, New York, NY, USA, 1–23. doi:10.1145/3694715.3695942

[48] Neil Giridharan, Florian Suri-Payer, Matthew Ding, Heidi Howard, Ittai Abraham, and Natacha Crooks. 2023. BeeGees: stayin’ alive in chained BFT. In Proceedings of the 2023 ACM Symposium on Principles of Distributed Computing. 233–243

[50] Guy Golan Gueta, Ittai Abraham, Shelly Grossman, Dahlia Malkhi, Benny Pinkas, Michael Reiter, Dragos-Adrian Seredinschi, Orr Tamir, and Alin Tomescu. 2019. SBFT: A Scalable and Decentralized Trust Infrastructure. In 2019 49th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN). IEEE, USA, 568–580. doi:10.1109/DSN.2019.00063

[51] Haryadi S. Gunawi, Riza O. Suminto, Russell Sears, Casey Golliher, Swaminathan Sundararaman, Xing Lin, Tim Emami, Weiguang Sheng, Nematollah Bidokhti, Caitie McCaffrey, Gary Grider, Parks M. Fields, Kevin Harms, Robert B. Ross, Andree Jacobson, Robert Ricci, Kirk Webb, Peter Alvaro, H. Birali Runesha, Mingzhe Hao, and Huaicheng Li. 2018. Fail-Slow at Scal...

[52] Bingyong Guo, Yuan Lu, Zhenliang Lu, Qiang Tang, Jing Xu, and Zhenfeng Zhang. 2022. Speeding Dumbo: Pushing Asynchronous BFT Closer to Practice. Cryptology ePrint Archive, Paper 2022/027. https://eprint.iacr.org/2022/027

[53] Suyash Gupta, Jelle Hellings, and Mohammad Sadoghi. 2021. RCC: resilient concurrent consensus for high-throughput secure transaction processing. In 2021 IEEE 37th International Conference on Data Engineering (ICDE). IEEE, 1392–1403

[55] Andreas Haeberlen, Petr Kouznetsov, and Peter Druschel. 2007. PeerReview: practical accountability for distributed systems. SIGOPS Oper. Syst. Rev. 41, 6 (Oct. 2007), 175–188. doi:10.1145/1323293.1294279

[56] Peng Huang, Chuanxiong Guo, Jacob R. Lorch, Lidong Zhou, and Yingnong Dang. 2018. Capturing and Enhancing In Situ System Observability for Failure Detection. In 13th USENIX Symposium on Operating Systems Design and Implementation (OSDI 18). USENIX Association, Carlsbad, CA, 1–16. https://www.usenix.org/conference/osdi18/presentation/huang

[57] Peng Huang, Chuanxiong Guo, Lidong Zhou, Jacob R. Lorch, Yingnong Dang, Murali Chintalapati, and Randolph Yao. 2017. Gray Failure: The Achilles’ Heel of Cloud-Scale Systems. In Proceedings of the 16th Workshop on Hot Topics in Operating Systems (Whistler, BC, Canada) (HotOS ’17). Association for Computing Machinery, New York, NY, USA, 150–155. doi:10.1145/3102980.3103005

[58] Philipp Jovanovic, Lefteris Kokoris Kogias, Bryan Kumara, Alberto Sonnino, Pasindu Tennage, and Igor Zablotchi. 2024. Mahi-Mahi: Low-Latency Asynchronous BFT DAG-Based Consensus. arXiv:2410.08670 [cs.DC] https://arxiv.org/abs/2410.08670

[59] Idit Keidar and Alexander Shraer. 2006. Timeliness, failure-detectors, and consensus performance. In Proceedings of the Twenty-Fifth Annual ACM Symposium on Principles of Distributed Computing (Denver, Colorado, USA) (PODC ’06). Association for Computing Machinery, New York, NY, USA, 169–178. doi:10.1145/1146381.1146408

[60] Ramakrishna Kotla, Lorenzo Alvisi, Mike Dahlin, Allen Clement, and Edmund Wong. 2010. Zyzzyva: Speculative Byzantine Fault Tolerance. ACM Transactions on Computer Systems (TOCS) 27, 4, Article 7 (Jan. 2010), 39 pages. doi:10.1145/1658357.1658358

[61] S Krishnapriya and Greeshma Sarath. 2020. Securing Land Registration using Blockchain. Procedia computer science 171 (2020), 1708–1715

[62] Joshua B. Leners, Hao Wu, Wei-Lun Hung, Marcos K. Aguilera, and Michael Walfish. 2011. Detecting failures in distributed systems with the Falcon spy network. In Proceedings of the Twenty-Third ACM Symposium on Operating Systems Principles (Cascais, Portugal) (SOSP ’11). Association for Computing Machinery, New York, NY, USA, 279–294. doi:10.1145/2043556.2043583

[63] Tom Lianza and Chris Snook. [n. d.]. A Byzantine failure in the real world. https://blog.cloudflare.com/a-byzantine-failure-in-the-real-world/

[64] Shengyun Liu, Wenbo Xu, Chen Shan, Xiaofeng Yan, Tianjing Xu, Bo Wang, Lei Fan, Fuxi Deng, Ying Yan, and Hui Zhang. 2023. Flexible Advancement in Asynchronous BFT Consensus. In Proceedings of the 29th Symposium on Operating Systems Principles. 264–280

[65] Chang Lou, Peng Huang, and Scott Smith. 2019. Comprehensive and Efficient Runtime Checking in System Software through Watchdogs. In Proceedings of the Workshop on Hot Topics in Operating Systems (Bertinoro, Italy) (HotOS ’19). Association for Computing Machinery, New York, NY, USA, 51–57. doi:10.1145/3317550.3321440

[66] Ruiming Lu, Yunchi Lu, Yuxuan Jiang, Guangtao Xue, and Peng Huang. 2025. One-size-fits-none: understanding and enhancing slow-fault tolerance in modern distributed systems. In Proceedings of the 22nd USENIX Symposium on Networked Systems Design and Implementation (Philadelphia, PA, USA) (NSDI ’25). USENIX Association, USA, Article 20, 20 pages

[67] Ruiming Lu, Erci Xu, Yiming Zhang, Zhaosheng Zhu, Mengtian Wang, Zongpeng Zhu, Guangtao Xue, Minglu Li, and Jiesheng Wu. 2022. NVMe SSD Failures in the Field: the Fail-Stop and the Fail-Slow. In 2022 USENIX Annual Technical Conference (USENIX ATC 22). USENIX Association, Carlsbad, CA, 1005–1020. https://www.usenix.org/conference/atc22/presentation/lu

[68] Yuan Lu, Zhenliang Lu, and Qiang Tang. 2022. Bolt-Dumbo Transformer: Asynchronous Consensus As Fast As the Pipelined BFT. In Proceedings of the 2022 ACM SIGSAC Conference on Computer and Communications Security (Los Angeles, CA, USA) (CCS ’22). Association for Computing Machinery, New York, NY, USA, 2159–2173. doi:10.1145/3548606.3559346

[69] Benjamin Marsh, Steven Landers, and Jayendra Jog. 2025. Sei Giga. arXiv:2505.14914 [cs.DC] https://arxiv.org/abs/2505.14914

[70] Andrew Miller, Yu Xia, Kyle Croman, Elaine Shi, and Dawn Song. 2016. The Honey Badger of BFT Protocols. In Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security (Vienna, Austria) (CCS ’16). Association for Computing Machinery, New York, NY, USA, 31–42. doi:10.1145/2976749.2978399

[71] Iulian Moraru, David G. Andersen, and Michael Kaminsky. 2013. There is more consensus in Egalitarian parliaments. In Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles (Farmington, Pennsylvania) (SOSP ’13). Association for Computing Machinery, New York, NY, USA, 358–372. doi:10.1145/2517349.2517350

[72] Ray Neiheiser, Miguel Matos, and Luís Rodrigues. 2021. Kauri: Scalable BFT Consensus with Pipelined Tree-Based Dissemination and Aggregation. In Proceedings of the ACM SIGOPS 28th Symposium on Operating Systems Principles (Virtual Event, Germany) (SOSP ’21). Association for Computing Machinery, New York, NY, USA, 35–48. doi:10.1145/3477132.3483584

[74] Neo4j. [n. d.]. Mitigating Causal Cluster Re-elections. https://neo4j.com/developer/kb/mitigating-causal-cluster-re-elections-caused-by-high-gcs/ Last accessed: 2025-12-09

[75] Khiem Ngo, Siddhartha Sen, and Wyatt Lloyd. 2020. Tolerating Slowdowns in Replicated State Machines using Copilots. In 14th USENIX Symposium on Operating Systems Design and Implementation (OSDI 20). USENIX Association, 583–598. https://www.usenix.org/conference/osdi20/presentation/ngo

[77] Daniel Porto, João Leitão, Cheng Li, Allen Clement, Aniket Kate, Flavio Junqueira, and Rodrigo Rodrigues. 2015. Visigoth fault tolerance. In Proceedings of the Tenth European Conference on Computer Systems (Bordeaux, France) (EuroSys ’15). Association for Computing Machinery, New York, NY, USA, Article 8, 14 pages. doi:10.1145/2741948.2741979

[79] HariGovind V. Ramasamy and Christian Cachin. 2006. Parsimonious Asynchronous Byzantine-Fault-Tolerant Atomic Broadcast. In Principles of Distributed Systems, James H. Anderson, Giuseppe Prencipe, and Roger Wattenhofer (Eds.). Springer Berlin Heidelberg, Berlin, Heidelberg, 88–102

[80] Matthieu Rambaud. 2024. Faster Asynchronous Blockchain Consensus and MVBA. Cryptology ePrint Archive, Paper 2024/1108. https://eprint.iacr.org/2024/1108