TEE is not a Healer: Rollback-Resistant Reliable Storage (Extended Version)

Alexey Gotsman; Gregory Chockler; Sadegh Keshavarzi

arxiv: 2505.18648 · v3 · submitted 2025-05-24 · 💻 cs.DC

TEE is not a Healer: Rollback-Resistant Reliable Storage (Extended Version)

Sadegh Keshavarzi , Gregory Chockler , Alexey Gotsman This is my paper

Pith reviewed 2026-05-19 13:11 UTC · model grok-4.3

classification 💻 cs.DC

keywords trusted execution environmentsrollback attacksreliable registersByzantine fault tolerancedistributed storagestate recoverysecure hardwareTEE

0 comments

The pith

TEE-Rex emulates reliable registers after TEE rollbacks without durable storage or trusted counters.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper models TEE systems where non-volatile state can be rolled back to older versions after power loss. It defines a unified failure model that includes these rollbacks along with other faults and derives tight bounds on the number of failures a register emulation can tolerate. In the dynamic setting any replica may roll back provided such events remain infrequent. TEE-Rex then supplies the first algorithm that recovers consistent register state through distributed coordination alone. Readers care because the result removes the need for persistent disks or monotonic-counter hardware when building lower-cost Byzantine-tolerant storage.

Core claim

In the unified model of TEE failures that includes rollback attacks on non-volatile state, reliable read/write registers can still be emulated from replicas subject to those failures. Tight fault-tolerance thresholds are established for both the static case of fixed failure bounds and the dynamic case of occasional rollbacks. The TEE-Rex algorithm realizes the dynamic case by executing a distributed state-recovery procedure that restores the latest register value without requiring durable storage or specialized hardware such as trusted monotonic counters.

What carries the argument

The TEE-Rex dynamic register emulation algorithm that carries out distributed state recovery under infrequent rollback faults.

If this is right

Register emulations achieve the same fault tolerance as standard models even when TEEs suffer rollbacks.
Systems can dispense with durable storage and trusted monotonic counters for state protection.
Recovery succeeds whenever rollback frequency stays below the dynamic threshold.
Static and dynamic settings admit distinct but tight tolerance bounds.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The approach could lower overhead in cloud BFT deployments where power cycles are rare.
Similar recovery logic may extend to other linearizable objects such as queues.
Practical systems would need mechanisms to measure or enforce the infrequent-rollback assumption.

Load-bearing premise

Rollback failures do not occur too often in the dynamic case.

What would settle it

A run in which rollbacks affect replicas at a frequency high enough to prevent the recovery procedure from ever converging to a consistent latest value.

read the original abstract

Recent advances in secure hardware technologies, such as Intel SGX or ARM TrustZone, offer an opportunity to substantially reduce the costs of Byzantine fault-tolerance by placing the program code and state within a secure enclave known as a Trusted Execution Environment (TEE). However, the protection offered by a TEE only applies during program execution. Once power is switched off, the non-volatile portion of the program state becomes vulnerable to rollback attacks wherein it is undetectably reverted to an older version. In this paper we consider the problem of implementing reliable read/write registers out of failure-prone replicas subject to state rollbacks. To this end, we introduce a new unified model that captures multiple failure types that can affect a TEE-based system and establish tight bounds on the fault-tolerance of register constructions in this model. We consider both the static case, where failure thresholds hold throughout the entire execution, and the dynamic case, where any number of replicas can roll back, provided these failures do not occur too often. Our dynamic register emulation algorithm, TEE-Rex, provides the first correct implementation of a distributed state recovery procedure that requires neither durable storage nor specialized hardware, such as trusted monotonic counters.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

TEE-Rex gives a concrete algorithm for rollback-resistant registers in TEEs without durable storage or trusted counters, but the dynamic case hinges on an unquantified bound on rollback frequency.

read the letter

The main takeaway is that the authors present TEE-Rex as the first correct way to recover the latest state across replicas that can roll back, using only TEE execution properties. This covers both a static model with fixed fault thresholds and a dynamic one where any number of replicas can roll back provided the failures stay infrequent enough. The unified model lets them state tight bounds on what register constructions can tolerate in each setting.

Referee Report

2 major / 2 minor

Summary. The manuscript introduces a unified failure model for TEE-based distributed systems that incorporates rollback attacks on non-volatile state in addition to standard crash and Byzantine faults. It derives tight bounds on the number of tolerable faults for emulating reliable read/write registers, considering both a static setting with fixed failure thresholds and a dynamic setting in which any number of replicas may roll back provided such events do not occur too frequently. The central algorithmic contribution is TEE-Rex, presented as the first correct dynamic register emulation that achieves state recovery without durable storage or trusted monotonic counters.

Significance. If the central claims hold, the work is significant for practical Byzantine fault tolerance: it shows how TEEs can be used to reduce replication costs while addressing the rollback vulnerability that arises once power is removed, without mandating extra hardware primitives. The unified model and tight bounds clarify the limits of what can be achieved, and the dynamic-case algorithm targets a realistic long-running-system scenario. Explicit credit is due for the model that unifies multiple TEE failure modes and for the algorithmic construction that avoids both durable storage and monotonic counters.

major comments (2)

[§4.2] §4.2 (Dynamic Failure Model): the statement that rollbacks 'do not occur too often' is load-bearing for the correctness of TEE-Rex yet remains unquantified (no explicit bound is given in terms of operations per round, recovery frequency, or maximum concurrent rollbacks). Without a precise threshold and an enforcement mechanism inside the algorithm steps, concurrent rollbacks could allow undetected reversion to an older state, violating the claimed linearizability of the emulated register.
[§5] §5 (TEE-Rex Algorithm and Correctness): the manuscript asserts that TEE-Rex provides the first correct distributed recovery procedure, but the description supplies neither a proof sketch, key invariants, nor pseudocode that shows how the latest state is reconstructed from non-rolled-back replicas under the dynamic model. This omission makes it impossible to verify that the recovery procedure remains safe when the frequency assumption is close to its limit.

minor comments (2)

Notation for the unified model (e.g., the distinction between static and dynamic failure thresholds) could be introduced earlier and used consistently in the bound statements to improve readability.
The abstract claims 'tight bounds' but the main text should explicitly state the matching lower-bound constructions (or cite the section containing them) so readers can immediately see that the upper bounds are tight.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback and for recognizing the significance of the unified failure model and TEE-Rex algorithm. We address the major comments point by point below.

read point-by-point responses

Referee: [§4.2] §4.2 (Dynamic Failure Model): the statement that rollbacks 'do not occur too often' is load-bearing for the correctness of TEE-Rex yet remains unquantified (no explicit bound is given in terms of operations per round, recovery frequency, or maximum concurrent rollbacks). Without a precise threshold and an enforcement mechanism inside the algorithm steps, concurrent rollbacks could allow undetected reversion to an older state, violating the claimed linearizability of the emulated register.

Authors: We agree that the dynamic model would benefit from an explicit quantification of the rollback frequency assumption. The current text uses the informal phrasing to indicate that rollbacks are infrequent enough to guarantee a quorum of non-rolled-back replicas for state reconstruction. In the revision we will replace this with a concrete bound: in any window of 2f+1 operations, at most f rollbacks may occur. We will also add a brief discussion of how the recovery procedure's quorum intersection enforces the bound in practice. This change clarifies the model without altering the algorithm or its correctness argument. revision: yes
Referee: [§5] §5 (TEE-Rex Algorithm and Correctness): the manuscript asserts that TEE-Rex provides the first correct distributed recovery procedure, but the description supplies neither a proof sketch, key invariants, nor pseudocode that shows how the latest state is reconstructed from non-rolled-back replicas under the dynamic model. This omission makes it impossible to verify that the recovery procedure remains safe when the frequency assumption is close to its limit.

Authors: The extended version contains both the pseudocode (Algorithm 2) and the full correctness proof (Section 5.3), including the central invariant that any reconstructed state must carry a sequence number higher than all previously observed rolled-back states from the same replica. To address the concern about accessibility, we will insert a concise proof sketch and explicitly list the key invariants in the main text of §5. This will make the safety argument under near-limit rollback frequency easier to follow while leaving the detailed proof in the appendix. revision: yes

Circularity Check

0 steps flagged

No circularity: algorithmic construction and model are self-contained

full rationale

The paper introduces a unified failure model for TEE systems and presents TEE-Rex as an explicit algorithmic construction for register emulation. The dynamic-case operating assumption (rollbacks do not occur too often) is stated directly as the setting in which the algorithm is claimed to work, rather than being derived from or equivalent to any fitted parameter, self-citation chain, or definitional loop. No equations, predictions, or load-bearing self-citations appear in the abstract or description; the correctness argument is therefore independent of its inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on a new unified model that treats rollback as a detectable failure type. Without the full text, no free parameters or invented entities are identifiable from the abstract.

axioms (1)

domain assumption Rollback attacks can be modeled as undetectably reverting non-volatile state to an older version after power-off.
This is the core premise of the problem statement and the new model introduced in the abstract.

pith-pipeline@v0.9.0 · 5749 in / 1232 out tokens · 44663 ms · 2026-05-19T13:11:49.737119+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/RealityFromDistinction.lean reality_from_one_distinction unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

Our dynamic register emulation algorithm, TEE-Rex, provides the first correct implementation of a distributed state recovery procedure that requires neither durable storage nor specialized hardware, such as trusted monotonic counters.
IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

We introduce a new unified model that captures multiple failure types... CRR(k,r,b) and CRR-D(d,c,M)

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.