TEE is not a Healer: Rollback-Resistant Reliable Storage (Extended Version)
Pith reviewed 2026-05-19 13:11 UTC · model grok-4.3
The pith
TEE-Rex emulates reliable registers after TEE rollbacks without durable storage or trusted counters.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
In the unified model of TEE failures that includes rollback attacks on non-volatile state, reliable read/write registers can still be emulated from replicas subject to those failures. Tight fault-tolerance thresholds are established for both the static case of fixed failure bounds and the dynamic case of occasional rollbacks. The TEE-Rex algorithm realizes the dynamic case by executing a distributed state-recovery procedure that restores the latest register value without requiring durable storage or specialized hardware such as trusted monotonic counters.
What carries the argument
The TEE-Rex dynamic register emulation algorithm that carries out distributed state recovery under infrequent rollback faults.
If this is right
- Register emulations achieve the same fault tolerance as standard models even when TEEs suffer rollbacks.
- Systems can dispense with durable storage and trusted monotonic counters for state protection.
- Recovery succeeds whenever rollback frequency stays below the dynamic threshold.
- Static and dynamic settings admit distinct but tight tolerance bounds.
Where Pith is reading between the lines
- The approach could lower overhead in cloud BFT deployments where power cycles are rare.
- Similar recovery logic may extend to other linearizable objects such as queues.
- Practical systems would need mechanisms to measure or enforce the infrequent-rollback assumption.
Load-bearing premise
Rollback failures do not occur too often in the dynamic case.
What would settle it
A run in which rollbacks affect replicas at a frequency high enough to prevent the recovery procedure from ever converging to a consistent latest value.
read the original abstract
Recent advances in secure hardware technologies, such as Intel SGX or ARM TrustZone, offer an opportunity to substantially reduce the costs of Byzantine fault-tolerance by placing the program code and state within a secure enclave known as a Trusted Execution Environment (TEE). However, the protection offered by a TEE only applies during program execution. Once power is switched off, the non-volatile portion of the program state becomes vulnerable to rollback attacks wherein it is undetectably reverted to an older version. In this paper we consider the problem of implementing reliable read/write registers out of failure-prone replicas subject to state rollbacks. To this end, we introduce a new unified model that captures multiple failure types that can affect a TEE-based system and establish tight bounds on the fault-tolerance of register constructions in this model. We consider both the static case, where failure thresholds hold throughout the entire execution, and the dynamic case, where any number of replicas can roll back, provided these failures do not occur too often. Our dynamic register emulation algorithm, TEE-Rex, provides the first correct implementation of a distributed state recovery procedure that requires neither durable storage nor specialized hardware, such as trusted monotonic counters.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces a unified failure model for TEE-based distributed systems that incorporates rollback attacks on non-volatile state in addition to standard crash and Byzantine faults. It derives tight bounds on the number of tolerable faults for emulating reliable read/write registers, considering both a static setting with fixed failure thresholds and a dynamic setting in which any number of replicas may roll back provided such events do not occur too frequently. The central algorithmic contribution is TEE-Rex, presented as the first correct dynamic register emulation that achieves state recovery without durable storage or trusted monotonic counters.
Significance. If the central claims hold, the work is significant for practical Byzantine fault tolerance: it shows how TEEs can be used to reduce replication costs while addressing the rollback vulnerability that arises once power is removed, without mandating extra hardware primitives. The unified model and tight bounds clarify the limits of what can be achieved, and the dynamic-case algorithm targets a realistic long-running-system scenario. Explicit credit is due for the model that unifies multiple TEE failure modes and for the algorithmic construction that avoids both durable storage and monotonic counters.
major comments (2)
- [§4.2] §4.2 (Dynamic Failure Model): the statement that rollbacks 'do not occur too often' is load-bearing for the correctness of TEE-Rex yet remains unquantified (no explicit bound is given in terms of operations per round, recovery frequency, or maximum concurrent rollbacks). Without a precise threshold and an enforcement mechanism inside the algorithm steps, concurrent rollbacks could allow undetected reversion to an older state, violating the claimed linearizability of the emulated register.
- [§5] §5 (TEE-Rex Algorithm and Correctness): the manuscript asserts that TEE-Rex provides the first correct distributed recovery procedure, but the description supplies neither a proof sketch, key invariants, nor pseudocode that shows how the latest state is reconstructed from non-rolled-back replicas under the dynamic model. This omission makes it impossible to verify that the recovery procedure remains safe when the frequency assumption is close to its limit.
minor comments (2)
- Notation for the unified model (e.g., the distinction between static and dynamic failure thresholds) could be introduced earlier and used consistently in the bound statements to improve readability.
- The abstract claims 'tight bounds' but the main text should explicitly state the matching lower-bound constructions (or cite the section containing them) so readers can immediately see that the upper bounds are tight.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback and for recognizing the significance of the unified failure model and TEE-Rex algorithm. We address the major comments point by point below.
read point-by-point responses
-
Referee: [§4.2] §4.2 (Dynamic Failure Model): the statement that rollbacks 'do not occur too often' is load-bearing for the correctness of TEE-Rex yet remains unquantified (no explicit bound is given in terms of operations per round, recovery frequency, or maximum concurrent rollbacks). Without a precise threshold and an enforcement mechanism inside the algorithm steps, concurrent rollbacks could allow undetected reversion to an older state, violating the claimed linearizability of the emulated register.
Authors: We agree that the dynamic model would benefit from an explicit quantification of the rollback frequency assumption. The current text uses the informal phrasing to indicate that rollbacks are infrequent enough to guarantee a quorum of non-rolled-back replicas for state reconstruction. In the revision we will replace this with a concrete bound: in any window of 2f+1 operations, at most f rollbacks may occur. We will also add a brief discussion of how the recovery procedure's quorum intersection enforces the bound in practice. This change clarifies the model without altering the algorithm or its correctness argument. revision: yes
-
Referee: [§5] §5 (TEE-Rex Algorithm and Correctness): the manuscript asserts that TEE-Rex provides the first correct distributed recovery procedure, but the description supplies neither a proof sketch, key invariants, nor pseudocode that shows how the latest state is reconstructed from non-rolled-back replicas under the dynamic model. This omission makes it impossible to verify that the recovery procedure remains safe when the frequency assumption is close to its limit.
Authors: The extended version contains both the pseudocode (Algorithm 2) and the full correctness proof (Section 5.3), including the central invariant that any reconstructed state must carry a sequence number higher than all previously observed rolled-back states from the same replica. To address the concern about accessibility, we will insert a concise proof sketch and explicitly list the key invariants in the main text of §5. This will make the safety argument under near-limit rollback frequency easier to follow while leaving the detailed proof in the appendix. revision: yes
Circularity Check
No circularity: algorithmic construction and model are self-contained
full rationale
The paper introduces a unified failure model for TEE systems and presents TEE-Rex as an explicit algorithmic construction for register emulation. The dynamic-case operating assumption (rollbacks do not occur too often) is stated directly as the setting in which the algorithm is claimed to work, rather than being derived from or equivalent to any fitted parameter, self-citation chain, or definitional loop. No equations, predictions, or load-bearing self-citations appear in the abstract or description; the correctness argument is therefore independent of its inputs by construction.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption Rollback attacks can be modeled as undetectably reverting non-volatile state to an older version after power-off.
Lean theorems connected to this paper
-
IndisputableMonolith/Foundation/RealityFromDistinction.leanreality_from_one_distinction unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
Our dynamic register emulation algorithm, TEE-Rex, provides the first correct implementation of a distributed state recovery procedure that requires neither durable storage nor specialized hardware, such as trusted monotonic counters.
-
IndisputableMonolith/Cost/FunctionalEquation.leanwashburn_uniqueness_aczel unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
We introduce a new unified model that captures multiple failure types... CRR(k,r,b) and CRR-D(d,c,M)
What do these tags mean?
- matches
- The paper's claim is directly supported by a theorem in the formal canon.
- supports
- The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends
- The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses
- The paper appears to rely on the theorem as machinery.
- contradicts
- The paper's claim conflicts with a theorem or certificate in the canon.
- unclear
- Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.