Sequential Randomization Tests Using e-values: Applications for trial monitoring
Pith reviewed 2026-05-17 01:59 UTC · model grok-4.3
The pith
Sequential randomization tests using e-values deliver anytime-valid Type I error control for clinical trial monitoring derived solely from the randomization mechanism.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
By constructing test martingales through sequential wagers on randomized assignments, with each wager placed before the current label enters the wealth update, the e-RT procedures ensure that under the null hypothesis the wealth process is a nonnegative supermartingale. Ville's inequality then guarantees anytime-valid Type I error control regardless of the stopping rule chosen by the investigator.
What carries the argument
A test martingale formed by wagering on randomized treatment assignments or observed event labels, with each wager fixed before the current label enters the wealth update. That ordering is what enforces the supermartingale property under the null.
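The betting construction can be made concrete with a minimal sketch for a binary endpoint. It assumes 1:1 randomization and a simple plug-in wager chosen here for illustration; the function name `wealth_path` and the tuning constant `bet_size` are hypothetical and are not taken from the paper.

```python
def wealth_path(assignments, outcomes, bet_size=0.5):
    """Return the wealth after each participant.

    assignments: 0/1 arms from 1:1 randomization; outcomes: 0/1 responses.
    bet_size is a hypothetical tuning constant and must lie in [0, 1).
    """
    wealth = 1.0
    path = []
    successes = [0, 0]  # responders seen so far, by arm
    counts = [0, 0]     # participants seen so far, by arm
    for arm, y in zip(assignments, outcomes):
        # The wager uses past arms and outcomes plus the current outcome,
        # but never the current arm.
        rate_trt = successes[1] / counts[1] if counts[1] else 0.5
        rate_ctl = successes[0] / counts[0] if counts[0] else 0.5
        direction = 1 if y == 1 else -1
        bet = bet_size * (rate_trt - rate_ctl) * direction  # |bet| <= bet_size < 1
        # Pay off +bet if the arm is treatment, -bet if it is control.
        wealth *= 1.0 + bet * (2 * arm - 1)
        path.append(wealth)
        # Only now does the current label enter the running counts.
        successes[arm] += y
        counts[arm] += 1
    return path
```

Because the current arm is a fair coin flip given everything the wager depends on, each multiplicative factor has conditional expectation 1 under the sharp null. For a fixed outcome sequence, averaging the final wealth over all equally likely assignment vectors therefore gives exactly 1.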
If this is right
- Anytime-valid Type I error control holds for arbitrary stopping times in sequential monitoring of randomized trials.
- The same framework applies uniformly to binary, event-only, and continuous endpoints without model assumptions.
- Default effect-size-agnostic wagers allow monitoring to begin immediately without specifying a target treatment effect.
- Optional growth-rate-optimal wagers can be substituted when a fixed design alternative is credible to improve power.
- The methods serve as a conservative, assumption-light complement to existing model-based sequential analyses.
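To illustrate what a design-calibrated wager might look like, the sketch below computes a Kelly-style, growth-rate-maximizing bet on the assignment for a binary endpoint against one fixed design alternative. It assumes 1:1 allocation and event rates strictly between 0 and 1; the names `kelly_bet` and `expected_log_growth` are hypothetical, and the paper's own GROW wager may be defined differently.

```python
import math

def kelly_bet(p_ctrl, p_trt, y):
    """Bet on 'treatment' for a participant with binary outcome y.

    p_ctrl, p_trt: event rates under the assumed design alternative.
    Under 1:1 allocation, Bayes' rule gives the chance q that the
    participant is on treatment given y; the log-optimal bet is 2q - 1.
    """
    if y == 1:
        q = p_trt / (p_trt + p_ctrl)
    else:
        q = (1 - p_trt) / ((1 - p_trt) + (1 - p_ctrl))
    return 2 * q - 1

def expected_log_growth(bet, q):
    """Expected log wealth factor when treatment has probability q."""
    return q * math.log(1 + bet) + (1 - q) * math.log(1 - bet)
```

When the design alternative has no effect (`p_ctrl == p_trt`), the bet is zero and wealth stays flat. Validity under the null does not depend on the alternative being right, since the bet never uses the current arm; only the growth rate does.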
Where Pith is reading between the lines
- Investigators could use these procedures to justify adaptive or response-adaptive randomization schemes in which stopping decisions depend on accumulating data.
- The same randomization-derived martingale construction might transfer to other experimental settings that rely on known random assignment, such as online A/B testing platforms.
- Hybrid monitoring rules could combine e-RT validity bounds with parametric likelihood ratios to gain power while preserving the nonparametric safety net.
Load-bearing premise
Treatment assignments are generated by a known random mechanism that is independent of potential outcomes under the null hypothesis.
What would settle it
Repeated simulations under the null, using a data-dependent stopping rule that rejects as soon as wealth exceeds a fixed threshold (for example 1/alpha), would show a rejection rate above the nominal alpha level by more than Monte Carlo error.
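Such a check can be sketched in a few lines. The simulation below is an assumption-laden illustration, not the paper's protocol: it uses a binary endpoint, 1:1 randomization, a simple plug-in wager, and hypothetical names (`rejects_under_null`, `null_rejection_rate`, `p_event`, `bet_size`). The trial stops and rejects the first time wealth reaches 1/alpha.

```python
import random

def rejects_under_null(n_max, alpha, rng, bet_size=0.5, p_event=0.3):
    """Simulate one null trial; return True if wealth ever reaches 1/alpha."""
    wealth = 1.0
    successes = [0, 0]
    counts = [0, 0]
    for _ in range(n_max):
        arm = rng.randrange(2)                   # fair 1:1 randomization
        y = 1 if rng.random() < p_event else 0   # outcome ignores the arm (null)
        rate_trt = successes[1] / counts[1] if counts[1] else 0.5
        rate_ctl = successes[0] / counts[0] if counts[0] else 0.5
        bet = bet_size * (rate_trt - rate_ctl) * (1 if y == 1 else -1)
        wealth *= 1.0 + bet * (2 * arm - 1)
        if wealth >= 1.0 / alpha:                # data-dependent stopping rule
            return True
        successes[arm] += y
        counts[arm] += 1
    return False

def null_rejection_rate(n_trials=2000, n_max=200, alpha=0.05, seed=0):
    """Monte Carlo estimate of the Type I error rate under optional stopping."""
    rng = random.Random(seed)
    hits = sum(rejects_under_null(n_max, alpha, rng) for _ in range(n_trials))
    return hits / n_trials
```

An estimate well above `alpha`, beyond what Monte Carlo error can explain, would count against the claim; an estimate at or below `alpha` is what the supermartingale argument predicts.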
Original abstract
Sequential monitoring of randomized trials traditionally relies on parametric assumptions or asymptotic approximations. We discuss a family of nonparametric sequential tests - collectively called e-RT - for binary, event-only, and continuous endpoints. All active variants derive validity from the randomization mechanism. Using a betting framework, each test constructs a test martingale by sequentially wagering on randomized assignments or observed event labels before using the current label in the wealth update. Under the null hypothesis of no treatment effect, the expected wealth cannot grow, guaranteeing anytime-valid Type I error control regardless of stopping rule. The default e-RT posture is effect-size agnostic: monitoring can begin without specifying a hypothesized treatment effect. Alternatively, fixed design-calibrated wagers, including growth-rate-optimal (GROW) wagers, may be used as optional efficiency tools when a clinically meaningful design alternative is credible. We present simulation studies demonstrating calibration and power, and discuss the principled asymmetry in betting strategies across outcome types. These methods provide a conservative, assumption-light complement to model-based sequential analyses.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes a family of nonparametric sequential tests called e-RT for monitoring randomized clinical trials with binary, event-only, and continuous endpoints. These tests construct test martingales via a sequential betting framework that wagers on randomized treatment assignments or event labels before updating wealth. Under the sharp null of no treatment effect, the construction ensures that expected wealth is non-increasing, yielding anytime-valid Type I error control for arbitrary stopping times via Ville's inequality. The default approach is effect-size agnostic, though optional design-calibrated wagers (including GROW) are discussed for efficiency gains. Simulation studies are presented to illustrate calibration and power.
Significance. If the martingale property holds as described, the work supplies a conservative, randomization-based complement to parametric or asymptotic sequential methods. It avoids reliance on fitted models or asymptotic approximations and directly leverages the trial's randomization mechanism for validity. The explicit separation of default agnostic monitoring from optional efficiency tools, together with the simulation evidence, strengthens its potential utility for flexible trial monitoring.
major comments (2)
- [§3.2] §3.2, the continuous-endpoint wager definition: the conditional-expectation argument under the null is stated to equal 1, but the precise form of the wager function (e.g., how the continuous outcome is mapped to a bounded payoff) is not shown explicitly; this step is load-bearing for the martingale property and requires an expanded derivation or reference to the supporting lemma.
- [§5] Simulation protocol in §5: the reported power curves for the GROW wager variant assume a specific clinically meaningful alternative; without a sensitivity analysis over a broader range of alternatives (including those near the boundary of detectability), the efficiency claim relative to the agnostic version remains incompletely supported.
minor comments (3)
- [Abstract] The abstract introduces 'e-RT' without an immediate expansion; adding '(e-value Randomization Test)' on first use would improve readability.
- [§2] Notation for the natural filtration in the martingale construction could be made more explicit (e.g., by indexing the sigma-algebra explicitly with the sequence of randomizations and observations) to assist readers less familiar with sequential analysis.
- [Table 2] Table 2 caption should clarify whether the reported Type I error rates are exact or Monte Carlo estimates and include the number of replications used.
Simulated Author's Rebuttal
We thank the referee for the thoughtful review and constructive feedback on our manuscript. The comments help clarify important aspects of the e-RT framework. We address each major comment below and indicate the revisions we will make.
Point-by-point responses
- Referee: [§3.2] §3.2, the continuous-endpoint wager definition: the conditional-expectation argument under the null is stated to equal 1, but the precise form of the wager function (e.g., how the continuous outcome is mapped to a bounded payoff) is not shown explicitly; this step is load-bearing for the martingale property and requires an expanded derivation or reference to the supporting lemma.
  Authors: We agree that additional detail on the continuous-endpoint wager would improve the exposition. In the revised manuscript, we will provide an explicit definition of the wager function for continuous outcomes, specifying how the outcome is mapped to a bounded payoff (for example, through a normalized transformation ensuring the payoff lies in [0,1]). We will then include a detailed derivation demonstrating that the conditional expectation under the sharp null hypothesis equals 1, based on the randomization distribution. This will be presented directly in §3.2 or as a supporting lemma in the appendix.
  Revision: yes
- Referee: [§5] Simulation protocol in §5: the reported power curves for the GROW wager variant assume a specific clinically meaningful alternative; without a sensitivity analysis over a broader range of alternatives (including those near the boundary of detectability), the efficiency claim relative to the agnostic version remains incompletely supported.
  Authors: The simulations in §5 were designed to illustrate performance under a specific, clinically relevant alternative to highlight the benefits of the GROW wager. We recognize that a sensitivity analysis would provide stronger support for the efficiency claims. Accordingly, we will expand the simulation section to include additional scenarios, particularly those near the boundary of detectability, and report the corresponding power comparisons between the agnostic and design-calibrated variants.
  Revision: yes
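For the first point, one form such a continuous-endpoint wager could take is sketched here. It is an illustration under stated assumptions, not the paper's definition: outcomes are assumed to lie in a known range $[a, b]$, allocation is 1:1, and the symbols $Z_i$, $\lambda_i$, and $\mathcal{F}_{i-1}$ (the information available before participant $i$) are introduced only for this sketch.

```latex
% Map the outcome to [0,1], then bet on the assignment A_i \in \{0,1\}.
Z_i = \frac{Y_i - a}{b - a} \in [0,1], \qquad
W_i = W_{i-1}\,\bigl[\,1 + \lambda_i\,(2Z_i - 1)\,(2A_i - 1)\,\bigr], \qquad
\lambda_i \in (-1,1) \ \text{determined by } \mathcal{F}_{i-1}.

% Under the sharp null, Y_i does not depend on A_i, and
% P(A_i = 1 \mid \mathcal{F}_{i-1}, Y_i) = 1/2, so
\mathbb{E}_{H_0}\!\left[\,2A_i - 1 \mid \mathcal{F}_{i-1}, Y_i\,\right] = 0
\quad\Longrightarrow\quad
\mathbb{E}_{H_0}\!\left[\,W_i \mid \mathcal{F}_{i-1}, Y_i\,\right] = W_{i-1}.
```

Since $|\lambda_i (2Z_i - 1)(2A_i - 1)| \le |\lambda_i| < 1$, every factor is strictly positive, so wealth stays nonnegative, as the martingale argument requires.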
Circularity Check
No significant circularity; derivation self-contained via external randomization
full rationale
The paper constructs test martingales for sequential randomization tests whose validity under the sharp null follows from the external randomization mechanism: treatment assignments are independent of fixed potential outcomes, so each wealth increment has conditional expectation 1 by construction of the betting update. This property, combined with the standard martingale inequality (Ville), directly yields the anytime-valid Type I error bound for arbitrary stopping times. No equation reduces a derived quantity to a data-fitted parameter by definition, no self-citation supplies a load-bearing uniqueness theorem, and no ansatz is smuggled in; the central guarantee is therefore independent of the paper's own outputs and rests on the stated randomization assumption plus classical probability results.
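The two steps named in this rationale can be written out explicitly. The notation is assumed here rather than taken from the paper: $W_n$ is the wealth after $n$ participants, $B_n$ the multiplicative payoff of the $n$-th bet, $\mathcal{F}_{n-1}$ the information available before that bet, and $\alpha$ the nominal level.

```latex
% Step 1: unit conditional expectation of each payoff gives a martingale.
W_0 = 1, \qquad W_n = W_{n-1} B_n, \qquad B_n \ge 0, \qquad
\mathbb{E}_{H_0}\!\left[B_n \mid \mathcal{F}_{n-1}\right] = 1
\;\Longrightarrow\;
\mathbb{E}_{H_0}\!\left[W_n \mid \mathcal{F}_{n-1}\right] = W_{n-1}.

% Step 2: Ville's inequality for a nonnegative (super)martingale.
\mathbb{P}_{H_0}\!\left( \sup_{n \ge 0} W_n \ge \frac{1}{\alpha} \right)
\;\le\; \alpha\,\mathbb{E}_{H_0}[W_0] \;=\; \alpha .
```

Because the bound covers the supremum over all $n$, it also covers $W_\tau$ at any stopping time $\tau$, which is why rejecting when wealth first reaches $1/\alpha$ keeps the Type I error at or below $\alpha$.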
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Treatment assignments are randomized independently of potential outcomes under the null hypothesis of no treatment effect.
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean, theorem washburn_uniqueness_aczel (tag: unclear)
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Paper passage: Under the null hypothesis of no treatment effect, the expected wealth cannot grow, guaranteeing anytime-valid Type I error control regardless of stopping rule.
- IndisputableMonolith/Foundation/RealityFromDistinction.lean, theorem reality_from_one_distinction (tag: unclear)
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Paper passage: Theorem 1. Under the null hypothesis, the wealth process (Wn) is a nonnegative martingale.
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
- [1] Duan, B., Ramdas, A., and Wasserman, L. (2022). Interactive rank testing by betting. In Schölkopf, B., Uhler, C., and Zhang, K., editors, Proceedings of the First Conference on Causal Learning and Reasoning, volume 177 of Proceedings of Machine Learning Research, pages 201–235. PMLR. Grünwald, P., Ly, A., Perez-Ortiz, M., and Schure, J. T. (2021). The safe ...
- [2] Kelly, J. L. (1956). A new interpretation of information rate. Bell System Technical Journal, 35(4):917–926.
- [3]
- [4] Ramdas, A. (2021). Game-theoretic probability and statistics (lecture notes). Accessed: 2025-12-09.
- [5] Ramdas, A., Ruf, J., Larsson, M., and Koolen, W. M. (2022). Testing exchangeability: Fork-convexity, supermartingales and e-processes. International Journal of Approximate Reasoning, 141:83–109.
- [6] Ramdas, A. and Wang, R. (2025). Hypothesis testing with e-values. Foundations and Trends in Statistics, 1(1-2):1–390.
- [7] Shafer, G. (2021). Testing by betting: A strategy for statistical and scientific communication. Journal of the Royal Statistical Society: Series A, 184(2):407–431.
- [8] Ville, J. (1939). Étude critique de la notion de collectif. PhD thesis, Gauthier-Villars, Paris.
- [9] Vovk, V. and Wang, R. (2021). E-values: Calibration, combination and applications. Annals of Statistics, 49(3):1736–1754.
- [10] Waudby-Smith, I. and Ramdas, A. (2023). Estimating means of bounded random variables by betting. Journal of the Royal Statistical Society Series B: Statistical Methodology, 86(1):1–27.