arxiv: 2512.04366 · v9 · submitted 2025-12-04 · 📊 stat.ME · stat.AP

Sequential Randomization Tests Using e-values: Applications for trial monitoring

Pith reviewed 2026-05-17 01:59 UTC · model grok-4.3

classification 📊 stat.ME stat.AP
keywords sequential monitoring · e-values · randomization tests · martingales · clinical trials · nonparametric tests · anytime validity · Type I error control

The pith

Sequential randomization tests using e-values deliver anytime-valid Type I error control for clinical trial monitoring derived solely from the randomization mechanism.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a family of nonparametric sequential tests, collectively called e-RT, that monitor randomized trials with binary, event-only, and continuous outcomes without parametric assumptions or fixed sample sizes. Each test builds a test martingale by placing wagers on randomized treatment assignments or event labels, with every wager fixed before the current label enters the wealth update. Under the null of no treatment effect, expected wealth cannot grow, so the probability that wealth ever reaches the rejection threshold 1/α is at most α, no matter when monitoring stops. The approach is effect-size agnostic by default but can incorporate design-calibrated or growth-rate-optimal (GROW) wagers when a clinically meaningful alternative is available. A reader would care because it removes the need to pre-specify stopping rules or rely on asymptotic approximations while still guaranteeing valid inference in ongoing trials.
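The betting mechanism described above can be sketched for a binary endpoint as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the function names, the fixed stake `bet`, and the one-sided rule (guess that events come from the control arm) are all hypothetical, and 1:1 randomization by independent coin flips is assumed.

```python
import random

def e_rt_binary_wealth(assignments, outcomes, bet=0.2):
    """Wealth path of a toy assignment-betting test under 1:1 randomization.

    assignments: iterable of 0/1 (1 = treatment); outcomes: iterable of 0/1
    (1 = event). `bet` in (0, 1) is a hypothetical fixed stake, not a value
    taken from the paper.
    """
    if not 0 < bet < 1:
        raise ValueError("bet must lie strictly between 0 and 1")
    wealth, path = 1.0, []
    for a, y in zip(assignments, outcomes):
        # The wager is fixed after seeing y but before using a: events are
        # guessed to come from control, non-events from treatment.
        signed_bet = -bet if y == 1 else bet
        # With a fair coin for a, this factor averages to 1 under the null.
        wealth *= 1.0 + signed_bet * (2 * a - 1)
        path.append(wealth)
    return path

def simulate_trial(n, p_control, p_treatment, rng):
    """Draw 1:1 randomized assignments and Bernoulli outcomes."""
    assignments = [rng.randint(0, 1) for _ in range(n)]
    outcomes = [int(rng.random() < (p_treatment if a else p_control))
                for a in assignments]
    return assignments, outcomes

rng = random.Random(0)
a, y = simulate_trial(n=500, p_control=0.30, p_treatment=0.20, rng=rng)
path = e_rt_binary_wealth(a, y)
# First patient index at which wealth reaches 1/alpha = 20, if any.
crossed = next((i + 1 for i, w in enumerate(path) if w >= 20), None)
print("final wealth:", path[-1], "first crossing of 20:", crossed)
```

Because the stake is below 1, wealth stays strictly positive, which is what allows the threshold 1/α to be read as an anytime-valid rejection rule.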

Core claim

By constructing test martingales through sequential wagers that are fixed before the current assignment or event label enters the wealth update, the e-RT procedures make the wealth process a nonnegative supermartingale under the null hypothesis, which guarantees anytime-valid Type I error control regardless of the stopping rule chosen by the investigator.

What carries the argument

The test martingale formed by wagering on randomized treatment assignments or observed event labels before each wealth update. Fixing the wager first is what enforces the supermartingale property under the null.
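For orientation, the argument can be written out in one generic form. The notation is an assumption for illustration and may differ from the paper's: $A_t \in \{0,1\}$ is the assignment of patient $t$ with $\mathbb{P}(A_t = 1) = 1/2$, $Y_t$ is the outcome, $\mathcal{F}_{t-1}$ is the history before patient $t$, and $\lambda_t \in [-1, 1]$ is a wager that may depend on $\mathcal{F}_{t-1}$ and $Y_t$ but not on $A_t$.

```latex
\begin{aligned}
W_0 &= 1, \qquad W_t = W_{t-1}\bigl(1 + \lambda_t\,(2A_t - 1)\bigr),\\
\mathbb{E}_{H_0}\!\left[\, W_t \mid \mathcal{F}_{t-1}, Y_t \,\right]
  &= W_{t-1}\Bigl(1 + \lambda_t\,\mathbb{E}_{H_0}\!\left[\, 2A_t - 1 \mid \mathcal{F}_{t-1}, Y_t \,\right]\Bigr)
   = W_{t-1},\\
\mathbb{P}_{H_0}\!\left(\exists\, t \ge 1 : W_t \ge 1/\alpha\right) &\le \alpha .
\end{aligned}
```

The middle line uses only that $A_t$ is a fair coin independent of $Y_t$ and the past under the null; the last line is Ville's inequality for nonnegative supermartingales started at 1.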

If this is right

  • Anytime-valid Type I error control holds for arbitrary stopping times in sequential monitoring of randomized trials.
  • The same framework applies uniformly to binary, time-to-event, and continuous endpoints without model assumptions.
  • Default effect-size-agnostic wagers allow monitoring to begin immediately without specifying a target treatment effect.
  • Optional growth-rate-optimal wagers can be substituted when a fixed design alternative is credible to improve power.
  • The methods serve as a conservative, assumption-light complement to existing model-based sequential analyses.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Investigators could use these procedures to justify adaptive or response-adaptive randomization schemes in which stopping decisions depend on accumulating data.
  • The same randomization-derived martingale construction might transfer to other experimental settings that rely on known random assignment, such as online A/B testing platforms.
  • Hybrid monitoring rules could combine e-RT validity bounds with parametric likelihood ratios to gain power while preserving the nonparametric safety net.

Load-bearing premise

Treatment assignments are generated by a known random mechanism that is independent of potential outcomes under the null hypothesis.

What would settle it

The claim would be refuted if, in repeated simulations under the null with a data-dependent stopping rule that rejects as soon as wealth exceeds a fixed threshold, the observed rejection rate exceeded the nominal α level by more than Monte Carlo error.
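A check of this kind can be sketched in a few lines. The settings below (2000 replications, 500 patients, a 25% event rate, a fixed stake of 0.2, α = 0.05) and the one-sided betting rule are illustrative assumptions, not the paper's simulation protocol.

```python
import random

def rejects_under_null(n, event_rate, bet, alpha, rng):
    """Return True if wealth ever reaches 1/alpha in one simulated null trial."""
    wealth, threshold = 1.0, 1.0 / alpha
    for _ in range(n):
        a = rng.randint(0, 1)                 # 1:1 randomization
        y = int(rng.random() < event_rate)    # same event rate in both arms
        signed_bet = -bet if y == 1 else bet  # wager set without using a
        wealth *= 1.0 + signed_bet * (2 * a - 1)
        if wealth >= threshold:               # data-dependent stopping rule
            return True
    return False

def null_rejection_rate(reps=2000, n=500, event_rate=0.25, bet=0.2,
                        alpha=0.05, seed=1):
    """Monte Carlo estimate of the probability of ever rejecting under the null."""
    rng = random.Random(seed)
    hits = sum(rejects_under_null(n, event_rate, bet, alpha, rng)
               for _ in range(reps))
    return hits / reps

rate = null_rejection_rate()
print(f"empirical rejection rate under the null: {rate:.4f} (nominal 0.05)")
```

A rate well above 0.05 after allowing for Monte Carlo error would point to a wager that uses the current assignment, or to a randomization mechanism other than the one assumed.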

Figures

Figures reproduced from arXiv: 2512.04366 by Fernando G Zampieri.

Figure 1: Wealth trajectories under the null hypothesis. Left: n = 712 (80% power design). Right: …
Figure 2: E-processes trajectories under the alternative hypothesis (true ARR = 10%). Left: n = …
Figure 3: Trajectories of the e-RTd process (25% baseline mortality, 5pp ARR, 500 deaths).
Figure 4: Trajectories of the continuous randomization e-process for a trial designed to detect …
Figure 5: Trajectories of the e-Survival process for a trial designed to detect a Hazard Ratio of …
Figure 6: Trajectories of e-RTms for a trial with N = 1000 patients. Left: under the null hypothesis (equal transition matrices), wealth fluctuates randomly. Right: under the alternative hypothesis, wealth grows as treatment improves recovery transitions. Dashed red line: rejection threshold (1/α = 20).

Original abstract

Sequential monitoring of randomized trials traditionally relies on parametric assumptions or asymptotic approximations. We discuss a family of nonparametric sequential tests - collectively called e-RT - for binary, event-only, and continuous endpoints. All active variants derive validity from the randomization mechanism. Using a betting framework, each test constructs a test martingale by sequentially wagering on randomized assignments or observed event labels before using the current label in the wealth update. Under the null hypothesis of no treatment effect, the expected wealth cannot grow, guaranteeing anytime-valid Type I error control regardless of stopping rule. The default e-RT posture is effect-size agnostic: monitoring can begin without specifying a hypothesized treatment effect. Alternatively, fixed design-calibrated wagers, including growth-rate-optimal (GROW) wagers, may be used as optional efficiency tools when a clinically meaningful design alternative is credible. We present simulation studies demonstrating calibration and power, and discuss the principled asymmetry in betting strategies across outcome types. These methods provide a conservative, assumption-light complement to model-based sequential analyses.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 3 minor

Summary. The paper proposes a family of nonparametric sequential tests called e-RT for monitoring randomized clinical trials with binary, event-only, and continuous endpoints. These tests construct test martingales via a sequential betting framework that wagers on randomized treatment assignments or event labels before updating wealth. Under the sharp null of no treatment effect, the construction ensures that expected wealth is non-increasing, yielding anytime-valid Type I error control for arbitrary stopping times via Ville's inequality. The default approach is effect-size agnostic, though optional design-calibrated wagers (including GROW) are discussed for efficiency gains. Simulation studies are presented to illustrate calibration and power.

Significance. If the martingale property holds as described, the work supplies a conservative, randomization-based complement to parametric or asymptotic sequential methods. It avoids reliance on fitted models or asymptotic approximations and directly leverages the trial's randomization mechanism for validity. The explicit separation of default agnostic monitoring from optional efficiency tools, together with the simulation evidence, strengthens its potential utility for flexible trial monitoring.

major comments (2)
  1. [§3.2] §3.2, the continuous-endpoint wager definition: the conditional-expectation argument under the null is stated to equal 1, but the precise form of the wager function (e.g., how the continuous outcome is mapped to a bounded payoff) is not shown explicitly; this step is load-bearing for the martingale property and requires an expanded derivation or reference to the supporting lemma.
  2. [§5] Simulation protocol in §5: the reported power curves for the GROW wager variant assume a specific clinically meaningful alternative; without a sensitivity analysis over a broader range of alternatives (including those near the boundary of detectability), the efficiency claim relative to the agnostic version remains incompletely supported.
minor comments (3)
  1. [Abstract] The abstract introduces 'e-RT' without an immediate expansion; adding '(e-value Randomization Test)' on first use would improve readability.
  2. [§2] Notation for the natural filtration in the martingale construction could be made more explicit (e.g., by indexing the sigma-algebra explicitly with the sequence of randomizations and observations) to assist readers less familiar with sequential analysis.
  3. [Table 2] Table 2 caption should clarify whether the reported Type I error rates are exact or Monte Carlo estimates and include the number of replications used.
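The sensitivity question raised in major comment 2 can be explored analytically for a binary endpoint. The sketch below assumes a likelihood-ratio (Kelly-type) bet on the assignment given the outcome, tuned to a design ARR; whether the paper's GROW wager takes exactly this form is an assumption, and the function names and the numbers (25% control risk, 5pp design ARR) are illustrative.

```python
import math

def posterior_treated(y, p_control, p_treatment):
    """P(A = 1 | Y = y) under 1:1 randomization and the given arm event risks."""
    like_t = p_treatment if y == 1 else 1.0 - p_treatment
    like_c = p_control if y == 1 else 1.0 - p_control
    return like_t / (like_t + like_c)

def expected_log_growth(p_control, true_arr, design_arr):
    """Expected log wealth gain per patient for a wager tuned to design_arr."""
    p_t_true = p_control - true_arr
    p_t_design = p_control - design_arr
    growth = 0.0
    for a in (0, 1):
        for y in (0, 1):
            risk = p_t_true if a == 1 else p_control
            prob = 0.5 * (risk if y == 1 else 1.0 - risk)    # true P(A=a, Y=y)
            q = posterior_treated(y, p_control, p_t_design)  # design P(A=1 | y)
            factor = 2.0 * (q if a == 1 else 1.0 - q)        # likelihood-ratio bet
            growth += prob * math.log(factor)
    return growth

# Fixed design wager (ARR = 0.05) evaluated over a range of true effects.
for true_arr in (0.0, 0.02, 0.05, 0.10):
    g = expected_log_growth(p_control=0.25, true_arr=true_arr, design_arr=0.05)
    print(f"true ARR = {true_arr:.2f}: expected log-growth per patient = {g:+.5f}")
```

Under the null the expected log-growth is never positive, and for a given true effect no other design value yields more growth than the true one, so a table of this kind shows how much efficiency a misjudged design alternative costs.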

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the thoughtful review and constructive feedback on our manuscript. The comments help clarify important aspects of the e-RT framework. We address each major comment below and indicate the revisions we will make.

  1. Referee: [§3.2] §3.2, the continuous-endpoint wager definition: the conditional-expectation argument under the null is stated to equal 1, but the precise form of the wager function (e.g., how the continuous outcome is mapped to a bounded payoff) is not shown explicitly; this step is load-bearing for the martingale property and requires an expanded derivation or reference to the supporting lemma.

    Authors: We agree that additional detail on the continuous-endpoint wager would improve the exposition. In the revised manuscript, we will provide an explicit definition of the wager function for continuous outcomes, specifying how the outcome is mapped to a bounded payoff (for example, through a normalized transformation ensuring the payoff lies in [0,1]). We will then include a detailed derivation demonstrating that the conditional expectation under the sharp null hypothesis equals 1, based on the randomization distribution. This will be presented directly in §3.2 or as a supporting lemma in the appendix. revision: yes

  2. Referee: [§5] Simulation protocol in §5: the reported power curves for the GROW wager variant assume a specific clinically meaningful alternative; without a sensitivity analysis over a broader range of alternatives (including those near the boundary of detectability), the efficiency claim relative to the agnostic version remains incompletely supported.

    Authors: The simulations in §5 were designed to illustrate performance under a specific, clinically relevant alternative to highlight the benefits of the GROW wager. We recognize that a sensitivity analysis would provide stronger support for the efficiency claims. Accordingly, we will expand the simulation section to include additional scenarios, particularly those near the boundary of detectability, and report the corresponding power comparisons between the agnostic and design-calibrated variants. revision: yes
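One way the bounded mapping promised in the first response could look is sketched below. This is a hypothetical construction, not the authors' definition: the outcome is clipped to a pre-specified range and rescaled to [0, 1], and the rescaled value sets the direction and size of a bet on the assignment. The names `bounded_score`, `wealth_factor`, and `max_bet` are illustrative, and 1:1 randomization is assumed.

```python
def bounded_score(y, lower, upper):
    """Map a continuous outcome to [0, 1] by clipping to a pre-specified range."""
    if upper <= lower:
        raise ValueError("upper must exceed lower")
    clipped = min(max(y, lower), upper)
    return (clipped - lower) / (upper - lower)

def wealth_factor(a, y, lower, upper, max_bet=0.5):
    """Multiplicative wealth update for one patient in arm a (1 = treatment)."""
    u = bounded_score(y, lower, upper)
    # The signed bet lies in [-max_bet, max_bet] and depends on y only:
    # high outcomes are guessed to come from treatment, low ones from control.
    signed_bet = max_bet * (2.0 * u - 1.0)
    return 1.0 + signed_bet * (2 * a - 1)

# Averaging over the fair coin a in {0, 1} gives 1 for any outcome y.
for y in (-3.0, 0.4, 12.5):
    mean_factor = (0.5 * wealth_factor(0, y, 0.0, 10.0)
                   + 0.5 * wealth_factor(1, y, 0.0, 10.0))
    print(y, mean_factor)
```

The range passed as `lower` and `upper` must be fixed in advance; choosing it from the accumulating data would need a separate argument that the wager stays predictable.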

Circularity Check

0 steps flagged

No significant circularity; derivation self-contained via external randomization

The paper constructs test martingales for sequential randomization tests whose validity under the sharp null follows from the external randomization mechanism: treatment assignments are independent of fixed potential outcomes, so each multiplicative wealth update has conditional expectation 1 by construction of the betting rule. This property, combined with Ville's inequality for nonnegative supermartingales, directly yields the anytime-valid Type I error bound for arbitrary stopping times. No equation reduces a derived quantity to a data-fitted parameter by definition, no self-citation supplies a load-bearing uniqueness theorem, and no ansatz is smuggled in; the central guarantee is therefore independent of the paper's own outputs and rests on the stated randomization assumption plus classical probability results.
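The randomization argument can be verified exactly on a toy example by enumerating every assignment vector. The sketch below assumes independent fair-coin assignment and fixed outcomes under the sharp null; the betting rule, whose size depends on past assignments, is hypothetical and chosen only to show that any wager fixed before the current assignment is used leaves mean wealth at 1.

```python
from fractions import Fraction
from itertools import product

def final_wealth(assignments, outcomes, bet=Fraction(1, 5)):
    """Exact final wealth for one assignment vector and fixed outcomes."""
    wealth = Fraction(1)
    treated_so_far = 0
    for a, y in zip(assignments, outcomes):
        # Predictable wager: direction from the current outcome, size from
        # the number of earlier patients assigned to treatment.
        size = bet / (1 + treated_so_far)
        signed_bet = -size if y == 1 else size
        wealth *= 1 + signed_bet * (2 * a - 1)
        treated_so_far += a   # updated only after the bet has been settled
    return wealth

def randomization_mean(outcomes):
    """Average final wealth over all equally likely assignment vectors."""
    n = len(outcomes)
    total = sum(final_wealth(a, outcomes) for a in product((0, 1), repeat=n))
    return total / 2 ** n

outcomes = (1, 0, 0, 1, 1, 0)   # held fixed under the sharp null
print(randomization_mean(outcomes))
```

Exact rational arithmetic is used so that the result is not blurred by floating-point rounding. Restricted schemes such as permuted blocks would need the conditional assignment probabilities in place of 1/2.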

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the randomization mechanism as the source of validity and on the martingale property under the null; no free parameters or new entities are introduced in the abstract description.

axioms (1)
  • domain assumption: Treatment assignments are randomized independently of potential outcomes under the null hypothesis of no treatment effect.
    Invoked to guarantee that expected wealth does not grow under the null.
