Sequentially-Rerandomized Switchback Experiments

Alonso Bucarey; Chao Qin; Christopher Adjaho; Paul Hoban; Ramesh Johari; Ruixuan Zhang; Stefan Wager; Zhenghao Zeng

arxiv: 2604.02489 · v1 · submitted 2026-04-02 · 📊 stat.ME


Zhenghao Zeng, Christopher Adjaho, Alonso Bucarey, Chao Qin, Ruixuan Zhang, Paul Hoban, Ramesh Johari, Stefan Wager

Pith reviewed 2026-05-13 20:43 UTC · model grok-4.3

classification 📊 stat.ME

keywords switchback experiments · sequential randomization · temporal dependence · randomization inference · online experiments · carryover · balance constraints · causal inference


The pith

Sequentially rerandomizing switchback experiments balances lagged outcomes and covariates to reduce variance.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes Sequentially-Rerandomized Switchback Experiments to run policy tests on platforms that have few operational units, strong temporal dependence, and possible non-stationarity. It shows that re-randomizing treatment each period while forcing balance on pre-specified variables built from past observations lets the design use the correlation between periods to cut variance. Finite-sample randomization tests under a sharp null and asymptotic results for growing numbers of periods are derived when there is no carryover. The method is then extended to first-order carryover by blocking on the prior treatment to create stable stay groups. The practical payoff is more reliable estimates from the same data volume when standard switchback or A/B designs are inefficient.

Core claim

SRSB re-randomizes treatment assignment at every time period so that pre-specified prognostic variables constructed from past observations remain balanced; this leverages temporal dependence to improve precision in the absence of carryover while preserving a known randomization distribution that supports both exact finite-sample inference under the sharp null and asymptotic inference as the number of periods increases.

What carries the argument

Sequentially-Rerandomized Switchback (SRSB) design, which at each period draws a new treatment vector that satisfies balance constraints on lagged outcomes and covariates.
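The per-period step can be sketched as acceptance-rejection rerandomization: draw a completely randomized assignment and redraw until a balance criterion on the lagged variable is met. This is a minimal sketch under stated assumptions, not the paper's algorithm. The balance criterion (a standardized difference in means of a single lagged variable), the fixed number of treated units, and the names `standardized_imbalance` and `rerandomize_period` are all hypothetical.

```python
import random


def standardized_imbalance(lagged, assign):
    """Hypothetical balance criterion: |mean(treated) - mean(control)| of the
    lagged variable, divided by its population standard deviation."""
    treated = [x for x, w in zip(lagged, assign) if w == 1]
    control = [x for x, w in zip(lagged, assign) if w == 0]
    m = sum(lagged) / len(lagged)
    sd = (sum((x - m) ** 2 for x in lagged) / len(lagged)) ** 0.5 or 1.0
    return abs(sum(treated) / len(treated) - sum(control) / len(control)) / sd


def rerandomize_period(lagged, n_treated, threshold, rng, max_draws=10_000):
    """Draw uniformly among assignments with n_treated treated units and reject
    any draw whose imbalance on the lagged variable exceeds threshold."""
    n = len(lagged)
    for _ in range(max_draws):
        chosen = set(rng.sample(range(n), n_treated))
        assign = [1 if i in chosen else 0 for i in range(n)]
        if standardized_imbalance(lagged, assign) <= threshold:
            return assign
    raise RuntimeError("no acceptable assignment found; loosen the threshold")


# Example: 8 units, lagged outcomes from period t-1, 4 treated at period t.
rng = random.Random(0)
lagged = [3.1, 2.4, 5.0, 4.2, 1.0, 2.2, 0.5, 3.3]
assign = rerandomize_period(lagged, n_treated=4, threshold=0.1, rng=rng)
print(assign, round(standardized_imbalance(lagged, assign), 3))
```

Rejection sampling keeps the design distribution explicit, since every accepted assignment is equally likely. The cost is that the expected number of draws grows as the threshold tightens.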

Load-bearing premise

Pre-specified prognostic variables built from past observations can be balanced at each period without breaking the randomization distribution or creating selection bias.

What would settle it

Run parallel standard switchback and SRSB experiments on the same platform data with no carryover; if the empirical variance of the SRSB estimator is not smaller, the precision claim is false.
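The proposed check can be previewed in simulation before it is run on a platform. The sketch below fixes one set of potential outcomes from a hypothetical data-generating process (unit effects, stationary AR(1) noise, a constant effect, no carryover). It then reruns the assignment many times under plain per-period complete randomization and under a rerandomized design, and compares the empirical variance of the averaged difference-in-means estimator. The balance rule, the threshold of 0.2, and all sizes are illustrative assumptions. A favorable simulation is evidence, not the platform test described above.

```python
import random
from statistics import variance


def avg(xs):
    xs = list(xs)
    return sum(xs) / len(xs)


def draw_assignment(lagged, n_treated, threshold, rng, max_draws=10_000):
    """Complete randomization if threshold is None; otherwise redraw until the
    standardized treated-minus-control gap in `lagged` is <= threshold."""
    n = len(lagged)
    m = avg(lagged)
    sd = avg((x - m) ** 2 for x in lagged) ** 0.5 or 1.0
    for _ in range(max_draws):
        chosen = set(rng.sample(range(n), n_treated))
        w = [1 if i in chosen else 0 for i in range(n)]
        if threshold is None:
            return w
        gap = avg(lagged[i] for i in chosen) - avg(lagged[i] for i in range(n) if i not in chosen)
        if abs(gap) / sd <= threshold:
            return w
    raise RuntimeError("no acceptable assignment; loosen the threshold")


def make_potential_outcomes(n_units, n_periods, rho, tau, rng):
    """Hypothetical DGP: Y(0) = unit effect + stationary AR(1) noise; Y(1) = Y(0) + tau."""
    y0 = []
    for _ in range(n_units):
        mu, e = rng.gauss(0, 3), rng.gauss(0, 1)
        path = []
        for _ in range(n_periods):
            e = rho * e + rng.gauss(0, (1 - rho ** 2) ** 0.5)
            path.append(mu + e)
        y0.append(path)
    y1 = [[y + tau for y in path] for path in y0]
    return y0, y1


def run_experiment(y0, y1, threshold, rng):
    """One switchback run; returns the average over periods t >= 1 of the
    per-period difference in means."""
    n, T = len(y0), len(y0[0])
    w = draw_assignment([0.0] * n, n // 2, None, rng)  # period 0 has no history
    y_prev = [y1[i][0] if w[i] else y0[i][0] for i in range(n)]
    estimates = []
    for t in range(1, T):
        w = draw_assignment(y_prev, n // 2, threshold, rng)  # balance on observed lag
        y = [y1[i][t] if w[i] else y0[i][t] for i in range(n)]
        estimates.append(avg(y[i] for i in range(n) if w[i]) - avg(y[i] for i in range(n) if not w[i]))
        y_prev = y
    return avg(estimates)


rng = random.Random(2026)
y0, y1 = make_potential_outcomes(n_units=20, n_periods=30, rho=0.9, tau=1.0, rng=rng)
cr = [run_experiment(y0, y1, None, rng) for _ in range(200)]
srsb = [run_experiment(y0, y1, 0.2, rng) for _ in range(200)]
print(f"variance, complete randomization: {variance(cr):.4f}")
print(f"variance, rerandomized:            {variance(srsb):.4f}")
```

Here the lagged outcome is dominated by a persistent unit effect, so balancing it nearly balances the current control outcomes as well. Weakening persistence (a smaller `rho` or unit-effect scale) should shrink the gap between the two variances.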

Figures

Figures reproduced from arXiv: 2604.02489 by Alonso Bucarey, Chao Qin, Christopher Adjaho, Paul Hoban, Ramesh Johari, Ruixuan Zhang, Stefan Wager, Zhenghao Zeng.

**Figure 2.** RMSE comparison between completely randomized experiments and sequential reran…

**Figure 3.** RMSE vs ρ and proportion of RMSE reduced by sequential rerandomization. The potential outcomes are generated from an AR(1) model with covariates; see Section 5.1 for details of the data-generating process. … as ρ grows. This is expected: as ρ becomes larger, the covariate becomes more prognostic, and the variance component of the difference-in-means estimator attributable to covariate imbalance increases. Ne…

**Figure 4.** Comparison of RMSE under different experimental designs in a setting with first-order…

**Figure 5.** Comparison of RMSE across different treatment effect sizes. The potential outcomes are…

**Figure 6.** Simulated trajectories for two randomly selected units under a Markovian carryover model…

**Figure 7.** Simulated trajectories for two randomly selected units under a Markovian carryover model…

**Figure 8.** Performance of the estimator across different experimental designs. The potential outcomes…

**Figure 9.** Comparison of RMSE under different experimental designs when individual treatment…

**Figure 10.** Properties of Wald-style confidence intervals based on the conservative variance estimator…

Original abstract

Large-scale online platforms and marketplace systems often evaluate new policies through experiments that randomize treatment across operational units (e.g., geographies, regions, or clusters) over many time periods. In these settings, standard A/B testing can be inefficient or unreliable due to a limited number of units, substantial cross-unit heterogeneity, non-stationarity, and potential carryover across periods. We propose Sequentially-Rerandomized Switchback Experiments (SRSB), a new experimental design that helps mitigate these challenges. SRSB re-randomizes treatment at each time period such as to enforce balance on pre-specified prognostic variables constructed from past observations. In the absence of carryover, SRSB improves precision by leveraging temporal dependence through balancing lagged outcomes and covariates; we develop finite-sample randomization inference under a sharp null as well as asymptotic inference as the number of periods grows. We then extend SRSB to settings with first-order carryover and introduce a blocked SRSB variant that rerandomizes within strata defined by the previous treatment to form stable and comparable "stay" groups. Extensive simulations demonstrate the practical gains and robustness of SRSB relative to standard switchback designs.
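The blocked variant can be sketched as follows. The sketch assumes two strata (units treated versus untreated in the previous period), half of each stratum treated, and the same kind of hypothetical standardized-difference balance rule applied within each stratum. The function names and the treated-treated minus control-control contrast are illustrative, not the paper's estimator. Each stratum needs at least two units so that both arms are nonempty.

```python
import random


def avg(xs):
    xs = list(xs)
    return sum(xs) / len(xs)


def balanced_draw(values, n_treated, threshold, rng, max_draws=10_000):
    """Rerandomize inside one stratum: redraw until the standardized
    treated-minus-control gap in `values` is <= threshold. Returns positions
    (into `values`) of the treated units."""
    n = len(values)
    m = avg(values)
    sd = avg((v - m) ** 2 for v in values) ** 0.5 or 1.0
    for _ in range(max_draws):
        chosen = set(rng.sample(range(n), n_treated))
        gap = avg(values[k] for k in chosen) - avg(values[k] for k in range(n) if k not in chosen)
        if abs(gap) / sd <= threshold:
            return chosen
    raise RuntimeError("no acceptable draw; loosen the threshold")


def blocked_srsb_period(w_prev, y_prev, threshold, rng):
    """Hypothetical blocked step: strata are units treated (1) or untreated (0)
    in the previous period; half of each stratum is treated now."""
    w = [0] * len(w_prev)
    for s in (0, 1):
        members = [i for i, wp in enumerate(w_prev) if wp == s]
        chosen = balanced_draw([y_prev[i] for i in members], len(members) // 2, threshold, rng)
        for k in chosen:
            w[members[k]] = 1
    return w


def stay_contrast(w_prev, w, y):
    """Illustrative contrast: mean of units treated in both periods minus the
    mean of units untreated in both periods."""
    stay1 = [y[i] for i in range(len(y)) if w_prev[i] == 1 and w[i] == 1]
    stay0 = [y[i] for i in range(len(y)) if w_prev[i] == 0 and w[i] == 0]
    return avg(stay1) - avg(stay0)


rng = random.Random(7)
w_prev = [1, 1, 1, 1, 0, 0, 0, 0]
y_prev = [3.1, 2.4, 5.0, 4.2, 1.0, 2.2, 0.5, 3.3]
w = blocked_srsb_period(w_prev, y_prev, threshold=0.5, rng=rng)
y_now = [3.4, 2.9, 5.1, 4.0, 1.2, 2.0, 0.9, 3.0]  # made-up current outcomes
print(w, round(stay_contrast(w_prev, w, y_now), 3))
```

Blocking on the previous treatment guarantees that every period contains both units that stay in their arm and units that switch, so neither stay group can be empty by chance.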

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note · private letter to a colleague

SRSB adds per-period re-randomization to balance lagged variables in switchback designs, which can tighten precision when temporal dependence is present, but the exact finite-sample randomization inference claim looks shaky because the balancing step conditions on functions of potential outcomes.


The main thing here is that the paper introduces Sequentially-Rerandomized Switchback Experiments, where treatment is re-randomized each period to enforce balance on pre-specified prognostic variables built from past observations. In no-carryover settings this leverages temporal structure for better precision than plain switchbacks, and they extend it with a blocked-strata version to handle first-order carryover by keeping comparable stay groups. That sequential balancing step is the concrete novelty beyond standard switchback or stratified designs. They lay out finite-sample randomization inference under the sharp null plus asymptotic results as the number of periods grows, and the simulations reportedly show robustness and gains over baselines. Credit for grounding the design in explicit randomization rules rather than post-fit modeling.

The soft spot is the finite-sample claim. Conditioning current assignments on lagged outcomes (which depend on earlier treatments and responses) can restrict the support of the randomization distribution in a way that correlates with the potential outcomes themselves. This risks breaking the exchangeability needed for an exact test under the sharp null, and the blocked extension for carryover likely carries the same dependence issue. Simulations help but do not substitute for a clear proof that type I error stays controlled.

The work is aimed at researchers running experiments on platforms or marketplaces with limited units and time-series structure. A reader focused on practical power improvements in temporal A/B testing would find the design and inference proposals useful. I would send it to peer review; the idea targets a real inefficiency and the authors have done enough formal and simulation work to merit referee scrutiny, even if the exact-inference details need tightening.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes Sequentially-Rerandomized Switchback Experiments (SRSB) for multi-period policy evaluation on online platforms. SRSB re-randomizes treatment assignments each period to enforce balance on pre-specified prognostic variables constructed from past observations. In the absence of carryover, it improves precision by leveraging temporal dependence through balancing lagged outcomes and covariates; finite-sample randomization inference under a sharp null and asymptotic inference as the number of periods grows are developed. The design is extended to first-order carryover via a blocked SRSB variant that rerandomizes within strata defined by the previous treatment to form stable groups. Extensive simulations are used to demonstrate gains relative to standard switchback designs.

Significance. If the inference procedures hold, SRSB provides a practical way to increase precision in switchback experiments with temporal dependence, limited units, and heterogeneity while retaining randomization-based inference. The dual finite-sample and asymptotic guarantees, together with the carryover extension, would strengthen the toolkit for marketplace and platform experimentation.

major comments (2)

[Abstract / finite-sample RI section] The central claim of exact finite-sample randomization inference under the sharp null (Abstract) is load-bearing. Sequential balancing on lagged outcomes, which are functions of prior treatments and responses, conditions the current-period assignment on quantities that depend on potential outcomes. This can restrict the support of the realized randomization distribution in a non-uniform way and introduce correlation with the potential outcomes, violating the exchangeability required for exact type-I-error control of the randomization test.
[Carryover extension] The blocked-strata extension for first-order carryover (Abstract) inherits the same conditioning issue and adds dependence induced by the previous-treatment strata. The manuscript must demonstrate that the conditional randomization distribution remains independent of the potential outcomes under the sharp null; otherwise the exactness claim does not extend to this variant.

minor comments (2)

[Design section] Provide explicit algorithmic pseudocode or a step-by-step description of how the sequential balancing is implemented without violating the pre-specified randomization probabilities.
[Simulations] The simulation section would benefit from tabulated coverage rates and power comparisons under both the sharp null and local alternatives, with clear statements of the data-generating processes and any exclusion rules.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful and constructive review. The points raised about the exact finite-sample randomization inference are central, and we address them directly below. We maintain that the claims hold under the sharp null due to invariance of outcomes, but will revise the manuscript to clarify this reasoning explicitly.


Referee: [Abstract / finite-sample RI section] The central claim of exact finite-sample randomization inference under the sharp null (Abstract) is load-bearing. Sequential balancing on lagged outcomes, which are functions of prior treatments and responses, conditions the current-period assignment on quantities that depend on potential outcomes. This can restrict the support of the realized randomization distribution in a non-uniform way and introduce correlation with the potential outcomes, violating the exchangeability required for exact type-I-error control of the randomization test.

Authors: Under the sharp null of no treatment effect whatsoever, each unit's potential outcomes are identical regardless of treatment history. Consequently, all lagged outcomes and derived prognostic variables are fixed quantities that do not vary with the realized treatment sequence. The sequential balancing therefore conditions only on these invariant observed values, so the randomization distribution at each step remains fully specified and independent of the (fixed) potential outcomes. This restores the exchangeability needed for exact type I error control of the randomization test. We will add a short clarifying subsection in the finite-sample inference section that states this invariance explicitly and shows why the support of the randomization distribution is unaffected. revision: partial
Referee: [Carryover extension] The blocked-strata extension for first-order carryover (Abstract) inherits the same conditioning issue and adds dependence induced by the previous-treatment strata. The manuscript must demonstrate that the conditional randomization distribution remains independent of the potential outcomes under the sharp null; otherwise the exactness claim does not extend to this variant.

Authors: The same invariance argument applies to the blocked SRSB variant. Under the sharp null, outcomes are unaffected by prior treatment, so the strata defined by the previous treatment are formed solely by the design; each unit's outcome values inside each stratum remain fixed whatever assignment is realized. Re-randomization within strata therefore proceeds according to a conditional distribution that is still independent of the potential outcomes. We will insert a brief proof sketch immediately after the description of the blocked design showing that the conditional randomization measure under the sharp null does not depend on the potential outcomes. revision: partial
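The invariance argument in these responses can be made concrete. Under the sharp null the observed outcome matrix is the same under every assignment path. A test can therefore regenerate whole paths from the design while holding outcomes fixed, and compare the observed statistic with the regenerated ones. This is a minimal sketch that assumes a hypothetical balance rule as a stand-in for the paper's, a constant-effect data-generating process, and the average per-period difference in means as the test statistic. It illustrates the rebuttal's reasoning; it is not the paper's procedure and not a proof of type I error control.

```python
import random


def avg(xs):
    xs = list(xs)
    return sum(xs) / len(xs)


def draw_assignment(lagged, n_treated, threshold, rng, max_draws=10_000):
    """Assumed design step: complete randomization when threshold is None,
    otherwise rejection until the standardized gap in `lagged` is <= threshold."""
    n = len(lagged)
    m = avg(lagged)
    sd = avg((x - m) ** 2 for x in lagged) ** 0.5 or 1.0
    for _ in range(max_draws):
        chosen = set(rng.sample(range(n), n_treated))
        if threshold is None:
            return [1 if i in chosen else 0 for i in range(n)]
        gap = avg(lagged[i] for i in chosen) - avg(lagged[i] for i in range(n) if i not in chosen)
        if abs(gap) / sd <= threshold:
            return [1 if i in chosen else 0 for i in range(n)]
    raise RuntimeError("no acceptable assignment; loosen the threshold")


def simulate_observed(y0, tau, threshold, rng):
    """Run the assumed design once: balance on the observed lagged outcome and
    add a constant effect tau to treated cells (no carryover)."""
    n, T = len(y0), len(y0[0])
    Y = [[0.0] * T for _ in range(n)]
    path = []
    for t in range(T):
        lagged = [Y[i][t - 1] for i in range(n)] if t > 0 else [0.0] * n
        w = draw_assignment(lagged, n // 2, threshold if t > 0 else None, rng)
        for i in range(n):
            Y[i][t] = y0[i][t] + tau * w[i]
        path.append(w)
    return Y, path


def redraw_path(Y, threshold, rng):
    """Redraw a full path from the design with the observed Y held fixed; under
    the sharp null this is the randomization distribution of the assignment."""
    n, T = len(Y), len(Y[0])
    path = [draw_assignment([0.0] * n, n // 2, None, rng)]
    for t in range(1, T):
        path.append(draw_assignment([Y[i][t - 1] for i in range(n)], n // 2, threshold, rng))
    return path


def statistic(Y, path):
    """Average over periods t >= 1 of the treated-minus-control difference in means."""
    n, T = len(Y), len(Y[0])
    return avg(
        avg(Y[i][t] for i in range(n) if path[t][i])
        - avg(Y[i][t] for i in range(n) if not path[t][i])
        for t in range(1, T)
    )


def randomization_p_value(Y, observed_path, threshold, n_draws, rng):
    """Monte Carlo randomization p-value with the usual +1 correction."""
    t_obs = abs(statistic(Y, observed_path))
    hits = sum(abs(statistic(Y, redraw_path(Y, threshold, rng))) >= t_obs for _ in range(n_draws))
    return (1 + hits) / (1 + n_draws)


rng = random.Random(11)
y0 = []
for _ in range(20):  # hypothetical AR(1) control outcomes with unit effects
    mu, e, row = rng.gauss(0, 3), rng.gauss(0, 1), []
    for _ in range(30):
        e = 0.9 * e + rng.gauss(0, 0.19 ** 0.5)
        row.append(mu + e)
    y0.append(row)

Y_null, path_null = simulate_observed(y0, tau=0.0, threshold=0.2, rng=rng)
Y_eff, path_eff = simulate_observed(y0, tau=5.0, threshold=0.2, rng=rng)
p_null = randomization_p_value(Y_null, path_null, 0.2, 99, rng)
p_eff = randomization_p_value(Y_eff, path_eff, 0.2, 99, rng)
print(p_null, p_eff)
```

With `tau=5.0` the observed statistic lies far outside the regenerated ones, so the p-value should sit at or near its floor of 1/(n_draws + 1). With `tau=0.0` the null holds, so across repeated runs small p-values should occur no more often than their nominal level.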

Circularity Check

0 steps flagged

No significant circularity; design and inference are self-contained via explicit randomization rules


The paper defines SRSB through explicit, pre-specified randomization rules that balance prognostic variables constructed from past observations without fitting parameters to the target estimands. Finite-sample randomization inference under the sharp null is derived directly from the induced randomization distribution, and asymptotic results follow from standard period-growth arguments. No load-bearing step reduces by construction to a fitted input, self-citation chain, or renamed ansatz; the central claims remain independent of the outcomes they target.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on standard randomization assumptions plus the domain assumption that prognostic variables can be pre-specified and balanced sequentially; no free parameters or invented entities are introduced in the abstract description.

axioms (2)

Domain assumption: treatment can be re-randomized each period while maintaining a valid randomization distribution.
Invoked for both finite-sample and asymptotic inference.
Domain assumption: prognostic variables from past observations are available and balanceable without bias.
Core to the precision gain in the no-carryover case.

pith-pipeline@v0.9.0 · 5515 in / 1154 out tokens · 49642 ms · 2026-05-13T20:43:01.912805+00:00 · methodology


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · relation: unclear

SRSB re-randomizes treatment at each time period such as to enforce balance on pre-specified prognostic variables constructed from past observations.
IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean · absolute_floor_iff_bare_distinguishability · relation: unclear

We develop finite-sample randomization inference under a sharp null as well as asymptotic inference as the number of periods grows.

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

3 extracted references · 3 canonical work pages

[1]

For all $j \ge 2$, $t_j - t_{j-1} \ge 2$.

[2]

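Assume that $V_J^2 / \mathbb{E}[V_J^2] \xrightarrow{P} 1$ as $T \to \infty$.

Define the blockwise predictable variance
$$V_J^2 = \sum_{j=1}^{J} \mathbb{E}\left[\left(\sum_{t=t_j+1}^{t_{j+1}-1} (\hat\tau_t - \tau_t)\right)^2 \,\middle|\, \mathcal{F}_{t_j-1}\right].$$
Assume that $V_J^2 / \mathbb{E}[V_J^2] \xrightarrow{P} 1$ as $T \to \infty$.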

[3]

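$\ldots \frac{Z_j^2}{\nu_j^2}\,\mathbb{I}(|Z_j| \ge \epsilon) \ldots \nu_j^2 \le \max_{1 \le j \le J_T} \mathbb{E} \ldots$

Let
$$a_t^2 = \frac{\|\hat\tau_t - \tau_t\|_2^2}{S_T^2} = \frac{\operatorname{Var}(\hat\tau_t)}{S_T^2}.$$
Assume:
$$\sum_{j=2}^{J} a_{t_j}^2 \to 0, \qquad \sum_{t=2}^{T} a_t^2 = O(1), \qquad \max_{1 \le j \le J} \sum_{t=t_j+1}^{t_{j+1}-1} a_t^2 \to 0. \tag{11}$$
Then we have
$$\frac{(T-1)(\hat\tau - \bar\tau)}{S_T} \xrightarrow{d} \mathcal{N}(0,1).$$
Most conditions are similar to Theorem 2 in the main text, except the third condition. Equation (11) ensures the contribution from the first time period of each block is negligible, ...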

