pith.

arxiv: 2604.17694 · v1 · submitted 2026-04-20 · 📊 stat.ME · cs.LG · stat.ML

Improving reproducibility by controlling random seed stability in machine learning based estimation via bagging

Pith reviewed 2026-05-10 04:44 UTC · model grok-4.3

classification 📊 stat.ME · cs.LG · stat.ML
keywords: random seed stability, subbagging, cross-bagging, debiased machine learning, reproducibility, nuisance estimation, concentration condition, machine learning estimation

The pith

Subbagging guarantees stability against random seed changes for any bounded-outcome regression algorithm and adaptive cross-bagging removes seed dependence from both nuisance estimation and sample splitting.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Machine learning predictions can shift with different random seeds, which propagates instability into debiased statistical estimators that use those predictions as nuisance components. The paper defines random seed stability through a concentration condition and proves that subbagging, by averaging over multiple subsamples, satisfies the condition for any regression method whose outputs remain bounded. It then presents adaptive cross-bagging as a modified cross-fitting scheme that simultaneously removes seed effects from the nuisance fits and from the data-splitting step. Experiments confirm that the procedure reaches the intended stability level with only a modest computational overhead relative to standard practice, whereas alternative approaches either miss the stability target or cost substantially more.

Core claim

We formalize random seed stability via a concentration condition and prove that subbagging guarantees stability for any bounded-outcome regression algorithm. We introduce a new cross-fitting procedure, adaptive cross-bagging, which simultaneously eliminates seed dependence from both nuisance estimation and sample splitting in debiased machine learning.
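One way to write such a concentration condition is shown below as a hedged sketch. The notation is assumed here for illustration: ξ for the seed, D for the data, V for the number of bags, and deviation measured from the seed-expectation. The paper's exact definition may differ, for example by comparing two independent seeds instead.

```latex
% Observed data \mathcal{D} held fixed; seed \xi = (\xi_1, \dots, \xi_V) drives
% both the subsample draws and any algorithmic randomness in each bag.
\hat f_V(x;\xi) \;=\; \frac{1}{V}\sum_{v=1}^{V} \hat f^{(v)}(x;\xi_v)

% (\epsilon, \delta)-seed stability of \hat f_V at x:
\Pr_{\xi}\!\Big( \big|\hat f_V(x;\xi) - \mathbb{E}_{\xi}\big[\hat f_V(x;\xi) \mid \mathcal{D}\big]\big| > \epsilon \;\Big|\; \mathcal{D} \Big) \;\le\; \delta
```

Bounded bag predictions are what allow a Hoeffding-type bound to push the left-hand side below δ by taking V large enough.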

What carries the argument

Subbagging, which averages predictions across multiple random subsamples, enforces the concentration condition on seed-induced variation; adaptive cross-bagging extends this by dynamically adjusting splits to control seed effects in both nuisance models and sample partitioning.
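The mechanism can be sketched in a few lines of NumPy. This is a minimal sketch under stated assumptions, not the authors' code. The following choices are all hypothetical:

- the base learner, `random_feature_fit_predict` (ridge regression on random ReLU features, clipped to [0, 1]);
- the 50% subsample fraction;
- the simulated data.

They were picked so that each bag's prediction depends on the seed and stays bounded. The sketch holds the data fixed and varies only the seed. The spread of the subbagged prediction should then shrink roughly like 1/√V as the number of bags V grows.

```python
import numpy as np


def random_feature_fit_predict(x_train, y_train, x_test, rng, n_features=50, ridge=1e-2):
    """Hypothetical seed-dependent base learner: ridge regression on random ReLU features.

    The random feature weights come from `rng`, so different seeds give different fits.
    Predictions are clipped to [0, 1] so the bounded-outcome assumption holds.
    """
    w = rng.normal(size=(x_train.shape[1], n_features))
    b = rng.normal(size=n_features)
    phi_train = np.maximum(x_train @ w + b, 0.0)
    phi_test = np.maximum(x_test @ w + b, 0.0)
    gram = phi_train.T @ phi_train + ridge * np.eye(n_features)
    coef = np.linalg.solve(gram, phi_train.T @ y_train)
    return np.clip(phi_test @ coef, 0.0, 1.0)


def subbag_predict(x, y, x_test, n_bags, seed, subsample_frac=0.5):
    """Subbagging: average base-learner predictions over `n_bags` subsamples drawn without replacement."""
    rng = np.random.default_rng(seed)
    n = len(y)
    m = int(subsample_frac * n)
    total = np.zeros(len(x_test))
    for _ in range(n_bags):
        idx = rng.choice(n, size=m, replace=False)  # fresh subsample for each bag
        total += random_feature_fit_predict(x[idx], y[idx], x_test, rng)  # fresh features for each bag
    return total / n_bags


# Simulated binary outcome, so Y and the clipped predictions both lie in [0, 1].
data_rng = np.random.default_rng(123)
x = data_rng.uniform(-1.0, 1.0, size=(200, 2))
y = (data_rng.uniform(size=200) < 1.0 / (1.0 + np.exp(-2.0 * x[:, 0]))).astype(float)
x_test = np.zeros((1, 2))

# Seed-to-seed spread of the prediction at x_test: data held fixed, only the seed changes.
seed_spread = {
    n_bags: np.std([subbag_predict(x, y, x_test, n_bags, seed)[0] for seed in range(30)])
    for n_bags in (1, 50)
}
print(seed_spread)
```

The bags are drawn independently given the data. The seed variance of the bagged prediction is therefore the single-bag variance divided by V. Boundedness guarantees that this variance is finite. It also gives exponential tail control through Hoeffding's inequality.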

If this is right

  • Debiased machine learning estimators become independent of random seed choice.
  • Stability holds for every bounded-outcome regression algorithm without further modification.
  • Adaptive cross-bagging simultaneously neutralizes seed dependence in nuisance estimation and sample splitting.
  • Targeted stability is reached at only a small computational cost relative to standard cross-fitting.
  • Alternative seed-control methods either fail to achieve stability or incur substantially larger costs.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same stability logic could be applied to other sources of algorithmic randomness such as hyperparameter sampling.
  • Adoption would increase the reliability of causal estimates obtained from observational data via machine learning.
  • Software libraries for debiased estimation could incorporate the procedure as a default option to enforce reproducibility.
  • Relaxing the bounded-outcome requirement would extend the result to a wider class of modern regression methods.

Load-bearing premise

The regression algorithm must produce bounded outcomes for the subbagging stability proof to hold under the stated concentration condition.

What would settle it

Apply subbagging to an unbounded regression algorithm on data where predictions can grow without limit and check whether the concentration condition on seed variation is violated in the resulting debiased estimator.
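A stylized version of this check can be sketched without fitting a real unbounded learner. Each bag's prediction at a fixed covariate value is modeled as an i.i.d. draw given the data. The draw is either bounded (uniform on [0, 1]) or heavy-tailed (standard Cauchy, which has no finite mean). The helper names are hypothetical.

The average of V standard Cauchy draws is again standard Cauchy. Its seed-to-seed interquartile range therefore stays near 2 however many bags are used, while the bounded case concentrates. This probes only the bag average, not a full debiased estimator.

```python
import numpy as np


def bagged_seed_iqr(draw_bag, n_bags, n_seeds=2000, seed=0):
    """Interquartile range, across seeds, of an average of `n_bags` i.i.d. bag predictions.

    `draw_bag(rng, shape)` stands in for the seed-induced variation of one bag's
    prediction at a fixed covariate value, with the data held fixed.
    """
    rng = np.random.default_rng(seed)
    bag_preds = draw_bag(rng, (n_seeds, n_bags))  # row = one seed, column = one bag
    bagged = bag_preds.mean(axis=1)
    q25, q75 = np.percentile(bagged, [25, 75])
    return q75 - q25


def bounded(rng, shape):
    """Bounded bag predictions in [0, 1]."""
    return rng.uniform(0.0, 1.0, size=shape)


def heavy_tailed(rng, shape):
    """Unbounded, heavy-tailed bag predictions (standard Cauchy: no finite mean)."""
    return rng.standard_cauchy(size=shape)


for name, draw in [("bounded", bounded), ("cauchy", heavy_tailed)]:
    print(name, [round(bagged_seed_iqr(draw, v), 3) for v in (1, 10, 100)])
```

The Cauchy case is deliberately extreme. An unbounded learner whose predictions still have finite variance would concentrate at a 1/√V rate through Chebyshev's inequality. It would lose only the exponential tail bound that boundedness provides.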

Figures

Figures reproduced from arXiv: 2604.17694 by Alejandro Schuler, Nicholas Williams.

Figure 1. Empirical seed (ε, δ)-stability. Left: single-layer neural network predictions. The red line corresponds to no bagging and the black line to subbagging using the estimated minimum V-bags for seed (0.1, 0.1)-stability (dashed line). Right: AIPW estimator for the ATE. The blue, red, and green lines correspond to 2-fold, 10-fold, and LOO cross-fitting, respectively; the black line corresponds to adaptive cross…
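Figure 1 reports empirical seed (ε, δ)-stability. A minimal way to compute such a quantity is sketched below. It assumes the procedure has been rerun on the same data under many seeds. It takes the fraction of seeds whose estimate lies more than ε from the across-seed mean and compares that fraction with δ. The function name is hypothetical, and so is the choice of the across-seed mean as reference point; the caption does not say which reference the paper uses.

```python
import numpy as np


def empirical_seed_deviation_prob(estimates, eps):
    """Fraction of seeds whose estimate lies more than `eps` from the across-seed mean.

    `estimates[s]` is the estimate obtained with seed s on the same data. A value at or
    below delta suggests empirical (eps, delta)-stability under this reading.
    """
    estimates = np.asarray(estimates, dtype=float)
    return float(np.mean(np.abs(estimates - estimates.mean()) > eps))


print(empirical_seed_deviation_prob([0.0, 0.0, 0.0, 1.0], eps=0.5))  # 0.25
```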
Original abstract

Predictions from machine learning algorithms can vary across random seeds, inducing instability in downstream debiased machine learning estimators. We formalize random seed stability via a concentration condition and prove that subbagging guarantees stability for any bounded-outcome regression algorithm. We introduce a new cross-fitting procedure, adaptive cross-bagging, which simultaneously eliminates seed dependence from both nuisance estimation and sample splitting in debiased machine learning. Numerical experiments confirm that the method achieves the targeted level of stability whereas alternatives do not. Our method incurs a small computational penalty relative to standard practice whereas alternative methods incur large penalties.
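The abstract does not spell out the adaptive cross-bagging algorithm. What follows is therefore only a hypothetical sketch of the general idea as read here, not the authors' procedure:

- each bag reruns cross-fitting with a fresh random fold split and fresh learner seeds;
- the per-bag AIPW estimates of the ATE are averaged;
- bags are added until a plug-in Chebyshev criterion, estimated variance / (V ε²) ≤ δ, suggests the (ε, δ) target is met.

The learner, fold count, propensity clipping at [0.05, 0.95], stopping rule and all function names are assumptions. The stopping rule plugs in an estimated variance and stops adaptively, so it is a heuristic rather than a guarantee.

```python
import numpy as np

# Hypothetical sketch of repeated, adaptively stopped cross-fitting; not the paper's algorithm.


def fit_predict(x_tr, y_tr, x_te, rng, n_features=30, ridge=1e-1):
    """Hypothetical seed-dependent learner: ridge regression on random ReLU features."""
    w = rng.normal(size=(x_tr.shape[1], n_features))
    b = rng.normal(size=n_features)
    phi_tr = np.maximum(x_tr @ w + b, 0.0)
    phi_te = np.maximum(x_te @ w + b, 0.0)
    coef = np.linalg.solve(phi_tr.T @ phi_tr + ridge * np.eye(n_features), phi_tr.T @ y_tr)
    return phi_te @ coef


def cross_fit_aipw(x, a, y, n_folds, rng):
    """One cross-fitted AIPW estimate of the ATE using a fresh random fold split."""
    n = len(y)
    folds = rng.permutation(n) % n_folds  # random, balanced fold labels
    psi = np.empty(n)
    for k in range(n_folds):
        train, test = folds != k, folds == k
        pi = np.clip(fit_predict(x[train], a[train], x[test], rng), 0.05, 0.95)
        treated, control = train & (a == 1), train & (a == 0)
        mu1 = fit_predict(x[treated], y[treated], x[test], rng)
        mu0 = fit_predict(x[control], y[control], x[test], rng)
        a_k, y_k = a[test], y[test]
        psi[test] = (mu1 - mu0 + a_k * (y_k - mu1) / pi
                     - (1 - a_k) * (y_k - mu0) / (1 - pi))
    return psi.mean()


def adaptive_cross_bagged_ate(x, a, y, eps, delta, n_folds=2,
                              min_bags=10, max_bags=500, seed=0):
    """Average cross-fitted AIPW estimates over random splits until the plug-in
    Chebyshev bound var / (V * eps**2) <= delta holds, or max_bags is reached."""
    rng = np.random.default_rng(seed)
    estimates = []
    while len(estimates) < max_bags:
        estimates.append(cross_fit_aipw(x, a, y, n_folds, rng))
        v = len(estimates)
        if v >= min_bags and np.var(estimates, ddof=1) / (v * eps ** 2) <= delta:
            break
    return float(np.mean(estimates)), len(estimates)


# Simulated data with a true average treatment effect of 1.
data_rng = np.random.default_rng(7)
n = 1000
x = data_rng.normal(size=(n, 2))
a = (data_rng.uniform(size=n) < 1.0 / (1.0 + np.exp(-x[:, 0]))).astype(float)
y = x[:, 1] + a + data_rng.normal(size=n)
ate, n_bags = adaptive_cross_bagged_ate(x, a, y, eps=0.02, delta=0.1)
print(ate, n_bags)
```

Averaging over splits targets the seed dependence that comes from sample splitting. Refitting every learner with fresh randomness in each bag targets the nuisance-estimation seed dependence. These are the two sources the abstract says adaptive cross-bagging addresses.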

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and circularity audit. Tearing a paper down is the easy half of reading it: the pith above is the substance, and this is the friction.

Referee Report

1 major / 2 minor

Summary. The manuscript formalizes random seed stability in ML-based estimation via a concentration condition and proves that subbagging guarantees this stability for any bounded-outcome regression algorithm. It introduces adaptive cross-bagging, a cross-fitting procedure that removes seed dependence from both nuisance estimation and sample splitting in debiased machine learning. Numerical experiments are used to show that the method achieves the targeted stability level with only a small computational penalty, unlike standard alternatives.

Significance. If the results hold, the work is significant for enhancing reproducibility in statistical applications of machine learning, such as debiased or double machine learning for causal inference. The formal proof provides a rigorous guarantee under the stated assumptions, and adaptive cross-bagging offers a practical innovation that addresses multiple sources of instability simultaneously. Credit is due for the explicit formalization of seed stability as a concentration condition, the resulting subbagging guarantee, and the confirmatory experiments. The bounded-outcome scope is explicit but restricts broader impact.

major comments (1)
  1. Theorem on subbagging stability (Section 3): the proof that subbagging delivers the concentration condition for seed stability rests on the bounded-outcome assumption for the base learner. This is correctly scoped in the statement, but the assumption is restrictive for typical nuisance estimators in debiased ML (e.g., neural nets or gradient boosting without explicit truncation). With unbounded predictions, the Hoeffding-type argument no longer guarantees that the deviation probability shrinks as bags are added. The manuscript should add discussion or practical guidance on enforcing boundedness (e.g., via clipping or regularization) without distorting the downstream estimator.
minor comments (2)
  1. Abstract: the claim of a 'small computational penalty' is qualitative; adding a specific factor (e.g., relative runtime) or a reference to the experimental table would improve clarity.
  2. Numerical experiments (setup): the description of how bounded outcomes were handled (or whether the tested algorithms naturally satisfied the assumption) is unclear. This detail is needed to evaluate whether the experiments directly support the theorem's scope.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their positive evaluation of the manuscript's significance and for the constructive major comment. We address the point below and will revise the manuscript accordingly to strengthen its practical guidance.

Point-by-point responses
  1. Referee: Theorem on subbagging stability (Section 3): the proof that subbagging delivers the concentration condition for seed stability rests on the bounded-outcome assumption for the base learner. This is correctly scoped in the statement, but the assumption is restrictive for typical nuisance estimators in debiased ML (e.g., neural nets or gradient boosting without explicit truncation). With unbounded predictions, the Hoeffding-type argument no longer guarantees that the deviation probability shrinks as bags are added. The manuscript should add discussion or practical guidance on enforcing boundedness (e.g., via clipping or regularization) without distorting the downstream estimator.

    Authors: We agree that the bounded-outcome assumption is essential to the subbagging guarantee. It permits direct application of concentration inequalities for bounded random variables (e.g., Hoeffding's inequality), which would not apply without it. The theorem statement already makes this scope explicit. In the revised manuscript we will insert a new paragraph immediately after the theorem in Section 3 that supplies the requested practical guidance. We will recommend (i) post-prediction clipping of each base-learner output to a fixed interval [a, b] chosen from domain knowledge or from quantiles of the observed outcomes, noting that clipping enforces the boundedness required by the proof; (ii) bounded activation functions or output constraints when training neural nets or boosted trees; and (iii) a brief sensitivity check of how much bias moderate clipping introduces in the downstream doubly robust estimator. We will also note that many nuisance estimators in DML applications already operate on bounded or near-bounded targets (e.g., probabilities, normalized outcomes). These additions will not alter the formal results but will improve usability for practitioners. revision: yes
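Recommendation (i) can be sketched as follows. `quantile_clip` is a hypothetical helper, not part of any library. It clips predictions to [a, b], with a and b taken as quantiles of the observed outcomes (the full observed range by default).

The final check illustrates why clipping is safe for stability. Clipping to a fixed interval is 1-Lipschitz, so it can never widen the gap between two seeds' predictions. Narrow quantiles can still bias the downstream estimator, which is what the sensitivity check in (iii) would probe.

```python
import numpy as np


def quantile_clip(preds, y_observed, lower_q=0.0, upper_q=1.0):
    """Clip predictions to [a, b], with a and b taken as quantiles of the observed outcomes."""
    a, b = np.quantile(y_observed, [lower_q, upper_q])
    return np.clip(preds, a, b), (float(a), float(b))


y_obs = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
preds = np.array([-1.5, 2.0, 7.0])
print(quantile_clip(preds, y_obs))              # clips to the observed range [0, 4]
print(quantile_clip(preds, y_obs, 0.25, 0.75))  # clips to the interquartile range [1, 3]

# Clipping is 1-Lipschitz: it never increases the gap between two seeds' predictions.
rng = np.random.default_rng(0)
u, v = rng.normal(scale=5.0, size=(2, 1000))
cu, _ = quantile_clip(u, y_obs)
cv, _ = quantile_clip(v, y_obs)
assert np.all(np.abs(cu - cv) <= np.abs(u - v))
```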

Circularity Check

0 steps flagged

No significant circularity in the derivation chain

full rationale

The paper defines random seed stability explicitly as a concentration condition on the estimator across seeds. It then derives that subbagging satisfies this condition for any regression algorithm whose outputs are bounded by a fixed M. This follows directly from applying a standard concentration inequality for bounded random variables (e.g., Hoeffding's inequality) to the average over bags, which are independent and identically distributed given the observed data. The bound M is an input assumption, not derived from the stability statement itself. Adaptive cross-bagging is constructed by combining this stable nuisance estimator with a modified sample-splitting scheme. Neither step renames a fitted quantity as a prediction, and neither relies on a self-citation whose content is unverified. No equation reduces to its own inputs by construction, and the central claims remain independent of any prior work by the same authors.
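Inverting the Hoeffding bound gives a sufficient number of bags. As a minimal sketch, assume each bag's prediction lies in [a, b] and the V bags are i.i.d. given the data. Setting 2 exp(−2Vε²/(b − a)²) ≤ δ gives V ≥ (b − a)² log(2/δ) / (2ε²), with log the natural logarithm.

For comparison, Chebyshev's inequality combined with Popoviciu's variance bound Var ≤ (b − a)²/4 gives V ≥ (b − a)²/(4δε²). The function names are hypothetical. Both results are worst-case counts; the Figure 1 caption refers to an estimated minimum number of bags, which may be much smaller.

```python
import math


def min_bags_hoeffding(a, b, eps, delta):
    """Smallest V with 2 * exp(-2 * V * eps**2 / (b - a)**2) <= delta.

    Hoeffding: for V i.i.d. bag predictions in [a, b], the bag average deviates from its
    seed-expectation by at least eps with probability at most 2 * exp(-2 V eps^2 / (b - a)^2).
    """
    bound = (b - a) ** 2 * math.log(2.0 / delta) / (2.0 * eps ** 2)
    return math.ceil(bound - 1e-9)  # small tolerance guards against float round-up


def min_bags_chebyshev(a, b, eps, delta):
    """Smallest V with (b - a)**2 / (4 * V * eps**2) <= delta.

    Chebyshev's inequality plus Popoviciu's bound Var <= (b - a)^2 / 4 for variables in [a, b].
    """
    bound = (b - a) ** 2 / (4.0 * delta * eps ** 2)
    return math.ceil(bound - 1e-9)


print(min_bags_hoeffding(0.0, 1.0, eps=0.1, delta=0.1))  # 150
print(min_bags_chebyshev(0.0, 1.0, eps=0.1, delta=0.1))  # 250
```

The Hoeffding count grows only like log(1/δ), while the Chebyshev count grows like 1/δ. For small δ, this logarithmic dependence is the practical payoff of the bounded-outcome assumption.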

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claims rest on formalizing stability as a concentration condition and assuming bounded outcomes for the regression algorithms to enable the subbagging proof. No free parameters or invented entities are indicated in the abstract.

axioms (2)
  • Domain assumption: regression algorithms produce bounded outcomes
    Required for the subbagging stability guarantee under the concentration condition.
  • Domain assumption: random seed stability can be formalized as a concentration condition
    This is the key formalization used to prove the result.



Reference graph

Works this paper leans on

20 extracted references · 20 canonical work pages

  1. Andonova, S., Elisseeff, A., Evgeniou, T. & Pontil, M. (2002). A simple algorithm for learning stable machines. In ECAI.

  2. Anthropic (2025). Claude Opus 4.6. Large language model. https://www.anthropic.com

  3. Breiman, L. (1996a). Bagging predictors. Machine Learning 24, 123–140.

  4. Breiman, L. (1996b). Heuristics of instability and stabilization in model selection. The Annals of Statistics 24, 2350–2383.

  5. Breiman, L. (2001). Random forests. Machine Learning 45, 5–32.

  6. Chernozhukov, V., Chetverikov, D., Demirer, M., Duflo, E., Hansen, C., Newey, W. & Robins, J. (2018). Double/debiased machine learning for treatment and structural parameters. The Econometrics Journal 21, C1–C68.

  7. Efron, B. (1979). Bootstrap methods: Another look at the jackknife. The Annals of Statistics 7, 1–26.

  8. Naimi, A. I., Yu, Y.-H. & Bodnar, L. M. (2024). Pseudo-random number generator influences on average treatment effect estimates obtained with machine learning. Epidemiology 35, 779–786.

  9. Popoviciu, T. (1935). Sur les équations algébriques ayant toutes leurs racines réelles. Mathematica 9, 20.

  10. R Core Team (2025). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria.

  11. Robins, J. M., Rotnitzky, A. & Zhao, L. P. (1994). Estimation of regression coefficients when some regressors are not always observed. Journal of the American Statistical Association 89, 846–866.

  12. Rumelhart, D. E., Hinton, G. E. & Williams, R. J. (1986). Learning representations by back-propagating errors. Nature 323, 533–536.

  13. Schader, L., Song, W., Kempker, R. & Benkeser, D. (2024). Don’t let your analysis go to seed: on the impact of random seed on machine learning-based causal inference. Epidemiology 35, 764–778.

  14. Shalev-Shwartz, S., Shamir, O., Srebro, N. & Sridharan, K. (2010). Learnability, stability and uniform convergence. The Journal of Machine Learning Research 11, 2635–2670.

  15. Soloff, J. A., Barber, R. F. & Willett, R. (2024). Bagging provides assumption-free stability. Journal of Machine Learning Research 25, 1–35.

  16. van der Laan, M. J. & Rubin, D. B. (2006). Targeted maximum likelihood learning. The International Journal of Biostatistics 2, Article 11.

  17. Williams, N., Rosenblum, M. & Díaz, I. (2022). Optimising precision and power by machine learning in randomised trials with ordinal and time-to-event outcomes with an application to COVID-19. Journal of the Royal Statistical Society Series A: Statistics in Society 185, 2156–2178.

  18. Williams, N. T., Hung, A. & Rudolph, K. E. (2025). Re: Don’t let your analysis go to seed: On the impact of random seed on machine learning-based causal inference. Epidemiology 36, e12–e13.

  19. Zheng, W. & van der Laan, M. J. (2011). Cross-Validated Targeted Minimum-Loss-Based Estimation. New York, NY: Springer New York, pp. 459–474.

  20. Zivich, P. N. (2024). Commentary: The seedy side of causal effect estimation with machine learning. Epidemiology 35, 787–790.