Improving reproducibility by controlling random seed stability in machine learning based estimation via bagging
Pith reviewed 2026-05-10 04:44 UTC · model grok-4.3
The pith
Subbagging guarantees stability against random seed changes for any bounded-outcome regression algorithm and adaptive cross-bagging removes seed dependence from both nuisance estimation and sample splitting.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We formalize random seed stability via a concentration condition and prove that subbagging guarantees stability for any bounded-outcome regression algorithm. We introduce a new cross-fitting procedure, adaptive cross-bagging, which simultaneously eliminates seed dependence from both nuisance estimation and sample splitting in debiased machine learning.
What carries the argument
Subbagging, which averages predictions across multiple random subsamples, enforces the concentration condition on seed-induced variation; adaptive cross-bagging extends this by dynamically adjusting splits to control seed effects in both nuisance models and sample partitioning.
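As a rough illustration of the averaging idea described above, the following minimal sketch subbags a toy seed-dependent learner and compares the seed-to-seed spread with and without averaging. The names `noisy_mean_learner`, `subbag_predict`, and `seed_spread` are hypothetical, the toy learner stands in for any bounded-outcome regression algorithm, and none of this is taken from the paper's implementation.

```python
import random
import statistics


def noisy_mean_learner(sample, rng):
    """Toy seed-dependent base learner (hypothetical, for illustration only).

    It averages a random half of the outcomes, so its output depends on `rng`.
    Outcomes are assumed to lie in [0, 1], so the prediction is bounded too.
    """
    half = rng.sample(sample, len(sample) // 2)
    return statistics.fmean(half)


def subbag_predict(y, n_bags, subsample_size, seed):
    """Average the base learner over subsamples drawn without replacement."""
    rng = random.Random(seed)
    preds = []
    for _ in range(n_bags):
        subsample = rng.sample(y, subsample_size)  # one bag, no replacement
        preds.append(noisy_mean_learner(subsample, rng))
    return statistics.fmean(preds)


def seed_spread(y, n_bags, subsample_size, seeds):
    """Standard deviation of the subbagged prediction across random seeds."""
    preds = [subbag_predict(y, n_bags, subsample_size, s) for s in seeds]
    return statistics.pstdev(preds)


# Usage: fixed data, varying seed only.
data_rng = random.Random(0)
y = [data_rng.random() for _ in range(200)]
seeds = list(range(30))
spread_single = seed_spread(y, n_bags=1, subsample_size=100, seeds=seeds)
spread_bagged = seed_spread(y, n_bags=50, subsample_size=100, seeds=seeds)
print(spread_single, spread_bagged)
```

The data are held fixed and only the seed changes, so the printed spreads reflect seed-induced variation alone; in this toy setup the spread with 50 bags should be noticeably smaller than with a single bag.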
If this is right
- Debiased machine learning estimators become independent of random seed choice.
- Stability holds for every bounded-outcome regression algorithm without further modification.
- Adaptive cross-bagging simultaneously neutralizes seed dependence in nuisance estimation and sample splitting.
- Targeted stability is reached at only a small computational cost relative to standard cross-fitting.
- Alternative seed-control methods either fail to achieve stability or incur substantially larger costs.
Where Pith is reading between the lines
- The same stability logic could be applied to other sources of algorithmic randomness such as hyperparameter sampling.
- Adoption would increase the reliability of causal estimates obtained from observational data via machine learning.
- Software libraries for debiased estimation could incorporate the procedure as a default option to enforce reproducibility.
- Relaxing the bounded-outcome requirement would extend the result to a wider class of modern regression methods.
Load-bearing premise
The regression algorithm must produce bounded outcomes for the subbagging stability proof to hold under the stated concentration condition.
What would settle it
Apply subbagging to an unbounded regression algorithm on data where predictions can grow without limit and check whether the concentration condition on seed variation is violated in the resulting debiased estimator.
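One way to carry out such a check empirically is to rerun the estimator under many seeds on the same data and count how often the result strays from its across-seed average. The sketch below assumes this simple proxy; the function name `seed_deviation_rate` and the tolerance `epsilon` are hypothetical, and the paper's exact concentration condition is not reproduced here.

```python
import statistics


def seed_deviation_rate(estimates, epsilon):
    """Fraction of per-seed estimates farther than `epsilon` from their mean.

    `estimates` holds one estimate per random seed, all computed on the same
    data. This is an assumed proxy for a seed-stability check, not the
    paper's formal definition.
    """
    center = statistics.fmean(estimates)
    exceed = sum(1 for e in estimates if abs(e - center) > epsilon)
    return exceed / len(estimates)
```

A rate that stays high as the number of bags grows would be evidence that the seed-induced variation is not being controlled.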
Original abstract
Predictions from machine learning algorithms can vary across random seeds, inducing instability in downstream debiased machine learning estimators. We formalize random seed stability via a concentration condition and prove that subbagging guarantees stability for any bounded-outcome regression algorithm. We introduce a new cross-fitting procedure, adaptive cross-bagging, which simultaneously eliminates seed dependence from both nuisance estimation and sample splitting in debiased machine learning. Numerical experiments confirm that the method achieves the targeted level of stability whereas alternatives do not. Our method incurs a small computational penalty relative to standard practice whereas alternative methods incur large penalties.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript formalizes random seed stability in ML-based estimation via a concentration condition and proves that subbagging guarantees this stability for any bounded-outcome regression algorithm. It introduces adaptive cross-bagging, a cross-fitting procedure that removes seed dependence from both nuisance estimation and sample splitting in debiased machine learning. Numerical experiments are used to show that the method achieves the targeted stability level with only a small computational penalty, unlike standard alternatives.
Significance. If the results hold, the work is significant for enhancing reproducibility in statistical applications of machine learning, such as debiased or double machine learning for causal inference. The formal proof provides a rigorous guarantee under stated assumptions, and adaptive cross-bagging offers a practical innovation that addresses multiple sources of instability simultaneously. Credit is due for the explicit formalization (via the concentration condition and subbagging properties) and the confirmatory experiments. The bounded-outcome scope is stated explicitly but restricts broader impact.
major comments (1)
- [Theorem on subbagging stability (Section 3)] The proof that subbagging delivers the concentration condition for seed stability depends on the bounded-outcome assumption for the base learner. This is correctly scoped in the statement, but the assumption is restrictive for typical nuisance estimators in debiased ML (e.g., neural nets or gradient boosting without explicit truncation), because unbounded predictions prevent the deviation probability from being controlled by increasing the number of bags. The manuscript should add discussion or practical guidance on enforcing boundedness (e.g., via clipping or regularization) without distorting the downstream estimator.
minor comments (2)
- [Abstract] The claim of a 'small computational penalty' is qualitative; adding a specific factor (e.g., relative runtime) or a reference to the experimental table would improve clarity.
- [Numerical experiments] The description of how bounded outcomes were handled (or whether the tested algorithms naturally satisfied the assumption) is unclear; this detail is needed to evaluate whether the experiments directly support the theorem's scope.
Simulated Author's Rebuttal
We thank the referee for their positive evaluation of the manuscript's significance and for the constructive major comment. We address the point below and will revise the manuscript accordingly to strengthen its practical guidance.
- Referee: [Theorem on subbagging stability (Section 3)] The proof that subbagging delivers the concentration condition for seed stability depends on the bounded-outcome assumption for the base learner. This is correctly scoped in the statement, but the assumption is restrictive for typical nuisance estimators in debiased ML (e.g., neural nets or gradient boosting without explicit truncation), because unbounded predictions prevent the deviation probability from being controlled by increasing the number of bags. The manuscript should add discussion or practical guidance on enforcing boundedness (e.g., via clipping or regularization) without distorting the downstream estimator.
Authors: We agree that the bounded-outcome assumption is essential to the subbagging guarantee, as it permits direct application of concentration inequalities for bounded random variables (e.g., Hoeffding's inequality) that would fail without it. The theorem statement already makes this scope explicit. In the revised manuscript we will insert a new paragraph immediately after the theorem in Section 3 that supplies the requested practical guidance. We will recommend (i) post-prediction clipping of each base-learner output to a fixed interval [a,b] chosen from domain knowledge or quantiles of the observed outcomes, noting that clipping preserves the boundedness required by the proof; (ii) use of bounded activation functions or output constraints during training of neural nets or boosted trees; and (iii) a brief sensitivity check showing that moderate clipping introduces negligible bias in the downstream doubly-robust estimator. We will also note that many nuisance estimators in DML applications already operate on bounded or near-bounded targets (e.g., probabilities, normalized outcomes). These additions will not alter the formal results but will improve usability for practitioners.
Revision: yes
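Recommendation (i) can be sketched as a thin wrapper around any prediction function. This is a minimal illustration under the assumption that the interval endpoints are supplied by the analyst; `clip_prediction` and `clipped_learner` are hypothetical names, not functions from the paper or from any library.

```python
def clip_prediction(pred, lower, upper):
    """Clip a single prediction to the interval [lower, upper]."""
    if lower > upper:
        raise ValueError("lower must not exceed upper")
    return min(max(pred, lower), upper)


def clipped_learner(base_predict, lower, upper):
    """Wrap a prediction function so that every output lies in [lower, upper].

    `base_predict` is any callable mapping an input to a real-valued
    prediction; the wrapper leaves in-range predictions unchanged.
    """
    def predict(x):
        return clip_prediction(base_predict(x), lower, upper)
    return predict


# Usage: an unbounded toy predictor made bounded on [0, 1].
unbounded = lambda x: 10 * x
bounded = clipped_learner(unbounded, 0.0, 1.0)
```

Clipping after prediction, not during training, keeps the base learner untouched, so the only distortion comes from predictions that fall outside the chosen interval.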
Circularity Check
No significant circularity in the derivation chain
The paper defines random seed stability explicitly as a concentration condition on the estimator across seeds, then derives that subbagging satisfies this condition for any regression algorithm whose outputs are bounded by a fixed M. This follows directly from applying a standard concentration inequality for bounded random variables (e.g., Hoeffding's inequality) to the average over independent bags; the bound M is an input assumption, not derived from the stability statement itself. Adaptive cross-bagging is constructed by combining this stable nuisance estimator with a modified sample-splitting scheme; neither step renames a fitted quantity as a prediction nor relies on a self-citation whose content is unverified. No equation reduces to its own inputs by construction, and the central claims remain independent of any prior work by the same authors.
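The concentration step can be made concrete with a short worked bound. The sketch below assumes B bag predictions \hat f_1(x), ..., \hat f_B(x) that are independent and identically distributed given the data and each lie in [-M, M]; the symbols B, \hat f_b, \varepsilon, and \delta are introduced here for illustration and need not match the paper's notation or its exact theorem.

```latex
% Hoeffding's inequality applied to the average of B bounded bag predictions,
% conditional on the observed data.
\[
\Pr\!\left(
  \left| \frac{1}{B}\sum_{b=1}^{B} \hat f_b(x)
         - \mathbb{E}\!\left[\hat f_1(x) \mid \text{data}\right] \right|
  \ge \varepsilon \;\middle|\; \text{data}
\right)
\le 2\exp\!\left( -\frac{2B^{2}\varepsilon^{2}}{\sum_{b=1}^{B}(2M)^{2}} \right)
= 2\exp\!\left( -\frac{B\varepsilon^{2}}{2M^{2}} \right).
\]
% Setting the right-hand side to at most \delta and solving for B:
\[
B \;\ge\; \frac{2M^{2}}{\varepsilon^{2}} \,\log\frac{2}{\delta}.
\]
```

Under these assumptions the required number of bags grows with M squared, which is why no finite B suffices when predictions are unbounded.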
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption: Regression algorithms produce bounded outcomes
- domain assumption: Random seed stability can be formalized as a concentration condition