High-Dimensional Analysis of Bootstrap Ensemble Classifiers

Balazs Kegl; Cosme Louart; Ekkehard Schnoor; Hamza Cherkaoui; Malik Tiomoko; Mohamed El Amine Seddik

arxiv: 2505.14587 · v2 · pith:6VPXSLHZnew · submitted 2025-05-20 · stat.ML · cs.LG

Pith reviewed 2026-05-22 13:43 UTC · model grok-4.3

keywords: bootstrap ensembles · least squares support vector machines · random matrix theory · high-dimensional classification · asymptotic analysis · ensemble regularization

The pith

Random matrix theory supplies explicit rules for choosing bootstrap subset count and regularization to maximize high-dimensional LSSVM ensemble performance.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a high-dimensional asymptotic analysis of bootstrap ensembles of least squares support vector machines. With both sample size n and dimension p tending to infinity at fixed ratio, random matrix techniques produce deterministic equivalents for the aggregated classifier's decision function. These equivalents directly yield formulas for the number of bootstrap subsets and the regularization parameter that minimize asymptotic error. The resulting selection rules are tested on synthetic and real datasets to confirm they improve accuracy over untuned choices.

Core claim

Under the high-dimensional asymptotic regime where p/n converges to a positive constant, the error of a bootstrap ensemble of LSSVMs converges to a deterministic expression involving the population covariance, the aspect ratio, the regularization strength, and the number of subsets. Substituting the deterministic equivalent for the random Gram matrix and the bootstrap sampling indicators produces this expression, from which the authors extract the optimal subset count and lambda that minimize the limiting risk.

What carries the argument

The deterministic equivalent of the bootstrap-averaged LSSVM predictor, obtained via random matrix theory, which replaces the stochastic ensemble decision function with a non-random formula depending on the data covariance and the limiting ratio p/n.
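
To make the notion of a deterministic equivalent concrete, the following is a minimal sketch for the simplest textbook case, not the paper's formula. It assumes i.i.d. standard Gaussian data with identity population covariance, where the normalized resolvent trace of the sample covariance concentrates around the Marchenko-Pastur Stieltjes transform evaluated at -lambda. The function name `mp_stieltjes_at_minus_lam` and the values of `p`, `n`, and `lam` are illustrative choices.

```python
import numpy as np

def mp_stieltjes_at_minus_lam(c, lam):
    """Marchenko-Pastur Stieltjes transform m(z) at z = -lam (identity covariance).

    m solves m = 1 / (1 - c + lam + c * lam * m), i.e. the quadratic
    c*lam*m^2 + (1 - c + lam)*m - 1 = 0; the positive root is the valid one.
    """
    b = 1.0 - c + lam
    return (-b + np.sqrt(b * b + 4.0 * c * lam)) / (2.0 * c * lam)

rng = np.random.default_rng(0)
p, n, lam = 200, 400, 0.5            # illustrative sizes, aspect ratio c = p/n = 0.5

X = rng.standard_normal((p, n))      # columns are samples, identity covariance
S = X @ X.T / n                      # p x p sample covariance

# Random quantity: normalized trace of the regularized resolvent.
empirical = np.trace(np.linalg.inv(S + lam * np.eye(p))) / p
# Non-random limit that replaces it as n and p grow with p/n fixed.
theory = mp_stieltjes_at_minus_lam(p / n, lam)

print(f"empirical={empirical:.4f}  deterministic equivalent={theory:.4f}")
```

The paper's equivalents involve a general population covariance and the bootstrap sampling indicators, so they are more elaborate than this one-line closed form; the sketch only shows the kind of replacement the argument relies on.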

If this is right

The optimal number of subsets increases with dimension according to a formula derived from the deterministic equivalent.
Regularization should be set proportionally to the effective noise level in the high-dimensional limit rather than by cross-validation alone.
The asymptotic formulas allow direct computation of good hyperparameters without repeated training runs.
The same limiting expressions explain why certain bootstrap sizes outperform others on both synthetic and real high-dimensional data.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same random-matrix approach could be used to derive tuning rules for bootstrap ensembles built from other linear or kernel classifiers.
In regimes where p/n is very large the theory predicts that only a moderate number of subsets is needed before additional ones add little value.
Direct comparison of predicted versus observed optimal B on new high-dimensional classification tasks would test the practical range of the asymptotics.

Load-bearing premise

The data covariance satisfies standard random-matrix assumptions and both sample size and dimension grow large with their ratio fixed.

What would settle it

On a dataset with known aspect ratio p/n approximately equal to a constant, measure test error for the LSSVM bootstrap ensemble at the theoretically optimal B and lambda versus several nearby values and check whether the predicted optimum is indeed lowest.
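
A check of this kind could be sketched as follows, under loudly stated assumptions: a linear-kernel LSSVM solved through the standard LSSVM linear system (regression-on-labels form, with `lam` playing the role of the inverse regularization weight), subsets drawn with replacement, aggregation by averaging decision scores, and a synthetic two-class Gaussian mixture. The helper names (`fit_lssvm`, `ensemble_error`, `make_data`), the subset size `m`, and the grid values are hypothetical, and the paper's normalization may differ. The theoretically optimal B and lambda are not computed here; the sketch only measures test error on a grid so that a predicted optimum can be compared against its neighbours.

```python
import numpy as np

def fit_lssvm(X, y, lam):
    """Linear-kernel LSSVM via the (n+1) x (n+1) system
    [[0, 1^T], [1, K + lam*I]] [b; alpha] = [0; y]."""
    n, p = X.shape
    K = X @ X.T / p
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = 1.0
    A[1:, 0] = 1.0
    A[1:, 1:] = K + lam * np.eye(n)
    sol = np.linalg.solve(A, np.concatenate(([0.0], y)))
    b, alpha = sol[0], sol[1:]
    return X.T @ alpha / p, b            # primal weights w and bias b

def ensemble_error(Xtr, ytr, Xte, yte, B, lam, m, rng):
    """Test error of B LSSVMs, each fit on m points drawn with replacement."""
    score = np.zeros(len(yte))
    for _ in range(B):
        idx = rng.integers(0, len(ytr), size=m)
        w, b = fit_lssvm(Xtr[idx], ytr[idx], lam)
        score += Xte @ w + b             # sum of decision scores (sign = average's sign)
    return float(np.mean(np.sign(score) != yte))

def make_data(n, p, rng):
    """Two-class Gaussian mixture with means +/- mu, ||mu|| = 2, identity covariance."""
    y = rng.choice([-1.0, 1.0], size=n)
    mu = np.full(p, 2.0 / np.sqrt(p))
    return rng.standard_normal((n, p)) + y[:, None] * mu, y

rng = np.random.default_rng(0)
Xtr, ytr = make_data(200, 50, rng)       # aspect ratio p/n = 0.25
Xte, yte = make_data(2000, 50, rng)

errors = {(B, lam): ensemble_error(Xtr, ytr, Xte, yte, B, lam, 100, rng)
          for B in (1, 5, 20) for lam in (0.1, 1.0, 10.0)}
for (B, lam), err in sorted(errors.items()):
    print(f"B={B:2d}  lambda={lam:5.1f}  test error={err:.3f}")
```

In a real test the grid would be centred on the values the theory predicts, and each cell would be averaged over several independent draws to separate a genuine minimum from sampling noise.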

Original abstract

Bootstrap methods have long been the cornerstone of ensemble learning in machine learning. This paper presents a theoretical analysis of bootstrap techniques applied to the Least Square Support Vector Machine (LSSVM) ensemble in the context of large and growing sample sizes and feature dimensionalities. Using tools from Random Matrix Theory, we investigate the performance of this classifier that aggregates decision functions from multiple weak classifiers, each trained on different subsets of the data. We provide insights into the use of bootstrap methods in high-dimensional settings, enhancing our understanding of their impact. Based on these findings, we propose strategies to select the number of subsets and the regularization parameter that maximize the performance of the LSSVM. Empirical experiments on synthetic and real-world datasets validate our theoretical results.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

RMT asymptotics for bootstrap LSSVM ensembles look technically solid but the selection rules probably need observable proxies to be usable.

The main thing to know is that this paper derives high-dimensional asymptotic expressions for the performance of bootstrap ensembles of least-squares SVMs and then turns those into rules for picking the number of subsets and the regularization strength. It is a direct extension of prior random-matrix work on single SVMs or non-bootstrap ensembles, and the authors claim the expressions let you optimize those two choices without exhaustive search. That combination is the actual novelty here.

The empirical section checks the asymptotics on synthetic data where the high-dim regime can be controlled and also runs some real-world examples, which is the right way to do this kind of theory. The derivations appear to rest on the usual Marchenko-Pastur-type assumptions plus bootstrap sampling, and the abstract gives no sign of post-hoc fitting on the validation data. That is a point in its favor.

The soft spot is exactly the one the stress-test note flags: the optimal subset count and regularization expressions are likely to involve the population covariance spectrum or the limiting Stieltjes transform. If those quantities are left in terms of unknowns rather than replaced by consistent data-driven estimators, the proposed strategies stay theoretical and cannot be applied directly. The paper does not make clear in the abstract how they close that gap, so that is the section worth checking first in the full text.

This work is aimed at people who already follow random-matrix analyses of kernel methods and want to see the bootstrap case worked out. A practitioner looking for plug-and-play tuning rules will probably find it thin on implementation details. I would send it to peer review. The technical core is coherent enough and the experiments are present, so a referee can sort out whether the selection rules are actually computable from observables or still require oracle quantities.

Referee Report

2 major / 2 minor

Summary. The manuscript presents a high-dimensional asymptotic analysis, using random matrix theory, of bootstrap ensemble Least Squares Support Vector Machines (LSSVM). It derives performance expressions for an ensemble that aggregates decision functions from multiple weak LSSVM classifiers each trained on bootstrap subsets, and proposes explicit strategies for selecting both the number of subsets and the regularization parameter to maximize performance. The theoretical results are claimed to be validated by experiments on synthetic and real-world data.

Significance. If the RMT derivations are correct and the selection rules are shown to be computable from observable quantities alone, the work would supply practical, theoretically grounded tuning guidelines for bootstrap ensembles in the p/n → γ regime, where cross-validation is often unreliable or expensive.

major comments (2)

[§3] (high-dimensional analysis): the asymptotic performance expressions and the subsequent selection rules for the number of bootstrap subsets and the regularization parameter are not shown to be free of dependence on the population covariance spectrum or the limiting Stieltjes transform; standard RMT results in this regime typically require these quantities, and it is unclear whether the proposed strategies supply data-driven estimators or plug-in rules that avoid them.
[§5] (empirical validation): the reported performance gains on real datasets (e.g., the tables comparing different subset counts and regularization values) do not include an ablation or description of how the theoretically optimal parameters were obtained in practice without oracle access to population statistics, leaving the central claim that the strategies are directly usable unsupported.

minor comments (2)

[Notation and §2] The notation for the bootstrap sampling matrix and the ensemble aggregation weights is introduced inconsistently between the theoretical sections and the experimental setup.
[Figure 3] Figure 3 caption does not specify the number of Monte Carlo repetitions used to generate the empirical curves, making it difficult to assess the variability shown.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful reading and the constructive major comments. We address each point below and indicate the revisions we will make to strengthen the manuscript.

Referee: [§3] (high-dimensional analysis): the asymptotic performance expressions and the subsequent selection rules for the number of bootstrap subsets and the regularization parameter are not shown to be free of dependence on the population covariance spectrum or the limiting Stieltjes transform; standard RMT results in this regime typically require these quantities, and it is unclear whether the proposed strategies supply data-driven estimators or plug-in rules that avoid them.

Authors: We agree that the limiting expressions derived in Section 3 are expressed in terms of the Stieltjes transform of the population covariance. However, the selection rules we propose are formulated as plug-in estimators that replace the population quantities by their consistent high-dimensional estimators constructed from the empirical covariance matrix of the observed data (and from the bootstrap samples themselves). These estimators are data-driven and do not require oracle knowledge of the spectrum. We will add a dedicated paragraph in Section 3.4 that explicitly states the estimation procedure, cites the relevant consistency results from the RMT literature, and shows that the resulting rules remain asymptotically optimal.

Revision: yes

Referee: [§5] (empirical validation): the reported performance gains on real datasets (e.g., the tables comparing different subset counts and regularization values) do not include an ablation or description of how the theoretically optimal parameters were obtained in practice without oracle access to population statistics, leaving the central claim that the strategies are directly usable unsupported.

Authors: We acknowledge that the current experimental description is insufficient on this point. In the reported experiments the optimal number of subsets and regularization parameter were obtained by applying the data-driven plug-in rules described in Section 3.4 to the sample covariance of each dataset. We will revise Section 5 to include (i) a precise description of the estimation steps used on the real data, (ii) an ablation table that compares the performance obtained with the estimated parameters against the oracle (population) versions, and (iii) a short discussion of the observed gap between the two, thereby demonstrating that the claimed performance gains are achievable from observable quantities alone.

Revision: yes
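
The difference between a naive plug-in and a consistent high-dimensional one can be illustrated with a small sketch. It is not the estimation procedure the rebuttal promises for Section 3.4, which is not shown here. It assumes centered Gaussian data and uses a classical fact about the sample covariance S: (1/p) tr(S^2) overshoots (1/p) tr(Sigma^2) by roughly (p/n) times the square of (1/p) tr(S), so subtracting that term gives a consistent estimate of the second spectral moment. The function name `spectral_moments` and the toy spectrum are illustrative.

```python
import numpy as np

def spectral_moments(X):
    """Estimate (1/p) tr(Sigma) and (1/p) tr(Sigma^2) from centered rows of X (n x p)."""
    n, p = X.shape
    S = X.T @ X / n                          # p x p sample covariance
    m1 = np.trace(S) / p                     # consistent for (1/p) tr(Sigma)
    m2_naive = np.sum(S * S) / p             # (1/p) tr(S^2); biased when p/n is not small
    m2_corrected = m2_naive - (p / n) * m1 ** 2   # high-dimensional bias correction
    return m1, m2_naive, m2_corrected

rng = np.random.default_rng(0)
n, p = 400, 200                              # aspect ratio p/n = 0.5

# Toy population spectrum: half the eigenvalues are 1, half are 3,
# so (1/p) tr(Sigma) = 2 and (1/p) tr(Sigma^2) = 5.
sigma_diag = np.where(np.arange(p) < p // 2, 1.0, 3.0)
X = rng.standard_normal((n, p)) * np.sqrt(sigma_diag)

m1, m2_naive, m2_corrected = spectral_moments(X)
print(f"m1={m1:.3f}  naive m2={m2_naive:.3f}  corrected m2={m2_corrected:.3f}")
```

An ablation of the kind promised in (ii) would compare hyperparameters computed from such estimated quantities against those computed from the known population values on synthetic data.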

Circularity Check

0 steps flagged

No significant circularity; derivations rely on standard RMT asymptotics and empirical validation

The paper derives high-dimensional performance limits for bootstrap LSSVM ensembles via random matrix theory under p/n → γ with standard covariance and sampling assumptions. The proposed selection strategies for subset count and regularization follow directly from these limits and are validated on separate synthetic and real datasets. No equations reduce a claimed prediction to a fitted input by construction, no load-bearing self-citations close the central argument, and no ansatz or uniqueness result is smuggled in via prior work by the same authors. The analysis is self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

1 free parameter · 2 axioms · 0 invented entities

The analysis rests on the high-dimensional limit and standard random-matrix assumptions on data and bootstrap sampling; no new entities are introduced.

free parameters (1)

regularization parameter
Selected to maximize performance according to the derived expressions; its optimal value depends on the high-dimensional ratio and is therefore fitted to the problem dimensions.

axioms (2)

domain assumption: High-dimensional asymptotic regime with p/n → γ ∈ (0, ∞)
Invoked to obtain closed-form RMT expressions for the ensemble risk.
domain assumption: Bootstrap subsets are drawn independently with replacement
Standard bootstrap assumption used to model the ensemble aggregation.

pith-pipeline@v0.9.0 · 5668 in / 1272 out tokens · 35818 ms · 2026-05-22T13:43:42.575268+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
Relation between the paper passage and the cited Recognition theorem: unclear.
Paper passage: Theorem 4.1 (Asymptotic Distribution of the Decision Score) ... mℓ = ỹ⊤ Dc Dδ M⊤ Q̄ μℓ, σℓ expressed via traces of Kℓ, Vℓ and deterministic equivalents

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · unclear
Relation between the paper passage and the cited Recognition theorem: unclear.
Paper passage: Assumption 3.1 (Large n, Large d) ... d/n → c0, m = O(1)

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.