Gradient Regularized Newton Boosting Trees with Global Convergence

Daniel Falkowski; Lukas Gonon; Nikita Zozoulenko; Thomas Cass

arxiv: 2605.00581 · v1 · submitted 2026-05-01 · 📊 stat.ML · cs.LG· math.OC

Gradient Regularized Newton Boosting Trees with Global Convergence

Nikita Zozoulenko , Daniel Falkowski , Thomas Cass , Lukas Gonon This is my paper

Pith reviewed 2026-05-09 18:48 UTC · model grok-4.3

classification 📊 stat.ML cs.LGmath.OC

keywords Newton boostinggradient boosting decision treesglobal convergencesecond-order optimizationadaptive regularizationconvergence ratesweak learnersHilbert space

0 comments

The pith

A gradient-regularized Newton boosting scheme for decision trees achieves global convergence at an O(1/k²) rate.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a framework for second-order optimization restricted to spaces of weak learners such as decision trees. It first shows that ordinary Newton boosting converges linearly when losses are smooth and strongly convex and obey a Hessian-dominance condition. For broader convex losses whose Hessians are Lipschitz continuous, the authors add a single adaptive ℓ₂-regularization term scaled by the square root of the current gradient norm. This small change produces a globally convergent algorithm whose rate matches the best known rate for first-order methods that use Nesterov momentum. If the result holds, practitioners gain a theoretically justified way to run reliable second-order GBDT training on a wide range of loss functions.

Core claim

Within the Restricted Newton Descent framework for convex optimization with inexact iterates on Hilbert spaces, the gradient-regularized Newton scheme that introduces an adaptive ℓ₂-regularization term proportional to the square root of the gradient norm at each step attains an O(1/k²) convergence rate for convex losses with Lipschitz-continuous Hessians, recovering both classical finite-dimensional Newton theory and Newton boosting with GBDTs as special cases.

What carries the argument

The gradient regularized Newton scheme, which minimally modifies each Newton step by an adaptive ℓ₂-regularization term proportional to the square root of the gradient norm, studied inside the Restricted Newton Descent framework that tracks cosine angle and weak gradient edge.

If this is right

Vanilla Newton boosting converges linearly under the Hessian-dominance condition for smooth strongly convex losses.
The regularized scheme converges globally at O(1/k²) for general convex losses with Lipschitz Hessians.
The achieved rate matches the rate of first-order boosting accelerated by Nesterov momentum.
Numerical experiments confirm that the regularized method converges on problems where vanilla Newton boosting diverges.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same adaptive regularization idea could be tested on other restricted optimization problems such as boosting with neural-network weak learners.
If the Restricted Newton Descent framework extends to non-convex losses, it might guide stabilization of second-order methods in deep learning.
Software implementations of GBDT could incorporate the regularization term as a default safeguard against divergence on arbitrary loss functions.

Load-bearing premise

Loss functions must satisfy either a Hessian-dominance condition or have Lipschitz-continuous Hessians, while optimization remains restricted to the Hilbert space generated by the weak learners.

What would settle it

Running the regularized algorithm on a convex loss with Lipschitz-continuous Hessian and observing either divergence or a convergence rate slower than O(1/k²) would disprove the claimed global convergence guarantee.

Figures

Figures reproduced from arXiv: 2605.00581 by Daniel Falkowski, Lukas Gonon, Nikita Zozoulenko, Thomas Cass.

**Figure 1.** Figure 1: Charbonnier train loss on the Wine Quality UCI dataset, for different gradient boosting schemes and learning rates η. Vanilla Newton boosting diverges for η = 1. defined in different geometries. Conceptually, γk (Theorem 4.6) can be viewed as an approximate (Hk + λkI) 2 - induced cosine angle between f w k+1 and fk+1, in contrast to the standard Hk-induced cosine angle Θk. If f w k+1 and fk+1 were orthogo… view at source ↗

read the original abstract

Gradient Boosting Decision Trees (GBDTs) dominate tabular machine learning, with modern implementations like XGBoost, LightGBM, and CatBoost being based on Newton boosting: a second-order descent step in the space of decision trees. Despite its empirical success, the global convergence of Newton boosting is poorly understood compared to first-order boosting. In this paper, we introduce Restricted Newton Descent, which studies convex optimization with Newton's method on Hilbert spaces with inexact iterates, based on the concepts of cosine angle and weak gradient edge. Within this framework, we recover Newton boosting with GBDTs and classical finite-dimensional theory as special cases. We first prove that vanilla Newton boosting achieves a linear rate of convergence for smooth, strongly convex losses that satisfy a Hessian-dominance condition. To handle general convex losses with Lipschitz Hessians, we extend a recent gradient regularized Newton scheme to the restricted weak learner setting. This scheme minimally modifies the classical algorithm by introducing an adaptive $\ell_2$-regularization term proportional to the square root of the gradient norm at each iteration. We establish a $\mathcal{O}(\frac{1}{k^2})$ rate for this scheme, thereby obtaining a globally convergent second-order GBDT algorithm with a rate matching that of first-order boosting with Nesterov momentum. In numerical experiments, we show that our scheme converges while vanilla Newton boosting may diverge.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper gives a new adaptive regularization trick that yields O(1/k²) global convergence for Newton boosting in the GBDT setting, but the key step is whether the weak-learner angle condition survives the regularization.

read the letter

The punchline is that this work supplies the first global rate for second-order GBDT methods that matches accelerated first-order boosting. They build a Restricted Newton Descent framework in Hilbert space that treats inexact tree steps via cosine angle and weak gradient edge, recovers ordinary finite-dimensional Newton as a special case, and then adds an adaptive ℓ₂ term proportional to the square root of the current gradient norm. Under Hessian dominance they recover linear convergence for vanilla Newton boosting; under Lipschitz Hessians the regularized version gets the O(1/k²) guarantee. Experiments are said to show divergence of the unregularized version and convergence of the new one. That is genuinely new relative to the cited literature on Newton boosting and gradient-regularized Newton methods. The framework is clean and the regularization is minimal, which is a plus for practical relevance in tabular ML. The soft spot is exactly the one the stress-test note flags: once the adaptive regularizer is added, the effective loss changes at each iteration, so it is not obvious that the tree weak learner still produces directions whose cosine angle or weak gradient edge stays bounded away from zero uniformly. If that bound degrades with the regularization strength, the descent lemma used for the accelerated rate fails for actual GBDTs. The abstract does not spell out how the approximation quality is preserved, so that part of the argument needs explicit verification in the full proof. The Hessian-dominance and Lipschitz-Hessian assumptions are standard but still restrictive for real losses. Overall this is aimed at theorists working on convergence of boosting and functional-space optimization. A reader who wants rates for second-order tree methods or who implements production GBDT code would find the regularization device and the framework useful. The paper is coherent on its own terms and shows clear engagement with the literature, so it deserves a serious referee even if the Hilbert-space extension requires tightening.

Referee Report

2 major / 2 minor

Summary. The paper introduces a Restricted Newton Descent framework for convex optimization in Hilbert spaces with inexact Newton iterates, defined via cosine angle and weak gradient edge concepts. It recovers classical finite-dimensional Newton theory and GBDT Newton boosting as special cases. The main results are a linear convergence rate for vanilla Newton boosting under a Hessian-dominance condition for smooth strongly convex losses, and an O(1/k²) rate for a gradient-regularized Newton scheme (with adaptive ℓ₂ regularization proportional to √‖∇‖) under Lipschitz-continuous Hessians. This yields the first global convergence guarantee for second-order GBDT methods with a rate matching Nesterov-accelerated first-order boosting. Experiments illustrate divergence of vanilla Newton boosting and convergence of the regularized variant.

Significance. If the O(1/k²) guarantee extends rigorously to tree-based weak learners, the work would close a notable gap in the theory of Newton boosting for GBDTs (the basis of XGBoost, LightGBM, CatBoost), providing the first rate-competitive global convergence result for these widely used second-order methods and explaining their empirical robustness.

major comments (2)

[Proof of the O(1/k²) rate (extension from finite-dimensional to restricted Hilbert-space setting)] The central O(1/k²) claim for the gradient-regularized scheme in the GBDT setting rests on extending the finite-dimensional analysis while preserving the restricted weak-learner condition (bounded cosine angle or weak gradient edge away from zero). The adaptive ℓ₂ term alters the effective loss at each iteration; the manuscript must explicitly show that the tree weak learner's approximation quality (angle condition) remains uniformly bounded independently of the regularization strength, otherwise the descent lemma fails.
[Statement and use of the Hessian-dominance condition] The Hessian-dominance condition invoked for the linear-rate result on vanilla Newton boosting is an ad-hoc assumption not standard in convex analysis; its necessity and relation to strong convexity should be clarified, as it appears load-bearing for recovering classical Newton theory as a special case.

minor comments (2)

[Numerical experiments] The abstract claims experiments demonstrate divergence of vanilla Newton boosting, but quantitative details (e.g., iteration counts, loss curves, or comparison to Nesterov first-order rates) are needed to support the rate-matching claim.
[Definition of the gradient-regularized scheme] Notation for the adaptive regularization parameter (proportional to √‖∇‖) should be introduced with an explicit equation number when first defined.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their careful reading of the manuscript and for the constructive major comments. We address each point below and indicate the revisions we will make to strengthen the presentation and proofs.

read point-by-point responses

Referee: [Proof of the O(1/k²) rate (extension from finite-dimensional to restricted Hilbert-space setting)] The central O(1/k²) claim for the gradient-regularized scheme in the GBDT setting rests on extending the finite-dimensional analysis while preserving the restricted weak-learner condition (bounded cosine angle or weak gradient edge away from zero). The adaptive ℓ₂ term alters the effective loss at each iteration; the manuscript must explicitly show that the tree weak learner's approximation quality (angle condition) remains uniformly bounded independently of the regularization strength, otherwise the descent lemma fails.

Authors: We agree that an explicit verification is required for rigor. In the Restricted Newton Descent framework, the gradient-regularized step uses the modified direction ∇f(x_k) + λ_k x_k with λ_k = c √‖∇f(x_k)‖. The weak learner is required to satisfy the cosine-angle condition only with respect to this regularized gradient. Under the Lipschitz-Hessian assumption, the angle between the original and regularized gradients is bounded by a constant that depends only on c and the Lipschitz constant, independent of k. We will add a short supporting lemma (new Lemma 4.3) that quantifies this perturbation and shows the cosine-angle lower bound remains uniform. This lemma directly ensures the descent inequality continues to hold with the same constants used in the O(1/k²) proof. The revision will be incorporated in Section 4. revision: yes
Referee: [Statement and use of the Hessian-dominance condition] The Hessian-dominance condition invoked for the linear-rate result on vanilla Newton boosting is an ad-hoc assumption not standard in convex analysis; its necessity and relation to strong convexity should be clarified, as it appears load-bearing for recovering classical Newton theory as a special case.

Authors: We appreciate the request for clarification. The Hessian-dominance condition is introduced precisely because the Newton direction is restricted to the linear span of the weak learners; it ensures that the inexact Newton step still produces a sufficient descent direction relative to the true gradient in this subspace. While the name is new, the condition reduces to a standard consequence of strong convexity when the weak learner can approximate the exact Newton direction arbitrarily closely (i.e., when the cosine angle approaches 1). We will expand the discussion in Section 3.1 with a new proposition that shows: (i) for μ-strongly convex and L-smooth losses the condition holds whenever the weak-learner approximation quality exceeds a threshold depending on μ/L, and (ii) the classical finite-dimensional Newton method is recovered exactly when the restriction is removed. These additions will make the role of the assumption transparent without changing its statement. revision: yes

Circularity Check

0 steps flagged

Derivation built from standard convex-analysis primitives; recovers classical Newton theory as special case with no load-bearing self-referential reductions.

full rationale

The paper defines Restricted Newton Descent via cosine angle and weak gradient edge in Hilbert space, proves linear convergence under Hessian-dominance for vanilla Newton boosting, and extends a gradient-regularized scheme for the O(1/k²) rate under Lipschitz Hessians. These steps use explicit assumptions on the loss and weak learner without defining quantities in terms of the target rates or fitting parameters to the predictions. The adaptive ℓ₂ term is introduced directly from the gradient norm, and the tree setting is recovered as a special case without circular renaming or self-citation chains that force the result. The framework is self-contained against external benchmarks from convex optimization.

Axiom & Free-Parameter Ledger

0 free parameters · 3 axioms · 1 invented entities

The central claims rest on standard smoothness/strong-convexity assumptions plus the newly introduced Restricted Newton Descent framework and the Hessian-dominance condition; no free parameters are introduced because the regularization strength is defined adaptively from the gradient norm.

axioms (3)

domain assumption Loss is smooth and strongly convex
Invoked for the linear-rate result on vanilla Newton boosting
ad hoc to paper Hessian-dominance condition
Required for the linear convergence proof of vanilla Newton boosting
domain assumption Hessians are Lipschitz continuous
Used to obtain the O(1/k²) rate for the gradient-regularized scheme

invented entities (1)

Restricted Newton Descent framework no independent evidence
purpose: To analyze inexact Newton steps on Hilbert spaces of weak learners
New abstraction introduced to unify GBDT boosting with classical Newton theory

pith-pipeline@v0.9.0 · 5554 in / 1645 out tokens · 59903 ms · 2026-05-09T18:48:59.834120+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

148 extracted references · 148 canonical work pages

[1]

ROCKET: exceptionally fast and accurate time series classification using random convolutional kernels , journal=

Dempster, Angus and Petitjean, Fran. ROCKET: exceptionally fast and accurate time series classification using random convolutional kernels , journal=. 2020 , volume=

work page 2020
[2]

and Webb, Geoffrey I

Dempster, Angus and Schmidt, Daniel F. and Webb, Geoffrey I. , title =. Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining , pages =. 2021 , isbn =

work page 2021
[3]

, title=

Tan, Chang Wei and Dempster, Angus and Bergmeir, Christoph and Webb, Geoffrey I. , title=. Data Mining and Knowledge Discovery , year=

work page
[4]

and Webb, Geoffrey I

Dempster, Angus and Schmidt, Daniel F. and Webb, Geoffrey I. , title=. Data Mining and Knowledge Discovery , year=

work page
[5]

Bake off redux: a review and experimental evaluation of recent time series classification algorithms , journal=

Middlehurst, Matthew and Sch. Bake off redux: a review and experimental evaluation of recent time series classification algorithms , journal=. 2024 , volume=

work page 2024
[6]

Breiman, Leo , journal =

work page
[7]

1997 , issn =

A Decision-Theoretic Generalization of On-Line Learning and an Application to Boosting , journal =. 1997 , issn =

work page 1997
[8]

The Annals of Statistics , number =

Jerome Friedman and Trevor Hastie and Robert Tibshirani , title =. The Annals of Statistics , number =

work page
[9]

Machine Learning , keywords =

Breiman, Leo , description =. Machine Learning , keywords =

work page
[10]

echo state

Jaeger, Herbert , year =. The" echo state" approach to analysing and training recurrent neural networks-with an erratum note' , volume =

work page
[11]

and Shrestha, D.L

Solomatine, D.P. and Shrestha, D.L. , booktitle=. AdaBoost.RT: a boosting algorithm for regression problems , year=

work page
[12]

Extreme learning machine: a new learning scheme of feedforward neural networks , year=

Guang-Bin Huang and Qin-Yu Zhu and Chee-Kheong Siew , booktitle=. Extreme learning machine: a new learning scheme of feedforward neural networks , year=

work page
[13]

Greedy Layer-Wise Training of Deep Networks , volume =

Bengio, Yoshua and Lamblin, Pascal and Popovici, Dan and Larochelle, Hugo , booktitle =. Greedy Layer-Wise Training of Deep Networks , volume =

work page
[14]

Universal Approximation using Incremental Constructive Feedforward Networks with Random Hidden Nodes , year=

Huang, Guang-Bin and Chen, Lei and Siew, Chee-Kheong , journal=. Universal Approximation using Incremental Constructive Feedforward Networks with Random Hidden Nodes , year=

work page
[15]

Statistical Comparisons of Classifiers over Multiple Data Sets , journal =

Janez Dem. Statistical Comparisons of Classifiers over Multiple Data Sets , journal =. 2006 , volume =

work page 2006
[16]

Machine Learning , year=

Geurts, Pierre and Ernst, Damien and Wehenkel, Louis , title=. Machine Learning , year=

work page
[17]

Random Features for Large-Scale Kernel Machines , volume =

Rahimi, Ali and Recht, Benjamin , booktitle =. Random Features for Large-Scale Kernel Machines , volume =

work page
[18]

An Extension on ``Statistical Comparisons of Classifiers over Multiple Data Sets'' for all Pairwise Comparisons , journal =

Salvador Garc. An Extension on ``Statistical Comparisons of Classifiers over Multiple Data Sets'' for all Pairwise Comparisons , journal =. 2008 , volume =

work page 2008
[19]

Weighted Sums of Random Kitchen Sinks: Replacing minimization with randomization in learning , volume =

Rahimi, Ali and Recht, Benjamin , booktitle =. Weighted Sums of Random Kitchen Sinks: Replacing minimization with randomization in learning , volume =

work page
[20]

Uniform approximation of functions with random bases , year=

Rahimi, Ali and Recht, Benjamin , booktitle=. Uniform approximation of functions with random bases , year=

work page
[21]

Statistics and Its Interface , year=

Multi-class AdaBoost , author=. Statistics and Its Interface , year=

work page
[22]

2009 , issn =

Reservoir computing approaches to recurrent neural network training , journal =. 2009 , issn =

work page 2009
[23]

Scikit-learn: Machine Learning in Python , year =

Pedregosa, Fabian and Varoquaux, Ga\". Scikit-learn: Machine Learning in Python , year =. J. Mach. Learn. Res. , month = nov, pages =

work page
[24]

Proceedings of the Learning to Rank Challenge , pages =

Winning The Transfer Learning Track of Yahoo!’s Learning To Rank Challenge with YetiRank , author =. Proceedings of the Learning to Rank Challenge , pages =. 2011 , editor =

work page 2011
[25]

Proceedings of the Fifteenth International Conference on Artificial Intelligence and Statistics , pages =

Random Feature Maps for Dot Product Kernels , author =. Proceedings of the Fifteenth International Conference on Artificial Intelligence and Statistics , pages =. 2012 , volume =

work page 2012
[26]

Extreme Learning Machine for Regression and Multiclass Classification , year=

Huang, Guang-Bin and Zhou, Hongming and Ding, Xiaojian and Zhang, Rui , journal=. Extreme Learning Machine for Regression and Multiclass Classification , year=

work page
[27]

An Insight into Extreme Learning Machines: Random Neurons, Random Features and Kernels , volume =

Huang, Guang-Bin , year =. An Insight into Extreme Learning Machines: Random Neurons, Random Features and Kernels , volume =

work page
[28]

Proceedings of the International Conference on Learning Representations (ICLR) , year=

Adam: A method for stochastic optimization , author=. Proceedings of the International Conference on Learning Representations (ICLR) , year=

work page
[29]

2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) , year=

Deep Residual Learning for Image Recognition , author=. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) , year=

work page 2016
[30]

Optimal Rates for Random Fourier Features , volume =

Sriperumbudur, Bharath and Szabo, Zoltan , booktitle =. Optimal Rates for Random Fourier Features , volume =

work page
[31]

Learning Kernels with Random Features , volume =

Sinha, Aman and Duchi, John C , booktitle =. Learning Kernels with Random Features , volume =

work page
[32]

Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining , pages =

Chen, Tianqi and Guestrin, Carlos , title =. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining , pages =. 2016 , isbn =

work page 2016
[33]

Residual Networks Behave Like Ensembles of Relatively Shallow Networks , volume =

Veit, Andreas and Wilber, Michael J and Belongie, Serge , booktitle =. Residual Networks Behave Like Ensembles of Relatively Shallow Networks , volume =

work page
[34]

A Vector-Contraction Inequality for Rademacher Complexities

Maurer, Andreas. A Vector-Contraction Inequality for Rademacher Complexities. Algorithmic Learning Theory. 2016

work page 2016
[35]

Journal of Machine Learning Research , year =

Alessio Benavoli and Giorgio Corani and Francesca Mangili , title =. Journal of Machine Learning Research , year =

work page
[36]

2016 , eprint=

Enhancing LambdaMART Using Oblivious Trees , author=. 2016 , eprint=

work page 2016
[37]

Communications in Mathematics and Statistics , year=

E, Weinan , title=. Communications in Mathematics and Statistics , year=

work page
[38]

2017 , volume =

Corinna Cortes and Xavier Gonzalvo and Vitaly Kuznetsov and Mehryar Mohri and Scott Yang , booktitle =. 2017 , volume =

work page 2017
[39]

Generalization Properties of Learning with Random Features , volume =

Rudi, Alessandro and Rosasco, Lorenzo , booktitle =. Generalization Properties of Learning with Random Features , volume =

work page
[40]

LightGBM: A Highly Efficient Gradient Boosting Decision Tree , volume =

Ke, Guolin and Meng, Qi and Finley, Thomas and Wang, Taifeng and Chen, Wei and Ma, Weidong and Ye, Qiwei and Liu, Tie-Yan , booktitle =. LightGBM: A Highly Efficient Gradient Boosting Decision Tree , volume =

work page
[41]

2017 , note =

Deep reservoir computing: A critical experimental analysis , journal =. 2017 , note =

work page 2017
[42]

Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining , pages =

Lou, Yin and Obukhov, Mikhail , title =. Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining , pages =. 2017 , isbn =

work page 2017
[43]

Proceedings of the 32nd International Conference on Neural Information Processing Systems , pages =

Prokhorenkova, Liudmila and Gusev, Gleb and Vorobev, Aleksandr and Dorogush, Anna Veronika and Gulin, Andrey , title =. Proceedings of the 32nd International Conference on Neural Information Processing Systems , pages =. 2018 , publisher =

work page 2018
[44]

Learning Deep

Huang, Furong and Ash, Jordan and Langford, John and Schapire, Robert , booktitle =. Learning Deep. 2018 , volume =

work page 2018
[45]

Proceedings of the 35th International Conference on Machine Learning , pages =

Functional Gradient Boosting based on Residual Network Perception , author =. Proceedings of the 35th International Conference on Machine Learning , pages =. 2018 , volume =

work page 2018
[46]

Journal of Machine Learning Research , year =

Lyudmila Grigoryeva and Juan-Pablo Ortega , title =. Journal of Machine Learning Research , year =

work page
[47]

But How Does It Work in Theory? Linear SVM with Random Features , volume =

Sun, Yitong and Gilbert, Anna and Tewari, Ambuj , booktitle =. But How Does It Work in Theory? Linear SVM with Random Features , volume =

work page
[48]

Learning with SGD and Random Features , volume =

Carratino, Luigi and Rudi, Alessandro and Rosasco, Lorenzo , booktitle =. Learning with SGD and Random Features , volume =

work page
[49]

Chen, Ricky T. Q. and Rubanova, Yulia and Bettencourt, Jesse and Duvenaud, David K , booktitle =. Neural Ordinary Differential Equations , volume =

work page
[50]

Learning and Generalization in Overparameterized Neural Networks, Going Beyond Two Layers , volume =

Allen-Zhu, Zeyuan and Li, Yuanzhi and Liang, Yingyu , booktitle =. Learning and Generalization in Overparameterized Neural Networks, Going Beyond Two Layers , volume =

work page
[51]

Augmented Neural ODEs , volume =

Dupont, Emilien and Doucet, Arnaud and Teh, Yee Whye , booktitle =. Augmented Neural ODEs , volume =

work page
[52]

On the Power and Limitations of Random Features for Understanding Neural Networks , volume =

Yehudai, Gilad and Shamir, Ohad , booktitle =. On the Power and Limitations of Random Features for Understanding Neural Networks , volume =

work page
[53]

Proceedings of the Twenty-Second International Conference on Artificial Intelligence and Statistics , pages =

On Kernel Derivative Approximation with Random Fourier Features , author =. Proceedings of the Twenty-Second International Conference on Artificial Intelligence and Statistics , pages =. 2019 , volume =

work page 2019
[54]

Towards a Unified Analysis of Random

Li, Zhu and Ton, Jean-Francois and Oglic, Dino and Sejdinovic, Dino , booktitle =. Towards a Unified Analysis of Random. 2019 , volume =

work page 2019
[55]

Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining , pages =

Akiba, Takuya and Sano, Shotaro and Yanase, Toshihiko and Ohta, Takeru and Koyama, Masanori , title =. Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining , pages =. 2019 , isbn =

work page 2019
[56]

2019 , issn =

Recent advances in physical reservoir computing: A review , journal =. 2019 , issn =

work page 2019
[57]

PyTorch: an imperative style, high-performance deep learning library , year =

Paszke, Adam and Gross, Sam and Massa, Francisco and Lerer, Adam and Bradbury, James and Chanan, Gregory and Killeen, Trevor and Lin, Zeming and Gimelshein, Natalia and Antiga, Luca and Desmaison, Alban and K\". PyTorch: an imperative style, high-performance deep learning library , year =. Proceedings of the 33rd International Conference on Neural Informa...

work page
[58]

The Annals of Statistics , number =

Susan Athey and Julie Tibshirani and Stefan Wager , title =. The Annals of Statistics , number =

work page
[59]

IEEE Transactions on Parallel and Distributed Systems , title=

Z. IEEE Transactions on Parallel and Distributed Systems , title=. 2019 , volume=

work page 2019
[60]

and Cadre, B

Biau, G. and Cadre, B. and Rouvi. Accelerated gradient boosting , journal=. 2019 , month=

work page 2019
[61]

Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics , pages =

Accelerating Gradient Boosting Machines , author =. Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics , pages =. 2020 , editor =

work page 2020
[62]

Journal of Machine Learning Research , volume=

Wen, Zeyi and Shi, Jiashuai and He, Bingsheng and Li, Qinbin and Chen, Jian , title =. Journal of Machine Learning Research , volume=

work page
[63]

2020 , issn =

Embedding and approximation theorems for echo state networks , journal =. 2020 , issn =

work page 2020
[64]

Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics , pages =

Functional Gradient Boosting for Learning Residual-like Networks with Statistical Guarantees , author =. Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics , pages =. 2020 , volume =

work page 2020
[65]

Generalized Boosting , volume =

Suggala, Arun and Liu, Bingbin and Ravikumar, Pradeep , booktitle =. Generalized Boosting , volume =

work page
[66]

2020 , eprint=

Gradient Boosting Neural Networks: GrowNet , author=. 2020 , eprint=

work page 2020
[67]

Reservoir Computing Universality With Stochastic Inputs , year=

Gonon, Lukas and Ortega, Juan-Pablo , journal=. Reservoir Computing Universality With Stochastic Inputs , year=

work page
[68]

Neural Controlled Differential Equations for Irregular Time Series , volume =

Kidger, Patrick and Morrill, James and Foster, James and Lyons, Terry , booktitle =. Neural Controlled Differential Equations for Irregular Time Series , volume =

work page
[69]

Sergio González and Salvador García and Javier. A practical tutorial on bagging and boosting based ensembles for machine learning: Algorithms, software tools, performance study, practical perspectives and opportunities , journal =. 2020 , issn =

work page 2020
[70]

and Stuart, Andrew M

Nelsen, Nicholas H. and Stuart, Andrew M. , title =. SIAM Journal on Scientific Computing , volume =

work page
[71]

Proceedings of the 38th International Conference on Machine Learning , pages =

Neural SDEs as Infinite-Dimensional GANs , author =. Proceedings of the 38th International Conference on Machine Learning , pages =. 2021 , volume =

work page 2021
[72]

Proceedings of the 38th International Conference on Machine Learning , pages =

Neural Rough Differential Equations for Long Time Series , author =. Proceedings of the 38th International Conference on Machine Learning , pages =. 2021 , volume =

work page 2021
[73]

Advances in Neural Information Processing Systems , booksubtitle =

Expressive Power of Randomized Signature , author=. Advances in Neural Information Processing Systems , booksubtitle =

work page
[74]

Proceedings of the 38th International Conference on Machine Learning , pages =

Momentum Residual Neural Networks , author =. Proceedings of the 38th International Conference on Machine Learning , pages =. 2021 , volume =

work page 2021
[75]

van Rijn and Joaquin Vanschoren , booktitle=

Bernd Bischl and Giuseppe Casalicchio and Matthias Feurer and Pieter Gijsbers and Frank Hutter and Michel Lang and Rafael Gomes Mantovani and Jan N. van Rijn and Joaquin Vanschoren , booktitle=. Open

work page
[76]

Proceedings of the 35th International Conference on Neural Information Processing Systems , articleno =

Gorishniy, Yury and Rubachev, Ivan and Khrulkov, Valentin and Babenko, Artem , title =. Proceedings of the 35th International Conference on Neural Information Processing Systems , articleno =. 2021 , isbn =

work page 2021
[77]

IMA Journal of Numerical Analysis , volume =

Kammonen, Aku and Kiessling, Jonas and Plecháč, Petr and Sandberg, Mattias and Szepessy, Anders and Tempone, Raul , title =. IMA Journal of Numerical Analysis , volume =. 2022 , month =

work page 2022
[78]

Communications on Pure and Applied Mathematics , volume =

Mei, Song and Montanari, Andrea , title =. Communications on Pure and Applied Mathematics , volume =

work page
[79]

2022 , issn =

Tabular data: Deep learning is not all you need , journal =. 2022 , issn =

work page 2022
[80]

Why do tree-based models still outperform deep learning on typical tabular data? , year =

Grinsztajn, L\'. Why do tree-based models still outperform deep learning on typical tabular data? , year =. Proceedings of the 36th International Conference on Neural Information Processing Systems , articleno =

work page

Showing first 80 references.

[1] [1]

ROCKET: exceptionally fast and accurate time series classification using random convolutional kernels , journal=

Dempster, Angus and Petitjean, Fran. ROCKET: exceptionally fast and accurate time series classification using random convolutional kernels , journal=. 2020 , volume=

work page 2020

[2] [2]

and Webb, Geoffrey I

Dempster, Angus and Schmidt, Daniel F. and Webb, Geoffrey I. , title =. Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining , pages =. 2021 , isbn =

work page 2021

[3] [3]

, title=

Tan, Chang Wei and Dempster, Angus and Bergmeir, Christoph and Webb, Geoffrey I. , title=. Data Mining and Knowledge Discovery , year=

work page

[4] [4]

and Webb, Geoffrey I

Dempster, Angus and Schmidt, Daniel F. and Webb, Geoffrey I. , title=. Data Mining and Knowledge Discovery , year=

work page

[5] [5]

Bake off redux: a review and experimental evaluation of recent time series classification algorithms , journal=

Middlehurst, Matthew and Sch. Bake off redux: a review and experimental evaluation of recent time series classification algorithms , journal=. 2024 , volume=

work page 2024

[6] [6]

Breiman, Leo , journal =

work page

[7] [7]

1997 , issn =

A Decision-Theoretic Generalization of On-Line Learning and an Application to Boosting , journal =. 1997 , issn =

work page 1997

[8] [8]

The Annals of Statistics , number =

Jerome Friedman and Trevor Hastie and Robert Tibshirani , title =. The Annals of Statistics , number =

work page

[9] [9]

Machine Learning , keywords =

Breiman, Leo , description =. Machine Learning , keywords =

work page

[10] [10]

echo state

Jaeger, Herbert , year =. The" echo state" approach to analysing and training recurrent neural networks-with an erratum note' , volume =

work page

[11] [11]

and Shrestha, D.L

Solomatine, D.P. and Shrestha, D.L. , booktitle=. AdaBoost.RT: a boosting algorithm for regression problems , year=

work page

[12] [12]

Extreme learning machine: a new learning scheme of feedforward neural networks , year=

Guang-Bin Huang and Qin-Yu Zhu and Chee-Kheong Siew , booktitle=. Extreme learning machine: a new learning scheme of feedforward neural networks , year=

work page

[13] [13]

Greedy Layer-Wise Training of Deep Networks , volume =

Bengio, Yoshua and Lamblin, Pascal and Popovici, Dan and Larochelle, Hugo , booktitle =. Greedy Layer-Wise Training of Deep Networks , volume =

work page

[14] [14]

Universal Approximation using Incremental Constructive Feedforward Networks with Random Hidden Nodes , year=

Huang, Guang-Bin and Chen, Lei and Siew, Chee-Kheong , journal=. Universal Approximation using Incremental Constructive Feedforward Networks with Random Hidden Nodes , year=

work page

[15] [15]

Statistical Comparisons of Classifiers over Multiple Data Sets , journal =

Janez Dem. Statistical Comparisons of Classifiers over Multiple Data Sets , journal =. 2006 , volume =

work page 2006

[16] [16]

Machine Learning , year=

Geurts, Pierre and Ernst, Damien and Wehenkel, Louis , title=. Machine Learning , year=

work page

[17] [17]

Random Features for Large-Scale Kernel Machines , volume =

Rahimi, Ali and Recht, Benjamin , booktitle =. Random Features for Large-Scale Kernel Machines , volume =

work page

[18] [18]

An Extension on ``Statistical Comparisons of Classifiers over Multiple Data Sets'' for all Pairwise Comparisons , journal =

Salvador Garc. An Extension on ``Statistical Comparisons of Classifiers over Multiple Data Sets'' for all Pairwise Comparisons , journal =. 2008 , volume =

work page 2008

[19] [19]

Weighted Sums of Random Kitchen Sinks: Replacing minimization with randomization in learning , volume =

Rahimi, Ali and Recht, Benjamin , booktitle =. Weighted Sums of Random Kitchen Sinks: Replacing minimization with randomization in learning , volume =

work page

[20] [20]

Uniform approximation of functions with random bases , year=

Rahimi, Ali and Recht, Benjamin , booktitle=. Uniform approximation of functions with random bases , year=

work page

[21] [21]

Statistics and Its Interface , year=

Multi-class AdaBoost , author=. Statistics and Its Interface , year=

work page

[22] [22]

2009 , issn =

Reservoir computing approaches to recurrent neural network training , journal =. 2009 , issn =

work page 2009

[23] [23]

Scikit-learn: Machine Learning in Python , year =

Pedregosa, Fabian and Varoquaux, Ga\". Scikit-learn: Machine Learning in Python , year =. J. Mach. Learn. Res. , month = nov, pages =

work page

[24] [24]

Proceedings of the Learning to Rank Challenge , pages =

Winning The Transfer Learning Track of Yahoo!’s Learning To Rank Challenge with YetiRank , author =. Proceedings of the Learning to Rank Challenge , pages =. 2011 , editor =

work page 2011

[25] [25]

Proceedings of the Fifteenth International Conference on Artificial Intelligence and Statistics , pages =

Random Feature Maps for Dot Product Kernels , author =. Proceedings of the Fifteenth International Conference on Artificial Intelligence and Statistics , pages =. 2012 , volume =

work page 2012

[26] [26]

Extreme Learning Machine for Regression and Multiclass Classification , year=

Huang, Guang-Bin and Zhou, Hongming and Ding, Xiaojian and Zhang, Rui , journal=. Extreme Learning Machine for Regression and Multiclass Classification , year=

work page

[27] [27]

An Insight into Extreme Learning Machines: Random Neurons, Random Features and Kernels , volume =

Huang, Guang-Bin , year =. An Insight into Extreme Learning Machines: Random Neurons, Random Features and Kernels , volume =

work page

[28] [28]

Proceedings of the International Conference on Learning Representations (ICLR) , year=

Adam: A method for stochastic optimization , author=. Proceedings of the International Conference on Learning Representations (ICLR) , year=

work page

[29] [29]

2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) , year=

Deep Residual Learning for Image Recognition , author=. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) , year=

work page 2016

[30] [30]

Optimal Rates for Random Fourier Features , volume =

Sriperumbudur, Bharath and Szabo, Zoltan , booktitle =. Optimal Rates for Random Fourier Features , volume =

work page

[31] [31]

Learning Kernels with Random Features , volume =

Sinha, Aman and Duchi, John C , booktitle =. Learning Kernels with Random Features , volume =

work page

[32] [32]

Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining , pages =

Chen, Tianqi and Guestrin, Carlos , title =. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining , pages =. 2016 , isbn =

work page 2016

[33] [33]

Residual Networks Behave Like Ensembles of Relatively Shallow Networks , volume =

Veit, Andreas and Wilber, Michael J and Belongie, Serge , booktitle =. Residual Networks Behave Like Ensembles of Relatively Shallow Networks , volume =

work page

[34] [34]

A Vector-Contraction Inequality for Rademacher Complexities

Maurer, Andreas. A Vector-Contraction Inequality for Rademacher Complexities. Algorithmic Learning Theory. 2016

work page 2016

[35] [35]

Journal of Machine Learning Research , year =

Alessio Benavoli and Giorgio Corani and Francesca Mangili , title =. Journal of Machine Learning Research , year =

work page

[36] [36]

2016 , eprint=

Enhancing LambdaMART Using Oblivious Trees , author=. 2016 , eprint=

work page 2016

[37] [37]

Communications in Mathematics and Statistics , year=

E, Weinan , title=. Communications in Mathematics and Statistics , year=

work page

[38] [38]

2017 , volume =

Corinna Cortes and Xavier Gonzalvo and Vitaly Kuznetsov and Mehryar Mohri and Scott Yang , booktitle =. 2017 , volume =

work page 2017

[39] [39]

Generalization Properties of Learning with Random Features , volume =

Rudi, Alessandro and Rosasco, Lorenzo , booktitle =. Generalization Properties of Learning with Random Features , volume =

work page

[40] [40]

LightGBM: A Highly Efficient Gradient Boosting Decision Tree , volume =

Ke, Guolin and Meng, Qi and Finley, Thomas and Wang, Taifeng and Chen, Wei and Ma, Weidong and Ye, Qiwei and Liu, Tie-Yan , booktitle =. LightGBM: A Highly Efficient Gradient Boosting Decision Tree , volume =

work page

[41] [41]

2017 , note =

Deep reservoir computing: A critical experimental analysis , journal =. 2017 , note =

work page 2017

[42] [42]

Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining , pages =

Lou, Yin and Obukhov, Mikhail , title =. Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining , pages =. 2017 , isbn =

work page 2017

[43] [43]

Proceedings of the 32nd International Conference on Neural Information Processing Systems , pages =

Prokhorenkova, Liudmila and Gusev, Gleb and Vorobev, Aleksandr and Dorogush, Anna Veronika and Gulin, Andrey , title =. Proceedings of the 32nd International Conference on Neural Information Processing Systems , pages =. 2018 , publisher =

work page 2018

[44] [44]

Learning Deep

Huang, Furong and Ash, Jordan and Langford, John and Schapire, Robert , booktitle =. Learning Deep. 2018 , volume =

work page 2018

[45] [45]

Proceedings of the 35th International Conference on Machine Learning , pages =

Functional Gradient Boosting based on Residual Network Perception , author =. Proceedings of the 35th International Conference on Machine Learning , pages =. 2018 , volume =

work page 2018

[46] [46]

Journal of Machine Learning Research , year =

Lyudmila Grigoryeva and Juan-Pablo Ortega , title =. Journal of Machine Learning Research , year =

work page

[47] [47]

But How Does It Work in Theory? Linear SVM with Random Features , volume =

Sun, Yitong and Gilbert, Anna and Tewari, Ambuj , booktitle =. But How Does It Work in Theory? Linear SVM with Random Features , volume =

work page

[48] [48]

Learning with SGD and Random Features , volume =

Carratino, Luigi and Rudi, Alessandro and Rosasco, Lorenzo , booktitle =. Learning with SGD and Random Features , volume =

work page

[49] [49]

Chen, Ricky T. Q. and Rubanova, Yulia and Bettencourt, Jesse and Duvenaud, David K , booktitle =. Neural Ordinary Differential Equations , volume =

work page

[50] [50]

Learning and Generalization in Overparameterized Neural Networks, Going Beyond Two Layers , volume =

Allen-Zhu, Zeyuan and Li, Yuanzhi and Liang, Yingyu , booktitle =. Learning and Generalization in Overparameterized Neural Networks, Going Beyond Two Layers , volume =

work page

[51] [51]

Augmented Neural ODEs , volume =

Dupont, Emilien and Doucet, Arnaud and Teh, Yee Whye , booktitle =. Augmented Neural ODEs , volume =

work page

[52] [52]

On the Power and Limitations of Random Features for Understanding Neural Networks , volume =

Yehudai, Gilad and Shamir, Ohad , booktitle =. On the Power and Limitations of Random Features for Understanding Neural Networks , volume =

work page

[53] [53]

Proceedings of the Twenty-Second International Conference on Artificial Intelligence and Statistics , pages =

On Kernel Derivative Approximation with Random Fourier Features , author =. Proceedings of the Twenty-Second International Conference on Artificial Intelligence and Statistics , pages =. 2019 , volume =

work page 2019

[54] [54]

Towards a Unified Analysis of Random

Li, Zhu and Ton, Jean-Francois and Oglic, Dino and Sejdinovic, Dino , booktitle =. Towards a Unified Analysis of Random. 2019 , volume =

work page 2019

[55] [55]

Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining , pages =

Akiba, Takuya and Sano, Shotaro and Yanase, Toshihiko and Ohta, Takeru and Koyama, Masanori , title =. Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining , pages =. 2019 , isbn =

work page 2019

[56] [56]

2019 , issn =

Recent advances in physical reservoir computing: A review , journal =. 2019 , issn =

work page 2019

[57] [57]

PyTorch: an imperative style, high-performance deep learning library , year =

Paszke, Adam and Gross, Sam and Massa, Francisco and Lerer, Adam and Bradbury, James and Chanan, Gregory and Killeen, Trevor and Lin, Zeming and Gimelshein, Natalia and Antiga, Luca and Desmaison, Alban and K\". PyTorch: an imperative style, high-performance deep learning library , year =. Proceedings of the 33rd International Conference on Neural Informa...

work page

[58] [58]

The Annals of Statistics , number =

Susan Athey and Julie Tibshirani and Stefan Wager , title =. The Annals of Statistics , number =

work page

[59] [59]

IEEE Transactions on Parallel and Distributed Systems , title=

Z. IEEE Transactions on Parallel and Distributed Systems , title=. 2019 , volume=

work page 2019

[60] [60]

and Cadre, B

Biau, G. and Cadre, B. and Rouvi. Accelerated gradient boosting , journal=. 2019 , month=

work page 2019

[61] [61]

Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics , pages =

Accelerating Gradient Boosting Machines , author =. Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics , pages =. 2020 , editor =

work page 2020

[62] [62]

Journal of Machine Learning Research , volume=

Wen, Zeyi and Shi, Jiashuai and He, Bingsheng and Li, Qinbin and Chen, Jian , title =. Journal of Machine Learning Research , volume=

work page

[63] [63]

2020 , issn =

Embedding and approximation theorems for echo state networks , journal =. 2020 , issn =

work page 2020

[64] [64]

Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics , pages =

Functional Gradient Boosting for Learning Residual-like Networks with Statistical Guarantees , author =. Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics , pages =. 2020 , volume =

work page 2020

[65] [65]

Generalized Boosting , volume =

Suggala, Arun and Liu, Bingbin and Ravikumar, Pradeep , booktitle =. Generalized Boosting , volume =

work page

[66] [66]

2020 , eprint=

Gradient Boosting Neural Networks: GrowNet , author=. 2020 , eprint=

work page 2020

[67] [67]

Reservoir Computing Universality With Stochastic Inputs , year=

Gonon, Lukas and Ortega, Juan-Pablo , journal=. Reservoir Computing Universality With Stochastic Inputs , year=

work page

[68] [68]

Neural Controlled Differential Equations for Irregular Time Series , volume =

Kidger, Patrick and Morrill, James and Foster, James and Lyons, Terry , booktitle =. Neural Controlled Differential Equations for Irregular Time Series , volume =

work page

[69] [69]

Sergio González and Salvador García and Javier. A practical tutorial on bagging and boosting based ensembles for machine learning: Algorithms, software tools, performance study, practical perspectives and opportunities , journal =. 2020 , issn =

work page 2020

[70] [70]

and Stuart, Andrew M

Nelsen, Nicholas H. and Stuart, Andrew M. , title =. SIAM Journal on Scientific Computing , volume =

work page

[71] [71]

Proceedings of the 38th International Conference on Machine Learning , pages =

Neural SDEs as Infinite-Dimensional GANs , author =. Proceedings of the 38th International Conference on Machine Learning , pages =. 2021 , volume =

work page 2021

[72] [72]

Proceedings of the 38th International Conference on Machine Learning , pages =

Neural Rough Differential Equations for Long Time Series , author =. Proceedings of the 38th International Conference on Machine Learning , pages =. 2021 , volume =

work page 2021

[73] [73]

Advances in Neural Information Processing Systems , booksubtitle =

Expressive Power of Randomized Signature , author=. Advances in Neural Information Processing Systems , booksubtitle =

work page

[74] [74]

Proceedings of the 38th International Conference on Machine Learning , pages =

Momentum Residual Neural Networks , author =. Proceedings of the 38th International Conference on Machine Learning , pages =. 2021 , volume =

work page 2021

[75] [75]

van Rijn and Joaquin Vanschoren , booktitle=

Bernd Bischl and Giuseppe Casalicchio and Matthias Feurer and Pieter Gijsbers and Frank Hutter and Michel Lang and Rafael Gomes Mantovani and Jan N. van Rijn and Joaquin Vanschoren , booktitle=. Open

work page

[76] [76]

Proceedings of the 35th International Conference on Neural Information Processing Systems , articleno =

Gorishniy, Yury and Rubachev, Ivan and Khrulkov, Valentin and Babenko, Artem , title =. Proceedings of the 35th International Conference on Neural Information Processing Systems , articleno =. 2021 , isbn =

work page 2021

[77] [77]

IMA Journal of Numerical Analysis , volume =

Kammonen, Aku and Kiessling, Jonas and Plecháč, Petr and Sandberg, Mattias and Szepessy, Anders and Tempone, Raul , title =. IMA Journal of Numerical Analysis , volume =. 2022 , month =

work page 2022

[78] [78]

Communications on Pure and Applied Mathematics , volume =

Mei, Song and Montanari, Andrea , title =. Communications on Pure and Applied Mathematics , volume =

work page

[79] [79]

2022 , issn =

Tabular data: Deep learning is not all you need , journal =. 2022 , issn =

work page 2022

[80] [80]

Why do tree-based models still outperform deep learning on typical tabular data? , year =

Grinsztajn, L\'. Why do tree-based models still outperform deep learning on typical tabular data? , year =. Proceedings of the 36th International Conference on Neural Information Processing Systems , articleno =

work page