pith. sign in

arxiv: 2605.00581 · v1 · submitted 2026-05-01 · 📊 stat.ML · cs.LG· math.OC

Gradient Regularized Newton Boosting Trees with Global Convergence

Pith reviewed 2026-05-09 18:48 UTC · model grok-4.3

classification 📊 stat.ML cs.LGmath.OC
keywords Newton boostinggradient boosting decision treesglobal convergencesecond-order optimizationadaptive regularizationconvergence ratesweak learnersHilbert space
0
0 comments X

The pith

A gradient-regularized Newton boosting scheme for decision trees achieves global convergence at an O(1/k²) rate.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a framework for second-order optimization restricted to spaces of weak learners such as decision trees. It first shows that ordinary Newton boosting converges linearly when losses are smooth and strongly convex and obey a Hessian-dominance condition. For broader convex losses whose Hessians are Lipschitz continuous, the authors add a single adaptive ℓ₂-regularization term scaled by the square root of the current gradient norm. This small change produces a globally convergent algorithm whose rate matches the best known rate for first-order methods that use Nesterov momentum. If the result holds, practitioners gain a theoretically justified way to run reliable second-order GBDT training on a wide range of loss functions.

Core claim

Within the Restricted Newton Descent framework for convex optimization with inexact iterates on Hilbert spaces, the gradient-regularized Newton scheme that introduces an adaptive ℓ₂-regularization term proportional to the square root of the gradient norm at each step attains an O(1/k²) convergence rate for convex losses with Lipschitz-continuous Hessians, recovering both classical finite-dimensional Newton theory and Newton boosting with GBDTs as special cases.

What carries the argument

The gradient regularized Newton scheme, which minimally modifies each Newton step by an adaptive ℓ₂-regularization term proportional to the square root of the gradient norm, studied inside the Restricted Newton Descent framework that tracks cosine angle and weak gradient edge.

If this is right

  • Vanilla Newton boosting converges linearly under the Hessian-dominance condition for smooth strongly convex losses.
  • The regularized scheme converges globally at O(1/k²) for general convex losses with Lipschitz Hessians.
  • The achieved rate matches the rate of first-order boosting accelerated by Nesterov momentum.
  • Numerical experiments confirm that the regularized method converges on problems where vanilla Newton boosting diverges.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same adaptive regularization idea could be tested on other restricted optimization problems such as boosting with neural-network weak learners.
  • If the Restricted Newton Descent framework extends to non-convex losses, it might guide stabilization of second-order methods in deep learning.
  • Software implementations of GBDT could incorporate the regularization term as a default safeguard against divergence on arbitrary loss functions.

Load-bearing premise

Loss functions must satisfy either a Hessian-dominance condition or have Lipschitz-continuous Hessians, while optimization remains restricted to the Hilbert space generated by the weak learners.

What would settle it

Running the regularized algorithm on a convex loss with Lipschitz-continuous Hessian and observing either divergence or a convergence rate slower than O(1/k²) would disprove the claimed global convergence guarantee.

Figures

Figures reproduced from arXiv: 2605.00581 by Daniel Falkowski, Lukas Gonon, Nikita Zozoulenko, Thomas Cass.

Figure 1
Figure 1. Figure 1: Charbonnier train loss on the Wine Quality UCI dataset, for different gradient boosting schemes and learning rates η. Vanilla Newton boosting diverges for η = 1. defined in different geometries. Conceptually, γk (Theo￾rem 4.6) can be viewed as an approximate (Hk + λkI) 2 - induced cosine angle between f w k+1 and fk+1, in contrast to the standard Hk-induced cosine angle Θk. If f w k+1 and fk+1 were orthogo… view at source ↗
read the original abstract

Gradient Boosting Decision Trees (GBDTs) dominate tabular machine learning, with modern implementations like XGBoost, LightGBM, and CatBoost being based on Newton boosting: a second-order descent step in the space of decision trees. Despite its empirical success, the global convergence of Newton boosting is poorly understood compared to first-order boosting. In this paper, we introduce Restricted Newton Descent, which studies convex optimization with Newton's method on Hilbert spaces with inexact iterates, based on the concepts of cosine angle and weak gradient edge. Within this framework, we recover Newton boosting with GBDTs and classical finite-dimensional theory as special cases. We first prove that vanilla Newton boosting achieves a linear rate of convergence for smooth, strongly convex losses that satisfy a Hessian-dominance condition. To handle general convex losses with Lipschitz Hessians, we extend a recent gradient regularized Newton scheme to the restricted weak learner setting. This scheme minimally modifies the classical algorithm by introducing an adaptive $\ell_2$-regularization term proportional to the square root of the gradient norm at each iteration. We establish a $\mathcal{O}(\frac{1}{k^2})$ rate for this scheme, thereby obtaining a globally convergent second-order GBDT algorithm with a rate matching that of first-order boosting with Nesterov momentum. In numerical experiments, we show that our scheme converges while vanilla Newton boosting may diverge.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper introduces a Restricted Newton Descent framework for convex optimization in Hilbert spaces with inexact Newton iterates, defined via cosine angle and weak gradient edge concepts. It recovers classical finite-dimensional Newton theory and GBDT Newton boosting as special cases. The main results are a linear convergence rate for vanilla Newton boosting under a Hessian-dominance condition for smooth strongly convex losses, and an O(1/k²) rate for a gradient-regularized Newton scheme (with adaptive ℓ₂ regularization proportional to √‖∇‖) under Lipschitz-continuous Hessians. This yields the first global convergence guarantee for second-order GBDT methods with a rate matching Nesterov-accelerated first-order boosting. Experiments illustrate divergence of vanilla Newton boosting and convergence of the regularized variant.

Significance. If the O(1/k²) guarantee extends rigorously to tree-based weak learners, the work would close a notable gap in the theory of Newton boosting for GBDTs (the basis of XGBoost, LightGBM, CatBoost), providing the first rate-competitive global convergence result for these widely used second-order methods and explaining their empirical robustness.

major comments (2)
  1. [Proof of the O(1/k²) rate (extension from finite-dimensional to restricted Hilbert-space setting)] The central O(1/k²) claim for the gradient-regularized scheme in the GBDT setting rests on extending the finite-dimensional analysis while preserving the restricted weak-learner condition (bounded cosine angle or weak gradient edge away from zero). The adaptive ℓ₂ term alters the effective loss at each iteration; the manuscript must explicitly show that the tree weak learner's approximation quality (angle condition) remains uniformly bounded independently of the regularization strength, otherwise the descent lemma fails.
  2. [Statement and use of the Hessian-dominance condition] The Hessian-dominance condition invoked for the linear-rate result on vanilla Newton boosting is an ad-hoc assumption not standard in convex analysis; its necessity and relation to strong convexity should be clarified, as it appears load-bearing for recovering classical Newton theory as a special case.
minor comments (2)
  1. [Numerical experiments] The abstract claims experiments demonstrate divergence of vanilla Newton boosting, but quantitative details (e.g., iteration counts, loss curves, or comparison to Nesterov first-order rates) are needed to support the rate-matching claim.
  2. [Definition of the gradient-regularized scheme] Notation for the adaptive regularization parameter (proportional to √‖∇‖) should be introduced with an explicit equation number when first defined.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their careful reading of the manuscript and for the constructive major comments. We address each point below and indicate the revisions we will make to strengthen the presentation and proofs.

read point-by-point responses
  1. Referee: [Proof of the O(1/k²) rate (extension from finite-dimensional to restricted Hilbert-space setting)] The central O(1/k²) claim for the gradient-regularized scheme in the GBDT setting rests on extending the finite-dimensional analysis while preserving the restricted weak-learner condition (bounded cosine angle or weak gradient edge away from zero). The adaptive ℓ₂ term alters the effective loss at each iteration; the manuscript must explicitly show that the tree weak learner's approximation quality (angle condition) remains uniformly bounded independently of the regularization strength, otherwise the descent lemma fails.

    Authors: We agree that an explicit verification is required for rigor. In the Restricted Newton Descent framework, the gradient-regularized step uses the modified direction ∇f(x_k) + λ_k x_k with λ_k = c √‖∇f(x_k)‖. The weak learner is required to satisfy the cosine-angle condition only with respect to this regularized gradient. Under the Lipschitz-Hessian assumption, the angle between the original and regularized gradients is bounded by a constant that depends only on c and the Lipschitz constant, independent of k. We will add a short supporting lemma (new Lemma 4.3) that quantifies this perturbation and shows the cosine-angle lower bound remains uniform. This lemma directly ensures the descent inequality continues to hold with the same constants used in the O(1/k²) proof. The revision will be incorporated in Section 4. revision: yes

  2. Referee: [Statement and use of the Hessian-dominance condition] The Hessian-dominance condition invoked for the linear-rate result on vanilla Newton boosting is an ad-hoc assumption not standard in convex analysis; its necessity and relation to strong convexity should be clarified, as it appears load-bearing for recovering classical Newton theory as a special case.

    Authors: We appreciate the request for clarification. The Hessian-dominance condition is introduced precisely because the Newton direction is restricted to the linear span of the weak learners; it ensures that the inexact Newton step still produces a sufficient descent direction relative to the true gradient in this subspace. While the name is new, the condition reduces to a standard consequence of strong convexity when the weak learner can approximate the exact Newton direction arbitrarily closely (i.e., when the cosine angle approaches 1). We will expand the discussion in Section 3.1 with a new proposition that shows: (i) for μ-strongly convex and L-smooth losses the condition holds whenever the weak-learner approximation quality exceeds a threshold depending on μ/L, and (ii) the classical finite-dimensional Newton method is recovered exactly when the restriction is removed. These additions will make the role of the assumption transparent without changing its statement. revision: yes

Circularity Check

0 steps flagged

Derivation built from standard convex-analysis primitives; recovers classical Newton theory as special case with no load-bearing self-referential reductions.

full rationale

The paper defines Restricted Newton Descent via cosine angle and weak gradient edge in Hilbert space, proves linear convergence under Hessian-dominance for vanilla Newton boosting, and extends a gradient-regularized scheme for the O(1/k²) rate under Lipschitz Hessians. These steps use explicit assumptions on the loss and weak learner without defining quantities in terms of the target rates or fitting parameters to the predictions. The adaptive ℓ₂ term is introduced directly from the gradient norm, and the tree setting is recovered as a special case without circular renaming or self-citation chains that force the result. The framework is self-contained against external benchmarks from convex optimization.

Axiom & Free-Parameter Ledger

0 free parameters · 3 axioms · 1 invented entities

The central claims rest on standard smoothness/strong-convexity assumptions plus the newly introduced Restricted Newton Descent framework and the Hessian-dominance condition; no free parameters are introduced because the regularization strength is defined adaptively from the gradient norm.

axioms (3)
  • domain assumption Loss is smooth and strongly convex
    Invoked for the linear-rate result on vanilla Newton boosting
  • ad hoc to paper Hessian-dominance condition
    Required for the linear convergence proof of vanilla Newton boosting
  • domain assumption Hessians are Lipschitz continuous
    Used to obtain the O(1/k²) rate for the gradient-regularized scheme
invented entities (1)
  • Restricted Newton Descent framework no independent evidence
    purpose: To analyze inexact Newton steps on Hilbert spaces of weak learners
    New abstraction introduced to unify GBDT boosting with classical Newton theory

pith-pipeline@v0.9.0 · 5554 in / 1645 out tokens · 59903 ms · 2026-05-09T18:48:59.834120+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

148 extracted references · 148 canonical work pages

  1. [1]

    ROCKET: exceptionally fast and accurate time series classification using random convolutional kernels , journal=

    Dempster, Angus and Petitjean, Fran. ROCKET: exceptionally fast and accurate time series classification using random convolutional kernels , journal=. 2020 , volume=

  2. [2]

    and Webb, Geoffrey I

    Dempster, Angus and Schmidt, Daniel F. and Webb, Geoffrey I. , title =. Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining , pages =. 2021 , isbn =

  3. [3]

    , title=

    Tan, Chang Wei and Dempster, Angus and Bergmeir, Christoph and Webb, Geoffrey I. , title=. Data Mining and Knowledge Discovery , year=

  4. [4]

    and Webb, Geoffrey I

    Dempster, Angus and Schmidt, Daniel F. and Webb, Geoffrey I. , title=. Data Mining and Knowledge Discovery , year=

  5. [5]

    Bake off redux: a review and experimental evaluation of recent time series classification algorithms , journal=

    Middlehurst, Matthew and Sch. Bake off redux: a review and experimental evaluation of recent time series classification algorithms , journal=. 2024 , volume=

  6. [6]

    Breiman, Leo , journal =

  7. [7]

    1997 , issn =

    A Decision-Theoretic Generalization of On-Line Learning and an Application to Boosting , journal =. 1997 , issn =

  8. [8]

    The Annals of Statistics , number =

    Jerome Friedman and Trevor Hastie and Robert Tibshirani , title =. The Annals of Statistics , number =

  9. [9]

    Machine Learning , keywords =

    Breiman, Leo , description =. Machine Learning , keywords =

  10. [10]

    echo state

    Jaeger, Herbert , year =. The" echo state" approach to analysing and training recurrent neural networks-with an erratum note' , volume =

  11. [11]

    and Shrestha, D.L

    Solomatine, D.P. and Shrestha, D.L. , booktitle=. AdaBoost.RT: a boosting algorithm for regression problems , year=

  12. [12]

    Extreme learning machine: a new learning scheme of feedforward neural networks , year=

    Guang-Bin Huang and Qin-Yu Zhu and Chee-Kheong Siew , booktitle=. Extreme learning machine: a new learning scheme of feedforward neural networks , year=

  13. [13]

    Greedy Layer-Wise Training of Deep Networks , volume =

    Bengio, Yoshua and Lamblin, Pascal and Popovici, Dan and Larochelle, Hugo , booktitle =. Greedy Layer-Wise Training of Deep Networks , volume =

  14. [14]

    Universal Approximation using Incremental Constructive Feedforward Networks with Random Hidden Nodes , year=

    Huang, Guang-Bin and Chen, Lei and Siew, Chee-Kheong , journal=. Universal Approximation using Incremental Constructive Feedforward Networks with Random Hidden Nodes , year=

  15. [15]

    Statistical Comparisons of Classifiers over Multiple Data Sets , journal =

    Janez Dem. Statistical Comparisons of Classifiers over Multiple Data Sets , journal =. 2006 , volume =

  16. [16]

    Machine Learning , year=

    Geurts, Pierre and Ernst, Damien and Wehenkel, Louis , title=. Machine Learning , year=

  17. [17]

    Random Features for Large-Scale Kernel Machines , volume =

    Rahimi, Ali and Recht, Benjamin , booktitle =. Random Features for Large-Scale Kernel Machines , volume =

  18. [18]

    An Extension on ``Statistical Comparisons of Classifiers over Multiple Data Sets'' for all Pairwise Comparisons , journal =

    Salvador Garc. An Extension on ``Statistical Comparisons of Classifiers over Multiple Data Sets'' for all Pairwise Comparisons , journal =. 2008 , volume =

  19. [19]

    Weighted Sums of Random Kitchen Sinks: Replacing minimization with randomization in learning , volume =

    Rahimi, Ali and Recht, Benjamin , booktitle =. Weighted Sums of Random Kitchen Sinks: Replacing minimization with randomization in learning , volume =

  20. [20]

    Uniform approximation of functions with random bases , year=

    Rahimi, Ali and Recht, Benjamin , booktitle=. Uniform approximation of functions with random bases , year=

  21. [21]

    Statistics and Its Interface , year=

    Multi-class AdaBoost , author=. Statistics and Its Interface , year=

  22. [22]

    2009 , issn =

    Reservoir computing approaches to recurrent neural network training , journal =. 2009 , issn =

  23. [23]

    Scikit-learn: Machine Learning in Python , year =

    Pedregosa, Fabian and Varoquaux, Ga\". Scikit-learn: Machine Learning in Python , year =. J. Mach. Learn. Res. , month = nov, pages =

  24. [24]

    Proceedings of the Learning to Rank Challenge , pages =

    Winning The Transfer Learning Track of Yahoo!’s Learning To Rank Challenge with YetiRank , author =. Proceedings of the Learning to Rank Challenge , pages =. 2011 , editor =

  25. [25]

    Proceedings of the Fifteenth International Conference on Artificial Intelligence and Statistics , pages =

    Random Feature Maps for Dot Product Kernels , author =. Proceedings of the Fifteenth International Conference on Artificial Intelligence and Statistics , pages =. 2012 , volume =

  26. [26]

    Extreme Learning Machine for Regression and Multiclass Classification , year=

    Huang, Guang-Bin and Zhou, Hongming and Ding, Xiaojian and Zhang, Rui , journal=. Extreme Learning Machine for Regression and Multiclass Classification , year=

  27. [27]

    An Insight into Extreme Learning Machines: Random Neurons, Random Features and Kernels , volume =

    Huang, Guang-Bin , year =. An Insight into Extreme Learning Machines: Random Neurons, Random Features and Kernels , volume =

  28. [28]

    Proceedings of the International Conference on Learning Representations (ICLR) , year=

    Adam: A method for stochastic optimization , author=. Proceedings of the International Conference on Learning Representations (ICLR) , year=

  29. [29]

    2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) , year=

    Deep Residual Learning for Image Recognition , author=. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) , year=

  30. [30]

    Optimal Rates for Random Fourier Features , volume =

    Sriperumbudur, Bharath and Szabo, Zoltan , booktitle =. Optimal Rates for Random Fourier Features , volume =

  31. [31]

    Learning Kernels with Random Features , volume =

    Sinha, Aman and Duchi, John C , booktitle =. Learning Kernels with Random Features , volume =

  32. [32]

    Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining , pages =

    Chen, Tianqi and Guestrin, Carlos , title =. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining , pages =. 2016 , isbn =

  33. [33]

    Residual Networks Behave Like Ensembles of Relatively Shallow Networks , volume =

    Veit, Andreas and Wilber, Michael J and Belongie, Serge , booktitle =. Residual Networks Behave Like Ensembles of Relatively Shallow Networks , volume =

  34. [34]

    A Vector-Contraction Inequality for Rademacher Complexities

    Maurer, Andreas. A Vector-Contraction Inequality for Rademacher Complexities. Algorithmic Learning Theory. 2016

  35. [35]

    Journal of Machine Learning Research , year =

    Alessio Benavoli and Giorgio Corani and Francesca Mangili , title =. Journal of Machine Learning Research , year =

  36. [36]

    2016 , eprint=

    Enhancing LambdaMART Using Oblivious Trees , author=. 2016 , eprint=

  37. [37]

    Communications in Mathematics and Statistics , year=

    E, Weinan , title=. Communications in Mathematics and Statistics , year=

  38. [38]

    2017 , volume =

    Corinna Cortes and Xavier Gonzalvo and Vitaly Kuznetsov and Mehryar Mohri and Scott Yang , booktitle =. 2017 , volume =

  39. [39]

    Generalization Properties of Learning with Random Features , volume =

    Rudi, Alessandro and Rosasco, Lorenzo , booktitle =. Generalization Properties of Learning with Random Features , volume =

  40. [40]

    LightGBM: A Highly Efficient Gradient Boosting Decision Tree , volume =

    Ke, Guolin and Meng, Qi and Finley, Thomas and Wang, Taifeng and Chen, Wei and Ma, Weidong and Ye, Qiwei and Liu, Tie-Yan , booktitle =. LightGBM: A Highly Efficient Gradient Boosting Decision Tree , volume =

  41. [41]

    2017 , note =

    Deep reservoir computing: A critical experimental analysis , journal =. 2017 , note =

  42. [42]

    Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining , pages =

    Lou, Yin and Obukhov, Mikhail , title =. Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining , pages =. 2017 , isbn =

  43. [43]

    Proceedings of the 32nd International Conference on Neural Information Processing Systems , pages =

    Prokhorenkova, Liudmila and Gusev, Gleb and Vorobev, Aleksandr and Dorogush, Anna Veronika and Gulin, Andrey , title =. Proceedings of the 32nd International Conference on Neural Information Processing Systems , pages =. 2018 , publisher =

  44. [44]

    Learning Deep

    Huang, Furong and Ash, Jordan and Langford, John and Schapire, Robert , booktitle =. Learning Deep. 2018 , volume =

  45. [45]

    Proceedings of the 35th International Conference on Machine Learning , pages =

    Functional Gradient Boosting based on Residual Network Perception , author =. Proceedings of the 35th International Conference on Machine Learning , pages =. 2018 , volume =

  46. [46]

    Journal of Machine Learning Research , year =

    Lyudmila Grigoryeva and Juan-Pablo Ortega , title =. Journal of Machine Learning Research , year =

  47. [47]

    But How Does It Work in Theory? Linear SVM with Random Features , volume =

    Sun, Yitong and Gilbert, Anna and Tewari, Ambuj , booktitle =. But How Does It Work in Theory? Linear SVM with Random Features , volume =

  48. [48]

    Learning with SGD and Random Features , volume =

    Carratino, Luigi and Rudi, Alessandro and Rosasco, Lorenzo , booktitle =. Learning with SGD and Random Features , volume =

  49. [49]

    Chen, Ricky T. Q. and Rubanova, Yulia and Bettencourt, Jesse and Duvenaud, David K , booktitle =. Neural Ordinary Differential Equations , volume =

  50. [50]

    Learning and Generalization in Overparameterized Neural Networks, Going Beyond Two Layers , volume =

    Allen-Zhu, Zeyuan and Li, Yuanzhi and Liang, Yingyu , booktitle =. Learning and Generalization in Overparameterized Neural Networks, Going Beyond Two Layers , volume =

  51. [51]

    Augmented Neural ODEs , volume =

    Dupont, Emilien and Doucet, Arnaud and Teh, Yee Whye , booktitle =. Augmented Neural ODEs , volume =

  52. [52]

    On the Power and Limitations of Random Features for Understanding Neural Networks , volume =

    Yehudai, Gilad and Shamir, Ohad , booktitle =. On the Power and Limitations of Random Features for Understanding Neural Networks , volume =

  53. [53]

    Proceedings of the Twenty-Second International Conference on Artificial Intelligence and Statistics , pages =

    On Kernel Derivative Approximation with Random Fourier Features , author =. Proceedings of the Twenty-Second International Conference on Artificial Intelligence and Statistics , pages =. 2019 , volume =

  54. [54]

    Towards a Unified Analysis of Random

    Li, Zhu and Ton, Jean-Francois and Oglic, Dino and Sejdinovic, Dino , booktitle =. Towards a Unified Analysis of Random. 2019 , volume =

  55. [55]

    Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining , pages =

    Akiba, Takuya and Sano, Shotaro and Yanase, Toshihiko and Ohta, Takeru and Koyama, Masanori , title =. Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining , pages =. 2019 , isbn =

  56. [56]

    2019 , issn =

    Recent advances in physical reservoir computing: A review , journal =. 2019 , issn =

  57. [57]

    PyTorch: an imperative style, high-performance deep learning library , year =

    Paszke, Adam and Gross, Sam and Massa, Francisco and Lerer, Adam and Bradbury, James and Chanan, Gregory and Killeen, Trevor and Lin, Zeming and Gimelshein, Natalia and Antiga, Luca and Desmaison, Alban and K\". PyTorch: an imperative style, high-performance deep learning library , year =. Proceedings of the 33rd International Conference on Neural Informa...

  58. [58]

    The Annals of Statistics , number =

    Susan Athey and Julie Tibshirani and Stefan Wager , title =. The Annals of Statistics , number =

  59. [59]

    IEEE Transactions on Parallel and Distributed Systems , title=

    Z. IEEE Transactions on Parallel and Distributed Systems , title=. 2019 , volume=

  60. [60]

    and Cadre, B

    Biau, G. and Cadre, B. and Rouvi. Accelerated gradient boosting , journal=. 2019 , month=

  61. [61]

    Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics , pages =

    Accelerating Gradient Boosting Machines , author =. Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics , pages =. 2020 , editor =

  62. [62]

    Journal of Machine Learning Research , volume=

    Wen, Zeyi and Shi, Jiashuai and He, Bingsheng and Li, Qinbin and Chen, Jian , title =. Journal of Machine Learning Research , volume=

  63. [63]

    2020 , issn =

    Embedding and approximation theorems for echo state networks , journal =. 2020 , issn =

  64. [64]

    Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics , pages =

    Functional Gradient Boosting for Learning Residual-like Networks with Statistical Guarantees , author =. Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics , pages =. 2020 , volume =

  65. [65]

    Generalized Boosting , volume =

    Suggala, Arun and Liu, Bingbin and Ravikumar, Pradeep , booktitle =. Generalized Boosting , volume =

  66. [66]

    2020 , eprint=

    Gradient Boosting Neural Networks: GrowNet , author=. 2020 , eprint=

  67. [67]

    Reservoir Computing Universality With Stochastic Inputs , year=

    Gonon, Lukas and Ortega, Juan-Pablo , journal=. Reservoir Computing Universality With Stochastic Inputs , year=

  68. [68]

    Neural Controlled Differential Equations for Irregular Time Series , volume =

    Kidger, Patrick and Morrill, James and Foster, James and Lyons, Terry , booktitle =. Neural Controlled Differential Equations for Irregular Time Series , volume =

  69. [69]

    Sergio González and Salvador García and Javier. A practical tutorial on bagging and boosting based ensembles for machine learning: Algorithms, software tools, performance study, practical perspectives and opportunities , journal =. 2020 , issn =

  70. [70]

    and Stuart, Andrew M

    Nelsen, Nicholas H. and Stuart, Andrew M. , title =. SIAM Journal on Scientific Computing , volume =

  71. [71]

    Proceedings of the 38th International Conference on Machine Learning , pages =

    Neural SDEs as Infinite-Dimensional GANs , author =. Proceedings of the 38th International Conference on Machine Learning , pages =. 2021 , volume =

  72. [72]

    Proceedings of the 38th International Conference on Machine Learning , pages =

    Neural Rough Differential Equations for Long Time Series , author =. Proceedings of the 38th International Conference on Machine Learning , pages =. 2021 , volume =

  73. [73]

    Advances in Neural Information Processing Systems , booksubtitle =

    Expressive Power of Randomized Signature , author=. Advances in Neural Information Processing Systems , booksubtitle =

  74. [74]

    Proceedings of the 38th International Conference on Machine Learning , pages =

    Momentum Residual Neural Networks , author =. Proceedings of the 38th International Conference on Machine Learning , pages =. 2021 , volume =

  75. [75]

    van Rijn and Joaquin Vanschoren , booktitle=

    Bernd Bischl and Giuseppe Casalicchio and Matthias Feurer and Pieter Gijsbers and Frank Hutter and Michel Lang and Rafael Gomes Mantovani and Jan N. van Rijn and Joaquin Vanschoren , booktitle=. Open

  76. [76]

    Proceedings of the 35th International Conference on Neural Information Processing Systems , articleno =

    Gorishniy, Yury and Rubachev, Ivan and Khrulkov, Valentin and Babenko, Artem , title =. Proceedings of the 35th International Conference on Neural Information Processing Systems , articleno =. 2021 , isbn =

  77. [77]

    IMA Journal of Numerical Analysis , volume =

    Kammonen, Aku and Kiessling, Jonas and Plecháč, Petr and Sandberg, Mattias and Szepessy, Anders and Tempone, Raul , title =. IMA Journal of Numerical Analysis , volume =. 2022 , month =

  78. [78]

    Communications on Pure and Applied Mathematics , volume =

    Mei, Song and Montanari, Andrea , title =. Communications on Pure and Applied Mathematics , volume =

  79. [79]

    2022 , issn =

    Tabular data: Deep learning is not all you need , journal =. 2022 , issn =

  80. [80]

    Why do tree-based models still outperform deep learning on typical tabular data? , year =

    Grinsztajn, L\'. Why do tree-based models still outperform deep learning on typical tabular data? , year =. Proceedings of the 36th International Conference on Neural Information Processing Systems , articleno =

Showing first 80 references.