Posterior Contraction of Lévy Adaptive B-spline Regression in Besov Spaces
Pith reviewed 2026-05-20 02:10 UTC · model grok-4.3
The pith
The LABS posterior contracts around true functions in Besov spaces at nearly minimax-optimal rates while adapting to unknown smoothness.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The setting is nonparametric regression with a univariate random design and Gaussian errors. The LABS posterior is formed by placing a Lévy process prior on the number of B-spline terms, their knot locations, and their degrees. Whenever the true function lies in a Besov space, this posterior contracts around it at a rate that matches the minimax rate up to a logarithmic factor, and the contraction holds simultaneously for all smoothness indices in a given range.
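The claim can be written schematically as follows. This is a sketch in notation that is assumed here, not taken from the paper: $f_0$ is the true function, $s$ its Besov smoothness, $M$ a large constant, $\kappa \ge 0$ an unspecified power of the logarithm, and $\|\cdot\|$ whichever norm the paper uses. The polynomial part $n^{-s/(2s+1)}$ is the classical univariate minimax rate for smoothness $s$; the paper's exact statement and conditions may differ.

```latex
\Pi\bigl( f : \| f - f_0 \| > M \varepsilon_n \,\big|\, (X_i, Y_i)_{i=1}^{n} \bigr) \longrightarrow 0
\quad \text{in } P_{f_0}\text{-probability},
\qquad
\varepsilon_n = n^{-s/(2s+1)} (\log n)^{\kappa}.
```

Adaptation means that the prior is built without knowledge of $s$, yet the display holds for every $s$ in the admissible range.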
What carries the argument
The Lévy Adaptive B-spline (LABS) model, which embeds B-splines of varying degrees with independently chosen knots into the Lévy Adaptive Regression Kernel framework.
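To make the construction concrete, here is a minimal sketch of a random function built as a sum of B-splines with varying degrees and independently drawn knots. It is an illustration under stated assumptions, not the paper's prior: the Poisson number of terms, uniform knots on [0, 1], uniform degrees, and Gaussian coefficients are all hypothetical choices (a compound Poisson sum is only a simple special case of a Lévy random measure), and the names `bspline_basis`, `sample_poisson`, and `sample_labs_function` are invented for this example.

```python
import math
import random


def bspline_basis(x, knots):
    """Single B-spline of degree len(knots) - 2 on the given sorted knots,
    evaluated at x with the Cox-de Boor recursion."""
    degree = len(knots) - 2
    if degree == 0:
        return 1.0 if knots[0] <= x < knots[1] else 0.0
    left_span = knots[-2] - knots[0]
    right_span = knots[-1] - knots[1]
    left = 0.0
    right = 0.0
    if left_span > 0:
        left = (x - knots[0]) / left_span * bspline_basis(x, knots[:-1])
    if right_span > 0:
        right = (knots[-1] - x) / right_span * bspline_basis(x, knots[1:])
    return left + right


def sample_poisson(rate, rng):
    """Poisson draw via Knuth's multiplication method (fine for small rates)."""
    threshold = math.exp(-rate)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def sample_labs_function(rng, rate=8.0, max_degree=3, coef_scale=1.0):
    """Draw a random sum of B-splines (assumed toy prior, see lead-in)."""
    terms = []
    for _ in range(sample_poisson(rate, rng)):
        degree = rng.randint(0, max_degree)          # degree of this term
        knots = sorted(rng.random() for _ in range(degree + 2))
        coef = rng.gauss(0.0, coef_scale)            # term coefficient
        terms.append((coef, knots))

    def f(x):
        return sum(coef * bspline_basis(x, knots) for coef, knots in terms)

    return f, terms


if __name__ == "__main__":
    f, terms = sample_labs_function(random.Random(0))
    print(len(terms), [round(f(i / 10), 3) for i in range(11)])
```

Because each term carries its own knot vector, low-degree terms can place jumps or kinks locally while high-degree terms supply smooth structure elsewhere, which is the flexibility the text attributes to LABS.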
If this is right
- The method can be applied to estimate functions whose smoothness is unknown in advance while still attaining near-optimal rates.
- The same contraction holds for functions exhibiting irregular or locally structured behavior that Besov spaces can capture.
- The theoretical guarantee fills the previous gap for posterior contraction of LARK-type models in Besov spaces.
Where Pith is reading between the lines
- Similar Lévy-process priors on spline knots could be tested for adaptation in other function classes such as Sobolev or Hölder spaces with comparable rates.
- The univariate result suggests examining whether the LABS construction extends to multivariate random designs while preserving automatic adaptation.
- If the logarithmic factor can be removed by a modest change in the prior, the procedure would achieve exact minimax rates.
Load-bearing premise
The Lévy process prior on spline degrees and knot locations must satisfy the specific technical conditions required to prove contraction in Besov spaces under the univariate random-design Gaussian-error model.
What would settle it
A simulation or analytic calculation that exhibits a posterior contraction rate strictly slower than the claimed nearly minimax rate for some function known to lie in a Besov space with fixed smoothness would refute the main result.
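One simple diagnostic for such a simulation is to regress log error on log sample size and compare the fitted slope with the target exponent $-s/(2s+1)$. The sketch below assumes the errors have already been computed from posterior fits at several sample sizes; the function names `loglog_slope` and `minimax_exponent` are hypothetical. A finite-sample slope is only suggestive, since the claim is asymptotic and allows a logarithmic factor.

```python
import math


def loglog_slope(sample_sizes, errors):
    """Least-squares slope of log(error) against log(n)."""
    xs = [math.log(n) for n in sample_sizes]
    ys = [math.log(e) for e in errors]
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    num = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    den = sum((x - x_bar) ** 2 for x in xs)
    return num / den


def minimax_exponent(s):
    """Classical univariate rate exponent s / (2s + 1) for smoothness s."""
    return s / (2.0 * s + 1.0)


if __name__ == "__main__":
    # Synthetic errors that decay exactly like n^(-0.4), for illustration only.
    ns = [100, 1000, 10000]
    errs = [n ** -0.4 for n in ns]
    print(loglog_slope(ns, errs), -minimax_exponent(2.0))
```

A fitted slope that stays well above the target (that is, less negative) as the sample sizes grow would be the kind of evidence the paragraph above describes.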
Original abstract
We investigate the asymptotic properties of the Lévy Adaptive B-spline (LABS) regression model, a Bayesian nonparametric method that incorporates B-spline kernels into the Lévy Adaptive Regression Kernel (LARK) model. LABS applies splines of varying degrees with independently defined knots, yielding a flexible model class capable of adapting to irregular and locally structured features of the true function. Within the nonparametric regression framework with univariate random design and Gaussian errors, we establish that the LABS posterior contracts around the true function in Besov classes at nearly minimax-optimal rates, up to a logarithmic factor, while adapting automatically to unknown smoothness. This study contributes to filling a gap in the literature, where theoretical results on posterior contraction of the LARK model in Besov spaces remain scarce. Simulation experiments on standard test functions in Besov spaces, including Blocks, Bumps, HeaviSine, and Doppler, complement the theoretical results and demonstrate the practical utility of LABS.
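For readers who want to reproduce the flavor of such experiments, the sketch below generates random-design data from two of the named test functions using their commonly cited Donoho-Johnstone forms. The scaling, noise level, and sample size are assumptions for illustration and may differ from the paper's setup; Blocks and Bumps are omitted because they require tables of constants. The helper name `simulate` is hypothetical.

```python
import math
import random


def doppler(x, eps=0.05):
    """Doppler test function on [0, 1] (commonly cited form)."""
    return math.sqrt(x * (1.0 - x)) * math.sin(2.0 * math.pi * (1.0 + eps) / (x + eps))


def heavisine(x):
    """HeaviSine test function on [0, 1] (commonly cited form)."""
    def sign(v):
        return (v > 0) - (v < 0)
    return 4.0 * math.sin(4.0 * math.pi * x) - sign(x - 0.3) - sign(0.72 - x)


def simulate(f, n, sigma, rng):
    """Random design X ~ Uniform(0, 1) with Y = f(X) + Gaussian noise."""
    xs = [rng.random() for _ in range(n)]
    ys = [f(x) + rng.gauss(0.0, sigma) for x in xs]
    return xs, ys


if __name__ == "__main__":
    xs, ys = simulate(doppler, n=5, sigma=0.1, rng=random.Random(1))
    print(list(zip(xs, ys)))
```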
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript establishes that the posterior in the Lévy Adaptive B-spline (LABS) regression model contracts around the true regression function in Besov spaces at nearly minimax-optimal rates (up to a logarithmic factor) while automatically adapting to unknown smoothness. The setting is univariate nonparametric regression with random design and Gaussian errors; the prior places a Lévy process on spline degrees and knot locations within the LARK framework. Theoretical results are complemented by simulations on standard test functions (Blocks, Bumps, HeaviSine, Doppler).
Significance. If the central claims hold, the work is significant because it supplies the first posterior-contraction guarantees for the LABS model in Besov spaces, addressing a documented gap in the LARK literature. The automatic adaptation to smoothness and the nearly optimal rates (up to a logarithmic factor) are competitive with other adaptive Bayesian nonparametric methods. The simulation study on irregular test functions provides useful empirical corroboration.
major comments (2)
- [Theorem 3.1] Theorem 3.1 (and the surrounding discussion in §3.2): the nearly-minimax contraction rate is obtained by appealing to a general Ghosal–van der Vaart-type theorem whose two load-bearing conditions—sufficient prior mass on Kullback–Leibler neighborhoods of f0 in the Besov norm and entropy bounds on a suitable sieve—are only sketched. The manuscript invokes earlier LARK results without re-verifying that the specific Lévy intensity, B-spline kernel, and random-design measure continue to satisfy the required lower bounds on prior mass for every smoothness level s; this verification is central to the adaptation claim.
- [§3.3] §3.3, entropy bound for the sieve: the growth of the covering numbers induced by the random degree and knot locations must remain o(n ε_n²) uniformly over Besov balls of varying smoothness. The current argument does not explicitly control the effective dimension as a function of the Lévy intensity parameters when s is unknown, which is required to close the proof for the adaptive rate.
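For orientation, the conditions referred to in both major comments usually take the following form in general contraction theorems of Ghosal-Ghosh-van der Vaart type. The notation is assumed here rather than taken from the manuscript: $\mathcal{F}_n$ is the sieve, $d$ a suitable metric, $K$ and $V$ the Kullback-Leibler divergence and the second moment of the log-likelihood ratio, and $C, c_1 > 0$ generic constants.

```latex
\begin{align*}
\text{(prior mass)}\quad
  & \Pi\bigl( f : K(p_{f_0}, p_f) \le \varepsilon_n^2,\; V(p_{f_0}, p_f) \le \varepsilon_n^2 \bigr)
    \ge e^{-C n \varepsilon_n^2}, \\
\text{(entropy)}\quad
  & \log N(\varepsilon_n, \mathcal{F}_n, d) \le c_1\, n \varepsilon_n^2, \\
\text{(sieve remainder)}\quad
  & \Pi(\mathcal{F} \setminus \mathcal{F}_n) \le e^{-(C+4)\, n \varepsilon_n^2}.
\end{align*}
```

In Gaussian regression with known noise variance $\sigma^2$, $K(p_{f_0}, p_f)$ equals $\|f - f_0\|^2 / (2\sigma^2)$ in the $L^2$ norm of the design distribution, so the prior-mass condition reduces to a small-ball bound around $f_0$. The referee's wording $o(n \varepsilon_n^2)$ for the entropy is stronger than the bound shown here. For adaptation, all three conditions must hold with $\varepsilon_n$ depending on $s$ while the prior itself does not.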
minor comments (2)
- [Introduction] The introduction would benefit from a concise table or paragraph contrasting the LABS rates with those already available for other adaptive spline or wavelet priors in Besov spaces.
- [Simulations] Figure captions for the simulation study should explicitly label the smoothness parameters and the competing methods so that the visual comparison is self-contained.
Simulated Author's Rebuttal
We thank the referee for the careful reading and constructive feedback on our manuscript. The comments highlight areas where additional detail would strengthen the presentation of the theoretical results. We have revised the manuscript to address these points explicitly while preserving the original contributions.
- Referee: [Theorem 3.1] Theorem 3.1 (and the surrounding discussion in §3.2): the nearly-minimax contraction rate is obtained by appealing to a general Ghosal–van der Vaart-type theorem whose two load-bearing conditions (sufficient prior mass on Kullback–Leibler neighborhoods of f0 in the Besov norm and entropy bounds on a suitable sieve) are only sketched. The manuscript invokes earlier LARK results without re-verifying that the specific Lévy intensity, B-spline kernel, and random-design measure continue to satisfy the required lower bounds on prior mass for every smoothness level s; this verification is central to the adaptation claim.
  Authors: We agree that explicit verification of the prior-mass condition for the specific LABS components is necessary to rigorously support automatic adaptation across all smoothness levels. In the revised manuscript we have added a dedicated appendix subsection that re-derives the lower bound on prior mass in Kullback–Leibler neighborhoods directly for the Lévy intensity, B-spline kernel, and random-design measure. The argument shows that the intensity parameters can be chosen independently of s so that the required mass holds uniformly over the Besov ball, thereby justifying the appeal to the general Ghosal–van der Vaart theorem for the adaptive rate.
  Revision: yes
- Referee: [§3.3] §3.3, entropy bound for the sieve: the growth of the covering numbers induced by the random degree and knot locations must remain o(n ε_n²) uniformly over Besov balls of varying smoothness. The current argument does not explicitly control the effective dimension as a function of the Lévy intensity parameters when s is unknown, which is required to close the proof for the adaptive rate.
  Authors: We appreciate the referee's observation that the entropy control must be uniform in the unknown smoothness. In the revised §3.3 we now explicitly bound the effective dimension of the random-degree-and-knot sieve in terms of the Lévy intensity parameters. These bounds are shown to be independent of s within the admissible range, ensuring that the metric entropy remains o(n ε_n²) uniformly over the Besov balls. The updated argument therefore closes the proof of the nearly minimax adaptive contraction rate.
  Revision: yes
Circularity Check
No significant circularity; derivation applies external general theorems to specific prior.
The paper claims posterior contraction for the LABS model in Besov spaces by verifying the standard conditions (prior mass on KL neighborhoods and entropy bounds) required by general Bayesian nonparametric contraction theorems from the literature. These verifications are performed for the Lévy-driven B-spline prior under the given random-design Gaussian model. They constitute independent technical work rather than any self-definition, fitted-input renaming, or reduction of the target rate to quantities defined by the result itself. No load-bearing step equates the claimed nearly minimax rate (up to a logarithmic factor) to an input by construction, and references to prior LARK work serve as background rather than as an unverified self-citation chain that forces the conclusion.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: univariate random design with Gaussian errors and the Lévy-process prior construction on B-splines satisfy the conditions needed for posterior contraction in Besov spaces.