pith. sign in

arxiv: 2507.05064 · v4 · pith:YFN26XWMnew · submitted 2025-07-07 · 📊 stat.ML · cs.LG· stat.ME

Vecchia-Inducing-Points Full-Scale Approximations for Gaussian Processes

Pith reviewed 2026-05-25 07:42 UTC · model grok-4.3

classification 📊 stat.ML cs.LGstat.ME
keywords Gaussian processesVecchia approximationinducing pointsscalable approximationsnon-Gaussian likelihoodsLaplace approximationpreconditionersnumerical stability
0
0 comments X

The pith

Vecchia-inducing-points full-scale approximations combine inducing points and Vecchia methods to scale Gaussian processes to large datasets with improved accuracy and stability.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes VIF approximations for Gaussian processes to address scalability issues with large data. It integrates global inducing points, effective for high-dimensional inputs and smooth covariances, with local Vecchia approximations suited to low-dimensional and moderately smooth cases. The key innovation is an efficient correlation-based strategy for finding neighbors in the Vecchia approximation of the residual process, using a modified cover tree. The method extends to non-Gaussian likelihoods with iterative methods and new preconditioners that speed up computations significantly. Numerical experiments demonstrate that these approximations are more efficient, accurate, and stable than current state-of-the-art methods.

Core claim

VIF approximations bridge the regimes of inducing point and Vecchia methods by using inducing points for the main process and Vecchia for the residual, with correlation-based neighbor finding, resulting in computationally efficient, accurate, and stable approximations for both Gaussian and non-Gaussian likelihoods.

What carries the argument

The VIF approximation, which pairs inducing points with a Vecchia approximation on the residual process via correlation-based neighbor search implemented with a modified cover tree algorithm.

If this is right

  • Enables handling of both low- and high-dimensional inputs effectively.
  • Reduces computational costs for non-Gaussian likelihoods by several orders of magnitude using iterative methods.
  • Provides theoretical convergence results for the preconditioners in Laplace approximations.
  • Shows superior performance in experiments on simulated and real-world datasets.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The approach may allow Gaussian processes to be applied to even larger problems in fields like spatial statistics or machine learning.
  • Novel preconditioners could be adapted for other scalable GP methods.
  • Further testing in extreme regimes might highlight when the neighbor-finding strategy needs adjustment.

Load-bearing premise

The correlation-based neighbor-finding strategy for the Vecchia approximation of the residual process works reliably across various input dimensions and covariance smoothness levels.

What would settle it

If experiments on data with higher input dimensions or less smooth covariance functions than those tested show reduced accuracy or instability compared to alternatives, that would challenge the central claim.

Figures

Figures reproduced from arXiv: 2507.05064 by Fabio Sigrist, Reinhard Furrer, Tim Gyger.

Figure 1
Figure 1. Figure 1: Violin plots of the estimated variance parameter [PITH_FULL_IMAGE:figures/full_fig_p002_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: shows the results when comparing the VIF, FITC, and Vecchia approximations for varying input dimensions d for a 3/2-Matérn kernel. As expected, we find that the Vecchia approximation is very accurate for low-dimensional inputs. However, the accuracy of the Vecchia approximation declines relatively quickly with increasing dimension d, and the FITC approximation is considerably more accurate for large dimens… view at source ↗
Figure 3
Figure 3. Figure 3: RMSE, log-score (LS), and CRPS (mean ± 2 standard errors) for VIF (mv = 30 & m = 200), FITC (m = 200), and Vecchia (mv = 30) approximations for 1/2-Matérn, 3/2-Matérn, 5/2-Matérn, and Gaussian (∞-Matérn) ARD kernels for d = 10. 7.2 Comparison of preconditioners For all subsequent experiments, unless stated otherwise, we generate 100’000 samples from a zero￾mean Gaussian process with five-dimensional inputs… view at source ↗
Figure 4
Figure 4. Figure 4: Differences of iterative-methods-based log-marginal likelihoods compared to Cholesky-based [PITH_FULL_IMAGE:figures/full_fig_p018_4.png] view at source ↗
Figure 5
Figure 5. Figure 5: Accuracy and runtime comparison of simulation- and iterative-methods-based predictive [PITH_FULL_IMAGE:figures/full_fig_p019_5.png] view at source ↗
Figure 6
Figure 6. Figure 6: Time (s) for computing the marginal likelihood with VIF, FITC, and Vecchia approximations [PITH_FULL_IMAGE:figures/full_fig_p020_6.png] view at source ↗
Figure 7
Figure 7. Figure 7: Time (s) for constructing the cover tree and finding the [PITH_FULL_IMAGE:figures/full_fig_p020_7.png] view at source ↗
Figure 8
Figure 8. Figure 8: Log-score (LS) with error bars (mean ± 2 standard errors) for the data sets modeled with a Gaussian (left plot) and non-Gaussian likelihoods (right plot). The current implementation of DKLGP by Cao et al. [2023] available on https://github.com/katzfuss￾group/DKL-GP/ does not support Poisson or Gamma likelihoods. The NA entry indicates that SGPR crashed. The VIF approximation consistently outperforms all ot… view at source ↗
Figure 9
Figure 9. Figure 9: Log-score (LS) with error bars (mean ± 2 standard errors) when estimating the smoothness parameter for the regression data sets (left plot) and when using non-zero prior mean functions (right plot). Next, we extend the GP model (1) by allowing for non-zero prior mean (fixed effects) functions F(·). Specifically, we consider a linear regression function F(x) = x Tβ as well as a function that is modeled usin… view at source ↗
Figure 10
Figure 10. Figure 10: RMSE, log-score (LS), and CRPS (mean ± 2 standard errors) for VIF (mv = 30 & m = 200), FITC (m = 200), and Vecchia (mv = 30) approximations for 1/2-Matérn, 3/2-Matérn, 5/2-Matérn, and Gaussian (∞-Matérn) ARD kernels when d = 2. Figures 2, 3, and 10 [PITH_FULL_IMAGE:figures/full_fig_p036_10.png] view at source ↗
Figure 11
Figure 11. Figure 11: RMSE, log-score (LS), and CRPS (mean ± 2 standard errors) for VIF (mv = 30 & m = 200), FITC (m = 200), and Vecchia (mv = 30) approximations for various dimensions d using an error variance of 0.01 and length scale parameters chosen such that the covariance remains approximately equal (to the one of a Gaussian kernel with length scales λ = (0.35, 0.4, 0.45, 0.5, 0.55)T) at the average distance among two ra… view at source ↗
Figure 12
Figure 12. Figure 12: Log-marginal likelihood differences relative to Cholesky-based computations and runtime [PITH_FULL_IMAGE:figures/full_fig_p037_12.png] view at source ↗
Figure 13
Figure 13. Figure 13: Time (s) for computing predictive distributions with VIF, FITC, and Vecchia approxima [PITH_FULL_IMAGE:figures/full_fig_p039_13.png] view at source ↗
read the original abstract

Gaussian processes are flexible, probabilistic, non-parametric models widely used in machine learning and statistics. However, their scalability to large data sets is limited by computational constraints. To overcome these challenges, we propose Vecchia-inducing-points full-scale (VIF) approximations combining the strengths of global inducing points and local Vecchia approximations. Vecchia approximations excel in settings with low-dimensional inputs and moderately smooth covariance functions, while inducing point methods are better suited to high-dimensional inputs and smoother covariance functions. Our VIF approach bridges these two regimes by using an efficient correlation-based neighbor-finding strategy for the Vecchia approximation of the residual process, implemented via a modified cover tree algorithm. We further extend our framework to non-Gaussian likelihoods by introducing iterative methods that substantially reduce computational costs for training and prediction by several orders of magnitudes compared to Cholesky-based computations when using a Laplace approximation. In particular, we propose and compare novel preconditioners and provide theoretical convergence results. Extensive numerical experiments on simulated and real-world data sets show that VIF approximations are both computationally efficient as well as more accurate and numerically stable than state-of-the-art alternatives. All methods are implemented in the open source C++ library GPBoost with high-level Python and R interfaces.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes Vecchia-inducing-points full-scale (VIF) approximations for Gaussian processes that combine global inducing-point methods with local Vecchia approximations applied to the residual process after the inducing-point correction. The Vecchia component uses a correlation-based neighbor-finding strategy implemented via a modified cover tree. The framework is extended to non-Gaussian likelihoods via iterative methods and novel preconditioners with theoretical convergence results under Laplace approximation. Extensive numerical experiments on simulated and real-world datasets are reported to demonstrate that VIF approximations are computationally efficient as well as more accurate and numerically stable than state-of-the-art alternatives. The methods are implemented in the open-source GPBoost library with Python and R interfaces.

Significance. If the central claims hold, the work provides a practical method that bridges the regimes where Vecchia approximations excel (low-dimensional inputs, moderate smoothness) and where inducing-point methods are preferred (high-dimensional inputs, smoother kernels). The open-source implementation and the theoretical convergence results for the iterative solvers are explicit strengths that enhance reproducibility and usability.

major comments (2)
  1. [Method description of VIF approximation and neighbor-finding strategy] The central claim that VIF bridges low- and high-dimensional regimes rests on the reliability of the correlation-based neighbor-finding strategy for the residual Vecchia component. The manuscript does not report targeted experiments that stress this heuristic when input dimension exceeds the tested range (d>20) or when covariance smoothness increases (e.g., Matérn ν>5/2), where pairwise correlation may cease to be a faithful proxy for conditional dependence after the global correction. This assumption is load-bearing for the accuracy and stability superiority claims.
  2. [Numerical experiments section] The abstract asserts superiority on the basis of extensive numerical experiments, yet the manuscript provides no details on experimental design, data exclusion criteria, or error-bar reporting. Without these, the support for the claim that VIF is more accurate and numerically stable cannot be fully assessed. This directly affects evaluation of the central empirical claim.
minor comments (2)
  1. [Abstract] The phrase 'several orders of magnitudes' in the abstract should read 'several orders of magnitude'.
  2. Notation for the residual process and the correlation threshold parameter should be introduced with explicit definitions and cross-references to the inducing-point correction step to improve readability.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We are grateful to the referee for their insightful comments. We address each major comment below and indicate planned revisions to strengthen the manuscript.

read point-by-point responses
  1. Referee: [Method description of VIF approximation and neighbor-finding strategy] The central claim that VIF bridges low- and high-dimensional regimes rests on the reliability of the correlation-based neighbor-finding strategy for the residual Vecchia component. The manuscript does not report targeted experiments that stress this heuristic when input dimension exceeds the tested range (d>20) or when covariance smoothness increases (e.g., Matérn ν>5/2), where pairwise correlation may cease to be a faithful proxy for conditional dependence after the global correction. This assumption is load-bearing for the accuracy and stability superiority claims.

    Authors: We agree that the reliability of the correlation-based neighbor-finding strategy after the inducing-point correction is central to the bridging claim. While the current experiments span a range of input dimensions and kernel smoothness levels, we acknowledge that the manuscript lacks targeted stress tests specifically for d>20 and Matérn ν>5/2. In the revised manuscript we will add such experiments to directly evaluate the heuristic's performance in these regimes and thereby provide stronger empirical support for the accuracy and stability claims. revision: yes

  2. Referee: [Numerical experiments section] The abstract asserts superiority on the basis of extensive numerical experiments, yet the manuscript provides no details on experimental design, data exclusion criteria, or error-bar reporting. Without these, the support for the claim that VIF is more accurate and numerically stable cannot be fully assessed. This directly affects evaluation of the central empirical claim.

    Authors: We thank the referee for highlighting the need for greater transparency. In the revised manuscript we will expand the numerical experiments section to include a detailed description of the experimental design, any data exclusion criteria applied, and reporting of error bars or standard deviations across repeated runs. This will allow readers to fully assess the empirical support for the accuracy and stability superiority claims. revision: yes

Circularity Check

0 steps flagged

No significant circularity; hybrid approximation is independently constructed.

full rationale

The paper introduces VIF as an algorithmic combination of global inducing points and local Vecchia approximations for the residual process, with a correlation-based neighbor search via modified cover tree. No equations, predictions, or uniqueness claims are shown to reduce by construction to fitted inputs, self-citations, or prior ansatzes from the same authors. The central performance claims rest on numerical experiments rather than any self-referential derivation. This is the common case of a self-contained methodological proposal.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review provides no explicit free parameters, axioms, or invented entities beyond the proposed algorithmic combination itself.

pith-pipeline@v0.9.0 · 5754 in / 1002 out tokens · 27345 ms · 2026-05-25T07:42:15.815479+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

58 extracted references · 58 canonical work pages · 1 internal anchor

  1. [1]

    K-means++ the advantages of careful seeding

    David Arthur and Sergei Vassilvitskii. K-means++ the advantages of careful seeding. In Proceedings of the eighteenth annual ACM-SIAM symposium on Discrete algorithms, pages 1027--1035, 2007

  2. [2]

    Parameter estimation in high dimensional G aussian distributions

    Erlend Aune, Daniel P Simpson, and Jo Eidsvik. Parameter estimation in high dimensional G aussian distributions. Statistics and Computing, 24: 0 247--263, 2014

  3. [3]

    G aussian predictive process models for large spatial data sets

    Sudipto Banerjee, Alan E Gelfand, Andrew O Finley, and Huiyan Sang. G aussian predictive process models for large spatial data sets. Journal of the Royal Statistical Society Series B: Statistical Methodology, 70 0 (4): 0 825--848, 2008

  4. [4]

    An estimator for the diagonal of a matrix

    Costas Bekas, Effrosyni Kokiopoulou, and Yousef Saad. An estimator for the diagonal of a matrix. Applied N umerical M athematics , 57 0 (11-12): 0 1214--1229, 2007

  5. [5]

    Cover trees for nearest neighbor

    Alina Beygelzimer, Sham Kakade, and John Langford. Cover trees for nearest neighbor. In Proceedings of the 23rd international conference on Machine learning, pages 97--104, 2006

  6. [6]

    Variational sparse inverse C holesky approximation for latent G aussian processes via double K ullback- L eibler minimization

    Jian Cao, Myeongjong Kang, Felix Jimenez, Huiyan Sang, Florian Tobias Schaefer, and Matthias Katzfuss. Variational sparse inverse C holesky approximation for latent G aussian processes via double K ullback- L eibler minimization. In International Conference on Machine Learning, pages 3559--3576. PMLR, 2023

  7. [7]

    Statistics for spatial data

    Noel Cressie. Statistics for spatial data. John Wiley & Sons, 1993

  8. [8]

    Improving dual-tree algorithms

    Ryan R Curtin. Improving dual-tree algorithms. PhD thesis, Georgia Institute of Technology, Atlanta, GA, USA, 2016

  9. [9]

    Hierarchical nearest-neighbor G aussian process models for large geostatistical datasets

    Abhirup Datta, Sudipto Banerjee, Andrew O Finley, and Alan E Gelfand. Hierarchical nearest-neighbor G aussian process models for large geostatistical datasets. Journal of the American Statistical Association, 111 0 (514): 0 800--812, 2016

  10. [10]

    Direct methods for sparse linear systems

    Timothy A Davis. Direct methods for sparse linear systems. SIAM, 2006

  11. [11]

    Scalable log determinants for G aussian process kernel learning

    Kun Dong, David Eriksson, Hannes Nickisch, David Bindel, and Andrew G Wilson. Scalable log determinants for G aussian process kernel learning. Advances in Neural Information Processing Systems, 30, 2017

  12. [12]

    The approximation of one matrix by another of lower rank

    Carl Eckart and Gale Young. The approximation of one matrix by another of lower rank. Psychometrika, 1 0 (3): 0 211--218, 1936

  13. [13]

    A new near-linear time algorithm for k-nearest neighbor search using a compressed cover tree

    Yury Elkin and Vitaliy Kurlin. A new near-linear time algorithm for k-nearest neighbor search using a compressed cover tree. In International Conference on Machine Learning, pages 9267--9311. PMLR, 2023

  14. [14]

    Improving the performance of predictive process modeling for large datasets

    Andrew O Finley, Huiyan Sang, Sudipto Banerjee, and Alan E Gelfand. Improving the performance of predictive process modeling for large datasets. Computational S tatistics & D ata A nalysis , 53 0 (8): 0 2873--2884, 2009

  15. [15]

    Practical methods of optimization

    Roger Fletcher. Practical methods of optimization. John Wiley & Sons, 2000

  16. [16]

    Covariance tapering for interpolation of large spatial datasets

    Reinhard Furrer, Marc G Genton, and Douglas Nychka. Covariance tapering for interpolation of large spatial datasets. Journal of Computational and Graphical Statistics, 15 0 (3): 0 502--523, 2006

  17. [17]

    Gpytorch: B lackbox matrix-matrix G aussian process inference with gpu acceleration

    Jacob Gardner, Geoff Pleiss, Kilian Q Weinberger, David Bindel, and Andrew G Wilson. Gpytorch: B lackbox matrix-matrix G aussian process inference with gpu acceleration. Advances in Neural Information Processing Systems, 31, 2018 a

  18. [18]

    Product kernel interpolation for scalable G aussian processes

    Jacob Gardner, Geoff Pleiss, Ruihan Wu, Kilian Weinberger, and Andrew Wilson. Product kernel interpolation for scalable G aussian processes. In International Conference on Artificial Intelligence and Statistics, pages 1407--1416. PMLR, 2018 b

  19. [19]

    G aussian process learning via F isher scoring of V ecchia’s approximation

    Joseph Guinness. G aussian process learning via F isher scoring of V ecchia’s approximation . Statistics and Computing, 31 0 (3): 0 1--8, 2021

  20. [20]

    Iterative methods for full-scale G aussian process approximations for large spatial data

    Tim Gyger, Reinhard Furrer, and Fabio Sigrist. Iterative methods for full-scale G aussian process approximations for large spatial data. arXiv preprint arXiv:2405.14492, 2024

  21. [21]

    On the low-rank approximation by the pivoted C holesky decomposition

    Helmut Harbrecht, Michael Peters, and Reinhold Schneider. On the low-rank approximation by the pivoted C holesky decomposition. Applied N umerical M athematics , 62 0 (4): 0 428--440, 2012

  22. [22]

    A case study competition among methods for analyzing large spatial data

    Matthew J Heaton, Abhirup Datta, Andrew O Finley, Reinhard Furrer, Joseph Guinness, Rajarshi Guhaniyogi, Florian Gerber, Robert B Gramacy, Dorit Hammerling, Matthias Katzfuss, et al. A case study competition among methods for analyzing large spatial data. Journal of Agricultural, Biological and Environmental Statistics, 24: 0 398--425, 2019

  23. [23]

    G aussian processes for big data

    James Hensman, Nicol \`o Fusi, and Neil D Lawrence. G aussian processes for big data. In Proceedings of the Twenty-Ninth Conference on Uncertainty in Artificial Intelligence, pages 282--290, 2013

  24. [24]

    Scalable variational G aussian process classification

    James Hensman, Alexander Matthews, and Zoubin Ghahramani. Scalable variational G aussian process classification. In Artificial intelligence and statistics, pages 351--360. PMLR, 2015

  25. [25]

    Matrix analysis

    Roger A Horn and Charles R Johnson. Matrix analysis. Cambridge university P ress, 2012

  26. [26]

    A stochastic estimator of the trace of the influence matrix for L aplacian smoothing splines

    Michael F Hutchinson. A stochastic estimator of the trace of the influence matrix for L aplacian smoothing splines. Communications in Statistics-Simulation and Computation, 18 0 (3): 0 1059--1076, 1989

  27. [27]

    Correlation-based sparse inverse C holesky factorization for fast G aussian-process inference

    Myeongjong Kang and Matthias Katzfuss. Correlation-based sparse inverse C holesky factorization for fast G aussian-process inference. Statistics and Computing, 33 0 (3): 0 56, 2023

  28. [28]

    A general framework for V ecchia approximations of G aussian processes

    Matthias Katzfuss and Joseph Guinness. A general framework for V ecchia approximations of G aussian processes. Statistical Science, 36 0 (1): 0 124--141, 2021

  29. [29]

    V ecchia approximations of G aussian-process predictions

    Matthias Katzfuss, Joseph Guinness, Wenlong Gong, and Daniel Zilber. V ecchia approximations of G aussian-process predictions. Journal of Agricultural, Biological and Environmental Statistics, 25: 0 383--414, 2020

  30. [30]

    Iterative methods for V ecchia- L aplace approximations for latent G aussian process models

    Pascal K \"u ndig and Fabio Sigrist. Iterative methods for V ecchia- L aplace approximations for latent G aussian process models. Journal of the American Statistical Association, 0 (just-accepted): 0 1--22, 2024

  31. [31]

    An iteration method for the solution of the eigenvalue problem of linear differential and integral operators

    Cornelius Lanczos. An iteration method for the solution of the eigenvalue problem of linear differential and integral operators. 1950

  32. [32]

    Control variates

    Christiane Lemieux. Control variates. Wiley StatsRef: Statistics Reference Online, pages 1--8, 2014

  33. [33]

    Generalized nested dissection

    Richard J Lipton, Donald J Rose, and Robert Endre Tarjan. Generalized nested dissection. SIAM Journal on N umerical A nalysis , 16 0 (2): 0 346--358, 1979

  34. [34]

    When G aussian process meets big data: A review of scalable GP s

    Haitao Liu, Yew-Soon Ong, Xiaobo Shen, and Jianfei Cai. When G aussian process meets big data: A review of scalable GP s. IEEE Transactions on Neural Networks and Learning Systems, 31 0 (11): 0 4405--4423, 2020

  35. [35]

    Approximations for binary G aussian process classification

    Hannes Nickisch and Carl Edward Rasmussen. Approximations for binary G aussian process classification. Journal of Machine Learning Research, 9 0 (Oct): 0 2035--2078, 2008

  36. [36]

    Constant-time predictive distributions for G aussian processes

    Geoff Pleiss, Jacob Gardner, Kilian Weinberger, and Andrew Gordon Wilson. Constant-time predictive distributions for G aussian processes. In International Conference on Machine Learning, pages 4114--4123. PMLR, 2018

  37. [37]

    A unifying view of sparse approximate G aussian process regression

    Joaquin Quinonero-Candela and Carl Edward Rasmussen. A unifying view of sparse approximate G aussian process regression. The Journal of Machine Learning Research, 6: 0 1939--1959, 2005

  38. [38]

    An accuracy-runtime trade-off comparison of scalable Gaussian process approximations for spatial data

    Filippo Rambelli and Fabio Sigrist. An accuracy-runtime trade-off comparison of scalable G aussian process approximations for spatial data. arXiv preprint arXiv:2501.11448, 2025

  39. [39]

    Williams

    Carl Edward Rasmussen and Christopher K.I. Williams. G aussian processes for machine learning . MIT P ress Cambridge, MA, 2006

  40. [40]

    Iterative methods for sparse linear systems

    Yousef Saad. Iterative methods for sparse linear systems. SIAM, 2003

  41. [41]

    A full scale approximation of covariance functions for large spatial data sets

    Huiyan Sang and Jianhua Z Huang. A full scale approximation of covariance functions for large spatial data sets. Journal of the Royal Statistical Society Series B: Statistical Methodology, 74 0 (1): 0 111--132, 2012

  42. [42]

    Covariance approximation for large multivariate spatial data sets with an application to multiple climate model errors

    Huiyan Sang, Mikyoung Jun, and Jianhua Z Huang. Covariance approximation for large multivariate spatial data sets with an application to multiple climate model errors. The Annals of Applied Statistics, pages 2519--2548, 2011

  43. [43]

    Sparse C holesky factorization by K ullback-- L eibler minimization

    Florian Sch\"afer, Matthias Katzfuss, and Houman Owhadi. Sparse C holesky factorization by K ullback-- L eibler minimization. SIAM Journal on Scientific Computing, 43 0 (3): 0 A2019--A2046, 2021 a

  44. [44]

    Compression, inversion, and approximate pca of dense kernel matrices at near-linear computational complexity

    Florian Sch\"afer, Timothy John Sullivan, and Houman Owhadi. Compression, inversion, and approximate pca of dense kernel matrices at near-linear computational complexity. Multiscale Modeling & Simulation, 19 0 (2): 0 688--730, 2021 b

  45. [45]

    Two new lower bounds for the smallest singular value

    Xu Shun. Two new lower bounds for the smallest singular value. arXiv preprint arXiv:2108.01221, 2021

  46. [46]

    G aussian process boosting

    Fabio Sigrist. G aussian process boosting. The Journal of Machine Learning Research, 23 0 (1): 0 10565--10610, 2022 a

  47. [47]

    Latent G aussian model boosting

    Fabio Sigrist. Latent G aussian model boosting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 45 0 (2): 0 1894--1905, 2022 b

  48. [48]

    Integrating random effects in deep neural networks

    Giora Simchoni and Saharon Rosset. Integrating random effects in deep neural networks. Journal of Machine Learning Research, 24 0 (156): 0 1--57, 2023

  49. [49]

    A review of nystr \"o m methods for large-scale machine learning

    Shiliang Sun, Jing Zhao, and Jiang Zhu. A review of nystr \"o m methods for large-scale machine learning. Information Fusion, 26: 0 36--48, 2015

  50. [50]

    Accurate approximations for posterior moments and marginal densities

    Luke Tierney and Joseph B Kadane. Accurate approximations for posterior moments and marginal densities . Journal of the American Statistical Association, 81 0 (393): 0 82--86, 1986

  51. [51]

    Variational learning of inducing variables in sparse G aussian processes

    Michalis Titsias. Variational learning of inducing variables in sparse G aussian processes. In Artificial intelligence and statistics, pages 567--574. PMLR, 2009

  52. [52]

    N umerical linear algebra , volume 181

    Lloyd N Trefethen and David Bau. N umerical linear algebra , volume 181. Siam, 2022

  53. [53]

    Some bounds for the singular values of matrices

    Ramazan Turkmen and Haci Civciv. Some bounds for the singular values of matrices. Applied M athematical Sciences , 1 0 (49): 0 2443--2449, 2007

  54. [54]

    Fast estimation of tr(f(a)) via stochastic lanczos quadrature

    Shashanka Ubaru, Jie Chen, and Yousef Saad. Fast estimation of tr(f(a)) via stochastic lanczos quadrature. SIAM Journal on Matrix Analysis and Applications, 38 0 (4): 0 1075--1099, 2017

  55. [55]

    Estimation and model identification for continuous spatial processes

    Aldo V V ecchia. Estimation and model identification for continuous spatial processes. Journal of the Royal Statistical Society Series B: Statistical Methodology, 50 0 (2): 0 297--312, 1988

  56. [56]

    Kernel interpolation for scalable structured G aussian processes ( KISS - GP )

    Andrew Wilson and Hannes Nickisch. Kernel interpolation for scalable structured G aussian processes ( KISS - GP ). In International conference on machine learning, pages 1775--1784. PMLR, 2015

  57. [57]

    A note on a lower bound for the smallest singular value

    Yu Yi-Sheng and Gu Dun-He. A note on a lower bound for the smallest singular value. Linear algebra and its Applications, 253 0 (1-3): 0 25--38, 1997

  58. [58]

    Smoothed full-scale approximation of G aussian process models for computation of large spatial data sets

    Bohai Zhang, Huiyan Sang, and Jianhua Z Huang. Smoothed full-scale approximation of G aussian process models for computation of large spatial data sets. Statistica Sinica, 29 0 (4): 0 1711--1737, 2019