pith. sign in

arxiv: 2307.00847 · v5 · submitted 2023-07-03 · 🧮 math.NA · cs.NA

An analysis on stochastic Lanczos quadrature with asymmetric quadrature nodes

Pith reviewed 2026-05-24 07:43 UTC · model grok-4.3

classification 🧮 math.NA cs.NA
keywords stochastic Lanczos quadraturetrace estimationerror allocationHutchinson estimatorasymmetric quadraturelog-determinantmatrix functions
0
0 comments X

The pith

Optimizing the split of error tolerance between the Hutchinson estimator and Lanczos quadrature minimizes total matrix-vector multiplications for target trace accuracy.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper analyzes stochastic Lanczos quadrature for trace estimation of matrix functions, paying special attention to asymmetric quadrature nodes. It first clarifies why a scaling factor appears or disappears under affine interval mapping. The core advance is an optimization that assigns the total error budget between the stochastic Hutchinson part and the quadrature part so the combined number of matrix-vector multiplications is as small as possible. Experiments confirm that the resulting allocation is more efficient than equal splitting and consistently favors larger Lanczos step counts over larger sample counts.

Core claim

Rather than splitting the error tolerance evenly between the Hutchinson trace estimator and the Lanczos quadrature, the authors formulate an optimization problem to distribute the error budget so that the total number of matrix-vector multiplications is minimized while meeting a prescribed accuracy for log-determinant estimation; the same procedure works for both Rademacher and Gaussian random vectors.

What carries the argument

Optimization problem that allocates the allowable error between the Hutchinson estimator and the asymmetric Lanczos quadrature to minimize their combined matrix-vector multiplication count.

If this is right

  • The optimized allocation produces tighter overall error bounds than uniform splitting.
  • For both Rademacher and Gaussian queries the minimum occurs at larger Lanczos step counts m and smaller sample counts N.
  • The same optimization supplies an explicit rule for choosing m and N given a target accuracy and matrix size.
  • The approach applies directly to log-determinant estimation and extends to other trace-of-function tasks that use SLQ.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same budget-reallocation idea could be tested on other stochastic estimators that combine Monte Carlo sampling with deterministic quadrature.
  • In large-scale applications the rule of favoring more Lanczos steps may translate into faster wall-clock time once matrix-vector products dominate the cost.
  • Matrices whose spectra are far from symmetric may show even larger gains from the asymmetric-node analysis.

Load-bearing premise

The separate error bounds for the Hutchinson estimator and the Lanczos quadrature remain sufficiently tight and independent when the optimization is performed.

What would settle it

Running the method on standard test matrices with the optimized allocation versus an equal-split baseline and finding that the optimized version requires more, not fewer, matrix-vector multiplications to reach the target accuracy would falsify the practical gain.

Figures

Figures reproduced from arXiv: 2307.00847 by Shengxin Zhu, Wenhao Li, Yixuan Huang.

Figure 1
Figure 1. Figure 1 [PITH_FULL_IMAGE:figures/full_fig_p001_1.png] view at source ↗
Figure 1
Figure 1. Figure 1 [PITH_FULL_IMAGE:figures/full_fig_p002_1.png] view at source ↗
Figure 1
Figure 1. Figure 1 [PITH_FULL_IMAGE:figures/full_fig_p009_1.png] view at source ↗
Figure 5
Figure 5. Figure 5: a [PITH_FULL_IMAGE:figures/full_fig_p016_5.png] view at source ↗
read the original abstract

This paper revisits the error analysis of the Stochastic Lanczos Quadrature (SLQ) method for approximating the trace of matrix functions, with a specific focus on asymmetric Lanczos quadrature rules. We reexplain an existing theoretical discrepancy regarding the necessity of a scaling factor when applying an affine transformation from the reference interval to the physical spectral interval. Furthermore, we introduce an optimized error reallocation technique for log-determinant estimation. Rather than evenly splitting the error tolerance between the Hutchinson trace estimator and the Lanczos quadrature, we formulate an optimization problem to strategically distribute the error budget. This approach minimizes the total number of matrix-vector multiplications (MVMs) required to reach a target accuracy for both Rademacher and Gaussian queries. Numerical experiments validate that this reallocation yields tighter theoretical bounds and provides a concrete rule-of-thumb for parameter configuration: to achieve a target accuracy efficiently, more computational resources should be allocated to the Lanczos process (larger m) rather than Monte Carlo sampling (smaller N).

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript revisits the error analysis of stochastic Lanczos quadrature (SLQ) for approximating traces of matrix functions, with emphasis on asymmetric quadrature nodes. It clarifies a scaling-factor discrepancy arising from affine transformations of the reference interval and introduces an optimization problem that reallocates a total error tolerance ε between the Hutchinson Monte Carlo estimator (parameter N) and the Lanczos quadrature (parameter m) so as to minimize the total number of matrix-vector multiplications. The resulting rule-of-thumb—that more resources should be given to larger m than to larger N—is illustrated for both Rademacher and Gaussian queries on log-determinant estimation and is supported by numerical experiments.

Significance. If the optimization produces allocations that remain valid once the two error sources are coupled through the same matrix-vector products, the work supplies a concrete, practical improvement to the parameter selection of SLQ methods that are already used in large-scale applications. The explicit minimization of MVM count and the accompanying rule-of-thumb constitute a useful engineering contribution; the clarification of the scaling factor removes a potential source of implementation error.

major comments (2)
  1. [optimized error reallocation] The optimization problem (described in the section introducing the optimized error reallocation) treats the a-priori bounds on the Hutchinson estimator and the asymmetric Lanczos quadrature as separable and additive. No analytic argument is supplied showing that the stochastic and deterministic error components remain independent, nor is a diagnostic reported that quantifies the cross term on the test matrices; this assumption is load-bearing for the claimed rule-of-thumb.
  2. [numerical experiments] The numerical experiments are said to validate tighter theoretical bounds, yet the manuscript does not state whether the optimization was solved with the exact analytic bounds or with empirical estimates, nor whether the test matrices and (N,m) pairs were chosen before or after seeing the results; this information is needed to assess whether the reported efficiency gain is reproducible.
minor comments (2)
  1. [error analysis] The re-explanation of the scaling-factor discrepancy would benefit from an explicit equation showing the affine map and the factor that was previously omitted.
  2. [optimization formulation] Notation for the total error tolerance ε and the individual tolerances ε_MC and ε_quad should be introduced once and used consistently throughout the optimization derivation.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments. We address each major point below and indicate planned revisions.

read point-by-point responses
  1. Referee: The optimization problem (described in the section introducing the optimized error reallocation) treats the a-priori bounds on the Hutchinson estimator and the asymmetric Lanczos quadrature as separable and additive. No analytic argument is supplied showing that the stochastic and deterministic error components remain independent, nor is a diagnostic reported that quantifies the cross term on the test matrices; this assumption is load-bearing for the claimed rule-of-thumb.

    Authors: The optimization minimizes total MVM cost subject to the sum of the two a-priori upper bounds being at most ε. This construction relies only on the triangle inequality, which supplies a valid (conservative) bound on the total error without requiring statistical independence of the components. Dependence between the Hutchinson estimator and the Lanczos quadrature error does not invalidate the additive bound. We will add a short paragraph clarifying this point and will include, in the numerical section, a comparison of the observed total error against the sum of the individual bounds on the test matrices. revision: yes

  2. Referee: The numerical experiments are said to validate tighter theoretical bounds, yet the manuscript does not state whether the optimization was solved with the exact analytic bounds or with empirical estimates, nor whether the test matrices and (N,m) pairs were chosen before or after seeing the results; this information is needed to assess whether the reported efficiency gain is reproducible.

    Authors: We will add an explicit statement that the optimization was solved using the exact analytic a-priori bounds presented in the paper, that the test matrices were chosen from standard examples in the trace-estimation literature prior to any experiments, and that the (N,m) pairs were obtained by solving the optimization problem for each target accuracy before the runs were performed. revision: yes

Circularity Check

0 steps flagged

No circularity; optimization uses external bounds without reduction to inputs

full rationale

The paper's central contribution is formulating an optimization problem to allocate error tolerance between Hutchinson trace estimation (parameter N) and asymmetric Lanczos quadrature (parameter m), minimizing total MVMs. This is presented as a new technique building on standard a-priori error bounds for each component separately. No quoted equations or steps show the optimization reducing to a fitted parameter, self-definition, or self-citation chain; the rule-of-thumb (larger m, smaller N) follows from the external bounds rather than being forced by construction. The derivation remains self-contained against the cited external error analyses.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on standard error bounds for Hutchinson's estimator and Lanczos quadrature drawn from prior literature together with conventional assumptions on matrix spectra; no new free parameters, ad-hoc axioms, or invented entities are introduced in the abstract.

axioms (1)
  • domain assumption Standard error bounds for the Hutchinson trace estimator and Lanczos quadrature remain valid when the matrix spectrum is affinely mapped to a reference interval.
    Invoked when the authors discuss the scaling factor discrepancy and when they formulate the error-budget optimization.

pith-pipeline@v0.9.0 · 5702 in / 1406 out tokens · 30856 ms · 2026-05-24T07:43:40.442466+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

36 extracted references · 36 canonical work pages · 1 internal anchor

  1. [1]

    A vron and S

    H. A vron and S. Toledo , Randomized algorithms for estimating the trace of an implicit symmetric positive semi-definite matrix, Journal of the ACM, 58 (2011), pp. 1–34

  2. [2]

    Z. Bai, G. F ahey, and G. Golub , Some large-scale matrix computation problems, Journal of Computational and Applied Mathematics, 74 (1996), pp. 71–89

  3. [3]

    Björck, Numerical methods for least squares problems, SIAM, 1996

    Å. Björck, Numerical methods for least squares problems, SIAM, 1996

  4. [4]

    J. W. Brown and R. V. Churchill , Complex variables and applications, McGraw-Hill„ 2009

  5. [5]

    Z. Chen, S. Zhu, Q. Niu, and T. Zuo , Knowledge discovery and recommendation with linear mixed model, IEEE Access, 8 (2020), pp. 38304–38317

  6. [6]

    Cortinovis and D

    A. Cortinovis and D. Kressner , On randomized trace estimates for indefinite matrices with an application to determinants, Foundations of Computational Mathematics, 22 (2022), pp. 875–903

  7. [7]

    T. A. Da vis and Y. Hu , The University of Florida sparse matrix collection, ACM Transac- tions on Mathematical Software, 38 (2011), pp. 1–25

  8. [8]

    K. Dong, D. Eriksson, H. Nickisch, D. Bindel, and A. G. Wilson , Scalable log determi- nants for Gaussian process kernel learning, in Advances in Neural Information Processing Systems, 2017, pp. 6327–6337

  9. [9]

    E. N. Epperly, J. A. Tropp, and R. J. Webber , Xtrace: Making the most of every sample in stochastic trace estimation, arXiv preprint arXiv:2301.07825, (2023)

  10. [10]

    G. B. Folland , Real analysis: modern techniques and their applications, vol. 40, John Wiley & Sons, 1999

  11. [11]

    Girard , A fast ‘Monte-Carlo cross-validation’ procedure for large least squares problems with noisy data, Numerische Mathematik, 56 (1989), pp

    A. Girard , A fast ‘Monte-Carlo cross-validation’ procedure for large least squares problems with noisy data, Numerische Mathematik, 56 (1989), pp. 1–23

  12. [12]

    G. H. Golub and G. Meurant , Matrices, moments and quadrature with applications, vol. 30, Princeton University Press, 2009

  13. [13]

    G. H. Golub and J. H. Welsch , Calculation of Gauss quadrature rules, Mathematics of Computation, 23 (1969), pp. 221–230

  14. [14]

    L. A. Hageman and D. M. Young , Applied iterative methods, Academic Press, 1981

  15. [15]

    I. Han, D. Malioutov, and J. Shin , Large-scale log-determinant computation through sto- chastic Chebyshev expansions, in International Conference on Machine Learning, 2015, pp. 908–917

  16. [16]

    Z. Han, Y. Huang, and S. Zhu , Error bound of Lanczos method to approximate quadrature form, (2023)

  17. [17]

    Z. Han, W. Li, Y. Huang, and S. Zhu ,Suboptimal subspace construction for log-determinant approximation, arXiv preprint arXiv:2307.02152, (2023)

  18. [18]

    Hildebrandt, Definitions of Stieltjes integrals of the Riemann type, The American Math- ematical Monthly, 45 (1938), pp

    T. Hildebrandt, Definitions of Stieltjes integrals of the Riemann type, The American Math- ematical Monthly, 45 (1938), pp. 265–278

  19. [19]

    M. F. Hutchinson , A stochastic estimator of the trace of the influence matrix for Laplacian smoothing splines, Comm. Statist. Simulation Comput., 19 (1990), pp. 433–450

  20. [20]

    Kullback and R

    S. Kullback and R. A. Leibler , On information and sufficiency, The Annals of Mathemat- ical Statistics, 22 (1951), pp. 79–86

  21. [21]

    Lanczos, Linear differential operators, SIAM, 1996

    C. Lanczos, Linear differential operators, SIAM, 1996

  22. [22]

    Li and Y

    H. Li and Y. Zhu , Randomized block Krylov space methods for trace and log-determinant estimators, BIT Numerical Mathematics, 61 (2021), pp. 911–939

  23. [23]

    Li , Comparison of four algorithms to calculate log determinant, bachelor’s thesis, Xi’an Jiaotong-Liverpool University, 2020

    W. Li , Comparison of four algorithms to calculate log determinant, bachelor’s thesis, Xi’an Jiaotong-Liverpool University, 2020

  24. [24]

    D. J. MacKay , Bayesian interpolation, Neural Computation, 4 (1992), pp. 415–447

  25. [25]

    R. A. Meyer, C. Musco, C. Musco, and D. P. Woodruff , Hutch++: Optimal stochastic trace estimation, in Symposium on Simplicity in Algorithms (SOSA), SIAM, 2021, pp. 142– 155

  26. [26]

    Persson, A

    D. Persson, A. Cortinovis, and D. Kressner , Improved variants of the Hutch++ algo- 17 rithm for trace estimation, SIAM Journal on Matrix Analysis and Applications, 43 (2022), pp. 1162–1185

  27. [27]

    Quiñonero Candela and C

    J. Quiñonero Candela and C. E. Rasmussen , A unifying view of sparse approximate Gaussian process regression, Journal of Machine Learning Research, 6 (2005), pp. 1939– 1959

  28. [28]

    Quiñonero Candela, C

    J. Quiñonero Candela, C. E. Rasmussen, and C. K. Williams , Approximation methods for Gaussian process regression, in Large-scale Kernel Machines, MIT Press, 2007, pp. 203– 223

  29. [29]

    C. E. Rasmussen and C. K. I. Williams , Gaussian processes for machine learning, Adaptive Computation and Machine Learning, MIT Press, Cambridge, MA, 2006

  30. [30]

    Roosta-Khorasani and U

    F. Roosta-Khorasani and U. Ascher , Improved bounds on sample size for implicit matrix trace estimators, Foundations of Computational Mathematics, 15 (2015), pp. 1187–1212

  31. [31]

    Saad, Iterative methods for sparse linear systems, SIAM, 2003

    Y. Saad, Iterative methods for sparse linear systems, SIAM, 2003

  32. [32]

    A. K. Saibaba, A. Alexanderian, and I. C. F. Ipsen , Randomized matrix-free trace and log-determinant estimators, Numerische Mathematik, 137 (2017), pp. 353–395

  33. [33]

    Ubaru, J

    S. Ubaru, J. Chen, and Y. Saad , Fast estimation oftr(f (A)) via stochastic Lanczos quad- rature, SIAM Journal on Matrix Analysis and Applications, 38 (2017), pp. 1075–1099

  34. [34]

    A. J. W athen and S. Zhu , On spectral distribution of kernel matrices related to radial basis functions, Numerical Algorithms, 70 (2015), pp. 709–726

  35. [35]

    Essential formulae for restricted maximum likelihood and its derivatives associated with the linear mixed models

    S. Zhu and A. J. W athen , Essential formulae for restricted maximum likelihood and its de- rivatives associated with the linear mixed models, arXiv preprint arXiv:1805.05188, (2018)

  36. [36]

    Zhu and A

    S. Zhu and A. J. W athen , Sparse inversion for derivative of log determinant, arXiv preprint arXiv:1911.00685, (2019). Appendix A. Jacobi Iteration Matrices have Symmetric Eigenvalue Distribution. As first pointed out by [21, Section 3.7], matrices of form (3.4) have symmetric eigenvalue distribution, while Theorem 1.2.2 in [3] summarized the discussion....