pith. machine review for the scientific record. sign in

arxiv: 2604.16075 · v1 · submitted 2026-04-17 · 🧮 math.NA · cs.DS· cs.LG· cs.NA· math.OC

Towards Universal Convergence of Backward Error in Linear System Solvers

Pith reviewed 2026-05-10 08:06 UTC · model grok-4.3

classification 🧮 math.NA cs.DScs.LGcs.NAmath.OC
keywords backward errorRichardson iterationpositive semidefinite matricesiterative solverslinear systemsKrylov methodsconvergence ratesnumerical linear algebra
0
0 comments X

The pith

Richardson iteration achieves at most 1/k relative backward error after k steps on any positive semidefinite linear system, independent of condition number.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proves that the classical Richardson iteration for solving Ax = b produces iterates whose relative backward error is bounded by 1/k after exactly k steps whenever A is positive semidefinite. Backward error measures the smallest relative perturbation to A and b that would make the current iterate an exact solution, which is often the quantity that matters for stability and practical accuracy. Because the bound holds without any dependence on the eigenvalues or conditioning of A, the method yields an O(n^{2}/epsilon) algorithm that reaches any target backward error epsilon. The authors also introduce MINBERR, which minimizes backward error directly inside a Krylov subspace and obtains a faster 1/k^{2} rate, and they extend the approach to general matrices via normal equations with empirically observed 1/k convergence.

Core claim

For any positive semidefinite matrix A and vector b, the k-th Richardson iterate x_k satisfies that the relative backward error is at most 1/k: there exist perturbations Delta A and Delta b with ||Delta A||/||A|| + ||Delta b||/||b|| <= 1/k such that (A + Delta A)x_k = b + Delta b. This bound is derived directly from the residual contraction properties of the iteration on PSD matrices and does not involve the condition number. The same framework yields an O(1/k^{2}) rate when the iterate is chosen to minimize backward error over the Krylov subspace, which is realized by the MINBERR algorithm.

What carries the argument

The residual contraction of Richardson iteration on PSD matrices that directly controls the minimal perturbation size making the current approximate solution exact.

Load-bearing premise

The matrix must be positive semidefinite for the unconditional 1/k backward error bound to hold.

What would settle it

Compute the relative backward error of the k-th Richardson iterate on a positive semidefinite matrix whose condition number exceeds k; if that error ever exceeds 1/k for any such matrix, the claim is false.

Figures

Figures reproduced from arXiv: 2604.16075 by Elizaveta Rebrova, Micha{\l} Derezi\'nski, Yuji Nakatsukasa.

Figure 1
Figure 1. Figure 1: Empirical evaluation of MINBERR (a,b,c) and MINBERR-NE (d,e,f) on [PITH_FULL_IMAGE:figures/full_fig_p019_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: Comparing MINRES with MINBERR on the Small-Outlier(2000, κ, σn−1) synthetic task, while varying κ and σn−1. MINBERR is denoted by solid lines, while MINRES is shown using dotted lines. suggesting that the bound in Theorem 3.3 is often tight for typical ill-conditioned problems. On the other hand, MINBERR often outperforms its theoretical 1/k2 rate, especially later in the convergence. The behavior of CG an… view at source ↗
Figure 3
Figure 3. Figure 3: Comparing MINBERR-NE against Richardson-NE, LSMR, and LSQR on the [PITH_FULL_IMAGE:figures/full_fig_p021_3.png] view at source ↗
Figure 4
Figure 4. Figure 4: Evaluating MINBERR-NE on the Small-Outlier(2000, κ, σn−1) synthetic task treated as a general non-symmetric linear system. We use dotted lines to denote MINBERR-NE running on the original task, while solid lines denote MINBERR-NE running on the matrix perturbed with Gaussian noise (in both cases, backward error is computed with respect to the original task). with a scaled identity (see Section 3), but in t… view at source ↗
read the original abstract

The quest for an algorithm that solves an $n\times n$ linear system in $O(n^2)$ time complexity, or $O(n^2 \text{poly}(1/\epsilon))$ when solving up to $\epsilon$ relative error, is a long-standing open problem in numerical linear algebra and theoretical computer science. There are two predominant paradigms for measuring relative error: forward error (i.e., distance from the output to the optimum solution) and backward error (i.e., distance to the nearest problem solved by the output). In most prior studies, convergence of iterative linear system solvers is measured via various notions of forward error, and as a result, depends heavily on the conditioning of the input. Yet, the numerical analysis literature has long advocated for backward error as the more practically relevant notion of approximation. In this work, we show that -- surprisingly -- the classical and simple Richardson iteration incurs at most $1/k$ (relative) backward error after $k$ iterations on any positive semidefinite (PSD) linear system, irrespective of its condition number. This universal convergence rate implies an $O(n^2/\epsilon)$ complexity algorithm for solving a PSD linear system to $\epsilon$ backward error, and we establish similar or better complexity when using a variety of Krylov solvers beyond Richardson. Then, by directly minimizing backward error over a Krylov subspace, we attain an even faster $O(1/k^2)$ universal rate, and we turn this into an efficient algorithm, MINBERR, with complexity $O(n^2/\sqrt\epsilon)$. We extend this approach via normal equations to solving general linear systems, for which we empirically observe $O(1/k)$ convergence. We report strong numerical performance of our algorithms on benchmark problems.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 3 minor

Summary. The paper claims that the classical Richardson iteration achieves at most 1/k relative backward error after k iterations on any positive semidefinite linear system, independent of condition number. This yields an O(n²/ε) algorithm for ε-backward-error solutions. Similar or better rates are established for other Krylov solvers; a new MINBERR algorithm that directly minimizes backward error over the Krylov subspace attains an O(1/k²) universal rate and O(n²/√ε) complexity. For general (non-PSD) systems the normal-equations approach is shown empirically to give O(1/k) convergence, with strong numerical performance reported on benchmarks.

Significance. If the central derivation holds, the result is significant: it supplies the first explicit conditioning-independent convergence guarantees in the backward-error metric, which the numerical-analysis literature has long preferred over forward error. The argument rests on the spectral radius of the Richardson operator being at most 1 for PSD matrices and on the standard normwise backward-error formula, yielding a clean, parameter-free 1/k bound. The construction of MINBERR and the empirical extension to general matrices via normal equations add practical value and suggest new solver design principles focused on backward-error minimization.

minor comments (3)
  1. The complexity statements (O(n²/ε) and O(n²/√ε)) should explicitly state the underlying arithmetic model and whether they count matrix-vector products or full matrix operations.
  2. The precise definition of relative backward error η = ||b − A x|| / (||A|| ||x|| + ||b||) is used in the abstract and introduction; a short dedicated paragraph or equation early in the paper would aid readers.
  3. The section reporting numerical experiments would benefit from a brief statement of the benchmark matrices, stopping criteria, and any data-exclusion rules applied.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We are grateful to the referee for their supportive review and recommendation for minor revision. The referee's summary correctly captures the key contributions of our work on universal convergence rates for backward error in linear system solvers.

read point-by-point responses
  1. Referee: The paper claims that the classical Richardson iteration achieves at most 1/k relative backward error after k iterations on any positive semidefinite linear system, independent of condition number. This yields an O(n²/ε) algorithm for ε-backward-error solutions. Similar or better rates are established for other Krylov solvers; a new MINBERR algorithm that directly minimizes backward error over the Krylov subspace attains an O(1/k²) universal rate and O(n²/√ε) complexity. For general (non-PSD) systems the normal-equations approach is shown empirically to give O(1/k) convergence, with strong numerical performance reported on benchmarks.

    Authors: We thank the referee for this accurate encapsulation of our results. The claims are as stated in the manuscript, and we stand by the derivations and empirical findings presented. No changes are required. revision: no

Circularity Check

0 steps flagged

No significant circularity identified

full rationale

The claimed 1/k backward-error bound for Richardson iteration on PSD systems is derived directly from the spectral radius of the iteration matrix (at most 1 when the step size is 1/||A||_2) and the standard normwise backward-error formula. For eigenvalues in [0, lambda_max], the residual component stays O(1) while the iterate component grows linearly in k, producing eta <= 1/k independent of condition number. The PSD assumption is explicitly required and stated; the derivation does not reduce to any fitted parameter, self-definition, or self-citation chain. The extension to general matrices is presented only as empirical observation, not a proven claim. No load-bearing ansatz, uniqueness theorem, or renaming of known results appears in the provided text. The result is self-contained against the mathematical properties of PSD matrices and Krylov subspaces.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claims rest on standard properties of positive semidefinite matrices and Krylov subspace methods; no free parameters or invented entities are introduced in the abstract.

axioms (1)
  • domain assumption The input matrix is positive semidefinite
    Required for the unconditional 1/k backward error bound on Richardson iteration.

pith-pipeline@v0.9.0 · 5647 in / 1240 out tokens · 68099 ms · 2026-05-10T08:06:33.910136+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Well-Conditioned Oblivious Perturbations in Linear Space

    cs.DS 2026-04 unverdicted novelty 8.0

    An O(n)-randomness perturbation combining a dense deterministic pattern matrix with a non-uniform sparse dependent perturbation reduces condition numbers to O(n) for any input matrix.

Reference graph

Works this paper leans on

51 extracted references · 4 canonical work pages · cited by 1 Pith paper

  1. [1]

    Allen-Zhu and E

    Z. Allen-Zhu and E. Hazan , Optimal black-box reductions between optimization objectives , Advances in Neural Information Processing Systems, 29 (2016)

  2. [2]

    Alman, R

    J. Alman, R. Duan, V. V. Williams, Y. Xu, Z. Xu, and R. Zhou , More asymmetry yields faster matrix multiplication , in Proceedings of the 2025 Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), SIAM, 2025, pp. 2005--2039

  3. [3]

    Arioli, I

    M. Arioli, I. Duff, and D. Ruiz , Stopping criteria for iterative solvers , SIAM J. Matrix Anal. Appl., 13 (1992), pp. 138--144

  4. [4]

    Axelsson and I

    O. Axelsson and I. Kaporin , On the sublinear and superlinear rate of convergence of conjugate gradient methods , Numerical Algorithms, 25 (2000), pp. 1--22

  5. [5]

    Axelsson and G

    O. Axelsson and G. Lindskog , On the rate of convergence of the preconditioned conjugate gradient method , Numerische Mathematik, 48 (1986), pp. 499--523

  6. [6]

    D. G. Barrett and B. Dherin , Implicit gradient regularization , arXiv preprint arXiv:2009.11162, (2020)

  7. [7]

    Beerens and D

    L. Beerens and D. J. Higham , Adversarial ink: Componentwise backward error attacks on deep learning , IMA Journal of Applied Mathematics, 89 (2024), pp. 175--196

  8. [8]

    D. P. Bertsekas , Nonlinear Programming , Athena Scientific, 3rd ed., 2016, http://www.athenasc.com/nonlinbook.html

  9. [9]

    B \"u rgisser and F

    P. B \"u rgisser and F. Cucker , Smoothed analysis of moore--penrose inversion , SIAM J. Matrix Anal. Appl., 31 (2010), pp. 2769--2783

  10. [10]

    A. W. Chou , On the optimality of krylov information , Journal of Complexity, 3 (1987), pp. 26--40

  11. [11]

    T. A. Davis and Y. Hu , The U niversity of F lorida sparse matrix collection , ACM Trans. Math. Soft., 38 (2011), pp. 1--25

  12. [12]

    Derezi \'n ski, E

    M. Derezi \'n ski, E. N. Epperly, and R. A. Meyer , The matrix-vector complexity of Ax=b , arXiv preprint arXiv:2602.04842, (2026)

  13. [13]

    Di Giovacchino, D

    S. Di Giovacchino, D. J. Higham, and K. Zygalakis , Backward error analysis and the qualitative behaviour of stochastic optimization algorithms: Application to stochastic coordinate descent , arXiv preprint arXiv:2309.02082, (2023)

  14. [14]

    d’Aspremont, D

    A. d’Aspremont, D. Scieur, and A. Taylor , Acceleration methods , Foundations and Trends in Optimization, 5 (2021), pp. 1--245

  15. [15]

    E. N. Epperly, M. Meier, and Y. Nakatsukasa , Fast randomized least-squares solvers can be just as accurate and stable as classical direct solvers , Communications on Pure and Applied Mathematics, 79 (2026), pp. 293--339

  16. [16]

    Y. Feng, T. Gao, L. Li, J.-G. Liu, and Y. Lu , Uniform-in-time weak error analysis for stochastic gradient descent algorithms via diffusion approximation , arXiv preprint arXiv:1902.00635, (2019)

  17. [17]

    K. V. Fernando and B. N. Parlett , Accurate singular values and differential qd-algorithms , Numer. Math., 67 ( 1994 ), pp. 191--229

  18. [18]

    D. C.-L. Fong and M. Saunders , Lsmr: An iterative algorithm for sparse least-squares problems , SIAM Journal on Scientific Computing, 33 (2011), pp. 2950--2971

  19. [19]

    G. H. Golub and C. F. Van Loan , Matrix Computations , The Johns Hopkins University Press , 1996

  20. [20]

    G. H. Golub and C. F. Van Loan , Matrix computations , JHU press, 2013

  21. [21]

    Greenbaum , Behavior of slightly perturbed lanczos and conjugate-gradient recurrences , Linear Algebra and its Applications, 113 (1989), pp

    A. Greenbaum , Behavior of slightly perturbed lanczos and conjugate-gradient recurrences , Linear Algebra and its Applications, 113 (1989), pp. 7--63

  22. [22]

    Greenbaum , Iterative Methods for Solving Linear Systems , SIAM, Philadelphia, PA, USA, 1997

    A. Greenbaum , Iterative Methods for Solving Linear Systems , SIAM, Philadelphia, PA, USA, 1997

  23. [23]

    Greenbaum, V

    A. Greenbaum, V. Pt \'a k, and Z. e. k. Strako s , Any nonincreasing convergence curve is possible for GMRES , SIAM J. Matrix Anal. Appl., 17 (1996), pp. 465--469

  24. [24]

    M. R. Hestenes, E. Stiefel, et al. , Methods of conjugate gradients for solving linear systems , Journal of research of the National Bureau of Standards, 49 (1952), pp. 409--436

  25. [25]

    N. J. Higham , Accuracy and Stability of Numerical Algorithms , SIAM, Philadelphia, PA, USA, second ed., 2002

  26. [26]

    N. J. Higham and T. Mary , Mixed precision algorithms in numerical linear algebra , Acta Numerica, 31 (2022), pp. 347--414

  27. [27]

    E. M. Kasenally , GMBACK : a generalised minimum backward error algorithm for nonsymmetric linear systems , SIAM J. Sci. Comput., 16 (1995), pp. 698--719

  28. [28]

    E. M. Kasenally and V. Simoncini , Analysis of a minimum perturbation algorithm for nonsymmetric linear systems , SIAM J. Numer. Anal., 34 (1997), pp. 48--66

  29. [29]

    Kuczy \'n ski and H

    J. Kuczy \'n ski and H. Wo \'z niakowski , Estimating the largest eigenvalue by the power and lanczos algorithms with a random start , SIAM J. Matrix Anal. Appl., 13 (1992), pp. 1094--1122

  30. [30]

    Laurent and P

    B. Laurent and P. Massart , Adaptive estimation of a quadratic functional by model selection , Ann. Stat., (2000), pp. 1302--1338

  31. [31]

    Lee and S

    C.-p. Lee and S. Wright , First-order algorithms converge faster than o (1/k) on convex problems , in International Conference on Machine Learning, PMLR, 2019, pp. 3754--3762

  32. [32]

    Levin and E

    A. Levin and E. Saff , Degree of approximation of real functions by reciprocals of real and complex polynomials , SIAM journal on mathematical analysis, 19 (1988), pp. 233--245

  33. [33]

    Musco, C

    C. Musco, C. Musco, and A. Sidford , Stability of the lanczos method for matrix function approximation , in Proceedings of the Twenty-Ninth Annual ACM-SIAM Symposium on Discrete Algorithms, SIAM, 2018, pp. 1605--1624

  34. [34]

    A. S. Nemirovsky and D. B. Yudin , Problem Complexity and Method Efficiency in Optimization , Wiley-Interscience, 1983, https://www2.isye.gatech.edu/ nemirovs/Nemirovskii_Yudin_1983.pdf

  35. [35]

    Nesterov , A method for solving the convex programming problem with convergence rate o (1/k2) , in Dokl akad nauk Sssr, vol

    Y. Nesterov , A method for solving the convex programming problem with convergence rate o (1/k2) , in Dokl akad nauk Sssr, vol. 269, 1983, p. 543

  36. [36]

    C. C. Paige , Error analysis of the lanczos algorithm for tridiagonalizing a symmetric matrix , IMA Journal of Applied Mathematics, 18 (1976), pp. 341--349

  37. [37]

    C. C. Paige and M. A. Saunders , LSQR : An algorithm for sparse linear equations and sparse least squares , ACM Trans. Math. Soft., 8 (1982), pp. 43--71

  38. [38]

    C. C. Paige and Z. Strakos , Residual and backward error bounds in minimum residual krylov subspace methods , SIAM J. Sci. Comput., 23 (2002), pp. 1898--1923

  39. [39]

    Peters and J

    G. Peters and J. H. Wilkinson , Inverse iteration, ill-conditioned equations and N ewton's method , SIAM Rev., 21 (1979), pp. 339--360

  40. [40]

    L. F. Richardson , Ix. the approximate arithmetical solution by finite differences of physical problems involving differential equations, with an application to the stresses in a masonry dam , Philosophical Transactions of the Royal Society of London. Series A, containing papers of a mathematical or physical character, 210 (1911), pp. 307--357

  41. [41]

    Saad , Iterative methods for sparse linear systems , SIAM, 2003

    Y. Saad , Iterative methods for sparse linear systems , SIAM, 2003

  42. [42]

    Sankar, D

    A. Sankar, D. A. Spielman, and S.-H. Teng , Smoothed analysis of the condition numbers and growth factors of matrices , SIAM J. Matrix Anal. Appl., 28 (2006), pp. 446--476

  43. [43]

    D. A. Spielman and S.-H. Teng , Smoothed analysis of algorithms: Why the simplex algorithm usually takes polynomial time , Journal of the ACM (JACM), 51 (2004), pp. 385--463

  44. [44]

    L. N. Trefethen and D. Bau , Numerical Linear Algebra , SIAM, Philadelphia, 1997

  45. [45]

    Virtanen, R

    P. Virtanen, R. Gommers, T. E. Oliphant, M. Haberland, T. Reddy, D. Cournapeau, E. Burovski, P. Peterson, W. Weckesser, J. Bright, et al. , Scipy 1.0: fundamental algorithms for scientific computing in python , Nature methods, 17 (2020), pp. 261--272

  46. [46]

    von Neumann and H

    J. von Neumann and H. Goldstine , Numerical inverting of matrices of high order , Bulletin of the American Mathematical Society, 53 (1947), pp. 1021--1099

  47. [47]

    J. H. Wilkinson , Error analysis of floating-point computation , Numerische Mathematik, 2 (1960), pp. 319--340

  48. [48]

    J. H. Wilkinson , Error analysis of direct methods of matrix inversion , Journal of the ACM (JACM), 8 (1961), pp. 281--330

  49. [49]

    J. H. Wilkinson , Rounding errors in algebraic processes , Her Majesty's Stationery Office, 1963

  50. [50]

    J. H. Wilkinson , The Algebraic Eigenvalue Problem , Oxford University Press, 1965

  51. [51]

    write newline

    " write newline "" before.all 'output.state := FUNCTION fin.entry add.period write newline FUNCTION new.block output.state before.all = 'skip after.block 'output.state := if FUNCTION not #0 #1 if FUNCTION and 'skip pop #0 if FUNCTION or pop #1 'skip if FUNCTION new.block.checka empty 'skip 'new.block if FUNCTION field.or.null duplicate empty pop "" 'skip ...