Towards Universal Convergence of Backward Error in Linear System Solvers
Pith reviewed 2026-05-10 08:06 UTC · model grok-4.3
The pith
Richardson iteration achieves at most 1/k relative backward error after k steps on any positive semidefinite linear system, independent of condition number.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
For any positive semidefinite matrix A and vector b, the k-th Richardson iterate x_k satisfies that the relative backward error is at most 1/k: there exist perturbations Delta A and Delta b with ||Delta A||/||A|| + ||Delta b||/||b|| <= 1/k such that (A + Delta A)x_k = b + Delta b. This bound is derived directly from the residual contraction properties of the iteration on PSD matrices and does not involve the condition number. The same framework yields an O(1/k^{2}) rate when the iterate is chosen to minimize backward error over the Krylov subspace, which is realized by the MINBERR algorithm.
What carries the argument
The residual contraction of Richardson iteration on PSD matrices that directly controls the minimal perturbation size making the current approximate solution exact.
Load-bearing premise
The matrix must be positive semidefinite for the unconditional 1/k backward error bound to hold.
What would settle it
Compute the relative backward error of the k-th Richardson iterate on a positive semidefinite matrix whose condition number exceeds k; if that error ever exceeds 1/k for any such matrix, the claim is false.
Figures
read the original abstract
The quest for an algorithm that solves an $n\times n$ linear system in $O(n^2)$ time complexity, or $O(n^2 \text{poly}(1/\epsilon))$ when solving up to $\epsilon$ relative error, is a long-standing open problem in numerical linear algebra and theoretical computer science. There are two predominant paradigms for measuring relative error: forward error (i.e., distance from the output to the optimum solution) and backward error (i.e., distance to the nearest problem solved by the output). In most prior studies, convergence of iterative linear system solvers is measured via various notions of forward error, and as a result, depends heavily on the conditioning of the input. Yet, the numerical analysis literature has long advocated for backward error as the more practically relevant notion of approximation. In this work, we show that -- surprisingly -- the classical and simple Richardson iteration incurs at most $1/k$ (relative) backward error after $k$ iterations on any positive semidefinite (PSD) linear system, irrespective of its condition number. This universal convergence rate implies an $O(n^2/\epsilon)$ complexity algorithm for solving a PSD linear system to $\epsilon$ backward error, and we establish similar or better complexity when using a variety of Krylov solvers beyond Richardson. Then, by directly minimizing backward error over a Krylov subspace, we attain an even faster $O(1/k^2)$ universal rate, and we turn this into an efficient algorithm, MINBERR, with complexity $O(n^2/\sqrt\epsilon)$. We extend this approach via normal equations to solving general linear systems, for which we empirically observe $O(1/k)$ convergence. We report strong numerical performance of our algorithms on benchmark problems.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims that the classical Richardson iteration achieves at most 1/k relative backward error after k iterations on any positive semidefinite linear system, independent of condition number. This yields an O(n²/ε) algorithm for ε-backward-error solutions. Similar or better rates are established for other Krylov solvers; a new MINBERR algorithm that directly minimizes backward error over the Krylov subspace attains an O(1/k²) universal rate and O(n²/√ε) complexity. For general (non-PSD) systems the normal-equations approach is shown empirically to give O(1/k) convergence, with strong numerical performance reported on benchmarks.
Significance. If the central derivation holds, the result is significant: it supplies the first explicit conditioning-independent convergence guarantees in the backward-error metric, which the numerical-analysis literature has long preferred over forward error. The argument rests on the spectral radius of the Richardson operator being at most 1 for PSD matrices and on the standard normwise backward-error formula, yielding a clean, parameter-free 1/k bound. The construction of MINBERR and the empirical extension to general matrices via normal equations add practical value and suggest new solver design principles focused on backward-error minimization.
minor comments (3)
- The complexity statements (O(n²/ε) and O(n²/√ε)) should explicitly state the underlying arithmetic model and whether they count matrix-vector products or full matrix operations.
- The precise definition of relative backward error η = ||b − A x|| / (||A|| ||x|| + ||b||) is used in the abstract and introduction; a short dedicated paragraph or equation early in the paper would aid readers.
- The section reporting numerical experiments would benefit from a brief statement of the benchmark matrices, stopping criteria, and any data-exclusion rules applied.
Simulated Author's Rebuttal
We are grateful to the referee for their supportive review and recommendation for minor revision. The referee's summary correctly captures the key contributions of our work on universal convergence rates for backward error in linear system solvers.
read point-by-point responses
-
Referee: The paper claims that the classical Richardson iteration achieves at most 1/k relative backward error after k iterations on any positive semidefinite linear system, independent of condition number. This yields an O(n²/ε) algorithm for ε-backward-error solutions. Similar or better rates are established for other Krylov solvers; a new MINBERR algorithm that directly minimizes backward error over the Krylov subspace attains an O(1/k²) universal rate and O(n²/√ε) complexity. For general (non-PSD) systems the normal-equations approach is shown empirically to give O(1/k) convergence, with strong numerical performance reported on benchmarks.
Authors: We thank the referee for this accurate encapsulation of our results. The claims are as stated in the manuscript, and we stand by the derivations and empirical findings presented. No changes are required. revision: no
Circularity Check
No significant circularity identified
full rationale
The claimed 1/k backward-error bound for Richardson iteration on PSD systems is derived directly from the spectral radius of the iteration matrix (at most 1 when the step size is 1/||A||_2) and the standard normwise backward-error formula. For eigenvalues in [0, lambda_max], the residual component stays O(1) while the iterate component grows linearly in k, producing eta <= 1/k independent of condition number. The PSD assumption is explicitly required and stated; the derivation does not reduce to any fitted parameter, self-definition, or self-citation chain. The extension to general matrices is presented only as empirical observation, not a proven claim. No load-bearing ansatz, uniqueness theorem, or renaming of known results appears in the provided text. The result is self-contained against the mathematical properties of PSD matrices and Krylov subspaces.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption The input matrix is positive semidefinite
Forward citations
Cited by 1 Pith paper
-
Well-Conditioned Oblivious Perturbations in Linear Space
An O(n)-randomness perturbation combining a dense deterministic pattern matrix with a non-uniform sparse dependent perturbation reduces condition numbers to O(n) for any input matrix.
Reference graph
Works this paper leans on
-
[1]
Allen-Zhu and E
Z. Allen-Zhu and E. Hazan , Optimal black-box reductions between optimization objectives , Advances in Neural Information Processing Systems, 29 (2016)
2016
-
[2]
Alman, R
J. Alman, R. Duan, V. V. Williams, Y. Xu, Z. Xu, and R. Zhou , More asymmetry yields faster matrix multiplication , in Proceedings of the 2025 Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), SIAM, 2025, pp. 2005--2039
2025
-
[3]
Arioli, I
M. Arioli, I. Duff, and D. Ruiz , Stopping criteria for iterative solvers , SIAM J. Matrix Anal. Appl., 13 (1992), pp. 138--144
1992
-
[4]
Axelsson and I
O. Axelsson and I. Kaporin , On the sublinear and superlinear rate of convergence of conjugate gradient methods , Numerical Algorithms, 25 (2000), pp. 1--22
2000
-
[5]
Axelsson and G
O. Axelsson and G. Lindskog , On the rate of convergence of the preconditioned conjugate gradient method , Numerische Mathematik, 48 (1986), pp. 499--523
1986
- [6]
-
[7]
Beerens and D
L. Beerens and D. J. Higham , Adversarial ink: Componentwise backward error attacks on deep learning , IMA Journal of Applied Mathematics, 89 (2024), pp. 175--196
2024
-
[8]
D. P. Bertsekas , Nonlinear Programming , Athena Scientific, 3rd ed., 2016, http://www.athenasc.com/nonlinbook.html
2016
-
[9]
B \"u rgisser and F
P. B \"u rgisser and F. Cucker , Smoothed analysis of moore--penrose inversion , SIAM J. Matrix Anal. Appl., 31 (2010), pp. 2769--2783
2010
-
[10]
A. W. Chou , On the optimality of krylov information , Journal of Complexity, 3 (1987), pp. 26--40
1987
-
[11]
T. A. Davis and Y. Hu , The U niversity of F lorida sparse matrix collection , ACM Trans. Math. Soft., 38 (2011), pp. 1--25
2011
-
[12]
M. Derezi \'n ski, E. N. Epperly, and R. A. Meyer , The matrix-vector complexity of Ax=b , arXiv preprint arXiv:2602.04842, (2026)
-
[13]
S. Di Giovacchino, D. J. Higham, and K. Zygalakis , Backward error analysis and the qualitative behaviour of stochastic optimization algorithms: Application to stochastic coordinate descent , arXiv preprint arXiv:2309.02082, (2023)
-
[14]
d’Aspremont, D
A. d’Aspremont, D. Scieur, and A. Taylor , Acceleration methods , Foundations and Trends in Optimization, 5 (2021), pp. 1--245
2021
-
[15]
E. N. Epperly, M. Meier, and Y. Nakatsukasa , Fast randomized least-squares solvers can be just as accurate and stable as classical direct solvers , Communications on Pure and Applied Mathematics, 79 (2026), pp. 293--339
2026
- [16]
-
[17]
K. V. Fernando and B. N. Parlett , Accurate singular values and differential qd-algorithms , Numer. Math., 67 ( 1994 ), pp. 191--229
1994
-
[18]
D. C.-L. Fong and M. Saunders , Lsmr: An iterative algorithm for sparse least-squares problems , SIAM Journal on Scientific Computing, 33 (2011), pp. 2950--2971
2011
-
[19]
G. H. Golub and C. F. Van Loan , Matrix Computations , The Johns Hopkins University Press , 1996
1996
-
[20]
G. H. Golub and C. F. Van Loan , Matrix computations , JHU press, 2013
2013
-
[21]
Greenbaum , Behavior of slightly perturbed lanczos and conjugate-gradient recurrences , Linear Algebra and its Applications, 113 (1989), pp
A. Greenbaum , Behavior of slightly perturbed lanczos and conjugate-gradient recurrences , Linear Algebra and its Applications, 113 (1989), pp. 7--63
1989
-
[22]
Greenbaum , Iterative Methods for Solving Linear Systems , SIAM, Philadelphia, PA, USA, 1997
A. Greenbaum , Iterative Methods for Solving Linear Systems , SIAM, Philadelphia, PA, USA, 1997
1997
-
[23]
Greenbaum, V
A. Greenbaum, V. Pt \'a k, and Z. e. k. Strako s , Any nonincreasing convergence curve is possible for GMRES , SIAM J. Matrix Anal. Appl., 17 (1996), pp. 465--469
1996
-
[24]
M. R. Hestenes, E. Stiefel, et al. , Methods of conjugate gradients for solving linear systems , Journal of research of the National Bureau of Standards, 49 (1952), pp. 409--436
1952
-
[25]
N. J. Higham , Accuracy and Stability of Numerical Algorithms , SIAM, Philadelphia, PA, USA, second ed., 2002
2002
-
[26]
N. J. Higham and T. Mary , Mixed precision algorithms in numerical linear algebra , Acta Numerica, 31 (2022), pp. 347--414
2022
-
[27]
E. M. Kasenally , GMBACK : a generalised minimum backward error algorithm for nonsymmetric linear systems , SIAM J. Sci. Comput., 16 (1995), pp. 698--719
1995
-
[28]
E. M. Kasenally and V. Simoncini , Analysis of a minimum perturbation algorithm for nonsymmetric linear systems , SIAM J. Numer. Anal., 34 (1997), pp. 48--66
1997
-
[29]
Kuczy \'n ski and H
J. Kuczy \'n ski and H. Wo \'z niakowski , Estimating the largest eigenvalue by the power and lanczos algorithms with a random start , SIAM J. Matrix Anal. Appl., 13 (1992), pp. 1094--1122
1992
-
[30]
Laurent and P
B. Laurent and P. Massart , Adaptive estimation of a quadratic functional by model selection , Ann. Stat., (2000), pp. 1302--1338
2000
-
[31]
Lee and S
C.-p. Lee and S. Wright , First-order algorithms converge faster than o (1/k) on convex problems , in International Conference on Machine Learning, PMLR, 2019, pp. 3754--3762
2019
-
[32]
Levin and E
A. Levin and E. Saff , Degree of approximation of real functions by reciprocals of real and complex polynomials , SIAM journal on mathematical analysis, 19 (1988), pp. 233--245
1988
-
[33]
Musco, C
C. Musco, C. Musco, and A. Sidford , Stability of the lanczos method for matrix function approximation , in Proceedings of the Twenty-Ninth Annual ACM-SIAM Symposium on Discrete Algorithms, SIAM, 2018, pp. 1605--1624
2018
-
[34]
A. S. Nemirovsky and D. B. Yudin , Problem Complexity and Method Efficiency in Optimization , Wiley-Interscience, 1983, https://www2.isye.gatech.edu/ nemirovs/Nemirovskii_Yudin_1983.pdf
1983
-
[35]
Nesterov , A method for solving the convex programming problem with convergence rate o (1/k2) , in Dokl akad nauk Sssr, vol
Y. Nesterov , A method for solving the convex programming problem with convergence rate o (1/k2) , in Dokl akad nauk Sssr, vol. 269, 1983, p. 543
1983
-
[36]
C. C. Paige , Error analysis of the lanczos algorithm for tridiagonalizing a symmetric matrix , IMA Journal of Applied Mathematics, 18 (1976), pp. 341--349
1976
-
[37]
C. C. Paige and M. A. Saunders , LSQR : An algorithm for sparse linear equations and sparse least squares , ACM Trans. Math. Soft., 8 (1982), pp. 43--71
1982
-
[38]
C. C. Paige and Z. Strakos , Residual and backward error bounds in minimum residual krylov subspace methods , SIAM J. Sci. Comput., 23 (2002), pp. 1898--1923
2002
-
[39]
Peters and J
G. Peters and J. H. Wilkinson , Inverse iteration, ill-conditioned equations and N ewton's method , SIAM Rev., 21 (1979), pp. 339--360
1979
-
[40]
L. F. Richardson , Ix. the approximate arithmetical solution by finite differences of physical problems involving differential equations, with an application to the stresses in a masonry dam , Philosophical Transactions of the Royal Society of London. Series A, containing papers of a mathematical or physical character, 210 (1911), pp. 307--357
1911
-
[41]
Saad , Iterative methods for sparse linear systems , SIAM, 2003
Y. Saad , Iterative methods for sparse linear systems , SIAM, 2003
2003
-
[42]
Sankar, D
A. Sankar, D. A. Spielman, and S.-H. Teng , Smoothed analysis of the condition numbers and growth factors of matrices , SIAM J. Matrix Anal. Appl., 28 (2006), pp. 446--476
2006
-
[43]
D. A. Spielman and S.-H. Teng , Smoothed analysis of algorithms: Why the simplex algorithm usually takes polynomial time , Journal of the ACM (JACM), 51 (2004), pp. 385--463
2004
-
[44]
L. N. Trefethen and D. Bau , Numerical Linear Algebra , SIAM, Philadelphia, 1997
1997
-
[45]
Virtanen, R
P. Virtanen, R. Gommers, T. E. Oliphant, M. Haberland, T. Reddy, D. Cournapeau, E. Burovski, P. Peterson, W. Weckesser, J. Bright, et al. , Scipy 1.0: fundamental algorithms for scientific computing in python , Nature methods, 17 (2020), pp. 261--272
2020
-
[46]
von Neumann and H
J. von Neumann and H. Goldstine , Numerical inverting of matrices of high order , Bulletin of the American Mathematical Society, 53 (1947), pp. 1021--1099
1947
-
[47]
J. H. Wilkinson , Error analysis of floating-point computation , Numerische Mathematik, 2 (1960), pp. 319--340
1960
-
[48]
J. H. Wilkinson , Error analysis of direct methods of matrix inversion , Journal of the ACM (JACM), 8 (1961), pp. 281--330
1961
-
[49]
J. H. Wilkinson , Rounding errors in algebraic processes , Her Majesty's Stationery Office, 1963
1963
-
[50]
J. H. Wilkinson , The Algebraic Eigenvalue Problem , Oxford University Press, 1965
1965
-
[51]
write newline
" write newline "" before.all 'output.state := FUNCTION fin.entry add.period write newline FUNCTION new.block output.state before.all = 'skip after.block 'output.state := if FUNCTION not #0 #1 if FUNCTION and 'skip pop #0 if FUNCTION or pop #1 'skip if FUNCTION new.block.checka empty 'skip 'new.block if FUNCTION field.or.null duplicate empty pop "" 'skip ...
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.