arxiv: 2604.16075 · v1 · submitted 2026-04-17 · 🧮 math.NA · cs.DS· cs.LG· cs.NA· math.OC

Towards Universal Convergence of Backward Error in Linear System Solvers

Micha{\l} Derezi\'nski , Yuji Nakatsukasa , Elizaveta Rebrova This is my paper

Pith reviewed 2026-05-10 08:06 UTC · model grok-4.3

classification 🧮 math.NA cs.DScs.LGcs.NAmath.OC

keywords backward errorRichardson iterationpositive semidefinite matricesiterative solverslinear systemsKrylov methodsconvergence ratesnumerical linear algebra

0 comments

The pith

Richardson iteration achieves at most 1/k relative backward error after k steps on any positive semidefinite linear system, independent of condition number.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proves that the classical Richardson iteration for solving Ax = b produces iterates whose relative backward error is bounded by 1/k after exactly k steps whenever A is positive semidefinite. Backward error measures the smallest relative perturbation to A and b that would make the current iterate an exact solution, which is often the quantity that matters for stability and practical accuracy. Because the bound holds without any dependence on the eigenvalues or conditioning of A, the method yields an O(n^{2}/epsilon) algorithm that reaches any target backward error epsilon. The authors also introduce MINBERR, which minimizes backward error directly inside a Krylov subspace and obtains a faster 1/k^{2} rate, and they extend the approach to general matrices via normal equations with empirically observed 1/k convergence.

Core claim

For any positive semidefinite matrix A and vector b, the k-th Richardson iterate x_k satisfies that the relative backward error is at most 1/k: there exist perturbations Delta A and Delta b with ||Delta A||/||A|| + ||Delta b||/||b|| <= 1/k such that (A + Delta A)x_k = b + Delta b. This bound is derived directly from the residual contraction properties of the iteration on PSD matrices and does not involve the condition number. The same framework yields an O(1/k^{2}) rate when the iterate is chosen to minimize backward error over the Krylov subspace, which is realized by the MINBERR algorithm.

What carries the argument

The residual contraction of Richardson iteration on PSD matrices that directly controls the minimal perturbation size making the current approximate solution exact.

Load-bearing premise

The matrix must be positive semidefinite for the unconditional 1/k backward error bound to hold.

What would settle it

Compute the relative backward error of the k-th Richardson iterate on a positive semidefinite matrix whose condition number exceeds k; if that error ever exceeds 1/k for any such matrix, the claim is false.

Figures

Figures reproduced from arXiv: 2604.16075 by Elizaveta Rebrova, Micha{\l} Derezi\'nski, Yuji Nakatsukasa.

**Figure 2.** Figure 2: Comparing MINRES with MINBERR on the Small-Outlier(2000, κ, σn−1) synthetic task, while varying κ and σn−1. MINBERR is denoted by solid lines, while MINRES is shown using dotted lines. suggesting that the bound in Theorem 3.3 is often tight for typical ill-conditioned problems. On the other hand, MINBERR often outperforms its theoretical 1/k2 rate, especially later in the convergence. The behavior of CG an… view at source ↗

**Figure 3.** Figure 3: Comparing MINBERR-NE against Richardson-NE, LSMR, and LSQR on the [PITH_FULL_IMAGE:figures/full_fig_p021_3.png] view at source ↗

**Figure 4.** Figure 4: Evaluating MINBERR-NE on the Small-Outlier(2000, κ, σn−1) synthetic task treated as a general non-symmetric linear system. We use dotted lines to denote MINBERR-NE running on the original task, while solid lines denote MINBERR-NE running on the matrix perturbed with Gaussian noise (in both cases, backward error is computed with respect to the original task). with a scaled identity (see Section 3), but in t… view at source ↗

read the original abstract

The quest for an algorithm that solves an $n\times n$ linear system in $O(n^2)$ time complexity, or $O(n^2 \text{poly}(1/\epsilon))$ when solving up to $\epsilon$ relative error, is a long-standing open problem in numerical linear algebra and theoretical computer science. There are two predominant paradigms for measuring relative error: forward error (i.e., distance from the output to the optimum solution) and backward error (i.e., distance to the nearest problem solved by the output). In most prior studies, convergence of iterative linear system solvers is measured via various notions of forward error, and as a result, depends heavily on the conditioning of the input. Yet, the numerical analysis literature has long advocated for backward error as the more practically relevant notion of approximation. In this work, we show that -- surprisingly -- the classical and simple Richardson iteration incurs at most $1/k$ (relative) backward error after $k$ iterations on any positive semidefinite (PSD) linear system, irrespective of its condition number. This universal convergence rate implies an $O(n^2/\epsilon)$ complexity algorithm for solving a PSD linear system to $\epsilon$ backward error, and we establish similar or better complexity when using a variety of Krylov solvers beyond Richardson. Then, by directly minimizing backward error over a Krylov subspace, we attain an even faster $O(1/k^2)$ universal rate, and we turn this into an efficient algorithm, MINBERR, with complexity $O(n^2/\sqrt\epsilon)$. We extend this approach via normal equations to solving general linear systems, for which we empirically observe $O(1/k)$ convergence. We report strong numerical performance of our algorithms on benchmark problems.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

Richardson iteration on PSD systems gets 1/k backward error independent of condition number, a direct spectral consequence that leads to O(n^2/eps) solvers.

read the letter

The central observation is that Richardson iteration with step 1/||A||_2 produces at most 1/k relative backward error after k steps on any PSD matrix, no matter how large the condition number. This follows because components in near-null directions keep a residual of order 1 while the corresponding part of the iterate grows linearly in k, so the standard backward-error formula eta = ||b - A x|| / (||A|| ||x|| + ||b||) decays as 1/k. The paper uses this to claim an O(n^2/eps) algorithm for epsilon backward error on PSD systems, which is the main practical payoff since forward-error rates usually pay for kappa(A).

Referee Report

0 major / 3 minor

Summary. The paper claims that the classical Richardson iteration achieves at most 1/k relative backward error after k iterations on any positive semidefinite linear system, independent of condition number. This yields an O(n²/ε) algorithm for ε-backward-error solutions. Similar or better rates are established for other Krylov solvers; a new MINBERR algorithm that directly minimizes backward error over the Krylov subspace attains an O(1/k²) universal rate and O(n²/√ε) complexity. For general (non-PSD) systems the normal-equations approach is shown empirically to give O(1/k) convergence, with strong numerical performance reported on benchmarks.

Significance. If the central derivation holds, the result is significant: it supplies the first explicit conditioning-independent convergence guarantees in the backward-error metric, which the numerical-analysis literature has long preferred over forward error. The argument rests on the spectral radius of the Richardson operator being at most 1 for PSD matrices and on the standard normwise backward-error formula, yielding a clean, parameter-free 1/k bound. The construction of MINBERR and the empirical extension to general matrices via normal equations add practical value and suggest new solver design principles focused on backward-error minimization.

minor comments (3)

The complexity statements (O(n²/ε) and O(n²/√ε)) should explicitly state the underlying arithmetic model and whether they count matrix-vector products or full matrix operations.
The precise definition of relative backward error η = ||b − A x|| / (||A|| ||x|| + ||b||) is used in the abstract and introduction; a short dedicated paragraph or equation early in the paper would aid readers.
The section reporting numerical experiments would benefit from a brief statement of the benchmark matrices, stopping criteria, and any data-exclusion rules applied.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We are grateful to the referee for their supportive review and recommendation for minor revision. The referee's summary correctly captures the key contributions of our work on universal convergence rates for backward error in linear system solvers.

read point-by-point responses

Referee: The paper claims that the classical Richardson iteration achieves at most 1/k relative backward error after k iterations on any positive semidefinite linear system, independent of condition number. This yields an O(n²/ε) algorithm for ε-backward-error solutions. Similar or better rates are established for other Krylov solvers; a new MINBERR algorithm that directly minimizes backward error over the Krylov subspace attains an O(1/k²) universal rate and O(n²/√ε) complexity. For general (non-PSD) systems the normal-equations approach is shown empirically to give O(1/k) convergence, with strong numerical performance reported on benchmarks.

Authors: We thank the referee for this accurate encapsulation of our results. The claims are as stated in the manuscript, and we stand by the derivations and empirical findings presented. No changes are required. revision: no

Circularity Check

0 steps flagged

No significant circularity identified

full rationale

The claimed 1/k backward-error bound for Richardson iteration on PSD systems is derived directly from the spectral radius of the iteration matrix (at most 1 when the step size is 1/||A||_2) and the standard normwise backward-error formula. For eigenvalues in [0, lambda_max], the residual component stays O(1) while the iterate component grows linearly in k, producing eta <= 1/k independent of condition number. The PSD assumption is explicitly required and stated; the derivation does not reduce to any fitted parameter, self-definition, or self-citation chain. The extension to general matrices is presented only as empirical observation, not a proven claim. No load-bearing ansatz, uniqueness theorem, or renaming of known results appears in the provided text. The result is self-contained against the mathematical properties of PSD matrices and Krylov subspaces.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claims rest on standard properties of positive semidefinite matrices and Krylov subspace methods; no free parameters or invented entities are introduced in the abstract.

axioms (1)

domain assumption The input matrix is positive semidefinite
Required for the unconditional 1/k backward error bound on Richardson iteration.

pith-pipeline@v0.9.0 · 5647 in / 1240 out tokens · 68099 ms · 2026-05-10T08:06:33.910136+00:00 · methodology

discussion (0)

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Well-Conditioned Oblivious Perturbations in Linear Space
cs.DS 2026-04 unverdicted novelty 8.0

An O(n)-randomness perturbation combining a dense deterministic pattern matrix with a non-uniform sparse dependent perturbation reduces condition numbers to O(n) for any input matrix.

Reference graph

Works this paper leans on

51 extracted references · 4 canonical work pages · cited by 1 Pith paper

[1]

Allen-Zhu and E

Z. Allen-Zhu and E. Hazan , Optimal black-box reductions between optimization objectives , Advances in Neural Information Processing Systems, 29 (2016)

2016
[2]

Alman, R

J. Alman, R. Duan, V. V. Williams, Y. Xu, Z. Xu, and R. Zhou , More asymmetry yields faster matrix multiplication , in Proceedings of the 2025 Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), SIAM, 2025, pp. 2005--2039

2025
[3]

Arioli, I

M. Arioli, I. Duff, and D. Ruiz , Stopping criteria for iterative solvers , SIAM J. Matrix Anal. Appl., 13 (1992), pp. 138--144

1992
[4]

Axelsson and I

O. Axelsson and I. Kaporin , On the sublinear and superlinear rate of convergence of conjugate gradient methods , Numerical Algorithms, 25 (2000), pp. 1--22

2000
[5]

Axelsson and G

O. Axelsson and G. Lindskog , On the rate of convergence of the preconditioned conjugate gradient method , Numerische Mathematik, 48 (1986), pp. 499--523

1986
[6]

D. G. Barrett and B. Dherin , Implicit gradient regularization , arXiv preprint arXiv:2009.11162, (2020)

work page arXiv 2009
[7]

Beerens and D

L. Beerens and D. J. Higham , Adversarial ink: Componentwise backward error attacks on deep learning , IMA Journal of Applied Mathematics, 89 (2024), pp. 175--196

2024
[8]

D. P. Bertsekas , Nonlinear Programming , Athena Scientific, 3rd ed., 2016, http://www.athenasc.com/nonlinbook.html

2016
[9]

B \"u rgisser and F

P. B \"u rgisser and F. Cucker , Smoothed analysis of moore--penrose inversion , SIAM J. Matrix Anal. Appl., 31 (2010), pp. 2769--2783

2010
[10]

A. W. Chou , On the optimality of krylov information , Journal of Complexity, 3 (1987), pp. 26--40

1987
[11]

T. A. Davis and Y. Hu , The U niversity of F lorida sparse matrix collection , ACM Trans. Math. Soft., 38 (2011), pp. 1--25

2011
[12]

Derezi \'n ski, E

M. Derezi \'n ski, E. N. Epperly, and R. A. Meyer , The matrix-vector complexity of Ax=b , arXiv preprint arXiv:2602.04842, (2026)

work page arXiv 2026
[13]

Di Giovacchino, D

S. Di Giovacchino, D. J. Higham, and K. Zygalakis , Backward error analysis and the qualitative behaviour of stochastic optimization algorithms: Application to stochastic coordinate descent , arXiv preprint arXiv:2309.02082, (2023)

work page arXiv 2023
[14]

d’Aspremont, D

A. d’Aspremont, D. Scieur, and A. Taylor , Acceleration methods , Foundations and Trends in Optimization, 5 (2021), pp. 1--245

2021
[15]

E. N. Epperly, M. Meier, and Y. Nakatsukasa , Fast randomized least-squares solvers can be just as accurate and stable as classical direct solvers , Communications on Pure and Applied Mathematics, 79 (2026), pp. 293--339

2026
[16]

Y. Feng, T. Gao, L. Li, J.-G. Liu, and Y. Lu , Uniform-in-time weak error analysis for stochastic gradient descent algorithms via diffusion approximation , arXiv preprint arXiv:1902.00635, (2019)

work page arXiv 1902
[17]

K. V. Fernando and B. N. Parlett , Accurate singular values and differential qd-algorithms , Numer. Math., 67 ( 1994 ), pp. 191--229

1994
[18]

D. C.-L. Fong and M. Saunders , Lsmr: An iterative algorithm for sparse least-squares problems , SIAM Journal on Scientific Computing, 33 (2011), pp. 2950--2971

2011
[19]

G. H. Golub and C. F. Van Loan , Matrix Computations , The Johns Hopkins University Press , 1996

1996
[20]

G. H. Golub and C. F. Van Loan , Matrix computations , JHU press, 2013

2013
[21]

Greenbaum , Behavior of slightly perturbed lanczos and conjugate-gradient recurrences , Linear Algebra and its Applications, 113 (1989), pp

A. Greenbaum , Behavior of slightly perturbed lanczos and conjugate-gradient recurrences , Linear Algebra and its Applications, 113 (1989), pp. 7--63

1989
[22]

Greenbaum , Iterative Methods for Solving Linear Systems , SIAM, Philadelphia, PA, USA, 1997

A. Greenbaum , Iterative Methods for Solving Linear Systems , SIAM, Philadelphia, PA, USA, 1997

1997
[23]

Greenbaum, V

A. Greenbaum, V. Pt \'a k, and Z. e. k. Strako s , Any nonincreasing convergence curve is possible for GMRES , SIAM J. Matrix Anal. Appl., 17 (1996), pp. 465--469

1996
[24]

M. R. Hestenes, E. Stiefel, et al. , Methods of conjugate gradients for solving linear systems , Journal of research of the National Bureau of Standards, 49 (1952), pp. 409--436

1952
[25]

N. J. Higham , Accuracy and Stability of Numerical Algorithms , SIAM, Philadelphia, PA, USA, second ed., 2002

2002
[26]

N. J. Higham and T. Mary , Mixed precision algorithms in numerical linear algebra , Acta Numerica, 31 (2022), pp. 347--414

2022
[27]

E. M. Kasenally , GMBACK : a generalised minimum backward error algorithm for nonsymmetric linear systems , SIAM J. Sci. Comput., 16 (1995), pp. 698--719

1995
[28]

E. M. Kasenally and V. Simoncini , Analysis of a minimum perturbation algorithm for nonsymmetric linear systems , SIAM J. Numer. Anal., 34 (1997), pp. 48--66

1997
[29]

Kuczy \'n ski and H

J. Kuczy \'n ski and H. Wo \'z niakowski , Estimating the largest eigenvalue by the power and lanczos algorithms with a random start , SIAM J. Matrix Anal. Appl., 13 (1992), pp. 1094--1122

1992
[30]

Laurent and P

B. Laurent and P. Massart , Adaptive estimation of a quadratic functional by model selection , Ann. Stat., (2000), pp. 1302--1338

2000
[31]

Lee and S

C.-p. Lee and S. Wright , First-order algorithms converge faster than o (1/k) on convex problems , in International Conference on Machine Learning, PMLR, 2019, pp. 3754--3762

2019
[32]

Levin and E

A. Levin and E. Saff , Degree of approximation of real functions by reciprocals of real and complex polynomials , SIAM journal on mathematical analysis, 19 (1988), pp. 233--245

1988
[33]

Musco, C

C. Musco, C. Musco, and A. Sidford , Stability of the lanczos method for matrix function approximation , in Proceedings of the Twenty-Ninth Annual ACM-SIAM Symposium on Discrete Algorithms, SIAM, 2018, pp. 1605--1624

2018
[34]

A. S. Nemirovsky and D. B. Yudin , Problem Complexity and Method Efficiency in Optimization , Wiley-Interscience, 1983, https://www2.isye.gatech.edu/ nemirovs/Nemirovskii_Yudin_1983.pdf

1983
[35]

Nesterov , A method for solving the convex programming problem with convergence rate o (1/k2) , in Dokl akad nauk Sssr, vol

Y. Nesterov , A method for solving the convex programming problem with convergence rate o (1/k2) , in Dokl akad nauk Sssr, vol. 269, 1983, p. 543

1983
[36]

C. C. Paige , Error analysis of the lanczos algorithm for tridiagonalizing a symmetric matrix , IMA Journal of Applied Mathematics, 18 (1976), pp. 341--349

1976
[37]

C. C. Paige and M. A. Saunders , LSQR : An algorithm for sparse linear equations and sparse least squares , ACM Trans. Math. Soft., 8 (1982), pp. 43--71

1982
[38]

C. C. Paige and Z. Strakos , Residual and backward error bounds in minimum residual krylov subspace methods , SIAM J. Sci. Comput., 23 (2002), pp. 1898--1923

2002
[39]

Peters and J

G. Peters and J. H. Wilkinson , Inverse iteration, ill-conditioned equations and N ewton's method , SIAM Rev., 21 (1979), pp. 339--360

1979
[40]

L. F. Richardson , Ix. the approximate arithmetical solution by finite differences of physical problems involving differential equations, with an application to the stresses in a masonry dam , Philosophical Transactions of the Royal Society of London. Series A, containing papers of a mathematical or physical character, 210 (1911), pp. 307--357

1911
[41]

Saad , Iterative methods for sparse linear systems , SIAM, 2003

Y. Saad , Iterative methods for sparse linear systems , SIAM, 2003

2003
[42]

Sankar, D

A. Sankar, D. A. Spielman, and S.-H. Teng , Smoothed analysis of the condition numbers and growth factors of matrices , SIAM J. Matrix Anal. Appl., 28 (2006), pp. 446--476

2006
[43]

D. A. Spielman and S.-H. Teng , Smoothed analysis of algorithms: Why the simplex algorithm usually takes polynomial time , Journal of the ACM (JACM), 51 (2004), pp. 385--463

2004
[44]

L. N. Trefethen and D. Bau , Numerical Linear Algebra , SIAM, Philadelphia, 1997

1997
[45]

Virtanen, R

P. Virtanen, R. Gommers, T. E. Oliphant, M. Haberland, T. Reddy, D. Cournapeau, E. Burovski, P. Peterson, W. Weckesser, J. Bright, et al. , Scipy 1.0: fundamental algorithms for scientific computing in python , Nature methods, 17 (2020), pp. 261--272

2020
[46]

von Neumann and H

J. von Neumann and H. Goldstine , Numerical inverting of matrices of high order , Bulletin of the American Mathematical Society, 53 (1947), pp. 1021--1099

1947
[47]

J. H. Wilkinson , Error analysis of floating-point computation , Numerische Mathematik, 2 (1960), pp. 319--340

1960
[48]

J. H. Wilkinson , Error analysis of direct methods of matrix inversion , Journal of the ACM (JACM), 8 (1961), pp. 281--330

1961
[49]

J. H. Wilkinson , Rounding errors in algebraic processes , Her Majesty's Stationery Office, 1963

1963
[50]

J. H. Wilkinson , The Algebraic Eigenvalue Problem , Oxford University Press, 1965

1965
[51]

write newline

" write newline "" before.all 'output.state := FUNCTION fin.entry add.period write newline FUNCTION new.block output.state before.all = 'skip after.block 'output.state := if FUNCTION not #0 #1 if FUNCTION and 'skip pop #0 if FUNCTION or pop #1 'skip if FUNCTION new.block.checka empty 'skip 'new.block if FUNCTION field.or.null duplicate empty pop "" 'skip ...