Optimal Diagonal Preconditioning Beyond Worst-Case Conditioning: Theory and Practice of Omega Scaling
Pith reviewed 2026-05-18 11:51 UTC · model grok-4.3
The pith
Omega-optimal diagonal preconditioners are cheaper to compute and improve performance of PCG and LSQR more than kappa-optimal ones.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
This paper provides the first unified explicit characterization of optimality conditions for both kappa and omega based diagonal preconditioning. It shows that the kappa problem can be solved via an affine pseudoconvex reformulation using a simple subgradient method that is more scalable than SDP approaches. For omega, classical preconditioners are optimal and matrix balancing monotonically decreases it to a stationary point. Numerical tests reveal that omega-optimal scaling outperforms kappa-optimal in practical iterative method performance.
What carries the argument
The affine-based pseudoconvex reformulation for the kappa-optimal diagonal preconditioning problem, which reduces the search to an n-vector and ensures all stationary points are global minima.
If this is right
- Kappa-optimal preconditioners can be computed more efficiently and accurately using the subgradient method than with SDP solvers.
- Jacobi and row/column normalization preconditioners are exactly omega-optimal.
- Matrix balancing schemes converge to omega-optimal scalings.
- Omega-optimal preconditioners reduce iteration counts in PCG and LSQR more effectively than kappa-optimal ones in tested cases.
- Sequential application of omega scaling after kappa-optimal preconditioning yields additional reductions in PCG iterations.
Where Pith is reading between the lines
- Omega scaling may serve as an inexpensive post-processing step to enhance any existing preconditioner.
- The practical advantage of omega may stem from it optimizing average rather than worst-case behavior, which aligns better with typical linear system spectra in applications.
- These methods could extend to block-diagonal preconditioners in parallel computing settings for larger problems.
- Testing on ill-conditioned systems from specific domains like finite element analysis would verify if the performance gains hold broadly.
Load-bearing premise
The numerical experiments on test problems are representative of the general linear systems where these preconditioners would be applied in practice.
What would settle it
Observing a linear system where the number of PCG iterations required after kappa-optimal preconditioning is lower than after omega-optimal preconditioning, or where the subgradient method converges to a point with higher condition number than an SDP solution.
read the original abstract
We study optimal diagonal preconditioning using the classical worst-case $\kappa$-condition number and the averaging-based $\omega$-condition number. For the $\kappa$-optimal preconditioning problem, we derive an affine-based pseudoconvex reformulation with three key advantages: all stationary points are global minima, subgradients are inexpensive to compute, and the optimization variable is an $n$-dimensional vector rather than an $n\times n$ matrix as in semidefinite programming (SDP) approaches. We develop a simple and highly efficient subgradient method, with convergence guarantees, for solving this pseudoconvex formulation that is substantially more scalable and accurate than existing SDP-based methods. For the $\omega$-condition number, we provide explicit characterizations of optimal diagonal and block diagonal preconditioners. In particular, we show that several classical preconditioners, including Jacobi and row/column normalization, are $\omega$-optimal, and that matrix balancing schemes monotonically reduce $\omega$ and converge to stationary points of the two-sided problem. To the best of our knowledge, this is the first unified and explicit characterization of optimality conditions for both $\kappa$ and $\omega$-based preconditioning. Our numerical experiments further reveal a striking phenomenon: although $\kappa$-optimal preconditioners achieve stronger reductions in the worst-case condition number, $\omega$-optimal preconditioners are substantially cheaper to compute and yield better performance for iterative methods such as preconditioned conjugate gradient (PCG) and least squares method (LSQR). Moreover, applying $\omega$-optimal scaling to linear systems that are already $\kappa$-optimally preconditioned leads to further improvements in PCG iterations.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper studies optimal diagonal preconditioning for linear systems using both the classical worst-case κ-condition number and the averaging-based ω-condition number. For κ-optimal diagonal scaling, it derives an affine pseudoconvex reformulation that reduces the problem to optimizing an n-dimensional vector (rather than an n×n matrix via SDP), proves that all stationary points are global minima, and develops a simple subgradient method with convergence guarantees that is more scalable than prior SDP approaches. For the ω-condition number, it gives explicit characterizations of optimal diagonal and block-diagonal preconditioners, showing that classical methods such as Jacobi and row/column normalization are ω-optimal and that matrix balancing monotonically decreases ω. Numerical experiments on test problems indicate that, although κ-optimal scalings reduce the worst-case condition number more, ω-optimal scalings are far cheaper to compute and yield better iteration counts for PCG and LSQR; moreover, applying ω-scaling on top of an already κ-preconditioned system produces further gains.
Significance. If the central claims hold, the work supplies the first unified explicit optimality theory for both κ- and ω-based diagonal preconditioning and demonstrates a practical advantage for the cheaper ω-optimal scalings in iterative solvers. The pseudoconvex reformulation, inexpensive subgradients, and convergence guarantees for the κ-problem, together with the closed-form characterizations for ω (including monotonicity of balancing), are clear theoretical strengths. The empirical observation that ω-scaling can improve already κ-optimal systems is a noteworthy practical finding that could affect preconditioner choice in numerical linear algebra.
major comments (1)
- [§5] §5 (Numerical Experiments): the test problems are referred to only generically as “test problems” with no stated selection criteria, matrix sources (e.g., SuiteSparse, random generation parameters), size/sparsity/conditioning ranges, or diversity metrics. Because the headline practical claim—that ω-optimal preconditioners are substantially cheaper and outperform κ-optimal ones for PCG/LSQR, and that ω-scaling further improves κ-preconditioned systems—rests entirely on these results, the lack of a documented, representative benchmark set leaves open the possibility that the observed advantages are specific to the chosen instances rather than general.
minor comments (2)
- [§1] The abstract and §1 could more explicitly contrast the computational cost of the new subgradient method versus existing SDP solvers (e.g., by reporting flop counts or wall-clock times for the same matrices).
- [§2] Notation for the two-sided ω-problem and the distinction between one-sided and two-sided optimality could be introduced earlier to aid readers unfamiliar with averaging-based condition numbers.
Simulated Author's Rebuttal
We thank the referee for the positive evaluation of our manuscript and the constructive suggestion regarding the numerical experiments. We address the major comment below and will incorporate the requested details in the revision.
read point-by-point responses
-
Referee: [§5] §5 (Numerical Experiments): the test problems are referred to only generically as “test problems” with no stated selection criteria, matrix sources (e.g., SuiteSparse, random generation parameters), size/sparsity/conditioning ranges, or diversity metrics. Because the headline practical claim—that ω-optimal preconditioners are substantially cheaper and outperform κ-optimal ones for PCG/LSQR, and that ω-scaling further improves κ-preconditioned systems—rests entirely on these results, the lack of a documented, representative benchmark set leaves open the possibility that the observed advantages are specific to the chosen instances rather than general.
Authors: We agree that a more explicit description of the test problems would strengthen the presentation and better support the practical claims. In the revised manuscript we will add a dedicated subsection in §5 that specifies the matrix sources (including SuiteSparse identifiers and generation parameters for random instances), reports ranges for dimensions, sparsity, and conditioning, and outlines the selection criteria used to ensure diversity and relevance to typical linear systems arising in iterative solvers. revision: yes
Circularity Check
No significant circularity; derivations are self-contained from definitions.
full rationale
The paper derives an affine-based pseudoconvex reformulation for the κ-optimal problem with stated advantages (all stationary points global, inexpensive subgradients, vector variable) and provides explicit characterizations for ω-optimal diagonal/block-diagonal preconditioners, showing classical methods like Jacobi are ω-optimal and balancing schemes reduce ω. These steps follow directly from the definitions of the respective condition numbers without reducing to fitted inputs, self-definitions, or load-bearing self-citations. Numerical comparisons of practical performance are presented as separate empirical observations. No quoted steps exhibit the enumerated circular patterns; the chain remains independent of its own outputs.
Axiom & Free-Parameter Ledger
axioms (2)
- standard math Properties of subgradients and pseudoconvex functions allow stationary points to be global minima in the reformulated problem.
- domain assumption The ω-condition number admits explicit optimality conditions for diagonal and block-diagonal scalings.
Lean theorems connected to this paper
-
IndisputableMonolith/Cost/FunctionalEquation.leanwashburn_uniqueness_aczel unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
We study optimal diagonal preconditioning using the classical worst-case κ-condition number and the averaging-based ω-condition number... explicit characterizations of optimal diagonal and block diagonal preconditioners... Jacobi and row/column normalization are ω-optimal
-
IndisputableMonolith/Foundation/RealityFromDistinction.leanreality_from_one_distinction unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
Theorem 2.7... A is κ-optimally diagonally preconditioned iff x1 ⊙ x1 = xn ⊙ xn
What do these tags mean?
- matches
- The paper's claim is directly supported by a theorem in the formal canon.
- supports
- The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends
- The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses
- The paper appears to rely on the theorem as machinery.
- contradicts
- The paper's claim conflicts with a theorem or certificate in the canon.
- unclear
- Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Forward citations
Cited by 2 Pith papers
-
The Role of Symmetry in Optimizing Overparameterized Networks
Overparameterization introduces symmetries that precondition the Hessian for better-conditioned minima and raise the reachability of global minima from typical starts in neural network loss landscapes.
-
The Role of Symmetry in Optimizing Overparameterized Networks
Overparameterization adds symmetries that precondition the Hessian for better minima and increase the probability mass of global minima near typical initializations.
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.