Optimal Diagonal Preconditioning Beyond Worst-Case Conditioning: Theory and Practice of Omega Scaling

Arnesh Sujanani; David Torregrosa-Bel\'en; Henry Wolkowicz; Saeed Ghadimi; Woosuk L. Jung

arxiv: 2509.23439 · v2 · submitted 2025-09-27 · 🧮 math.OC · cs.LG· cs.NA· math.NA

Optimal Diagonal Preconditioning Beyond Worst-Case Conditioning: Theory and Practice of Omega Scaling

Saeed Ghadimi , Woosuk L. Jung , Arnesh Sujanani , David Torregrosa-Bel\'en , Henry Wolkowicz This is my paper

Pith reviewed 2026-05-18 11:51 UTC · model grok-4.3

classification 🧮 math.OC cs.LGcs.NAmath.NA

keywords optimal diagonal preconditioningkappa condition numberomega condition numberpseudoconvex optimizationpreconditioned conjugate gradientmatrix balancingsubgradient methodJacobi preconditioner

0 comments

The pith

Omega-optimal diagonal preconditioners are cheaper to compute and improve performance of PCG and LSQR more than kappa-optimal ones.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops theory and methods for optimal diagonal preconditioning under two different condition number measures. For the worst-case kappa condition number, it reformulates the problem as a pseudoconvex optimization over vectors with a scalable subgradient solver that finds global optima. For the averaging-based omega condition number, it gives explicit characterizations showing that standard methods like Jacobi preconditioning and row normalization are optimal, and that balancing schemes converge to optima. Experiments show that despite kappa giving better worst-case reduction, omega is faster to compute and leads to fewer iterations in iterative solvers, with further gains when omega is applied after kappa.

Core claim

This paper provides the first unified explicit characterization of optimality conditions for both kappa and omega based diagonal preconditioning. It shows that the kappa problem can be solved via an affine pseudoconvex reformulation using a simple subgradient method that is more scalable than SDP approaches. For omega, classical preconditioners are optimal and matrix balancing monotonically decreases it to a stationary point. Numerical tests reveal that omega-optimal scaling outperforms kappa-optimal in practical iterative method performance.

What carries the argument

The affine-based pseudoconvex reformulation for the kappa-optimal diagonal preconditioning problem, which reduces the search to an n-vector and ensures all stationary points are global minima.

If this is right

Kappa-optimal preconditioners can be computed more efficiently and accurately using the subgradient method than with SDP solvers.
Jacobi and row/column normalization preconditioners are exactly omega-optimal.
Matrix balancing schemes converge to omega-optimal scalings.
Omega-optimal preconditioners reduce iteration counts in PCG and LSQR more effectively than kappa-optimal ones in tested cases.
Sequential application of omega scaling after kappa-optimal preconditioning yields additional reductions in PCG iterations.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Omega scaling may serve as an inexpensive post-processing step to enhance any existing preconditioner.
The practical advantage of omega may stem from it optimizing average rather than worst-case behavior, which aligns better with typical linear system spectra in applications.
These methods could extend to block-diagonal preconditioners in parallel computing settings for larger problems.
Testing on ill-conditioned systems from specific domains like finite element analysis would verify if the performance gains hold broadly.

Load-bearing premise

The numerical experiments on test problems are representative of the general linear systems where these preconditioners would be applied in practice.

What would settle it

Observing a linear system where the number of PCG iterations required after kappa-optimal preconditioning is lower than after omega-optimal preconditioning, or where the subgradient method converges to a point with higher condition number than an SDP solution.

read the original abstract

We study optimal diagonal preconditioning using the classical worst-case $\kappa$-condition number and the averaging-based $\omega$-condition number. For the $\kappa$-optimal preconditioning problem, we derive an affine-based pseudoconvex reformulation with three key advantages: all stationary points are global minima, subgradients are inexpensive to compute, and the optimization variable is an $n$-dimensional vector rather than an $n\times n$ matrix as in semidefinite programming (SDP) approaches. We develop a simple and highly efficient subgradient method, with convergence guarantees, for solving this pseudoconvex formulation that is substantially more scalable and accurate than existing SDP-based methods. For the $\omega$-condition number, we provide explicit characterizations of optimal diagonal and block diagonal preconditioners. In particular, we show that several classical preconditioners, including Jacobi and row/column normalization, are $\omega$-optimal, and that matrix balancing schemes monotonically reduce $\omega$ and converge to stationary points of the two-sided problem. To the best of our knowledge, this is the first unified and explicit characterization of optimality conditions for both $\kappa$ and $\omega$-based preconditioning. Our numerical experiments further reveal a striking phenomenon: although $\kappa$-optimal preconditioners achieve stronger reductions in the worst-case condition number, $\omega$-optimal preconditioners are substantially cheaper to compute and yield better performance for iterative methods such as preconditioned conjugate gradient (PCG) and least squares method (LSQR). Moreover, applying $\omega$-optimal scaling to linear systems that are already $\kappa$-optimally preconditioned leads to further improvements in PCG iterations.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 2 minor

Summary. The paper studies optimal diagonal preconditioning for linear systems using both the classical worst-case κ-condition number and the averaging-based ω-condition number. For κ-optimal diagonal scaling, it derives an affine pseudoconvex reformulation that reduces the problem to optimizing an n-dimensional vector (rather than an n×n matrix via SDP), proves that all stationary points are global minima, and develops a simple subgradient method with convergence guarantees that is more scalable than prior SDP approaches. For the ω-condition number, it gives explicit characterizations of optimal diagonal and block-diagonal preconditioners, showing that classical methods such as Jacobi and row/column normalization are ω-optimal and that matrix balancing monotonically decreases ω. Numerical experiments on test problems indicate that, although κ-optimal scalings reduce the worst-case condition number more, ω-optimal scalings are far cheaper to compute and yield better iteration counts for PCG and LSQR; moreover, applying ω-scaling on top of an already κ-preconditioned system produces further gains.

Significance. If the central claims hold, the work supplies the first unified explicit optimality theory for both κ- and ω-based diagonal preconditioning and demonstrates a practical advantage for the cheaper ω-optimal scalings in iterative solvers. The pseudoconvex reformulation, inexpensive subgradients, and convergence guarantees for the κ-problem, together with the closed-form characterizations for ω (including monotonicity of balancing), are clear theoretical strengths. The empirical observation that ω-scaling can improve already κ-optimal systems is a noteworthy practical finding that could affect preconditioner choice in numerical linear algebra.

major comments (1)

[§5] §5 (Numerical Experiments): the test problems are referred to only generically as “test problems” with no stated selection criteria, matrix sources (e.g., SuiteSparse, random generation parameters), size/sparsity/conditioning ranges, or diversity metrics. Because the headline practical claim—that ω-optimal preconditioners are substantially cheaper and outperform κ-optimal ones for PCG/LSQR, and that ω-scaling further improves κ-preconditioned systems—rests entirely on these results, the lack of a documented, representative benchmark set leaves open the possibility that the observed advantages are specific to the chosen instances rather than general.

minor comments (2)

[§1] The abstract and §1 could more explicitly contrast the computational cost of the new subgradient method versus existing SDP solvers (e.g., by reporting flop counts or wall-clock times for the same matrices).
[§2] Notation for the two-sided ω-problem and the distinction between one-sided and two-sided optimality could be introduced earlier to aid readers unfamiliar with averaging-based condition numbers.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for the positive evaluation of our manuscript and the constructive suggestion regarding the numerical experiments. We address the major comment below and will incorporate the requested details in the revision.

read point-by-point responses

Referee: [§5] §5 (Numerical Experiments): the test problems are referred to only generically as “test problems” with no stated selection criteria, matrix sources (e.g., SuiteSparse, random generation parameters), size/sparsity/conditioning ranges, or diversity metrics. Because the headline practical claim—that ω-optimal preconditioners are substantially cheaper and outperform κ-optimal ones for PCG/LSQR, and that ω-scaling further improves κ-preconditioned systems—rests entirely on these results, the lack of a documented, representative benchmark set leaves open the possibility that the observed advantages are specific to the chosen instances rather than general.

Authors: We agree that a more explicit description of the test problems would strengthen the presentation and better support the practical claims. In the revised manuscript we will add a dedicated subsection in §5 that specifies the matrix sources (including SuiteSparse identifiers and generation parameters for random instances), reports ranges for dimensions, sparsity, and conditioning, and outlines the selection criteria used to ensure diversity and relevance to typical linear systems arising in iterative solvers. revision: yes

Circularity Check

0 steps flagged

No significant circularity; derivations are self-contained from definitions.

full rationale

The paper derives an affine-based pseudoconvex reformulation for the κ-optimal problem with stated advantages (all stationary points global, inexpensive subgradients, vector variable) and provides explicit characterizations for ω-optimal diagonal/block-diagonal preconditioners, showing classical methods like Jacobi are ω-optimal and balancing schemes reduce ω. These steps follow directly from the definitions of the respective condition numbers without reducing to fitted inputs, self-definitions, or load-bearing self-citations. Numerical comparisons of practical performance are presented as separate empirical observations. No quoted steps exhibit the enumerated circular patterns; the chain remains independent of its own outputs.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The paper relies on standard results from convex optimization and matrix analysis without introducing new free parameters or postulated entities in the described contributions.

axioms (2)

standard math Properties of subgradients and pseudoconvex functions allow stationary points to be global minima in the reformulated problem.
Invoked to justify the subgradient method and its convergence guarantees for the κ problem.
domain assumption The ω-condition number admits explicit optimality conditions for diagonal and block-diagonal scalings.
Central to the characterizations of classical preconditioners as ω-optimal.

pith-pipeline@v0.9.0 · 5863 in / 1489 out tokens · 53276 ms · 2026-05-18T11:51:32.686850+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

We study optimal diagonal preconditioning using the classical worst-case κ-condition number and the averaging-based ω-condition number... explicit characterizations of optimal diagonal and block diagonal preconditioners... Jacobi and row/column normalization are ω-optimal
IndisputableMonolith/Foundation/RealityFromDistinction.lean reality_from_one_distinction unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

Theorem 2.7... A is κ-optimally diagonally preconditioned iff x1 ⊙ x1 = xn ⊙ xn

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

The Role of Symmetry in Optimizing Overparameterized Networks
cs.LG 2026-04 unverdicted novelty 6.0

Overparameterization introduces symmetries that precondition the Hessian for better-conditioned minima and raise the reachability of global minima from typical starts in neural network loss landscapes.
The Role of Symmetry in Optimizing Overparameterized Networks
cs.LG 2026-04 unverdicted novelty 6.0

Overparameterization adds symmetries that precondition the Hessian for better minima and increase the probability mass of global minima near typical initializations.