pith. sign in

arxiv: 2606.10562 · v1 · pith:5PZMS4QSnew · submitted 2026-06-09 · 🧮 math.OC · cs.LG· cs.NA· math.NA

Accelerating SAV-based optimization via randomized low-rank Hessian approximation

Pith reviewed 2026-06-27 12:26 UTC · model grok-4.3

classification 🧮 math.OC cs.LGcs.NAmath.NA
keywords SAV methodsNyström approximationHessian approximationoptimizationPINNsconvergence analysisenergy dissipation
0
0 comments X

The pith

N-RSAV accelerates RSAV optimization by using a randomized low-rank Nyström approximation of the Hessian while preserving the modified energy dissipation law.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces the Nyström-enhanced relaxed scalar auxiliary variable method (N-RSAV) that incorporates approximate curvature information to speed up convergence in RSAV-based optimization. Standard RSAV approaches depend only on first-order information and converge slowly on ill-conditioned problems such as PINN training. The method constructs the linear operator from a randomized low-rank Nyström Hessian approximation, truncates eigenvalues to enforce positive semidefiniteness, and reuses the approximation adaptively based on energy deviation to lower cost. Convergence analysis is given under the Polyak-Łojasiewicz condition together with an added convexity assumption. Experiments on convex quadratics and PINNs confirm substantially faster convergence than conventional RSAV methods.

Core claim

The central claim is that replacing the linear operator in the RSAV scheme with a positive semidefinite Nyström approximation of the Hessian, obtained via randomization and eigenvalue truncation, yields faster convergence on effectively low-rank problems while satisfying an unconditional modified energy dissipation law, supported by convergence guarantees under the PL condition plus convexity.

What carries the argument

The Nyström-enhanced linear operator inside the RSAV scheme, built from a randomized low-rank Nyström Hessian approximation with eigenvalue truncation to ensure positive semidefiniteness.

If this is right

  • The method achieves faster convergence than standard RSAV on ill-conditioned problems with low-rank Hessian structure.
  • The modified energy dissipation law holds unconditionally.
  • Adaptive reuse of the approximate Hessian lowers computational cost while retaining acceleration.
  • Convergence is guaranteed under the PL condition with an added convexity assumption.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The adaptive reuse idea could be applied to other curvature approximations beyond Nyström.
  • On problems lacking low-rank Hessian structure the approximation may lose accuracy and slow the method.
  • The approach might extend to other auxiliary-variable optimization schemes that rely on energy dissipation.

Load-bearing premise

The target problems must have an effectively low-rank Hessian structure that makes the Nyström approximation accurate, and the convergence analysis requires both the Polyak-Łojasiewicz condition and convexity.

What would settle it

A run on an ill-conditioned convex quadratic problem with low-rank Hessian where N-RSAV shows no faster convergence than RSAV or where the modified energy increases in an iteration.

Figures

Figures reproduced from arXiv: 2606.10562 by Daisuke Furihata, Ryo Sagawa, Yuto Miyatake.

Figure 1
Figure 1. Figure 1: Eigenvalue spectrum of the Hessian H ∈ R 400×400 in Experiment 1. The eigenvalues consist of λ low i = i −2 (i = 1, . . . , 390) and 10 outlier eigenvalues λ high i (i = 1, . . . , 10) independently sampled from the uniform distribution on [1, 10]. The condition number of H is 2.3016 × 107 . We compare the proposed methods, namely, N-RSAV and AN-RSAV, with the conventional RSAV method and a variant that di… view at source ↗
Figure 2
Figure 2. Figure 2: Evolution of the objective value with respect to computation time in Exper [PITH_FULL_IMAGE:figures/full_fig_p021_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: Comparison of the test loss for PINNs training of the one-dimensional Burgers [PITH_FULL_IMAGE:figures/full_fig_p023_3.png] view at source ↗
Figure 4
Figure 4. Figure 4: Pointwise absolute error between the PINNs approximation and the reference [PITH_FULL_IMAGE:figures/full_fig_p023_4.png] view at source ↗
read the original abstract

We propose a new optimization method, the Nystr\"om-enhanced relaxed scalar auxiliary variable method (N-RSAV), which incorporates curvature information into the RSAV framework to accelerate convergence while preserving an unconditional modified energy dissipation law. Existing RSAV-based methods rely solely on first-order information and often suffer from slow convergence, particularly for ill-conditioned problems such as those arising in physics-informed neural networks (PINNs). To address this limitation, we design the linear operator in the RSAV scheme using approximate Hessian information obtained from a randomized low-rank Nystr\"om approximation. To preserve the dissipation structure, we enforce positive semidefiniteness through eigenvalue truncation. Furthermore, we introduce an adaptive strategy that reuses the approximate Hessian based on the deviation between the original and modified energies, significantly reducing computational cost. We also provide a convergence analysis of the RSAV scheme with a general positive semidefinite operator under the Polyak-Lojasiewicz (PL) condition and establish corresponding convergence guarantees for N-RSAV under the PL condition and an additional convexity assumption. Numerical experiments on ill-conditioned problems with effectively low-rank structure, including convex quadratic problems and training of PINNs, demonstrate that the proposed methods achieve substantially faster convergence than conventional RSAV-based approaches.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper proposes the Nyström-enhanced relaxed scalar auxiliary variable (N-RSAV) method, which replaces the linear operator in RSAV with a randomized low-rank Nyström approximation of the Hessian (enforced PSD via eigenvalue truncation) to accelerate convergence on ill-conditioned problems while preserving an unconditional modified energy dissipation law. An adaptive reuse strategy based on energy deviation is introduced to lower cost. Convergence is analyzed for general RSAV with PSD operator under the PL condition, and for N-RSAV under PL plus an additional convexity assumption. Experiments on convex quadratics and PINN training report substantially faster convergence than standard RSAV approaches.

Significance. If the results hold, the approach provides a concrete way to inject curvature information into SAV/RSAV schemes for faster convergence on problems with effectively low-rank Hessians, while retaining the unconditional stability property that makes these methods attractive. The provision of a convergence analysis under stated assumptions (PL for the general case) is a positive feature, as is the adaptive reuse mechanism for computational efficiency.

major comments (3)
  1. [Abstract and convergence analysis] Abstract and convergence analysis section: the N-RSAV convergence guarantee requires the PL condition plus an additional convexity assumption, yet the strongest empirical claim concerns acceleration on PINN training (non-convex loss landscapes). The analysis therefore supplies no theoretical support for the reported PINN results.
  2. [Numerical experiments] Numerical experiments section: the claim of substantially faster convergence on ill-conditioned problems (including PINNs) is presented without error bars, multiple random seeds, explicit baseline definitions, or quantitative verification that the adaptive reuse strategy preserves the exact dissipation law.
  3. [Method] Method section on adaptive strategy: it is not shown whether the combination of Nyström approximation, eigenvalue truncation, and energy-deviation-based reuse continues to satisfy the unconditional modified energy dissipation law exactly, which is load-bearing for the stability claim.
minor comments (2)
  1. [Abstract] The abstract refers to 'effectively low-rank structure' without a precise definition or quantitative criterion for when the Nyström approximation is expected to be accurate.
  2. [Method] Notation for the randomized Nyström operator and the reuse threshold parameter should be introduced with explicit definitions and ranges.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive feedback and positive assessment of the method's potential. We address each major comment below, proposing revisions to strengthen the manuscript where the concerns are valid.

read point-by-point responses
  1. Referee: [Abstract and convergence analysis] Abstract and convergence analysis section: the N-RSAV convergence guarantee requires the PL condition plus an additional convexity assumption, yet the strongest empirical claim concerns acceleration on PINN training (non-convex loss landscapes). The analysis therefore supplies no theoretical support for the reported PINN results.

    Authors: We agree that the N-RSAV convergence result requires both the PL condition and an additional convexity assumption, while the general RSAV analysis holds under PL alone. The PINN experiments demonstrate practical acceleration on non-convex problems and are not claimed to be covered by the N-RSAV theory. We will revise the abstract to explicitly distinguish the scope of the theoretical guarantees (general RSAV under PL; N-RSAV under PL plus convexity) from the empirical results on PINNs. revision: yes

  2. Referee: [Numerical experiments] Numerical experiments section: the claim of substantially faster convergence on ill-conditioned problems (including PINNs) is presented without error bars, multiple random seeds, explicit baseline definitions, or quantitative verification that the adaptive reuse strategy preserves the exact dissipation law.

    Authors: We accept that the experimental presentation lacks statistical rigor and explicit verification. In the revision we will report results with error bars over multiple random seeds, clearly define all baselines (including standard RSAV variants), and include quantitative checks (e.g., tables or plots of energy deviation) confirming that the adaptive reuse strategy preserves the modified energy dissipation law to machine precision. revision: yes

  3. Referee: [Method] Method section on adaptive strategy: it is not shown whether the combination of Nyström approximation, eigenvalue truncation, and energy-deviation-based reuse continues to satisfy the unconditional modified energy dissipation law exactly, which is load-bearing for the stability claim.

    Authors: Eigenvalue truncation guarantees that the Nyström operator remains positive semidefinite, which is the key property used to establish the unconditional dissipation law for the general RSAV scheme with a PSD operator. The adaptive reuse is triggered only when the energy deviation remains small, preserving the structure. We will add an explicit verification (or short proof sketch) in the revised Method section showing that the full combination continues to satisfy the exact unconditional modified energy dissipation law. revision: yes

Circularity Check

0 steps flagged

No circularity in derivation chain or analysis

full rationale

The paper introduces N-RSAV by constructing a PSD linear operator via randomized Nyström Hessian approximation (with eigenvalue truncation) inside the existing RSAV framework, then separately states convergence results for the general PSD-operator RSAV under PL and for N-RSAV under PL plus convexity. These steps are independent of the target data or fitted values; the PINN experiments are presented purely as numerical evidence without any claim that the convexity assumption holds for them. No self-definitional equations, fitted-input predictions, load-bearing self-citations, or ansatz smuggling appear in the abstract or described structure. The derivation remains self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

2 free parameters · 3 axioms · 0 invented entities

The central claim rests on the Polyak-Lojasiewicz condition and convexity assumption for the convergence proof, plus the domain assumption that the Hessian has effectively low-rank structure. No explicit fitted parameters are named, but the approximation rank and energy-deviation threshold for reuse function as implicit hyperparameters. No new entities are postulated.

free parameters (2)
  • Nyström rank
    The rank parameter for the randomized low-rank approximation must be chosen; it is not derived from first principles.
  • reuse threshold
    The deviation threshold between original and modified energies that triggers Hessian recomputation is a tunable parameter.
axioms (3)
  • domain assumption The objective satisfies the Polyak-Lojasiewicz condition
    Invoked for the convergence analysis of both RSAV and N-RSAV.
  • domain assumption The Hessian possesses effectively low-rank structure
    Required for the randomized Nyström approximation to be accurate and computationally beneficial.
  • domain assumption Additional convexity assumption for N-RSAV
    Stated as necessary for the N-RSAV convergence guarantee beyond the PL condition.

pith-pipeline@v0.9.1-grok · 5761 in / 1605 out tokens · 33366 ms · 2026-06-27T12:26:59.914213+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

37 extracted references · 3 canonical work pages · 2 internal anchors

  1. [1]

    and Ahmad, S

    Mehmood, F. and Ahmad, S. and Whangbo, T. K. , title =. Mathematics , volume =. 2023 , pages =

  2. [2]

    and Mohri, M

    Kumar, S. and Mohri, M. and Talwalkar, A. , title =. Proceedings of Machine Learning Research , volume =. 2009 , pages =

  3. [3]

    and Shen, J

    Liu, X. and Shen, J. and Zhang, X. , title =. SIAM Journal on Scientific Computing , volume =. 2023 , pages =

  4. [4]

    and Wright, S

    Nocedal, J. and Wright, S. J. , title =

  5. [5]

    and Xu, J

    Shen, J. and Xu, J. and Yang, J. , title =. Journal of Computational Physics , volume =. 2018 , pages =

  6. [6]

    On the saddle point problem for non-convex optimization

    Pascanu, R. and Dauphin, Y. N. and Ganguli, S. and Bengio, Y. , title =. arXiv preprint arXiv:1405.4604 , year =

  7. [7]

    and Perdikaris, P

    Raissi, M. and Perdikaris, P. and Karniadakis, G. E. , title =. Journal of Computational Physics , volume =. 2019 , pages =

  8. [8]

    and Lei, W

    Rathore, P. and Lei, W. and Frangella, Z. and Lu, L. and Udell, M. , title =. Proceedings of Machine Learning Research , volume =. 2024 , pages =

  9. [9]

    and Zhang, Z

    Jiang, M. and Zhang, Z. and Zhao, J. , title =. Journal of Computational Physics , volume =. 2022 , pages =

  10. [10]

    and Shen, J

    Zhang, Y. and Shen, J. , title =. Journal of Computational Physics , volume =. 2022 , pages =

  11. [11]

    and Roosta-Khorasani, F

    Ye, N. and Roosta-Khorasani, F. and Cui, T. , title =. 2017 MATRIX Annals , volume =. 2019 , pages =

  12. [12]

    and Cao, Z

    Sun, S. and Cao, Z. and Zhu, H. and Zhao, J. , title =. IEEE Transactions on Cybernetics , volume =. 2020 , pages =

  13. [13]

    and Zhang, J

    Zhang, S. and Zhang, J. and Shen, J. and Lin, G. , title =. SIAM Journal on Scientific Computing , volume =

  14. [14]

    and Mao, Z

    Ma, Z. and Mao, Z. and Shen, J. , title =. Journal of Computational Physics , volume =

  15. [15]

    and Krishnan, S

    Ghorbani, B. and Krishnan, S. and Xiao, Y. , booktitle =. An Investigation into Neural Net Optimization via. 2019 , volume =

  16. [16]

    and Gholami, A

    Yao, Z. and Gholami, A. and Keutzer, K. and Mahoney, M. W. , booktitle=. Py. 2020 , pages=

  17. [17]

    Adam: A Method for Stochastic Optimization

    Adam: A method for stochastic optimization , author =. arXiv preprint arXiv:1412.6980 , year =

  18. [18]

    Wiley Interdisciplinary Reviews: Computational Statistics , volume=

    Steepest descent , author=. Wiley Interdisciplinary Reviews: Computational Statistics , volume=. 2010 , publisher=

  19. [19]

    Proceedings of the thirteenth international conference on artificial intelligence and statistics , pages=

    Understanding the difficulty of training deep feedforward neural networks , author=. Proceedings of the thirteenth international conference on artificial intelligence and statistics , pages=. 2010 , organization=

  20. [20]

    Journal of Research of the National Bureau of Standards , volume=

    Methods of conjugate gradients for solving linear systems , author=. Journal of Research of the National Bureau of Standards , volume=

  21. [21]

    2001 , publisher=

    Chebyshev and Fourier spectral methods , author=. 2001 , publisher=

  22. [22]

    Wanner, Gerhard and Hairer, Ernst , volume=. Solving. 1996 , publisher=

  23. [23]

    Mathematical Programming , volume=

    On the limited memory BFGS method for large scale optimization , author=. Mathematical Programming , volume=. 1989 , publisher=

  24. [24]

    Computers & Mathematics with Applications , volume=

    A computationally optimal relaxed scalar auxiliary variable approach for solving gradient flow systems , author=. Computers & Mathematics with Applications , volume=. 2024 , publisher=

  25. [25]

    A novel energy-optimal scalar auxiliary variable (

    Liu, Zhengguang and Zhang, Yanrong and Li, Xiaoli , journal=. A novel energy-optimal scalar auxiliary variable (

  26. [26]

    SIAM Journal on Scientific Computing , volume=

    A Relaxed Vector Auxiliary Variable Algorithm for Unconstrained Optimization Problems , author=. SIAM Journal on Scientific Computing , volume=. 2025 , publisher=

  27. [27]

    Neural Computation , volume=

    Fast estimation of approximate matrix ranks using spectral densities , author=. Neural Computation , volume=. 2017 , publisher=

  28. [28]

    Linear Algebra and its Applications , volume=

    Fast randomized numerical rank estimation for numerically low-rank matrices , author=. Linear Algebra and its Applications , volume=. 2024 , publisher=

  29. [29]

    2016 , publisher=

    Deep learning , author=. 2016 , publisher=

  30. [30]

    Journal of Optimization Theory and Applications , volume=

    Some effective methods for unconstrained optimization based on the solution of systems of ordinary differential equations , author=. Journal of Optimization Theory and Applications , volume=. 1989 , publisher=

  31. [31]

    Weijie Su and Stephen Boyd and Emmanuel J. Cand. A Differential Equation for Modeling Nesterov's Accelerated Gradient Method: Theory and Insights , journal =. 2016 , volume =

  32. [32]

    Advances in Neural Information Processing Systems , volume=

    Acceleration via symplectic discretization of high-resolution differential equations , author=. Advances in Neural Information Processing Systems , volume=

  33. [33]

    Pearlmutter, B. A. , title =. Neural Computation , volume =. 1994 , pages =

  34. [34]

    and Shukla, K

    Kiyani, E. and Shukla, K. and Urb. Optimizing the optimizer for physics-informed neural networks and. Computer Methods in Applied Mechanics and Engineering , volume=. 2025 , publisher=

  35. [35]

    Gradient alignment in physics-informed neural networks: A second-order optimization perspective.arXiv preprint arXiv:2502.00604, 2025

    Gradient alignment in physics-informed neural networks: A second-order optimization perspective , author=. arXiv preprint arXiv:2502.00604 , year=

  36. [36]

    and Matsuda, T

    Ito, S. and Matsuda, T. and Miyatake, Y. , journal=. Adjoint-based exact

  37. [37]

    Randomized

    Frangella, Zachary and Tropp, Joel A and Udell, Madeleine , journal=. Randomized