Generalization of Zeroth-Order Method for Quotients of Quadratic Functions
Pith reviewed 2026-05-07 11:19 UTC · model grok-4.3
The pith
Unconstrained sampling on the entire sphere enables closed-form zeroth-order estimates for generalized Rayleigh quotients of quadratic functions.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The paper claims that sampling random search directions uniformly on the entire sphere in each iteration allows calculation of the generalized operator norm and the extremal generalized Rayleigh quotient. This links zeroth-order methods to Riemannian optimization, with the Riemannian gradient and Hessian estimated by specific surrogates. The optimal step size is computed in closed form, the subproblems are interpreted as sub-generalized Rayleigh quotient problems on Gram matrices, and together these results enable an accelerated algorithm with state-of-the-art behavior.
What carries the argument
Unconstrained spherical sampling of random search directions, which estimates generalized operator norms and Rayleigh quotients without tangent-space projections and thereby serves as a surrogate for the Riemannian gradient and Hessian.
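As an illustrative sketch (my own construction, not the paper's algorithm), the mechanism can be exercised on a small generalized Rayleigh quotient R(v) = (vᵀAv)/(vᵀBv): trial directions drawn uniformly from the whole sphere, with no tangent-space projection, already suffice for a simple hill-climbing search to recover the extremal quotient, i.e. the top eigenvalue of B⁻¹A.

```python
import numpy as np

# Illustrative sketch (my construction, not the paper's algorithm): maximize
# the generalized Rayleigh quotient R(v) = (v^T A v) / (v^T B v) by simple
# random search, drawing each trial direction uniformly from the WHOLE
# sphere -- no tangent-space projection at the current iterate.
rng = np.random.default_rng(0)

A = np.diag([2.0, 3.0, 7.0])   # symmetric
B = np.diag([1.0, 1.0, 2.0])   # symmetric positive definite
d = 3

def rayleigh(v):
    return (v @ A @ v) / (v @ B @ v)

v = rng.standard_normal(d)
v /= np.linalg.norm(v)
for _ in range(4000):
    u = rng.standard_normal(d)
    u /= np.linalg.norm(u)             # uniform direction on the sphere
    for tau in (1.0, 0.3, 0.1, 0.03):
        for s in (tau, -tau):
            w = v + s * u
            w /= np.linalg.norm(w)     # retract back to the sphere by scaling
            if rayleigh(w) > rayleigh(v):
                v = w

est = rayleigh(v)
# For diagonal A and SPD diagonal B, the extremal quotient is max_i a_i/b_i,
# i.e. the top eigenvalue of B^{-1} A.
exact = float(np.max(np.diag(A) / np.diag(B)))
print(est, exact)
```

The point of the sketch is only that unprojected sphere directions suffice to approach the extremal quotient, not that plain random search is competitive with the paper's accelerated scheme.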
If this is right
- Links zeroth-order methods directly to Riemannian first- and second-order optimization.
- Enables closed-form computation of the optimal step size without using the tangent space.
- Frames subproblems as sub-generalized Rayleigh quotient problems on Gram matrices.
- Supports construction of an accelerated algorithm that outperforms recent works.
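The second and third bullets can be illustrated together (a sketch in my own notation, not the paper's derivation): restricting R to span{v, u} reduces the exact line search over the step size τ to a 2×2 generalized eigenproblem on the Gram matrices of {v, u}, which is what makes the optimal step available in closed form.

```python
import numpy as np

# Sketch of the Gram-matrix reduction (illustrative notation, not the
# paper's): maximizing R(v + tau*u) over the step size tau is a 2x2
# "sub-generalized Rayleigh quotient" problem on the Gram matrices of {v, u}.
rng = np.random.default_rng(1)

A = np.diag([2.0, 3.0, 7.0])
B = np.diag([1.0, 1.0, 2.0])

def rayleigh(x):
    return (x @ A @ x) / (x @ B @ x)

v = rng.standard_normal(3); v /= np.linalg.norm(v)
u = rng.standard_normal(3); u /= np.linalg.norm(u)

V = np.column_stack([v, u])
GA = V.T @ A @ V   # 2x2 Gram matrix of {v, u} with respect to A
GB = V.T @ B @ V   # 2x2 Gram matrix with respect to B (SPD for independent v, u)

# Top eigenpair of the 2x2 pencil (GA, GB) gives the best point in span{v, u};
# the corresponding step size is recovered in closed form.
evals, evecs = np.linalg.eig(np.linalg.solve(GB, GA))
c = np.real(evecs[:, np.argmax(np.real(evals))])
w = V @ c                  # optimal combination alpha*v + beta*u
tau_opt = c[1] / c[0]      # closed-form step size (assuming alpha != 0)

# Brute-force check: no step size on a fine grid beats the closed form.
best_grid = max(rayleigh(v + t * u) for t in np.linspace(-5.0, 5.0, 2001))
print(rayleigh(w), best_grid)
```

Because R is scale-invariant, the maximizer over the two-dimensional span and the maximizer over the one-parameter family v + τu coincide whenever the leading coefficient is nonzero.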
Where Pith is reading between the lines
- This approach may reduce computational cost in high-dimensional settings by avoiding tangent space calculations.
- It could be extended to other optimization problems on manifolds where projections are expensive.
- Experimental validation on standard benchmarks in subspace optimization would test scalability beyond the paper's claims.
Load-bearing premise
The unconstrained sampling on the entire sphere for random search directions produces reliable estimates of the generalized operator norm and extremal generalized Rayleigh quotient without requiring the tangent space.
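A minimal numerical sketch of this premise, using the plain spectral norm as a stand-in for the paper's generalized operator norm: uniform samples on the whole sphere, with no tangent-space structure, already yield a consistent lower estimate of an operator norm.

```python
import numpy as np

# Naive sketch of the premise (not the paper's stochastic scheme): directions
# sampled uniformly on the whole sphere estimate an operator norm from below,
# here the plain spectral norm of L as a stand-in for the generalized version.
rng = np.random.default_rng(2)

L = np.array([[3.0, 1.0, 0.0],
              [0.0, 2.0, 1.0],
              [1.0, 0.0, 1.0]])

N = 200_000
U = rng.standard_normal((N, 3))
U /= np.linalg.norm(U, axis=1, keepdims=True)   # uniform points on S^2
norm_est = float(np.max(np.linalg.norm(U @ L.T, axis=1)))  # max ||L u||

norm_true = float(np.linalg.norm(L, 2))          # top singular value
print(norm_est, norm_true)
```

Each sample satisfies ||Lu|| ≤ ||L||, so the estimate approaches the norm from below as the sphere is covered; the paper's scheme is iterative rather than this one-shot maximum.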
What would settle it
A numerical experiment where the estimates from unconstrained sampling deviate significantly from those obtained with tangent-space methods or where the accelerated algorithm fails to outperform baselines on quadratic quotient problems.
Original abstract
Optimization of quadratic functions and the quotient of those are relevant in subspace and iterative optimization methods. In this paper, the calculation of the generalized operator norm and extremal generalized Rayleigh quotient is considered. In contrast to recent works an unconstrained sampling approach on the entire sphere for the random search direction in each iteration is proposed. Furthermore, the link to zeroth-order methods for Riemannian first- and second-order optimization methods is provided in the sense that the Riemannian gradient and Hessian are estimated by the specific surrogates. Even though the tangent space is not used in this construction the optimal step size problem can be computed in a closed form. The subproblems of this and recent works are illuminated in the context of sub-generalized Rayleigh quotient problems on specific Gram matrices. Together the achieved theory allows to construct an accelerated algorithm which shows state-of-the-art behavior and outperforms recent works.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript generalizes zeroth-order methods to optimization of quotients of quadratic functions. It proposes unconstrained uniform sampling of random directions on the full sphere (without tangent-space projection) to estimate surrogates for the Riemannian gradient and Hessian, derives a closed-form optimal step size for the resulting subproblem, relates the subproblems to generalized Rayleigh quotients on Gram matrices, and constructs an accelerated algorithm asserted to achieve state-of-the-art performance that outperforms recent works.
Significance. If the central claims hold—specifically that unconstrained sphere sampling yields unbiased surrogates for the generalized operator norm and extremal Rayleigh quotient, that the closed-form step size remains valid, and that the resulting acceleration is rigorous—this approach could simplify zeroth-order Riemannian optimization for quadratic-quotient problems arising in subspace methods by removing the need for tangent-space projections. The explicit linkage to Gram-matrix subproblems and the provision of closed-form steps would be useful strengths, but the current manuscript supplies no derivations, error bounds, or experimental details to support these assertions.
major comments (3)
- [Abstract] The assertion that 'even though the tangent space is not used ... the optimal step size problem can be computed in a closed form' is load-bearing for the entire contribution. The skeptic's concern is valid on the supplied evidence: directions sampled uniformly on the sphere are not guaranteed to lie in the tangent space at the current point, so the finite-difference quotients contain an unprojected radial component. No error bound or cancellation argument via the subsequent Gram-matrix reduction is supplied, which directly undermines the claimed subproblem equivalence and the acceleration guarantee.
- [Abstract] The claim that 'together the achieved theory allows to construct an accelerated algorithm which shows state-of-the-art behavior' rests on the validity of the surrogate estimates. Without any derivation of the estimation error for the generalized operator norm or the extremal generalized Rayleigh quotient under unconstrained sampling, or any comparison of the resulting convergence rate to tangent-space-constrained baselines, the performance assertion cannot be evaluated.
- [Abstract] The subproblem equivalence to 'sub-generalized Rayleigh quotient problems on specific Gram matrices' is presented as illuminating the relation to recent works, yet no explicit equations are given showing how the unconstrained surrogate reduces to (or differs from) the tangent-space version. This equivalence is central to claiming independence from fitted quantities or self-citations and must be shown algebraically.
minor comments (1)
- [Abstract] The abstract repeatedly uses the phrase 'recent works' without citation; at least the most directly comparable prior algorithms should be referenced so that the claimed outperformance can be located.
Simulated Author's Rebuttal
We thank the referee for the careful reading and constructive comments. We address each major comment below and agree that additional explicit derivations, error bounds, and algebraic details are needed to fully substantiate the claims.
Point-by-point responses
- Referee: [Abstract] The assertion that 'even though the tangent space is not used ... the optimal step size problem can be computed in a closed form' is load-bearing for the entire contribution. The skeptic's concern is valid on the supplied evidence: directions sampled uniformly on the sphere are not guaranteed to lie in the tangent space at the current point, so the finite-difference quotients contain an unprojected radial component. No error bound or cancellation argument via the subsequent Gram-matrix reduction is supplied, which directly undermines the claimed subproblem equivalence and the acceleration guarantee.
Authors: We acknowledge that the abstract claim requires supporting analysis to be convincing. The quadratic structure of the objective ensures that radial components cancel in the surrogate construction under uniform spherical sampling, allowing the closed-form step size to hold without explicit projection. To address the concern rigorously, we will add an explicit error bound derivation and the cancellation argument (via the Gram-matrix reduction) in the revised manuscript. revision: yes
- Referee: [Abstract] The claim that 'together the achieved theory allows to construct an accelerated algorithm which shows state-of-the-art behavior' rests on the validity of the surrogate estimates. Without any derivation of the estimation error for the generalized operator norm or the extremal generalized Rayleigh quotient under unconstrained sampling, or any comparison of the resulting convergence rate to tangent-space-constrained baselines, the performance assertion cannot be evaluated.
Authors: We agree that the performance claim requires explicit supporting derivations. We will include the full derivation of the estimation error bounds for both the generalized operator norm and the extremal generalized Rayleigh quotient under unconstrained sampling. We will also add a direct comparison of the resulting convergence rates against tangent-space-constrained baselines to substantiate the state-of-the-art assertion. revision: yes
- Referee: [Abstract] The subproblem equivalence to 'sub-generalized Rayleigh quotient problems on specific Gram matrices' is presented as illuminating the relation to recent works, yet no explicit equations are given showing how the unconstrained surrogate reduces to (or differs from) the tangent-space version. This equivalence is central to claiming independence from fitted quantities or self-citations and must be shown algebraically.
Authors: We will add the missing explicit algebraic derivations in the revised manuscript. These will show step-by-step how the unconstrained surrogate reduces to (and differs from) the tangent-space version when expressed as sub-generalized Rayleigh quotient problems on the associated Gram matrices, thereby clarifying the connection to recent works. revision: yes
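The cancellation argument in the first response can be sanity-checked numerically (with my own surrogate construction, which may differ from the paper's exact one): for a quadratic f, the central difference is exact and E[uuᵀ] = I/d under uniform sphere sampling, so the full-sphere estimator recovers the Euclidean gradient in expectation with no radial bias.

```python
import numpy as np

# Sanity check of the cancellation claim (my own surrogate, possibly
# differing from the paper's construction): for quadratic f(x) = x^T A x the
# central difference f(x+hu) - f(x-hu) = 4h * u^T A x is EXACT, and uniform
# sphere sampling gives E[u u^T] = I/d, so the estimator
#   (d / 2h) * mean[(f(x+hu) - f(x-hu)) u]
# converges to the full Euclidean gradient 2 A x: no radial bias survives.
rng = np.random.default_rng(3)

d = 3
A = np.array([[2.0, 0.5, 0.0],
              [0.5, 1.0, 0.3],
              [0.0, 0.3, 3.0]])

x = rng.standard_normal(d)
x /= np.linalg.norm(x)

h, N = 1e-3, 400_000
U = rng.standard_normal((N, d))
U /= np.linalg.norm(U, axis=1, keepdims=True)   # uniform points on the sphere

Xp, Xm = x + h * U, x - h * U
fp = np.einsum('ij,jk,ik->i', Xp, A, Xp)        # f(x + h u) for every sample
fm = np.einsum('ij,jk,ik->i', Xm, A, Xm)        # f(x - h u)
grad_est = (d / (2.0 * h)) * np.mean((fp - fm)[:, None] * U, axis=0)
grad_true = 2.0 * A @ x

rel_err = np.linalg.norm(grad_est - grad_true) / np.linalg.norm(grad_true)
print(rel_err)
```

The residual error here is pure Monte Carlo noise, shrinking as 1/√N; it does not settle the referee's demand for a finite-sample error bound, which the promised revision would have to supply.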
Circularity Check
No significant circularity; derivation remains self-contained.
Full rationale
The provided abstract and description present a proposal for unconstrained spherical sampling to estimate generalized operator norms and Rayleigh quotients, with explicit links to zeroth-order Riemannian surrogates for gradient/Hessian, closed-form step-size computation, and Gram-matrix subproblem reformulation. No equations or claims reduce a derived quantity to a fitted input by construction, no load-bearing self-citations are invoked to justify uniqueness or ansatzes, and the central acceleration claim is framed as building on (but distinct from) recent works without tautological redefinition. The sampling validity and performance assertions are independent empirical/theoretical steps that do not collapse into the paper's own inputs.