Finding Stationary Points by Comparisons

Chenyi Zhang; Helin Wang; Tongyang Li; Xiwen Tao; Yexin Zhang

arxiv: 2606.27082 · v1 · pith:3BUSCQ7Cnew · submitted 2026-06-25 · 💻 cs.LG · cs.DS· math.OC· quant-ph

Finding Stationary Points by Comparisons

Helin Wang , Chenyi Zhang , Xiwen Tao , Yexin Zhang , Tongyang Li This is my paper

Pith reviewed 2026-06-26 05:24 UTC · model grok-4.3

classification 💻 cs.LG cs.DSmath.OCquant-ph

keywords stationary pointscomparison oraclenon-convex optimizationquery complexityHessian estimationquantum algorithmsLipschitz continuity

0 comments

The pith

Comparison oracles suffice to find ε-stationary points of non-convex functions in Õ(n²/ε^{1.5}) queries.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that stationary points of non-convex functions can be located when the only information available is which of two points has the larger function value. It gives a classical algorithm that reaches an ε-stationary point with Õ(n²/ε^{1.5}) comparison queries for twice differentiable functions whose gradient and Hessian are Lipschitz. The work also supplies the first quantum algorithm for the task, which uses Õ(n/ε^{1.5}) queries. A reader would care because many real-world optimization settings supply only comparative feedback rather than exact values or derivatives.

Core claim

For a twice differentiable f from R^n to R with Lipschitz gradient and Hessian, there exists an algorithm that visits an ε-stationary point using Õ(n²/ε^{1.5}) queries to a comparison oracle. The algorithm rests on a subroutine that estimates the normalized Hessian to accuracy δ with Õ(n² log(1/δ)) queries. A quantum algorithm in the superposition query model achieves the same task with Õ(n/ε^{1.5}) queries.

What carries the argument

The normalized Hessian estimation subroutine that approximates the Hessian direction from pairwise comparison outcomes.

If this is right

Stationary points are reachable without direct access to function values or gradients.
The classical query bound scales quadratically in dimension and as the -3/2 power of the accuracy parameter.
The quantum version reduces the dimension factor from quadratic to linear.
The same subroutine that estimates the normalized Hessian can be reused for other tasks requiring curvature information.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The approach may extend to other oracles that return only ordering information.
Matching lower bounds could be derived to show the stated upper bounds are tight.
The quantum speedup hints at advantages when the dimension is large and comparisons can be superposed.

Load-bearing premise

The function is twice differentiable with Lipschitz continuous gradient and Hessian.

What would settle it

A twice differentiable function with Lipschitz gradient and Hessian for which every comparison-oracle algorithm needs asymptotically more than Õ(n²/ε^{1.5}) queries to reach an ε-stationary point.

Figures

Figures reproduced from arXiv: 2606.27082 by Chenyi Zhang, Helin Wang, Tongyang Li, Xiwen Tao, Yexin Zhang.

**Figure 2.** Figure 2: The intuition of ComparisonHessVec (Algorithm 1) for computing Hessian-vector products using gradient directions. x ∇f(x+ry) ∥∇f(x+ry)∥ ∇f(x−ry) ∥∇f(x−ry)∥ ∇f(x) ∥∇f(x)∥ direction of ∇2 f(x) · y Then, we use ComparisonHessVec to implement ComparisonHE (Algorithm 2), which estimates the normalized Hessian ∇2f(x)/∥∇2f(x)∥. Because a column of ∇2f(x) is given by multiplying a unit vector by ∇2f(x), we can use… view at source ↗

**Figure 3.** Figure 3: The intuition for computing the ratio. By Lemma 25, we have the second-order expansion for some small µ, ∇f(x + µv) = g + µHv + rµ, ∥rµ∥ ≤ 1 2 L2µ 2 . (24) Here we choose v as an eigenvector of Hb corresponding to the eigenvalue λv, which is the largest magnitude eigenvalue satisfying arccos D g1 ∥g1∥ , v E ≥ β and ∥Hbv∥ = |λv| ≥ λ. Otherwise, when finding the stationary point, we apply line search on dire… view at source ↗

read the original abstract

We study the problem of finding stationary points of non-convex functions when access to the objective is provided only through a comparison oracle that, given two points, outputs which has the larger function value. For a twice differentiable $f\colon\mathbb R^n\to\mathbb R$ with Lipschitz gradient and Hessian, we develop an algorithm that visits an $\epsilon$-stationary point using $\widetilde O(n^2/\epsilon^{1.5})$ queries. Our approach uses a subroutine that estimates the normalized Hessian to accuracy $\delta$ using $\widetilde O(n^2\log(1/\delta))$ queries. We further study this problem with a quantum comparison oracle model where queries can be made in superpositions, and develop the first quantum algorithm that finds an $\epsilon$-stationary point, which takes $\widetilde O(n/\epsilon^{1.5})$ queries.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

Paper states first comparison-oracle bounds for non-convex stationary points but the Hessian estimation subroutine is the weak link.

read the letter

The main takeaway is that this work gives the first stated query bounds for reaching an ε-stationary point of a twice-differentiable non-convex function when you only have a comparison oracle: Õ(n²/ε^{1.5}) classically and Õ(n/ε^{1.5}) in the quantum model.

They set up the model cleanly and route the algorithm through a normalized-Hessian estimation subroutine that uses Õ(n² log(1/δ)) comparisons. That subroutine is presented as the key technical step that lets them improve on naive approaches.

The soft spot is the one the stress-test note flags. Lipschitz gradient and Hessian control local behavior, but it is not obvious how to recover the normalized quadratic form (direction and scale) from pure sign comparisons without extra assumptions such as known bounds on ||∇f||. The abstract does not spell out the concrete procedure, so the overall bound depends on whether that subroutine actually closes under the stated conditions.

This is for people working on query complexity in black-box non-convex optimization. The model is natural and the claimed numbers are concrete, so the paper deserves a serious referee even though the central subroutine needs verification.

Referee Report

1 major / 0 minor

Summary. The paper studies finding ε-stationary points of non-convex functions using only a comparison oracle. For twice differentiable f: R^n → R with Lipschitz gradient and Hessian, it claims a classical algorithm achieving Õ(n²/ε^{1.5}) queries via a subroutine that estimates the normalized Hessian to accuracy δ using Õ(n² log(1/δ)) queries. It also presents the first quantum algorithm in the superposition query model achieving Õ(n/ε^{1.5}) queries.

Significance. If correct, the results establish the first explicit query bounds for stationary-point finding under a comparison oracle, extending non-convex optimization theory to settings where only relative ordering of function values is available. The quantum improvement and the explicit dependence on dimension and smoothness parameters would be notable contributions to oracle complexity.

major comments (1)

[Abstract] Abstract: the claimed Õ(n² log(1/δ)) query bound for estimating the normalized Hessian from comparisons is load-bearing for the overall Õ(n²/ε^{1.5}) classical complexity. Twice differentiability together with Lipschitz gradient and Hessian controls local variation but does not by itself supply a procedure that recovers the quadratic form (or its normalized version) from sign(f(x)−f(y)) queries; additional assumptions on the scale of ||H|| or ||∇f|| appear necessary and are not stated.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for the careful review and for raising this point about the Hessian estimation subroutine. We address it directly below.

read point-by-point responses

Referee: [Abstract] Abstract: the claimed Õ(n² log(1/δ)) query bound for estimating the normalized Hessian from comparisons is load-bearing for the overall Õ(n²/ε^{1.5}) classical complexity. Twice differentiability together with Lipschitz gradient and Hessian controls local variation but does not by itself supply a procedure that recovers the quadratic form (or its normalized version) from sign(f(x)−f(y)) queries; additional assumptions on the scale of ||H|| or ||∇f|| appear necessary and are not stated.

Authors: Section 3.2 of the manuscript presents the explicit subroutine. It selects comparison points at distances scaled by the given Lipschitz constants of the gradient and Hessian; these constants directly determine admissible step sizes so that sign(f(x)−f(y)) queries recover directional second-order information. The normalized Hessian is obtained by a separate magnitude estimation step that likewise uses only comparisons and the same Lipschitz bounds. No explicit bounds on ||H|| or ||∇f|| are required because the normalization and scaling are performed internally via the oracle responses. The abstract claim therefore follows from the stated assumptions alone. We will add one clarifying sentence to the abstract describing the role of the Lipschitz constants. revision: partial

Circularity Check

0 steps flagged

No significant circularity; derivation self-contained from oracle model

full rationale

The paper presents an algorithmic construction for finding ε-stationary points under a comparison oracle, with query bounds derived directly from twice-differentiability plus Lipschitz gradient/Hessian assumptions. The Hessian estimation subroutine is described as achieving Õ(n² log(1/δ)) queries via the stated oracle model; no equations reduce a claimed prediction to a fitted parameter by construction, no load-bearing self-citation chains appear, and no ansatz or uniqueness result is imported from prior author work. The overall Õ(n²/ε^{1.5}) bound follows from composing the subroutine with standard non-convex optimization steps under the given smoothness, remaining independent of the target result itself.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on the twice-differentiability and Lipschitz conditions stated in the abstract; no free parameters or invented entities are introduced in the abstract.

axioms (1)

domain assumption f is twice differentiable with Lipschitz gradient and Hessian
Explicitly stated as the setting in which the algorithms achieve the claimed query bounds.

pith-pipeline@v0.9.1-grok · 5685 in / 1159 out tokens · 19483 ms · 2026-06-26T05:24:02.203822+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

49 extracted references · 11 linked inside Pith

[1]

Deeksha Adil, Brian Bullins, Aaron Sidford, and Chenyi Zhang,Balancing gradient and Hessian queries in non-convex optimization, The Thirty-ninth Annual Conference on Neural Information Processing Systems, 2025, arXiv:2510.207861

arXiv 2025
[2]

1195–1199, 2017, arXiv:1611.011461

Naman Agarwal, Zeyuan Allen-Zhu, Brian Bullins, Elad Hazan, and Tengyu Ma,Finding approximate local minima faster than gradient descent, Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing, pp. 1195–1199, 2017, arXiv:1611.011461

Pith/arXiv arXiv 2017
[3]

31, 2018, arXiv:1801.029821

Zeyuan Allen-Zhu,How to make the gradients small stochastically: Even faster convex and nonconvex SGD, Advances in Neural Information Processing Systems, vol. 31, 2018, arXiv:1801.029821

arXiv 2018
[4]

Dennis Jr,Mesh adaptive direct search algorithms for constrained optimization, SIAM Journal on Optimization17(2006), no

Charles Audet and John E. Dennis Jr,Mesh adaptive direct search algorithms for constrained optimization, SIAM Journal on Optimization17(2006), no. 1, 188–217. 2

2006
[5]

4, 2726–2749, arXiv:1902.035912, 3, 5

El Houcine Bergou, Eduard Gorbunov, and Peter Richtárik,Stochastic three points method for unconstrained smooth minimization, SIAM Journal on Optimization30(2020), no. 4, 2726–2749, arXiv:1902.035912, 3, 5

Pith/arXiv arXiv 2020
[6]

560–569, PMLR, 2018, arXiv:1802.044341 29

Jeremy Bernstein, Yu-Xiang Wang, Kamyar Azizzadenesheli, and Animashree Anandkumar, signSGD: Compressed optimisation for non-convex problems, International Conference on Machine Learning, pp. 560–569, PMLR, 2018, arXiv:1802.044341 29

Pith/arXiv arXiv 2018
[7]

Birgin, J.L

Ernesto G. Birgin, J.L. Gardenghi, José Mario Martínez, Sandra Augusta Santos, and Ph.L. Toint,Worst-case evaluation complexity for unconstrained nonlinear optimization us- ing high-order regularized models, Mathematical Programming163(2017), no. 1, 359–368, arXiv:1709.071801

arXiv 2017
[8]

Terry,Rank analysis of incomplete block designs: I

Ralph Allan Bradley and Milton E. Terry,Rank analysis of incomplete block designs: I. the method of paired comparisons, Biometrika39(1952), no. 3/4, 324–345. 5

1952
[9]

Duchi, Oliver Hinder, and Aaron Sidford,Lower bounds for finding stationary points I, Mathematical Programming184(2020), no

Yair Carmon, John C. Duchi, Oliver Hinder, and Aaron Sidford,Lower bounds for finding stationary points I, Mathematical Programming184(2020), no. 1, 71–120, arXiv:1710.11606 1, 3

arXiv 2020
[10]

3773–3793, PMLR, 2022, arXiv:2205.11140 2

Xiaoyu Chen, Han Zhong, Zhuoran Yang, Zhaoran Wang, and Liwei Wang,Human-in-the-loop: Provably efficient preference-based reinforcement learning with general function approximation, International Conference on Machine Learning, pp. 3773–3793, PMLR, 2022, arXiv:2205.11140 2

arXiv 2022
[11]

Paul F. Christiano, Jan Leike, Tom Brown, Miljan Martic, Shane Legg, and Dario Amodei,Deep reinforcement learning from human preferences, Advances in Neural Information Processing Systems, vol. 30, 2017, arXiv:1706.037411

Pith/arXiv arXiv 2017
[12]

1, 32, arXiv:2309.024125

Nikita Doikov and Geovani Nunes Grapiglia,First and zeroth-order implementations of the regularized Newton method with lazy approximated Hessians, Journal of Scientific Computing 103(2025), no. 1, 32, arXiv:2309.024125

arXiv 2025
[13]

713–721, 2025, arXiv:2412.155381

Flint Xiaofeng Fan, Cheston Tan, Yew-Soon Ong, Roger Wattenhofer, and Wei-Tsang Ooi, FedRLHF: A convergence-guaranteed federated framework for privacy-preserving and personalized RLHF, Proceedings of the 24th International Conference on Autonomous Agents and Multiagent Systems, pp. 713–721, 2025, arXiv:2412.155381

arXiv 2025
[14]

Mudit Gaur, Amrit Singh Bedi, Raghu Pasupathy, and Vaneet Aggarwal,On the global convergence of online RLHF with neural parametrization, 2024, arXiv:2410.156101

arXiv 2024
[15]

Lee, and Tengyu Ma,Matrix completion has no spurious local minimum, Advances in Neural Information Processing Systems, vol

Rong Ge, Jason D. Lee, and Tengyu Ma,Matrix completion has no spurious local minimum, Advances in Neural Information Processing Systems, vol. 29, 2016, arXiv:1605.072721, 3

Pith/arXiv arXiv 2016
[16]

Algorithms and Techniques (2015), 829, arXiv:1504.052873

Rong Ge and Tengyu Ma,Decomposing overcomplete 3rd order tensors using sum-of-squares algorithms, Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (2015), 829, arXiv:1504.052873

Pith/arXiv arXiv 2015
[17]

30, 2017, arXiv:1706.055981

Rong Ge and Tengyu Ma,On the optimization landscape of tensor decompositions, Advances in Neural Information Processing Systems, vol. 30, 2017, arXiv:1706.055981

Pith/arXiv arXiv 2017
[18]

Daniel Golovin, John Karro, Greg Kochanski, Chansoo Lee, Xingyou Song, and Qiuyi Zhang, Gradientless descent: High-dimensional zeroth-order optimization, International Conference on Learning Representations, 2020, arXiv:1911.063172

arXiv 2020
[19]

Eduard Gorbunov, Adel Bibi, Ozan Sener, El Houcine Bergou, and Peter Richtarik,A stochastic derivative free optimization method with momentum, International Conference on Learning Representations, 2020, arXiv:1905.132782, 3, 5 30

arXiv 2020
[20]

Jamieson, Robert Nowak, and Ben Recht,Query complexity of derivative-free opti- mization, Advances in Neural Information Processing Systems, vol

Kevin G. Jamieson, Robert Nowak, and Ben Recht,Query complexity of derivative-free opti- mization, Advances in Neural Information Processing Systems, vol. 25, 2012, arXiv:1209.2434 2

Pith/arXiv arXiv 2012
[21]

Karabag, Cyrus Neary, and Ufuk Topcu,Smooth convex optimization using sub- zeroth-order oracles, Proceedings of the AAAI Conference on Artificial Intelligence35(2021), no

Mustafa O. Karabag, Cyrus Neary, and Ufuk Topcu,Smooth convex optimization using sub- zeroth-order oracles, Proceedings of the AAAI Conference on Artificial Intelligence35(2021), no. 5, 3815–3822, arXiv:2103.006672

arXiv 2021
[22]

Kolda, Robert Michael Lewis, and Virginia Torczon,Optimization by direct search: New perspectives on some classical and modern methods, SIAM Review45(2003), no

Tamara G. Kolda, Robert Michael Lewis, and Virginia Torczon,Optimization by direct search: New perspectives on some classical and modern methods, SIAM Review45(2003), no. 3, 385–482. 2

2003
[23]

Wild,Derivative-free optimization methods, Acta Numerica28(2019), 287–404, arXiv:904.115852

Jeffrey Larson, Matt Menickelly, and Stefan M. Wild,Derivative-free optimization methods, Acta Numerica28(2019), 287–404, arXiv:904.115852

2019
[24]

Xiuxian Li, Kuo-Yi Lin, Li Li, Yiguang Hong, and Jie Chen,On faster convergence of scaled sign gradient descent, IEEE Transactions on Industrial Informatics, IEEE, 2023, arXiv:2109.01806 1

arXiv 2023
[25]

Lui,Quantum speedups for minimax optimization and beyond, The Thirty-ninth Annual Conference on Neural Information Processing Systems, 2026

Chengchang Liu, Zongqi Wan, Jialin Zhang, Xiaoming Sun, and John C.S. Lui,Quantum speedups for minimax optimization and beyond, The Thirty-ninth Annual Conference on Neural Information Processing Systems, 2026. 5

2026
[26]

Sijia Liu, Pin-Yu Chen, Xiangyi Chen, and Mingyi Hong,signSGD via zeroth-order oracle, International Conference on Learning Representations, 2019. 1

2019
[27]

37, 2024, arXiv:2402.090142

Aleksandr Lobanov, Alexander Gasnikov, and Andrei Krasnov,Acceleration exists! Optimization problems when oracle can only compare objective function values, Advances in Neural Information Processing Systems, vol. 37, 2024, arXiv:2402.090142

arXiv 2024
[28]

Wainwright,Regularized M-estimators with nonconvexity: Statistical and algorithmic theory for local optima, The Journal of Machine Learning Research16(2015), no

Po-Ling Loh and Martin J. Wainwright,Regularized M-estimators with nonconvexity: Statistical and algorithmic theory for local optima, The Journal of Machine Learning Research16(2015), no. 1, 559–616, arXiv:1305.24361, 3

arXiv 2015
[29]

Ken I. M. McKinnon,Convergence of the Nelder–Mead simplex method to a nonstationary point, SIAM Journal on Optimization9(1998), no. 1, 148–158. 2

1998
[30]

Nelder and Roger Mead,A simplex method for function minimization, The Computer Journal7(1965), no

John A. Nelder and Roger Mead,A simplex method for function minimization, The Computer Journal7(1965), no. 4, 308–313. 2

1965
[31]

Mathematical Optimization Society Newsletter88(2012), 10–11

Yurii Nesterov,How to make the gradients small, Optima. Mathematical Optimization Society Newsletter88(2012), 10–11. 1

2012
[32]

87, Springer Science & Business Media, 2013

Yurii Nesterov,Introductory lectures on convex optimization: A basic course, vol. 87, Springer Science & Business Media, 2013. 1

2013
[33]

Program.108(2006), 177–205

Yurii Nesterov and Boris Polyak,Cubic regularization of newton method and its global perfor- mance, Math. Program.108(2006), 177–205. 40

2006
[34]

Polyak,Cubic regularization of Newton method and its global performance, Mathematical Programming108(2006), no

Yurii Nesterov and Boris T. Polyak,Cubic regularization of Newton method and its global performance, Mathematical Programming108(2006), no. 1, 177–205. 3 31

2006
[35]

1029–1038, PMLR, 2020, arXiv:1908.012892

Ellen Novoseller, Yibing Wei, Yanan Sui, Yisong Yue, and Joel Burdick,Dueling posterior sampling for preference-based reinforcement learning, Conference on Uncertainty in Artificial Intelligence, pp. 1029–1038, PMLR, 2020, arXiv:1908.012892

arXiv 2020
[36]

Long Ouyang, Jeffrey Wu, Xu Jiang, Diogo Almeida, Carroll Wainwright, Pamela Mishkin, Chong Zhang, Sandhini Agarwal, Katarina Slama, Alex Ray, et al.,Training language models to follow instructions with human feedback, Advances in Neural Information Processing Systems 35(2022), 27730–27744, arXiv:2203.021552

Pith/arXiv arXiv 2022
[37]

Yuxuan Ren, Abhishek Roy, and Shiqian Ma,Riemannian dueling optimization, 2026, arXiv:2603.000232

arXiv 2026
[38]

Aadirupa Saha, Vitaly Feldman, Yishay Mansour, and Tomer Koren,Faster convergence with multiway preferences, Proceedings of the 27th International Conference on Artificial Intelligence and Statistics, Proceedings of Machine Learning Research, vol. 238, pp. 433–441, PMLR, 2024, arXiv:2312.117882

arXiv 2024
[40]

Aadirupa Saha, Tomer Koren, and Yishay Mansour,Dueling convex optimization with general preferences, International Conference on Machine Learning, 2025, arXiv:2210.025622

arXiv 2025
[41]

6263–6289, PMLR, 2023, arXiv:2111.048502

Aadirupa Saha, Aldo Pacchiano, and Jonathan Lee,Dueling RL: Reinforcement learning with trajectory preferences, International Conference on Artificial Intelligence and Statistics, pp. 6263–6289, PMLR, 2023, arXiv:2111.048502

arXiv 2023
[42]

Katya Scheinberg and Zikai Xiong,Function-free optimization via comparison oracles, 2026, arXiv:2604.268672

Pith/arXiv arXiv 2026
[43]

Zhiwei Tang, Dmitry Rybin, and Tsung-Hui Chang,Zeroth-order optimization meets human feedback: Provable learning via ranking oracles, 2023, arXiv:2303.037512

arXiv 2023
[44]

Xiwen Tao, Chenyi Zhang, Helin Wang, Yexin Zhang, and Tongyang Li,Optimal classical and quantum algorithms for gradient testing and estimation by comparisons, 2026, To appear in the Forty-Third International Conference on Machine Learning, arXiv:2405.11454v23, 4, 5, 36

Pith/arXiv arXiv 2026
[45]

Yuanhao Wang, Qinghua Liu, and Chi Jin,Is RLHF more difficult than standard RL? a theoretical perspective, Thirty-seventh Conference on Neural Information Processing Systems, 2023, arXiv:2306.141112

arXiv 2023
[46]

Hermann Von Weyl,Über beschränkte quadratische formen, deren differenz vollstetig ist, Rendi- conti del Circolo Matematico di Palermo (1884-1940)27(1909), 373–392. 39

1940
[47]

Yichong Xu, Ruosong Wang, Lin Yang, Aarti Singh, and Artur Dubrawski,Preference-based reinforcement learning with finite-time guarantees, Advances in Neural Information Processing Systems33(2020), 18784–18794, arXiv:2006.089102

arXiv 2020
[48]

Samworth,A useful variant of the Davis-Kahan theorem for statisticians, Biometrika102(2015), no

Yi Yu, Tengyao Wang, and Richard J. Samworth,A useful variant of the Davis-Kahan theorem for statisticians, Biometrika102(2015), no. 2, 315–323, arXiv:1405.068038

arXiv 2015
[49]

Banghua Zhu, Jiantao Jiao, and Michael Jordan,Principled reinforcement learning with human feedback from pairwise ork-wise comparisons, ICLR 2023 Workshop on Mathematical and Empirical Understanding of Foundation Models, 2023, arXiv:2301.112702 32

arXiv 2023
[50]

Sorensen,Newton’s method with a model trust region modification, SIAM Journal on Numerical Analysis 19 (1982), no

Danny C. Sorensen,Newton’s method with a model trust region modification, SIAM Journal on Numerical Analysis 19 (1982), no. 2, 409–426. 4 33 A Auxiliary Lemmas In this appendix, we collect all auxiliary lemmas needed for our proofs. A.1 Distance between normalized vectors Lemma 13.Ifv,v ′ ∈R n are two vectors such that∥v∥ ≥γand∥v−v ′∥ ≤τ, we have v ∥v∥ − ...

1982

[1] [1]

Deeksha Adil, Brian Bullins, Aaron Sidford, and Chenyi Zhang,Balancing gradient and Hessian queries in non-convex optimization, The Thirty-ninth Annual Conference on Neural Information Processing Systems, 2025, arXiv:2510.207861

arXiv 2025

[2] [2]

1195–1199, 2017, arXiv:1611.011461

Naman Agarwal, Zeyuan Allen-Zhu, Brian Bullins, Elad Hazan, and Tengyu Ma,Finding approximate local minima faster than gradient descent, Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing, pp. 1195–1199, 2017, arXiv:1611.011461

Pith/arXiv arXiv 2017

[3] [3]

31, 2018, arXiv:1801.029821

Zeyuan Allen-Zhu,How to make the gradients small stochastically: Even faster convex and nonconvex SGD, Advances in Neural Information Processing Systems, vol. 31, 2018, arXiv:1801.029821

arXiv 2018

[4] [4]

Dennis Jr,Mesh adaptive direct search algorithms for constrained optimization, SIAM Journal on Optimization17(2006), no

Charles Audet and John E. Dennis Jr,Mesh adaptive direct search algorithms for constrained optimization, SIAM Journal on Optimization17(2006), no. 1, 188–217. 2

2006

[5] [5]

4, 2726–2749, arXiv:1902.035912, 3, 5

El Houcine Bergou, Eduard Gorbunov, and Peter Richtárik,Stochastic three points method for unconstrained smooth minimization, SIAM Journal on Optimization30(2020), no. 4, 2726–2749, arXiv:1902.035912, 3, 5

Pith/arXiv arXiv 2020

[6] [6]

560–569, PMLR, 2018, arXiv:1802.044341 29

Jeremy Bernstein, Yu-Xiang Wang, Kamyar Azizzadenesheli, and Animashree Anandkumar, signSGD: Compressed optimisation for non-convex problems, International Conference on Machine Learning, pp. 560–569, PMLR, 2018, arXiv:1802.044341 29

Pith/arXiv arXiv 2018

[7] [7]

Birgin, J.L

Ernesto G. Birgin, J.L. Gardenghi, José Mario Martínez, Sandra Augusta Santos, and Ph.L. Toint,Worst-case evaluation complexity for unconstrained nonlinear optimization us- ing high-order regularized models, Mathematical Programming163(2017), no. 1, 359–368, arXiv:1709.071801

arXiv 2017

[8] [8]

Terry,Rank analysis of incomplete block designs: I

Ralph Allan Bradley and Milton E. Terry,Rank analysis of incomplete block designs: I. the method of paired comparisons, Biometrika39(1952), no. 3/4, 324–345. 5

1952

[9] [9]

Duchi, Oliver Hinder, and Aaron Sidford,Lower bounds for finding stationary points I, Mathematical Programming184(2020), no

Yair Carmon, John C. Duchi, Oliver Hinder, and Aaron Sidford,Lower bounds for finding stationary points I, Mathematical Programming184(2020), no. 1, 71–120, arXiv:1710.11606 1, 3

arXiv 2020

[10] [10]

3773–3793, PMLR, 2022, arXiv:2205.11140 2

Xiaoyu Chen, Han Zhong, Zhuoran Yang, Zhaoran Wang, and Liwei Wang,Human-in-the-loop: Provably efficient preference-based reinforcement learning with general function approximation, International Conference on Machine Learning, pp. 3773–3793, PMLR, 2022, arXiv:2205.11140 2

arXiv 2022

[11] [11]

Paul F. Christiano, Jan Leike, Tom Brown, Miljan Martic, Shane Legg, and Dario Amodei,Deep reinforcement learning from human preferences, Advances in Neural Information Processing Systems, vol. 30, 2017, arXiv:1706.037411

Pith/arXiv arXiv 2017

[12] [12]

1, 32, arXiv:2309.024125

Nikita Doikov and Geovani Nunes Grapiglia,First and zeroth-order implementations of the regularized Newton method with lazy approximated Hessians, Journal of Scientific Computing 103(2025), no. 1, 32, arXiv:2309.024125

arXiv 2025

[13] [13]

713–721, 2025, arXiv:2412.155381

Flint Xiaofeng Fan, Cheston Tan, Yew-Soon Ong, Roger Wattenhofer, and Wei-Tsang Ooi, FedRLHF: A convergence-guaranteed federated framework for privacy-preserving and personalized RLHF, Proceedings of the 24th International Conference on Autonomous Agents and Multiagent Systems, pp. 713–721, 2025, arXiv:2412.155381

arXiv 2025

[14] [14]

Mudit Gaur, Amrit Singh Bedi, Raghu Pasupathy, and Vaneet Aggarwal,On the global convergence of online RLHF with neural parametrization, 2024, arXiv:2410.156101

arXiv 2024

[15] [15]

Lee, and Tengyu Ma,Matrix completion has no spurious local minimum, Advances in Neural Information Processing Systems, vol

Rong Ge, Jason D. Lee, and Tengyu Ma,Matrix completion has no spurious local minimum, Advances in Neural Information Processing Systems, vol. 29, 2016, arXiv:1605.072721, 3

Pith/arXiv arXiv 2016

[16] [16]

Algorithms and Techniques (2015), 829, arXiv:1504.052873

Rong Ge and Tengyu Ma,Decomposing overcomplete 3rd order tensors using sum-of-squares algorithms, Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (2015), 829, arXiv:1504.052873

Pith/arXiv arXiv 2015

[17] [17]

30, 2017, arXiv:1706.055981

Rong Ge and Tengyu Ma,On the optimization landscape of tensor decompositions, Advances in Neural Information Processing Systems, vol. 30, 2017, arXiv:1706.055981

Pith/arXiv arXiv 2017

[18] [18]

Daniel Golovin, John Karro, Greg Kochanski, Chansoo Lee, Xingyou Song, and Qiuyi Zhang, Gradientless descent: High-dimensional zeroth-order optimization, International Conference on Learning Representations, 2020, arXiv:1911.063172

arXiv 2020

[19] [19]

Eduard Gorbunov, Adel Bibi, Ozan Sener, El Houcine Bergou, and Peter Richtarik,A stochastic derivative free optimization method with momentum, International Conference on Learning Representations, 2020, arXiv:1905.132782, 3, 5 30

arXiv 2020

[20] [20]

Jamieson, Robert Nowak, and Ben Recht,Query complexity of derivative-free opti- mization, Advances in Neural Information Processing Systems, vol

Kevin G. Jamieson, Robert Nowak, and Ben Recht,Query complexity of derivative-free opti- mization, Advances in Neural Information Processing Systems, vol. 25, 2012, arXiv:1209.2434 2

Pith/arXiv arXiv 2012

[21] [21]

Karabag, Cyrus Neary, and Ufuk Topcu,Smooth convex optimization using sub- zeroth-order oracles, Proceedings of the AAAI Conference on Artificial Intelligence35(2021), no

Mustafa O. Karabag, Cyrus Neary, and Ufuk Topcu,Smooth convex optimization using sub- zeroth-order oracles, Proceedings of the AAAI Conference on Artificial Intelligence35(2021), no. 5, 3815–3822, arXiv:2103.006672

arXiv 2021

[22] [22]

Kolda, Robert Michael Lewis, and Virginia Torczon,Optimization by direct search: New perspectives on some classical and modern methods, SIAM Review45(2003), no

Tamara G. Kolda, Robert Michael Lewis, and Virginia Torczon,Optimization by direct search: New perspectives on some classical and modern methods, SIAM Review45(2003), no. 3, 385–482. 2

2003

[23] [23]

Wild,Derivative-free optimization methods, Acta Numerica28(2019), 287–404, arXiv:904.115852

Jeffrey Larson, Matt Menickelly, and Stefan M. Wild,Derivative-free optimization methods, Acta Numerica28(2019), 287–404, arXiv:904.115852

2019

[24] [24]

Xiuxian Li, Kuo-Yi Lin, Li Li, Yiguang Hong, and Jie Chen,On faster convergence of scaled sign gradient descent, IEEE Transactions on Industrial Informatics, IEEE, 2023, arXiv:2109.01806 1

arXiv 2023

[25] [25]

Lui,Quantum speedups for minimax optimization and beyond, The Thirty-ninth Annual Conference on Neural Information Processing Systems, 2026

Chengchang Liu, Zongqi Wan, Jialin Zhang, Xiaoming Sun, and John C.S. Lui,Quantum speedups for minimax optimization and beyond, The Thirty-ninth Annual Conference on Neural Information Processing Systems, 2026. 5

2026

[26] [26]

Sijia Liu, Pin-Yu Chen, Xiangyi Chen, and Mingyi Hong,signSGD via zeroth-order oracle, International Conference on Learning Representations, 2019. 1

2019

[27] [27]

37, 2024, arXiv:2402.090142

Aleksandr Lobanov, Alexander Gasnikov, and Andrei Krasnov,Acceleration exists! Optimization problems when oracle can only compare objective function values, Advances in Neural Information Processing Systems, vol. 37, 2024, arXiv:2402.090142

arXiv 2024

[28] [28]

Wainwright,Regularized M-estimators with nonconvexity: Statistical and algorithmic theory for local optima, The Journal of Machine Learning Research16(2015), no

Po-Ling Loh and Martin J. Wainwright,Regularized M-estimators with nonconvexity: Statistical and algorithmic theory for local optima, The Journal of Machine Learning Research16(2015), no. 1, 559–616, arXiv:1305.24361, 3

arXiv 2015

[29] [29]

Ken I. M. McKinnon,Convergence of the Nelder–Mead simplex method to a nonstationary point, SIAM Journal on Optimization9(1998), no. 1, 148–158. 2

1998

[30] [30]

Nelder and Roger Mead,A simplex method for function minimization, The Computer Journal7(1965), no

John A. Nelder and Roger Mead,A simplex method for function minimization, The Computer Journal7(1965), no. 4, 308–313. 2

1965

[31] [31]

Mathematical Optimization Society Newsletter88(2012), 10–11

Yurii Nesterov,How to make the gradients small, Optima. Mathematical Optimization Society Newsletter88(2012), 10–11. 1

2012

[32] [32]

87, Springer Science & Business Media, 2013

Yurii Nesterov,Introductory lectures on convex optimization: A basic course, vol. 87, Springer Science & Business Media, 2013. 1

2013

[33] [33]

Program.108(2006), 177–205

Yurii Nesterov and Boris Polyak,Cubic regularization of newton method and its global perfor- mance, Math. Program.108(2006), 177–205. 40

2006

[34] [34]

Polyak,Cubic regularization of Newton method and its global performance, Mathematical Programming108(2006), no

Yurii Nesterov and Boris T. Polyak,Cubic regularization of Newton method and its global performance, Mathematical Programming108(2006), no. 1, 177–205. 3 31

2006

[35] [35]

1029–1038, PMLR, 2020, arXiv:1908.012892

Ellen Novoseller, Yibing Wei, Yanan Sui, Yisong Yue, and Joel Burdick,Dueling posterior sampling for preference-based reinforcement learning, Conference on Uncertainty in Artificial Intelligence, pp. 1029–1038, PMLR, 2020, arXiv:1908.012892

arXiv 2020

[36] [36]

Long Ouyang, Jeffrey Wu, Xu Jiang, Diogo Almeida, Carroll Wainwright, Pamela Mishkin, Chong Zhang, Sandhini Agarwal, Katarina Slama, Alex Ray, et al.,Training language models to follow instructions with human feedback, Advances in Neural Information Processing Systems 35(2022), 27730–27744, arXiv:2203.021552

Pith/arXiv arXiv 2022

[37] [37]

Yuxuan Ren, Abhishek Roy, and Shiqian Ma,Riemannian dueling optimization, 2026, arXiv:2603.000232

arXiv 2026

[38] [38]

Aadirupa Saha, Vitaly Feldman, Yishay Mansour, and Tomer Koren,Faster convergence with multiway preferences, Proceedings of the 27th International Conference on Artificial Intelligence and Statistics, Proceedings of Machine Learning Research, vol. 238, pp. 433–441, PMLR, 2024, arXiv:2312.117882

arXiv 2024

[39] [40]

Aadirupa Saha, Tomer Koren, and Yishay Mansour,Dueling convex optimization with general preferences, International Conference on Machine Learning, 2025, arXiv:2210.025622

arXiv 2025

[40] [41]

6263–6289, PMLR, 2023, arXiv:2111.048502

Aadirupa Saha, Aldo Pacchiano, and Jonathan Lee,Dueling RL: Reinforcement learning with trajectory preferences, International Conference on Artificial Intelligence and Statistics, pp. 6263–6289, PMLR, 2023, arXiv:2111.048502

arXiv 2023

[41] [42]

Katya Scheinberg and Zikai Xiong,Function-free optimization via comparison oracles, 2026, arXiv:2604.268672

Pith/arXiv arXiv 2026

[42] [43]

Zhiwei Tang, Dmitry Rybin, and Tsung-Hui Chang,Zeroth-order optimization meets human feedback: Provable learning via ranking oracles, 2023, arXiv:2303.037512

arXiv 2023

[43] [44]

Xiwen Tao, Chenyi Zhang, Helin Wang, Yexin Zhang, and Tongyang Li,Optimal classical and quantum algorithms for gradient testing and estimation by comparisons, 2026, To appear in the Forty-Third International Conference on Machine Learning, arXiv:2405.11454v23, 4, 5, 36

Pith/arXiv arXiv 2026

[44] [45]

Yuanhao Wang, Qinghua Liu, and Chi Jin,Is RLHF more difficult than standard RL? a theoretical perspective, Thirty-seventh Conference on Neural Information Processing Systems, 2023, arXiv:2306.141112

arXiv 2023

[45] [46]

Hermann Von Weyl,Über beschränkte quadratische formen, deren differenz vollstetig ist, Rendi- conti del Circolo Matematico di Palermo (1884-1940)27(1909), 373–392. 39

1940

[46] [47]

Yichong Xu, Ruosong Wang, Lin Yang, Aarti Singh, and Artur Dubrawski,Preference-based reinforcement learning with finite-time guarantees, Advances in Neural Information Processing Systems33(2020), 18784–18794, arXiv:2006.089102

arXiv 2020

[47] [48]

Samworth,A useful variant of the Davis-Kahan theorem for statisticians, Biometrika102(2015), no

Yi Yu, Tengyao Wang, and Richard J. Samworth,A useful variant of the Davis-Kahan theorem for statisticians, Biometrika102(2015), no. 2, 315–323, arXiv:1405.068038

arXiv 2015

[48] [49]

Banghua Zhu, Jiantao Jiao, and Michael Jordan,Principled reinforcement learning with human feedback from pairwise ork-wise comparisons, ICLR 2023 Workshop on Mathematical and Empirical Understanding of Foundation Models, 2023, arXiv:2301.112702 32

arXiv 2023

[49] [50]

Sorensen,Newton’s method with a model trust region modification, SIAM Journal on Numerical Analysis 19 (1982), no

Danny C. Sorensen,Newton’s method with a model trust region modification, SIAM Journal on Numerical Analysis 19 (1982), no. 2, 409–426. 4 33 A Auxiliary Lemmas In this appendix, we collect all auxiliary lemmas needed for our proofs. A.1 Distance between normalized vectors Lemma 13.Ifv,v ′ ∈R n are two vectors such that∥v∥ ≥γand∥v−v ′∥ ≤τ, we have v ∥v∥ − ...

1982