Function-free Optimization via Comparison Oracles
Pith reviewed 2026-05-21 00:03 UTC · model grok-4.3
The pith
Optimization using only pairwise preference comparisons reaches an ε level-set gap in Õ(d D²/ε²) queries by estimating normal directions to level sets.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Under regularity of the preference relation in a d-dimensional Euclidean space, normal directions to the boundary of the current preferred set can be estimated to accuracy ε using O(d log(d/ε)) comparisons. When the preference is additionally convex and obeys a local growth condition on the regularity radius, repeated normal-direction descent steps reach an ε level-set optimality gap using at most Õ(d D²/ε²) comparisons over O(D²/ε²) steps, where D is the distance from the initial point to the optimal solutions; this matches the lower bound of Ω(D²/ε²) steps for normal-direction-span methods.
What carries the argument
Normal direction estimation from a comparison oracle, which constructs an approximate normal vector to the boundary of the current preference level set to serve as a descent direction without requiring any function values.
If this is right
- The method applies even when no underlying objective function exists or when that function is nonsmooth, nonconvex, or discontinuous.
- The query complexity nearly matches information-theoretic lower bounds for comparison-based methods.
- Adaptive schemes remove the need to know the distance D or target accuracy ε in advance while preserving the same asymptotic bounds up to logs.
- The framework directly addresses preference- and ranking-based applications where numeric objective values are unavailable or meaningless.
Where Pith is reading between the lines
- The geometric view of level sets could extend to noisy or stochastic comparisons by repeating queries to control error.
- Similar normal-direction ideas might apply to other non-numeric oracles such as ranking or tournament feedback in learning settings.
- The approach suggests a purely comparative route to zeroth-order methods that could be tested on real user-preference data sets.
Load-bearing premise
The preference relation must be regular, convex, and satisfy a local growth condition on the regularity radius near the optimum.
What would settle it
A concrete preference relation in which estimating a normal direction to accuracy ε requires more than O(d log(d/ε)) comparisons, or in which normal-direction descent requires asymptotically more than O(D²/ε²) steps to reach an ε level-set gap.
Figures
read the original abstract
In this work, we study optimization specified only through a comparison oracle: given two points, it reports which one is preferred. We call it function-free optimization because we do not assume access to, nor the existence of, a canonical application-given objective function. The goal is to find the most preferred feasible point, which we call the optimal solution. This model arises in preference- and ranking-based settings where objective values and derivatives are unavailable or meaningless. Even when a representative function exists, it may be nonsmooth, nonconvex, or discontinuous. We develop an analytical and algorithmic framework based on the geometry of preference level sets, which remains well-defined from comparisons alone. We introduce the level-set optimality gap, the distance from a preference level set to the optimal solutions, and the regularity radius, a stationarity certificate. Under regularity of the preference relation in a $d$-dimensional Euclidean space, we estimate normal directions to accuracy $\epsilon$ using $O(d\log(d/\epsilon))$ comparisons, nearly matching a lower bound of $\Omega(d\log(1/\epsilon))$. Under convexity, regularity, and a local growth condition on the regularity radius, the resulting normal direction descent method reaches an $\epsilon$ level-set optimality gap using at most $\widetilde O(dD^2/\epsilon^2)$ comparisons over $O(D^2/\epsilon^2)$ normal direction estimation steps, where $D$ is the distance from the initial point to the optimal solutions. This number of steps matches the lower bound of $\Omega(D^2/\epsilon^2)$ for normal direction span-based methods. Since prior knowledge in practical applications is usually limited, we also develop adaptive schemes for estimating the normal direction and solving the optimization problem. They match the fixed-parameter complexity bounds up to logarithmic factors.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript develops a function-free optimization framework that operates exclusively via a comparison oracle reporting which of two points is preferred. It introduces the level-set optimality gap (distance from a preference level set to the optimal solutions) and the regularity radius (a stationarity certificate). Under regularity of the preference relation in d-dimensional Euclidean space, normal directions are estimated to accuracy ε using O(d log(d/ε)) comparisons, nearly matching a lower bound of Ω(d log(1/ε)). Under convexity, regularity, and a local growth condition on the regularity radius, a normal direction descent method reaches an ε level-set optimality gap using at most Õ(d D²/ε²) comparisons over O(D²/ε²) steps, where D is the distance from the initial point to the optima; this iteration count matches the Ω(D²/ε²) lower bound for normal-direction-span methods. Adaptive schemes are also developed that match the fixed-parameter bounds up to logarithmic factors.
Significance. If the stated geometric arguments and complexity derivations hold, the work offers a meaningful contribution to preference-based and ranking-driven optimization by supplying explicit, near-optimal complexity guarantees without requiring an explicit objective function. The matching of both the normal-direction estimation bound and the overall iteration count to corresponding lower bounds is a clear strength, as is the development of adaptive variants. The level-set geometry provides a well-defined alternative to classical stationarity notions when only comparisons are available.
major comments (2)
- [§3] §3 (Normal Direction Estimation): The O(d log(d/ε)) comparison bound for estimating the normal direction to accuracy ε is presented as a direct consequence of the regularity assumption on the preference relation, but the manuscript provides only a high-level geometric argument without an explicit derivation or lemma establishing how the sequence of comparisons produces the stated accuracy; this step is load-bearing for both the estimation result and the subsequent optimization complexity.
- [§4.2] §4.2 (Normal Direction Descent): The local growth condition on the regularity radius is invoked to obtain the O(D²/ε²) iteration bound and the Õ(d D²/ε²) comparison bound, yet the manuscript does not include a concrete verification or example showing that this condition holds for a non-trivial class of convex preference relations beyond the abstract statement; without this, the claimed iteration complexity remains conditional on an unverified hypothesis.
minor comments (3)
- [§2] The definition of the regularity radius is introduced in §2 but is referenced in the complexity statements of §3 and §4 without a brief reminder of its quantitative meaning, which would aid readability.
- Notation for the level-set optimality gap is used consistently but could be accompanied by a short table comparing it to classical optimality measures (e.g., function-value gap or gradient norm) to clarify its relation to existing literature.
- [§5] The adaptive schemes in §5 are stated to match the fixed-parameter bounds up to log factors, but the precise logarithmic overhead is not quantified in the theorem statements.
Simulated Author's Rebuttal
We thank the referee for their careful reading of the manuscript and for the positive assessment of the geometric framework and complexity results. We address each major comment below and will incorporate revisions to improve rigor and clarity.
read point-by-point responses
-
Referee: [§3] §3 (Normal Direction Estimation): The O(d log(d/ε)) comparison bound for estimating the normal direction to accuracy ε is presented as a direct consequence of the regularity assumption on the preference relation, but the manuscript provides only a high-level geometric argument without an explicit derivation or lemma establishing how the sequence of comparisons produces the stated accuracy; this step is load-bearing for both the estimation result and the subsequent optimization complexity.
Authors: We thank the referee for this observation. The current presentation relies on a geometric argument derived from the regularity assumption to obtain the normal-direction estimation bound. We agree that an explicit lemma would strengthen the exposition and make the load-bearing step fully rigorous. In the revised manuscript we will add a dedicated lemma in Section 3 that specifies the comparison sequence, invokes the regularity condition to control the angular error, and derives the O(d log(d/ε)) guarantee together with its relation to the matching lower bound. This addition will also clarify the link to the optimization results that follow. revision: yes
-
Referee: [§4.2] §4.2 (Normal Direction Descent): The local growth condition on the regularity radius is invoked to obtain the O(D²/ε²) iteration bound and the Õ(d D²/ε²) comparison bound, yet the manuscript does not include a concrete verification or example showing that this condition holds for a non-trivial class of convex preference relations beyond the abstract statement; without this, the claimed iteration complexity remains conditional on an unverified hypothesis.
Authors: We appreciate the referee highlighting the need for concrete verification of the local growth condition. The condition is formulated to encompass a natural family of convex preference relations, yet we recognize that an illustrative example would make the applicability of the O(D²/ε²) bound more transparent. In the revision we will include a new example (in Section 4 or an appendix) of a non-trivial convex preference relation—such as one induced by a smooth, strictly convex quadratic-like function—for which the local growth condition on the regularity radius can be verified explicitly. This will demonstrate that the stated iteration and comparison bounds are attained for a concrete, non-degenerate case. revision: yes
Circularity Check
No significant circularity
full rationale
The paper derives its complexity bounds directly from external geometric assumptions on the preference relation (regularity, convexity, and local growth condition on the regularity radius). These hypotheses are stated as prerequisites rather than being defined in terms of the target quantities. The normal-direction estimation bound O(d log(d/ε)) and the overall Õ(d D²/ε²) comparison bound are presented as consequences of the geometry, with explicit matching to lower bounds under the same assumptions. No self-definitional steps, fitted inputs renamed as predictions, or load-bearing self-citations appear in the derivation chain.
Axiom & Free-Parameter Ledger
axioms (3)
- domain assumption Regularity of the preference relation in d-dimensional Euclidean space
- domain assumption Convexity of the preference relation
- domain assumption Local growth condition on the regularity radius
invented entities (2)
-
level-set optimality gap
no independent evidence
-
regularity radius
no independent evidence
Lean theorems connected to this paper
-
IndisputableMonolith/Cost/FunctionalEquation.leanwashburn_uniqueness_aczel (J-cost uniqueness) unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
Under regularity of the preference relation … we estimate normal directions to accuracy ε using O(d log(d/ε)) comparisons … normal direction descent method reaches an ε level-set optimality gap using at most Õ(d D²/ε²) comparisons
-
IndisputableMonolith/Foundation/AlexanderDuality.leanalexander_duality_circle_linking (D=3 forcing) unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
regularity radius r_x … two-sided tangent balls … curvature κ_x = 1/r_x
What do these tags mean?
- matches
- The paper's claim is directly supported by a theorem in the formal canon.
- supports
- The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends
- The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses
- The paper appears to rely on the theorem as machinery.
- contradicts
- The paper's claim conflicts with a theorem or certificate in the canon.
- unclear
- Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
-
[1]
El Houcine Bergou, Eduard Gorbunov, and Peter Richt´ arik. Stochastic three points method for uncon- strained smooth minimization.SIAM Journal on Optimization, 30(4):2726–2749, 2020
work page 2020
-
[2]
HanQin Cai, Daniel McKenzie, Wotao Yin, and Zhenliang Zhang. A one-bit, comparison-based gradient estimator.Applied and Computational Harmonic Analysis, 60:242–266, 2022
work page 2022
-
[3]
ComPO: Preference alignment via comparison oracles
Peter Chen, Xi Chen, Wotao Yin, and Tianyi Lin. ComPO: Preference alignment via comparison oracles. InAdvances in Neural Information Processing Systems, volume 38, 2025
work page 2025
-
[4]
Paul F. Christiano, Jan Leike, Tom B. Brown, Miljan Martic, Shane Legg, and Dario Amodei. Deep reinforcement learning from human preferences. InAdvances in Neural Information Processing Systems, volume 30, 2017
work page 2017
-
[5]
Nonsmooth optimization with zeroth order comparison feedback.arXiv preprint arXiv:2602.05622, 2026
Taha El Bakkali, El Mahdi Chayti, and Omar Saadi. Nonsmooth optimization with zeroth order comparison feedback.arXiv preprint arXiv:2602.05622, 2026
-
[6]
Springer-Verlag, Berlin, Heidelberg, New York, 1969
Herbert Federer.Geometric Measure Theory, volume 153 ofGrundlehren der mathematischen Wis- senschaften. Springer-Verlag, Berlin, Heidelberg, New York, 1969
work page 1969
-
[7]
Johannes F¨ urnkranz, Eyke H¨ ullermeier, Weiwei Cheng, and Sang-Hyeun Park. Preference-based rein- forcement learning: A formal framework and a policy iteration algorithm.Machine Learning, 89(1– 2):123–156, 2012
work page 2012
-
[8]
Gradi- entless descent: High-dimensional zeroth-order optimization
Daniel Golovin, John Karro, Greg Kochanski, Chansoo Lee, Xingyou Song, and Qiuyi Zhang. Gradi- entless descent: High-dimensional zeroth-order optimization. InInternational Conference on Learning Representations, 2020
work page 2020
- [9]
-
[10]
Serge Gratton, Cl´ ement W. Royer, Lu´ ıs N. Vicente, and Zaikun Zhang. Direct search based on proba- bilistic descent.SIAM Journal on Optimization, 25(3):1515–1541, 2015
work page 2015
-
[11]
Beyond convexity: Stochastic quasi-convex optimiza- tion
Elad Hazan, Kfir Levy, and Shai Shalev-Shwartz. Beyond convexity: Stochastic quasi-convex optimiza- tion. InAdvances in Neural Information Processing Systems, volume 28, 2015
work page 2015
-
[12]
Robert Hooke and T. A. Jeeves. “Direct Search” Solution of Numerical and Statistical Problems. Journal of the ACM, 8(2):212–229, 1961
work page 1961
-
[13]
Kevin G. Jamieson, Robert D. Nowak, and Benjamin Recht. Query complexity of derivative-free opti- mization. InAdvances in Neural Information Processing Systems, volume 25, 2012
work page 2012
-
[14]
An improved cutting plane method for convex optimization, convex-concave games, and its applications
Haotian Jiang, Yin Tat Lee, Zhao Song, and Sam Chiu-wai Wong. An improved cutting plane method for convex optimization, convex-concave games, and its applications. InProceedings of the 52nd Annual ACM SIGACT Symposium on Theory of Computing, pages 944–953. ACM, 2020
work page 2020
-
[15]
Karabag, Cyrus Neary, and Ufuk Topcu
Mustafa O. Karabag, Cyrus Neary, and Ufuk Topcu. Smooth convex optimization using sub-zeroth-order oracles. InProceedings of the AAAI Conference on Artificial Intelligence, volume 35, pages 3815–3822, 2021. 42
work page 2021
- [16]
-
[17]
Kolda, Robert Michael Lewis, and Virginia Torczon
Tamara G. Kolda, Robert Michael Lewis, and Virginia Torczon. Optimization by direct search: New perspectives on some classical and modern methods.SIAM Review, 45(3):385–482, 2003
work page 2003
-
[18]
Jeffrey C. Lagarias, James A. Reeds, Margaret H. Wright, and Paul E. Wright. Convergence properties of the Nelder–Mead simplex method in low dimensions.SIAM Journal on Optimization, 9(1):112–147, 1998
work page 1998
-
[19]
A faster cutting plane method and its implications for combinatorial and convex optimization
Yin Tat Lee, Aaron Sidford, and Sam Chiu-wai Wong. A faster cutting plane method and its implications for combinatorial and convex optimization. InProceedings of the 56th Annual IEEE Symposium on Foundations of Computer Science, pages 1049–1065. IEEE, 2015
work page 2015
-
[20]
Kfir Y. Levy. The power of normalization: Faster evasion of saddle points.arXiv preprint arXiv:1611.04831, 2016
work page internal anchor Pith review Pith/arXiv arXiv 2016
-
[21]
Kfir Y. Levy. Online to offline conversions, universality and adaptive minibatch sizes. InAdvances in Neural Information Processing Systems, volume 30, 2017
work page 2017
-
[22]
Acceleration exists! optimization prob- lems when oracle can only compare objective function values
Aleksandr Lobanov, Alexander Gasnikov, and Andrei Krasnov. Acceleration exists! optimization prob- lems when oracle can only compare objective function values. InAdvances in Neural Information Processing Systems, volume 37, pages 21428–21460, 2024
work page 2024
-
[23]
Ken I. M. McKinnon. Convergence of the Nelder–Mead simplex method to a nonstationary point.SIAM Journal on Optimization, 9(1):148–158, 1998
work page 1998
-
[24]
Leveraging non-uniformity in first-order non-convex optimization
Jincheng Mei, Yue Gao, Bo Dai, Csaba Szepesvari, and Dale Schuurmans. Leveraging non-uniformity in first-order non-convex optimization. InProceedings of the 38th International Conference on Machine Learning, volume 139 ofProceedings of Machine Learning Research, pages 7555–7564. PMLR, 2021
work page 2021
-
[25]
Ryan Murray, Brian Swenson, and Soummya Kar. Revisiting normalized gradient descent: Fast evasion of saddle points.IEEE Transactions on Automatic Control, 64(11):4818–4824, 2019
work page 2019
-
[26]
John A. Nelder and Roger Mead. A simplex method for function minimization.The Computer Journal, 7(4):308–313, 1965
work page 1965
-
[27]
Kluwer Academic Publishers, 2004
Yurii Nesterov.Introductory Lectures on Convex Optimization: A Basic Course, volume 87 ofApplied Optimization. Kluwer Academic Publishers, 2004
work page 2004
-
[28]
Springer, Cham, 2nd edition, 2018
Yurii Nesterov.Lectures on Convex Optimization, volume 137 ofSpringer Optimization and Its Appli- cations. Springer, Cham, 2nd edition, 2018
work page 2018
- [29]
-
[30]
Coin betting and parameter-free online learning
Francesco Orabona and D´ avid P´ al. Coin betting and parameter-free online learning. InAdvances in Neural Information Processing Systems, volume 29, 2016
work page 2016
-
[31]
Long Ouyang, Jeffrey Wu, Xu Jiang, Diogo Almeida, Carroll L. Wainwright, Pamela Mishkin, Chong Zhang, Sandhini Agarwal, Katarina Slama, Alex Ray, John Schulman, Jacob Hilton, Fraser Kelton, Luke Miller, Maddie Simens, Amanda Askell, Peter Welinder, Paul Christiano, Jan Leike, and Ryan Lowe. Training language models to follow instructions with human feedba...
work page 2022
-
[32]
Riemannian dueling optimization.arXiv preprint arXiv:2603.00023, 2026
Yuxuan Ren, Abhishek Roy, and Shiqian Ma. Riemannian dueling optimization.arXiv preprint arXiv:2603.00023, 2026
-
[33]
Faster convergence with multiway preferences
Aadirupa Saha, Vitaly Feldman, Yishay Mansour, and Tomer Koren. Faster convergence with multiway preferences. InInternational Conference on Artificial Intelligence and Statistics, pages 433–441. PMLR, 2024
work page 2024
-
[34]
Aadirupa Saha, Tomer Koren, and Yishay Mansour. Dueling convex optimization. InProceedings of the 38th International Conference on Machine Learning, volume 139 ofProceedings of Machine Learning Research, pages 9245–9254. PMLR, 2021. 43
work page 2021
-
[35]
Dueling convex optimization with general pref- erences
Aadirupa Saha, Tomer Koren, and Yishay Mansour. Dueling convex optimization with general pref- erences. InProceedings of the 42nd International Conference on Machine Learning, volume 267 of Proceedings of Machine Learning Research, pages 52552–52564. PMLR, 2025
work page 2025
-
[36]
Springer, Berlin, Heidelberg, 1985
Naum Zuselevich Shor.Minimization Methods for Non-Differentiable Functions, volume 3 ofSpringer Series in Computational Mathematics. Springer, Berlin, Heidelberg, 1985
work page 1985
-
[37]
Zeroth-order optimization meets human feedback: Provable learning via ranking oracles
Zhiwei Tang, Dmitry Rybin, and Tsung-Hui Chang. Zeroth-order optimization meets human feedback: Provable learning via ranking oracles. InInternational Conference on Learning Representations, 2024
work page 2024
-
[39]
On the convergence of pattern search algorithms.SIAM Journal on Optimization, 7(1):1–25, 1997
Virginia Torczon. On the convergence of pattern search algorithms.SIAM Journal on Optimization, 7(1):1–25, 1997
work page 1997
-
[40]
Maegan Tucker, Ellen Novoseller, Claudia Kann, Yanan Sui, Yisong Yue, Joel W. Burdick, and Aaron D. Ames. Preference-based learning for exoskeleton gait optimization. InProceedings of the IEEE Inter- national Conference on Robotics and Automation, pages 2351–2357, 2020
work page 2020
-
[41]
Lu´ ıs N. Vicente. Worst case complexity of direct search.EURO Journal on Computational Optimization, 1(1–2):143–153, 2013
work page 2013
-
[42]
The K-armed dueling bandits problem.Journal of Computer and System Sciences, 78(5):1538–1556, 2012
Yisong Yue, Josef Broder, Robert Kleinberg, and Thorsten Joachims. The K-armed dueling bandits problem.Journal of Computer and System Sciences, 78(5):1538–1556, 2012
work page 2012
-
[43]
Interactively optimizing information retrieval systems as a dueling bandits problem
Yisong Yue and Thorsten Joachims. Interactively optimizing information retrieval systems as a dueling bandits problem. InProceedings of the 26th International Conference on Machine Learning, pages 1201–1208, 2009
work page 2009
-
[44]
Chenyi Zhang and Tongyang Li. Comparisons are all you need for optimizing smooth functions.arXiv preprint arXiv:2405.11454v1 (Version 1), 2024
-
[45]
Zixi Zhao, Aliya Ahmadi, Caleb Hoover, Logan Grado, Nicholas Peterson, Xinran Wang, David Free- man, Thomas Murray, Andrew Lamperski, David Darrow, and Theoden I. Netoff. Optimization of spinal cord stimulation using Bayesian preference learning and its validation.IEEE Transactions on Neural Systems and Rehabilitation Engineering, 29:1987–1997, 2021. 44
work page 1987
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.