Function-free Optimization via Comparison Oracles

Katya Scheinberg; Zikai Xiong

arxiv: 2604.26867 · v2 · pith:EONCAKGCnew · submitted 2026-04-29 · 🧮 math.OC

Function-free Optimization via Comparison Oracles

Katya Scheinberg , Zikai Xiong This is my paper

Pith reviewed 2026-05-21 00:03 UTC · model grok-4.3

classification 🧮 math.OC

keywords comparison oraclepreference optimizationlevel-set optimality gapnormal direction estimationfunction-free optimizationconvex optimizationzeroth-order methods

0 comments

The pith

Optimization using only pairwise preference comparisons reaches an ε level-set gap in Õ(d D²/ε²) queries by estimating normal directions to level sets.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper develops a way to solve optimization when the only available information is answers to which of two points is preferred, without any objective function or derivatives. It defines the level-set optimality gap as how far a current set of equally preferred points lies from the best feasible points and introduces the regularity radius as a certificate of stationarity. Under a regularity assumption on the preference relation in d-dimensional space, an approximate normal direction to the boundary of the preferred set can be recovered to accuracy ε using O(d log(d/ε)) comparisons. Adding convexity and a local growth condition on the regularity radius lets a normal-direction descent procedure reduce the gap to ε using O(D²/ε²) estimation steps and Õ(d D²/ε²) total comparisons, where D is the distance from the starting point to the optimum; this matches known lower bounds for methods that use normal directions.

Core claim

Under regularity of the preference relation in a d-dimensional Euclidean space, normal directions to the boundary of the current preferred set can be estimated to accuracy ε using O(d log(d/ε)) comparisons. When the preference is additionally convex and obeys a local growth condition on the regularity radius, repeated normal-direction descent steps reach an ε level-set optimality gap using at most Õ(d D²/ε²) comparisons over O(D²/ε²) steps, where D is the distance from the initial point to the optimal solutions; this matches the lower bound of Ω(D²/ε²) steps for normal-direction-span methods.

What carries the argument

Normal direction estimation from a comparison oracle, which constructs an approximate normal vector to the boundary of the current preference level set to serve as a descent direction without requiring any function values.

If this is right

The method applies even when no underlying objective function exists or when that function is nonsmooth, nonconvex, or discontinuous.
The query complexity nearly matches information-theoretic lower bounds for comparison-based methods.
Adaptive schemes remove the need to know the distance D or target accuracy ε in advance while preserving the same asymptotic bounds up to logs.
The framework directly addresses preference- and ranking-based applications where numeric objective values are unavailable or meaningless.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The geometric view of level sets could extend to noisy or stochastic comparisons by repeating queries to control error.
Similar normal-direction ideas might apply to other non-numeric oracles such as ranking or tournament feedback in learning settings.
The approach suggests a purely comparative route to zeroth-order methods that could be tested on real user-preference data sets.

Load-bearing premise

The preference relation must be regular, convex, and satisfy a local growth condition on the regularity radius near the optimum.

What would settle it

A concrete preference relation in which estimating a normal direction to accuracy ε requires more than O(d log(d/ε)) comparisons, or in which normal-direction descent requires asymptotically more than O(D²/ε²) steps to reach an ε level-set gap.

Figures

Figures reproduced from arXiv: 2604.26867 by Katya Scheinberg, Zikai Xiong.

**Figure 2.** Figure 2: Illustration of the outward unit normal direction [PITH_FULL_IMAGE:figures/full_fig_p008_2.png] view at source ↗

**Figure 3.** Figure 3: Illustration examples of Lemma 3.1 for nonconvex sublevel sets Sx (left two plots) and Lemma 3.2 for convex sublevel sets Sx (right one plot). outward orientation. In this subsection, we work with the scheme of fixed comparison radius h, while the adaptive comparison radius case will be presented in Section 3.3. Algorithm 1 solves the planar subproblem in span{a, b} and returns an approximately tangential … view at source ↗

read the original abstract

In this work, we study optimization specified only through a comparison oracle: given two points, it reports which one is preferred. We call it function-free optimization because we do not assume access to, nor the existence of, a canonical application-given objective function. The goal is to find the most preferred feasible point, which we call the optimal solution. This model arises in preference- and ranking-based settings where objective values and derivatives are unavailable or meaningless. Even when a representative function exists, it may be nonsmooth, nonconvex, or discontinuous. We develop an analytical and algorithmic framework based on the geometry of preference level sets, which remains well-defined from comparisons alone. We introduce the level-set optimality gap, the distance from a preference level set to the optimal solutions, and the regularity radius, a stationarity certificate. Under regularity of the preference relation in a $d$-dimensional Euclidean space, we estimate normal directions to accuracy $\epsilon$ using $O(d\log(d/\epsilon))$ comparisons, nearly matching a lower bound of $\Omega(d\log(1/\epsilon))$. Under convexity, regularity, and a local growth condition on the regularity radius, the resulting normal direction descent method reaches an $\epsilon$ level-set optimality gap using at most $\widetilde O(dD^2/\epsilon^2)$ comparisons over $O(D^2/\epsilon^2)$ normal direction estimation steps, where $D$ is the distance from the initial point to the optimal solutions. This number of steps matches the lower bound of $\Omega(D^2/\epsilon^2)$ for normal direction span-based methods. Since prior knowledge in practical applications is usually limited, we also develop adaptive schemes for estimating the normal direction and solving the optimization problem. They match the fixed-parameter complexity bounds up to logarithmic factors.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

This paper sets up a geometric framework for optimization from comparison oracles alone, with normal-direction bounds that nearly match lower bounds under regularity, convexity, and a growth condition.

read the letter

The main takeaway is that the authors treat optimization as finding the most preferred point using only pairwise comparisons, by working directly with the geometry of preference level sets instead of any underlying function. They define a level-set optimality gap to track distance to the optimum and a regularity radius as a stationarity measure. Under a regularity assumption on the preference relation, they show normal directions can be estimated to accuracy ε with O(d log(d/ε)) comparisons, close to the Ω(d log(1/ε)) lower bound. Adding convexity and a local growth condition on the regularity radius, their normal-direction descent method reaches an ε gap in Õ(d D²/ε²) comparisons over O(D²/ε²) steps, matching the lower bound for that style of method. Adaptive versions that avoid knowing parameters in advance are included as well. This is useful because many preference and ranking settings in machine learning have no natural numerical objective, and the approach stays consistent with the comparison model even when the preference is nonsmooth or discontinuous. The near-matching bounds indicate the analysis is reasonably tight. The assumptions are the clearest limitation: regularity, convexity, and the local growth condition on the regularity radius are required for all the stated rates, and the paper is explicit that the guarantees fail without them. How often these hold for actual preference data is not addressed in detail, which limits immediate practical claims. No numerical tests appear in the summary either. This work is aimed at theorists studying oracle models and derivative-free optimization, plus researchers building systems that learn from pairwise feedback. A reader interested in new geometric tools for complexity analysis would get value from the level-set ideas and the tight bounds. It deserves a serious referee because the framework is internally consistent and the results are substantive within the subfield. I would send it out for review.

Referee Report

2 major / 3 minor

Summary. The manuscript develops a function-free optimization framework that operates exclusively via a comparison oracle reporting which of two points is preferred. It introduces the level-set optimality gap (distance from a preference level set to the optimal solutions) and the regularity radius (a stationarity certificate). Under regularity of the preference relation in d-dimensional Euclidean space, normal directions are estimated to accuracy ε using O(d log(d/ε)) comparisons, nearly matching a lower bound of Ω(d log(1/ε)). Under convexity, regularity, and a local growth condition on the regularity radius, a normal direction descent method reaches an ε level-set optimality gap using at most Õ(d D²/ε²) comparisons over O(D²/ε²) steps, where D is the distance from the initial point to the optima; this iteration count matches the Ω(D²/ε²) lower bound for normal-direction-span methods. Adaptive schemes are also developed that match the fixed-parameter bounds up to logarithmic factors.

Significance. If the stated geometric arguments and complexity derivations hold, the work offers a meaningful contribution to preference-based and ranking-driven optimization by supplying explicit, near-optimal complexity guarantees without requiring an explicit objective function. The matching of both the normal-direction estimation bound and the overall iteration count to corresponding lower bounds is a clear strength, as is the development of adaptive variants. The level-set geometry provides a well-defined alternative to classical stationarity notions when only comparisons are available.

major comments (2)

[§3] §3 (Normal Direction Estimation): The O(d log(d/ε)) comparison bound for estimating the normal direction to accuracy ε is presented as a direct consequence of the regularity assumption on the preference relation, but the manuscript provides only a high-level geometric argument without an explicit derivation or lemma establishing how the sequence of comparisons produces the stated accuracy; this step is load-bearing for both the estimation result and the subsequent optimization complexity.
[§4.2] §4.2 (Normal Direction Descent): The local growth condition on the regularity radius is invoked to obtain the O(D²/ε²) iteration bound and the Õ(d D²/ε²) comparison bound, yet the manuscript does not include a concrete verification or example showing that this condition holds for a non-trivial class of convex preference relations beyond the abstract statement; without this, the claimed iteration complexity remains conditional on an unverified hypothesis.

minor comments (3)

[§2] The definition of the regularity radius is introduced in §2 but is referenced in the complexity statements of §3 and §4 without a brief reminder of its quantitative meaning, which would aid readability.
Notation for the level-set optimality gap is used consistently but could be accompanied by a short table comparing it to classical optimality measures (e.g., function-value gap or gradient norm) to clarify its relation to existing literature.
[§5] The adaptive schemes in §5 are stated to match the fixed-parameter bounds up to log factors, but the precise logarithmic overhead is not quantified in the theorem statements.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their careful reading of the manuscript and for the positive assessment of the geometric framework and complexity results. We address each major comment below and will incorporate revisions to improve rigor and clarity.

read point-by-point responses

Referee: [§3] §3 (Normal Direction Estimation): The O(d log(d/ε)) comparison bound for estimating the normal direction to accuracy ε is presented as a direct consequence of the regularity assumption on the preference relation, but the manuscript provides only a high-level geometric argument without an explicit derivation or lemma establishing how the sequence of comparisons produces the stated accuracy; this step is load-bearing for both the estimation result and the subsequent optimization complexity.

Authors: We thank the referee for this observation. The current presentation relies on a geometric argument derived from the regularity assumption to obtain the normal-direction estimation bound. We agree that an explicit lemma would strengthen the exposition and make the load-bearing step fully rigorous. In the revised manuscript we will add a dedicated lemma in Section 3 that specifies the comparison sequence, invokes the regularity condition to control the angular error, and derives the O(d log(d/ε)) guarantee together with its relation to the matching lower bound. This addition will also clarify the link to the optimization results that follow. revision: yes
Referee: [§4.2] §4.2 (Normal Direction Descent): The local growth condition on the regularity radius is invoked to obtain the O(D²/ε²) iteration bound and the Õ(d D²/ε²) comparison bound, yet the manuscript does not include a concrete verification or example showing that this condition holds for a non-trivial class of convex preference relations beyond the abstract statement; without this, the claimed iteration complexity remains conditional on an unverified hypothesis.

Authors: We appreciate the referee highlighting the need for concrete verification of the local growth condition. The condition is formulated to encompass a natural family of convex preference relations, yet we recognize that an illustrative example would make the applicability of the O(D²/ε²) bound more transparent. In the revision we will include a new example (in Section 4 or an appendix) of a non-trivial convex preference relation—such as one induced by a smooth, strictly convex quadratic-like function—for which the local growth condition on the regularity radius can be verified explicitly. This will demonstrate that the stated iteration and comparison bounds are attained for a concrete, non-degenerate case. revision: yes

Circularity Check

0 steps flagged

No significant circularity

full rationale

The paper derives its complexity bounds directly from external geometric assumptions on the preference relation (regularity, convexity, and local growth condition on the regularity radius). These hypotheses are stated as prerequisites rather than being defined in terms of the target quantities. The normal-direction estimation bound O(d log(d/ε)) and the overall Õ(d D²/ε²) comparison bound are presented as consequences of the geometry, with explicit matching to lower bounds under the same assumptions. No self-definitional steps, fitted inputs renamed as predictions, or load-bearing self-citations appear in the derivation chain.

Axiom & Free-Parameter Ledger

0 free parameters · 3 axioms · 2 invented entities

The central claims rest on three domain assumptions about the preference relation that are not derived from comparisons alone.

axioms (3)

domain assumption Regularity of the preference relation in d-dimensional Euclidean space
Invoked to guarantee that normal directions can be estimated to accuracy ε with the stated number of comparisons.
domain assumption Convexity of the preference relation
Required for the normal direction descent method to make progress toward the optimal level set.
domain assumption Local growth condition on the regularity radius
Used to obtain the Õ(D²/ε²) iteration bound; without it the complexity guarantee does not hold.

invented entities (2)

level-set optimality gap no independent evidence
purpose: Distance from a preference level set to the optimal solutions
Newly defined quantity that serves as the convergence measure.
regularity radius no independent evidence
purpose: Stationarity certificate based on the geometry of preference level sets
Introduced as a new concept to certify progress from comparisons alone.

pith-pipeline@v0.9.0 · 5850 in / 1541 out tokens · 54241 ms · 2026-05-21T00:03:45.453104+00:00 · methodology

Review history (2 revisions) →

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel (J-cost uniqueness) unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

Under regularity of the preference relation … we estimate normal directions to accuracy ε using O(d log(d/ε)) comparisons … normal direction descent method reaches an ε level-set optimality gap using at most Õ(d D²/ε²) comparisons
IndisputableMonolith/Foundation/AlexanderDuality.lean alexander_duality_circle_linking (D=3 forcing) unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

regularity radius r_x … two-sided tangent balls … curvature κ_x = 1/r_x

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

44 extracted references · 44 canonical work pages · 1 internal anchor

[1]

Stochastic three points method for uncon- strained smooth minimization.SIAM Journal on Optimization, 30(4):2726–2749, 2020

El Houcine Bergou, Eduard Gorbunov, and Peter Richt´ arik. Stochastic three points method for uncon- strained smooth minimization.SIAM Journal on Optimization, 30(4):2726–2749, 2020

work page 2020
[2]

A one-bit, comparison-based gradient estimator.Applied and Computational Harmonic Analysis, 60:242–266, 2022

HanQin Cai, Daniel McKenzie, Wotao Yin, and Zhenliang Zhang. A one-bit, comparison-based gradient estimator.Applied and Computational Harmonic Analysis, 60:242–266, 2022

work page 2022
[3]

ComPO: Preference alignment via comparison oracles

Peter Chen, Xi Chen, Wotao Yin, and Tianyi Lin. ComPO: Preference alignment via comparison oracles. InAdvances in Neural Information Processing Systems, volume 38, 2025

work page 2025
[4]

Christiano, Jan Leike, Tom B

Paul F. Christiano, Jan Leike, Tom B. Brown, Miljan Martic, Shane Legg, and Dario Amodei. Deep reinforcement learning from human preferences. InAdvances in Neural Information Processing Systems, volume 30, 2017

work page 2017
[5]

Nonsmooth optimization with zeroth order comparison feedback.arXiv preprint arXiv:2602.05622, 2026

Taha El Bakkali, El Mahdi Chayti, and Omar Saadi. Nonsmooth optimization with zeroth order comparison feedback.arXiv preprint arXiv:2602.05622, 2026

work page arXiv 2026
[6]

Springer-Verlag, Berlin, Heidelberg, New York, 1969

Herbert Federer.Geometric Measure Theory, volume 153 ofGrundlehren der mathematischen Wis- senschaften. Springer-Verlag, Berlin, Heidelberg, New York, 1969

work page 1969
[7]

Preference-based rein- forcement learning: A formal framework and a policy iteration algorithm.Machine Learning, 89(1– 2):123–156, 2012

Johannes F¨ urnkranz, Eyke H¨ ullermeier, Weiwei Cheng, and Sang-Hyeun Park. Preference-based rein- forcement learning: A formal framework and a policy iteration algorithm.Machine Learning, 89(1– 2):123–156, 2012

work page 2012
[8]

Gradi- entless descent: High-dimensional zeroth-order optimization

Daniel Golovin, John Karro, Greg Kochanski, Chansoo Lee, Xingyou Song, and Qiuyi Zhang. Gradi- entless descent: High-dimensional zeroth-order optimization. InInternational Conference on Learning Representations, 2020

work page 2020
[9]

Lawrence

Javier Gonz´ alez, Zhenwen Dai, Andreas Damianou, and Neil D. Lawrence. Preferential Bayesian op- timization. InProceedings of the 34th International Conference on Machine Learning, volume 70 of Proceedings of Machine Learning Research, pages 1282–1291. PMLR, 2017

work page 2017
[10]

Royer, Lu´ ıs N

Serge Gratton, Cl´ ement W. Royer, Lu´ ıs N. Vicente, and Zaikun Zhang. Direct search based on proba- bilistic descent.SIAM Journal on Optimization, 25(3):1515–1541, 2015

work page 2015
[11]

Beyond convexity: Stochastic quasi-convex optimiza- tion

Elad Hazan, Kfir Levy, and Shai Shalev-Shwartz. Beyond convexity: Stochastic quasi-convex optimiza- tion. InAdvances in Neural Information Processing Systems, volume 28, 2015

work page 2015
[12]

Direct Search

Robert Hooke and T. A. Jeeves. “Direct Search” Solution of Numerical and Statistical Problems. Journal of the ACM, 8(2):212–229, 1961

work page 1961
[13]

Jamieson, Robert D

Kevin G. Jamieson, Robert D. Nowak, and Benjamin Recht. Query complexity of derivative-free opti- mization. InAdvances in Neural Information Processing Systems, volume 25, 2012

work page 2012
[14]

An improved cutting plane method for convex optimization, convex-concave games, and its applications

Haotian Jiang, Yin Tat Lee, Zhao Song, and Sam Chiu-wai Wong. An improved cutting plane method for convex optimization, convex-concave games, and its applications. InProceedings of the 52nd Annual ACM SIGACT Symposium on Theory of Computing, pages 944–953. ACM, 2020

work page 2020
[15]

Karabag, Cyrus Neary, and Ufuk Topcu

Mustafa O. Karabag, Cyrus Neary, and Ufuk Topcu. Smooth convex optimization using sub-zeroth-order oracles. InProceedings of the AAAI Conference on Artificial Intelligence, volume 35, pages 3815–3822, 2021. 42

work page 2021
[16]

Khachiyan

Leonid G. Khachiyan. Polynomial algorithms in linear programming.USSR Computational Mathematics and Mathematical Physics, 20(1):53–72, 1980

work page 1980
[17]

Kolda, Robert Michael Lewis, and Virginia Torczon

Tamara G. Kolda, Robert Michael Lewis, and Virginia Torczon. Optimization by direct search: New perspectives on some classical and modern methods.SIAM Review, 45(3):385–482, 2003

work page 2003
[18]

Lagarias, James A

Jeffrey C. Lagarias, James A. Reeds, Margaret H. Wright, and Paul E. Wright. Convergence properties of the Nelder–Mead simplex method in low dimensions.SIAM Journal on Optimization, 9(1):112–147, 1998

work page 1998
[19]

A faster cutting plane method and its implications for combinatorial and convex optimization

Yin Tat Lee, Aaron Sidford, and Sam Chiu-wai Wong. A faster cutting plane method and its implications for combinatorial and convex optimization. InProceedings of the 56th Annual IEEE Symposium on Foundations of Computer Science, pages 1049–1065. IEEE, 2015

work page 2015
[20]

Kfir Y. Levy. The power of normalization: Faster evasion of saddle points.arXiv preprint arXiv:1611.04831, 2016

work page internal anchor Pith review Pith/arXiv arXiv 2016
[21]

Kfir Y. Levy. Online to offline conversions, universality and adaptive minibatch sizes. InAdvances in Neural Information Processing Systems, volume 30, 2017

work page 2017
[22]

Acceleration exists! optimization prob- lems when oracle can only compare objective function values

Aleksandr Lobanov, Alexander Gasnikov, and Andrei Krasnov. Acceleration exists! optimization prob- lems when oracle can only compare objective function values. InAdvances in Neural Information Processing Systems, volume 37, pages 21428–21460, 2024

work page 2024
[23]

Ken I. M. McKinnon. Convergence of the Nelder–Mead simplex method to a nonstationary point.SIAM Journal on Optimization, 9(1):148–158, 1998

work page 1998
[24]

Leveraging non-uniformity in first-order non-convex optimization

Jincheng Mei, Yue Gao, Bo Dai, Csaba Szepesvari, and Dale Schuurmans. Leveraging non-uniformity in first-order non-convex optimization. InProceedings of the 38th International Conference on Machine Learning, volume 139 ofProceedings of Machine Learning Research, pages 7555–7564. PMLR, 2021

work page 2021
[25]

Revisiting normalized gradient descent: Fast evasion of saddle points.IEEE Transactions on Automatic Control, 64(11):4818–4824, 2019

Ryan Murray, Brian Swenson, and Soummya Kar. Revisiting normalized gradient descent: Fast evasion of saddle points.IEEE Transactions on Automatic Control, 64(11):4818–4824, 2019

work page 2019
[26]

Nelder and Roger Mead

John A. Nelder and Roger Mead. A simplex method for function minimization.The Computer Journal, 7(4):308–313, 1965

work page 1965
[27]

Kluwer Academic Publishers, 2004

Yurii Nesterov.Introductory Lectures on Convex Optimization: A Basic Course, volume 87 ofApplied Optimization. Kluwer Academic Publishers, 2004

work page 2004
[28]

Springer, Cham, 2nd edition, 2018

Yurii Nesterov.Lectures on Convex Optimization, volume 137 ofSpringer Optimization and Its Appli- cations. Springer, Cham, 2nd edition, 2018

work page 2018
[29]

Nesterov

Francesco Orabona. Normalized gradients for all.arXiv preprint arXiv:2308.05621, 2023

work page arXiv 2023
[30]

Coin betting and parameter-free online learning

Francesco Orabona and D´ avid P´ al. Coin betting and parameter-free online learning. InAdvances in Neural Information Processing Systems, volume 29, 2016

work page 2016
[31]

Long Ouyang, Jeffrey Wu, Xu Jiang, Diogo Almeida, Carroll L. Wainwright, Pamela Mishkin, Chong Zhang, Sandhini Agarwal, Katarina Slama, Alex Ray, John Schulman, Jacob Hilton, Fraser Kelton, Luke Miller, Maddie Simens, Amanda Askell, Peter Welinder, Paul Christiano, Jan Leike, and Ryan Lowe. Training language models to follow instructions with human feedba...

work page 2022
[32]

Riemannian dueling optimization.arXiv preprint arXiv:2603.00023, 2026

Yuxuan Ren, Abhishek Roy, and Shiqian Ma. Riemannian dueling optimization.arXiv preprint arXiv:2603.00023, 2026

work page arXiv 2026
[33]

Faster convergence with multiway preferences

Aadirupa Saha, Vitaly Feldman, Yishay Mansour, and Tomer Koren. Faster convergence with multiway preferences. InInternational Conference on Artificial Intelligence and Statistics, pages 433–441. PMLR, 2024

work page 2024
[34]

Dueling convex optimization

Aadirupa Saha, Tomer Koren, and Yishay Mansour. Dueling convex optimization. InProceedings of the 38th International Conference on Machine Learning, volume 139 ofProceedings of Machine Learning Research, pages 9245–9254. PMLR, 2021. 43

work page 2021
[35]

Dueling convex optimization with general pref- erences

Aadirupa Saha, Tomer Koren, and Yishay Mansour. Dueling convex optimization with general pref- erences. InProceedings of the 42nd International Conference on Machine Learning, volume 267 of Proceedings of Machine Learning Research, pages 52552–52564. PMLR, 2025

work page 2025
[36]

Springer, Berlin, Heidelberg, 1985

Naum Zuselevich Shor.Minimization Methods for Non-Differentiable Functions, volume 3 ofSpringer Series in Computational Mathematics. Springer, Berlin, Heidelberg, 1985

work page 1985
[37]

Zeroth-order optimization meets human feedback: Provable learning via ranking oracles

Zhiwei Tang, Dmitry Rybin, and Tsung-Hui Chang. Zeroth-order optimization meets human feedback: Provable learning via ranking oracles. InInternational Conference on Learning Representations, 2024

work page 2024
[39]

On the convergence of pattern search algorithms.SIAM Journal on Optimization, 7(1):1–25, 1997

Virginia Torczon. On the convergence of pattern search algorithms.SIAM Journal on Optimization, 7(1):1–25, 1997

work page 1997
[40]

Burdick, and Aaron D

Maegan Tucker, Ellen Novoseller, Claudia Kann, Yanan Sui, Yisong Yue, Joel W. Burdick, and Aaron D. Ames. Preference-based learning for exoskeleton gait optimization. InProceedings of the IEEE Inter- national Conference on Robotics and Automation, pages 2351–2357, 2020

work page 2020
[41]

Lu´ ıs N. Vicente. Worst case complexity of direct search.EURO Journal on Computational Optimization, 1(1–2):143–153, 2013

work page 2013
[42]

The K-armed dueling bandits problem.Journal of Computer and System Sciences, 78(5):1538–1556, 2012

Yisong Yue, Josef Broder, Robert Kleinberg, and Thorsten Joachims. The K-armed dueling bandits problem.Journal of Computer and System Sciences, 78(5):1538–1556, 2012

work page 2012
[43]

Interactively optimizing information retrieval systems as a dueling bandits problem

Yisong Yue and Thorsten Joachims. Interactively optimizing information retrieval systems as a dueling bandits problem. InProceedings of the 26th International Conference on Machine Learning, pages 1201–1208, 2009

work page 2009
[44]

Comparisons are all you need for optimizing smooth functions.arXiv preprint arXiv:2405.11454v1 (Version 1), 2024

Chenyi Zhang and Tongyang Li. Comparisons are all you need for optimizing smooth functions.arXiv preprint arXiv:2405.11454v1 (Version 1), 2024

work page arXiv 2024
[45]

Zixi Zhao, Aliya Ahmadi, Caleb Hoover, Logan Grado, Nicholas Peterson, Xinran Wang, David Free- man, Thomas Murray, Andrew Lamperski, David Darrow, and Theoden I. Netoff. Optimization of spinal cord stimulation using Bayesian preference learning and its validation.IEEE Transactions on Neural Systems and Rehabilitation Engineering, 29:1987–1997, 2021. 44

work page 1987

[1] [1]

Stochastic three points method for uncon- strained smooth minimization.SIAM Journal on Optimization, 30(4):2726–2749, 2020

El Houcine Bergou, Eduard Gorbunov, and Peter Richt´ arik. Stochastic three points method for uncon- strained smooth minimization.SIAM Journal on Optimization, 30(4):2726–2749, 2020

work page 2020

[2] [2]

A one-bit, comparison-based gradient estimator.Applied and Computational Harmonic Analysis, 60:242–266, 2022

HanQin Cai, Daniel McKenzie, Wotao Yin, and Zhenliang Zhang. A one-bit, comparison-based gradient estimator.Applied and Computational Harmonic Analysis, 60:242–266, 2022

work page 2022

[3] [3]

ComPO: Preference alignment via comparison oracles

Peter Chen, Xi Chen, Wotao Yin, and Tianyi Lin. ComPO: Preference alignment via comparison oracles. InAdvances in Neural Information Processing Systems, volume 38, 2025

work page 2025

[4] [4]

Christiano, Jan Leike, Tom B

Paul F. Christiano, Jan Leike, Tom B. Brown, Miljan Martic, Shane Legg, and Dario Amodei. Deep reinforcement learning from human preferences. InAdvances in Neural Information Processing Systems, volume 30, 2017

work page 2017

[5] [5]

Nonsmooth optimization with zeroth order comparison feedback.arXiv preprint arXiv:2602.05622, 2026

Taha El Bakkali, El Mahdi Chayti, and Omar Saadi. Nonsmooth optimization with zeroth order comparison feedback.arXiv preprint arXiv:2602.05622, 2026

work page arXiv 2026

[6] [6]

Springer-Verlag, Berlin, Heidelberg, New York, 1969

Herbert Federer.Geometric Measure Theory, volume 153 ofGrundlehren der mathematischen Wis- senschaften. Springer-Verlag, Berlin, Heidelberg, New York, 1969

work page 1969

[7] [7]

Preference-based rein- forcement learning: A formal framework and a policy iteration algorithm.Machine Learning, 89(1– 2):123–156, 2012

Johannes F¨ urnkranz, Eyke H¨ ullermeier, Weiwei Cheng, and Sang-Hyeun Park. Preference-based rein- forcement learning: A formal framework and a policy iteration algorithm.Machine Learning, 89(1– 2):123–156, 2012

work page 2012

[8] [8]

Gradi- entless descent: High-dimensional zeroth-order optimization

Daniel Golovin, John Karro, Greg Kochanski, Chansoo Lee, Xingyou Song, and Qiuyi Zhang. Gradi- entless descent: High-dimensional zeroth-order optimization. InInternational Conference on Learning Representations, 2020

work page 2020

[9] [9]

Lawrence

Javier Gonz´ alez, Zhenwen Dai, Andreas Damianou, and Neil D. Lawrence. Preferential Bayesian op- timization. InProceedings of the 34th International Conference on Machine Learning, volume 70 of Proceedings of Machine Learning Research, pages 1282–1291. PMLR, 2017

work page 2017

[10] [10]

Royer, Lu´ ıs N

Serge Gratton, Cl´ ement W. Royer, Lu´ ıs N. Vicente, and Zaikun Zhang. Direct search based on proba- bilistic descent.SIAM Journal on Optimization, 25(3):1515–1541, 2015

work page 2015

[11] [11]

Beyond convexity: Stochastic quasi-convex optimiza- tion

Elad Hazan, Kfir Levy, and Shai Shalev-Shwartz. Beyond convexity: Stochastic quasi-convex optimiza- tion. InAdvances in Neural Information Processing Systems, volume 28, 2015

work page 2015

[12] [12]

Direct Search

Robert Hooke and T. A. Jeeves. “Direct Search” Solution of Numerical and Statistical Problems. Journal of the ACM, 8(2):212–229, 1961

work page 1961

[13] [13]

Jamieson, Robert D

Kevin G. Jamieson, Robert D. Nowak, and Benjamin Recht. Query complexity of derivative-free opti- mization. InAdvances in Neural Information Processing Systems, volume 25, 2012

work page 2012

[14] [14]

An improved cutting plane method for convex optimization, convex-concave games, and its applications

Haotian Jiang, Yin Tat Lee, Zhao Song, and Sam Chiu-wai Wong. An improved cutting plane method for convex optimization, convex-concave games, and its applications. InProceedings of the 52nd Annual ACM SIGACT Symposium on Theory of Computing, pages 944–953. ACM, 2020

work page 2020

[15] [15]

Karabag, Cyrus Neary, and Ufuk Topcu

Mustafa O. Karabag, Cyrus Neary, and Ufuk Topcu. Smooth convex optimization using sub-zeroth-order oracles. InProceedings of the AAAI Conference on Artificial Intelligence, volume 35, pages 3815–3822, 2021. 42

work page 2021

[16] [16]

Khachiyan

Leonid G. Khachiyan. Polynomial algorithms in linear programming.USSR Computational Mathematics and Mathematical Physics, 20(1):53–72, 1980

work page 1980

[17] [17]

Kolda, Robert Michael Lewis, and Virginia Torczon

Tamara G. Kolda, Robert Michael Lewis, and Virginia Torczon. Optimization by direct search: New perspectives on some classical and modern methods.SIAM Review, 45(3):385–482, 2003

work page 2003

[18] [18]

Lagarias, James A

Jeffrey C. Lagarias, James A. Reeds, Margaret H. Wright, and Paul E. Wright. Convergence properties of the Nelder–Mead simplex method in low dimensions.SIAM Journal on Optimization, 9(1):112–147, 1998

work page 1998

[19] [19]

A faster cutting plane method and its implications for combinatorial and convex optimization

Yin Tat Lee, Aaron Sidford, and Sam Chiu-wai Wong. A faster cutting plane method and its implications for combinatorial and convex optimization. InProceedings of the 56th Annual IEEE Symposium on Foundations of Computer Science, pages 1049–1065. IEEE, 2015

work page 2015

[20] [20]

Kfir Y. Levy. The power of normalization: Faster evasion of saddle points.arXiv preprint arXiv:1611.04831, 2016

work page internal anchor Pith review Pith/arXiv arXiv 2016

[21] [21]

Kfir Y. Levy. Online to offline conversions, universality and adaptive minibatch sizes. InAdvances in Neural Information Processing Systems, volume 30, 2017

work page 2017

[22] [22]

Acceleration exists! optimization prob- lems when oracle can only compare objective function values

Aleksandr Lobanov, Alexander Gasnikov, and Andrei Krasnov. Acceleration exists! optimization prob- lems when oracle can only compare objective function values. InAdvances in Neural Information Processing Systems, volume 37, pages 21428–21460, 2024

work page 2024

[23] [23]

Ken I. M. McKinnon. Convergence of the Nelder–Mead simplex method to a nonstationary point.SIAM Journal on Optimization, 9(1):148–158, 1998

work page 1998

[24] [24]

Leveraging non-uniformity in first-order non-convex optimization

Jincheng Mei, Yue Gao, Bo Dai, Csaba Szepesvari, and Dale Schuurmans. Leveraging non-uniformity in first-order non-convex optimization. InProceedings of the 38th International Conference on Machine Learning, volume 139 ofProceedings of Machine Learning Research, pages 7555–7564. PMLR, 2021

work page 2021

[25] [25]

Revisiting normalized gradient descent: Fast evasion of saddle points.IEEE Transactions on Automatic Control, 64(11):4818–4824, 2019

Ryan Murray, Brian Swenson, and Soummya Kar. Revisiting normalized gradient descent: Fast evasion of saddle points.IEEE Transactions on Automatic Control, 64(11):4818–4824, 2019

work page 2019

[26] [26]

Nelder and Roger Mead

John A. Nelder and Roger Mead. A simplex method for function minimization.The Computer Journal, 7(4):308–313, 1965

work page 1965

[27] [27]

Kluwer Academic Publishers, 2004

Yurii Nesterov.Introductory Lectures on Convex Optimization: A Basic Course, volume 87 ofApplied Optimization. Kluwer Academic Publishers, 2004

work page 2004

[28] [28]

Springer, Cham, 2nd edition, 2018

Yurii Nesterov.Lectures on Convex Optimization, volume 137 ofSpringer Optimization and Its Appli- cations. Springer, Cham, 2nd edition, 2018

work page 2018

[29] [29]

Nesterov

Francesco Orabona. Normalized gradients for all.arXiv preprint arXiv:2308.05621, 2023

work page arXiv 2023

[30] [30]

Coin betting and parameter-free online learning

Francesco Orabona and D´ avid P´ al. Coin betting and parameter-free online learning. InAdvances in Neural Information Processing Systems, volume 29, 2016

work page 2016

[31] [31]

Long Ouyang, Jeffrey Wu, Xu Jiang, Diogo Almeida, Carroll L. Wainwright, Pamela Mishkin, Chong Zhang, Sandhini Agarwal, Katarina Slama, Alex Ray, John Schulman, Jacob Hilton, Fraser Kelton, Luke Miller, Maddie Simens, Amanda Askell, Peter Welinder, Paul Christiano, Jan Leike, and Ryan Lowe. Training language models to follow instructions with human feedba...

work page 2022

[32] [32]

Riemannian dueling optimization.arXiv preprint arXiv:2603.00023, 2026

Yuxuan Ren, Abhishek Roy, and Shiqian Ma. Riemannian dueling optimization.arXiv preprint arXiv:2603.00023, 2026

work page arXiv 2026

[33] [33]

Faster convergence with multiway preferences

Aadirupa Saha, Vitaly Feldman, Yishay Mansour, and Tomer Koren. Faster convergence with multiway preferences. InInternational Conference on Artificial Intelligence and Statistics, pages 433–441. PMLR, 2024

work page 2024

[34] [34]

Dueling convex optimization

Aadirupa Saha, Tomer Koren, and Yishay Mansour. Dueling convex optimization. InProceedings of the 38th International Conference on Machine Learning, volume 139 ofProceedings of Machine Learning Research, pages 9245–9254. PMLR, 2021. 43

work page 2021

[35] [35]

Dueling convex optimization with general pref- erences

Aadirupa Saha, Tomer Koren, and Yishay Mansour. Dueling convex optimization with general pref- erences. InProceedings of the 42nd International Conference on Machine Learning, volume 267 of Proceedings of Machine Learning Research, pages 52552–52564. PMLR, 2025

work page 2025

[36] [36]

Springer, Berlin, Heidelberg, 1985

Naum Zuselevich Shor.Minimization Methods for Non-Differentiable Functions, volume 3 ofSpringer Series in Computational Mathematics. Springer, Berlin, Heidelberg, 1985

work page 1985

[37] [37]

Zeroth-order optimization meets human feedback: Provable learning via ranking oracles

Zhiwei Tang, Dmitry Rybin, and Tsung-Hui Chang. Zeroth-order optimization meets human feedback: Provable learning via ranking oracles. InInternational Conference on Learning Representations, 2024

work page 2024

[38] [39]

On the convergence of pattern search algorithms.SIAM Journal on Optimization, 7(1):1–25, 1997

Virginia Torczon. On the convergence of pattern search algorithms.SIAM Journal on Optimization, 7(1):1–25, 1997

work page 1997

[39] [40]

Burdick, and Aaron D

Maegan Tucker, Ellen Novoseller, Claudia Kann, Yanan Sui, Yisong Yue, Joel W. Burdick, and Aaron D. Ames. Preference-based learning for exoskeleton gait optimization. InProceedings of the IEEE Inter- national Conference on Robotics and Automation, pages 2351–2357, 2020

work page 2020

[40] [41]

Lu´ ıs N. Vicente. Worst case complexity of direct search.EURO Journal on Computational Optimization, 1(1–2):143–153, 2013

work page 2013

[41] [42]

The K-armed dueling bandits problem.Journal of Computer and System Sciences, 78(5):1538–1556, 2012

Yisong Yue, Josef Broder, Robert Kleinberg, and Thorsten Joachims. The K-armed dueling bandits problem.Journal of Computer and System Sciences, 78(5):1538–1556, 2012

work page 2012

[42] [43]

Interactively optimizing information retrieval systems as a dueling bandits problem

Yisong Yue and Thorsten Joachims. Interactively optimizing information retrieval systems as a dueling bandits problem. InProceedings of the 26th International Conference on Machine Learning, pages 1201–1208, 2009

work page 2009

[43] [44]

Comparisons are all you need for optimizing smooth functions.arXiv preprint arXiv:2405.11454v1 (Version 1), 2024

Chenyi Zhang and Tongyang Li. Comparisons are all you need for optimizing smooth functions.arXiv preprint arXiv:2405.11454v1 (Version 1), 2024

work page arXiv 2024

[44] [45]

Zixi Zhao, Aliya Ahmadi, Caleb Hoover, Logan Grado, Nicholas Peterson, Xinran Wang, David Free- man, Thomas Murray, Andrew Lamperski, David Darrow, and Theoden I. Netoff. Optimization of spinal cord stimulation using Bayesian preference learning and its validation.IEEE Transactions on Neural Systems and Rehabilitation Engineering, 29:1987–1997, 2021. 44

work page 1987