From Cursed to Competitive: Closing the ZO-FO Gap via Input-to-State Stability
Pith reviewed 2026-05-07 15:45 UTC · model grok-4.3
The pith
Zeroth-order methods match first-order convergence rates in expectation when their averaged dynamics satisfy input-to-state stability.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Under several conditions, ZO methods do not, in expectation, suffer from extra dimension dependencies in their convergence rates relative to their FO counterparts. The averaged iteration of a ZO algorithm can be formulated as the iteration of its FO counterpart plus bounded perturbations whose magnitudes depend only on design parameters. Input-to-state stability then implies that ZO methods follow the same decay rate as their FO counterparts and converge to a neighbourhood of the FO fixed point whose radius depends on the bound on the perturbation norm, which can be made arbitrarily small.
What carries the argument
Input-to-state stability applied to the dynamical system formed by expressing the averaged zeroth-order iteration as the first-order iteration plus a bounded perturbation that depends only on algorithm parameters.
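As a toy illustration of that decomposition (an assumed example, not taken from the paper): for f(x) = cos(aᵀx), Gaussian smoothing has a closed form, so the averaged ZO gradient step splits exactly into the FO step plus a perturbation whose norm is bounded by a quantity depending only on the design parameters α (step size) and μ (smoothing radius).

```python
import numpy as np

# Toy objective with closed-form Gaussian smoothing (illustrative only):
#   f(x) = cos(a.x)  =>  f_mu(x) = E_u[f(x + mu*u)] = exp(-mu^2*||a||^2/2) cos(a.x)
# so grad f_mu(x) = exp(-mu^2*||a||^2/2) grad f(x), and the averaged ZO step
#   x - alpha*grad f_mu(x) = (x - alpha*grad f(x)) + w(x)
# with ||w(x)|| <= alpha * (1 - exp(-mu^2/2)) <= alpha*mu^2/2 when ||a|| = 1:
# a bound that depends only on the design parameters alpha and mu.

d = 50
a = np.ones(d) / np.sqrt(d)            # unit-norm direction, ||a|| = 1
x = 0.3 * np.ones(d)
alpha, mu = 0.5, 0.1

grad_f = -np.sin(a @ x) * a            # exact FO gradient
grad_fmu = np.exp(-mu**2 / 2) * grad_f # exact averaged ZO gradient
w = alpha * (grad_f - grad_fmu)        # perturbation entering the averaged step

bound = alpha * mu**2 / 2              # uses 1 - exp(-t) <= t
print(np.linalg.norm(w), bound)
```

The decomposition is exact here because the smoothed gradient is available in closed form; for general objectives only the bound on the perturbation survives.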
If this is right
- ZO algorithms inherit the same linear or sublinear rate as their FO versions once the transient due to the perturbation dies out.
- The final error ball around the FO solution can be driven below any tolerance by tightening the perturbation bound through parameter selection.
- Convergence guarantees already proved for FO methods extend immediately to the corresponding ZO methods up to the controllable neighborhood.
- The result holds in expectation over the random perturbations introduced by finite-difference or random-direction estimators.
- The same stability argument applies to any pair of ZO and FO recursions whose difference can be bounded uniformly in the state.
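The first two bullets can be checked on a toy perturbed recursion (all constants assumed for illustration): FO gradient descent on a strongly convex quadratic contracts at rate ρ, and injecting a norm-bounded disturbance leaves that decay rate intact while the iterates settle into a ball whose radius is proportional to the disturbance bound.

```python
import numpy as np

# FO-GD on f(x) = 0.5 x^T A x contracts at rate rho = (L - m)/(L + m).
# Adding any disturbance with ||w_k|| <= delta gives the ISS-style bound
#   ||x_k|| <= rho^k ||x_0|| + step*delta*(1 - rho^k)/(1 - rho).
rng = np.random.default_rng(0)
d, m, L = 20, 1.0, 10.0
A = np.diag(np.linspace(m, L, d))
step = 2.0 / (m + L)
rho = (L - m) / (L + m)

delta = 1e-3
x = np.ones(d)
errs = []
for k in range(400):
    w = rng.standard_normal(d)
    w *= delta / np.linalg.norm(w)      # disturbance of norm exactly delta
    x = x - step * (A @ x + w)          # FO step + bounded perturbation
    errs.append(np.linalg.norm(x))

radius = step * delta / (1.0 - rho)     # asymptotic ISS neighbourhood
```

After the transient decays at the FO rate ρ, the error plateaus inside the ball of radius `radius`, which shrinks linearly with `delta`.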
Where Pith is reading between the lines
- The framework could be used to certify robustness of ZO methods against other structured disturbances such as adversarial noise or model mismatch.
- Parameter tuning rules that minimize the perturbation bound might be derived directly from the ISS gain function rather than by trial and error.
- The approach suggests that similar perturbation-plus-stability arguments could close gaps between other derivative-free methods and their gradient-based analogues.
- Extending the analysis from deterministic to stochastic or non-smooth settings would require checking whether the perturbation bound remains uniform under those relaxations.
Load-bearing premise
The averaged trajectory of the zeroth-order algorithm can be written exactly as the first-order trajectory plus a perturbation whose magnitude is bounded by quantities that depend only on design parameters, and the resulting closed-loop system is input-to-state stable.
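In symbols (notation assumed here, not taken from the paper), the premise amounts to:

```latex
% Averaged ZO iterate as a perturbed FO iterate:
\bar{x}_{k+1} = G(\bar{x}_k) + w_k, \qquad \|w_k\| \le \delta(\mu, \alpha),
% where G is the FO update map and \delta depends only on design
% parameters (smoothing radius \mu, step size \alpha). ISS then gives
\|\bar{x}_k - x^\star\| \le \beta\bigl(\|\bar{x}_0 - x^\star\|, k\bigr) + \gamma\bigl(\delta(\mu, \alpha)\bigr),
% with \beta a class-KL function (the same decay as the FO method) and
% \gamma a class-K gain, so the limiting radius \gamma(\delta) shrinks
% as the design parameters are tightened.
```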
What would settle it
A high-dimensional quadratic test problem in which the observed convergence rate of a standard zeroth-order gradient estimator deviates from the first-order rate by more than the predicted perturbation radius after the transient has died out, with the deviation growing with dimension.
Original abstract
While it is generally understood that zeroth-order (ZO) algorithms have an extra dependency on their number of iterations for any choice of parameters, compared to their first-order (FO) counterparts, in this work, we show that under several conditions, in expectation, ZO methods do not suffer from extra dimension dependencies in their convergence rates with respect to their FO counterparts. We look at optimisation algorithms from the dynamical systems perspective and analyse the conditions under which one can formulate the average of a ZO algorithm as the average of its FO counterpart with bounded perturbations with values dependent on design parameters. Then, using input-to-state stability properties, we show ZO methods follow the same decay rate as their FO counterparts and converge to a neighbourhood of the fixed point of FO methods, where its radius depends on the bound of the norm of the perturbations, which can be made arbitrarily small. The theoretical findings are illustrated via numerical examples.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims that, under several conditions, zeroth-order (ZO) optimization algorithms do not incur extra dimension dependencies in their expected convergence rates relative to first-order (FO) counterparts. Viewing the algorithms through a dynamical-systems lens, the averaged ZO iteration is expressed as the corresponding FO iteration plus a perturbation whose norm is bounded solely by design parameters (e.g., smoothing radius). Input-to-state stability (ISS) of the averaged FO system is then invoked to conclude that ZO trajectories follow the same decay rate and converge to a neighborhood of the FO fixed point whose radius can be made arbitrarily small by tightening the perturbation bound. Numerical examples are provided to illustrate the findings.
Significance. If the stated conditions hold and the perturbation bound is shown to be dimension-independent, the result would be significant: it offers a principled way to close the well-known ZO-FO gap in iteration complexity for high-dimensional problems where gradients are unavailable. The dynamical-systems/ISS framework is a strength and could generalize beyond the algorithms treated. The work earns credit for attempting a parameter-dependent perturbation analysis rather than relying on ad-hoc bounds.
major comments (2)
- [Abstract] The claim that ZO methods 'do not suffer from extra dimension dependencies' is load-bearing, yet the abstract supplies neither the explicit conditions nor the bias derivation showing that the perturbation norm depends only on design parameters and is free of residual d-dependence; without these steps the central assertion cannot be verified.
- [Dynamical-systems formulation] The weakest assumption (that the averaged ZO map equals the FO map plus a bounded perturbation) must be derived explicitly for each algorithm considered; the manuscript should supply the bias calculation (presumably in the dynamical-systems section) and confirm that the bound contains no hidden dimension factors.
minor comments (2)
- [Abstract] The abstract would be clearer if it named the concrete ZO algorithms (e.g., ZO-GD, ZO-SGD) to which the ISS argument is applied.
- [Numerical examples] Numerical examples should report both the observed convergence rate and the size of the limiting neighborhood as functions of the smoothing radius, allowing direct comparison with the ISS-derived bound.
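The kind of report the second comment asks for can be prototyped on a toy problem (all constants assumed, not the paper's experiments): run the averaged ZO dynamics, which are computable in closed form for a smoothed cosine-plus-quadratic objective, and record the limiting distance to the FO fixed point as the smoothing radius varies.

```python
import numpy as np

# Toy objective f(x) = cos(a.x + 1) + (b/2)||x||^2 with b > ||a||^2 = 1
# (strongly convex). Gaussian smoothing is closed-form here, so the
# *averaged* ZO dynamics are exactly GD on f_mu, with
#   grad f_mu(x) = -exp(-mu^2/2) sin(a.x + 1) a + b x.
d, b = 10, 2.0
a = np.ones(d) / np.sqrt(d)

def run_gd(mu, iters=2000, step=0.3):
    damp = np.exp(-mu**2 / 2.0)
    x = np.ones(d)
    for _ in range(iters):
        x -= step * (-damp * np.sin(a @ x + 1.0) * a + b * x)
    return x

x_fo = run_gd(mu=0.0)   # FO fixed point (mu = 0 recovers plain GD on f)
radii = {mu: np.linalg.norm(run_gd(mu) - x_fo) for mu in (0.05, 0.2, 0.8)}
```

Tabulating `radii` against μ gives exactly the neighbourhood-size report the comment requests, and the values shrink toward zero as μ does.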
Simulated Author's Rebuttal
We thank the referee for the constructive comments. We agree that the abstract should more explicitly state the conditions and that the bias derivations need to be supplied in full detail in the dynamical-systems section. We will make these revisions to strengthen the presentation of our results.
Point-by-point responses
- Referee: [Abstract] The claim that ZO methods 'do not suffer from extra dimension dependencies' is load-bearing, yet the abstract supplies neither the explicit conditions nor the bias derivation showing that the perturbation norm depends only on design parameters and is free of residual d-dependence; without these steps the central assertion cannot be verified.
Authors: We acknowledge the point. While the abstract refers to 'several conditions' and 'bounded perturbations with values dependent on design parameters,' it does not enumerate the conditions or outline the bias derivation. In the revised version, we will modify the abstract to list the primary conditions (Lipschitz smoothness of the objective function, a fixed positive smoothing radius, and taking expectation with respect to the random directions in the ZO estimator) and indicate that the perturbation bound is obtained from the expectation of the ZO gradient estimator bias, which is independent of the dimension d. The explicit bias derivation will be included in the main text. revision: yes
- Referee: [Dynamical-systems formulation] The weakest assumption (that the averaged ZO map equals the FO map plus a bounded perturbation) must be derived explicitly for each algorithm considered; the manuscript should supply the bias calculation (presumably in the dynamical-systems section) and confirm that the bound contains no hidden dimension factors.
Authors: We agree that this assumption is central and should be derived explicitly. The manuscript currently presents the averaged ZO iteration as the FO iteration plus a perturbation in the dynamical-systems formulation, but we will add a new subsection with the complete calculation of the bias for each algorithm (starting with the standard Gaussian smoothing ZO gradient). Using a Taylor expansion of the objective, we derive that the expected perturbation is bounded by a constant times the smoothing radius, where the constant depends on the gradient Lipschitz constant but not on the dimension. We will include the proof that there are no hidden d factors and state the precise bound. This will be done for all algorithms analyzed in the paper. revision: yes
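One standard route to a dimension-free bias constant, sketched here with ball smoothing (whether this matches the paper's estimator is an assumption), is:

```latex
% Ball smoothing (sketch; possibly different from the paper's estimator):
% f_\mu(x) = \mathbb{E}_{v \sim \mathrm{Unif}(B_1)}\,[f(x + \mu v)], so that
\nabla f_\mu(x) = \mathbb{E}_v[\nabla f(x + \mu v)],
% and L-smoothness of f gives a bias bound with no dimension factor:
\|\nabla f_\mu(x) - \nabla f(x)\|
  \le \mathbb{E}_v \|\nabla f(x + \mu v) - \nabla f(x)\|
  \le L \mu \, \mathbb{E}_v\|v\| \le L\mu .
% The perturbation in the averaged iterate is then
% \|w_k\| = \alpha \|\nabla f(x_k) - \nabla f_\mu(x_k)\| \le \alpha L \mu,
% i.e. a constant times the smoothing radius.
```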
Circularity Check
No significant circularity
Full rationale
The derivation formulates the expected ZO iteration as the FO iteration plus a perturbation whose bound depends only on design parameters (smoothing radius etc.), then invokes standard external ISS theorems to conclude that the ZO trajectory tracks the FO fixed point within a radius controlled by that bound. No step equates a derived quantity to its own input by definition, renames a fitted parameter as a prediction, or relies on a self-citation whose content is itself unverified. The ISS property is an independent dynamical-systems fact; the bias bound is derived from the smoothing operator and does not presuppose the convergence claim. The chain is therefore self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption The expected ZO update equals the FO update plus a perturbation whose norm is bounded by a quantity that depends only on design parameters.
- domain assumption The resulting dynamical system satisfies input-to-state stability with respect to the perturbation input.
Reference graph
Works this paper leans on
- [1] [FPB+25b] Amir Ali Farzin, Yuen-Man Pun, Philipp Braun, Tyler Summers, and Iman Shames. Minimisation of submodular functions using Gaussian zeroth-order random oracles. arXiv preprint arXiv:2510.15257.
- [2] [FPB+26] Amir Ali Farzin, Yuen-Man Pun, Philipp Braun, Tyler Summers, and Iman Shames. Solving the offline and online min-max problem of non-smooth submodular-concave functions: A zeroth-order approach. arXiv preprint arXiv:2601.21243.
- [3] [FPBS25a] Amir Ali Farzin, Yuen-Man Pun, Philipp Braun, and Iman Shames. Minimisation of quasar-convex functions using random zeroth-order oracles. arXiv preprint arXiv:2505.02281.
- [4] [FPBS26] Amir Ali Farzin, Yuen-Man Pun, Philipp Braun, and Iman Shames. On the stability connection between discrete-time algorithms and their resolution ODEs: Applications to min-max optimisation. arXiv preprint arXiv:2603.01430.
- [5] [SHC+17] Tim Salimans, Jonathan Ho, Xi Chen, Szymon Sidor, and Ilya Sutskever. Evolution strategies as a scalable alternative to reinforcement learning. arXiv preprint arXiv:1703.03864.
- [6] [Son08] Eduardo D. Sontag. Input to state stability: Basic concepts and results. In Nonlinear and Optimal Control Theory: Lectures Given at the C.I.M.E. Summer School Held in Cetraro, Italy, June 19–29, 2004, pages 163–220. Springer, 2008.
- [7] [WBMR20] Zhongruo Wang, Krishnakumar Balasubramanian, Shiqian Ma, and Meisam Razaviyayn. Zeroth-order algorithms for nonconvex minimax problems with improved complexities. arXiv preprint arXiv:2001.07819.
- [8] [YHF+18] Haishan Ye, Zhichao Huang, Cong Fang, Chris Junchi Li, and Tong Zhang. Hessian-aware zeroth-order optimization for black-box adversarial attack. arXiv preprint arXiv:1812.11377.