pith. sign in

arxiv: 2605.31346 · v1 · pith:6Z5DC6Y6new · submitted 2026-05-29 · 🧮 math.OC · cs.LG

Wall-Clock Complexity for Zeroth-Order Optimization with Tunable Oracle Fidelity

Pith reviewed 2026-06-28 21:18 UTC · model grok-4.3

classification 🧮 math.OC cs.LG
keywords zeroth-order optimizationwall-clock complexitytunable fidelityblack-box optimizationaccelerated methodsoracle noise modelsconstant fidelity
0
0 comments X

The pith

In zeroth-order optimization with tunable-fidelity oracles, accelerated methods can take more wall-clock time than non-accelerated methods to reach a target accuracy.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that when the accuracy of each black-box query can be adjusted at a computational cost, the total time to optimize depends on choosing both the algorithm and the fidelity levels carefully. A sympathetic reader would care because many real-world problems rely on expensive simulations where faster convergence in iterations does not always mean faster in actual time. The work derives explicit rules for when to use acceleration versus simpler methods and when to keep fidelity constant across queries.

Core claim

The authors establish an accuracy-aware wall-clock model in which each query at fidelity δ costs c(δ), and the goal is to minimize total time subject to reaching a target accuracy. They demonstrate that accelerated zeroth-order methods can be wall-clock inferior to non-accelerated schemes. They further characterize the conditions under which a constant fidelity strategy is optimal in the Big-O sense, providing a way to convert standard convergence guarantees into practical recommendations for fidelity and batch sizes.

What carries the argument

The accuracy-aware wall-clock complexity model that accounts for query cost c(δ) as a function of fidelity δ and translates convergence rates into total time bounds.

If this is right

  • Accelerated methods may require higher total wall-clock time than non-accelerated ones for the same accuracy target.
  • Constant fidelity across all queries is optimal in the Big-O sense under specific conditions on the cost and noise models.
  • The optimal algorithmic parameters, including batch sizes and fidelities, are explicitly determined by the oracle type and noise model.
  • The framework allows direct translation of iteration complexity into wall-clock recommendations.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • In practice this could mean testing non-accelerated methods first when simulation costs increase sharply with accuracy.
  • The model might extend to adaptive fidelity choices that change during optimization, though the paper focuses on fixed strategies.
  • It raises the question of how to estimate the cost function c(δ) in real applications without prior knowledge.

Load-bearing premise

The cost function relating fidelity to computational cost and the specific noise model are known in advance and follow the functional forms assumed in the derivations.

What would settle it

Measuring the actual total simulation time for both accelerated and non-accelerated methods on a tunable-fidelity simulator and finding that the accelerated method always finishes faster would falsify the claim of wall-clock inferiority.

Figures

Figures reproduced from arXiv: 2605.31346 by Alexander Gasnikov, Alexandra Suvorikova, Anastasia Antsiferova, Artem Vasin, Darina Dvinskikh, Georgii Bychkov, Igor Pavlov.

Figure 1
Figure 1. Figure 1: Convergence of the accelerated ZO-SGD on the synthetic supervised-PageRank instance. Left: trailing-mean [PITH_FULL_IMAGE:figures/full_fig_p012_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: Wall-clock cost vs. fidelity δ at γ = 2, ε = 0.05. At small δ the cost scales as δ −2 ; the optimum sits just below the divergence threshold, matching the boundary behaviour of (19). Larger B shifts the threshold right. Experiment 3 (flat batching at γ = 2). Fixing δ ∈ {0.02, 0.1, 0.5} and sweeping B ∈ {1, . . . , 32} (with the step size scaled by ηB ∝ 1/(δ/√ B)) yields [PITH_FULL_IMAGE:figures/full_fig_p… view at source ↗
Figure 3
Figure 3. Figure 3: Effect of batch size B at three fidelities, γ = 2. At δ = 0.02 cost grows linearly in B. At δ = 0.5 (boundary), B = 1 does not converge (no marker), and cost is flat-to-helpful for B ∈ {2, 4, 8, 16} — overbatching at the predicted boundary. 0.5 1.0 1.5 2.0 2.5 3.0 oracle-cost exponent 3 × 10 3 4 × 10 3 jointly-o ptim al cost T ( ) T vs.\ ( = 0.05) theory threshold = 2 0.5 1.0 1.5 2.0 2.5 3.0 oracle-cost ex… view at source ↗
Figure 4
Figure 4. Figure 4: Phase transition in γ. Left: jointly optimized cost T ⋆ (γ). Centre: δ ⋆ migrates from ≈ 0.03 at γ = 0.5 to 1 as γ crosses the threshold. Right: B⋆ jumps from 1 to 4 at the same transition (overbatching). 7 Discussion and limitations Discussion. Our framework reveals that the oracle-cost exponent γ behaves differently depending on the noise model. Under adversarial noise, it acts as a template selector: a … view at source ↗
Figure 5
Figure 5. Figure 5: Simple custom CNN, MNIST, batch size – 2048 [PITH_FULL_IMAGE:figures/full_fig_p015_5.png] view at source ↗
Figure 6
Figure 6. Figure 6: ResNet-18, CIFAR10, batch size – 512 [PITH_FULL_IMAGE:figures/full_fig_p016_6.png] view at source ↗
Figure 7
Figure 7. Figure 7: Comparison of the wall-clock time of GM and FGM on ridge regression over the UCI Superconductivity [PITH_FULL_IMAGE:figures/full_fig_p016_7.png] view at source ↗
Figure 8
Figure 8. Figure 8: Comparison of GM and FGM on the quadratic objective. [PITH_FULL_IMAGE:figures/full_fig_p017_8.png] view at source ↗
Figure 9
Figure 9. Figure 9: Comparison of the wall-clock time of GM and FGM on regularized logistic regression. [PITH_FULL_IMAGE:figures/full_fig_p018_9.png] view at source ↗
Figure 10
Figure 10. Figure 10: Comparison of fidelity allocation strategies on each iteration [PITH_FULL_IMAGE:figures/full_fig_p018_10.png] view at source ↗
Figure 11
Figure 11. Figure 11: Comparison of convergence in function residual under different noise models. [PITH_FULL_IMAGE:figures/full_fig_p019_11.png] view at source ↗
read the original abstract

Zeroth-order (black-box) optimization is applied when gradients are unavailable and objective evaluations rely on expensive simulations. In many such applications, the oracle fidelity is tunable: higher-accuracy queries reduce noise but incur higher computational costs. To capture this trade-off, we study an accuracy-aware wall-clock model where each query with fidelity $\delta$ has a cost $c(\delta)$, and we minimize the total time $T_{\mathrm{total}} = \sum_{k=1}^{N} c(\delta_k)$, subject to a target accuracy constraint. We show how the choice of oracle type, noise model, and optimization scheme induces explicit wall-clock-optimal choices for the algorithmic parameters. For instance, we demonstrate that accelerated methods can be wall-clock inferior to non-accelerated schemes. Furthermore, we characterize the conditions under which a constant fidelity strategy is optimal in the Big-O sense. Our framework provides a unified methodology to translate convergence guarantees into practical fidelity and batching recommendations.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 1 minor

Summary. The paper develops a wall-clock complexity model for zeroth-order optimization with tunable oracle fidelity δ. Each query at fidelity δ incurs cost c(δ), and the goal is to minimize total time T_total = ∑ c(δ_k) subject to a target accuracy constraint. The framework translates standard zeroth-order convergence rates into explicit recommendations for fidelity schedules and batch sizes. It shows that, under the model, accelerated methods can be wall-clock inferior to non-accelerated schemes and characterizes conditions under which a constant-fidelity strategy is optimal in the Big-O sense.

Significance. If the assumed functional forms for c(δ) and the noise model hold, the results supply concrete, practical guidance for choosing fidelity and batching parameters in expensive simulation-based zeroth-order problems. The unified translation of convergence guarantees into wall-clock recommendations is a useful bridge between theory and implementation. The derivations appear to rest on standard rates translated into the new objective rather than on self-referential or fitted quantities.

major comments (1)
  1. [Abstract] Abstract and the derivations of the wall-clock bounds: the claims that accelerated methods are wall-clock inferior and that constant fidelity is Big-O optimal are obtained by minimizing T_total under explicit parametric assumptions on c(δ) and on how noise variance depends on δ. The manuscript should state these functional forms explicitly (e.g., in the model section) and either prove robustness to monotone deviations or clearly delimit the scope of the Big-O statements, because even monotone but non-matching c(δ) can alter the optimal schedule and the acceleration comparison.
minor comments (1)
  1. Notation for δ, c(δ), and the accuracy constraint should be introduced once and used consistently; any implicit dependence of the noise model on δ should be written explicitly.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. We address the major comment point by point below.

read point-by-point responses
  1. Referee: [Abstract] Abstract and the derivations of the wall-clock bounds: the claims that accelerated methods are wall-clock inferior and that constant fidelity is Big-O optimal are obtained by minimizing T_total under explicit parametric assumptions on c(δ) and on how noise variance depends on δ. The manuscript should state these functional forms explicitly (e.g., in the model section) and either prove robustness to monotone deviations or clearly delimit the scope of the Big-O statements, because even monotone but non-matching c(δ) can alter the optimal schedule and the acceleration comparison.

    Authors: We agree that the specific parametric forms for c(δ) and the noise dependence on δ are essential to the concrete wall-clock bounds and comparisons. The general model is introduced in Section 2, but we will revise the manuscript to state the explicit functional forms (e.g., the power-law or other parametric families used for the Big-O results) directly in the model section. We will also add a clarifying remark delimiting the scope of the Big-O optimality claims and the acceleration comparison to these assumed families, without claiming robustness to arbitrary monotone deviations, as the referee correctly observes that such deviations can alter the optimal schedule. revision: yes

Circularity Check

0 steps flagged

No circularity: derivations translate standard rates under explicit parametric assumptions

full rationale

The paper assumes known functional forms for c(δ) and the noise model, then minimizes T_total subject to accuracy constraints using standard zeroth-order convergence rates. No quoted step reduces a claimed prediction or optimality result to a fitted parameter or self-citation by construction. The framework is self-contained against external benchmarks once the functional forms are granted; the reader's score of 2 reflects only the normal dependence on modeling assumptions rather than any definitional loop.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract provides no explicit free parameters, axioms, or invented entities; all modeling choices (oracle type, noise model, cost function) are left unspecified.

pith-pipeline@v0.9.1-grok · 5724 in / 1048 out tokens · 26379 ms · 2026-06-28T21:18:55.072336+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

25 extracted references · 4 canonical work pages · 1 internal anchor

  1. [1]

    Lossy compression of matrices by black box optimisation of mixed integer nonlinear programming

    Tadashi Kadowaki and Mitsuru Ambai. Lossy compression of matrices by black box optimisation of mixed integer nonlinear programming. Scientific Reports, 12 0 (1): 0 15482, 2022

  2. [2]

    Black-box sparse adversarial attack via multi-objective optimisation

    Phoenix Neale Williams and Ke Li. Black-box sparse adversarial attack via multi-objective optimisation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 12291--12301, 2023

  3. [3]

    Simulator-based surrogate optimisation employing adaptive uncertainty-aware sampling

    Yu Liu and Fabricio Oliveira. Simulator-based surrogate optimisation employing adaptive uncertainty-aware sampling. Computers & Chemical Engineering, page 109243, 2025

  4. [4]

    Black-box optimization with local generative surrogates

    Sergey Shirobokov, Vladislav Belavin, Michael Kagan, Andrei Ustyuzhanin, and Atilim Gunes Baydin. Black-box optimization with local generative surrogates. Advances in neural information processing systems, 33: 0 14650--14662, 2020

  5. [5]

    Optimizing molecules using efficient queries from property evaluations

    Samuel C Hoffman, Vijil Chenthamarakshan, Kahini Wadhawan, Pin-Yu Chen, and Payel Das. Optimizing molecules using efficient queries from property evaluations. Nature Machine Intelligence, 4 0 (1): 0 21--31, 2022

  6. [6]

    Learning supervised pagerank with gradient-based and gradient-free optimization methods

    Lev Bogolubsky, Pavel Dvurechenskii, Alexander Gasnikov, Gleb Gusev, Yurii Nesterov, Andrei M Raigorodskii, Aleksey Tikhonov, and Maksim Zhukovskii. Learning supervised pagerank with gradient-based and gradient-free optimization methods. Advances in neural information processing systems, 29, 2016

  7. [7]

    Optimal inexactness schedules for tunable oracle-based methods

    Guillaume Van Dessel and Fran c ois Glineur. Optimal inexactness schedules for tunable oracle-based methods. Optimization Methods and Software, 39 0 (3): 0 664--698, 2024

  8. [8]

    Randomized gradient-free methods in convex optimization

    Alexander Gasnikov, Darina Dvinskikh, Pavel Dvurechensky, Eduard Gorbunov, Aleksandr Beznosikov, and Alexander Lobanov. Randomized gradient-free methods in convex optimization. In Encyclopedia of Optimization, pages 1--15. Springer, 2023

  9. [9]

    Intermediate gradient methods for smooth convex problems with inexact oracle

    Olivier Devolder, Fran c ois Glineur, Yurii Nesterov, et al. Intermediate gradient methods for smooth convex problems with inexact oracle. Technical report, Technical report, CORE-2013017, 2013

  10. [10]

    Accelerated gradient methods with absolute and relative noise in the gradient

    Artem Vasin, Alexander Gasnikov, Pavel Dvurechensky, and Vladimir Spokoiny. Accelerated gradient methods with absolute and relative noise in the gradient. Optimization Methods and Software, 38 0 (6): 0 1180--1229, 2023

  11. [11]

    overparametrization

    Aleksandr Lobanov and Alexander Gasnikov. Accelerated zero-order sgd method for solving the black box optimization problem under “overparametrization” condition. In International Conference on Optimization and Applications, pages 72--83. Springer, 2023

  12. [12]

    Accelerated zero-order sgd under high-order smoothness and overparameterized regime

    Georgii Bychkov, Darina Dvinskikh, Anastasia Antsiferova, Alexander Gasnikov, and Aleksandr Lobanov. Accelerated zero-order sgd under high-order smoothness and overparameterized regime. arXiv preprint arXiv:2411.13999, 2024

  13. [13]

    Gradient-free optimization of highly smooth functions: improved analysis and a new algorithm

    Arya Akhavan, Evgenii Chzhen, Massimiliano Pontil, and Alexandre B Tsybakov. Gradient-free optimization of highly smooth functions: improved analysis and a new algorithm. Journal of Machine Learning Research, 25 0 (370): 0 1--50, 2024

  14. [14]

    black-box

    Aleksandr Lobanov, Nail Bashirov, and Alexander Gasnikov. The “black-box” optimization problem: Zero-order accelerated stochastic method via kernel approximation. Journal of Optimization Theory and Applications, 203 0 (3): 0 2451--2486, 2024

  15. [15]

    The pagerank citation ranking: Bring order to the web

    Lawrence Page, Sergey Brin, Rajeev Motwani, and Terry Winograd. The pagerank citation ranking: Bring order to the web. In Proc. of the 7th International World Wide Web Conf.--1998, 1999

  16. [16]

    On solving minimization and min-max problems by first-order methods with relative error in gradients

    Artem Vasin, Valery Krivchenko, Dmitry Kovalev, Fedyor Stonyakin, Nazarii Tupitsa, Pavel Dvurechensky, Mohammad Alkousa, Nikita Kornilov, and Alexander Gasnikov. On solving minimization and min-max problems by first-order methods with relative error in gradients. arXiv preprint arXiv:2503.06628, 2025

  17. [17]

    Fast and faster convergence of sgd for over-parameterized models and an accelerated perceptron

    Sharan Vaswani, Francis Bach, and Mark Schmidt. Fast and faster convergence of sgd for over-parameterized models and an accelerated perceptron. In The 22nd international conference on artificial intelligence and statistics, pages 1195--1204. PMLR, 2019

  18. [18]

    Adam: A Method for Stochastic Optimization

    Diederik P Kingma and Jimmy Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014

  19. [19]

    The mnist database of handwritten digit images for machine learning research [best of the web]

    Li Deng. The mnist database of handwritten digit images for machine learning research [best of the web]. IEEE signal processing magazine, 29 0 (6): 0 141--142, 2012

  20. [20]

    Deep residual learning for image recognition

    Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 770--778, 2016

  21. [21]

    Learning multiple layers of features from tiny images

    Alex Krizhevsky, Geoffrey Hinton, et al. Learning multiple layers of features from tiny images. 2009

  22. [22]

    Stochastic intermediate gradient method for convex problems with stochastic inexact oracle

    Pavel Dvurechensky and Alexander Gasnikov. Stochastic intermediate gradient method for convex problems with stochastic inexact oracle. Journal of Optimization Theory and Applications, 171 0 (1): 0 121--145, 2016

  23. [23]

    Inexact relative smoothness and strong convexity for optimization and variational inequalities by inexact model

    Fedor Stonyakin, Alexander Tyurin, Alexander Gasnikov, Pavel Dvurechensky, Artem Agafonov, Darina Dvinskikh, Mohammad Alkousa, Dmitry Pasechnyuk, Sergei Artamonov, and Victorya Piskunova. Inexact relative smoothness and strong convexity for optimization and variational inequalities by inexact model. arXiv preprint arXiv:2001.09013, 2020

  24. [24]

    Inexact model: A framework for optimization and variational inequalities

    Fedor Stonyakin, Alexander Tyurin, Alexander Gasnikov, Pavel Dvurechensky, Artem Agafonov, Darina Dvinskikh, Mohammad Alkousa, Dmitry Pasechnyuk, Sergei Artamonov, and Victorya Piskunova. Inexact model: A framework for optimization and variational inequalities. Optimization Methods and Software, 36 0 (6): 0 1155--1201, 2021

  25. [25]

    Lectures on convex optimization, volume 137

    Yurii Nesterov. Lectures on convex optimization, volume 137. Springer, 2018