Distributed Black-Box Optimization via Error Correcting Codes
Pith reviewed 2026-05-24 22:18 UTC · model grok-4.3
The pith
Distributed black-box optimization uses coded search directions and decoding to recover iterates from partial worker results, tolerating stragglers.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central claim is that a distributed derivative-free optimization framework can be made resilient to stragglers by evaluating coded search directions and applying a decoding step to obtain the next iterate. This construction extends evolution strategies and structured exploration methods that already use non-random directions. When applied to black-box adversarial attacks on deep convolutional neural networks, the resulting procedure yields significantly lower computation times.
What carries the argument
Coded search directions evaluated at workers, followed by a decoding step that reconstructs the optimization update from non-straggling responses
If this is right
- The method tolerates stragglers while preserving the quality of the search direction.
- Computation time decreases in distributed environments compared with uncoded baselines.
- The framework applies directly to black-box adversarial attacks on convolutional neural networks.
- It inherits the derivative-free character of evolution strategies while adding straggler resilience.
Where Pith is reading between the lines
- The same coding pattern could be tested on other derivative-free algorithms that already rely on structured directions.
- Varying the underlying error-correcting code might trade off decoding complexity against tolerance to different fractions of stragglers.
- In large clusters the approach could reduce the need for explicit synchronization barriers.
Load-bearing premise
The decoding step recovers a search direction accurate enough that the overall optimization trajectory is not degraded by missing evaluations from slow workers.
What would settle it
A run in which the decoded direction deviates enough from the full-evaluation direction that convergence slows or fails once stragglers appear.
Figures
read the original abstract
We introduce a novel distributed derivative-free optimization framework that is resilient to stragglers. The proposed method employs coded search directions at which the objective function is evaluated, and a decoding step to find the next iterate. Our framework can be seen as an extension of evolution strategies and structured exploration methods where structured search directions were utilized. As an application, we consider black-box adversarial attacks on deep convolutional neural networks. Our numerical experiments demonstrate a significant improvement in the computation times.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces a distributed derivative-free optimization framework that is resilient to stragglers by employing coded search directions evaluated at workers followed by a decoding step to recover the next iterate. It positions the approach as an extension of evolution strategies and structured exploration methods, and applies it to black-box adversarial attacks on deep convolutional neural networks, claiming significant improvements in computation times via numerical experiments.
Significance. If the decoded directions remain sufficiently close in distribution or angle to the intended search directions, the framework could provide a practical way to achieve straggler resilience in distributed black-box optimization without sacrificing convergence behavior. The idea of leveraging error-correcting codes for structured directions is a natural extension of prior work on coded computation and evolution strategies, but the absence of any quantitative guarantee on decoding fidelity limits the ability to assess whether the central resilience claim holds.
major comments (2)
- [Decoding procedure (inferred from abstract and framework description)] The central claim that the method is resilient to stragglers without degrading the optimization trajectory rests on the decoding procedure recovering a sufficiently accurate search direction from partial evaluations. No concentration bound, Lipschitz-style guarantee, or angle/distribution closeness result is provided relating the decoded direction to the intended one as a function of the number of stragglers or the code parameters (see the description of the decoding step and the extension of evolution strategies).
- [Numerical experiments] The numerical experiments report wall-clock speed-ups on CNN adversarial attacks but do not isolate trajectory quality metrics such as attack success rate versus iteration count under controlled straggler fractions. This leaves open whether the decoded directions preserve the optimization path or merely accelerate wall-clock time at the potential cost of more iterations (see the numerical experiments section).
minor comments (1)
- [Abstract] The abstract states 'significant improvement' without quantitative details, error bars, or explicit baselines; adding these would strengthen the presentation even if the major issues are addressed.
Simulated Author's Rebuttal
We thank the referee for the constructive comments. We address each major comment below.
read point-by-point responses
-
Referee: [Decoding procedure (inferred from abstract and framework description)] The central claim that the method is resilient to stragglers without degrading the optimization trajectory rests on the decoding procedure recovering a sufficiently accurate search direction from partial evaluations. No concentration bound, Lipschitz-style guarantee, or angle/distribution closeness result is provided relating the decoded direction to the intended one as a function of the number of stragglers or the code parameters (see the description of the decoding step and the extension of evolution strategies).
Authors: We acknowledge that the manuscript does not include theoretical guarantees such as concentration bounds or angle closeness results for the decoded direction. The approach uses error-correcting codes so that exact recovery of the intended linear combination of directions occurs whenever the number of stragglers lies within the code's correction radius; this is exact (not approximate) under that condition. When the number of stragglers exceeds the correction capability the decoding may fail or produce an inexact result, and we do not claim distribution closeness in that regime. We will add a short paragraph in the revised manuscript explicitly stating the conditions for exact recovery and noting the lack of general closeness bounds as an open question for future work. revision: partial
-
Referee: [Numerical experiments] The numerical experiments report wall-clock speed-ups on CNN adversarial attacks but do not isolate trajectory quality metrics such as attack success rate versus iteration count under controlled straggler fractions. This leaves open whether the decoded directions preserve the optimization path or merely accelerate wall-clock time at the potential cost of more iterations (see the numerical experiments section).
Authors: We agree that the current experiments focus on wall-clock time and do not directly compare optimization trajectories. In the revised version we will add plots of attack success rate versus iteration count for several fixed straggler fractions, comparing the coded method against the uncoded baseline under identical random seeds. These figures will allow readers to assess whether the number of iterations required remains comparable. revision: yes
Circularity Check
No circularity: framework extension and decoding step presented as independent construction
full rationale
The provided abstract and description introduce a coded-direction extension of evolution strategies with a decoding recovery step for straggler resilience. No equations, fitted parameters, or self-citations are exhibited that would reduce any claimed prediction or uniqueness result to a definitional tautology or input fit. The numerical experiments are described as empirical validation of wall-clock improvement rather than as the source of the method itself. The derivation chain therefore remains self-contained against external benchmarks and does not trigger any of the enumerated circularity patterns.
Axiom & Free-Parameter Ledger
Lean theorems connected to this paper
-
IndisputableMonolith/Cost/FunctionalEquation.leanwashburn_uniqueness_aczel unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
We encode ordinary basis perturbations using Hadamard transform into perturbation directions along the rows of the Hadamard matrix H... If we decode the outputs instead of averaging them using the modified successive cancellation decoder of polar codes, however, we obtain partial derivatives as in the finite differences method.
-
IndisputableMonolith/Foundation/RealityFromDistinction.leanreality_from_one_distinction unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
Our framework can be seen as an extension of evolution strategies and structured exploration methods where structured search directions were utilized.
What do these tags mean?
- matches
- The paper's claim is directly supported by a theorem in the formal canon.
- supports
- The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends
- The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses
- The paper appears to rely on the theorem as machinery.
- contradicts
- The paper's claim conflicts with a theorem or certificate in the canon.
- unclear
- Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
-
[1]
E. Arikan. Channel polarization: A method for constructing capacity- achieving codes for symmetric binary-input memoryless channels. IEEE Transactions on Information Theory , 55:3051–3073, 2009
work page 2009
-
[2]
Straggler Resilient Serverless Computing Based on Polar Codes
B. Bartan and M. Pilanci. Polar coded distributed matrix multiplication. arXiv preprint arXiv:1901.06811 , 2019
work page internal anchor Pith review Pith/arXiv arXiv 1901
-
[3]
S. Boyd and L. Vandenberghe. Convex optimization . Cambridge University Press, Cambridge, UK, 2004
work page 2004
-
[4]
Pin-Yu Chen, Huan Zhang, Yash Sharma, Jinfeng Yi, and Cho-Jui Hsieh. Zoo: Zeroth order optimization based black-box attacks to deep neural networks without training substitute models. arXiv preprint arXiv:1708.03999, 2017
work page internal anchor Pith review Pith/arXiv arXiv 2017
-
[5]
Structured Evolution with Compact Architectures for Scalable Policy Optimization
Krzysztof Choromanski, Mark Rowland, Vikas Sindhwani, Richard E. Turner, and Adrian Weller. Structured evolution with compact architectures for scalable policy optimization. arXiv preprint arXiv:1804.02395, 2018
work page internal anchor Pith review Pith/arXiv arXiv 2018
-
[6]
Introduction to derivative-free optimization , volume 8
Andrew R Conn, Katya Scheinberg, and Luis N Vicente. Introduction to derivative-free optimization , volume 8. Siam, 2009
work page 2009
-
[7]
Jeffrey Dean and Luiz Andr ´e Barroso. The tail at scale. Communica- tions of the ACM , 56(2):74–80, 2013
work page 2013
-
[8]
Charac- terizing the influence of system noise on large-scale applications by simulation
Torsten Hoefler, Timo Schneider, and Andrew Lumsdaine. Charac- terizing the influence of system noise on large-scale applications by simulation. In Proceedings of the 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis , pages 1–11. IEEE Computer Society, 2010
work page 2010
-
[9]
Adam: A Method for Stochastic Optimization
Diederik P. Kingma and Jimmy Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 , 2014
work page internal anchor Pith review Pith/arXiv arXiv 2014
-
[10]
Learning multiple layers of features from tiny images
Alex Krizhevsky. Learning multiple layers of features from tiny images. 2009
work page 2009
-
[11]
K. Lee, M. Lam, R. Pedarsani, D. Papailiopoulos, and K. Ramchan- dran. Speeding up distributed machine learning using codes. IEEE Transactions on Information Theory , 64(3):1514–1529, March 2018
work page 2018
-
[12]
Rateless codes for near-perfect load balancing in distributed matrix-vector multiplication,
A. Mallick, M. Chaudhari, and G. Joshi. Rateless codes for near-perfect load balancing in distributed matrix-vector multiplication. CoRR, abs/1804.10331, 2018
-
[13]
Evolution Strategies as a Scalable Alternative to Reinforcement Learning
Tim Salimans, Jonathan Ho, Xi Chen, Szymon Sidor, and Ilya Sutskever. Evolution strategies as a scalable alternative to reinforce- ment learning. arXiv preprint arXiv:1703.03864 , 2017
work page internal anchor Pith review Pith/arXiv arXiv 2017
-
[14]
A. Severinson, A. G. i Amat, and E. Rosnes. Block-diagonal and lt codes for distributed computing with straggling servers. IEEE Transactions on Communications , 2018
work page 2018
-
[15]
Very Deep Convolutional Networks for Large-Scale Image Recognition
Karen Simonyan and Andrew Zisserman. Very Deep Convolu- tional Networks for Large-Scale Image Recognition. arXiv preprint arXiv:1409.1556, 2014
work page internal anchor Pith review Pith/arXiv arXiv 2014
-
[16]
Q. Yu, M. A. Maddah-Ali, and A. S. Avestimehr. Polynomial codes: An optimal design for high-dimensional coded matrix multiplication. Adv. in Neural Info. Proc. Systems (NIPS) 30 , pages 4406–4416, 2017
work page 2017
- [17]
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.