Distributed Black-Box Optimization via Error Correcting Codes

Burak Bartan; Mert Pilanci

arxiv: 1907.05984 · v1 · pith:WU5LL6RSnew · submitted 2019-07-13 · 💻 cs.DC · cs.IT· cs.LG· math.IT

Distributed Black-Box Optimization via Error Correcting Codes

Burak Bartan , Mert Pilanci This is my paper

Pith reviewed 2026-05-24 22:18 UTC · model grok-4.3

classification 💻 cs.DC cs.ITcs.LGmath.IT

keywords distributed optimizationblack-box optimizationerror correcting codesstraggler resiliencederivative-free optimizationadversarial attacksevolution strategies

0 comments

The pith

Distributed black-box optimization uses coded search directions and decoding to recover iterates from partial worker results, tolerating stragglers.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a derivative-free optimization method designed for distributed systems where some workers may lag. Search directions are encoded with error-correcting codes before evaluation at the workers, after which a decoding step reconstructs the update direction from the faster responses alone. This setup extends evolution strategies by imposing structure on the directions explored. The authors test the approach on generating adversarial examples for convolutional neural networks and observe reduced overall run times.

Core claim

The central claim is that a distributed derivative-free optimization framework can be made resilient to stragglers by evaluating coded search directions and applying a decoding step to obtain the next iterate. This construction extends evolution strategies and structured exploration methods that already use non-random directions. When applied to black-box adversarial attacks on deep convolutional neural networks, the resulting procedure yields significantly lower computation times.

What carries the argument

Coded search directions evaluated at workers, followed by a decoding step that reconstructs the optimization update from non-straggling responses

If this is right

The method tolerates stragglers while preserving the quality of the search direction.
Computation time decreases in distributed environments compared with uncoded baselines.
The framework applies directly to black-box adversarial attacks on convolutional neural networks.
It inherits the derivative-free character of evolution strategies while adding straggler resilience.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same coding pattern could be tested on other derivative-free algorithms that already rely on structured directions.
Varying the underlying error-correcting code might trade off decoding complexity against tolerance to different fractions of stragglers.
In large clusters the approach could reduce the need for explicit synchronization barriers.

Load-bearing premise

The decoding step recovers a search direction accurate enough that the overall optimization trajectory is not degraded by missing evaluations from slow workers.

What would settle it

A run in which the decoded direction deviates enough from the full-evaluation direction that convergence slows or fails once stragglers appear.

Figures

Figures reproduced from arXiv: 1907.05984 by Burak Bartan, Mert Pilanci.

**Figure 2.** Figure 2: 2-by-2 Hadamard transformation of the perturbation directions. [PITH_FULL_IMAGE:figures/full_fig_p004_2.png] view at source ↗

**Figure 3.** Figure 3: Example encoding. is 3 4 since 3 out of 4 channels are used for sending in unit vectors. When the erasure probabilities of the transformed channels are computed (see [1]), one will see that the worst one corresponds to the first index. Hence, the first channel is the frozen channel and the remaining 3 channels are the information channels. The frozen channel is set to the zero vector and the information c… view at source ↗

**Figure 6.** Figure 6: Original images θ0. (a) Airplane classified as dog (b) Ship classified as dog [PITH_FULL_IMAGE:figures/full_fig_p006_6.png] view at source ↗

**Figure 7.** Figure 7: Adversarial images generated using the proposed method. [PITH_FULL_IMAGE:figures/full_fig_p006_7.png] view at source ↗

**Figure 8.** Figure 8: Run time distributions used in simulations. [PITH_FULL_IMAGE:figures/full_fig_p006_8.png] view at source ↗

**Figure 9.** Figure 9: Simulated plots of cost as a function of time (all methods were run [PITH_FULL_IMAGE:figures/full_fig_p007_9.png] view at source ↗

read the original abstract

We introduce a novel distributed derivative-free optimization framework that is resilient to stragglers. The proposed method employs coded search directions at which the objective function is evaluated, and a decoding step to find the next iterate. Our framework can be seen as an extension of evolution strategies and structured exploration methods where structured search directions were utilized. As an application, we consider black-box adversarial attacks on deep convolutional neural networks. Our numerical experiments demonstrate a significant improvement in the computation times.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper uses error-correcting codes to create resilient search directions for distributed evolution strategies, but provides no bound on how decoding error affects the optimization path.

read the letter

The main contribution is a distributed derivative-free method that encodes search directions so the next point can be recovered from non-straggling workers. This extends earlier structured exploration work to the straggler setting and is tested on black-box adversarial attacks against CNNs, where it reports faster wall-clock times than uncoded baselines. The framing is clean and the application is concrete, which makes the engineering angle easy to follow. The experiments focus on practical speed gains, which is the right metric for distributed workloads. The soft spot is the decoding step. The paper does not supply any concentration result or Lipschitz-style control showing that the recovered direction stays close enough in angle or distribution to the intended one when some evaluations are missing. Without that, it is hard to know whether the optimization trajectory itself changes under realistic straggler fractions. The numerical results emphasize computation time but do not isolate whether attack success rate or convergence behavior holds up when the fraction of stragglers varies. This leaves the central claim of resilience without degradation on somewhat thin evidence. The work is aimed at people who run distributed black-box optimization in ML pipelines and care about straggler tolerance. A reader looking for a practical tweak to evolution strategies will find the setup and timing numbers useful. Someone wanting formal guarantees on the search direction will not. It is worth sending to peer review so referees can examine the experimental controls and check whether the decoding preserves the necessary statistical properties of the directions.

Referee Report

2 major / 1 minor

Summary. The paper introduces a distributed derivative-free optimization framework that is resilient to stragglers by employing coded search directions evaluated at workers followed by a decoding step to recover the next iterate. It positions the approach as an extension of evolution strategies and structured exploration methods, and applies it to black-box adversarial attacks on deep convolutional neural networks, claiming significant improvements in computation times via numerical experiments.

Significance. If the decoded directions remain sufficiently close in distribution or angle to the intended search directions, the framework could provide a practical way to achieve straggler resilience in distributed black-box optimization without sacrificing convergence behavior. The idea of leveraging error-correcting codes for structured directions is a natural extension of prior work on coded computation and evolution strategies, but the absence of any quantitative guarantee on decoding fidelity limits the ability to assess whether the central resilience claim holds.

major comments (2)

[Decoding procedure (inferred from abstract and framework description)] The central claim that the method is resilient to stragglers without degrading the optimization trajectory rests on the decoding procedure recovering a sufficiently accurate search direction from partial evaluations. No concentration bound, Lipschitz-style guarantee, or angle/distribution closeness result is provided relating the decoded direction to the intended one as a function of the number of stragglers or the code parameters (see the description of the decoding step and the extension of evolution strategies).
[Numerical experiments] The numerical experiments report wall-clock speed-ups on CNN adversarial attacks but do not isolate trajectory quality metrics such as attack success rate versus iteration count under controlled straggler fractions. This leaves open whether the decoded directions preserve the optimization path or merely accelerate wall-clock time at the potential cost of more iterations (see the numerical experiments section).

minor comments (1)

[Abstract] The abstract states 'significant improvement' without quantitative details, error bars, or explicit baselines; adding these would strengthen the presentation even if the major issues are addressed.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments. We address each major comment below.

read point-by-point responses

Referee: [Decoding procedure (inferred from abstract and framework description)] The central claim that the method is resilient to stragglers without degrading the optimization trajectory rests on the decoding procedure recovering a sufficiently accurate search direction from partial evaluations. No concentration bound, Lipschitz-style guarantee, or angle/distribution closeness result is provided relating the decoded direction to the intended one as a function of the number of stragglers or the code parameters (see the description of the decoding step and the extension of evolution strategies).

Authors: We acknowledge that the manuscript does not include theoretical guarantees such as concentration bounds or angle closeness results for the decoded direction. The approach uses error-correcting codes so that exact recovery of the intended linear combination of directions occurs whenever the number of stragglers lies within the code's correction radius; this is exact (not approximate) under that condition. When the number of stragglers exceeds the correction capability the decoding may fail or produce an inexact result, and we do not claim distribution closeness in that regime. We will add a short paragraph in the revised manuscript explicitly stating the conditions for exact recovery and noting the lack of general closeness bounds as an open question for future work. revision: partial
Referee: [Numerical experiments] The numerical experiments report wall-clock speed-ups on CNN adversarial attacks but do not isolate trajectory quality metrics such as attack success rate versus iteration count under controlled straggler fractions. This leaves open whether the decoded directions preserve the optimization path or merely accelerate wall-clock time at the potential cost of more iterations (see the numerical experiments section).

Authors: We agree that the current experiments focus on wall-clock time and do not directly compare optimization trajectories. In the revised version we will add plots of attack success rate versus iteration count for several fixed straggler fractions, comparing the coded method against the uncoded baseline under identical random seeds. These figures will allow readers to assess whether the number of iterations required remains comparable. revision: yes

Circularity Check

0 steps flagged

No circularity: framework extension and decoding step presented as independent construction

full rationale

The provided abstract and description introduce a coded-direction extension of evolution strategies with a decoding recovery step for straggler resilience. No equations, fitted parameters, or self-citations are exhibited that would reduce any claimed prediction or uniqueness result to a definitional tautology or input fit. The numerical experiments are described as empirical validation of wall-clock improvement rather than as the source of the method itself. The derivation chain therefore remains self-contained against external benchmarks and does not trigger any of the enumerated circularity patterns.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Only the abstract is available; no explicit free parameters, axioms, or invented entities are stated.

pith-pipeline@v0.9.0 · 5597 in / 968 out tokens · 17708 ms · 2026-05-24T22:18:40.252132+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

We encode ordinary basis perturbations using Hadamard transform into perturbation directions along the rows of the Hadamard matrix H... If we decode the outputs instead of averaging them using the modified successive cancellation decoder of polar codes, however, we obtain partial derivatives as in the finite differences method.
IndisputableMonolith/Foundation/RealityFromDistinction.lean reality_from_one_distinction unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

Our framework can be seen as an extension of evolution strategies and structured exploration methods where structured search directions were utilized.

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

17 extracted references · 17 canonical work pages · 6 internal anchors

[1]

E. Arikan. Channel polarization: A method for constructing capacity- achieving codes for symmetric binary-input memoryless channels. IEEE Transactions on Information Theory , 55:3051–3073, 2009

work page 2009
[2]

Straggler Resilient Serverless Computing Based on Polar Codes

B. Bartan and M. Pilanci. Polar coded distributed matrix multiplication. arXiv preprint arXiv:1901.06811 , 2019

work page internal anchor Pith review Pith/arXiv arXiv 1901
[3]

Boyd and L

S. Boyd and L. Vandenberghe. Convex optimization . Cambridge University Press, Cambridge, UK, 2004

work page 2004
[4]

ZOO: Zeroth Order Optimization based Black-box Attacks to Deep Neural Networks without Training Substitute Models

Pin-Yu Chen, Huan Zhang, Yash Sharma, Jinfeng Yi, and Cho-Jui Hsieh. Zoo: Zeroth order optimization based black-box attacks to deep neural networks without training substitute models. arXiv preprint arXiv:1708.03999, 2017

work page internal anchor Pith review Pith/arXiv arXiv 2017
[5]

Structured Evolution with Compact Architectures for Scalable Policy Optimization

Krzysztof Choromanski, Mark Rowland, Vikas Sindhwani, Richard E. Turner, and Adrian Weller. Structured evolution with compact architectures for scalable policy optimization. arXiv preprint arXiv:1804.02395, 2018

work page internal anchor Pith review Pith/arXiv arXiv 2018
[6]

Introduction to derivative-free optimization , volume 8

Andrew R Conn, Katya Scheinberg, and Luis N Vicente. Introduction to derivative-free optimization , volume 8. Siam, 2009

work page 2009
[7]

The tail at scale

Jeffrey Dean and Luiz Andr ´e Barroso. The tail at scale. Communica- tions of the ACM , 56(2):74–80, 2013

work page 2013
[8]

Charac- terizing the inﬂuence of system noise on large-scale applications by simulation

Torsten Hoeﬂer, Timo Schneider, and Andrew Lumsdaine. Charac- terizing the inﬂuence of system noise on large-scale applications by simulation. In Proceedings of the 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis , pages 1–11. IEEE Computer Society, 2010

work page 2010
[9]

Adam: A Method for Stochastic Optimization

Diederik P. Kingma and Jimmy Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 , 2014

work page internal anchor Pith review Pith/arXiv arXiv 2014
[10]

Learning multiple layers of features from tiny images

Alex Krizhevsky. Learning multiple layers of features from tiny images. 2009

work page 2009
[11]

K. Lee, M. Lam, R. Pedarsani, D. Papailiopoulos, and K. Ramchan- dran. Speeding up distributed machine learning using codes. IEEE Transactions on Information Theory , 64(3):1514–1529, March 2018

work page 2018
[12]

Rateless codes for near-perfect load balancing in distributed matrix-vector multiplication,

A. Mallick, M. Chaudhari, and G. Joshi. Rateless codes for near-perfect load balancing in distributed matrix-vector multiplication. CoRR, abs/1804.10331, 2018

work page arXiv 2018
[13]

Evolution Strategies as a Scalable Alternative to Reinforcement Learning

Tim Salimans, Jonathan Ho, Xi Chen, Szymon Sidor, and Ilya Sutskever. Evolution strategies as a scalable alternative to reinforce- ment learning. arXiv preprint arXiv:1703.03864 , 2017

work page internal anchor Pith review Pith/arXiv arXiv 2017
[14]

Severinson, A

A. Severinson, A. G. i Amat, and E. Rosnes. Block-diagonal and lt codes for distributed computing with straggling servers. IEEE Transactions on Communications , 2018

work page 2018
[15]

Very Deep Convolutional Networks for Large-Scale Image Recognition

Karen Simonyan and Andrew Zisserman. Very Deep Convolu- tional Networks for Large-Scale Image Recognition. arXiv preprint arXiv:1409.1556, 2014

work page internal anchor Pith review Pith/arXiv arXiv 2014
[16]

Q. Yu, M. A. Maddah-Ali, and A. S. Avestimehr. Polynomial codes: An optimal design for high-dimensional coded matrix multiplication. Adv. in Neural Info. Proc. Systems (NIPS) 30 , pages 4406–4416, 2017

work page 2017
[17]

Q. Yu, M. A. Maddah-Ali, and A. S. Avestimehr. Straggler mitigation in distributed matrix multiplication: Fundamental limits and optimal coding. CoRR, abs/1801.07487, 2018

work page arXiv 2018

[1] [1]

E. Arikan. Channel polarization: A method for constructing capacity- achieving codes for symmetric binary-input memoryless channels. IEEE Transactions on Information Theory , 55:3051–3073, 2009

work page 2009

[2] [2]

Straggler Resilient Serverless Computing Based on Polar Codes

B. Bartan and M. Pilanci. Polar coded distributed matrix multiplication. arXiv preprint arXiv:1901.06811 , 2019

work page internal anchor Pith review Pith/arXiv arXiv 1901

[3] [3]

Boyd and L

S. Boyd and L. Vandenberghe. Convex optimization . Cambridge University Press, Cambridge, UK, 2004

work page 2004

[4] [4]

ZOO: Zeroth Order Optimization based Black-box Attacks to Deep Neural Networks without Training Substitute Models

Pin-Yu Chen, Huan Zhang, Yash Sharma, Jinfeng Yi, and Cho-Jui Hsieh. Zoo: Zeroth order optimization based black-box attacks to deep neural networks without training substitute models. arXiv preprint arXiv:1708.03999, 2017

work page internal anchor Pith review Pith/arXiv arXiv 2017

[5] [5]

Structured Evolution with Compact Architectures for Scalable Policy Optimization

Krzysztof Choromanski, Mark Rowland, Vikas Sindhwani, Richard E. Turner, and Adrian Weller. Structured evolution with compact architectures for scalable policy optimization. arXiv preprint arXiv:1804.02395, 2018

work page internal anchor Pith review Pith/arXiv arXiv 2018

[6] [6]

Introduction to derivative-free optimization , volume 8

Andrew R Conn, Katya Scheinberg, and Luis N Vicente. Introduction to derivative-free optimization , volume 8. Siam, 2009

work page 2009

[7] [7]

The tail at scale

Jeffrey Dean and Luiz Andr ´e Barroso. The tail at scale. Communica- tions of the ACM , 56(2):74–80, 2013

work page 2013

[8] [8]

Charac- terizing the inﬂuence of system noise on large-scale applications by simulation

Torsten Hoeﬂer, Timo Schneider, and Andrew Lumsdaine. Charac- terizing the inﬂuence of system noise on large-scale applications by simulation. In Proceedings of the 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis , pages 1–11. IEEE Computer Society, 2010

work page 2010

[9] [9]

Adam: A Method for Stochastic Optimization

Diederik P. Kingma and Jimmy Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 , 2014

work page internal anchor Pith review Pith/arXiv arXiv 2014

[10] [10]

Learning multiple layers of features from tiny images

Alex Krizhevsky. Learning multiple layers of features from tiny images. 2009

work page 2009

[11] [11]

K. Lee, M. Lam, R. Pedarsani, D. Papailiopoulos, and K. Ramchan- dran. Speeding up distributed machine learning using codes. IEEE Transactions on Information Theory , 64(3):1514–1529, March 2018

work page 2018

[12] [12]

Rateless codes for near-perfect load balancing in distributed matrix-vector multiplication,

A. Mallick, M. Chaudhari, and G. Joshi. Rateless codes for near-perfect load balancing in distributed matrix-vector multiplication. CoRR, abs/1804.10331, 2018

work page arXiv 2018

[13] [13]

Evolution Strategies as a Scalable Alternative to Reinforcement Learning

Tim Salimans, Jonathan Ho, Xi Chen, Szymon Sidor, and Ilya Sutskever. Evolution strategies as a scalable alternative to reinforce- ment learning. arXiv preprint arXiv:1703.03864 , 2017

work page internal anchor Pith review Pith/arXiv arXiv 2017

[14] [14]

Severinson, A

A. Severinson, A. G. i Amat, and E. Rosnes. Block-diagonal and lt codes for distributed computing with straggling servers. IEEE Transactions on Communications , 2018

work page 2018

[15] [15]

Very Deep Convolutional Networks for Large-Scale Image Recognition

Karen Simonyan and Andrew Zisserman. Very Deep Convolu- tional Networks for Large-Scale Image Recognition. arXiv preprint arXiv:1409.1556, 2014

work page internal anchor Pith review Pith/arXiv arXiv 2014

[16] [16]

Q. Yu, M. A. Maddah-Ali, and A. S. Avestimehr. Polynomial codes: An optimal design for high-dimensional coded matrix multiplication. Adv. in Neural Info. Proc. Systems (NIPS) 30 , pages 4406–4416, 2017

work page 2017

[17] [17]

Q. Yu, M. A. Maddah-Ali, and A. S. Avestimehr. Straggler mitigation in distributed matrix multiplication: Fundamental limits and optimal coding. CoRR, abs/1801.07487, 2018

work page arXiv 2018