pith. sign in

arxiv: 1907.05984 · v1 · pith:WU5LL6RSnew · submitted 2019-07-13 · 💻 cs.DC · cs.IT· cs.LG· math.IT

Distributed Black-Box Optimization via Error Correcting Codes

Pith reviewed 2026-05-24 22:18 UTC · model grok-4.3

classification 💻 cs.DC cs.ITcs.LGmath.IT
keywords distributed optimizationblack-box optimizationerror correcting codesstraggler resiliencederivative-free optimizationadversarial attacksevolution strategies
0
0 comments X

The pith

Distributed black-box optimization uses coded search directions and decoding to recover iterates from partial worker results, tolerating stragglers.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a derivative-free optimization method designed for distributed systems where some workers may lag. Search directions are encoded with error-correcting codes before evaluation at the workers, after which a decoding step reconstructs the update direction from the faster responses alone. This setup extends evolution strategies by imposing structure on the directions explored. The authors test the approach on generating adversarial examples for convolutional neural networks and observe reduced overall run times.

Core claim

The central claim is that a distributed derivative-free optimization framework can be made resilient to stragglers by evaluating coded search directions and applying a decoding step to obtain the next iterate. This construction extends evolution strategies and structured exploration methods that already use non-random directions. When applied to black-box adversarial attacks on deep convolutional neural networks, the resulting procedure yields significantly lower computation times.

What carries the argument

Coded search directions evaluated at workers, followed by a decoding step that reconstructs the optimization update from non-straggling responses

If this is right

  • The method tolerates stragglers while preserving the quality of the search direction.
  • Computation time decreases in distributed environments compared with uncoded baselines.
  • The framework applies directly to black-box adversarial attacks on convolutional neural networks.
  • It inherits the derivative-free character of evolution strategies while adding straggler resilience.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same coding pattern could be tested on other derivative-free algorithms that already rely on structured directions.
  • Varying the underlying error-correcting code might trade off decoding complexity against tolerance to different fractions of stragglers.
  • In large clusters the approach could reduce the need for explicit synchronization barriers.

Load-bearing premise

The decoding step recovers a search direction accurate enough that the overall optimization trajectory is not degraded by missing evaluations from slow workers.

What would settle it

A run in which the decoded direction deviates enough from the full-evaluation direction that convergence slows or fails once stragglers appear.

Figures

Figures reproduced from arXiv: 1907.05984 by Burak Bartan, Mert Pilanci.

Figure 1
Figure 1. Figure 1: 2-by-2 construction based on Hadamard transformation. [PITH_FULL_IMAGE:figures/full_fig_p003_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: 2-by-2 Hadamard transformation of the perturbation directions. [PITH_FULL_IMAGE:figures/full_fig_p004_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: Example encoding. is 3 4 since 3 out of 4 channels are used for sending in unit vectors. When the erasure probabilities of the transformed channels are computed (see [1]), one will see that the worst one corresponds to the first index. Hence, the first channel is the frozen channel and the remaining 3 channels are the infor￾mation channels. The frozen channel is set to the zero vector and the information c… view at source ↗
Figure 6
Figure 6. Figure 6: Original images θ0. (a) Airplane classified as dog (b) Ship classified as dog [PITH_FULL_IMAGE:figures/full_fig_p006_6.png] view at source ↗
Figure 7
Figure 7. Figure 7: Adversarial images generated using the proposed method. [PITH_FULL_IMAGE:figures/full_fig_p006_7.png] view at source ↗
Figure 8
Figure 8. Figure 8: Run time distributions used in simulations. [PITH_FULL_IMAGE:figures/full_fig_p006_8.png] view at source ↗
Figure 9
Figure 9. Figure 9: Simulated plots of cost as a function of time (all methods were run [PITH_FULL_IMAGE:figures/full_fig_p007_9.png] view at source ↗
read the original abstract

We introduce a novel distributed derivative-free optimization framework that is resilient to stragglers. The proposed method employs coded search directions at which the objective function is evaluated, and a decoding step to find the next iterate. Our framework can be seen as an extension of evolution strategies and structured exploration methods where structured search directions were utilized. As an application, we consider black-box adversarial attacks on deep convolutional neural networks. Our numerical experiments demonstrate a significant improvement in the computation times.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper introduces a distributed derivative-free optimization framework that is resilient to stragglers by employing coded search directions evaluated at workers followed by a decoding step to recover the next iterate. It positions the approach as an extension of evolution strategies and structured exploration methods, and applies it to black-box adversarial attacks on deep convolutional neural networks, claiming significant improvements in computation times via numerical experiments.

Significance. If the decoded directions remain sufficiently close in distribution or angle to the intended search directions, the framework could provide a practical way to achieve straggler resilience in distributed black-box optimization without sacrificing convergence behavior. The idea of leveraging error-correcting codes for structured directions is a natural extension of prior work on coded computation and evolution strategies, but the absence of any quantitative guarantee on decoding fidelity limits the ability to assess whether the central resilience claim holds.

major comments (2)
  1. [Decoding procedure (inferred from abstract and framework description)] The central claim that the method is resilient to stragglers without degrading the optimization trajectory rests on the decoding procedure recovering a sufficiently accurate search direction from partial evaluations. No concentration bound, Lipschitz-style guarantee, or angle/distribution closeness result is provided relating the decoded direction to the intended one as a function of the number of stragglers or the code parameters (see the description of the decoding step and the extension of evolution strategies).
  2. [Numerical experiments] The numerical experiments report wall-clock speed-ups on CNN adversarial attacks but do not isolate trajectory quality metrics such as attack success rate versus iteration count under controlled straggler fractions. This leaves open whether the decoded directions preserve the optimization path or merely accelerate wall-clock time at the potential cost of more iterations (see the numerical experiments section).
minor comments (1)
  1. [Abstract] The abstract states 'significant improvement' without quantitative details, error bars, or explicit baselines; adding these would strengthen the presentation even if the major issues are addressed.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments. We address each major comment below.

read point-by-point responses
  1. Referee: [Decoding procedure (inferred from abstract and framework description)] The central claim that the method is resilient to stragglers without degrading the optimization trajectory rests on the decoding procedure recovering a sufficiently accurate search direction from partial evaluations. No concentration bound, Lipschitz-style guarantee, or angle/distribution closeness result is provided relating the decoded direction to the intended one as a function of the number of stragglers or the code parameters (see the description of the decoding step and the extension of evolution strategies).

    Authors: We acknowledge that the manuscript does not include theoretical guarantees such as concentration bounds or angle closeness results for the decoded direction. The approach uses error-correcting codes so that exact recovery of the intended linear combination of directions occurs whenever the number of stragglers lies within the code's correction radius; this is exact (not approximate) under that condition. When the number of stragglers exceeds the correction capability the decoding may fail or produce an inexact result, and we do not claim distribution closeness in that regime. We will add a short paragraph in the revised manuscript explicitly stating the conditions for exact recovery and noting the lack of general closeness bounds as an open question for future work. revision: partial

  2. Referee: [Numerical experiments] The numerical experiments report wall-clock speed-ups on CNN adversarial attacks but do not isolate trajectory quality metrics such as attack success rate versus iteration count under controlled straggler fractions. This leaves open whether the decoded directions preserve the optimization path or merely accelerate wall-clock time at the potential cost of more iterations (see the numerical experiments section).

    Authors: We agree that the current experiments focus on wall-clock time and do not directly compare optimization trajectories. In the revised version we will add plots of attack success rate versus iteration count for several fixed straggler fractions, comparing the coded method against the uncoded baseline under identical random seeds. These figures will allow readers to assess whether the number of iterations required remains comparable. revision: yes

Circularity Check

0 steps flagged

No circularity: framework extension and decoding step presented as independent construction

full rationale

The provided abstract and description introduce a coded-direction extension of evolution strategies with a decoding recovery step for straggler resilience. No equations, fitted parameters, or self-citations are exhibited that would reduce any claimed prediction or uniqueness result to a definitional tautology or input fit. The numerical experiments are described as empirical validation of wall-clock improvement rather than as the source of the method itself. The derivation chain therefore remains self-contained against external benchmarks and does not trigger any of the enumerated circularity patterns.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Only the abstract is available; no explicit free parameters, axioms, or invented entities are stated.

pith-pipeline@v0.9.0 · 5597 in / 968 out tokens · 17708 ms · 2026-05-24T22:18:40.252132+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

17 extracted references · 17 canonical work pages · 6 internal anchors

  1. [1]

    E. Arikan. Channel polarization: A method for constructing capacity- achieving codes for symmetric binary-input memoryless channels. IEEE Transactions on Information Theory , 55:3051–3073, 2009

  2. [2]

    Straggler Resilient Serverless Computing Based on Polar Codes

    B. Bartan and M. Pilanci. Polar coded distributed matrix multiplication. arXiv preprint arXiv:1901.06811 , 2019

  3. [3]

    Boyd and L

    S. Boyd and L. Vandenberghe. Convex optimization . Cambridge University Press, Cambridge, UK, 2004

  4. [4]

    ZOO: Zeroth Order Optimization based Black-box Attacks to Deep Neural Networks without Training Substitute Models

    Pin-Yu Chen, Huan Zhang, Yash Sharma, Jinfeng Yi, and Cho-Jui Hsieh. Zoo: Zeroth order optimization based black-box attacks to deep neural networks without training substitute models. arXiv preprint arXiv:1708.03999, 2017

  5. [5]

    Structured Evolution with Compact Architectures for Scalable Policy Optimization

    Krzysztof Choromanski, Mark Rowland, Vikas Sindhwani, Richard E. Turner, and Adrian Weller. Structured evolution with compact architectures for scalable policy optimization. arXiv preprint arXiv:1804.02395, 2018

  6. [6]

    Introduction to derivative-free optimization , volume 8

    Andrew R Conn, Katya Scheinberg, and Luis N Vicente. Introduction to derivative-free optimization , volume 8. Siam, 2009

  7. [7]

    The tail at scale

    Jeffrey Dean and Luiz Andr ´e Barroso. The tail at scale. Communica- tions of the ACM , 56(2):74–80, 2013

  8. [8]

    Charac- terizing the influence of system noise on large-scale applications by simulation

    Torsten Hoefler, Timo Schneider, and Andrew Lumsdaine. Charac- terizing the influence of system noise on large-scale applications by simulation. In Proceedings of the 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis , pages 1–11. IEEE Computer Society, 2010

  9. [9]

    Adam: A Method for Stochastic Optimization

    Diederik P. Kingma and Jimmy Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 , 2014

  10. [10]

    Learning multiple layers of features from tiny images

    Alex Krizhevsky. Learning multiple layers of features from tiny images. 2009

  11. [11]

    K. Lee, M. Lam, R. Pedarsani, D. Papailiopoulos, and K. Ramchan- dran. Speeding up distributed machine learning using codes. IEEE Transactions on Information Theory , 64(3):1514–1529, March 2018

  12. [12]

    Rateless codes for near-perfect load balancing in distributed matrix-vector multiplication,

    A. Mallick, M. Chaudhari, and G. Joshi. Rateless codes for near-perfect load balancing in distributed matrix-vector multiplication. CoRR, abs/1804.10331, 2018

  13. [13]

    Evolution Strategies as a Scalable Alternative to Reinforcement Learning

    Tim Salimans, Jonathan Ho, Xi Chen, Szymon Sidor, and Ilya Sutskever. Evolution strategies as a scalable alternative to reinforce- ment learning. arXiv preprint arXiv:1703.03864 , 2017

  14. [14]

    Severinson, A

    A. Severinson, A. G. i Amat, and E. Rosnes. Block-diagonal and lt codes for distributed computing with straggling servers. IEEE Transactions on Communications , 2018

  15. [15]

    Very Deep Convolutional Networks for Large-Scale Image Recognition

    Karen Simonyan and Andrew Zisserman. Very Deep Convolu- tional Networks for Large-Scale Image Recognition. arXiv preprint arXiv:1409.1556, 2014

  16. [16]

    Q. Yu, M. A. Maddah-Ali, and A. S. Avestimehr. Polynomial codes: An optimal design for high-dimensional coded matrix multiplication. Adv. in Neural Info. Proc. Systems (NIPS) 30 , pages 4406–4416, 2017

  17. [17]

    Q. Yu, M. A. Maddah-Ali, and A. S. Avestimehr. Straggler mitigation in distributed matrix multiplication: Fundamental limits and optimal coding. CoRR, abs/1801.07487, 2018