Understanding Adversarial Robustness: The Trade-off between Minimum and Average Margin

Kaiwen Wu; Yaoliang Yu

arxiv: 1907.11780 · v1 · pith:VERUG3JJnew · submitted 2019-07-26 · 💻 cs.LG · stat.ML

Pith reviewed 2026-05-24 15:30 UTC · model grok-4.3

classification 💻 cs.LG stat.ML

keywords adversarial robustness · margin · deep learning · regularization · Fisher consistency · classification · neural networks · trade-off

The pith

Deep models maximize the minimum margin for accuracy while decreasing the average margin, which reduces adversarial robustness.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that standard training pushes deep classifiers to enlarge the smallest margin between classes, which helps clean accuracy but shrinks the typical margin across most examples. This shrinkage makes small adversarial perturbations more likely to flip predictions. The authors introduce a regularizer that directly encourages larger average margins. The modified objective stays Fisher-consistent, so it can still recover the Bayes-optimal classifier as data grows without bound.

Core claim

During training, deep models maximize the minimum margin in order to achieve high accuracy, but at the same time decrease the average margin, which hurts robustness. A new regularizer explicitly promotes average margin and leads to better robustness in experiments while remaining Fisher-consistent.
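To make the two quantities concrete, here is a minimal sketch for a binary linear classifier, where the margin of a point is its signed L2 distance to the separating hyperplane. The toy data, the two hyperplanes, and the function name `margins` are illustrative assumptions, not taken from the paper; for deep networks there is generally no such closed form.

```python
import numpy as np

def margins(w, b, X, y):
    """Signed L2 distance of each point to the hyperplane w.x + b = 0.

    Labels y are in {-1, +1}. A positive value means the point is
    correctly classified; its magnitude is the smallest L2 perturbation
    that moves the point onto the decision boundary.
    """
    return y * (X @ w + b) / np.linalg.norm(w)

# Toy data (made up for illustration): two points per class.
X = np.array([[5.0, 1.0], [0.5, 1.0], [-5.0, -1.0], [-0.5, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])

# Two hypothetical classifiers; both classify every point correctly.
m_a = margins(np.array([1.0, 0.0]), 0.0, X, y)  # splits on the first coordinate
m_b = margins(np.array([0.0, 1.0]), 0.0, X, y)  # splits on the second coordinate

print("A: min margin", m_a.min(), "average margin", m_a.mean())
print("B: min margin", m_b.min(), "average margin", m_b.mean())
```

On this toy data classifier B has the larger minimum margin while classifier A has the larger average margin, which is the kind of divergence between the two statistics that the core claim describes.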

What carries the argument

The trade-off between minimum margin and average margin, countered by a Fisher-consistent regularizer that increases average margin.

If this is right

Adding the regularizer produces models with higher average margins and measurably better resistance to adversarial attacks.
The regularized loss remains Fisher-consistent and can still recover the Bayes optimal classifier in the large-sample limit.
Accuracy and robustness exhibit an intrinsic tension under current deep-model training dynamics.
The same margin-reduction pattern appears across multiple architectures and datasets.
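The idea of adding a term that rewards average margin can be sketched as follows. The exact form of the paper's regularizer is not given in this review, so the objective below is a hypothetical stand-in: logistic loss minus a coefficient `lam` times the average normalized margin of a linear model. The function name, the coefficient, and the toy data are assumptions, and no claim is made that this stand-in is Fisher-consistent.

```python
import numpy as np

def regularized_loss(w, b, X, y, lam):
    """Logistic loss minus lam times the average normalized margin.

    Hypothetical stand-in objective for a linear model with labels
    in {-1, +1}; it is not the regularizer proposed in the paper.
    """
    scores = y * (X @ w + b)
    # log(1 + exp(-s)) computed stably via logaddexp(0, -s).
    logistic = np.mean(np.logaddexp(0.0, -scores))
    avg_margin = np.mean(scores) / np.linalg.norm(w)
    return logistic - lam * avg_margin

# Toy data (made up for illustration).
X = np.array([[5.0, 1.0], [0.5, 1.0], [-5.0, -1.0], [-0.5, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
w, b = np.array([1.0, 0.0]), 0.0

plain = regularized_loss(w, b, X, y, lam=0.0)
with_reg = regularized_loss(w, b, X, y, lam=0.5)
print("logistic loss only:", plain)
print("with average-margin term:", with_reg)
```

With `lam=0.0` the objective reduces to the ordinary logistic loss; a positive `lam` lowers the objective in proportion to the average margin, so a minimizer is pushed toward larger average margin.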

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The trade-off may arise from any optimizer that aggressively enlarges the smallest margin, not only from the particular loss used here.
Similar margin dynamics could appear in non-neural classifiers if they are trained by margin-maximizing procedures.
Combining the regularizer with existing defense techniques might compound robustness gains.
The pattern could be tested on tabular or sequential data to check whether it is specific to image classification.

Load-bearing premise

The reduction in average margin during ordinary training is the main driver of lost robustness, and explicitly raising it will improve robustness without creating new failure modes.

What would settle it

Training with the proposed regularizer yields no measurable gain in adversarial accuracy on standard benchmarks, or the regularized models lose clean accuracy at the same rate as the gain in robustness.
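The comparison described above needs two numbers per model: clean accuracy and accuracy under a perturbation budget. A minimal sketch for linear classifiers under an L2 budget follows; in that special case robust accuracy can be computed exactly from the signed distances, without running an attack. The function name, the budget `eps`, the two models, and the data are illustrative assumptions, and the paper's own evaluation on deep networks would rely on actual attack methods.

```python
import numpy as np

def clean_and_robust_accuracy(w, b, X, y, eps):
    """Return (clean accuracy, robust accuracy) for a linear classifier.

    For a linear model and an L2 budget eps, a point withstands every
    perturbation of norm at most eps exactly when its signed distance
    to the hyperplane exceeds eps.
    """
    signed = y * (X @ w + b) / np.linalg.norm(w)
    clean = float(np.mean(signed > 0))
    robust = float(np.mean(signed > eps))
    return clean, robust

# Toy data and two hypothetical models (made up for illustration).
X = np.array([[5.0, 1.0], [0.5, 1.0], [-5.0, -1.0], [-0.5, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])

acc_a = clean_and_robust_accuracy(np.array([1.0, 0.0]), 0.0, X, y, eps=2.0)
acc_b = clean_and_robust_accuracy(np.array([0.0, 1.0]), 0.0, X, y, eps=2.0)
print("model A (clean, robust):", acc_a)
print("model B (clean, robust):", acc_b)
```

Reporting both numbers side by side is what lets a reader check whether a robustness gain was paid for with an equal loss in clean accuracy.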

Figures

Figures reproduced from arXiv: 1907.11780 by Kaiwen Wu, Yaoliang Yu.

**Figure 2.** Average and minimum margin during training of different models on CIFAR10. Panels: Avg Margin of LR, Avg Margin of MLP, Avg Margin of CNN.

**Figure 3.** Average margin of regularized and standard training. Accompanying text: "… by maximizing the minimum margin, the average margin, which is a better indicator of robustness, is in serious jeopardy. Note that early stopping, while it helps prevent the average margin from decreasing unnecessarily, is not sufficient by itself to promote average margin. Instead, an explicit average margin regularizer is more effective, as we show next. …"

**Figure 4.** Figure 4 shows the training loss, training error and test error of logistic regression in Section 3. As training proceeds, the error rate on both the training set and the test set never increases, so the logistic regression is not overfitting, even though it is trained for an excessive number of epochs. This again highlights that the trade-off between minimum and average margin cannot be caused by overfitting.

**Figure 5.** Training curves of 3 models on MNIST and 3 models on CIFAR10 by standard training. First row: MNIST models. Second row: CIFAR10 models.

**Figure 6.** Figure 6 shows the minimum and average margin trade-off for MNIST models. A similar trade-off between minimum and average margin can be observed as discussed in Sections 3 and 5. The minimum margin keeps increasing while the average margin keeps decreasing. The only exception is the average margin of MNIST-CNN, but the range of margin in the figure is very small (from 0.94 to 1.08), in which case the Lipschitz c…

**Figure 7.** Margin histograms of MNIST models at different epochs during training. Top: Histograms of MNIST-LR. Mid: Histograms of MNIST-MLP. Bottom: Histograms of MNIST-CNN.

**Figure 8.** Margin histograms of CIFAR models at different epochs during training. Top: Histograms of CIFAR-LR. Mid: Histograms of CIFAR-MLP. Bottom: Histograms of CIFAR-CNN.

Original abstract

Deep models, while being extremely versatile and accurate, are vulnerable to adversarial attacks: slight perturbations that are imperceptible to humans can completely flip the prediction of deep models. Many attack and defense mechanisms have been proposed, although a satisfying solution still largely remains elusive. In this work, we give strong evidence that during training, deep models maximize the minimum margin in order to achieve high accuracy, but at the same time decrease the *average* margin hence hurting robustness. Our empirical results highlight an intrinsic trade-off between accuracy and robustness for current deep model training. To further address this issue, we propose a new regularizer to explicitly promote average margin, and we verify through extensive experiments that it does lead to better robustness. Our regularized objective remains Fisher-consistent, hence asymptotically can still recover the Bayes optimal classifier.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

The paper frames a min-margin vs avg-margin trade-off during training as the source of robustness loss and adds a Fisher-consistent regularizer, but the abstract leaves the causal mechanism unproven.

The main thing to know is that the authors claim standard training maximizes the smallest margin for accuracy while shrinking the average margin, which they tie directly to weaker adversarial robustness, and they introduce a regularizer to push the average margin up without breaking asymptotic optimality via Fisher consistency. They report empirical support for both the trade-off and the regularizer's benefits. That framing is a distinct angle on the accuracy-robustness tension even if margin ideas themselves are older. The consistency property is a genuine plus because it shows the objective does not sacrifice the Bayes classifier in the limit. The experiments are described as extensive, which at least suggests they tried to check the practical effect.

The soft spot is that nothing in the abstract isolates the average-margin reduction as the primary driver rather than correlated changes in curvature, optimization trajectory, or feature quality. The regularizer is only sketched at a high level, so it is not clear whether gains come from the intended mechanism or from some other side effect. Without seeing the actual derivation, the attack setups, or controlled ablations, the causal story stays untested.

This work is aimed at researchers already tuning regularizers for robustness. A reader who wants concrete training modifications might extract value if the numbers hold, but the paper does not yet supply enough detail to stand on its own. It deserves a serious referee to examine the experiments and the exact form of the regularizer rather than a desk reject, because the trade-off claim is specific enough to be worth checking.

Referee Report

2 major / 1 minor

Summary. The paper claims that during standard training, deep models maximize the minimum margin to achieve high accuracy but simultaneously decrease the average margin, creating an intrinsic trade-off that reduces adversarial robustness. It provides empirical support for this dynamic and introduces a regularizer to explicitly promote the average margin. The regularized objective is asserted to remain Fisher-consistent, allowing asymptotic recovery of the Bayes optimal classifier, with experiments showing improved robustness from the regularizer.

Significance. If the observed trade-off is shown to be causal and the regularizer demonstrably improves robustness by targeting average margin (rather than incidental effects), the work would contribute a mechanistic explanation for adversarial vulnerability in standard training and a consistency-preserving regularization approach. The explicit retention of Fisher consistency is a methodological strength that distinguishes it from many heuristic robustness methods.

major comments (2)

[Abstract] The central claim that the reduction in average margin during training is the primary driver of reduced robustness (rather than correlated factors such as curvature or feature quality) is load-bearing, yet the manuscript supplies no controlled ablation or counterfactual experiment that isolates margin change from other training dynamics; without this, the causal interpretation of the trade-off remains unsupported.
[Abstract] The assertion that the regularized objective remains Fisher-consistent is stated without a derivation showing that the added regularizer term vanishes at the Bayes classifier or any population-level analysis; this must be supplied explicitly, as Fisher consistency is an asymptotic property that does not automatically guarantee finite-sample robustness gains.
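For reference, the standard binary-classification notion of Fisher consistency (also called classification calibration) at stake in the second comment can be written as below. The notation is a common convention assumed here: a margin loss φ, a score α = f(x), labels in {−1, +1}, and η = P(y = 1 | x). The paper's regularized objective is not reproduced in this review, so the block shows only the property that any regularized loss would have to retain, not the authors' derivation.

```latex
% Conditional risk of predicting score \alpha at a point with
% conditional class probability \eta = P(y = 1 \mid x):
C_\eta^{\varphi}(\alpha) = \eta\,\varphi(\alpha) + (1-\eta)\,\varphi(-\alpha)

% Fisher consistency: for every \eta \neq 1/2, each minimizer has the
% same sign as the Bayes-optimal prediction,
\operatorname{sign}(\alpha^\star) = \operatorname{sign}(2\eta - 1)
\quad \text{for all } \alpha^\star \in \arg\min_{\alpha} C_\eta^{\varphi}(\alpha)

% Slope of the conditional risk at zero (for \varphi differentiable at 0):
\left.\frac{d}{d\alpha} C_\eta^{\varphi}(\alpha)\right|_{\alpha=0}
  = \eta\,\varphi'(0) - (1-\eta)\,\varphi'(0)
  = (2\eta - 1)\,\varphi'(0)
```

When φ is convex with φ'(0) < 0, the slope at zero is negative for η > 1/2 and positive for η < 1/2, so the minimizer moves to the Bayes-optimal side of zero. A derivation for a regularized objective would need to show that the same sign condition survives once the extra term is added.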

minor comments (1)

The abstract refers to 'extensive experiments' verifying robustness gains; the main text should include precise descriptions of the datasets, architectures, attack methods, and hyperparameter choices used to evaluate the regularizer.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. The two major comments highlight important points about causal evidence and the need for explicit justification of Fisher consistency. We address each below and will revise the manuscript to incorporate additional analysis and derivations.

Referee: [Abstract] The central claim that the reduction in average margin during training is the primary driver of reduced robustness (rather than correlated factors such as curvature or feature quality) is load-bearing, yet the manuscript supplies no controlled ablation or counterfactual experiment that isolates margin change from other training dynamics; without this, the causal interpretation of the trade-off remains unsupported.

Authors: We agree that the current empirical observations, while showing the simultaneous maximization of minimum margin and reduction of average margin during training along with associated robustness degradation, do not fully isolate the margin effect from other dynamics such as curvature or feature learning. To strengthen the causal interpretation, we will add a controlled experiment in the revision, for example by using synthetic datasets where margin statistics can be directly modulated while holding other factors fixed, or by comparing training trajectories with matched curvature measures.

Revision: yes

Referee: [Abstract] The assertion that the regularized objective remains Fisher-consistent is stated without a derivation showing that the added regularizer term vanishes at the Bayes classifier or any population-level analysis; this must be supplied explicitly, as Fisher consistency is an asymptotic property that does not automatically guarantee finite-sample robustness gains.

Authors: We acknowledge that the abstract states the Fisher consistency without an explicit derivation. The full manuscript contains a population-level analysis showing that the regularizer term vanishes when the classifier converges to the Bayes optimal decision boundary (as the average margin approaches its maximum under the true conditional distribution). We will include a concise version of this derivation in the revised abstract and a dedicated paragraph in the main text to make the argument self-contained. We note that consistency is an asymptotic guarantee and do not claim it directly implies finite-sample robustness improvements, which are shown empirically.

Revision: yes
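The synthetic-data control mentioned in the first response could look roughly like the sketch below, which places 2-D points at prescribed signed distances from a fixed reference hyperplane so that the margin distribution is set by hand. The generator, its name `synthetic_with_margins`, and the two margin profiles are hypothetical choices made for illustration; the rebuttal does not specify a construction.

```python
import numpy as np

def synthetic_with_margins(target_margins, seed=0):
    """Build 2-D points whose signed distances to the line x1 = 0
    equal the requested margins; labels are in {-1, +1}.

    Hypothetical generator: the normal direction u carries the margin,
    the tangent direction v carries unrelated random variation.
    """
    rng = np.random.default_rng(seed)
    m = np.asarray(target_margins, dtype=float)
    u = np.array([1.0, 0.0])  # unit normal of the reference hyperplane
    v = np.array([0.0, 1.0])  # direction along the hyperplane
    y = rng.choice([-1.0, 1.0], size=m.shape[0])
    t = rng.normal(size=m.shape[0])
    X = (y * m)[:, None] * u + t[:, None] * v
    return X, y

# Same minimum margin (0.5), different average margin.
X_low, y_low = synthetic_with_margins([0.5, 0.6, 0.7, 0.8])
X_high, y_high = synthetic_with_margins([0.5, 2.0, 3.0, 4.0])

u = np.array([1.0, 0.0])
print("low-average set margins:", y_low * (X_low @ u))
print("high-average set margins:", y_high * (X_high @ u))
```

Holding the minimum margin fixed while varying the average is one way to separate the two statistics; other factors named by the referee, such as curvature, would still need their own controls.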

Circularity Check

0 steps flagged

No circularity: empirical observation plus Fisher-consistent regularizer with independent grounding

The paper reports an empirical observation that standard training increases minimum margin while decreasing average margin, then introduces a regularizer to promote the latter while preserving Fisher-consistency (an asymptotic population property). No equations, fitted parameters renamed as predictions, or self-citation chains appear in the provided text that reduce any load-bearing claim to a tautology or to the input data by construction. The central claims rest on experimental verification and the standard definition of Fisher-consistency rather than any of the enumerated circular patterns.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

Only the abstract is available so the ledger reflects what is implied by the claims. The regularizer almost certainly introduces at least one tunable hyperparameter; standard supervised learning assumptions are invoked for the consistency claim.

free parameters (1)

regularization coefficient
Weight on the new average-margin term; must be chosen or tuned and directly affects the claimed robustness improvement.

axioms (1)

domain assumption: Data are drawn i.i.d. from an underlying distribution
Required for the Fisher-consistency claim that the regularized objective recovers the Bayes optimal classifier asymptotically.

[1] [1]

Intriguingpropertiesofneuralnetworks

Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, IanGoodfellow, andRobFergus. Intriguingpropertiesofneuralnetworks. In International Conference on Learning Representations (ICLR), 2014

work page 2014

[2] [2]

I. J. Goodfellow, J. Shlens, and C. Szegedy. Explaining and harnessing ad- versarial examples. InInternational Conference on Learning Representations (ICLR), 2015

work page 2015

[3] [3]

Deep- fool: a simple and accurate method to fool deep neural networks

Seyed-Mohsen Moosavi-Dezfooli, Alhussein Fawzi, and Pascal Frossard. Deep- fool: a simple and accurate method to fool deep neural networks. InProceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2574–2582, 2016

work page 2016

[4] [4]

Towards evaluating the robustness of neural networks, 2017

Nicholas Carlini and David Wagner. Towards evaluating the robustness of neural networks, 2017. arXiv:1608

work page 2017

[5] [5]

ZOO: Zeroth Order Optimization Based Black-box Attacks to Deep Neural Networks Without Training Substitute Models

Pin-Yu Chen, Huan Zhang, Yash Sharma, Jinfeng Yi, and Cho-Jui Hsieh. ZOO: Zeroth Order Optimization Based Black-box Attacks to Deep Neural Networks Without Training Substitute Models. InProceedings of the 10th ACM Workshop on Artiﬁcial Intelligence and Security, pages 15–26, 2017

work page 2017

[6] [6]

Stochastic Substitute Training: A Gray-box Approach to Craft Adversarial Examples Against Gra- dient Obfuscation Defenses

Mohammad Hashemi, Greg Cusack, and Eric Keller. Stochastic Substitute Training: A Gray-box Approach to Craft Adversarial Examples Against Gra- dient Obfuscation Defenses. In Proceedings of the 11th ACM Workshop on Artiﬁcial Intelligence and Security, pages 25–36, 2018

work page 2018

[7] [7]

Towards Query Eﬃcient Black-box Attacks: An Input-free Perspective

Yali Du, Meng Fang, Jinfeng Yi, Jun Cheng, and Dacheng Tao. Towards Query Eﬃcient Black-box Attacks: An Input-free Perspective. InProceedings of the 11th ACM Workshop on Artiﬁcial Intelligence and Security, pages 13–24, 2018

work page 2018

[8] [8]

Berkay Celik, and Ananthram Swami

Nicolas Papernot, Patrick McDaniel, Ian Goodfellow, Somesh Jha, Z. Berkay Celik, and Ananthram Swami. Practical Black-Box Attacks Against Machine Learning. In Proceedings of the 2017 ACM on Asia Conference on Computer and Communications Security, pages 506–519, 2017

work page 2017

[9] [9]

Eﬃcient DefensesAgainstAdversarialAttacks

Valentina Zantedeschi, Maria-Irina Nicolae, and Ambrish Rawat. Eﬃcient DefensesAgainstAdversarialAttacks. In Proceedings of the 10th ACM Workshop on Artiﬁcial Intelligence and Security, pages 39–49, 2017

work page 2017

[10] [10]

Distillation as a defense to adversarial perturbations against deep neural networks

Nicolas Papernot, Patrick McDaniel, Xi Wu, Somesh Jha, and Ananthram Swami. Distillation as a defense to adversarial perturbations against deep neural networks. InIEEE Symposium on Security and Privacy, 2016

work page 2016

[11] [11]

Parseval networks: Improving robustness to adversarial examples

Moustapha Cisse, Piotr Bojanowski, Edouard Grave, Yann Dauphin, and Nicolas Usunier. Parseval networks: Improving robustness to adversarial examples. In Proceedings of the 34th International Conference on Machine Learning, pages 854–863, 2017

work page 2017

[12] [12]

Towards deep learning model resisstant to adversarial attacks

Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu. Towards deep learning model resisstant to adversarial attacks. In International Conference on Learning Representations (ICLR), 2018

work page 2018

[13] [13]

Making Machine Learning Robust Against Adversarial Inputs.Communications of the ACM, 61 (7):56–66, 2018

Ian Goodfellow, Patrick McDaniel, and Nicolas Papernot. Making Machine Learning Robust Against Adversarial Inputs.Communications of the ACM, 61 (7):56–66, 2018

work page 2018

[14] [14]

Sparse DNNs with Improved Adversarial Robustness

Yiwen Guo, Chao Zhang, Changshui Zhang, and Yurong Chen. Sparse DNNs with Improved Adversarial Robustness. InAdvances in Neural Information Processing Systems 31, pages 242–251, 2018. TRADE-OFF BETWEEN MINIMUM AND A VERAGE MARGIN 13

work page 2018

[15] [15]

Obfuscated gradients give a false sense of security: Circumventing defenses to adversarial examples

Anish Athalye, Nicholas Carlini, and David Wagner. Obfuscated gradients give a false sense of security: Circumventing defenses to adversarial examples. In ICML, 2018

work page 2018

[16] [16]

Formal guarantees on the robustness of a classifier against adversarial manipulation

Matthias Hein and Maksym Andriushchenko. Formal guarantees on the robustness of a classifier against adversarial manipulation. In Advances in Neural Information Processing Systems (NIPS), pages 2266–2276, 2017

work page 2017

[17] [17]

Evaluating the robustness of neural networks: An extreme value theory approach

Tsui-Wei Weng, Huan Zhang, Pin-Yu Chen, Jinfeng Yi, Dong Su, Yupeng Gao, Cho-Jui Hsieh, and Luca Daniel. Evaluating the robustness of neural networks: An extreme value theory approach. In International Conference on Learning Representations (ICLR), 2018

work page 2018

[18] [18]

Efficient neural network robustness certification with general activation functions

Huan Zhang, Tsui-Wei Weng, Pin-Yu Chen, Cho-Jui Hsieh, and Luca Daniel. Efficient neural network robustness certification with general activation functions. In Advances in Neural Information Processing Systems, pages 4944–4953, 2018

work page 2018

[19] [19]

Towards fast computation of certified robustness for relu networks

Tsui-Wei Weng, Huan Zhang, Hongge Chen, Zhao Song, Cho-Jui Hsieh, Duane Boning, Inderjit S. Dhillon, and Luca Daniel. Towards fast computation of certified robustness for relu networks. In ICML, 2018

work page 2018

[20] [20]

Evaluating robustness of neural networks with mixed integer programming

Vincent Tjeng, Kai Y. Xiao, and Russ Tedrake. Evaluating robustness of neural networks with mixed integer programming. In International Conference on Learning Representations, 2019

work page 2019

[21] [21]

Robustness certification with refinement

Gagandeep Singh, Timon Gehr, Markus Püschel, and Martin Vechev. Robustness certification with refinement. In International Conference on Learning Representations, 2019

work page 2019

[22] [22]

Scaling provable adversarial defenses

Eric Wong, Frank Schmidt, Jan Hendrik Metzen, and J. Zico Kolter. Scaling provable adversarial defenses. In Advances in Neural Information Processing Systems, pages 8400–8409, 2018

work page 2018

[23] [23]

Provable Defenses against Adversarial Examples via the Convex Outer Adversarial Polytope

Eric Wong and J. Zico Kolter. Provable Defenses against Adversarial Examples via the Convex Outer Adversarial Polytope. In ICML, 2018

work page 2018

[24] [24]

Semidefinite relaxations for certifying robustness to adversarial examples

Aditi Raghunathan, Jacob Steinhardt, and Percy S. Liang. Semidefinite relaxations for certifying robustness to adversarial examples. In Advances in Neural Information Processing Systems, pages 10900–10910, 2018

work page 2018

[25] [25]

Certified defenses against adversarial examples

Aditi Raghunathan, Jacob Steinhardt, and Percy Liang. Certified defenses against adversarial examples. In International Conference on Learning Representations, 2018

work page 2018

[26] [26]

Fast and Effective Robustness Certification

Gagandeep Singh, Timon Gehr, Matthew Mirman, Markus Püschel, and Martin Vechev. Fast and Effective Robustness Certification. In Advances in Neural Information Processing Systems 31, pages 10802–10813, 2018

work page 2018

[27] [27]

On the Effectiveness of Interval Bound Propagation for Training Verifiably Robust Models

Sven Gowal, Krishnamurthy Dvijotham, Robert Stanforth, Rudy Bunel, Chongli Qin, Jonathan Uesato, Relja Arandjelovic, Timothy Mann, and Pushmeet Kohli. On the Effectiveness of Interval Bound Propagation for Training Verifiably Robust Models. In NeurIPS workshop on Security in Machine Learning, 2018

work page 2018

[28] [28]

Prediction Games and Arcing Algorithms

Leo Breiman. Prediction Games and Arcing Algorithms. Neural Computation, 11(7):1493–1517, 1999

work page 1999

[29] [29]

Boosting the margin: a new explanation for the effectiveness of voting methods

Robert E. Schapire, Yoav Freund, Peter Bartlett, and Wee Sun Lee. Boosting the margin: a new explanation for the effectiveness of voting methods. The Annals of Statistics, 26(5):1651–1686, 1998

work page 1998

[30] [30]

On generalization bounds, projection profile, and margin distribution

Ashutosh Garg, Sariel Har-Peled, and Dan Roth. On generalization bounds, projection profile, and margin distribution. In ICML, 2002

work page 2002

[31] [31]

Margin distribution and learning algorithms

Ashutosh Garg and Dan Roth. Margin distribution and learning algorithms. In ICML, 2003

work page 2003

[32] [32]

Multi-class optimal margin distribution machine

Teng Zhang and Zhi-Hua Zhou. Multi-class optimal margin distribution machine. In Proceedings of the 34th International Conference on Machine Learning - Volume 70, pages 4063–4071. JMLR.org, 2017

work page 2017

[33] [33]

The Implicit Bias of Gradient Descent on Separable Data

Daniel Soudry, Elad Hoffer, and Nathan Srebro. The Implicit Bias of Gradient Descent on Separable Data. In International Conference on Learning Representations, 2018

work page 2018

[34] [34]

Towards robust, locally linear deep networks

Guang-He Lee, David Alvarez-Melis, and Tommi S. Jaakkola. Towards robust, locally linear deep networks. In International Conference on Learning Representations, 2019

work page 2019

[35] [35]

Provable robustness of relu networks via maximization of linear regions

Francesco Croce, Maksym Andriushchenko, and Matthias Hein. Provable robustness of relu networks via maximization of linear regions. In AISTATS, 2019

work page 2019

[36] [36]

Deep defense: Training dnns with improved adversarial robustness

Ziang Yan, Yiwen Guo, and Changshui Zhang. Deep defense: Training dnns with improved adversarial robustness. In Advances in Neural Information Processing Systems, pages 419–428, 2018

work page 2018

[37] [37]

Implicit bias of gradient descent on linear convolutional networks

Suriya Gunasekar, Jason D. Lee, Daniel Soudry, and Nati Srebro. Implicit bias of gradient descent on linear convolutional networks. In Advances in Neural Information Processing Systems, pages 9461–9471, 2018

work page 2018

[38] [38]

Gradient descent aligns the layers of deep linear networks

Ziwei Ji and Matus Telgarsky. Gradient descent aligns the layers of deep linear networks. In International Conference on Learning Representations, 2019

work page 2019

[39] [39]

Empirical Margin Distributions and Bounding the Generalization Error of Combined Classifiers

V. Koltchinskii and D. Panchenko. Empirical Margin Distributions and Bounding the Generalization Error of Combined Classifiers. The Annals of Statistics, 30(1):1–50, 2002

work page 2002

[40] [40]

Convexity, classification, and risk bounds

Peter L. Bartlett, Michael I. Jordan, and Jon D. McAuliffe. Convexity, classification, and risk bounds. Journal of the American Statistical Association, 101(473):138–156, 2006

work page 2006

[41] [41]

Reluplex: An efficient smt solver for verifying deep neural networks

Guy Katz, Clark Barrett, David L. Dill, Kyle Julian, and Mykel J. Kochenderfer. Reluplex: An efficient smt solver for verifying deep neural networks. In International Conference on Computer Aided Verification, pages 97–117, 2017

work page 2017

[42] [42]

Defensive Quantization: When Efficiency Meets Robustness

Ji Lin, Chuang Gan, and Song Han. Defensive Quantization: When Efficiency Meets Robustness. In International Conference on Learning Representations, 2019.

Appendix A. Proofs

Proof of Proposition 1. Indeed, according to the definitions in (5) and (2) we have r(x) = inf{∥z∥ : x + z ∉ F_{ŷ(x)}, x + z ∈ X} (17) = inf{∥(x ...

work page 2019

[43] [43]

We first consider the case η > 1

work page

[44] [44]

The derivative of C^φ_η(α) at zero is (2η − 1)φ′(0) < 0

For the case η < 1/2, the proof is similar. The derivative of C^φ_η(α) at zero is (2η − 1)φ′(0) < 0. Thus ∃δ1 > 0 such that every α ∈ (0, δ1) satisfies C^φ_η(α) < C^φ_η(0). The right-hand derivative of ψ at zero is negative, thus ∃δ2 > 0 such that every α ∈ (0, δ2) satisfies ψ(α) < ψ(0). Notice that ψ(α) is constant when α ≤ 0. Thus every α ∈ (0, δ2) satisfies C^ψ_η(α) < C^ψ...
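The derivative claim in this fragment can be checked numerically. The sketch below is illustrative only and rests on assumptions not stated in this excerpt: it takes the conditional risk to have the standard form C_η(α) = η·φ(α) + (1 − η)·φ(−α), uses the logistic loss as a stand-in for φ (so φ′(0) = −1/2), and picks η = 0.8 so that (2η − 1)φ′(0) is negative. The function names are hypothetical.

```python
import math

def phi(t):
    # Logistic loss, used here only as an example surrogate; phi'(0) = -1/2 < 0.
    return math.log1p(math.exp(-t))

def cond_risk(eta, alpha, loss=phi):
    # Assumed standard form of the conditional risk:
    # C_eta(alpha) = eta * loss(alpha) + (1 - eta) * loss(-alpha)
    return eta * loss(alpha) + (1.0 - eta) * loss(-alpha)

def derivative_at_zero(eta, loss=phi, h=1e-6):
    # Central finite difference approximating dC_eta/dalpha at alpha = 0.
    return (cond_risk(eta, h, loss) - cond_risk(eta, -h, loss)) / (2.0 * h)

eta = 0.8                # illustrative value for which (2*eta - 1) * phi'(0) < 0
phi_prime_zero = -0.5    # derivative of the logistic loss at zero

numeric = derivative_at_zero(eta)
closed_form = (2.0 * eta - 1.0) * phi_prime_zero

print("finite-difference derivative:", numeric)
print("(2*eta - 1) * phi'(0):       ", closed_form)
print("C(0.1) < C(0):", cond_risk(eta, 0.1) < cond_risk(eta, 0.0))
```

With a negative derivative at zero, a small positive α lowers the conditional risk below its value at α = 0, which is the step the proof uses to obtain δ1.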

work page

[45] [45]

The minimum margin keeps increasing while the average margin keeps decreasing

A trade-off between minimum and average margin, similar to the one discussed in Sections 3 and 5, can be observed. The minimum margin keeps increasing while the average margin keeps decreasing. The only exception is the average margin of MNIST-CNN, but the range of the margin in the figure is very small (from 0.94 to 1.08), in which case the Lipschitz constant estimation ...

work page