
arxiv: 1907.11780 · v1 · pith:VERUG3JJnew · submitted 2019-07-26 · 💻 cs.LG · stat.ML

Understanding Adversarial Robustness: The Trade-off between Minimum and Average Margin

Pith reviewed 2026-05-24 15:30 UTC · model grok-4.3

classification 💻 cs.LG stat.ML
keywords: adversarial robustness, margin, deep learning, regularization, Fisher consistency, classification, neural networks, trade-off

The pith

Deep models maximize the minimum margin for accuracy while decreasing the average margin, which reduces adversarial robustness.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that standard training pushes deep classifiers to enlarge the smallest margin between classes, which helps clean accuracy but shrinks the typical margin across most examples. This shrinkage makes small adversarial perturbations more likely to flip predictions. The authors introduce a regularizer that directly encourages larger average margins. The modified objective stays Fisher-consistent, so it can still recover the Bayes-optimal classifier as data grows without bound.
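
As a rough illustration of the two statistics discussed above, the sketch below computes a per-example margin and summarizes a batch by its minimum and its mean. It assumes the margin is the logit gap (true-class score minus the best competing score); the paper may measure margin differently, for instance in input space. The function names and the toy logits are hypothetical and chosen only to show a case where the minimum rises while the average falls.

```python
from statistics import mean


def logit_margin(logits, label):
    """Score of the true class minus the best competing score (assumed margin definition)."""
    best_other = max(s for k, s in enumerate(logits) if k != label)
    return logits[label] - best_other


def margin_summary(batch_logits, labels):
    """Return (minimum margin, average margin) over a batch."""
    margins = [logit_margin(z, y) for z, y in zip(batch_logits, labels)]
    return min(margins), mean(margins)


# Hypothetical logits for the same three examples at two points in training.
labels = [0, 0, 1]
early = [[2.0, 0.1, 0.0], [0.2, 0.0, 0.1], [0.0, 3.0, 0.5]]
late = [[1.0, 0.2, 0.0], [0.9, 0.0, 0.1], [0.0, 1.2, 0.3]]

early_min, early_avg = margin_summary(early, labels)
late_min, late_avg = margin_summary(late, labels)
print(f"early: min={early_min:.2f} avg={early_avg:.2f}")
print(f"late:  min={late_min:.2f} avg={late_avg:.2f}")
```

In this toy data the smallest margin grows between the two snapshots while the mean shrinks, which is the pattern the summary attributes to standard training.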

Core claim

During training, deep models maximize the minimum margin to achieve high accuracy but at the same time decrease the average margin, which hurts robustness. A new regularizer explicitly promotes the average margin and leads to better robustness in experiments while remaining Fisher-consistent.

What carries the argument

The trade-off between minimum margin and average margin, countered by a Fisher-consistent regularizer that increases average margin.
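
The exact form of the paper's regularizer is not given in this summary. Purely as a hypothetical stand-in, the sketch below adds a reward for the capped average logit margin to a cross-entropy loss. The names `lam` and `cap`, the capping itself, and the choice of logit margin are all assumptions made for illustration, and no claim is made that this stand-in is Fisher-consistent.

```python
import math


def cross_entropy(logits, label):
    """Numerically stable softmax cross-entropy for one example."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(s - m) for s in logits))
    return log_z - logits[label]


def capped_margin(logits, label, cap=1.0):
    """Logit margin clipped from above so a few huge margins cannot dominate the mean."""
    best_other = max(s for k, s in enumerate(logits) if k != label)
    return min(logits[label] - best_other, cap)


def regularized_loss(batch_logits, labels, lam=0.1, cap=1.0):
    """Mean cross-entropy minus lam times the mean capped margin (hypothetical form)."""
    n = len(labels)
    ce = sum(cross_entropy(z, y) for z, y in zip(batch_logits, labels)) / n
    avg_margin = sum(capped_margin(z, y, cap) for z, y in zip(batch_logits, labels)) / n
    return ce - lam * avg_margin


batch = [[2.0, 0.0], [0.0, 1.0]]
ys = [0, 1]
print(regularized_loss(batch, ys, lam=0.0), regularized_loss(batch, ys, lam=0.5))
```

The weight `lam` plays the role of the regularization coefficient: at zero the objective reduces to plain cross-entropy, and larger values trade loss for average margin.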

If this is right

  • Adding the regularizer produces models with higher average margins and measurably better resistance to adversarial attacks.
  • The regularized loss remains Fisher-consistent and can still recover the Bayes optimal classifier in the large-sample limit.
  • Accuracy and robustness exhibit an intrinsic tension under current deep-model training dynamics.
  • The same margin-reduction pattern appears across multiple architectures and datasets.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The trade-off may arise from any optimizer that aggressively enlarges the smallest margin, not only from the particular loss used here.
  • Similar margin dynamics could appear in non-neural classifiers if they are trained by margin-maximizing procedures.
  • Combining the regularizer with existing defense techniques might compound robustness gains.
  • The pattern could be tested on tabular or sequential data to check whether it is specific to image classification.

Load-bearing premise

The reduction in average margin during ordinary training is the main driver of lost robustness, and explicitly raising it will improve robustness without creating new failure modes.

What would settle it

Training with the proposed regularizer yields no measurable gain in adversarial accuracy on standard benchmarks, or the regularized models lose clean accuracy at the same rate as the gain in robustness.
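
To make "adversarial accuracy" concrete in the simplest setting, the sketch below uses a binary linear classifier, where the L2 distance from a point to the decision boundary is known in closed form and a perturbation of size `eps` can flip the prediction only if that distance is at most `eps`. This is an assumed toy setup, not the paper's evaluation protocol: deep models need an actual attack or a certification method, and all names and data here are hypothetical.

```python
import math


def signed_distance(w, b, x, y):
    """L2 distance from x to the hyperplane w.x + b = 0; negative if x is misclassified.

    Labels y are assumed to be -1 or +1.
    """
    norm = math.sqrt(sum(wi * wi for wi in w))
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return y * score / norm


def robust_accuracy(w, b, xs, ys, eps):
    """Fraction of points that stay correctly classified under any L2 perturbation of size eps."""
    survivors = sum(1 for x, y in zip(xs, ys) if signed_distance(w, b, x, y) > eps)
    return survivors / len(ys)


w, b = [3.0, 4.0], 0.0  # weight norm is 5
xs = [[1.0, 1.0], [-2.0, -1.0], [0.1, 0.0]]
ys = [1, -1, 1]

for eps in (0.0, 0.1, 1.5, 2.5):
    print(eps, robust_accuracy(w, b, xs, ys, eps))
```

All three points are classified correctly at `eps = 0`, yet the point sitting close to the boundary is lost as soon as `eps` exceeds its small margin, which is why the distribution of margins matters and not only the clean accuracy.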

Figures

Figures reproduced from arXiv: 1907.11780 by Kaiwen Wu, Yaoliang Yu.

Figure 2: Average and minimum margin during training of different models on CIFAR10. Panels: Avg Margin of LR, Avg Margin of MLP, Avg Margin of CNN.

Figure 3: Average margin of regularized and standard training. … by maximizing the minimum margin, the average margin, which is a better indicator of robustness, is in serious jeopardy. Note that early stopping, while it helps prevent the average margin from decreasing unnecessarily, is not sufficient by itself to promote average margin. Instead, an explicit average margin regularizer is more effective, as we show next. …

Figure 4: Training loss, training error and test error of logistic regression in Section 3. As training proceeds, the error rate on both the training set and the test set never increases, so the logistic regression is not overfitting even though it is trained for an excessive number of epochs. This again highlights that the trade-off between minimum and average margin cannot be caused by overfitting.

Figure 5: Training curves of 3 models on MNIST and 3 models on CIFAR10 by standard training. First row: MNIST models. Second row: CIFAR10 models.

Figure 6: Minimum and average margin trade-off for MNIST models. A similar trade-off between minimum and average margin can be observed as discussed in Sections 3 and 5. The minimum margin keeps increasing while the average margin keeps decreasing. The only exception is the average margin of MNIST-CNN, but the range of margin in the figure is very small (from 0.94 to 1.08), in which case the Lipschitz c…

Figure 7: Margin histograms of MNIST models at different epochs during training. Top: histograms of MNIST-LR. Mid: histograms of MNIST-MLP. Bottom: histograms of MNIST-CNN.

Figure 8: Margin histograms of CIFAR models at different epochs during training. Top: histograms of CIFAR-LR. Mid: histograms of CIFAR-MLP. Bottom: histograms of CIFAR-CNN.

Original abstract

Deep models, while being extremely versatile and accurate, are vulnerable to adversarial attacks: slight perturbations that are imperceptible to humans can completely flip the prediction of deep models. Many attack and defense mechanisms have been proposed, although a satisfying solution still largely remains elusive. In this work, we give strong evidence that during training, deep models maximize the minimum margin in order to achieve high accuracy, but at the same time decrease the average margin hence hurting robustness. Our empirical results highlight an intrinsic trade-off between accuracy and robustness for current deep model training. To further address this issue, we propose a new regularizer to explicitly promote average margin, and we verify through extensive experiments that it does lead to better robustness. Our regularized objective remains Fisher-consistent, hence asymptotically can still recover the Bayes optimal classifier.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper claims that during standard training, deep models maximize the minimum margin to achieve high accuracy but simultaneously decrease the average margin, creating an intrinsic trade-off that reduces adversarial robustness. It provides empirical support for this dynamic and introduces a regularizer to explicitly promote the average margin. The regularized objective is asserted to remain Fisher-consistent, allowing asymptotic recovery of the Bayes optimal classifier, with experiments showing improved robustness from the regularizer.

Significance. If the observed trade-off is shown to be causal and the regularizer demonstrably improves robustness by targeting average margin (rather than incidental effects), the work would contribute a mechanistic explanation for adversarial vulnerability in standard training and a consistency-preserving regularization approach. The explicit retention of Fisher consistency is a methodological strength that distinguishes it from many heuristic robustness methods.

major comments (2)
  1. [Abstract] The central claim that the reduction in average margin during training is the primary driver of reduced robustness (rather than correlated factors such as curvature or feature quality) is load-bearing, yet the manuscript supplies no controlled ablation or counterfactual experiment that isolates margin change from other training dynamics; without this, the causal interpretation of the trade-off remains unsupported.
  2. [Abstract] The assertion that the regularized objective remains Fisher-consistent is stated without a derivation showing that the added regularizer term vanishes at the Bayes classifier or any population-level analysis; this must be supplied explicitly, as Fisher consistency is an asymptotic property that does not automatically guarantee finite-sample robustness gains.
minor comments (1)
  1. The abstract refers to 'extensive experiments' verifying robustness gains; the main text should include precise descriptions of the datasets, architectures, attack methods, and hyperparameter choices used to evaluate the regularizer.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. The two major comments highlight important points about causal evidence and the need for explicit justification of Fisher consistency. We address each below and will revise the manuscript to incorporate additional analysis and derivations.

Point-by-point responses
  1. Referee: [Abstract] The central claim that the reduction in average margin during training is the primary driver of reduced robustness (rather than correlated factors such as curvature or feature quality) is load-bearing, yet the manuscript supplies no controlled ablation or counterfactual experiment that isolates margin change from other training dynamics; without this, the causal interpretation of the trade-off remains unsupported.

    Authors: We agree that the current empirical observations, while showing the simultaneous maximization of minimum margin and reduction of average margin during training along with associated robustness degradation, do not fully isolate the margin effect from other dynamics such as curvature or feature learning. To strengthen the causal interpretation, we will add a controlled experiment in the revision, for example by using synthetic datasets where margin statistics can be directly modulated while holding other factors fixed, or by comparing training trajectories with matched curvature measures. revision: yes

  2. Referee: [Abstract] The assertion that the regularized objective remains Fisher-consistent is stated without a derivation showing that the added regularizer term vanishes at the Bayes classifier or any population-level analysis; this must be supplied explicitly, as Fisher consistency is an asymptotic property that does not automatically guarantee finite-sample robustness gains.

    Authors: We acknowledge that the abstract states the Fisher consistency without an explicit derivation. The full manuscript contains a population-level analysis showing that the regularizer term vanishes when the classifier converges to the Bayes optimal decision boundary (as the average margin approaches its maximum under the true conditional distribution). We will include a concise version of this derivation in the revised abstract and a dedicated paragraph in the main text to make the argument self-contained. We note that consistency is an asymptotic guarantee and do not claim it directly implies finite-sample robustness improvements, which are shown empirically. revision: yes

Circularity Check

0 steps flagged

No circularity: empirical observation plus Fisher-consistent regularizer with independent grounding

full rationale

The paper reports an empirical observation that standard training increases minimum margin while decreasing average margin, then introduces a regularizer to promote the latter while preserving Fisher-consistency (an asymptotic population property). No equations, fitted parameters renamed as predictions, or self-citation chains appear in the provided text that reduce any load-bearing claim to a tautology or to the input data by construction. The central claims rest on experimental verification and the standard definition of Fisher-consistency rather than any of the enumerated circular patterns.
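
For readers who want the standard definition referred to above, one common formulation for binary classification with a margin-based loss is sketched below. The symbols (labels in {-1, +1}, conditional probability η, loss φ, conditional risk C) follow the usual textbook convention and are assumptions about notation; the paper's own proof for its regularized objective is not reproduced here.

```latex
% Binary labels y \in \{-1,+1\}, \eta(x) = P(y = +1 \mid x), margin-based loss \varphi.
\[
  C^{\varphi}_{\eta}(\alpha) \;=\; \eta\,\varphi(\alpha) + (1-\eta)\,\varphi(-\alpha)
\]
% Fisher consistency: the pointwise risk minimizer has the sign of the Bayes rule.
\[
  \alpha^{\star} \in \operatorname*{arg\,min}_{\alpha} C^{\varphi}_{\eta}(\alpha)
  \;\Longrightarrow\;
  \operatorname{sign}(\alpha^{\star}) = \operatorname{sign}(2\eta - 1)
  \qquad \text{for every } \eta \neq \tfrac{1}{2}.
\]
% Slope of the conditional risk at the origin, for \varphi differentiable at 0.
\[
  \frac{d}{d\alpha} C^{\varphi}_{\eta}(\alpha)\Big|_{\alpha = 0}
  \;=\; \eta\,\varphi'(0) - (1-\eta)\,\varphi'(0)
  \;=\; (2\eta - 1)\,\varphi'(0).
\]
```

When φ'(0) < 0 and η > 1/2, this slope is negative, so the conditional risk decreases as α moves toward the positive side, which is the side the Bayes rule prefers. For η < 1/2 the sign flips and the same argument favors negative α.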

Axiom & Free-Parameter Ledger

1 free parameters · 1 axioms · 0 invented entities

Only the abstract is available so the ledger reflects what is implied by the claims. The regularizer almost certainly introduces at least one tunable hyperparameter; standard supervised learning assumptions are invoked for the consistency claim.

free parameters (1)
  • regularization coefficient
    Weight on the new average-margin term; must be chosen or tuned and directly affects the claimed robustness improvement.
axioms (1)
  • domain assumption Data are drawn i.i.d. from an underlying distribution
    Required for the Fisher-consistency claim that the regularized objective recovers the Bayes optimal classifier asymptotically.


