Understanding Adversarial Robustness: The Trade-off between Minimum and Average Margin
Pith reviewed 2026-05-24 15:30 UTC · model grok-4.3
The pith
Deep models maximize the minimum margin for accuracy while decreasing the average margin, which reduces adversarial robustness.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
During training, deep models maximize the minimum margin in order to achieve high accuracy, but at the same time they decrease the average margin, which hurts robustness. A new regularizer explicitly promotes the average margin and leads to better robustness in experiments, while the regularized objective remains Fisher-consistent.
What carries the argument
The trade-off between minimum margin and average margin, countered by a Fisher-consistent regularizer that increases average margin.
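To make the two quantities concrete, here is a minimal sketch that computes the minimum and average margin for a linear classifier, where the margin of a point is its signed distance to the decision hyperplane. The function name `margin_stats`, the toy data, and the two classifiers are assumptions made up for illustration; the paper's deep-network margin is a distance to the decision boundary that generally has to be estimated rather than computed in closed form.

```python
import numpy as np

def margin_stats(w, b, X, y):
    """Return (minimum, average) signed distance to the hyperplane w.x + b = 0.

    Labels y are in {-1, +1}; a positive distance means the point is
    correctly classified. (Hypothetical helper, not from the paper.)
    """
    distances = y * (X @ w + b) / np.linalg.norm(w)
    return distances.min(), distances.mean()

# Toy data, invented for illustration only.
X = np.array([[0.5, 2.0], [6.0, 2.0], [-0.5, -2.0], [-6.0, -2.0]])
y = np.array([1, 1, -1, -1])

# Classifier A separates along the first coordinate, B along the second.
min_a, avg_a = margin_stats(np.array([1.0, 0.0]), 0.0, X, y)
min_b, avg_b = margin_stats(np.array([0.0, 1.0]), 0.0, X, y)

print(f"A: min={min_a:.2f}, avg={avg_a:.2f}")
print(f"B: min={min_b:.2f}, avg={avg_b:.2f}")
```

On this toy data both classifiers are perfectly accurate, yet B has the larger minimum margin and the smaller average margin, which is the kind of tension the claim describes.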
If this is right
- Adding the regularizer produces models with higher average margins and measurably better resistance to adversarial attacks.
- The regularized loss remains Fisher-consistent and can still recover the Bayes optimal classifier in the large-sample limit.
- Accuracy and robustness exhibit an intrinsic tension under current deep-model training dynamics.
- The same margin-reduction pattern appears across multiple architectures and datasets.
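For reference, Fisher consistency is usually stated as follows in the binary case. This is the textbook formulation for a margin loss $\phi$ and is an assumption here, since the page does not reproduce the paper's own definition or its multi-class setting; the symbol $C^{\phi}_{\eta}$ is chosen to match the proof fragment quoted in the reference graph.

```latex
% Conditional risk of predicting score \alpha at a point x with \eta = \Pr(Y = 1 \mid X = x)
C^{\phi}_{\eta}(\alpha) = \eta\,\phi(\alpha) + (1 - \eta)\,\phi(-\alpha)

% \phi is Fisher-consistent if every minimizer has the sign of the Bayes rule
\alpha^{\star} \in \arg\min_{\alpha} C^{\phi}_{\eta}(\alpha)
\;\Longrightarrow\;
\operatorname{sign}(\alpha^{\star}) = \operatorname{sign}(2\eta - 1)
\quad \text{for all } \eta \neq \tfrac{1}{2}
```

Under this reading, the claim in the list above is that adding the regularizer does not move the sign of the population minimizer away from the Bayes rule.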
Where Pith is reading between the lines
- The trade-off may arise from any optimizer that aggressively enlarges the smallest margin, not only from the particular loss used here.
- Similar margin dynamics could appear in non-neural classifiers if they are trained by margin-maximizing procedures.
- Combining the regularizer with existing defense techniques might compound robustness gains.
- The pattern could be tested on tabular or sequential data to check whether it is specific to image classification.
Load-bearing premise
The reduction in average margin during ordinary training is the main driver of lost robustness, and explicitly raising it will improve robustness without creating new failure modes.
What would settle it
Training with the proposed regularizer yields no measurable gain in adversarial accuracy on standard benchmarks, or the regularized models lose clean accuracy at the same rate as they gain robustness.
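The comparison described above needs clean and adversarial accuracy measured side by side. A minimal sketch for a linear classifier follows, assuming an L-infinity threat model with budget `eps`; for a linear model the worst-case perturbation is available in closed form, whereas a deep network would need an iterative attack instead. The function name and data are hypothetical.

```python
import numpy as np

def clean_and_robust_accuracy(w, b, X, y, eps):
    """Clean accuracy and worst-case accuracy under ||delta||_inf <= eps.

    For a linear score y * (w.x + b), the worst perturbation lowers the
    score by exactly eps * ||w||_1, so no iterative attack is needed.
    """
    scores = y * (X @ w + b)
    clean = np.mean(scores > 0)
    robust = np.mean(scores - eps * np.abs(w).sum() > 0)
    return clean, robust

# Toy data, invented for illustration only.
X = np.array([[1.0, 1.0], [3.0, 0.0], [-1.0, -1.0], [-2.0, 2.0]])
y = np.array([1, 1, -1, -1])
w, b = np.array([1.0, 0.0]), 0.0

clean_acc, robust_acc = clean_and_robust_accuracy(w, b, X, y, eps=1.5)
print(f"clean={clean_acc:.2f}, robust={robust_acc:.2f}")
```

Reporting both numbers for the baseline and the regularized model is what lets one check whether a robustness gain is simply paid for by an equal loss in clean accuracy.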
Original abstract
Deep models, while being extremely versatile and accurate, are vulnerable to adversarial attacks: slight perturbations that are imperceptible to humans can completely flip the prediction of deep models. Many attack and defense mechanisms have been proposed, although a satisfying solution still largely remains elusive. In this work, we give strong evidence that during training, deep models maximize the minimum margin in order to achieve high accuracy, but at the same time decrease the average margin hence hurting robustness. Our empirical results highlight an intrinsic trade-off between accuracy and robustness for current deep model training. To further address this issue, we propose a new regularizer to explicitly promote average margin, and we verify through extensive experiments that it does lead to better robustness. Our regularized objective remains Fisher-consistent, hence asymptotically can still recover the Bayes optimal classifier.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims that during standard training, deep models maximize the minimum margin to achieve high accuracy but simultaneously decrease the average margin, creating an intrinsic trade-off that reduces adversarial robustness. It provides empirical support for this dynamic and introduces a regularizer to explicitly promote the average margin. The regularized objective is asserted to remain Fisher-consistent, allowing asymptotic recovery of the Bayes optimal classifier, with experiments showing improved robustness from the regularizer.
Significance. If the observed trade-off is shown to be causal and the regularizer demonstrably improves robustness by targeting average margin (rather than incidental effects), the work would contribute a mechanistic explanation for adversarial vulnerability in standard training and a consistency-preserving regularization approach. The explicit retention of Fisher consistency is a methodological strength that distinguishes it from many heuristic robustness methods.
major comments (2)
- [Abstract] The central claim, that the reduction in average margin during training is the primary driver of reduced robustness (rather than correlated factors such as curvature or feature quality), is load-bearing. Yet the manuscript supplies no controlled ablation or counterfactual experiment that isolates margin change from other training dynamics. Without one, the causal interpretation of the trade-off remains unsupported.
- [Abstract] The assertion that the regularized objective remains Fisher-consistent is stated without a derivation showing that the added regularizer term vanishes at the Bayes classifier, and without any population-level analysis. This must be supplied explicitly, since Fisher consistency is an asymptotic property that does not by itself guarantee finite-sample robustness gains.
minor comments (1)
- The abstract refers to 'extensive experiments' verifying robustness gains; the main text should include precise descriptions of the datasets, architectures, attack methods, and hyperparameter choices used to evaluate the regularizer.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback. The two major comments highlight important points about causal evidence and the need for explicit justification of Fisher consistency. We address each below and will revise the manuscript to incorporate additional analysis and derivations.
- Referee: [Abstract] The central claim, that the reduction in average margin during training is the primary driver of reduced robustness (rather than correlated factors such as curvature or feature quality), is load-bearing. Yet the manuscript supplies no controlled ablation or counterfactual experiment that isolates margin change from other training dynamics. Without one, the causal interpretation of the trade-off remains unsupported.
  Authors: We agree. The current empirical observations show that the minimum margin is maximized and the average margin reduced during training, together with the associated loss of robustness, but they do not fully isolate the margin effect from other dynamics such as curvature or feature learning. To strengthen the causal interpretation, we will add a controlled experiment in the revision, for example by using synthetic datasets where margin statistics can be directly modulated while other factors are held fixed, or by comparing training trajectories with matched curvature measures.
  Revision: yes
- Referee: [Abstract] The assertion that the regularized objective remains Fisher-consistent is stated without a derivation showing that the added regularizer term vanishes at the Bayes classifier, and without any population-level analysis. This must be supplied explicitly, since Fisher consistency is an asymptotic property that does not by itself guarantee finite-sample robustness gains.
  Authors: We acknowledge that the abstract states Fisher consistency without an explicit derivation. The full manuscript contains a population-level analysis showing that the regularizer term vanishes when the classifier converges to the Bayes optimal decision boundary (as the average margin approaches its maximum under the true conditional distribution). We will include a concise version of this derivation in the revised abstract and a dedicated paragraph in the main text to make the argument self-contained. We note that consistency is an asymptotic guarantee; we do not claim that it directly implies finite-sample robustness improvements, which are shown empirically.
  Revision: yes
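The controlled experiment proposed in the first response could look roughly like the sketch below: two synthetic datasets that share the same minimum margin under a fixed linear classifier but differ in average margin, so that any difference in robust accuracy cannot be attributed to the minimum margin. Everything here (the generator `make_dataset`, the exponential spread, the fixed classifier along the first coordinate, the L2 budget) is an assumption for illustration, not the authors' protocol.

```python
import numpy as np

def make_dataset(n, gap, spread, seed=0):
    """2-D data whose distance to the boundary x1 = 0 is gap + spread * Exp(1).

    One point is pinned at exactly `gap`, so the minimum margin is fixed
    while `spread` controls the average margin. (Hypothetical generator.)
    """
    rng = np.random.default_rng(seed)
    y = rng.choice([-1, 1], size=n)
    dist = gap + spread * rng.exponential(size=n)
    dist[0] = gap
    X = np.column_stack([y * dist, rng.normal(size=n)])
    return X, y

def margin_stats(X, y):
    """Minimum and average margin under the fixed classifier sign(x1)."""
    d = y * X[:, 0]
    return d.min(), d.mean()

def robust_accuracy(X, y, eps):
    """Accuracy under worst-case L2 perturbations of norm eps (unit-norm w)."""
    return np.mean(y * X[:, 0] - eps > 0)

X_narrow, y_narrow = make_dataset(200, gap=0.5, spread=0.1)
X_wide, y_wide = make_dataset(200, gap=0.5, spread=2.0)

min_n, avg_n = margin_stats(X_narrow, y_narrow)
min_w, avg_w = margin_stats(X_wide, y_wide)
rob_n = robust_accuracy(X_narrow, y_narrow, eps=1.0)
rob_w = robust_accuracy(X_wide, y_wide, eps=1.0)
print(f"narrow: min={min_n:.2f} avg={avg_n:.2f} robust={rob_n:.2f}")
print(f"wide:   min={min_w:.2f} avg={avg_w:.2f} robust={rob_w:.2f}")
```

This only controls the margin statistics for a fixed linear rule; isolating them from curvature or feature quality in a trained deep model would need additional controls of the kind the referee asks for.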
Circularity Check
No circularity: empirical observation plus Fisher-consistent regularizer with independent grounding
The paper reports an empirical observation that standard training increases minimum margin while decreasing average margin, then introduces a regularizer to promote the latter while preserving Fisher-consistency (an asymptotic population property). No equations, fitted parameters renamed as predictions, or self-citation chains appear in the provided text that reduce any load-bearing claim to a tautology or to the input data by construction. The central claims rest on experimental verification and the standard definition of Fisher-consistency rather than any of the enumerated circular patterns.
Axiom & Free-Parameter Ledger
free parameters (1)
- regularization coefficient
axioms (1)
- Domain assumption: data are drawn i.i.d. from an underlying distribution
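The single free parameter in the ledger, the regularization coefficient, enters the objective as a weight on the margin-promoting term. The sketch below shows that role with a stand-in penalty: a logistic surrogate loss minus `lam` times the average margin of a linear model. The penalty form, the name `regularized_objective`, and the data are hypothetical; this is not the paper's regularizer, and no claim is made that this stand-in is Fisher-consistent.

```python
import numpy as np

def regularized_objective(w, b, X, y, lam):
    """Logistic loss minus lam * average margin (illustrative stand-in only)."""
    scores = y * (X @ w + b)
    surrogate = np.mean(np.logaddexp(0.0, -scores))    # mean log(1 + exp(-score))
    avg_margin = np.mean(scores) / np.linalg.norm(w)   # average signed distance
    return surrogate - lam * avg_margin

# Toy data, invented for illustration only.
X = np.array([[1.0, 1.0], [3.0, 0.0], [-1.0, -1.0], [-2.0, 2.0]])
y = np.array([1, 1, -1, -1])
w, b = np.array([1.0, 0.0]), 0.0

# Sweep the coefficient to see how strongly the margin term is weighted.
values = {lam: regularized_objective(w, b, X, y, lam) for lam in (0.0, 0.1, 1.0)}
for lam, value in values.items():
    print(f"lam={lam}: objective={value:.4f}")
```

With `lam = 0` the objective reduces to the plain surrogate loss, so the coefficient is the only knob that trades the accuracy term against the average-margin term.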
Reference graph
Works this paper leans on
- [1] Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian Goodfellow, and Rob Fergus. Intriguing properties of neural networks. In International Conference on Learning Representations (ICLR), 2014.
- [2] I. J. Goodfellow, J. Shlens, and C. Szegedy. Explaining and harnessing adversarial examples. In International Conference on Learning Representations (ICLR), 2015.
- [3] Seyed-Mohsen Moosavi-Dezfooli, Alhussein Fawzi, and Pascal Frossard. DeepFool: a simple and accurate method to fool deep neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2574–2582, 2016.
- [4] Nicholas Carlini and David Wagner. Towards evaluating the robustness of neural networks, 2017. arXiv:1608
- [5] Pin-Yu Chen, Huan Zhang, Yash Sharma, Jinfeng Yi, and Cho-Jui Hsieh. ZOO: Zeroth Order Optimization Based Black-box Attacks to Deep Neural Networks Without Training Substitute Models. In Proceedings of the 10th ACM Workshop on Artificial Intelligence and Security, pages 15–26, 2017.
- [6] Mohammad Hashemi, Greg Cusack, and Eric Keller. Stochastic Substitute Training: A Gray-box Approach to Craft Adversarial Examples Against Gradient Obfuscation Defenses. In Proceedings of the 11th ACM Workshop on Artificial Intelligence and Security, pages 25–36, 2018.
- [7] Yali Du, Meng Fang, Jinfeng Yi, Jun Cheng, and Dacheng Tao. Towards Query Efficient Black-box Attacks: An Input-free Perspective. In Proceedings of the 11th ACM Workshop on Artificial Intelligence and Security, pages 13–24, 2018.
- [8] Nicolas Papernot, Patrick McDaniel, Ian Goodfellow, Somesh Jha, Z. Berkay Celik, and Ananthram Swami. Practical Black-Box Attacks Against Machine Learning. In Proceedings of the 2017 ACM on Asia Conference on Computer and Communications Security, pages 506–519, 2017.
- [9] Valentina Zantedeschi, Maria-Irina Nicolae, and Ambrish Rawat. Efficient Defenses Against Adversarial Attacks. In Proceedings of the 10th ACM Workshop on Artificial Intelligence and Security, pages 39–49, 2017.
- [10] Nicolas Papernot, Patrick McDaniel, Xi Wu, Somesh Jha, and Ananthram Swami. Distillation as a defense to adversarial perturbations against deep neural networks. In IEEE Symposium on Security and Privacy, 2016.
- [11] Moustapha Cisse, Piotr Bojanowski, Edouard Grave, Yann Dauphin, and Nicolas Usunier. Parseval networks: Improving robustness to adversarial examples. In Proceedings of the 34th International Conference on Machine Learning, pages 854–863, 2017.
- [12] Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu. Towards deep learning models resistant to adversarial attacks. In International Conference on Learning Representations (ICLR), 2018.
- [13] Ian Goodfellow, Patrick McDaniel, and Nicolas Papernot. Making Machine Learning Robust Against Adversarial Inputs. Communications of the ACM, 61(7):56–66, 2018.
- [14] Yiwen Guo, Chao Zhang, Changshui Zhang, and Yurong Chen. Sparse DNNs with Improved Adversarial Robustness. In Advances in Neural Information Processing Systems 31, pages 242–251, 2018. TRADE-OFF BETWEEN MINIMUM AND AVERAGE MARGIN 13
- [15] Anish Athalye, Nicholas Carlini, and David Wagner. Obfuscated gradients give a false sense of security: Circumventing defenses to adversarial examples. In ICML, 2018.
- [16] Matthias Hein and Maksym Andriushchenko. Formal guarantees on the robustness of a classifier against adversarial manipulation. In Advances in Neural Information Processing Systems (NIPS), pages 2266–2276, 2017.
- [17] Tsui-Wei Weng, Huan Zhang, Pin-Yu Chen, Jinfeng Yi, Dong Su, Yupeng Gao, Cho-Jui Hsieh, and Luca Daniel. Evaluating the robustness of neural networks: An extreme value theory approach. In International Conference on Learning Representations (ICLR), 2018.
- [18] Huan Zhang, Tsui-Wei Weng, Pin-Yu Chen, Cho-Jui Hsieh, and Luca Daniel. Efficient neural network robustness certification with general activation functions. In Advances in Neural Information Processing Systems, pages 4944–4953, 2018.
- [19] Tsui-Wei Weng, Huan Zhang, Hongge Chen, Zhao Song, Cho-Jui Hsieh, Duane Boning, Inderjit S. Dhillon, and Luca Daniel. Towards fast computation of certified robustness for relu networks. In ICML, 2018.
- [20] Vincent Tjeng, Kai Y. Xiao, and Russ Tedrake. Evaluating robustness of neural networks with mixed integer programming. In International Conference on Learning Representations, 2019.
- [21] Gagandeep Singh, Timon Gehr, Markus Püschel, and Martin Vechev. Robustness certification with refinement. In International Conference on Learning Representations, 2019.
- [22] Eric Wong, Frank Schmidt, Jan Hendrik Metzen, and J Zico Kolter. Scaling provable adversarial defenses. In Advances in Neural Information Processing Systems, pages 8400–8409, 2018.
- [23] Eric Wong and J. Zico Kolter. Provable Defenses against Adversarial Examples via the Convex Outer Adversarial Polytope. In ICML, 2018.
- [24] Aditi Raghunathan, Jacob Steinhardt, and Percy S Liang. Semidefinite relaxations for certifying robustness to adversarial examples. In Advances in Neural Information Processing Systems, pages 10900–10910, 2018.
- [25] Aditi Raghunathan, Jacob Steinhardt, and Percy Liang. Certified defenses against adversarial examples. In International Conference on Learning Representations, 2018.
- [26] Gagandeep Singh, Timon Gehr, Matthew Mirman, Markus Püschel, and Martin Vechev. Fast and Effective Robustness Certification. In Advances in Neural Information Processing Systems 31, pages 10802–10813, 2018.
- [27] Sven Gowal, Krishnamurthy Dvijotham, Robert Stanforth, Rudy Bunel, Chongli Qin, Jonathan Uesato, Relja Arandjelovic, Timothy Mann, and Pushmeet Kohli. On the Effectiveness of Interval Bound Propagation for Training Verifiably Robust Models. In NeurIPS Workshop on Security in Machine Learning, 2018.
- [28] Leo Breiman. Prediction Games and Arcing Algorithms. Neural Computation, 11(7):1493–1517, 1999.
- [29] Robert E. Schapire, Yoav Freund, Peter Bartlett, and Wee Sun Lee. Boosting the margin: a new explanation for the effectiveness of voting methods. The Annals of Statistics, 26(5):1651–1686, 1998.
- [30] Ashutosh Garg, Sariel Har-Peled, and Dan Roth. On generalization bounds, projection profile, and margin distribution. In ICML, 2002. 14 KAIWEN WU AND YAOLIANG YU
- [31] Ashutosh Garg and Dan Roth. Margin distribution and learning algorithms. In ICML, 2003.
- [32] Teng Zhang and Zhi-Hua Zhou. Multi-class optimal margin distribution machine. In Proceedings of the 34th International Conference on Machine Learning - Volume 70, pages 4063–4071. JMLR.org, 2017.
- [33] Daniel Soudry, Elad Hoffer, and Nathan Srebro. The Implicit Bias of Gradient Descent on Separable Data. In International Conference on Learning Representations, 2018.
- [34]
- [35] Francesco Croce, Maksym Andriushchenko, and Matthias Hein. Provable robustness of relu networks via maximization of linear regions. In AISTATS, 2019.
- [36] Ziang Yan, Yiwen Guo, and Changshui Zhang. Deep defense: Training dnns with improved adversarial robustness. In Advances in Neural Information Processing Systems, pages 419–428, 2018.
- [37] Suriya Gunasekar, Jason D Lee, Daniel Soudry, and Nati Srebro. Implicit bias of gradient descent on linear convolutional networks. In Advances in Neural Information Processing Systems, pages 9461–9471, 2018.
- [38] Ziwei Ji and Matus Telgarsky. Gradient descent aligns the layers of deep linear networks. In International Conference on Learning Representations, 2019.
- [39] V. Koltchinskii and D. Panchenko. Empirical Margin Distributions and Bounding the Generalization Error of Combined Classifiers. The Annals of Statistics, 30(1):1–50, 2002.
- [40] Peter L Bartlett, Michael I Jordan, and Jon D McAuliffe. Convexity, classification, and risk bounds. Journal of the American Statistical Association, 101(473):138–156, 2006.
- [41] Guy Katz, Clark Barrett, David L. Dill, Kyle Julian, and Mykel J. Kochenderfer. Reluplex: An efficient smt solver for verifying deep neural networks. In International Conference on Computer Aided Verification, pages 97–117, 2017.
- [42] Ji Lin, Chuang Gan, and Song Han. Defensive Quantization: When Efficiency Meets Robustness. In International Conference on Learning Representations, 2019.
  Appendix A. Proofs. Proof of Proposition 1. Indeed, according to the definitions in (5) and (2) we have $r(x) = \inf\{\|z\| : x + z \notin F_{\hat{y}(x)},\ x + z \in X\}$ (17) $= \inf\{\|(x$ ...
- [43] We first consider the case η > 1
- [44] For the case $\eta < \frac{1}{2}$, the proof is similar. The derivative of $C^{\phi}_{\eta}(\alpha)$ at zero is $(2\eta - 1)\phi'(0) < 0$. Thus $\exists\, \delta_1 > 0$ such that every $\alpha \in (0, \delta_1)$ satisfies $C^{\phi}_{\eta}(\alpha) < C^{\phi}_{\eta}(0)$. The right-hand derivative of $\psi$ at zero is negative, thus $\exists\, \delta_2 > 0$ such that every $\alpha \in (0, \delta_2)$ satisfies $\psi(\alpha) < \psi(0)$. Notice that $\psi(\alpha)$ is constant when $\alpha \le 0$. Thus every $\alpha \in (0, \delta_2)$ satisfies $C^{\psi}_{\eta}(\alpha) < C^{\psi}$...
- [45] A similar trade-off between minimum and average margin can be observed as discussed in Sections 3 and 5. The minimum margin keeps increasing while the average margin keeps decreasing. The only exception is the average margin of MNIST-CNN. But the range of margin in the figure is very small (from 0.94 to 1.08), in which case the Lipschitz constant estimation ...
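The derivative quoted in entry [44] can be checked in one line, assuming the standard definition of the conditional $\phi$-risk for a differentiable margin loss (the page does not reproduce the paper's definition, so this form is an assumption):

```latex
C^{\phi}_{\eta}(\alpha) = \eta\,\phi(\alpha) + (1 - \eta)\,\phi(-\alpha)
\quad\Longrightarrow\quad
\frac{d}{d\alpha} C^{\phi}_{\eta}(\alpha)\Big|_{\alpha = 0}
  = \eta\,\phi'(0) - (1 - \eta)\,\phi'(0)
  = (2\eta - 1)\,\phi'(0)
```

With $\phi'(0) < 0$, as holds for the usual decreasing surrogates, this quantity is negative exactly when $\eta > \frac{1}{2}$, which is what allows a small positive $\alpha$ to lower the conditional risk below its value at zero.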