Nearest Neighbor Projection Removal Adversarial Training
Pith reviewed 2026-05-18 17:48 UTC · model grok-4.3
The pith
Removing projections onto nearest inter-class neighbors in feature space during adversarial training reduces the Lipschitz constant and improves robustness.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The paper presents Nearest Neighbor Projection Removal Adversarial Training, in which the nearest inter-class neighbor is located for each sample in feature space and its projection is removed to enforce stronger separability. The same correction is applied to clean samples. The authors demonstrate theoretically that this logits correction reduces the Lipschitz constant of the network, which lowers Rademacher complexity and thereby improves generalization as well as resistance to adversarial perturbations.
What carries the argument
Nearest neighbor projection removal in feature space: for each sample, the component of its representation that lies along its closest inter-class neighbor is subtracted.
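The operation can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function name `remove_nearest_interclass_projection`, the use of Euclidean distance to pick the neighbor, the projection onto the neighbor's own feature vector (following the abstract's wording), and the stabilizer `eps` are all assumptions made for illustration.

```python
import numpy as np


def remove_nearest_interclass_projection(features, labels, eps=1e-12):
    """Subtract from each row its projection onto the nearest row of another class.

    Hypothetical helper: Euclidean nearest neighbor and projection onto the
    neighbor vector itself are assumptions, not details confirmed by the paper.
    """
    features = np.asarray(features, dtype=float)
    labels = np.asarray(labels)
    if np.unique(labels).size < 2:
        raise ValueError("need at least two classes to find an inter-class neighbor")

    # Pairwise squared Euclidean distances between all feature vectors.
    sq_norms = np.sum(features ** 2, axis=1)
    dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (features @ features.T)

    # Exclude same-class pairs (including each sample itself) from the search.
    same_class = labels[:, None] == labels[None, :]
    dists = np.where(same_class, np.inf, dists)
    nn_idx = np.argmin(dists, axis=1)
    neighbors = features[nn_idx]

    # Projection coefficient <z, n> / <n, n>, then remove that component.
    coeff = np.sum(features * neighbors, axis=1) / (np.sum(neighbors ** 2, axis=1) + eps)
    corrected = features - coeff[:, None] * neighbors
    return corrected, nn_idx
```

After the call, each corrected row is (up to `eps`) orthogonal to the neighbor it was matched with, which is one concrete reading of "removing the projection". The search is quadratic in the batch size, which is acceptable for a sketch but may not reflect how the paper computes neighbors.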
If this is right
- Competitive or superior robust accuracy alongside improved clean accuracy on CIFAR-10, CIFAR-100, SVHN, and TinyImageNet.
- Explicit reduction of inter-class feature overlap that lowers adversarial susceptibility.
- Lower Rademacher complexity that yields improved generalization bounds.
- A training procedure that directly targets inter-class proximity in addition to gradient-based adversarial objectives.
Where Pith is reading between the lines
- The projection removal step could be inserted into other adversarial training pipelines without changing their loss functions.
- Similar nearest-neighbor corrections might reduce overlap in learned representations for tasks beyond image classification.
- The approach suggests that explicit geometric interventions in feature space can complement purely gradient-driven robustness methods.
Load-bearing premise
Removing the projection onto the nearest inter-class neighbor enforces stronger separability without discarding information necessary for correct classification.
What would settle it
Measuring whether robust accuracy on standard benchmarks falls below that of vanilla adversarial training when both are evaluated under the same strong attack such as multi-step PGD.
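A comparison of this kind could be set up along the following lines. This is a minimal sketch, assuming a linear softmax classifier (`W`, `b`) as a stand-in for the trained network and an L-infinity threat model; the function names, step size, and number of steps are illustrative choices, not the paper's evaluation protocol.

```python
import numpy as np


def softmax(z):
    # Numerically stable row-wise softmax.
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def pgd_linf(x, y, W, b, eps=0.1, step=0.025, n_steps=10, rng=None):
    """Multi-step L-infinity PGD against logits = x @ W + b (hypothetical stand-in model)."""
    rng = np.random.default_rng(0) if rng is None else rng
    # Random start inside the eps-ball around the clean input.
    x_adv = x + rng.uniform(-eps, eps, size=x.shape)
    onehot = np.eye(W.shape[1])[y]
    for _ in range(n_steps):
        probs = softmax(x_adv @ W + b)
        # Gradient of the cross-entropy loss with respect to the input.
        grad = (probs - onehot) @ W.T
        # Ascent step on the loss, then projection back onto the eps-ball.
        x_adv = x_adv + step * np.sign(grad)
        x_adv = np.clip(x_adv, x - eps, x + eps)
    return x_adv


def accuracy(x, y, W, b):
    return float(np.mean(np.argmax(x @ W + b, axis=1) == y))
```

Running `pgd_linf` with identical `eps`, `step`, and `n_steps` against both models and comparing `accuracy` on the resulting adversarial inputs gives the like-for-like robust accuracy the question calls for. For a deep network the input gradient would come from automatic differentiation rather than the closed form used here.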
Original abstract
Deep neural networks have exhibited impressive performance in image classification tasks but remain vulnerable to adversarial examples. Standard adversarial training enhances robustness but typically fails to explicitly address inter-class feature overlap, a significant contributor to adversarial susceptibility. In this work, we introduce a novel adversarial training framework that actively mitigates inter-class proximity by projecting out inter-class dependencies from adversarial and clean samples in the feature space. Specifically, our approach first identifies the nearest inter-class neighbors for each adversarial sample and subsequently removes projections onto these neighbors to enforce stronger feature separability. Theoretically, we demonstrate that our proposed logits correction reduces the Lipschitz constant of neural networks, thereby lowering the Rademacher complexity, which directly contributes to improved generalization and robustness. Extensive experiments across standard benchmarks including CIFAR-10, CIFAR-100, SVHN, and TinyImagenet show that our method demonstrates strong performance that is competitive with leading adversarial training techniques, highlighting significant achievements in both robust and clean accuracy. Our findings reveal the importance of addressing inter-class feature proximity explicitly to bolster adversarial robustness in DNNs. The code is available in the supplementary material.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes Nearest Neighbor Projection Removal Adversarial Training, which identifies nearest inter-class neighbors in feature space for adversarial and clean samples and removes their projections to enforce stronger separability. It claims that the resulting logits correction reduces the network Lipschitz constant, thereby lowering Rademacher complexity and directly improving generalization and robustness. Experiments on CIFAR-10, CIFAR-100, SVHN and TinyImageNet are reported as competitive with leading adversarial training methods, with code provided in supplementary material.
Significance. If the central theoretical claim holds, the work would be significant as an explicit mechanism for reducing inter-class feature overlap during adversarial training, potentially improving the clean-robust accuracy trade-off. The availability of code is a positive for reproducibility. The approach could open a direction for feature-space interventions that complement standard min-max adversarial objectives.
major comments (2)
- [Abstract] The claim that the logits correction reduces the Lipschitz constant of neural networks (and thereby lowers Rademacher complexity) is presented without derivation steps, assumptions, or proof sketch, yet this reduction is asserted to directly contribute to improved generalization and robustness.
- [Abstract] The connection from reduced Rademacher complexity to adversarial robustness is not derived; standard Rademacher bounds control the clean generalization gap, and no argument is supplied showing how the Lipschitz reduction extends to the robust risk or the min-max adversarial training objective.
minor comments (2)
- [Abstract] Experimental results are summarized only as 'competitive' and 'strong performance' without numerical values, baseline comparisons, or ablation details.
- [Abstract] The projection step assumes that removing the nearest inter-class neighbor projection enforces separability without discarding information required for correct classification, but this assumption receives no further discussion or empirical validation.
Simulated Author's Rebuttal
We thank the referee for the detailed and constructive feedback. We address the two major comments on the abstract below. Both comments correctly identify that the abstract is overly concise on the theoretical claims; we have revised the abstract and added a short proof sketch plus an explicit link to robust risk in the theory section.
Point-by-point responses
- Referee: [Abstract] The claim that the logits correction reduces the Lipschitz constant of neural networks (and thereby lowers Rademacher complexity) is presented without derivation steps, assumptions, or proof sketch, yet this reduction is asserted to directly contribute to improved generalization and robustness.
  Authors: We agree that the abstract, constrained by length, omitted the derivation steps and assumptions. In the revised manuscript we have expanded the abstract to include a brief proof sketch: under the assumption that the projection removal operator is a contraction with norm less than 1, the corrected logits satisfy ||f(x) - f(y)|| <= L' ||x - y|| with L' < L, where L is the original Lipschitz constant; this directly lowers the Rademacher complexity bound via the standard Lipschitz-to-Rademacher relation. The full derivation appears in Section 3.
  Revision: yes
- Referee: [Abstract] The connection from reduced Rademacher complexity to adversarial robustness is not derived; standard Rademacher bounds control the clean generalization gap, and no argument is supplied showing how the Lipschitz reduction extends to the robust risk or the min-max adversarial training objective.
  Authors: The referee correctly notes that standard Rademacher bounds address clean generalization. We have added a paragraph to the revised abstract and expanded Section 3 to show the extension: because the Lipschitz constant bounds the sensitivity of the network to input perturbations, the same reduction controls the gap between clean and robust risk. Specifically, we derive that the robust risk is bounded by the clean risk plus an additive term proportional to the Lipschitz constant times the perturbation budget, which is tightened by our projection removal. This argument is now explicitly connected to the min-max objective.
  Revision: yes
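The two steps described in the responses can be written out as follows. This is a sketch under stated assumptions, not the paper's Section 3 derivation: the symbols $P$, $c$, $\tilde f$, $\rho$, and $\epsilon$ are introduced here for illustration, the projection removal is treated as a fixed linear operator $P$ with operator norm $c < 1$ (the contraction assumption stated in the rebuttal), the loss $\ell$ is assumed $\rho$-Lipschitz in the logits, and inputs and perturbations are measured in the same norm.

```latex
% Step 1: Lipschitz constant of the corrected logits \tilde f = P \circ f,
% where f is L-Lipschitz with L > 0 and \|P\|_{\mathrm{op}} = c < 1 (assumed).
\begin{align*}
\|\tilde f(x) - \tilde f(y)\|
  &= \|P\,(f(x) - f(y))\| \\
  &\le c\,\|f(x) - f(y)\| \\
  &\le cL\,\|x - y\|,
  \qquad\text{so } L' := cL < L .
\end{align*}

% Step 2: robust risk versus clean risk for a \rho-Lipschitz loss \ell
% and perturbation budget \|\delta\| \le \epsilon.
\begin{align*}
\ell(\tilde f(x+\delta), y)
  &\le \ell(\tilde f(x), y) + \rho\,\|\tilde f(x+\delta) - \tilde f(x)\| \\
  &\le \ell(\tilde f(x), y) + \rho L' \epsilon , \\
R_{\mathrm{rob}}(\tilde f)
  := \mathbb{E}\Big[\sup_{\|\delta\|\le\epsilon} \ell(\tilde f(x+\delta), y)\Big]
  &\le \mathbb{E}\big[\ell(\tilde f(x), y)\big] + \rho L' \epsilon
  = R(\tilde f) + \rho L' \epsilon .
\end{align*}
```

The additive term shrinks with $L'$, which matches the form of the bound the authors describe. The Rademacher complexity step is not reproduced here, because the response refers to it only as the standard relation without stating it.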
Circularity Check
No circularity: theoretical claim presented as independent demonstration
full rationale
The paper states it 'demonstrate[s] that our proposed logits correction reduces the Lipschitz constant of neural networks, thereby lowering the Rademacher complexity' (abstract). This is framed as a first-principles theoretical result rather than a fit, renaming, or self-citation reduction. No equations are exhibited that define the correction in terms of the Lipschitz quantity itself, nor is any load-bearing premise imported solely via overlapping-author citation. The projection step is described operationally (identify nearest inter-class neighbor and remove its projection) without reducing the claimed bound to the input data by construction. Concerns about Rademacher controlling only clean generalization (versus robust risk) are correctness or assumption issues, not circularity. The derivation chain therefore remains self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: identifying and projecting out the nearest inter-class neighbor in feature space reduces inter-class dependencies without harming classification performance.