pith · machine review for the scientific record

arxiv: 2512.10275 · v2 · submitted 2025-12-11 · 💻 cs.CV

Recognition: 2 Lean theorem links

Sample-wise Adaptive Weighting for Transfer Consistency in Adversarial Distillation


Pith reviewed 2026-05-16 23:47 UTC · model grok-4.3

classification 💻 cs.CV
keywords adversarial distillation · robustness transfer · sample weighting · transfer consistency · adversarial training · student-teacher

The pith

Reweighting distillation samples by adversarial transferability yields more robust student networks.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that in adversarial distillation, simply using a stronger teacher does not always produce a more robust student, a phenomenon known as robust saturation. Instead, the key factor is adversarial transferability: the fraction of the student's adversarial examples that still fool the teacher. By adaptively weighting each training sample according to this transferability, the method improves the transfer of robustness. The approach requires no extra computation and yields better performance against strong attacks such as AutoAttack on standard datasets.

Core claim

The central discovery is that adversarial transferability, defined as the fraction of student-generated adversarial examples that remain effective against the teacher, is a crucial factor for successful robustness transfer. Based on this, SAAD reweights training examples sample-wise by their transferability to enhance consistency in adversarial distillation.

What carries the argument

Sample-wise Adaptive Adversarial Distillation (SAAD), which measures and uses adversarial transferability to reweight training samples during distillation.
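The mechanism can be sketched in a few lines. This is a minimal NumPy illustration, not the paper's implementation: the indicator-based transferability test and the mean-one up-weighting rule (`1 + transfers`) are assumptions made here for clarity; SAAD's actual weighting function may differ.

```python
import numpy as np

def transferability_weights(teacher_logits_adv, labels):
    """Per-sample weights from adversarial transferability (illustrative).

    A sample's adversarial example "transfers" if it also fools the
    teacher, i.e. the teacher's prediction on the student-crafted
    adversarial input differs from the true label. Transferable
    samples are up-weighted; weights are normalised to mean one so
    the overall loss scale is unchanged (an assumed convention).
    """
    teacher_pred = teacher_logits_adv.argmax(axis=1)
    transfers = (teacher_pred != labels).astype(float)  # 1 if teacher fooled
    weights = 1.0 + transfers
    return weights / weights.mean()
```

These weights would then multiply the per-sample distillation loss inside the existing training loop, reusing the PGD adversarial examples already computed, which is why the scheme adds no attack-side cost.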

If this is right

  • Stronger teachers can now be used more effectively in distillation without saturation limiting gains.
  • Student robustness improves consistently across CIFAR-10, CIFAR-100, and Tiny-ImageNet under AutoAttack.
  • Training can incorporate state-of-the-art robust teachers more reliably.
  • The method adds no computational overhead beyond standard adversarial training.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Transferability could serve as a general signal for adjusting training dynamics in other robustness methods.
  • If transferability varies during training, dynamically updating weights might further enhance results.
  • Applying similar weighting in non-adversarial distillation might improve other transfer tasks.

Load-bearing premise

Adversarial transferability measured on the current student acts as a stable and causal driver of final robustness rather than merely correlating with other training factors.

What would settle it

An experiment where samples are reweighted by transferability but the resulting student shows no improvement or degradation in AutoAttack robustness compared to unweighted distillation would falsify the claim.

Figures

Figures reproduced from arXiv: 2512.10275 by Hongsin Lee, Hye Won Chung.

Figure 1. Adversarial distillation results on CIFAR-10 with a ResNet-18 student. Detailed teacher information in the full figure at the source.
Figure 2. (a) Density histograms of teacher-logit entropies on student-generated PGD-20 adversarial training inputs for two ERTs. (b) Same, but for IRTs. (c) PGD-20 robust accuracy on training and test sets across epochs for students distilled from individual teachers within the ERT and IRT groups. Solid lines indicate group-wise averages, and shaded regions represent standard deviations across teachers in each group.
Figure 3. (a) Teachers with lower entropy on student-generated PGD inputs induce higher adversarial variance in the …
Figure 4. (a) Density histograms of teacher-logit entropies on student-generated PGD-20 adversarial training inputs, separated into TAS and Non-TAS groups. (b) Bar chart of adversarial variance when training only on the TAS subset versus the Non-TAS subset (for each teacher). (c) PGD-20 robust accuracy on train (dashed) and test (solid) over epochs. Correlation between the TAS ratio and the student model's robustness …
Original abstract

Adversarial distillation in the standard min-max adversarial training framework aims to transfer adversarial robustness from a large, robust teacher network to a compact student. However, existing work often neglects to incorporate state-of-the-art robust teachers. Through extensive analysis, we find that stronger teachers do not necessarily yield more robust students, a phenomenon known as robust saturation. While typically attributed to capacity gaps, we show that such explanations are incomplete. Instead, we identify adversarial transferability, the fraction of student-crafted adversarial examples that remain effective against the teacher, as a key factor in successful robustness transfer. Based on this insight, we propose Sample-wise Adaptive Adversarial Distillation (SAAD), which reweights training examples by their measured transferability without incurring additional computational cost. Experiments on CIFAR-10, CIFAR-100, and Tiny-ImageNet show that SAAD consistently improves AutoAttack robustness over prior methods. Our code is available at https://github.com/HongsinLee/saad.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper proposes Sample-wise Adaptive Adversarial Distillation (SAAD) to address robust saturation in adversarial distillation, where stronger teachers do not always produce more robust students. It identifies adversarial transferability—the fraction of student-generated adversarial examples that fool the teacher—as a key factor beyond capacity mismatch, and introduces a reweighting scheme for the distillation loss based on this quantity computed on-the-fly from existing PGD attacks. Experiments on CIFAR-10, CIFAR-100, and Tiny-ImageNet are reported to show consistent AutoAttack robustness gains over prior methods at no extra cost.

Significance. If the gains are shown to arise specifically from transferability rather than correlated sample properties, the approach would provide a practical, zero-overhead improvement to adversarial distillation pipelines for compact robust models.

major comments (2)
  1. §4 (Experiments): The reported improvements over baselines lack an ablation that substitutes an equivalent non-transferability signal (e.g., reweighting by per-sample loss or gradient norm from the same PGD attacks) to test whether gains vanish, leaving open the possibility that SAAD is a proxy for known difficulty-based reweighting rather than a causal driver of transfer consistency.
  2. §3.2 (Method): The transferability definition is an observable computed directly from student-teacher agreement on adversarial examples; the manuscript provides no analysis (e.g., correlation tables or controlled substitution experiments) demonstrating that this quantity is not reducible to standard hardness metrics already known to affect robust training dynamics.
minor comments (2)
  1. Abstract: The claim of 'consistent improvements' is stated without numerical deltas, baseline names, or statistical details, making it difficult to assess the practical magnitude of the contribution from the summary alone.
  2. §4.1: Robustness metrics are presented without standard deviations across multiple random seeds or formal statistical tests, which are needed to support the 'consistently improves' claim.
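The substitution ablation requested in major comment 1 mainly needs a common normalisation, so that transferability, per-sample loss, and gradient norm are compared on equal footing as weighting signals. A minimal sketch; the softmax-style mean-one mapping and the `temperature` parameter are assumptions for illustration, not the paper's scheme:

```python
import numpy as np

def signal_to_weights(signal, temperature=1.0):
    """Map any per-sample scalar signal to mean-one loss weights.

    Softmax-style scheme (assumed here): higher signal -> higher
    weight, with the max subtracted for numerical stability and the
    result rescaled so the weights average exactly one.
    """
    signal = np.asarray(signal, dtype=float)
    w = np.exp((signal - signal.max()) / temperature)
    return w * (len(w) / w.sum())
```

Running the same training recipe with this mapping applied to per-sample adversarial loss or gradient norm, instead of the transferability signal, would directly test whether the reported gains are specific to transferability.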

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments, which help clarify the contribution of transferability in adversarial distillation. We address each major point below and will revise the manuscript accordingly.

read point-by-point responses
  1. Referee: §4 (Experiments): The reported improvements over baselines lack an ablation that substitutes an equivalent non-transferability signal (e.g., reweighting by per-sample loss or gradient norm from the same PGD attacks) to test whether gains vanish, leaving open the possibility that SAAD is a proxy for known difficulty-based reweighting rather than a causal driver of transfer consistency.

    Authors: We agree that an explicit ablation isolating transferability from standard difficulty signals would strengthen the causal claim. In the revised manuscript we will add this ablation: using the identical PGD attacks already computed for SAAD, we will reweight samples by (i) per-sample adversarial loss and (ii) gradient norm, then compare AutoAttack robustness against the original transferability weighting. This will directly test whether the observed gains persist only when the transferability signal is used. revision: yes

  2. Referee: §3.2 (Method): The transferability definition is an observable computed directly from student-teacher agreement on adversarial examples; the manuscript provides no analysis (e.g., correlation tables or controlled substitution experiments) demonstrating that this quantity is not reducible to standard hardness metrics already known to affect robust training dynamics.

    Authors: We acknowledge that the current manuscript does not include explicit correlation tables or substitution experiments. In the revision we will add (a) Pearson and Spearman correlations between transferability scores and per-sample loss / gradient norms across training epochs on CIFAR-10/100, and (b) a controlled substitution experiment that replaces the transferability weight with a hardness-based weight while keeping all other training elements fixed. These additions will quantify the degree of overlap and demonstrate that transferability captures teacher-specific alignment information beyond generic hardness. revision: yes
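The correlation analysis promised in response 2 needs nothing beyond product-moment and rank correlations. A dependency-free sketch, equivalent to the point estimates of `scipy.stats.pearsonr` and `spearmanr` under the assumption of no ties in the ranked data:

```python
import numpy as np

def pearson(x, y):
    """Product-moment correlation of two 1-D arrays."""
    x = x - x.mean()
    y = y - y.mean()
    return float((x * y).sum() / np.sqrt((x ** 2).sum() * (y ** 2).sum()))

def spearman(x, y):
    """Rank correlation: Pearson on the ranks (assumes no ties)."""
    rank = lambda a: np.argsort(np.argsort(a)).astype(float)
    return pearson(rank(x), rank(y))
```

A high Spearman but modest Pearson correlation between transferability and a hardness metric would indicate a monotone but nonlinear relationship, i.e. the two signals would not be interchangeable as linear reweighting terms.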

Circularity Check

0 steps flagged

No significant circularity: weighting defined from observable quantity with experimental support

full rationale

The paper identifies adversarial transferability via direct measurement (fraction of student adversarial examples effective against the teacher) and uses this to reweight the distillation loss on-the-fly. This is presented as an empirical design choice backed by experiments on CIFAR-10/100 and Tiny-ImageNet showing AutoAttack gains. No equations reduce the claimed improvement to a definitional identity, no fitted parameters are relabeled as predictions, and no self-citation chains or uniqueness theorems are invoked to force the result. The derivation is self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on the domain assumption that transferability is the dominant controllable factor in robustness transfer; no explicit free parameters or new entities are introduced beyond standard training hyperparameters.

axioms (1)
  • domain assumption: Adversarial transferability is a key causal factor in successful robustness transfer. Identified via analysis and used to motivate the reweighting rule.

pith-pipeline@v0.9.0 · 5462 in / 1076 out tokens · 32856 ms · 2026-05-16T23:47:53.824295+00:00 · methodology

discussion (0)


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches: the paper's claim is directly supported by a theorem in the formal canon.
supports: the theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: the paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: the paper appears to rely on the theorem as machinery.
contradicts: the paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

84 extracted references · 84 canonical work pages · 10 internal anchors

  1. [1]

    Towards understanding ensemble, knowledge distillation and self-distillation in deep learning

    Zeyuan Allen-Zhu and Yuanzhi Li. Towards understanding ensemble, knowledge distillation and self-distillation in deep learning. InThe Eleventh International Conference on Learning Representations, 2023. URL https: //openreview.net/forum?id=Uuf2q9TfXGA

  2. [2]

    Square attack: a query- efficient black-box adversarial attack via random search

    Maksym Andriushchenko, Francesco Croce, Nicolas Flammarion, and Matthias Hein. Square attack: a query- efficient black-box adversarial attack via random search. InEuropean conference on computer vision, pp. 484–501. Springer, 2020

  3. [3]

    Obfuscated gradients give a false sense of security: Circumventing defenses to adversarial examples

    Anish Athalye, Nicholas Carlini, and David Wagner. Obfuscated gradients give a false sense of security: Circumventing defenses to adversarial examples. InInternational conference on machine learning, pp. 274–283. PMLR, 2018

  4. [4]

    Adversarial robustness for unsupervised domain adaptation

    Muhammad Awais, Fengwei Zhou, Hang Xu, Lanqing Hong, Ping Luo, Sung-Ho Bae, and Zhenguo Li. Adversarial robustness for unsupervised domain adaptation. InProceedings of the IEEE/CVF International Conference on Computer Vision, pp. 8568–8577, 2021

  5. [5]

    Improving adversarial robustness via channel-wise activation suppressing.arXiv preprint arXiv:2103.08307, 2021

    Yang Bai, Yuyuan Zeng, Yong Jiang, Shu-Tao Xia, Xingjun Ma, and Yisen Wang. Improving adversarial robustness via channel-wise activation suppressing.arXiv preprint arXiv:2103.08307, 2021

  6. [6]

    Adversarial robustness limits via scaling-law and human-alignment studies.arXiv preprint arXiv:2404.09349, 2024

    Brian R Bartoldson, James Diffenderfer, Konstantinos Parasyris, and Bhavya Kailkhura. Adversarial robustness limits via scaling-law and human-alignment studies.arXiv preprint arXiv:2404.09349, 2024

  7. [7]

    Towards evaluating the robustness of neural networks

    Nicholas Carlini and David Wagner. Towards evaluating the robustness of neural networks. In2017 ieee symposium on security and privacy (sp), pp. 39–57. Ieee, 2017

  8. [8]

    Unlabeled data improves adversarial robustness.Advances in neural information processing systems, 32, 2019

    Yair Carmon, Aditi Raghunathan, Ludwig Schmidt, John C Duchi, and Percy S Liang. Unlabeled data improves adversarial robustness.Advances in neural information processing systems, 32, 2019

  9. [9]

    What it thinks is important is important: Robustness transfers through input gradients

    Alvin Chan, Yi Tay, and Yew-Soon Ong. What it thinks is important is important: Robustness transfers through input gradients. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 332–341, 2020

  10. [10]

    Cartl: Cooperative adversarially- robust transfer learning

    Dian Chen, Hongxin Hu, Qian Wang, Li Yinli, Cong Wang, Chao Shen, and Qi Li. Cartl: Cooperative adversarially- robust transfer learning. InInternational Conference on Machine Learning, pp. 1640–1650. PMLR, 2021

  11. [11]

    Ltd: Low temperature distillation for robust adversarial training.arXiv preprint arXiv:2111.02331, 2021

    Erh-Chung Chen and Che-Rung Lee. Ltd: Low temperature distillation for robust adversarial training.arXiv preprint arXiv:2111.02331, 2021

  12. [12]

    Hopskipjumpattack: A query-efficient decision-based attack

    Jianbo Chen, Michael I Jordan, and Martin J Wainwright. Hopskipjumpattack: A query-efficient decision-based attack. In2020 ieee symposium on security and privacy (sp), pp. 1277–1294. IEEE, 2020

  13. [13]

    Rays: A ray searching method for hard-label adversarial attack

    Jinghui Chen and Quanquan Gu. Rays: A ray searching method for hard-label adversarial attack. InProceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 1739–1747, 2020

  14. [14]

    Dair: A query-efficient decision-based attack on image retrieval systems

    Mingyang Chen, Junda Lu, Yi Wang, Jianbin Qin, and Wei Wang. Dair: A query-efficient decision-based attack on image retrieval systems. InProceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 1064–1073, 2021

  15. [15]

    Enhancing robustness in incremental learning with adversarial training

    Seungju Cho, Hongsin Lee, and Changick Kim. Enhancing robustness in incremental learning with adversarial training. InProceedings of the AAAI Conference on Artificial Intelligence, volume 39, pp. 2518–2526, 2025

  16. [16]

    Long-tailed adversarial training with self-distillation

    Seungju Cho, Hongsin Lee, and Changick Kim. Long-tailed adversarial training with self-distillation. InThe Thirteenth International Conference on Learning Representations, 2025. URL https://openreview.net/ forum?id=vM94dZiqx4

  17. [17]

    Certified adversarial robustness via randomized smoothing

    Jeremy Cohen, Elan Rosenfeld, and Zico Kolter. Certified adversarial robustness via randomized smoothing. In international conference on machine learning, pp. 1310–1320. PMLR, 2019. 11

  18. [18]

    Reliable evaluation of adversarial robustness with an ensemble of diverse parameter-free attacks

    Francesco Croce and Matthias Hein. Reliable evaluation of adversarial robustness with an ensemble of diverse parameter-free attacks. InInternational conference on machine learning, pp. 2206–2216. PMLR, 2020

  19. [19]

    Robustbench: a standardized adversarial robustness benchmark

    Francesco Croce, Maksym Andriushchenko, Vikash Sehwag, Edoardo Debenedetti, Nicolas Flammarion, Mung Chiang, Prateek Mittal, and Matthias Hein. Robustbench: a standardized adversarial robustness benchmark. In Thirty-fifth Conference on Neural Information Processing Systems Datasets and Benchmarks Track, 2021. URL https://openreview.net/forum?id=SSKZPJCt7B

  20. [20]

    Decoupled kullback- leibler divergence loss.arXiv preprint arXiv:2305.13948, 2023

    Jiequan Cui, Zhuotao Tian, Zhisheng Zhong, Xiaojuan Qi, Bei Yu, and Hanwang Zhang. Decoupled kullback- leibler divergence loss.arXiv preprint arXiv:2305.13948, 2023

  21. [21]

    Parameterizing activation functions for adversarial robustness

    Sihui Dai, Saeed Mahloujifar, and Prateek Mittal. Parameterizing activation functions for adversarial robustness. In2022 IEEE Security and Privacy Workshops (SPW), pp. 80–87. IEEE, 2022

  22. [22]

    Keeping the Bad Guys Out: Protecting and Vaccinating Deep Learning with JPEG Compression

    Nilaksh Das, Madhuri Shanbhogue, Shang-Tse Chen, Fred Hohman, Li Chen, Michael E Kounavis, and Duen Horng Chau. Keeping the bad guys out: Protecting and vaccinating deep learning with jpeg compression. arXiv preprint arXiv:1705.02900, 2017

  23. [23]

    Distilling adversarial robustness using heterogeneous teachers.arXiv preprint arXiv:2402.15586, 2024

    Jieren Deng, Aaron Palmer, Rigel Mahmood, Ethan Rathbun, Jinbo Bi, Kaleel Mahmood, and Derek Aguiar. Distilling adversarial robustness using heterogeneous teachers.arXiv preprint arXiv:2402.15586, 2024

  24. [24]

    Fast and reliable evaluation of adversarial robustness with minimum-margin attack

    Ruize Gao, Jiongxiao Wang, Kaiwen Zhou, Feng Liu, Binghui Xie, Gang Niu, Bo Han, and James Cheng. Fast and reliable evaluation of adversarial robustness with minimum-margin attack. InInternational Conference on Machine Learning, pp. 7144–7163. PMLR, 2022

  25. [25]

    Adversarially robust distillation

    Micah Goldblum, Liam Fowl, Soheil Feizi, and Tom Goldstein. Adversarially robust distillation. InProceedings of the AAAI Conference on Artificial Intelligence, volume 34, pp. 3996–4003, 2020

  26. [26]

    Explaining and Harnessing Adversarial Examples

    Ian J Goodfellow, Jonathon Shlens, and Christian Szegedy. Explaining and harnessing adversarial examples. arXiv preprint arXiv:1412.6572, 2014

  27. [27]

    Uncovering the limits of adversarial training against norm-bounded adversarial examples.arXiv preprint arXiv:2010.03593, 2020

    Sven Gowal, Chongli Qin, Jonathan Uesato, Timothy Mann, and Pushmeet Kohli. Uncovering the limits of adversarial training against norm-bounded adversarial examples.arXiv preprint arXiv:2010.03593, 2020

  28. [28]

    Improving robustness using generated data.Advances in neural information processing systems, 34:4218–4233, 2021

    Sven Gowal, Sylvestre-Alvise Rebuffi, Olivia Wiles, Florian Stimberg, Dan Andrei Calian, and Timothy A Mann. Improving robustness using generated data.Advances in neural information processing systems, 34:4218–4233, 2021

  29. [29]

    A survey of deep learning techniques for autonomous driving.Journal of Field Robotics, 37(3):362–386, 2020

    Sorin Grigorescu, Bogdan Trasnea, Tiberiu Cocias, and Gigel Macesanu. A survey of deep learning techniques for autonomous driving.Journal of Field Robotics, 37(3):362–386, 2020

  30. [30]

    Deep residual learning for image recognition

    Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 770–778, 2016

  31. [31]

    Identity mappings in deep residual networks

    Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Identity mappings in deep residual networks. In Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part IV 14, pp. 630–645. Springer, 2016

  32. [32]

    Distilling the Knowledge in a Neural Network

    Geoffrey Hinton, Oriol Vinyals, and Jeff Dean. Distilling the knowledge in a neural network.arXiv preprint arXiv:1503.02531, 2015

  33. [33]

    Boosting accuracy and robustness of student models via adaptive adversarial distillation

    Bo Huang, Mingyang Chen, Yi Wang, Junda Lu, Minhao Cheng, and Wei Wang. Boosting accuracy and robustness of student models via adaptive adversarial distillation. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 24668–24677, 2023

  34. [34]

    Exploring architectural ingredients of adversarially robust deep neural networks.Advances in neural information processing systems, 34: 5545–5559, 2021

    Hanxun Huang, Yisen Wang, Sarah Erfani, Quanquan Gu, James Bailey, and Xingjun Ma. Exploring architectural ingredients of adversarially robust deep neural networks.Advances in neural information processing systems, 34: 5545–5559, 2021. 12

  35. [35]

    Averaging Weights Leads to Wider Optima and Better Generalization

    Pavel Izmailov, Dmitrii Podoprikhin, Timur Garipov, Dmitry Vetrov, and Andrew Gordon Wilson. Averaging weights leads to wider optima and better generalization.arXiv preprint arXiv:1803.05407, 2018

  36. [36]

    Enhancing adversarial training with second-order statistics of weights

    Gaojie Jin, Xinping Yi, Wei Huang, Sven Schewe, and Xiaowei Huang. Enhancing adversarial training with second-order statistics of weights. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 15273–15283, 2022

  37. [37]

    Randomized adversarial training via taylor expansion

    Gaojie Jin, Xinping Yi, Dengyu Wu, Ronghui Mu, and Xiaowei Huang. Randomized adversarial training via taylor expansion. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 16447–16457, 2023

  38. [38]

    Peeraid: Improving adversarial distillation from a specialized peer tutor

    Jaewon Jung, Hongsun Jang, Jaeyong Song, and Jinho Lee. Peeraid: Improving adversarial distillation from a specialized peer tutor. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 24482–24491, 2024

  39. [39]

    Learning multiple layers of features from tiny images

    Alex Krizhevsky, Geoffrey Hinton, et al. Learning multiple layers of features from tiny images. 2009

  40. [40]

    Improving adversarial robustness via information bottleneck distillation.Advances in Neural Information Processing Systems, 36:10796–10813, 2023

    Huafeng Kuang, Hong Liu, Yongjian Wu, Shin’ichi Satoh, and Rongrong Ji. Improving adversarial robustness via information bottleneck distillation.Advances in Neural Information Processing Systems, 36:10796–10813, 2023

  41. [41]

    Tiny imagenet visual recognition challenge.CS 231N, 7(7):3, 2015

    Ya Le and Xuan Yang. Tiny imagenet visual recognition challenge.CS 231N, 7(7):3, 2015

  42. [42]

    Indirect gradient matching for adversarial robust distillation

    Hongsin Lee, Seungju Cho, and Changick Kim. Indirect gradient matching for adversarial robust distillation. In The Thirteenth International Conference on Learning Representations, 2025. URL https://openreview. net/forum?id=juKVq5dWTR

  43. [43]

    Adversarial training can provably improve robustness: Theoretical analysis of feature learning process under structured data

    Binghui Li and Yuanzhi Li. Adversarial training can provably improve robustness: Theoretical analysis of feature learning process under structured data. InThe Thirteenth International Conference on Learning Representations,

  44. [44]

    URLhttps://openreview.net/forum?id=inLUnCpDIB

  45. [45]

    On the clean generalization and robust overfitting in adversarial training from two theoretical views: Representation complexity and training dynamics

    Binghui Li and Yuanzhi Li. On the clean generalization and robust overfitting in adversarial training from two theoretical views: Representation complexity and training dynamics. InForty-second International Conference on Machine Learning, 2025. URLhttps://openreview.net/forum?id=lvR39kEqpZ

  46. [46]

    Spratling

    Lin Li, Yifei Wang, Chawin Sitawarin, and Michael W. Spratling. OODRobustbench: a benchmark and large-scale analysis of adversarial robustness under distribution shift. InForty-first International Conference on Machine Learning, 2024. URLhttps://openreview.net/forum?id=kAFevjEYsz

  47. [47]

    Delving into Transferable Adversarial Examples and Black-box Attacks

    Yanpei Liu, Xinyun Chen, Chang Liu, and Dawn Song. Delving into transferable adversarial examples and black-box attacks.arXiv preprint arXiv:1611.02770, 2016

  48. [48]

    Understanding adversarial attacks on deep learning based medical image analysis systems.Pattern Recognition, 110:107332, 2021

    Xingjun Ma, Yuhao Niu, Lin Gu, Yisen Wang, Yitian Zhao, James Bailey, and Feng Lu. Understanding adversarial attacks on deep learning based medical image analysis systems.Pattern Recognition, 110:107332, 2021

  49. [49]

    Towards Deep Learning Models Resistant to Adversarial Attacks

    Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu. Towards deep learning models resistant to adversarial attacks.arXiv preprint arXiv:1706.06083, 2017

  50. [50]

    On the robustness of vision transformers to adversarial examples

    Kaleel Mahmood, Rigel Mahmood, and Marten van Dijk. On the robustness of vision transformers to adversarial examples. InProceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pp. 7838–7847, October 2021

  51. [51]

    Jikai Wang, Zhenxu Tian, Juntao Li, Qingrong Xia, Xinyu Duan, Zhe-Feng Wang, Baoxing Huai, and Min Zhang

    Javier Maroto, Guillermo Ortiz-Jiménez, and Pascal Frossard. On the benefits of knowledge distillation for adversarial robustness.CoRR, abs/2203.07159, 2022. doi: 10.48550/ARXIV .2203.07159. URL https: //doi.org/10.48550/arXiv.2203.07159

  52. [52]

    Mixacm: Mixup- based robustness transfer via distillation of activated channel maps.Advances in neural information processing systems, 34:4555–4569, 2021

    Awais Muhammad, Fengwei Zhou, Chuanlong Xie, Jiawei Li, Sung-Ho Bae, and Zhenguo Li. Mixacm: Mixup- based robustness transfer via distillation of activated channel maps.Advances in neural information processing systems, 34:4555–4569, 2021. 13

  53. [53]

    Transferability in Machine Learning: from Phenomena to Black-Box Attacks using Adversarial Samples

    Nicolas Papernot, Patrick McDaniel, and Ian Goodfellow. Transferability in machine learning: from phenomena to black-box attacks using adversarial samples.arXiv preprint arXiv:1605.07277, 2016

  54. [54]

    Practical black-box attacks against machine learning

    Nicolas Papernot, Patrick McDaniel, Ian Goodfellow, Somesh Jha, Z Berkay Celik, and Ananthram Swami. Practical black-box attacks against machine learning. InProceedings of the 2017 ACM on Asia conference on computer and communications security, pp. 506–519, 2017

  55. [55]

    Dynamic guidance adversarial distillation with enhanced teacher knowledge

    Hyejin Park and Dongbo Min. Dynamic guidance adversarial distillation with enhanced teacher knowledge. In European Conference on Computer Vision, pp. 204–219. Springer, 2024

  56. [56]

    Adversarial robustness through local linearization.Advances in Neural Information Processing Systems, 32, 2019

    Chongli Qin, James Martens, Sven Gowal, Dilip Krishnan, Krishnamurthy Dvijotham, Alhussein Fawzi, Soham De, Robert Stanforth, and Pushmeet Kohli. Adversarial robustness through local linearization.Advances in Neural Information Processing Systems, 32, 2019

  57. [57]

    Fixing data augmentation to improve adversarial robustness.arXiv preprint arXiv:2103.01946, 2021

    Sylvestre-Alvise Rebuffi, Sven Gowal, Dan A Calian, Florian Stimberg, Olivia Wiles, and Timothy Mann. Fixing data augmentation to improve adversarial robustness.arXiv preprint arXiv:2103.01946, 2021

  58. [58]

    Mobilenetv2: Inverted residuals and linear bottlenecks

    Mark Sandler, Andrew Howard, Menglong Zhu, Andrey Zhmoginov, and Liang-Chieh Chen. Mobilenetv2: Inverted residuals and linear bottlenecks. InProceedings of the IEEE conference on computer vision and pattern recognition, pp. 4510–4520, 2018

  59. [59]

    Adversarially robust generalization requires more data.Advances in neural information processing systems, 31, 2018

    Ludwig Schmidt, Shibani Santurkar, Dimitris Tsipras, Kunal Talwar, and Aleksander Madry. Adversarially robust generalization requires more data.Advances in neural information processing systems, 31, 2018

  60. [60]

    Robust learning meets generative models: Can proxy distributions improve adversarial robustness?arXiv preprint arXiv:2104.09425, 2021

    Vikash Sehwag, Saeed Mahloujifar, Tinashe Handina, Sihui Dai, Chong Xiang, Mung Chiang, and Prateek Mittal. Robust learning meets generative models: Can proxy distributions improve adversarial robustness?arXiv preprint arXiv:2104.09425, 2021

  61. [61]

    Adversarially robust transfer learning.arXiv preprint arXiv:1905.08232, 2019

    Ali Shafahi, Parsa Saadatpanah, Chen Zhu, Amin Ghiasi, Christoph Studer, David Jacobs, and Tom Goldstein. Adversarially robust transfer learning.arXiv preprint arXiv:1905.08232, 2019

  62. [62]

    Rulin Shao, Jinfeng Yi, Pin-Yu Chen, and Cho-Jui Hsieh. How and when adversarial robustness transfers in knowledge distillation? arXiv preprint arXiv:2110.12072, 2021

  63. [63]

    Kaustubh Sridhar, Oleg Sokolsky, Insup Lee, and James Weimer. Improving neural network robustness via persistency of excitation. In 2022 American Control Conference (ACC), pp. 1521–1526. IEEE, 2022

  64. [64]

    Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian Goodfellow, and Rob Fergus. Intriguing properties of neural networks. arXiv preprint arXiv:1312.6199, 2013

  65. [65]

    Jihoon Tack, Sihyun Yu, Jongheon Jeong, Minseon Kim, Sung Ju Hwang, and Jinwoo Shin. Consistency regularization for adversarial robustness. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 36, pp. 8414–8422, 2022

  66. [66]

    Florian Tramèr, Nicolas Papernot, Ian Goodfellow, Dan Boneh, and Patrick McDaniel. The space of transferable adversarial examples. arXiv preprint arXiv:1704.03453, 2017

  67. [67]

    Jonathan Uesato, Brendan O’Donoghue, Pushmeet Kohli, and Aaron van den Oord. Adversarial risk and the dangers of evaluating against weak attacks. In International conference on machine learning, pp. 5025–5034. PMLR, 2018

  68. [68]

    Pratik Vaishnavi, Kevin Eykholt, and Amir Rahmati. Transferring adversarial robustness through robust representation matching. In 31st USENIX security symposium (USENIX Security 22), pp. 2083–2098, 2022

  69. [69]

    Ningfei Wang, Yunpeng Luo, Takami Sato, Kaidi Xu, and Qi Alfred Chen. Does physical adversarial example really matter to autonomous driving? towards system-level effect of adversarial object evasion attack. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 4412–4423, 2023

  70. [70]

    Yifei Wang, Liangchen Li, Jiansheng Yang, Zhouchen Lin, and Yisen Wang. Balance, imbalance, and rebalance: Understanding robust overfitting from a minimax game perspective. Advances in Neural Information Processing Systems, 36:15775–15798, 2023

  71. [71]

    Yisen Wang, Difan Zou, Jinfeng Yi, James Bailey, Xingjun Ma, and Quanquan Gu. Improving adversarial robustness requires revisiting misclassified examples. In International Conference on Learning Representations, 2020

  72. [72]

    Zekai Wang, Tianyu Pang, Chao Du, Min Lin, Weiwei Liu, and Shuicheng Yan. Better diffusion models further improve adversarial training. In International Conference on Machine Learning (ICML), 2023

  73. [73]

    Zeming Wei, Yifei Wang, Yiwen Guo, and Yisen Wang. CFA: Class-wise calibrated fair adversarial training. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 8193–8201, 2023

  74. [74]

    Dongxian Wu, Shu-Tao Xia, and Yisen Wang. Adversarial weight perturbation helps robust generalization. Advances in Neural Information Processing Systems, 33:2958–2969, 2020

  75. [75]

    Cihang Xie, Yuxin Wu, Laurens van der Maaten, Alan L Yuille, and Kaiming He. Feature denoising for improving adversarial robustness. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp. 501–509, 2019

  76. [76]

    Yaodong Yu, Zitong Yang, Edgar Dobriban, Jacob Steinhardt, and Yi Ma. Understanding generalization in adversarial training via the bias-variance decomposition. arXiv preprint arXiv:2103.09947, 2021

  77. [77]

    Xinli Yue, Ningping Mou, Qian Wang, and Lingchen Zhao. Revisiting adversarial robustness distillation from the perspective of robust fairness. Advances in Neural Information Processing Systems, 36:30390–30401, 2023

  78. [78]

    Hongyang Zhang, Yaodong Yu, Jiantao Jiao, Eric Xing, Laurent El Ghaoui, and Michael Jordan. Theoretically principled trade-off between robustness and accuracy. In International conference on machine learning, pp. 7472–7482. PMLR, 2019

  79. [79]

    Qingzhao Zhang, Shengtuo Hu, Jiachen Sun, Qi Alfred Chen, and Z Morley Mao. On adversarial robustness of trajectory prediction for autonomous vehicles. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 15159–15168, 2022

  80. [80]

    Shiji Zhao, Jie Yu, Zhenlong Sun, Bo Zhang, and Xingxing Wei. Enhanced accuracy and robustness via multi-teacher adversarial distillation. In European Conference on Computer Vision, pp. 585–602. Springer, 2022

Showing first 80 references.