Learning Robustness at Test-Time from a Non-Robust Teacher
Pith reviewed 2026-05-10 15:30 UTC · model grok-4.3
The pith
A label-free framework anchors non-robust teacher predictions to stabilize test-time adversarial robustness adaptation.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The proposed label-free framework uses the predictions of a non-robust teacher model as a semantic anchor for both the clean and adversarial objectives during test-time adaptation. This formulation provides a more stable alternative to self-consistency-based regularization in classical adversarial training, as shown by theoretical insights on optimization behavior. On CIFAR-10 and ImageNet under induced photometric transformations, the method achieves improved optimization stability, lower sensitivity to parameter choices, and a better robustness-accuracy trade-off than distillation-based baselines in the unsupervised post-deployment setting.
What carries the argument
The label-free framework that uses non-robust teacher predictions as semantic anchors for both clean and adversarial objectives during unsupervised test-time adaptation.
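The exact loss is not reproduced in this summary; the following is a minimal sketch, assuming the anchor term is a cross-entropy between the student's prediction (clean or adversarial) and the frozen teacher's softmax on the clean input. All names (`anchored_objective`, `lam`) are hypothetical, not the paper's notation.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of logits.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def cross_entropy(p_target, q):
    # CE between a fixed target distribution and a predicted distribution.
    eps = 1e-12
    return -sum(p * math.log(q_i + eps) for p, q_i in zip(p_target, q))

def anchored_objective(student_clean_logits, student_adv_logits,
                       teacher_clean_logits, lam=1.0):
    """Label-free anchor loss (sketch): both the clean and the adversarial
    student predictions are pulled toward the *fixed* teacher prediction on
    the clean input. The teacher is frozen, so the anchor cannot drift
    during adaptation -- unlike self-consistency targets, which move with
    the student's own parameters."""
    anchor = softmax(teacher_clean_logits)  # fixed semantic anchor
    clean_term = cross_entropy(anchor, softmax(student_clean_logits))
    adv_term = cross_entropy(anchor, softmax(student_adv_logits))
    return clean_term + lam * adv_term
```

The loss is minimized when both student branches match the fixed anchor, so gradient steps on unlabeled target data have a stationary target rather than a moving one.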
If this is right
- The method delivers improved optimization stability relative to straightforward distillation-based adaptations of adversarial training.
- It exhibits lower sensitivity to hyperparameter choices than self-consistency regularization approaches.
- It produces a superior robustness-accuracy trade-off when adapting models in the unsupervised test-time setting with limited target samples.
Where Pith is reading between the lines
- The anchoring idea could be tested on distribution shifts other than photometric transformations to check broader applicability.
- The stability analysis might extend to other forms of regularization used in robustness training.
- In practice the approach could support post-deployment updates for models facing new environments without requiring labeled data or robust pretraining.
Load-bearing premise
The predictions of the non-robust teacher model provide a reliable semantic anchor for both clean and adversarial objectives during adaptation even though the teacher itself lacks robustness.
What would settle it
An experiment in which the proposed method shows higher hyperparameter sensitivity or a worse robustness-accuracy trade-off than classical distillation-based adversarial training adaptations on CIFAR-10 or ImageNet under photometric shifts would falsify the central claim.
Original abstract
Nowadays, pretrained models are increasingly used as general-purpose backbones and adapted at test-time to downstream environments where target data are scarce and unlabeled. While this paradigm has proven effective for improving clean accuracy on the target domain, adversarial robustness has received far less attention, especially when the original pretrained model is not explicitly designed to be robust. This raises a practical question: can a pretrained, non-robust model be adapted at test-time to improve adversarial robustness on a target distribution? To address this question, this work studies how adversarial training strategies behave when integrated into adaptation schemes for the unsupervised test-time setting, where only a small set of unlabeled target samples is available. It first analyzes how classical adversarial training formulations can be extended to this scenario, showing that straightforward distillation-based adaptations remain unstable and highly sensitive to hyperparameter tuning, particularly when the teacher itself is non-robust. To address these limitations, the work proposes a label-free framework that uses the predictions of a non-robust teacher model as a semantic anchor for both the clean and adversarial objectives during adaptation. We further provide theoretical insights showing that our formulation yields a more stable alternative to the self-consistency-based regularization commonly used in classical adversarial training. Experiments evaluate the proposed approach on CIFAR-10 and ImageNet under induced photometric transformations. The results support the theoretical insights by showing that the proposed approach achieves improved optimization stability, lower sensitivity to parameter choices, and a better robustness-accuracy trade-off than existing baselines in this post-deployment test-time setting.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims that a pretrained non-robust model can be adapted at test-time for improved adversarial robustness on unlabeled target data by using the teacher's own predictions as semantic anchors in a label-free framework for both clean and adversarial objectives. It analyzes limitations of classical adversarial training extensions (instability and hyperparameter sensitivity), proposes this anchor-based alternative with theoretical insights showing greater stability than self-consistency regularization, and reports experiments on CIFAR-10 and ImageNet under photometric transformations demonstrating better optimization stability, lower parameter sensitivity, and improved robustness-accuracy trade-offs.
Significance. If the central claims hold, the work would be significant for the test-time adaptation and adversarial robustness literature. It tackles the practical problem of post-deployment robustness improvement without labels or robust teachers, offering a potentially more stable formulation than distillation or self-consistency baselines. Credit is due for the theoretical insights on stability and the empirical evaluation on standard benchmarks (CIFAR-10, ImageNet), which provide concrete support for the trade-off improvements when the assumptions are met.
Major comments (3)
- [Theoretical insights and proposed framework] The central claim in the proposed framework (as described in the abstract and methods) rests on the non-robust teacher's predictions serving as reliable semantic anchors for the adversarial objective. This assumption is load-bearing for the stability advantage over self-consistency regularization; if small perturbations flip the teacher's outputs, the joint optimization could be destabilized rather than stabilized. The theoretical analysis should explicitly address or bound the impact of such inconsistencies.
- [Experiments] Experiments section: evaluation is performed under induced photometric transformations on CIFAR-10 and ImageNet. These do not necessarily replicate the prediction flips induced by the adversarial perturbations inside the adaptation objective, leaving the key assumption about anchor reliability untested for the adversarial case. Direct experiments or analysis on the generated adversarial examples during adaptation are needed to substantiate the reported stability gains.
- [Introduction and classical adversarial training analysis] The abstract states that straightforward distillation-based adaptations remain unstable, but without a detailed comparison (e.g., specific hyperparameter ranges or failure modes in § on classical extensions), it is difficult to assess how much the proposed anchor-based objective improves upon them in a load-bearing way.
Minor comments (2)
- [Abstract] The abstract refers to 'theoretical insights' without briefly characterizing the key result (e.g., a stability bound or reduced sensitivity derivation); adding one sentence would improve accessibility.
- [Proposed framework] Notation for the anchor-based objective and the clean/adversarial terms should be introduced with explicit equations early in the methods to aid readability.
Simulated Author's Rebuttal
We thank the referee for the constructive comments. We address each major point below and will revise the manuscript accordingly where needed to strengthen the presentation.
Point-by-point responses
-
Referee: [Theoretical insights and proposed framework] The central claim in the proposed framework (as described in the abstract and methods) rests on the non-robust teacher's predictions serving as reliable semantic anchors for the adversarial objective. This assumption is load-bearing for the stability advantage over self-consistency regularization; if small perturbations flip the teacher's outputs, the joint optimization could be destabilized rather than stabilized. The theoretical analysis should explicitly address or bound the impact of such inconsistencies.
Authors: We thank the referee for highlighting this assumption. Our theoretical analysis shows that anchoring to fixed teacher predictions yields a more stable objective than self-consistency regularization by avoiding mutual error reinforcement. The non-robust teacher is held fixed, so its clean predictions serve as constant references for both branches. While adversarial flips in the teacher could occur, the formulation regularizes the student toward these fixed anchors, which our experiments indicate improves stability over baselines. We will revise the theoretical section to explicitly discuss this assumption and include a brief bound based on the loss Lipschitz constant. revision: partial
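The promised Lipschitz-based bound is not spelled out in this summary; one plausible shape for it, under hypothetical notation (student \(f_s^\theta\), frozen teacher \(f_t\), perturbation \(\delta\), loss \(\ell\) assumed \(L_\ell\)-Lipschitz in its target argument, teacher assumed \(L_t\)-Lipschitz in its input), is:

```latex
% With a fixed anchor, the target f_t(x) is constant in \theta, so the
% anchor-drift error that a perturbation-dependent target would introduce
% is controlled by the teacher's input-Lipschitz constant:
\big|\,\ell\big(f_s^\theta(x+\delta),\, f_t(x)\big)
     - \ell\big(f_s^\theta(x+\delta),\, f_t(x+\delta)\big)\big|
\;\le\; L_\ell \,\big\| f_t(x) - f_t(x+\delta) \big\|
\;\le\; L_\ell \, L_t \, \|\delta\| .
```

Under self-consistency regularization the target \(f_s^\theta(x)\) additionally moves with \(\theta\), so no analogous constant-target bound applies; that asymmetry is presumably the source of the claimed stability advantage.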
-
Referee: [Experiments] Experiments section: evaluation is performed under induced photometric transformations on CIFAR-10 and ImageNet. These do not necessarily replicate the prediction flips induced by the adversarial perturbations inside the adaptation objective, leaving the key assumption about anchor reliability untested for the adversarial case. Direct experiments or analysis on the generated adversarial examples during adaptation are needed to substantiate the reported stability gains.
Authors: We agree that direct analysis of teacher predictions on the adversarial examples generated during adaptation would provide stronger substantiation. Our current results demonstrate stability through training dynamics and final robustness metrics under the photometric shifts, with the adversarial objective applied at each step. We will add an analysis (new figure or subsection) measuring the rate of teacher prediction changes on the on-the-fly adversarial samples and its correlation with observed stability. revision: yes
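The proposed diagnostic amounts to a flip-rate measurement. A minimal sketch, assuming `teacher_predict` returns an argmax class label for a single input (function and data names are hypothetical):

```python
def teacher_flip_rate(teacher_predict, clean_inputs, adv_inputs):
    """Fraction of samples on which the frozen teacher's predicted class
    differs between a clean input and its adversarial counterpart.
    A high rate would undermine the anchor-reliability assumption that
    the stability argument rests on."""
    assert len(clean_inputs) == len(adv_inputs)
    flips = sum(
        1
        for x, x_adv in zip(clean_inputs, adv_inputs)
        if teacher_predict(x) != teacher_predict(x_adv)
    )
    return flips / len(clean_inputs)
```

In the promised analysis this would be evaluated on the adversarial examples generated on-the-fly at each adaptation step, then correlated with the observed optimization stability.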
-
Referee: [Introduction and classical adversarial training analysis] The abstract states that straightforward distillation-based adaptations remain unstable, but without a detailed comparison (e.g., specific hyperparameter ranges or failure modes in § on classical extensions), it is difficult to assess how much the proposed anchor-based objective improves upon them in a load-bearing way.
Authors: The instability and hyperparameter sensitivity of classical extensions are analyzed in Section 3, supported by preliminary experiments showing divergence for certain ranges. To make the motivation clearer from the outset, we will expand the introduction with a concise summary of these failure modes and example hyperparameter settings, while retaining the detailed treatment in Section 3. revision: partial
Circularity Check
No circularity detected; derivation introduces independent anchor-based objective
Full rationale
The paper's core contribution is a new label-free test-time adaptation framework that defines a semantic-anchor loss using the non-robust teacher's predictions for both clean and adversarial terms, together with separate theoretical stability arguments comparing it to self-consistency regularization. No equation reduces a claimed prediction or uniqueness result to a fitted parameter or prior self-citation by construction; the anchor is an explicit modeling choice rather than a re-labeling of data statistics, and the stability claim is presented as an independent analysis rather than a tautology. Experiments on photometric shifts are external to the derivation and do not close any loop. The derivation chain therefore remains self-contained against external benchmarks.