Sample-wise Targeted Adversarial Attacks on Test-time Adaptation

Phuc Duc Nguyen; Quang Duc Nguyen

arxiv: 2605.23411 · v1 · pith:577MO3TKnew · submitted 2026-05-22 · 💻 cs.LG · cs.CR· cs.CV

Sample-wise Targeted Adversarial Attacks on Test-time Adaptation

Phuc Duc Nguyen , Quang Duc Nguyen This is my paper

Pith reviewed 2026-05-25 04:57 UTC · model grok-4.3

classification 💻 cs.LG cs.CRcs.CV

keywords test-time adaptationadversarial attackssample-wise attackstargeted attacksmeta-learninggradient alignmentdistributional stealth

0 comments

The pith

Sample-wise targeted attacks on test-time adaptation misclassify only triggered inputs while preserving the overall label distribution.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Test-time adaptation counters distribution shifts but can be manipulated through the unlabeled test stream. Class-wise targeted attacks fail in practice because they force many samples toward one label and create an obvious frequency spike. The paper replaces that with a sample-wise approach that activates only on inputs carrying an attacker-chosen trigger and leaves the global label counts unchanged. The method uses meta-learning together with a priority-aware gradient alignment cast as an ellipsoidal trust-region problem to resolve the conflict between attack success and distributional stealth. If the approach holds, TTA systems become vulnerable to stealthy, hard-to-detect manipulation that existing distribution-based detectors miss.

Core claim

We introduce a sample-wise targeted adversarial attack on TTA that misclassifies only inputs carrying an attacker-chosen trigger while preserving the global label distribution of benign queries; the attack is realized by a meta-learning procedure whose priority-aware gradient alignment is formulated as an ellipsoidal trust-region problem that supplies theoretical guarantees even when attack success and distributional stealth pull the gradient in opposite directions.

What carries the argument

Priority-aware gradient alignment strategy that formulates the gradient update as an ellipsoidal trust-region problem to prioritize attack success while enforcing distributional stealth.

If this is right

The attack reaches high targeted success rates on CIFAR-10-C, CIFAR-100-C and ImageNet-C under multiple TTA protocols.
Label frequencies remain statistically indistinguishable from the clean baseline, evading detection that relies on class-count monitoring.
The attack retains effectiveness against existing defenses that assume class-wise or batch-wise perturbations.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

TTA pipelines may need to monitor for localized trigger patterns in addition to global statistics.
The meta-learning framing could be reused for other online adaptation settings that require both efficacy and statistical invisibility.
Trigger-specific robustness training might become a standard safeguard once sample-wise attacks are recognized as practical.

Load-bearing premise

The ellipsoidal trust-region formulation can still optimize the attack objective even when attack success and label-distribution preservation produce misaligned gradients.

What would settle it

An experiment in which the attack either fails to reach high targeted success on triggered samples or produces a label distribution whose deviation from the no-attack baseline exceeds what the paper reports would falsify the central claim.

Figures

Figures reproduced from arXiv: 2605.23411 by Phuc Duc Nguyen, Quang Duc Nguyen.

**Figure 2.** Figure 2: TTA attack setting. The attack is selec [PITH_FULL_IMAGE:figures/full_fig_p003_2.png] view at source ↗

**Figure 3.** Figure 3: Overview of the attack generation using the proposed meta-learning workflow, where [PITH_FULL_IMAGE:figures/full_fig_p005_3.png] view at source ↗

**Figure 4.** Figure 4: (a) Cosine similarity between gradients of Lcls and Lstl across attack optimization steps, where the vertical dashed line marks the practical horizon (500 steps). (b) Frequency distribution of cosine similarity values. Nearly all steps (99.9%) exhibit negative cosine similarity, indicating persistent gradient misalignment. In this section, we describe how the aligned direction d k is constructed. For bre… view at source ↗

**Figure 5.** Figure 5: Comparison of label distribution shifts induced by RTTDP, FCA, and ours on CIFAR-10-C. [PITH_FULL_IMAGE:figures/full_fig_p007_5.png] view at source ↗

**Figure 7.** Figure 7: Trigger patterns on victim input [PITH_FULL_IMAGE:figures/full_fig_p008_7.png] view at source ↗

**Figure 9.** Figure 9: ASR (%) of MedBN [33]. 5.1 Experimental Setting Datasets and backbones. We evaluate the attack on standard benchmarks for TTA, namely CIFAR10-C, CIFAR100-C, and ImageNet-C. These datasets are constructed by applying a diverse set of synthetic corruptions to clean images [13], thereby simulating realistic distribution shifts. Following common practice [42, 36], for CIFAR10-C and CIFAR100-C, we adopt ResNet-… view at source ↗

**Figure 10.** Figure 10: Comparison of label distribution shifts induced by [PITH_FULL_IMAGE:figures/full_fig_p019_10.png] view at source ↗

**Figure 11.** Figure 11: Comparison of label distribution shifts induced by [PITH_FULL_IMAGE:figures/full_fig_p020_11.png] view at source ↗

**Figure 12.** Figure 12: Comparison of label distribution shifts induced by Euclidean trust-region formulation and [PITH_FULL_IMAGE:figures/full_fig_p020_12.png] view at source ↗

**Figure 13.** Figure 13: Comparison of label distribution shifts induced by Euclidean trust-region formulation and [PITH_FULL_IMAGE:figures/full_fig_p020_13.png] view at source ↗

**Figure 14.** Figure 14: Comparison of label distribution shifts induced by RTTDP [ [PITH_FULL_IMAGE:figures/full_fig_p020_14.png] view at source ↗

**Figure 15.** Figure 15: Comparison of label distribution shifts induced by RTTDP [ [PITH_FULL_IMAGE:figures/full_fig_p021_15.png] view at source ↗

**Figure 16.** Figure 16: Comparison of label distribution shifts induced by a class-wise attack (DIA [ [PITH_FULL_IMAGE:figures/full_fig_p021_16.png] view at source ↗

**Figure 17.** Figure 17: Comparison of label distribution shifts induced by a class-wise attack (DIA [ [PITH_FULL_IMAGE:figures/full_fig_p021_17.png] view at source ↗

read the original abstract

Test-time adaptation (TTA) effectively counters distribution shifts but exposes models to adversarial manipulation via the unlabeled test stream. Existing class-wise targeted attacks remain impractical for stealthy exploitation in this setting: since TTA operates on batches, forcing a subset of samples toward a target label unintentionally pulls similar benign samples along, resulting in a conspicuously high frequency of the target label that is easy to detect. To capture a more realistic threat, we introduce a sample-wise targeted attack. Unlike prior approaches, the attacker aims to misclassify only inputs carrying an attacker-chosen trigger, while preserving the global label distribution of benign queries to evade detection. To achieve this, we propose a meta-learning-based attack with a novel priority-aware gradient alignment strategy that explicitly prioritizes attack success. The strategy formulates the gradient update as an ellipsoidal trust-region problem, mitigating the misalignment between attack success and distributional stealth, while providing theoretical guarantees for effective optimization of the attack objective in the presence of gradient misalignment. Extensive experiments on CIFAR-10-C, CIFAR-100-C, and ImageNet-C across TTA protocols demonstrate that our method achieves high targeted success rates while maintaining a label distribution that is consistent with the no-attack baseline, making it difficult to detect in unlabeled TTA deployment scenarios. Furthermore, we demonstrate that our attack shows strong robustness against existing defenses.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper shifts to sample-wise targeted attacks on TTA to avoid label-distribution spikes that make class-wise attacks detectable, using meta-learning and an ellipsoidal trust-region step for gradient alignment.

read the letter

The main advance is the sample-wise formulation: the attacker only wants to flip triggered inputs while leaving the overall label counts on benign queries unchanged. Class-wise methods fail here because TTA batches pull similar samples toward the target label, creating an obvious signal. The meta-learning setup with priority-aware alignment framed as an ellipsoidal trust-region problem is a direct attempt to handle the conflicting gradients between attack success and distributional stealth, and the abstract states it comes with theoretical guarantees on the optimization. Experiments on CIFAR-10-C, CIFAR-100-C, and ImageNet-C across standard TTA protocols report high targeted success while the label distribution matches the no-attack case, plus some resistance to existing defenses. That combination of threat model and empirical check is the useful part. The stress-test concern about batch-level coupling is worth watching. TTA updates use statistics or losses over the entire batch, so a parameter shift driven by triggered samples could still move predictions on clean ones in the same batch and alter the observed distribution. The paper's claim that the distribution stays consistent suggests their end-to-end runs did not see this leakage, but the abstract gives no detail on whether the trust-region constraint explicitly models the downstream effect on non-triggered samples. If the full derivations and ablations show the coupling is negligible or bounded, the stealth result holds; otherwise the practical advantage shrinks. This is for researchers working on adversarial robustness for adaptation methods. The technical move and the experiments are concrete enough that a serious editor should send it to review rather than desk-reject, even if the coupling question needs more scrutiny in revision.

Referee Report

3 major / 2 minor

Summary. The paper claims to introduce a sample-wise targeted adversarial attack on test-time adaptation (TTA) that misclassifies only trigger-carrying inputs while preserving the global label distribution of benign samples to evade detection in unlabeled streams. It uses a meta-learning framework with a novel priority-aware gradient alignment strategy formulated as an ellipsoidal trust-region problem to resolve misalignment between attack success and distributional stealth, accompanied by theoretical guarantees. Experiments on CIFAR-10-C, CIFAR-100-C, and ImageNet-C across TTA protocols report high targeted success rates, label distributions consistent with no-attack baselines, and robustness to existing defenses.

Significance. If the central claims hold, the work identifies a realistic and stealthy threat model for TTA deployments that prior class-wise attacks do not capture, potentially affecting the security of adaptive models in production. The meta-learning approach with trust-region optimization for balancing conflicting objectives may have value beyond this setting. The reported empirical results on standard corruption benchmarks provide a concrete starting point for evaluating such attacks.

major comments (3)

[§3 (meta-learning attack formulation)] The priority-aware gradient alignment (ellipsoidal trust-region formulation) is presented as mitigating misalignment between attack success and distributional stealth, but the manuscript does not model or constrain the downstream effect of batch-level TTA parameter updates on non-triggered samples. This coupling is load-bearing for the sample-wise isolation and label-distribution stealth claims.
[§4 (experiments on CIFAR/ImageNet-C)] Experiments report label distribution consistency with the no-attack baseline, but without explicit measurements or ablations of indirect prediction shifts on benign samples sharing a batch with triggered inputs, it remains unclear whether the reported frequencies would hold under actual TTA batch updates.
[§3.2 (theoretical analysis)] The theoretical guarantees are stated for effective optimization of the attack objective under gradient misalignment, yet they appear limited to the meta-optimization step and do not extend to end-to-end preservation of batch label statistics after adaptation.

minor comments (2)

[§4.1] Clarify the exact batch sizes and TTA update rules used in the experimental protocols, as these directly affect the coupling concern.
[Related work] Add a reference to prior work on batch-level statistics in TTA if not already present, to contextualize the sample-wise distinction.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the thoughtful and detailed comments, which help clarify key aspects of our threat model and evaluation. We address each major comment point by point below, indicating where revisions will be made.

read point-by-point responses

Referee: [§3 (meta-learning attack formulation)] The priority-aware gradient alignment (ellipsoidal trust-region formulation) is presented as mitigating misalignment between attack success and distributional stealth, but the manuscript does not model or constrain the downstream effect of batch-level TTA parameter updates on non-triggered samples. This coupling is load-bearing for the sample-wise isolation and label-distribution stealth claims.

Authors: We agree that the current formulation in §3 optimizes the per-sample attack objective via meta-learning and the ellipsoidal trust-region but does not explicitly simulate or constrain how TTA parameter updates propagate to non-triggered samples in the same batch. This interaction is indeed central to the stealth claims. We will revise §3 to add a dedicated paragraph discussing this batch-level coupling under the assumed threat model (unlabeled streams with mixed triggered and benign inputs) and how the priority-aware alignment indirectly supports isolation by prioritizing distributional consistency at the attack-generation stage. revision: partial
Referee: [§4 (experiments on CIFAR/ImageNet-C)] Experiments report label distribution consistency with the no-attack baseline, but without explicit measurements or ablations of indirect prediction shifts on benign samples sharing a batch with triggered inputs, it remains unclear whether the reported frequencies would hold under actual TTA batch updates.

Authors: The reported label distributions are measured after full TTA runs on streams containing the attacked batches, and they remain consistent with the no-attack baseline. However, we acknowledge the value of isolating indirect effects on co-batch benign samples. We will add new ablations in §4 that track per-sample prediction changes for benign inputs when processed alongside triggered samples under each TTA protocol, providing direct evidence on whether distributional stealth is preserved after the adaptation step. revision: yes
Referee: [§3.2 (theoretical analysis)] The theoretical guarantees are stated for effective optimization of the attack objective under gradient misalignment, yet they appear limited to the meta-optimization step and do not extend to end-to-end preservation of batch label statistics after adaptation.

Authors: The analysis in §3.2 establishes convergence guarantees for the meta-optimization under the priority-aware alignment when gradients are misaligned. We agree these guarantees do not extend to a formal proof of post-adaptation label-statistic preservation, which would require additional assumptions on the TTA update rule and batch statistics. We will revise the text to explicitly state the scope of the theoretical results and emphasize that end-to-end distributional stealth is demonstrated empirically across the evaluated protocols and datasets. revision: partial

Circularity Check

0 steps flagged

No circularity; derivation relies on independent meta-optimization without reduction to inputs

full rationale

The paper introduces a sample-wise targeted attack via meta-learning and a priority-aware gradient alignment cast as an ellipsoidal trust-region problem, with claims of theoretical guarantees for optimization under misalignment. No equations, fitted parameters, or self-citations are shown that reduce the attack success or stealth properties to self-defined quantities or prior results by construction. The central claims rest on the proposed formulation and experimental validation rather than tautological re-labeling of inputs, making the derivation self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only; no free parameters, axioms, or invented entities are described or can be extracted.

pith-pipeline@v0.9.0 · 5767 in / 1054 out tokens · 13678 ms · 2026-05-25T04:57:57.657052+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

52 extracted references · 52 canonical work pages · 4 internal anchors

[1]

A new backdoor attack in cnns by training set corruption without label poisoning

Mauro Barni, Kassem Kallas, and Benedetta Tondi. A new backdoor attack in cnns by training set corruption without label poisoning. In2019 IEEE International Conference on Image Processing (ICIP), pages 101–105. IEEE, 2019

work page 2019
[2]

Cambridge university press, 2004

Stephen Boyd and Lieven Vandenberghe.Convex optimization. Cambridge university press, 2004

work page 2004
[3]

A probabilistic framework for lifelong test-time adaptation

Dhanajit Brahma and Piyush Rai. A probabilistic framework for lifelong test-time adaptation. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 3582–3591, 2023

work page 2023
[4]

Targeted Backdoor Attacks on Deep Learning Systems Using Data Poisoning

Xinyun Chen, Chang Liu, Bo Li, Kimberly Lu, and Dawn Song. Targeted backdoor attacks on deep learning systems using data poisoning.arXiv preprint arXiv:1712.05526, 2017

work page internal anchor Pith review Pith/arXiv arXiv 2017
[5]

Gradnorm: Gra- dient normalization for adaptive loss balancing in deep multitask networks

Zhao Chen, Vijay Badrinarayanan, Chen-Yu Lee, and Andrew Rabinovich. Gradnorm: Gra- dient normalization for adaptive loss balancing in deep multitask networks. InInternational conference on machine learning, pages 794–803. PMLR, 2018

work page 2018
[6]

Test-time poisoning attacks against test-time adaptation models

Tianshuo Cong, Xinlei He, Yun Shen, and Yang Zhang. Test-time poisoning attacks against test-time adaptation models. In2024 IEEE Symposium on Security and Privacy (SP), pages 1306–1324. IEEE, 2024

work page 2024
[7]

Februus: Input purification defense against trojan attacks on deep neural network systems

Bao Gia Doan, Ehsan Abbasnejad, and Damith C Ranasinghe. Februus: Input purification defense against trojan attacks on deep neural network systems. InProceedings of the 36th Annual Computer Security Applications Conference, pages 897–912, 2020

work page 2020
[8]

Robust physical-world attacks on deep learning visual classification

Kevin Eykholt, Ivan Evtimov, Earlence Fernandes, Bo Li, Amir Rahmati, Chaowei Xiao, Atul Prakash, Tadayoshi Kohno, and Dawn Song. Robust physical-world attacks on deep learning visual classification. InProceedings of the IEEE conference on computer vision and pattern recognition, pages 1625–1634, 2018

work page 2018
[9]

Sotta: Robust test-time adaptation on noisy data streams.Advances in Neural Information Processing Systems, 36:14070–14093, 2023

Taesik Gong, Yewon Kim, Taeckyung Lee, Sorn Chottananurak, and Sung-Ju Lee. Sotta: Robust test-time adaptation on noisy data streams.Advances in Neural Information Processing Systems, 36:14070–14093, 2023

work page 2023
[10]

BadNets: Identifying Vulnerabilities in the Machine Learning Model Supply Chain

Tianyu Gu, Brendan Dolan-Gavitt, and Siddharth Garg. Badnets: Identifying vulnerabilities in the machine learning model supply chain.arXiv preprint arXiv:1708.06733, 2017

work page internal anchor Pith review Pith/arXiv arXiv 2017
[11]

Ranked entropy minimization for continual test-time adaptation.arXiv preprint arXiv:2505.16441, 2025

Jisu Han, Jaemin Na, and Wonjun Hwang. Ranked entropy minimization for continual test-time adaptation.arXiv preprint arXiv:2505.16441, 2025

work page arXiv 2025
[12]

Benchmarking Neural Network Robustness to Common Corruptions and Perturbations

Dan Hendrycks and Thomas Dietterich. Benchmarking neural network robustness to common corruptions and perturbations.arXiv preprint arXiv:1903.12261, 2019

work page internal anchor Pith review Pith/arXiv arXiv 1903
[13]

Benchmarking neural network robustness to common corruptions and perturbations.Proceedings of the International Conference on Learning Representations, 2019

Dan Hendrycks and Thomas Dietterich. Benchmarking neural network robustness to common corruptions and perturbations.Proceedings of the International Conference on Learning Representations, 2019

work page 2019
[14]

Beyond entropy: Region confidence proxy for wild test-time adaptation.arXiv preprint arXiv:2505.20704, 2025

Zixuan Hu, Yichun Hu, Xiaotong Li, Shixiang Tang, and Ling-Yu Duan. Beyond entropy: Region confidence proxy for wild test-time adaptation.arXiv preprint arXiv:2505.20704, 2025

work page arXiv 2025
[15]

Test-time adaptation via self-training with nearest neighbor information.arXiv preprint arXiv:2207.10792, 2022

Minguk Jang, Sae-Young Chung, and Hye Won Chung. Test-time adaptation via self-training with nearest neighbor information.arXiv preprint arXiv:2207.10792, 2022

work page arXiv 2022
[16]

Multi-task learning using uncertainty to weigh losses for scene geometry and semantics

Alex Kendall, Yarin Gal, and Roberto Cipolla. Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. InProceedings of the IEEE conference on computer vision and pattern recognition, pages 7482–7491, 2018

work page 2018
[17]

Stationary latent weight inference for unreliable obser- vations from online test-time adaptation

Jae-Hong Lee and Joon-Hyuk Chang. Stationary latent weight inference for unreliable obser- vations from online test-time adaptation. InForty-first International Conference on Machine Learning, 2024. 10

work page 2024
[18]

Entropy is not enough for test-time adaptation: From the perspective of disentangled factors.arXiv preprint arXiv:2403.07366, 2024

Jonghyun Lee, Dahuin Jung, Saehyung Lee, Junsung Park, Juhyeon Shin, Uiwon Hwang, and Sungroh Yoon. Entropy is not enough for test-time adaptation: From the perspective of disentangled factors.arXiv preprint arXiv:2403.07366, 2024

work page arXiv 2024
[19]

Towards open-set test-time adaptation utilizing the wisdom of crowds in entropy minimization

Jungsoo Lee, Debasmit Das, Jaegul Choo, and Sungha Choi. Towards open-set test-time adaptation utilizing the wisdom of crowds in entropy minimization. InProceedings of the IEEE/CVF International Conference on Computer Vision, pages 16380–16389, 2023

work page 2023
[20]

Surgical fine-tuning improves adaptation to distribution shifts.arXiv preprint arXiv:2210.11466, 2022

Yoonho Lee, Annie S Chen, Fahim Tajwar, Ananya Kumar, Huaxiu Yao, Percy Liang, and Chelsea Finn. Surgical fine-tuning improves adaptation to distribution shifts.arXiv preprint arXiv:2210.11466, 2022

work page arXiv 2022
[21]

A comprehensive survey on source-free domain adaptation.IEEE Transactions on Pattern Analysis and Machine Intelligence, 46(8):5743–5762, 2024

Jingjing Li, Zhiqi Yu, Zhekai Du, Lei Zhu, and Heng Tao Shen. A comprehensive survey on source-free domain adaptation.IEEE Transactions on Pattern Analysis and Machine Intelligence, 46(8):5743–5762, 2024

work page 2024
[22]

Backdoor learning: A survey.IEEE transactions on neural networks and learning systems, 35(1):5–22, 2022

Yiming Li, Yong Jiang, Zhifeng Li, and Shu-Tao Xia. Backdoor learning: A survey.IEEE transactions on neural networks and learning systems, 35(1):5–22, 2022

work page 2022
[23]

Invisible backdoor attack with sample-specific triggers

Yuezun Li, Yiming Li, Baoyuan Wu, Longkang Li, Ran He, and Siwei Lyu. Invisible backdoor attack with sample-specific triggers. InProceedings of the IEEE/CVF international conference on computer vision, pages 16463–16472, 2021

work page 2021
[24]

A comprehensive survey on test-time adaptation under distribution shifts.International Journal of Computer Vision, 133(1):31–64, 2025

Jian Liang, Ran He, and Tieniu Tan. A comprehensive survey on test-time adaptation under distribution shifts.International Journal of Computer Vision, 133(1):31–64, 2025

work page 2025
[25]

Detecting and correcting for label shift with black box predictors

Zachary Lipton, Yu-Xiang Wang, and Alexander Smola. Detecting and correcting for label shift with black box predictors. InInternational conference on machine learning, pages 3122–3130. PMLR, 2018

work page 2018
[26]

Conflict-averse gradient descent for multi-task learning.Advances in Neural Information Processing Systems, 34:18878–18890, 2021

Bo Liu, Xingchao Liu, Xiaojie Jin, Peter Stone, and Qiang Liu. Conflict-averse gradient descent for multi-task learning.Advances in Neural Information Processing Systems, 34:18878–18890, 2021

work page 2021
[27]

Variational continual test-time adaptation.arXiv preprint arXiv:2402.08182, 2024

Fan Lyu, Kaile Du, Yuyang Li, Hanyu Zhao, Zhang Zhang, Guangcan Liu, and Liang Wang. Variational continual test-time adaptation.arXiv preprint arXiv:2402.08182, 2024

work page arXiv 2024
[28]

Improved self-training for test-time adaptation

Jing Ma. Improved self-training for test-time adaptation. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 23701–23710, 2024

work page 2024
[29]

Springer Science & Business Media, 2013

Yurii Nesterov.Introductory lectures on convex optimization: A basic course, volume 87. Springer Science & Business Media, 2013

work page 2013
[30]

and Tran, A

Anh Nguyen and Anh Tran. Wanet–imperceptible warping-based backdoor attack.arXiv preprint arXiv:2102.10369, 2021

work page arXiv 2021
[31]

Efficient test-time model adaptation without forgetting

Shuaicheng Niu, Jiaxiang Wu, Yifan Zhang, Yaofo Chen, Shijian Zheng, Peilin Zhao, and Mingkui Tan. Efficient test-time model adaptation without forgetting. InInternational confer- ence on machine learning, pages 16888–16905. PMLR, 2022

work page 2022
[32]

Towards stable test-time adaptation in dynamic wild world.arXiv preprint arXiv:2302.12400, 2023

Shuaicheng Niu, Jiaxiang Wu, Yifan Zhang, Zhiquan Wen, Yaofo Chen, Peilin Zhao, and Mingkui Tan. Towards stable test-time adaptation in dynamic wild world.arXiv preprint arXiv:2302.12400, 2023

work page arXiv 2023
[33]

Medbn: Robust test-time adaptation against malicious test samples

Hyejin Park, Jeongyeon Hwang, Sunung Mun, Sangdon Park, and Jungseul Ok. Medbn: Robust test-time adaptation against malicious test samples. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 5997–6007, 2024

work page 2024
[34]

A survey of robust ad- versarial training in pattern recognition: Fundamental, theory, and methodologies.Pattern Recognition, 131:108889, 2022

Zhuang Qian, Kaizhu Huang, Qiu-Feng Wang, and Xu-Yao Zhang. A survey of robust ad- versarial training in pattern recognition: Fundamental, theory, and methodologies.Pattern Recognition, 131:108889, 2022

work page 2022
[35]

Failing loudly: An empirical study of methods for detecting dataset shift.Advances in Neural Information Processing Systems, 32, 2019

Stephan Rabanser, Stephan Günnemann, and Zachary Lipton. Failing loudly: An empirical study of methods for detecting dataset shift.Advances in Neural Information Processing Systems, 32, 2019. 11

work page 2019
[36]

On the adversarial vulnerability of label-free test-time adaptation

Shahriar Rifat, Jonathan Ashdown, Michael J De Lucia, Ananthram Swami, and Francesco Restuccia. On the adversarial vulnerability of label-free test-time adaptation. InThe Thirteenth International Conference on Learning Representations, 2025

work page 2025
[37]

If your data distribution shifts, use self-learning

Evgenia Rusak, Steffen Schneider, George Pachitariu, Luisa Eck, Peter Gehler, Oliver Bring- mann, Wieland Brendel, and Matthias Bethge. If your data distribution shifts, use self-learning. arXiv preprint arXiv:2104.12928, 2021

work page arXiv 2021
[38]

Hidden trigger backdoor attacks

Aniruddha Saha, Akshayvarun Subramanya, and Hamed Pirsiavash. Hidden trigger backdoor attacks. InProceedings of the AAAI conference on artificial intelligence, volume 34, pages 11957–11965, 2020

work page 2020
[39]

Multi-task learning as multi-objective optimization.Advances in neural information processing systems, 31, 2018

Ozan Sener and Vladlen Koltun. Multi-task learning as multi-objective optimization.Advances in neural information processing systems, 31, 2018

work page 2018
[40]

Test: Test-time self- training under distribution shift

Samarth Sinha, Peter Gehler, Francesco Locatello, and Bernt Schiele. Test: Test-time self- training under distribution shift. InProceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pages 2759–2769, 2023

work page 2023
[41]

On general minimax theorems.Pacific Journal of Mathematics, 8(1):171–176, 1958

Maurice Sion. On general minimax theorems.Pacific Journal of Mathematics, 8(1):171–176, 1958

work page 1958
[42]

On the adversarial risk of test time adaptation: An investigation into realistic test-time data poisoning

Yongyi Su, Yushu Li, Nanqing Liu, Kui Jia, Xulei Yang, Chuan-Sheng Foo, and Xun Xu. On the adversarial risk of test time adaptation: An investigation into realistic test-time data poisoning. arXiv preprint arXiv:2410.04682, 2024

work page arXiv 2024
[43]

Revisiting realistic test-time training: Sequential inference and adaptation by anchored clustering.Advances in Neural Information Processing Systems, 35:17543–17555, 2022

Yongyi Su, Xun Xu, and Kui Jia. Revisiting realistic test-time training: Sequential inference and adaptation by anchored clustering.Advances in Neural Information Processing Systems, 35:17543–17555, 2022

work page 2022
[44]

Parameter-selective continual test-time adaptation

Jiaxu Tian and Fan Lyu. Parameter-selective continual test-time adaptation. InProceedings of the Asian Conference on Computer Vision, pages 1384–1400, 2024

work page 2024
[45]

Tent: Fully Test-time Adaptation by Entropy Minimization

Dequan Wang, Evan Shelhamer, Shaoteng Liu, Bruno Olshausen, and Trevor Darrell. Tent: Fully test-time adaptation by entropy minimization.arXiv preprint arXiv:2006.10726, 2020

work page internal anchor Pith review Pith/arXiv arXiv 2006
[46]

Continual test-time domain adaptation

Qin Wang, Olga Fink, Luc Van Gool, and Dengxin Dai. Continual test-time domain adaptation. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 7201–7211, 2022

work page 2022
[47]

Backdoor attacks against deep learning systems in the physical world

Emily Wenger, Josephine Passananti, Arjun Nitin Bhagoji, Yuanshun Yao, Haitao Zheng, and Ben Y Zhao. Backdoor attacks against deep learning systems in the physical world. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 6206–6215, 2021

work page 2021
[48]

Uncovering adversarial risks of test-time adaptation.arXiv preprint arXiv:2301.12576, 2023

Tong Wu, Feiran Jia, Xiangyu Qi, Jiachen T Wang, Vikash Sehwag, Saeed Mahloujifar, and Prateek Mittal. Uncovering adversarial risks of test-time adaptation.arXiv preprint arXiv:2301.12576, 2023

work page arXiv 2023
[49]

Gradient surgery for multi-task learning.Advances in neural information processing systems, 33:5824–5836, 2020

Tianhe Yu, Saurabh Kumar, Abhishek Gupta, Sergey Levine, Karol Hausman, and Chelsea Finn. Gradient surgery for multi-task learning.Advances in neural information processing systems, 33:5824–5836, 2020

work page 2020
[50]

Come: Test-time adaptation by conservatively minimizing entropy.arXiv preprint arXiv:2410.10894, 2024

Qingyang Zhang, Yatao Bian, Xinke Kong, Peilin Zhao, and Changqing Zhang. Come: Test-time adaptation by conservatively minimizing entropy.arXiv preprint arXiv:2410.10894, 2024

work page arXiv 2024
[51]

Limitations

Yu Zhang and Qiang Yang. A survey on multi-task learning.IEEE transactions on knowledge and data engineering, 34(12):5586–5609, 2021. 12 A Extended Related Works Test-time Adaptation.TTA updates model parameters at inference time using unlabeled test data to handle distribution shift [24]. Unlike training-time domain adaptation [21], it assumes no access ...

work page arXiv 2021
[52]

Guidelines: • The answer [N/A] means that the paper does not involve crowdsourcing nor research with human subjects

Institutional review board (IRB) approvals or equivalent for research with human subjects Question: Does the paper describe potential risks incurred by study participants, whether such risks were disclosed to the subjects, and whether Institutional Review Board (IRB) approvals (or an equivalent approval/review based on the requirements of your country or ...

work page

[1] [1]

A new backdoor attack in cnns by training set corruption without label poisoning

Mauro Barni, Kassem Kallas, and Benedetta Tondi. A new backdoor attack in cnns by training set corruption without label poisoning. In2019 IEEE International Conference on Image Processing (ICIP), pages 101–105. IEEE, 2019

work page 2019

[2] [2]

Cambridge university press, 2004

Stephen Boyd and Lieven Vandenberghe.Convex optimization. Cambridge university press, 2004

work page 2004

[3] [3]

A probabilistic framework for lifelong test-time adaptation

Dhanajit Brahma and Piyush Rai. A probabilistic framework for lifelong test-time adaptation. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 3582–3591, 2023

work page 2023

[4] [4]

Targeted Backdoor Attacks on Deep Learning Systems Using Data Poisoning

Xinyun Chen, Chang Liu, Bo Li, Kimberly Lu, and Dawn Song. Targeted backdoor attacks on deep learning systems using data poisoning.arXiv preprint arXiv:1712.05526, 2017

work page internal anchor Pith review Pith/arXiv arXiv 2017

[5] [5]

Gradnorm: Gra- dient normalization for adaptive loss balancing in deep multitask networks

Zhao Chen, Vijay Badrinarayanan, Chen-Yu Lee, and Andrew Rabinovich. Gradnorm: Gra- dient normalization for adaptive loss balancing in deep multitask networks. InInternational conference on machine learning, pages 794–803. PMLR, 2018

work page 2018

[6] [6]

Test-time poisoning attacks against test-time adaptation models

Tianshuo Cong, Xinlei He, Yun Shen, and Yang Zhang. Test-time poisoning attacks against test-time adaptation models. In2024 IEEE Symposium on Security and Privacy (SP), pages 1306–1324. IEEE, 2024

work page 2024

[7] [7]

Februus: Input purification defense against trojan attacks on deep neural network systems

Bao Gia Doan, Ehsan Abbasnejad, and Damith C Ranasinghe. Februus: Input purification defense against trojan attacks on deep neural network systems. InProceedings of the 36th Annual Computer Security Applications Conference, pages 897–912, 2020

work page 2020

[8] [8]

Robust physical-world attacks on deep learning visual classification

Kevin Eykholt, Ivan Evtimov, Earlence Fernandes, Bo Li, Amir Rahmati, Chaowei Xiao, Atul Prakash, Tadayoshi Kohno, and Dawn Song. Robust physical-world attacks on deep learning visual classification. InProceedings of the IEEE conference on computer vision and pattern recognition, pages 1625–1634, 2018

work page 2018

[9] [9]

Sotta: Robust test-time adaptation on noisy data streams.Advances in Neural Information Processing Systems, 36:14070–14093, 2023

Taesik Gong, Yewon Kim, Taeckyung Lee, Sorn Chottananurak, and Sung-Ju Lee. Sotta: Robust test-time adaptation on noisy data streams.Advances in Neural Information Processing Systems, 36:14070–14093, 2023

work page 2023

[10] [10]

BadNets: Identifying Vulnerabilities in the Machine Learning Model Supply Chain

Tianyu Gu, Brendan Dolan-Gavitt, and Siddharth Garg. Badnets: Identifying vulnerabilities in the machine learning model supply chain.arXiv preprint arXiv:1708.06733, 2017

work page internal anchor Pith review Pith/arXiv arXiv 2017

[11] [11]

Ranked entropy minimization for continual test-time adaptation.arXiv preprint arXiv:2505.16441, 2025

Jisu Han, Jaemin Na, and Wonjun Hwang. Ranked entropy minimization for continual test-time adaptation.arXiv preprint arXiv:2505.16441, 2025

work page arXiv 2025

[12] [12]

Benchmarking Neural Network Robustness to Common Corruptions and Perturbations

Dan Hendrycks and Thomas Dietterich. Benchmarking neural network robustness to common corruptions and perturbations.arXiv preprint arXiv:1903.12261, 2019

work page internal anchor Pith review Pith/arXiv arXiv 1903

[13] [13]

Benchmarking neural network robustness to common corruptions and perturbations.Proceedings of the International Conference on Learning Representations, 2019

Dan Hendrycks and Thomas Dietterich. Benchmarking neural network robustness to common corruptions and perturbations.Proceedings of the International Conference on Learning Representations, 2019

work page 2019

[14] [14]

Beyond entropy: Region confidence proxy for wild test-time adaptation.arXiv preprint arXiv:2505.20704, 2025

Zixuan Hu, Yichun Hu, Xiaotong Li, Shixiang Tang, and Ling-Yu Duan. Beyond entropy: Region confidence proxy for wild test-time adaptation.arXiv preprint arXiv:2505.20704, 2025

work page arXiv 2025

[15] [15]

Test-time adaptation via self-training with nearest neighbor information.arXiv preprint arXiv:2207.10792, 2022

Minguk Jang, Sae-Young Chung, and Hye Won Chung. Test-time adaptation via self-training with nearest neighbor information.arXiv preprint arXiv:2207.10792, 2022

work page arXiv 2022

[16] [16]

Multi-task learning using uncertainty to weigh losses for scene geometry and semantics

Alex Kendall, Yarin Gal, and Roberto Cipolla. Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. InProceedings of the IEEE conference on computer vision and pattern recognition, pages 7482–7491, 2018

work page 2018

[17] [17]

Stationary latent weight inference for unreliable obser- vations from online test-time adaptation

Jae-Hong Lee and Joon-Hyuk Chang. Stationary latent weight inference for unreliable obser- vations from online test-time adaptation. InForty-first International Conference on Machine Learning, 2024. 10

work page 2024

[18] [18]

Entropy is not enough for test-time adaptation: From the perspective of disentangled factors.arXiv preprint arXiv:2403.07366, 2024

Jonghyun Lee, Dahuin Jung, Saehyung Lee, Junsung Park, Juhyeon Shin, Uiwon Hwang, and Sungroh Yoon. Entropy is not enough for test-time adaptation: From the perspective of disentangled factors.arXiv preprint arXiv:2403.07366, 2024

work page arXiv 2024

[19] [19]

Towards open-set test-time adaptation utilizing the wisdom of crowds in entropy minimization

Jungsoo Lee, Debasmit Das, Jaegul Choo, and Sungha Choi. Towards open-set test-time adaptation utilizing the wisdom of crowds in entropy minimization. InProceedings of the IEEE/CVF International Conference on Computer Vision, pages 16380–16389, 2023

work page 2023

[20] [20]

Surgical fine-tuning improves adaptation to distribution shifts.arXiv preprint arXiv:2210.11466, 2022

Yoonho Lee, Annie S Chen, Fahim Tajwar, Ananya Kumar, Huaxiu Yao, Percy Liang, and Chelsea Finn. Surgical fine-tuning improves adaptation to distribution shifts.arXiv preprint arXiv:2210.11466, 2022

work page arXiv 2022

[21] [21]

A comprehensive survey on source-free domain adaptation.IEEE Transactions on Pattern Analysis and Machine Intelligence, 46(8):5743–5762, 2024

Jingjing Li, Zhiqi Yu, Zhekai Du, Lei Zhu, and Heng Tao Shen. A comprehensive survey on source-free domain adaptation.IEEE Transactions on Pattern Analysis and Machine Intelligence, 46(8):5743–5762, 2024

work page 2024

[22] [22]

Backdoor learning: A survey.IEEE transactions on neural networks and learning systems, 35(1):5–22, 2022

Yiming Li, Yong Jiang, Zhifeng Li, and Shu-Tao Xia. Backdoor learning: A survey.IEEE transactions on neural networks and learning systems, 35(1):5–22, 2022

work page 2022

[23] [23]

Invisible backdoor attack with sample-specific triggers

Yuezun Li, Yiming Li, Baoyuan Wu, Longkang Li, Ran He, and Siwei Lyu. Invisible backdoor attack with sample-specific triggers. InProceedings of the IEEE/CVF international conference on computer vision, pages 16463–16472, 2021

work page 2021

[24] [24]

A comprehensive survey on test-time adaptation under distribution shifts.International Journal of Computer Vision, 133(1):31–64, 2025

Jian Liang, Ran He, and Tieniu Tan. A comprehensive survey on test-time adaptation under distribution shifts.International Journal of Computer Vision, 133(1):31–64, 2025

work page 2025

[25] [25]

Detecting and correcting for label shift with black box predictors

Zachary Lipton, Yu-Xiang Wang, and Alexander Smola. Detecting and correcting for label shift with black box predictors. InInternational conference on machine learning, pages 3122–3130. PMLR, 2018

work page 2018

[26] [26]

Conflict-averse gradient descent for multi-task learning.Advances in Neural Information Processing Systems, 34:18878–18890, 2021

Bo Liu, Xingchao Liu, Xiaojie Jin, Peter Stone, and Qiang Liu. Conflict-averse gradient descent for multi-task learning.Advances in Neural Information Processing Systems, 34:18878–18890, 2021

work page 2021

[27] [27]

Variational continual test-time adaptation.arXiv preprint arXiv:2402.08182, 2024

Fan Lyu, Kaile Du, Yuyang Li, Hanyu Zhao, Zhang Zhang, Guangcan Liu, and Liang Wang. Variational continual test-time adaptation.arXiv preprint arXiv:2402.08182, 2024

work page arXiv 2024

[28] [28]

Improved self-training for test-time adaptation

Jing Ma. Improved self-training for test-time adaptation. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 23701–23710, 2024

work page 2024

[29] [29]

Springer Science & Business Media, 2013

Yurii Nesterov.Introductory lectures on convex optimization: A basic course, volume 87. Springer Science & Business Media, 2013

work page 2013

[30] [30]

and Tran, A

Anh Nguyen and Anh Tran. Wanet–imperceptible warping-based backdoor attack.arXiv preprint arXiv:2102.10369, 2021

work page arXiv 2021

[31] [31]

Efficient test-time model adaptation without forgetting

Shuaicheng Niu, Jiaxiang Wu, Yifan Zhang, Yaofo Chen, Shijian Zheng, Peilin Zhao, and Mingkui Tan. Efficient test-time model adaptation without forgetting. InInternational confer- ence on machine learning, pages 16888–16905. PMLR, 2022

work page 2022

[32] [32]

Towards stable test-time adaptation in dynamic wild world.arXiv preprint arXiv:2302.12400, 2023

Shuaicheng Niu, Jiaxiang Wu, Yifan Zhang, Zhiquan Wen, Yaofo Chen, Peilin Zhao, and Mingkui Tan. Towards stable test-time adaptation in dynamic wild world.arXiv preprint arXiv:2302.12400, 2023

work page arXiv 2023

[33] [33]

Medbn: Robust test-time adaptation against malicious test samples

Hyejin Park, Jeongyeon Hwang, Sunung Mun, Sangdon Park, and Jungseul Ok. Medbn: Robust test-time adaptation against malicious test samples. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 5997–6007, 2024

work page 2024

[34] [34]

A survey of robust ad- versarial training in pattern recognition: Fundamental, theory, and methodologies.Pattern Recognition, 131:108889, 2022

Zhuang Qian, Kaizhu Huang, Qiu-Feng Wang, and Xu-Yao Zhang. A survey of robust ad- versarial training in pattern recognition: Fundamental, theory, and methodologies.Pattern Recognition, 131:108889, 2022

work page 2022

[35] [35]

Failing loudly: An empirical study of methods for detecting dataset shift.Advances in Neural Information Processing Systems, 32, 2019

Stephan Rabanser, Stephan Günnemann, and Zachary Lipton. Failing loudly: An empirical study of methods for detecting dataset shift.Advances in Neural Information Processing Systems, 32, 2019. 11

work page 2019

[36] [36]

On the adversarial vulnerability of label-free test-time adaptation

Shahriar Rifat, Jonathan Ashdown, Michael J De Lucia, Ananthram Swami, and Francesco Restuccia. On the adversarial vulnerability of label-free test-time adaptation. InThe Thirteenth International Conference on Learning Representations, 2025

work page 2025

[37] [37]

If your data distribution shifts, use self-learning

Evgenia Rusak, Steffen Schneider, George Pachitariu, Luisa Eck, Peter Gehler, Oliver Bring- mann, Wieland Brendel, and Matthias Bethge. If your data distribution shifts, use self-learning. arXiv preprint arXiv:2104.12928, 2021

work page arXiv 2021

[38] [38]

Hidden trigger backdoor attacks

Aniruddha Saha, Akshayvarun Subramanya, and Hamed Pirsiavash. Hidden trigger backdoor attacks. InProceedings of the AAAI conference on artificial intelligence, volume 34, pages 11957–11965, 2020

work page 2020

[39] [39]

Multi-task learning as multi-objective optimization.Advances in neural information processing systems, 31, 2018

Ozan Sener and Vladlen Koltun. Multi-task learning as multi-objective optimization.Advances in neural information processing systems, 31, 2018

work page 2018

[40] [40]

Test: Test-time self- training under distribution shift

Samarth Sinha, Peter Gehler, Francesco Locatello, and Bernt Schiele. Test: Test-time self- training under distribution shift. InProceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pages 2759–2769, 2023

work page 2023

[41] [41]

On general minimax theorems.Pacific Journal of Mathematics, 8(1):171–176, 1958

Maurice Sion. On general minimax theorems.Pacific Journal of Mathematics, 8(1):171–176, 1958

work page 1958

[42] [42]

On the adversarial risk of test time adaptation: An investigation into realistic test-time data poisoning

Yongyi Su, Yushu Li, Nanqing Liu, Kui Jia, Xulei Yang, Chuan-Sheng Foo, and Xun Xu. On the adversarial risk of test time adaptation: An investigation into realistic test-time data poisoning. arXiv preprint arXiv:2410.04682, 2024

work page arXiv 2024

[43] [43]

Revisiting realistic test-time training: Sequential inference and adaptation by anchored clustering.Advances in Neural Information Processing Systems, 35:17543–17555, 2022

Yongyi Su, Xun Xu, and Kui Jia. Revisiting realistic test-time training: Sequential inference and adaptation by anchored clustering.Advances in Neural Information Processing Systems, 35:17543–17555, 2022

work page 2022

[44] [44]

Parameter-selective continual test-time adaptation

Jiaxu Tian and Fan Lyu. Parameter-selective continual test-time adaptation. InProceedings of the Asian Conference on Computer Vision, pages 1384–1400, 2024

work page 2024

[45] [45]

Tent: Fully Test-time Adaptation by Entropy Minimization

Dequan Wang, Evan Shelhamer, Shaoteng Liu, Bruno Olshausen, and Trevor Darrell. Tent: Fully test-time adaptation by entropy minimization.arXiv preprint arXiv:2006.10726, 2020

work page internal anchor Pith review Pith/arXiv arXiv 2006

[46] [46]

Continual test-time domain adaptation

Qin Wang, Olga Fink, Luc Van Gool, and Dengxin Dai. Continual test-time domain adaptation. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 7201–7211, 2022

work page 2022

[47] [47]

Backdoor attacks against deep learning systems in the physical world

Emily Wenger, Josephine Passananti, Arjun Nitin Bhagoji, Yuanshun Yao, Haitao Zheng, and Ben Y Zhao. Backdoor attacks against deep learning systems in the physical world. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 6206–6215, 2021

work page 2021

[48] [48]

Uncovering adversarial risks of test-time adaptation.arXiv preprint arXiv:2301.12576, 2023

Tong Wu, Feiran Jia, Xiangyu Qi, Jiachen T Wang, Vikash Sehwag, Saeed Mahloujifar, and Prateek Mittal. Uncovering adversarial risks of test-time adaptation.arXiv preprint arXiv:2301.12576, 2023

work page arXiv 2023

[49] [49]

Gradient surgery for multi-task learning.Advances in neural information processing systems, 33:5824–5836, 2020

Tianhe Yu, Saurabh Kumar, Abhishek Gupta, Sergey Levine, Karol Hausman, and Chelsea Finn. Gradient surgery for multi-task learning.Advances in neural information processing systems, 33:5824–5836, 2020

work page 2020

[50] [50]

Come: Test-time adaptation by conservatively minimizing entropy.arXiv preprint arXiv:2410.10894, 2024

Qingyang Zhang, Yatao Bian, Xinke Kong, Peilin Zhao, and Changqing Zhang. Come: Test-time adaptation by conservatively minimizing entropy.arXiv preprint arXiv:2410.10894, 2024

work page arXiv 2024

[51] [51]

Limitations

Yu Zhang and Qiang Yang. A survey on multi-task learning.IEEE transactions on knowledge and data engineering, 34(12):5586–5609, 2021. 12 A Extended Related Works Test-time Adaptation.TTA updates model parameters at inference time using unlabeled test data to handle distribution shift [24]. Unlike training-time domain adaptation [21], it assumes no access ...

work page arXiv 2021

[52] [52]

Guidelines: • The answer [N/A] means that the paper does not involve crowdsourcing nor research with human subjects

Institutional review board (IRB) approvals or equivalent for research with human subjects Question: Does the paper describe potential risks incurred by study participants, whether such risks were disclosed to the subjects, and whether Institutional Review Board (IRB) approvals (or an equivalent approval/review based on the requirements of your country or ...

work page