Dual Strategies for Test-Time Adaptation

Duc Nguyen The Minh; Ehsan Abbasnejad; Minh Hoai; Nam Nguyen Phuong; Phi Le Nguyen

arXiv: 2604.17542 · v1 · submitted 2026-04-19 · 💻 cs.CV

Pith reviewed 2026-05-10 06:07 UTC · model grok-4.3

classification 💻 cs.CV

keywords: test-time adaptation, distribution shift, entropy minimization, entropy maximization, reliability criterion, prediction stability, model adaptation

The pith

DualTTA separates test samples by stability under transformations to apply entropy minimization on reliable ones and maximization on unreliable ones.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Conventional test-time adaptation limits updates to low-entropy samples and leaves much of the test distribution unused. DualTTA instead partitions samples into two groups using a reliability criterion that checks whether predictions remain stable under both semantic-preserving and semantic-altering transformations. Reliable samples receive entropy minimization to strengthen decisions aligned with semantics, while unreliable samples receive entropy maximization to suppress overconfident errors and remove spurious correlations. Theoretical analysis establishes that the dual objectives produce a tighter separation of samples by adaptation suitability, which in turn supports more effective model updates.
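The dual objective can be sketched in a few lines of plain Python, assuming softmax outputs and Shannon entropy as in standard entropy-minimization TTA. The helper names, the group labels, and the weight `lam` on the maximization term are hypothetical; this summary does not say how the paper weights the two terms. A real implementation would compute the same scalar on framework tensors so that gradients reach the adapted parameters.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of class logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy (in nats) of a probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def _mean(values):
    return sum(values) / len(values) if values else 0.0


def dual_entropy_loss(batch_logits, groups, lam=1.0):
    """Hypothetical DualTTA-style loss for one batch.

    groups[i] is "reliable", "unreliable", or any other tag (sample ignored).
    Reliable samples add +H (minimizing the loss lowers their entropy);
    unreliable samples add -lam * H (minimizing the loss raises their entropy).
    """
    h_rel, h_unrel = [], []
    for logits, group in zip(batch_logits, groups):
        h = entropy(softmax(logits))
        if group == "reliable":
            h_rel.append(h)
        elif group == "unreliable":
            h_unrel.append(h)
    return _mean(h_rel) - lam * _mean(h_unrel)


# Usage: one confident sample tagged reliable, one near-uniform sample tagged unreliable.
batch = [[4.0, 0.0, 0.0], [0.1, 0.0, 0.0]]
print(dual_entropy_loss(batch, ["reliable", "unreliable"]))
```

Minimizing this single scalar pushes reliable predictions toward lower entropy and unreliable ones toward higher entropy; samples carrying any other tag contribute nothing.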

Core claim

DualTTA identifies two groups of test samples: one where predictions are likely consistent with underlying semantics and another where predictions are likely incorrect. The groups are selected by a reliability criterion that measures prediction stability under semantic-preserving and semantic-altering transformations. Reliable samples undergo entropy minimization to reinforce correct decisions; unreliable samples undergo entropy maximization to suppress errors and unlearn spurious behavior. Theoretical analysis and empirical results show this produces a tighter separation between reliable and unreliable samples, leading to provably more effective model updates.

What carries the argument

The reliability criterion that scores prediction stability under both semantic-preserving and semantic-altering transformations to decide which samples receive entropy minimization versus entropy maximization.
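One way to make "prediction stability" concrete is the agreement rate between the top-1 prediction on the original sample and the top-1 prediction on each transformed view. The sketch below assumes this agreement-rate definition and treats transformations as plain callables; the paper's exact stability measure and transformation sets are not given in this summary, so both are assumptions, and the toy model and transforms are hypothetical stand-ins.

```python
def argmax(scores):
    """Index of the largest entry (ties go to the lowest index)."""
    return max(range(len(scores)), key=lambda i: scores[i])


def agreement_rate(model, x, transforms):
    """Fraction of transformed views whose top-1 class matches the prediction on x."""
    base = argmax(model(x))
    hits = sum(1 for t in transforms if argmax(model(t(x))) == base)
    return hits / len(transforms)


def stability_profile(model, x, preserving, altering):
    """(agreement under semantic-preserving views, agreement under semantic-altering views)."""
    return agreement_rate(model, x, preserving), agreement_rate(model, x, altering)


# Toy usage: the "model" returns its input as logits, and the transforms are
# stand-ins chosen only so the example can be checked by hand.
toy_model = lambda x: x
scale = lambda x: [2.0 * v for v in x]   # stand-in semantic-preserving transform
swap = lambda x: [x[1], x[0]]            # stand-in semantic-altering transform
print(stability_profile(toy_model, [1.0, 0.0], [scale], [swap]))  # (1.0, 0.0)
```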

If this is right

A larger and more diverse portion of the test distribution can be used for adaptation instead of only low-entropy samples.
The dual objectives create a sharper distinction between samples suitable and unsuitable for model updates.
Model updates become provably more effective under distribution shifts.
Reliable predictions are reinforced while overconfident errors and spurious patterns are actively suppressed.
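The intended effect of the two objectives on a single prediction can be seen in a toy setting. The sketch below takes one gradient step directly on a two-class logit vector and uses finite differences instead of a deep-learning framework; this is a simplification, since real TTA methods update model parameters through autograd. Descending the entropy sharpens the prediction, while ascending it flattens the prediction, which is the mechanism behind suppressing overconfident errors.

```python
import math


def entropy_of_logits(z):
    """Shannon entropy (nats) of softmax(z)."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return -sum((v / s) * math.log(v / s) for v in e if v > 0.0)


def entropy_grad(z, eps=1e-6):
    """Central finite-difference gradient of the entropy with respect to the logits."""
    grad = []
    for i in range(len(z)):
        up, down = list(z), list(z)
        up[i] += eps
        down[i] -= eps
        grad.append((entropy_of_logits(up) - entropy_of_logits(down)) / (2 * eps))
    return grad


def entropy_step(z, lr, maximize):
    """One gradient step on the logits: descend H (minimize) or ascend H (maximize)."""
    sign = 1.0 if maximize else -1.0
    return [zi + sign * lr * gi for zi, gi in zip(z, entropy_grad(z))]


z = [2.0, 0.0]
print(entropy_of_logits(z))                                      # baseline
print(entropy_of_logits(entropy_step(z, 0.5, maximize=False)))   # lower: more confident
print(entropy_of_logits(entropy_step(z, 0.5, maximize=True)))    # higher: less confident
```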

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The stability-based partition could be reused in continual learning to decide which incoming data should reinforce versus unlearn patterns.
Entropy maximization on unreliable samples offers a concrete mechanism for online forgetting that might combine with other regularization techniques.
The same dual-strategy logic could be tested in non-vision domains once equivalent semantic-preserving and semantic-altering operations are defined.

Load-bearing premise

Prediction stability under semantic-preserving and semantic-altering transformations accurately identifies which samples have predictions consistent with their underlying semantics.

What would settle it

Ground-truth evaluation on a held-out test set showing that samples labeled reliable by the stability criterion have lower accuracy than those labeled unreliable would falsify the separation's validity.
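The falsification test reduces to comparing per-group accuracy on labeled held-out data. A minimal sketch, assuming each sample already carries the model's prediction, its true label, and a group tag from the stability criterion (the tag names and function names are hypothetical):

```python
from collections import defaultdict


def group_accuracy(preds, labels, groups):
    """Top-1 accuracy of the model's predictions within each group tag."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for p, y, g in zip(preds, labels, groups):
        total[g] += 1
        correct[g] += int(p == y)
    return {g: correct[g] / total[g] for g in total}


def separation_supported(preds, labels, groups):
    """True if reliable-group accuracy exceeds unreliable-group accuracy.

    A False result on held-out labeled data would count against the criterion.
    """
    acc = group_accuracy(preds, labels, groups)
    if "reliable" not in acc or "unreliable" not in acc:
        raise ValueError("both groups must be non-empty")
    return acc["reliable"] > acc["unreliable"]


preds = [0, 1, 1, 0]
labels = [0, 1, 0, 1]
groups = ["reliable", "reliable", "unreliable", "unreliable"]
print(group_accuracy(preds, labels, groups), separation_supported(preds, labels, groups))
```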

Figures

Figures reproduced from arXiv: 2604.17542 by Duc Nguyen The Minh, Ehsan Abbasnejad, Minh Hoai, Nam Nguyen Phuong, Phi Le Nguyen.

**Figure 2.** Overview of DualTTA. DualTTA applies two types of transformation, semantic-altering and semantic-preserving, to each sample. The model's predictions for the original sample and its transformed variants are compared to determine reliability. Samples with stable predictions under semantic alteration but varying predictions under semantic preservation are classified as likely incorrect, while those with unstable predict…

**Figure 3.** Sensitivity of DualTTA to semantic-altering and …

**Figure 4.** Quadrant-based accuracy analysis of 50000 impulse …
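The Figure 2 caption describes a quadrant rule over the two stability signals. The sketch below encodes it with hypothetical thresholds `tau_pres` and `tau_alt` on agreement rates. Only the likely-incorrect branch is stated in the caption; the reliable branch (stable under preservation, unstable under alteration) and the treatment of the remaining two quadrants are assumptions, because the caption is cut off.

```python
def assign_group(pres_rate, alt_rate, tau_pres=0.5, tau_alt=0.5):
    """Map two agreement rates to a DualTTA-style group (thresholds are hypothetical)."""
    pres_stable = pres_rate >= tau_pres
    alt_stable = alt_rate >= tau_alt
    if alt_stable and not pres_stable:
        return "unreliable"   # rule stated in the Figure 2 caption
    if pres_stable and not alt_stable:
        return "reliable"     # ASSUMPTION: inferred, the caption is truncated here
    return "skip"             # ASSUMPTION: remaining quadrants are not adapted on


for pres, alt in [(0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (0.0, 0.0)]:
    print(pres, alt, assign_group(pres, alt))
```

Keeping the two signals separate, rather than collapsing them into one number early, makes it easy to inspect each quadrant on its own, which is the kind of breakdown the Figure 4 analysis reports.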

Original abstract

Conventional test-time adaptation (TTA) approaches typically adapt the model using only a small fraction of test samples, often those with low-entropy predictions, thereby failing to fully leverage the available information in the test distribution. This paper introduces DualTTA, a novel framework that improves performance under distribution shifts by utilizing a larger and more diverse set of test samples. DualTTA identifies two distinct groups: one where the model's predictions are likely consistent with the underlying semantics, and another where predictions are likely incorrect. For the first group, it minimizes prediction entropy to reinforce reliable decisions; for the second, it maximizes entropy to suppress overconfident errors and unlearn spurious behavior. These groups are adaptively selected using a new reliability criterion that measures prediction stability under both semantic-preserving and semantic-altering transformations, addressing the limitations of purely entropy-based selection. We further provide theoretical analysis and empirical justification showing that our approach enables a tighter separation between reliable and unreliable samples, in the context of their suitability for adaptation, leading to provably more effective model updates.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper proposes DualTTA for test-time adaptation under distribution shift. It partitions test samples into two groups using a reliability criterion based on prediction stability under semantic-preserving and semantic-altering transformations: reliable samples (likely semantically consistent) receive entropy-minimization updates, while unreliable samples (likely incorrect) receive entropy-maximization updates. The central claim is that this dual strategy yields a tighter separation between reliable and unreliable samples than entropy-based selection alone, enabling provably more effective model updates and better utilization of the full test distribution.

Significance. If the theoretical link between the stability criterion and semantic correctness holds and the empirical gains are reproducible, the work would meaningfully advance TTA by moving beyond low-entropy-only adaptation and providing a principled way to both reinforce correct predictions and suppress overconfident errors. The dual min/max objective and the explicit reliability criterion are novel relative to prior entropy or pseudo-label methods.

major comments (2)

[theoretical analysis] Theoretical analysis (referenced in the abstract): the claim of 'provably more effective model updates' is conditioned on the reliability criterion producing an accurate separation, yet the provided description supplies no formal argument or bound establishing why stability under semantic-altering transformations implies semantic inconsistency rather than transformation artifacts, model invariances, or other factors. This assumption is load-bearing for the 'provable' improvement and for the superiority over entropy-based selection.
[reliability criterion] Reliability criterion definition (abstract and method description): the criterion combines stability under both transformation types, but it is unclear how the two stability measures are combined into a single selection rule and whether the rule is independent of the adaptation objective or risks circularity when the same model is used for both stability measurement and the subsequent min/max updates.

minor comments (1)

[abstract] The abstract states that the approach 'addresses the limitations of purely entropy-based selection' but does not quantify those limitations or cite the specific prior works whose entropy thresholds are being improved upon.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their thorough review and constructive comments on our work. We address each of the major comments in detail below, providing clarifications and indicating revisions made to the manuscript.

Point-by-point responses

Referee: [theoretical analysis] Theoretical analysis (referenced in the abstract): the claim of 'provably more effective model updates' is conditioned on the reliability criterion producing an accurate separation, yet the provided description supplies no formal argument or bound establishing why stability under semantic-altering transformations implies semantic inconsistency rather than transformation artifacts, model invariances, or other factors. This assumption is load-bearing for the 'provable' improvement and for the superiority over entropy-based selection.

Authors: We agree that our theoretical analysis assumes the reliability criterion provides an accurate separation and does not include a formal proof that stability under semantic-altering transformations necessarily corresponds to semantic inconsistency (as opposed to artifacts or invariances). The analysis instead shows that, given such a separation, the dual strategy yields more effective updates than single-sided entropy minimization by deriving bounds on the change in model parameters or expected loss. We have revised the manuscript to explicitly state the assumptions, discuss potential confounding factors, and moderate the language from 'provably' to 'theoretically motivated' in the abstract and relevant sections. revision: yes
Referee: [reliability criterion] Reliability criterion definition (abstract and method description): the criterion combines stability under both transformation types, but it is unclear how the two stability measures are combined into a single selection rule and whether the rule is independent of the adaptation objective or risks circularity when the same model is used for both stability measurement and the subsequent min/max updates.

Authors: We have clarified the definition in the revised method section. The reliability criterion is computed on the initial pre-adaptation model by measuring prediction consistency under semantic-preserving transformations and inconsistency under semantic-altering ones. These are combined into a single score with a threshold for partitioning into reliable and unreliable groups. Since this measurement uses only the initial model outputs and does not depend on the adaptation updates, there is no circularity. We have added a detailed description, pseudocode, and an ablation study on the combination rule to the manuscript. revision: yes
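The simulated rebuttal's answer to the circularity concern (score every sample with the pre-adaptation model, combine the two stability measures into one score, and threshold it) can be sketched as follows. The combination rule (agreement under preservation minus agreement under alteration), the threshold of 0.5, and the `ToyLinear` model are assumptions for illustration; the rebuttal does not specify them. The key design point is that scoring uses a deep copy taken before any update, so the partition cannot drift with the model being adapted.

```python
import copy


class ToyLinear:
    """Hypothetical stand-in model: the logit for class k is weights[k] * x[k]."""

    def __init__(self, weights):
        self.weights = list(weights)

    def __call__(self, x):
        return [w * v for w, v in zip(self.weights, x)]


def top1(scores):
    return max(range(len(scores)), key=lambda i: scores[i])


def reliability_score(model, x, preserving, altering):
    """Hypothetical combined score: agreement under preservation minus agreement under alteration."""
    base = top1(model(x))

    def agreement(transforms):
        return sum(top1(model(t(x))) == base for t in transforms) / len(transforms)

    return agreement(preserving) - agreement(altering)


def partition(scorer, batch, preserving, altering, threshold=0.5):
    """Tag each sample using scores from `scorer`, which should be the frozen snapshot."""
    return [
        "reliable" if reliability_score(scorer, x, preserving, altering) >= threshold else "unreliable"
        for x in batch
    ]


live = ToyLinear([1.0, 1.0])
frozen = copy.deepcopy(live)          # snapshot taken before any adaptation step
preserving = [lambda x: [2.0 * v for v in x]]
altering = [lambda x: [x[1], x[0]]]

before = partition(frozen, [[1.0, 0.0]], preserving, altering)
live.weights[0] = -5.0                # stand-in for an adaptation update to the live model
after = partition(frozen, [[1.0, 0.0]], preserving, altering)
print(before, after)                  # identical: updates to `live` do not touch `frozen`
```

Scoring with the live model instead would tag the same sample differently after the update, which is the feedback loop the referee was concerned about.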

Circularity Check

0 steps flagged

No circularity in derivation chain

Full rationale

The paper defines a new reliability criterion based on prediction stability under semantic-preserving and semantic-altering transformations, which is presented as an independent selection rule separate from the adaptation objective. Dual min-entropy and max-entropy updates are then applied conditionally on this criterion, with theoretical analysis claiming tighter separation and more effective updates. No equations or steps reduce the claimed predictions or provable improvements to fitted parameters, self-defined quantities, or prior self-citations by construction. The central claims rest on the external validity of the stability measure rather than tautological re-use of inputs, making the derivation self-contained.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

Abstract-only review yields limited visibility into parameters or assumptions; the method rests on the unstated premise that transformations can be cleanly labeled as semantic-preserving or altering and that stability correlates with semantic correctness.

axioms (1)

Domain assumption: prediction stability under the chosen transformations reliably indicates whether a sample's prediction matches its underlying semantics.
This is the core of the new reliability criterion used to partition samples.
