pith. machine review for the scientific record.

arxiv: 2512.10421 · v2 · submitted 2025-12-11 · 💻 cs.CV

Neural Collapse in Test-Time Adaptation

Pith reviewed 2026-05-16 23:32 UTC · model grok-4.3

classification 💻 cs.CV
keywords test-time adaptation · neural collapse · sample-wise alignment · domain shift · feature embeddings · classifier weights · pseudo-labeling · out-of-distribution robustness

The pith

Sample-wise neural collapse shows that feature-classifier misalignment drives test-time adaptation failures under domain shifts.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper extends neural collapse from class-level geometry to individual samples and identifies a new pattern called Sample-wise Alignment Collapse. In this pattern, each sample's learned feature vector sits close to its matching classifier weight vector in a trained network. When the model adapts online to shifted test data, this per-sample alignment breaks, producing unreliable pseudo-labels whose errors grow with the size of the shift. The authors therefore introduce a method that restores alignment by blending geometric proximity to the weights with the model's own prediction confidence, rather than depending only on noisy labels.

Core claim

By extending neural collapse to the sample level, the work observes that a sample's feature embedding aligns closely with its corresponding classifier weight vector. This alignment collapses during test-time adaptation, and the resulting sample-wise misalignment is the direct source of performance degradation that becomes worse under larger distribution shifts. Restoring the alignment therefore requires new targets that combine geometric proximity with predictive confidence to overcome the unreliability of pseudo-labels.

What carries the argument

Sample-wise Alignment Collapse (NC3+), the per-sample geometric alignment between feature embeddings and classifier weights that holds in a trained model and breaks under domain-shifted adaptation.
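
One way to make the per-sample alignment concrete is to measure, for every sample, how far its feature sits from a chosen classifier weight. The sketch below is a minimal illustration under stated assumptions: the page does not give the paper's exact distance definition, so the Euclidean distance between L2-normalized vectors is an assumption, and the names `fca_distances` and `per_sample_alignment` are hypothetical. Passing ground-truth labels gives a G-FCA-like quantity; passing predicted labels gives a P-FCA-like one.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length along `axis`."""
    return x / np.maximum(np.linalg.norm(x, axis=axis, keepdims=True), eps)

def fca_distances(features, weights):
    """Distances between unit-norm features (N, D) and class weights (K, D).

    Entry [i, k] is the Euclidean distance between sample i's normalized
    feature and class k's normalized weight (an assumed definition).
    """
    f = l2_normalize(features)
    w = l2_normalize(weights)
    # For unit vectors, ||f - w||^2 = 2 - 2 * cos(f, w).
    cos = f @ w.T
    return np.sqrt(np.clip(2.0 - 2.0 * cos, 0.0, None))

def per_sample_alignment(features, weights, labels):
    """Distance from each sample to the weight of the class in `labels`."""
    d = fca_distances(features, weights)
    return d[np.arange(len(labels)), labels]

# Synthetic illustration only: features near their class weight versus
# features pushed far away by noise (a stand-in for a domain shift).
rng = np.random.default_rng(0)
weights = rng.normal(size=(4, 16))
labels = np.array([0, 1, 2, 3, 0, 1])
aligned = weights[labels] + 0.05 * rng.normal(size=(6, 16))
shifted = weights[labels] + 2.0 * rng.normal(size=(6, 16))
d_aligned = per_sample_alignment(aligned, weights, labels)
d_shifted = per_sample_alignment(shifted, weights, labels)
print(d_aligned.mean(), d_shifted.mean())
```

On this toy data the noisy features end up farther from their class weights, which is the kind of gap the NC3+ argument relies on.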

If this is right

  • Realigning each sample's features to its classifier weight recovers accuracy lost during test-time adaptation.
  • The hybrid targets reduce reliance on unreliable pseudo-labels when distribution shifts are large.
  • Gains from the method increase as the domain gap widens, as shown by the 14.52 percent improvement over Tent on ImageNet-C.
  • The same geometric principle explains why standard pseudo-labeling schemes degrade and suggests replacing them with alignment-driven objectives.
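
The hybrid targets mentioned above can be sketched as a blend of two per-class score vectors. This is an illustrative sketch, not the paper's formula, which this page does not reproduce in full: the convex blend, the top-k truncation, and the parameter names `alpha`, `k`, and `temperature` are assumptions (the figure captions mention α and k as hyperparameters without defining their roles).

```python
import numpy as np

def softmax(z, axis=-1):
    """Numerically stable softmax."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def hybrid_targets(logits, distances, alpha=0.5, k=3, temperature=1.0):
    """Blend predictive confidence with geometric proximity (assumed form).

    logits:    (N, K) classifier outputs
    distances: (N, K) feature-to-weight distances, smaller means closer
    Returns (N, K) soft targets supported on the k best classes per sample.
    """
    confidence = softmax(logits)
    proximity = softmax(-distances / temperature)
    blended = (1.0 - alpha) * confidence + alpha * proximity
    # Keep the k highest-scoring classes per sample and zero out the rest.
    top_k = np.argpartition(-blended, k - 1, axis=1)[:, :k]
    mask = np.zeros_like(blended)
    np.put_along_axis(mask, top_k, 1.0, axis=1)
    masked = blended * mask
    return masked / masked.sum(axis=1, keepdims=True)

# One sample whose logits narrowly favor class 0 while its feature
# lies much closer to the weight of class 1.
logits = np.array([[2.0, 1.9, 0.1, -1.0]])
distances = np.array([[1.2, 0.3, 1.0, 1.4]])
targets = hybrid_targets(logits, distances, alpha=0.5, k=2)
print(targets)
```

In this example the plain pseudo-label would be class 0, but the blended target puts most of its mass on class 1, showing how geometric proximity can override a marginal prediction.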

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same sample-wise alignment view could be tested in unsupervised domain adaptation or continual learning to see whether misalignment is a general failure mode.
  • If NC3+ is universal, future adaptation methods could add an explicit alignment loss term instead of relying solely on classification or entropy objectives.
  • Measuring the degree of sample-wise collapse before adaptation might serve as a cheap diagnostic for how much a model will degrade on a new domain.
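
The idea of adding an explicit alignment term to an entropy objective can be sketched as follows. This is a hedged illustration: the figure captions say the paper's alignment loss has an InfoNCE-style variant, but its exact form is not given on this page, so the cosine-similarity logits, the `temperature`, the weighting `lam`, and all function names here are assumptions. The sketch only evaluates the loss; it does not perform gradient updates.

```python
import numpy as np

def log_softmax(z):
    """Row-wise, numerically stable log-softmax."""
    z = z - z.max(axis=1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=1, keepdims=True))

def alignment_loss(features, weights, targets, temperature=0.1):
    """InfoNCE-style loss: cross-entropy between soft targets and
    temperature-scaled cosine similarities of features to class weights."""
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    w = weights / np.linalg.norm(weights, axis=1, keepdims=True)
    log_probs = log_softmax((f @ w.T) / temperature)
    return float(-(targets * log_probs).sum(axis=1).mean())

def entropy_loss(logits):
    """Mean Shannon entropy of the softmax predictions."""
    log_p = log_softmax(logits)
    return float(-(np.exp(log_p) * log_p).sum(axis=1).mean())

def combined_objective(logits, features, weights, targets, lam=1.0):
    """Entropy objective plus a weighted alignment term (assumed combination)."""
    return entropy_loss(logits) + lam * alignment_loss(features, weights, targets)

# Toy check: features equal to their target weights versus features
# that coincide with a different class weight.
weights = np.eye(3)
targets = np.eye(3)
loss_aligned = alignment_loss(np.eye(3), weights, targets)
loss_misaligned = alignment_loss(np.roll(np.eye(3), 1, axis=0), weights, targets)
print(loss_aligned, loss_misaligned)
```

The term is small when each feature points at its target weight and large when it points at another class, so minimizing it pulls features toward the targeted weights and away from the rest.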

Load-bearing premise

That the observed sample-wise misalignment is the primary cause of adaptation degradation rather than a side effect of other factors, and that blending geometric proximity with model confidence produces more reliable targets than existing pseudo-label schemes.

What would settle it

A controlled experiment in which models are forced to maintain sample-wise feature-weight alignment during adaptation yet still suffer the same accuracy drop, or in which the hybrid targets improve accuracy without any measurable reduction in misalignment.

Figures

Figures reproduced from arXiv: 2512.10421 by Jiazhen Huang, Jingyan Jiang, Li Lu, Xiao Chen, Xu Jiang, Zhi Wang, Zhongjing Du.

Figure 1. Overview of Main Contributions. (a) NC3+ highlights the convergence of sample feature embeddings with their corresponding classifier weights. (b) Sample feature embeddings deviate from the ground-truth classifier weights, leading to performance degradation.
Figure 2. Empirical validation of NC3+. We evaluate NC3+ on ImageNet-100 [22] using various backbones. The G-FCA distance $d_{i y_i}$ decreases throughout training, indicating sample-wise alignment collapse. More details are provided in Appendix A.2.
Figure 3. Histograms of G-FCA and P-FCA distances. We present the distributions of $d^{\mathrm{correct}}_{i y_i}$, $d^{\mathrm{wrong}}_{i y_i}$, and $d^{\mathrm{wrong}}_{i \hat{y}_i}$ on correctly and incorrectly classified samples from ImageNet-C datasets under severity level 5 Gaussian noise and Snow corruption. The results reveal that NC3+ is violated on OOD data, leading to Sample-wise Misalignment in Adaptation.
Figure 4. Violin plots of G-FCA and P-FCA distances for misclassified samples. The plots show the distributions of distances from misclassified OOD samples to both G-FCA $d^{\mathrm{wrong}}_{i y_i}$ and P-FCA $d^{\mathrm{wrong}}_{i \hat{y}_i}$ under increasing Gaussian noise or Snow severity on ImageNet-C. The results reveal that misalignment becomes progressively more severe with higher corruption levels.
Figure 5. Overview of our proposed NCTTA. During test-time adaptation, NCTTA blends geometric proximity (FCA distance) and predictive confidence to form hybrid targets, pulling features toward plausible classifier weights while pushing away negatives via $\mathcal{L}_{NC}$.
Figure 6. Ablation of $\mathcal{L}_{NC}$, α and k. (a) $\mathcal{L}_{NC}$ is instantiated with three variants: InfoNCE-style, L2-style, and Triplet-style. (b) Sensitivity analysis of α and k, evaluated on ImageNet-C under the Contrast corruption at severity level 5.
Figure 8. t-SNE comparison under Gaussian noise. The figure compares t-SNE visualizations of feature representations for Tent and NCTTA on CIFAR-10-C under Gaussian noise with severity level 5. NCTTA forms more distinct and well-separated clusters compared to Tent, enhancing the model's discriminative power under severe corruption.
Figure 7. Comparison of G-FCA distance under Gaussian noise (ImageNet-C, ViT-B/16). The plot compares $d_{i y_i}$ for Tent, SAR and NCTTA on ImageNet-C under Gaussian noise with severity level 5. The results show that NCTTA consistently achieves lower $d_{i y_i}$ compared to Tent and SAR, demonstrating better feature-classifier alignment and enhanced robustness.
Original abstract

Test-Time Adaptation (TTA) enhances model robustness to out-of-distribution (OOD) data by updating the model online during inference, yet existing methods lack theoretical insights into the fundamental causes of performance degradation under domain shifts. Recently, Neural Collapse (NC) has been proposed as an emergent geometric property of deep neural networks (DNNs), providing valuable insights for TTA. In this work, we extend NC to the sample-wise level and discover a novel phenomenon termed Sample-wise Alignment Collapse (NC3+), demonstrating that a sample's feature embedding, obtained by a trained model, aligns closely with the corresponding classifier weight. Building on NC3+, we identify that the performance degradation stems from sample-wise misalignment in adaptation which exacerbates under larger distribution shifts. This indicates the necessity of realigning the feature embeddings with their corresponding classifier weights. However, the misalignment makes pseudo-labels unreliable under domain shifts. To address this challenge, we propose NCTTA, a novel feature-classifier alignment method with hybrid targets to mitigate the impact of unreliable pseudo-labels, which blends geometric proximity with predictive confidence. Extensive experiments demonstrate the effectiveness of NCTTA in enhancing robustness to domain shifts. For example, NCTTA outperforms Tent by 14.52% on ImageNet-C. Project page is publicly available at https://github.com/Cevaaa/NCTTA.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript extends Neural Collapse (NC) to the sample-wise level by introducing Sample-wise Alignment Collapse (NC3+), which shows that a sample's feature embedding aligns closely with its corresponding classifier weight. It argues that TTA performance degradation under domain shifts stems from sample-wise misalignment between features and weights (worsening with larger shifts), leading to unreliable pseudo-labels. To address this, the authors propose NCTTA, which uses hybrid targets blending geometric proximity and predictive confidence for realignment, and report large empirical gains such as +14.52% over Tent on ImageNet-C.

Significance. If the causal link between sample-wise misalignment and TTA degradation is established and the hybrid-target method proves robust, the work could supply a useful geometric lens on TTA failures and a practical adaptation technique. The reported gains on ImageNet-C are notable, but overall significance is limited by the absence of controls that isolate the proposed mechanism from other shift-induced effects.

major comments (2)
  1. [Abstract] The central claim that 'performance degradation stems from sample-wise misalignment in adaptation which exacerbates under larger distribution shifts' is presented as following from NC3+, yet no intervention is described that holds the distribution shift fixed while selectively altering misalignment (e.g., via controlled feature perturbation or weight adjustment). Without such a test, the causal status of misalignment versus correlated symptoms (feature degradation, pseudo-label noise) remains unproven.
  2. [Abstract] The hybrid-target construction in NCTTA is offered as the solution to unreliable pseudo-labels, but the manuscript supplies no ablation that isolates the geometric-proximity term from the predictive-confidence term, nor any comparison against stronger pseudo-labeling baselines under matched conditions. This leaves open whether the reported gains require the specific NC3+-motivated blend or would arise from any sufficiently stable labeling scheme.
minor comments (1)
  1. [Abstract] The term 'NC3+' is introduced without a concise recap of the standard NC1–NC4 properties; a one-sentence reminder of the prior collapse metrics would improve readability for readers unfamiliar with the NC literature.
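
The intervention requested in the first major comment can be sketched on synthetic features. This is a hypothetical design under stated assumptions, not an experiment from the paper: the inputs are held fixed, each feature is moved a controlled fraction of the way toward a wrong-class weight, and both the misalignment (one minus cosine similarity to the true-class weight) and the pseudo-label accuracy of a cosine classifier are recorded. All names and the interpolation scheme are illustrative.

```python
import numpy as np

def induce_misalignment(features, weights, labels, strength, rng):
    """Move each feature a fraction `strength` toward a random wrong-class weight."""
    num_classes = weights.shape[0]
    # Offsets in [1, num_classes - 1] guarantee the chosen class differs from the label.
    offsets = rng.integers(1, num_classes, size=len(labels))
    wrong = (labels + offsets) % num_classes
    return (1.0 - strength) * features + strength * weights[wrong]

def evaluate(features, weights, labels):
    """Return (mean misalignment to the true class, pseudo-label accuracy)."""
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    w = weights / np.linalg.norm(weights, axis=1, keepdims=True)
    cos = f @ w.T
    misalignment = 1.0 - cos[np.arange(len(labels)), labels]
    pseudo_labels = cos.argmax(axis=1)
    return float(misalignment.mean()), float((pseudo_labels == labels).mean())

rng = np.random.default_rng(0)
weights = rng.normal(size=(5, 32))
labels = rng.integers(0, 5, size=200)
clean = weights[labels] + 0.1 * rng.normal(size=(200, 32))

results = []
for strength in (0.0, 0.25, 0.5, 0.75):
    # Re-seeding keeps the same wrong-class assignment at every strength,
    # so only the amount of misalignment changes between runs.
    perturbed = induce_misalignment(clean, weights, labels, strength,
                                    np.random.default_rng(1))
    results.append((strength, *evaluate(perturbed, weights, labels)))

for strength, misalignment, accuracy in results:
    print(strength, misalignment, accuracy)
```

A real version of this test would apply the perturbation to features of a trained network and then run the adaptation method, which is what would separate misalignment from other shift-induced effects.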

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. We address the concerns about establishing causality for sample-wise misalignment and the need for targeted ablations on the hybrid targets. Below we provide point-by-point responses and indicate planned revisions.

Point-by-point responses
  1. Referee: [Abstract] The central claim that 'performance degradation stems from sample-wise misalignment in adaptation which exacerbates under larger distribution shifts' is presented as following from NC3+, yet no intervention is described that holds the distribution shift fixed while selectively altering misalignment (e.g., via controlled feature perturbation or weight adjustment). Without such a test, the causal status of misalignment versus correlated symptoms (feature degradation, pseudo-label noise) remains unproven.

    Authors: We thank the referee for highlighting the importance of causal evidence. Our analysis across ImageNet-C severity levels and other shift benchmarks shows that sample-wise misalignment (via NC3+) increases monotonically with shift intensity and correlates strongly with TTA degradation, while NCTTA's targeted realignment yields consistent gains. This provides robust observational support for the mechanism. We agree a direct intervention would strengthen the claim further. In revision we will add a controlled experiment that perturbs feature embeddings to induce misalignment while holding the input distribution fixed, measuring effects on pseudo-label quality and adaptation performance. This will appear as a new analysis subsection.

    Revision: partial

  2. Referee: [Abstract] The hybrid-target construction in NCTTA is offered as the solution to unreliable pseudo-labels, but the manuscript supplies no ablation that isolates the geometric-proximity term from the predictive-confidence term, nor any comparison against stronger pseudo-labeling baselines under matched conditions. This leaves open whether the reported gains require the specific NC3+-motivated blend or would arise from any sufficiently stable labeling scheme.

    Authors: We appreciate this suggestion for isolating component contributions. The geometric-proximity term is directly derived from NC3+ to encourage feature-classifier alignment, while predictive confidence mitigates pseudo-label noise under shifts. In the revised manuscript we will add comprehensive ablations comparing (i) geometric-proximity only, (ii) predictive-confidence only, and (iii) the full hybrid NCTTA. We will also benchmark against stronger pseudo-labeling baselines (e.g., entropy-minimization variants and consistency-regularized self-training) under identical TTA protocols and report results in an expanded experimental table with discussion of why the NC3+-motivated blend is necessary for the observed gains.

    Revision: partial
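
The three-way ablation described in the second response could be organized along the following lines. This is a minimal harness under stated assumptions, not the authors' protocol: the blend form, the `VARIANTS` table, and the synthetic logits and distances are all hypothetical, and the only quantity measured is how often the argmax of each target variant matches the true label.

```python
import numpy as np

def softmax(z):
    """Row-wise, numerically stable softmax."""
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def blended_targets(logits, distances, alpha):
    """alpha = 0 uses confidence only, alpha = 1 uses geometric proximity only."""
    return (1.0 - alpha) * softmax(logits) + alpha * softmax(-distances)

# Hypothetical variant table for the ablation.
VARIANTS = {"confidence_only": 0.0, "geometry_only": 1.0, "hybrid": 0.5}

def ablate(logits, distances, labels):
    """Target accuracy (argmax of the soft target equals the label) per variant."""
    return {
        name: float((blended_targets(logits, distances, alpha).argmax(axis=1)
                     == labels).mean())
        for name, alpha in VARIANTS.items()
    }

# Synthetic stand-ins for model outputs; real use would take logits and
# feature-to-weight distances from the adapted network on shifted data.
rng = np.random.default_rng(0)
num_samples, num_classes = 500, 10
labels = rng.integers(0, num_classes, size=num_samples)
one_hot = np.eye(num_classes)[labels]
logits = 2.0 * one_hot + rng.normal(size=(num_samples, num_classes))
distances = 1.0 - 0.5 * one_hot + 0.3 * rng.normal(size=(num_samples, num_classes))

results = ablate(logits, distances, labels)
print(results)
```

Running every variant on the same logits and distances keeps the conditions matched, so any difference between rows comes from the target construction alone.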

Circularity Check

0 steps flagged

No circularity; claims rest on empirical observation of NC3+ and a proposed alignment method without self-referential derivations.

full rationale

The paper extends Neural Collapse to the sample-wise level via empirical discovery of Sample-wise Alignment Collapse (NC3+), attributes TTA degradation to misalignment based on observed correlations with distribution shifts, and introduces NCTTA using hybrid geometric-predictive targets. No equations, fitted parameters, or derivations are shown that reduce the claimed phenomenon or performance gains to inputs by construction. The abstract cites prior NC work as external foundation and presents new observations plus a practical fix; the derivation chain is self-contained against external benchmarks with no load-bearing self-citation or renaming of known results.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The central claim rests on extending the existing neural-collapse framework to the per-sample level and on the empirical observation that misalignment drives TTA failure; no new free parameters or invented physical entities are visible in the abstract.

axioms (1)
  • [domain assumption] Neural collapse properties observed in trained DNNs on in-distribution data continue to be relevant under test-time domain shifts.
    The paper builds directly on prior NC literature and assumes the geometric alignment insight transfers to the TTA setting.
invented entities (1)
  • Sample-wise Alignment Collapse (NC3+) [no independent evidence]
    purpose: To name and describe the per-sample alignment between feature embeddings and classifier weights in a trained model, which breaks under domain shift.
    New descriptive term introduced to capture the observed alignment phenomenon; its violation is the misalignment seen during adaptation.

pith-pipeline@v0.9.0 · 5545 in / 1420 out tokens · 70152 ms · 2026-05-16T23:32:50.471051+00:00 · methodology



Reference graph

Works this paper leans on

39 extracted references · 39 canonical work pages · 2 internal anchors

  1. [1]

    Mouïn Ben Ammar, Nacim Belkhir, Sebastian Popescu, Antoine Manzanera, and Gianni Franchi. Neco: Neural collapse based out-of-distribution detection. arXiv preprint arXiv:2310.06823, 2023.

  2. [2]

    Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, and Neil Houlsby. An image is worth 16x16 words: Transformers for image recognition at scale. ICLR, 2021.

  3. [3]

    Cong Fang, Hangfeng He, Qi Long, and Weijie J Su. Layer-peeled model: Toward understanding well-trained deep neural networks. arXiv preprint arXiv:2101.12699, 4, 2021.

  4. [4]

    Cong Fang, Hangfeng He, Qi Long, and Weijie J Su. Exploring deep neural networks via layer-peeled model: Minority collapse in imbalanced training. Proceedings of the National Academy of Sciences, 118(43):e2103091118, 2021.

  5. [5]

    Taesik Gong, Jongheon Jeong, Taewon Kim, Yewon Kim, Jinwoo Shin, and Sung-Ju Lee. NOTE: Robust continual test-time adaptation against temporal correlation. In Advances in Neural Information Processing Systems (NeurIPS),

  6. [6]

    Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 770–778, 2016.

  7. [7]

    Dan Hendrycks and Thomas Dietterich. Benchmarking neural network robustness to common corruptions and perturbations. Proceedings of the International Conference on Learning Representations, 2019.

  8. [8]

    Yusuke Iwasawa and Yutaka Matsuo. Test-time classifier adjustment module for model-agnostic domain generalization. In Advances in Neural Information Processing Systems, pages 2427–2440. Curran Associates, Inc., 2021.

  9. [9]

    Adilbek Karmanov, Dayan Guan, Shijian Lu, Abdulmotaleb El Saddik, and Eric Xing. Efficient test-time adaptation of vision-language models. The IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024.

  10. [10]

    Jonghyun Lee, Dahuin Jung, Saehyung Lee, Junsung Park, Juhyeon Shin, Uiwon Hwang, and Sungroh Yoon. Entropy is not enough for test-time adaptation: From the perspective of disentangled factors. arXiv preprint arXiv:2403.07366,

  11. [11]

    Da Li, Yongxin Yang, Yi-Zhe Song, and Timothy M. Hospedales. Deeper, broader and artier domain generalization, 2017.

  12. [12]

    Jian Liang, Dapeng Hu, and Jiashi Feng. Do we really need to access the source data? Source hypothesis transfer for unsupervised domain adaptation. In International Conference on Machine Learning (ICML), pages 6028–6039, 2020.

  13. [13]

    Jian Liang, Ran He, and Tieniu Tan. A comprehensive survey on test-time adaptation under distribution shifts. arXiv preprint arXiv:2303.15361, 2023.

  14. [14]

    Xiaofeng Liu, Chaehwa Yoo, Fangxu Xing, Hyejin Oh, Georges El Fakhri, Je-Won Kang, Jonghye Woo, et al. Deep unsupervised domain adaptation: A review of recent advances and perspectives. APSIPA Transactions on Signal and Information Processing, 11(1), 2022.

  15. [15]

    Jianfeng Lu and Stefan Steinerberger. Neural collapse under cross-entropy loss. Applied and Computational Harmonic Analysis, 59:224–241, 2022.

  16. [16]

    R Thomas McCoy, Ellie Pavlick, and Tal Linzen. Right for the wrong reasons: Diagnosing syntactic heuristics in natural language inference. arXiv preprint arXiv:1902.01007, 2019.

  17. [17]

    Shuaicheng Niu, Jiaxiang Wu, Yifan Zhang, Yaofo Chen, Shijian Zheng, Peilin Zhao, and Mingkui Tan. Efficient test-time model adaptation without forgetting. In International conference on machine learning, pages 16888–16905. PMLR, 2022.

  18. [18]

    Shuaicheng Niu, Jiaxiang Wu, Yifan Zhang, Zhiquan Wen, Yaofo Chen, Peilin Zhao, and Mingkui Tan. Towards stable test-time adaptation in dynamic wild world. arXiv preprint arXiv:2302.12400, 2023.

  19. [19]

    Vardan Papyan, XY Han, and David L Donoho. Prevalence of neural collapse during the terminal phase of deep learning training. Proceedings of the National Academy of Sciences, 117(40):24652–24663, 2020.

  20. [20]

    Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, et al. Learning transferable visual models from natural language supervision. In International conference on machine learning, pages 8748–8763. PMLR, 2021.

  21. [21]

    David E. Rumelhart, Geoffrey E. Hinton, and Ronald J. Williams. Learning Internal Representations by Error Propagation. 1985.

  22. [22]

    Olga Russakovsky, Jia Deng, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang, Andrej Karpathy, Aditya Khosla, Michael Bernstein, Alexander C. Berg, and Li Fei-Fei. ImageNet Large Scale Visual Recognition Challenge. International Journal of Computer Vision (IJCV), 115(3):211–252, 2015.

  23. [23]

    Shiori Sagawa, Pang Wei Koh, Tatsunori B Hashimoto, and Percy Liang. Distributionally robust neural networks for group shifts: On the importance of regularization for worst-case generalization. In International Conference on Learning Representations, 2019.

  24. [24]

    Steffen Schneider, Evgenia Rusak, Luisa Eck, Oliver Bringmann, Wieland Brendel, and Matthias Bethge. Removing covariate shift improves robustness against common corruptions. CoRR, abs/2006.16971, 2020.

  25. [25]

    Manli Shu, Weili Nie, De-An Huang, Zhiding Yu, Tom Goldstein, Anima Anandkumar, and Chaowei Xiao. Test-time prompt tuning for zero-shot generalization in vision-language models. Advances in Neural Information Processing Systems, 35:14274–14289, 2022.

  26. [26]

    Vivienne Sze, Yu-Hsin Chen, Tien-Ju Yang, and Joel S Emer. Efficient processing of deep neural networks: A tutorial and survey. Proceedings of the IEEE, 105(12):2295–2329, 2017.

  27. [27]

    Laurens Van der Maaten and Geoffrey Hinton. Visualizing data using t-SNE. Journal of machine learning research, 9(11), 2008.

  28. [28]

    Dequan Wang, Evan Shelhamer, Shaoteng Liu, Bruno Olshausen, and Trevor Darrell. Tent: Fully test-time adaptation by entropy minimization. arXiv preprint arXiv:2006.10726,

  29. [29]

    Qin Wang, Olga Fink, Luc Van Gool, and Dengxin Dai. Continual test-time domain adaptation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 7201–7211, 2022.

  30. [30]

    E Weinan and Stephan Wojtowytsch. On the emergence of simplex symmetry in the final and penultimate layers of neural network classifiers. In Mathematical and Scientific Machine Learning, pages 270–290. PMLR, 2022.

  31. [31]

    Jason Yosinski, Jeff Clune, Yoshua Bengio, and Hod Lipson. How transferable are features in deep neural networks? Advances in neural information processing systems, 27, 2014.

  32. [32]

    Marvin Zhang, Sergey Levine, and Chelsea Finn. Memo: Test time robustness via adaptation and augmentation. Advances in neural information processing systems, 35:38629–38642, 2022.

  33. [33]

    Taolin Zhang, Jinpeng Wang, Hang Guo, Tao Dai, Bin Chen, and Shu-Tao Xia. Boostadapter: Improving test-time adaptation via regional bootstrapping. arXiv preprint arXiv:2410.15430, 2024.

  34. [34]

    Hao Zhao, Yuejiang Liu, Alexandre Alahi, and Tao Lin. On pitfalls of test-time adaptation. arXiv preprint arXiv:2306.03536, 2023.

  35. [35]

    Zhisheng Zhong, Jiequan Cui, Yibo Yang, Xiaoyang Wu, Xiaojuan Qi, Xiangyu Zhang, and Jiaya Jia. Understanding imbalanced semantic segmentation through neural collapse. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 19550–19560, 2023.

  36. [36]

    Jinxin Zhou, Chong You, Xiao Li, Kangning Liu, Sheng Liu, Qing Qu, and Zhihui Zhu. Are all losses created equal: A neural collapse perspective. Advances in Neural Information Processing Systems, 35:31697–31710, 2022.

  37. [37]

    Kaiyang Zhou, Ziwei Liu, Yu Qiao, Tao Xiang, and Chen Change Loy. Domain generalization: A survey. IEEE Transactions on Pattern Analysis and Machine Intelligence, 45(4):4396–4415, 2022.

  38. [38]

    Didi Zhu, Zexi Li, Min Zhang, Junkun Yuan, Jiashuo Liu, Kun Kuang, and Chao Wu. Neural collapse anchored prompt tuning for generalizable vision-language models. In Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pages 4631–4640, 2024.

  39. [39]

    Zhihui Zhu, Tianyu Ding, Jinxin Zhou, Xiao Li, Chong You, Jeremias Sulam, and Qing Qu. A geometric analysis of neural collapse with unconstrained features. Advances in Neural Information Processing Systems, 34:29820–29834, 2021.