arxiv: 2512.10421 · v2 · submitted 2025-12-11 · 💻 cs.CV

Neural Collapse in Test-Time Adaptation

Xiao Chen, Zhongjing Du, Jiazhen Huang, Xu Jiang, Li Lu, Jingyan Jiang, Zhi Wang

Pith reviewed 2026-05-16 23:32 UTC · model grok-4.3

classification 💻 cs.CV

keywords test-time adaptation · neural collapse · sample-wise alignment · domain shift · feature embeddings · classifier weights · pseudo-labeling · out-of-distribution robustness

The pith

Sample-wise neural collapse shows that feature-classifier misalignment drives test-time adaptation failures under domain shifts.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper extends neural collapse from class-level geometry to individual samples and identifies a new pattern called Sample-wise Alignment Collapse. In this pattern, each sample's learned feature vector sits close to its matching classifier weight vector in a trained network. When the model adapts online to shifted test data, this per-sample alignment breaks, producing unreliable pseudo-labels whose errors grow with the size of the shift. The authors therefore introduce a method that restores alignment by blending geometric proximity to the weights with the model's own prediction confidence, rather than depending only on noisy labels.

Core claim

By extending neural collapse to the sample level, the work observes that, in a trained model, a sample's feature embedding aligns closely with its corresponding classifier weight vector. This alignment breaks down during test-time adaptation, and the paper attributes performance degradation directly to the resulting sample-wise misalignment, which worsens under larger distribution shifts. Because the same misalignment makes pseudo-labels unreliable, restoring the alignment requires new targets that combine geometric proximity with predictive confidence.

What carries the argument

Sample-wise Alignment Collapse (NC3+), the per-sample geometric alignment between feature embeddings and classifier weights that holds in a trained model and breaks under domain-shifted adaptation.
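
The alignment that NC3+ describes can be made concrete with a small per-sample measurement. The sketch below is illustrative only: it assumes the distance is the Euclidean distance between L2-normalized vectors, which this page does not confirm, and the names `fca_distance` and `alignment_distances` are hypothetical. It returns the distance to the ground-truth class weight (G-FCA in the figure captions) and to the predicted class weight (P-FCA).

```python
import math


def l2_normalize(v):
    """Scale a vector to unit Euclidean length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return list(v) if norm == 0.0 else [x / norm for x in v]


def fca_distance(feature, weight):
    """Assumed metric: Euclidean distance between unit-normalized feature and class weight."""
    f, w = l2_normalize(feature), l2_normalize(weight)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(f, w)))


def alignment_distances(feature, weights, true_label):
    """Return (g_fca, p_fca, predicted_label) for one sample.

    g_fca: distance to the ground-truth class weight.
    p_fca: distance to the weight of the predicted class (argmax of the logits).
    """
    logits = [sum(a * b for a, b in zip(feature, w)) for w in weights]
    predicted = max(range(len(weights)), key=lambda j: logits[j])
    g_fca = fca_distance(feature, weights[true_label])
    p_fca = fca_distance(feature, weights[predicted])
    return g_fca, p_fca, predicted


# Toy two-class classifier with orthogonal weight vectors.
weights = [[1.0, 0.0], [0.0, 1.0]]
aligned = alignment_distances([2.0, 0.0], weights, true_label=0)
shifted = alignment_distances([1.0, 3.0], weights, true_label=0)
print(aligned, shifted)
```

In the toy example, the second sample has drifted toward the wrong class weight, so its G-FCA distance exceeds its P-FCA distance. That is the pattern the page describes for misclassified out-of-distribution samples.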

If this is right

Realigning each sample's features to its classifier weight recovers accuracy lost during test-time adaptation.
The hybrid targets reduce reliance on unreliable pseudo-labels when distribution shifts are large.
Gains from the method should increase as the domain gap widens; the headline figure reported is a 14.52 percent improvement over Tent on ImageNet-C.
The same geometric principle explains why standard pseudo-labeling schemes degrade and suggests replacing them with alignment-driven objectives.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same sample-wise alignment view could be tested in unsupervised domain adaptation or continual learning to see whether misalignment is a general failure mode.
If NC3+ is universal, future adaptation methods could add an explicit alignment loss term instead of relying solely on classification or entropy objectives.
Measuring the degree of sample-wise collapse before adaptation might serve as a cheap diagnostic for how much a model will degrade on a new domain.

Load-bearing premise

That the observed sample-wise misalignment is the primary cause of adaptation degradation rather than a side effect of other factors, and that blending geometric proximity with model confidence produces more reliable targets than existing pseudo-label schemes.

What would settle it

A controlled experiment in which models are forced to maintain sample-wise feature-weight alignment during adaptation yet still suffer the same accuracy drop, or in which the hybrid targets improve accuracy without any measurable reduction in misalignment.

Figures

Figures reproduced from arXiv: 2512.10421 by Jiazhen Huang, Jingyan Jiang, Li Lu, Xiao Chen, Xu Jiang, Zhi Wang, Zhongjing Du.

**Figure 1.** Overview of main contributions. (a) NC3+ highlights the convergence of sample feature embeddings with their corresponding classifier weights. (b) Sample feature embeddings deviate from the ground-truth classifier weights, leading to performance degradation.

**Figure 2.** Empirical validation of NC3+. We evaluate NC3+ on ImageNet-100 [22] using various backbones. The G-FCA distance $d_{i y_i}$ decreases throughout training, indicating sample-wise alignment collapse. More details are provided in Appendix A.2.

**Figure 3.** Histograms of G-FCA and P-FCA distances. We present the distributions of $d^{\text{correct}}_{i y_i}$, $d^{\text{wrong}}_{i y_i}$, and $d^{\text{wrong}}_{i \hat{y}_i}$ on correctly and incorrectly classified samples from ImageNet-C under severity level 5 Gaussian noise and Snow corruption. The results reveal that NC3+ is violated on OOD data, leading to Sample-wise Misalignment in Adaptation.

**Figure 4.** Violin plots of G-FCA and P-FCA distances for misclassified samples. The plots show the distributions of the G-FCA distance $d^{\text{wrong}}_{i y_i}$ and the P-FCA distance $d^{\text{wrong}}_{i \hat{y}_i}$ for misclassified OOD samples under increasing Gaussian noise or Snow severity on ImageNet-C. The results reveal that misalignment becomes progressively more severe with higher corruption levels.

**Figure 5.** Overview of our proposed NCTTA. During test-time adaptation, NCTTA blends geometric proximity (FCA distance) and predictive confidence to form hybrid targets, pulling features toward plausible classifier weights while pushing away negatives via $L_{NC}$.

**Figure 6.** Ablation of $L_{NC}$, α and k. (a) $L_{NC}$ is instantiated with three variants: InfoNCE-style, L2-style, and Triplet-style. (b) Sensitivity analysis of α and k, evaluated on ImageNet-C under the Contrast corruption at severity level 5.

**Figure 8.** t-SNE comparison under Gaussian noise. The figure compares t-SNE visualizations of feature representations for Tent and NCTTA on CIFAR-10-C under Gaussian noise with severity level 5. NCTTA forms more distinct and well-separated clusters compared to Tent, enhancing the model’s discriminative power under severe corruption.

**Figure 7.** Comparison of G-FCA distance under Gaussian noise (ImageNet-C, ViT-B/16). The plot compares $d_{i y_i}$ for Tent, SAR and NCTTA on ImageNet-C under Gaussian noise with severity level 5. The results show that NCTTA consistently achieves lower $d_{i y_i}$ compared to Tent and SAR, demonstrating better feature-classifier alignment and enhanced robustness.

Original abstract

Test-Time Adaptation (TTA) enhances model robustness to out-of-distribution (OOD) data by updating the model online during inference, yet existing methods lack theoretical insights into the fundamental causes of performance degradation under domain shifts. Recently, Neural Collapse (NC) has been proposed as an emergent geometric property of deep neural networks (DNNs), providing valuable insights for TTA. In this work, we extend NC to the sample-wise level and discover a novel phenomenon termed Sample-wise Alignment Collapse (NC3+), demonstrating that a sample's feature embedding, obtained by a trained model, aligns closely with the corresponding classifier weight. Building on NC3+, we identify that the performance degradation stems from sample-wise misalignment in adaptation which exacerbates under larger distribution shifts. This indicates the necessity of realigning the feature embeddings with their corresponding classifier weights. However, the misalignment makes pseudo-labels unreliable under domain shifts. To address this challenge, we propose NCTTA, a novel feature-classifier alignment method with hybrid targets to mitigate the impact of unreliable pseudo-labels, which blends geometric proximity with predictive confidence. Extensive experiments demonstrate the effectiveness of NCTTA in enhancing robustness to domain shifts. For example, NCTTA outperforms Tent by 14.52% on ImageNet-C. Project page is publicly available at https://github.com/Cevaaa/NCTTA.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper extends neural collapse to per-sample alignment, links its breakdown under shift to TTA failure, and proposes a hybrid-target fix that delivers sizable gains on ImageNet-C.

The main thing to know is that the authors take neural collapse down to the individual sample, observe that a sample's embedding aligns with its classifier weight in the source model, and show that this alignment breaks down under distribution shift. They treat the resulting misalignment as the reason standard TTA methods lose ground and respond with NCTTA, which blends a geometric pull toward the nearest classifier weight with the usual predictive confidence to create more stable targets. The reported lift over Tent on ImageNet-C, 14.52%, is large enough to notice.
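
To make the blend described in the letter concrete, here is a minimal sketch of a hybrid target. The paper's exact formula is truncated on this page, so the convex combination below is an assumed form and not the authors' definition. The function name `hybrid_target` and the mixing weight `alpha` are hypothetical, and `alpha` may not correspond to the α ablated in Figure 6.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def hybrid_target(logits, distances, alpha=0.5):
    """Blend predictive confidence with geometric proximity (assumed form).

    logits:    classifier outputs for one sample, one per class.
    distances: feature-to-classifier-weight distances, one per class.
    alpha:     hypothetical mixing weight in [0, 1]; 0 keeps pure confidence,
               1 keeps pure proximity.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    confidence = softmax(logits)
    # A smaller distance means a closer weight, so negate before the softmax.
    proximity = softmax([-d for d in distances])
    return [(1.0 - alpha) * c + alpha * p for c, p in zip(confidence, proximity)]


# The logits slightly favor class 0, while the geometry clearly favors class 1.
target = hybrid_target(logits=[1.0, 0.9, -2.0], distances=[1.3, 0.2, 1.4], alpha=0.5)
print(target)
```

Because both ingredients are probability vectors, any `alpha` in [0, 1] yields a valid soft target. The design question the referee raises is whether both ingredients are needed, which corresponds here to comparing `alpha=0`, `alpha=1`, and intermediate values.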

Referee Report

2 major / 1 minor

Summary. The manuscript extends Neural Collapse (NC) to the sample-wise level by introducing Sample-wise Alignment Collapse (NC3+), which shows that a sample's feature embedding aligns closely with its corresponding classifier weight. It argues that TTA performance degradation under domain shifts stems from sample-wise misalignment between features and weights (worsening with larger shifts), leading to unreliable pseudo-labels. To address this, the authors propose NCTTA, which uses hybrid targets blending geometric proximity and predictive confidence for realignment, and report large empirical gains such as +14.52% over Tent on ImageNet-C.

Significance. If the causal link between sample-wise misalignment and TTA degradation is established and the hybrid-target method proves robust, the work could supply a useful geometric lens on TTA failures and a practical adaptation technique. The reported gains on ImageNet-C are notable, but overall significance is limited by the absence of controls that isolate the proposed mechanism from other shift-induced effects.

major comments (2)

[Abstract] The central claim that 'performance degradation stems from sample-wise misalignment in adaptation which exacerbates under larger distribution shifts' is presented as following from NC3+, yet no intervention is described that holds the distribution shift fixed while selectively altering misalignment (e.g., via controlled feature perturbation or weight adjustment). Without such a test, the causal status of misalignment versus correlated symptoms (feature degradation, pseudo-label noise) remains unproven.
[Abstract] The hybrid-target construction in NCTTA is offered as the solution to unreliable pseudo-labels, but the manuscript supplies no ablation that isolates the geometric-proximity term from the predictive-confidence term, nor any comparison against stronger pseudo-labeling baselines under matched conditions. This leaves open whether the reported gains require the specific NC3+-motivated blend or would arise from any sufficiently stable labeling scheme.

minor comments (1)

[Abstract] The term 'NC3+' is introduced without a concise recap of the standard NC1–NC4 properties; a one-sentence reminder of the prior collapse metrics would improve readability for readers unfamiliar with the NC literature.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. We address the concerns about establishing causality for sample-wise misalignment and the need for targeted ablations on the hybrid targets. Below we provide point-by-point responses and indicate planned revisions.

Referee: [Abstract] The central claim that 'performance degradation stems from sample-wise misalignment in adaptation which exacerbates under larger distribution shifts' is presented as following from NC3+, yet no intervention is described that holds the distribution shift fixed while selectively altering misalignment (e.g., via controlled feature perturbation or weight adjustment). Without such a test, the causal status of misalignment versus correlated symptoms (feature degradation, pseudo-label noise) remains unproven.

Authors: We thank the referee for highlighting the importance of causal evidence. Our analysis across ImageNet-C severity levels and other shift benchmarks shows that sample-wise misalignment (via NC3+) increases monotonically with shift intensity and correlates strongly with TTA degradation, while NCTTA's targeted realignment yields consistent gains. This provides robust observational support for the mechanism. We agree a direct intervention would strengthen the claim further. In revision we will add a controlled experiment that perturbs feature embeddings to induce misalignment while holding the input distribution fixed, measuring effects on pseudo-label quality and adaptation performance. This will appear as a new analysis subsection.
revision: partial

Referee: [Abstract] The hybrid-target construction in NCTTA is offered as the solution to unreliable pseudo-labels, but the manuscript supplies no ablation that isolates the geometric-proximity term from the predictive-confidence term, nor any comparison against stronger pseudo-labeling baselines under matched conditions. This leaves open whether the reported gains require the specific NC3+-motivated blend or would arise from any sufficiently stable labeling scheme.

Authors: We appreciate this suggestion for isolating component contributions. The geometric-proximity term is directly derived from NC3+ to encourage feature-classifier alignment, while predictive confidence mitigates pseudo-label noise under shifts. In the revised manuscript we will add comprehensive ablations comparing (i) geometric-proximity only, (ii) predictive-confidence only, and (iii) the full hybrid NCTTA. We will also benchmark against stronger pseudo-labeling baselines (e.g., entropy-minimization variants and consistency-regularized self-training) under identical TTA protocols and report results in an expanded experimental table with discussion of why the NC3+-motivated blend is necessary for the observed gains.
revision: partial
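
The controlled experiment proposed in the first response (perturbing feature embeddings while holding the input distribution fixed) could look roughly like the sketch below. It is a simplification and not the authors' protocol: it assumes a bias-free linear classifier, isotropic Gaussian noise in feature space, and the rate of changed predictions as a stand-in for pseudo-label quality. The names `predict` and `flip_rate_under_noise` are hypothetical.

```python
import random


def predict(feature, weights):
    """Index of the class whose weight has the largest dot product with the feature."""
    scores = [sum(a * b for a, b in zip(feature, w)) for w in weights]
    return max(range(len(weights)), key=lambda j: scores[j])


def flip_rate_under_noise(features, weights, sigma, seed=0):
    """Fraction of samples whose predicted class changes after Gaussian feature noise.

    The inputs themselves are untouched; only the embeddings are perturbed,
    so any change in predictions is due to induced feature-weight misalignment.
    """
    if not features:
        raise ValueError("features must be non-empty")
    rng = random.Random(seed)  # fixed seed keeps the experiment repeatable
    flips = 0
    for feature in features:
        noisy = [x + rng.gauss(0.0, sigma) for x in feature]
        flips += predict(noisy, weights) != predict(feature, weights)
    return flips / len(features)


# Toy setup: two classes, four well-aligned embeddings.
weights = [[1.0, 0.0], [0.0, 1.0]]
features = [[1.0, 0.1], [0.2, 1.0], [0.9, 0.3], [0.1, 0.8]]
for sigma in (0.0, 0.5, 2.0):
    print(sigma, flip_rate_under_noise(features, weights, sigma))
```

Sweeping `sigma` while keeping the inputs fixed isolates the effect of misalignment from other shift-induced changes, which is the control the referee asks for.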

Circularity Check

0 steps flagged

No circularity; claims rest on empirical observation of NC3+ and a proposed alignment method without self-referential derivations.

The paper extends Neural Collapse to the sample-wise level via empirical discovery of Sample-wise Alignment Collapse (NC3+), attributes TTA degradation to misalignment based on observed correlations with distribution shifts, and introduces NCTTA using hybrid geometric-predictive targets. No equations, fitted parameters, or derivations are shown that reduce the claimed phenomenon or performance gains to inputs by construction. The abstract cites prior NC work as external foundation and presents new observations plus a practical fix; the derivation chain is self-contained against external benchmarks with no load-bearing self-citation or renaming of known results.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The central claim rests on extending the existing neural-collapse framework to the per-sample level and on the empirical observation that misalignment drives TTA failure; no new free parameters or invented physical entities are visible in the abstract.

axioms (1)

domain assumption: Neural collapse properties observed in trained DNNs on in-distribution data continue to be relevant under test-time domain shifts.
The paper builds directly on prior NC literature and assumes the geometric alignment insight transfers to the TTA setting.

invented entities (1)

Sample-wise Alignment Collapse (NC3+) · no independent evidence
purpose: To name and describe the per-sample convergence of feature embeddings toward their corresponding classifier weights, which the paper reports breaking down under domain shift.
New descriptive term introduced to capture the observed alignment phenomenon and, through its violation, the misalignment seen on shifted data.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear

Paper passage: Theorem: Sample-wise Alignment Collapse (NC3+). During the TPT, the G-FCA distance $d_{i y_i}$ ... converges to zero

IndisputableMonolith/Foundation/BranchSelection.lean · branch_selection · unclear

Paper passage: NCTTA ... blends geometric proximity with predictive confidence ... $L_{NC}(x_i) = \ell(\{d_{ij}\}_{j \in T_i}, \{d_{ij}\}_{j \notin T_i})$

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

39 extracted references · 39 canonical work pages · 2 internal anchors

[1] Mouïn Ben Ammar, Nacim Belkhir, Sebastian Popescu, Antoine Manzanera, and Gianni Franchi. Neco: Neural collapse based out-of-distribution detection. arXiv preprint arXiv:2310.06823, 2023.

[2] Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, and Neil Houlsby. An image is worth 16x16 words: Transformers for image recognition at scale. ICLR, 2021.

[3] Cong Fang, Hangfeng He, Qi Long, and Weijie J Su. Layer-peeled model: Toward understanding well-trained deep neural networks. arXiv preprint arXiv:2101.12699, 4, 2021.

[4] Cong Fang, Hangfeng He, Qi Long, and Weijie J Su. Exploring deep neural networks via layer-peeled model: Minority collapse in imbalanced training. Proceedings of the National Academy of Sciences, 118(43):e2103091118, 2021.

[5] Taesik Gong, Jongheon Jeong, Taewon Kim, Yewon Kim, Jinwoo Shin, and Sung-Ju Lee. NOTE: Robust continual test-time adaptation against temporal correlation. In Advances in Neural Information Processing Systems (NeurIPS).

[6] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 770–778, 2016.

[7] Dan Hendrycks and Thomas Dietterich. Benchmarking neural network robustness to common corruptions and perturbations. Proceedings of the International Conference on Learning Representations, 2019.

[8] Yusuke Iwasawa and Yutaka Matsuo. Test-time classifier adjustment module for model-agnostic domain generalization. In Advances in Neural Information Processing Systems, pages 2427–2440. Curran Associates, Inc., 2021.

[9] Adilbek Karmanov, Dayan Guan, Shijian Lu, Abdulmotaleb El Saddik, and Eric Xing. Efficient test-time adaptation of vision-language models. The IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024.

[10] Jonghyun Lee, Dahuin Jung, Saehyung Lee, Junsung Park, Juhyeon Shin, Uiwon Hwang, and Sungroh Yoon. Entropy is not enough for test-time adaptation: From the perspective of disentangled factors. arXiv preprint arXiv:2403.07366.

[11] Da Li, Yongxin Yang, Yi-Zhe Song, and Timothy M. Hospedales. Deeper, broader and artier domain generalization, 2017.

[12] Jian Liang, Dapeng Hu, and Jiashi Feng. Do we really need to access the source data? source hypothesis transfer for unsupervised domain adaptation. In International Conference on Machine Learning (ICML), pages 6028–6039, 2020.

[13] Jian Liang, Ran He, and Tieniu Tan. A comprehensive survey on test-time adaptation under distribution shifts. arXiv preprint arXiv:2303.15361, 2023.

[14] Xiaofeng Liu, Chaehwa Yoo, Fangxu Xing, Hyejin Oh, Georges El Fakhri, Je-Won Kang, Jonghye Woo, et al. Deep unsupervised domain adaptation: A review of recent advances and perspectives. APSIPA Transactions on Signal and Information Processing, 11(1), 2022.

[15] Jianfeng Lu and Stefan Steinerberger. Neural collapse under cross-entropy loss. Applied and Computational Harmonic Analysis, 59:224–241, 2022.

[16] R Thomas McCoy, Ellie Pavlick, and Tal Linzen. Right for the wrong reasons: Diagnosing syntactic heuristics in natural language inference. arXiv preprint arXiv:1902.01007, 2019.

[17] Shuaicheng Niu, Jiaxiang Wu, Yifan Zhang, Yaofo Chen, Shijian Zheng, Peilin Zhao, and Mingkui Tan. Efficient test-time model adaptation without forgetting. In International conference on machine learning, pages 16888–16905. PMLR, 2022.

[18] Shuaicheng Niu, Jiaxiang Wu, Yifan Zhang, Zhiquan Wen, Yaofo Chen, Peilin Zhao, and Mingkui Tan. Towards stable test-time adaptation in dynamic wild world. arXiv preprint arXiv:2302.12400, 2023.

[19] Vardan Papyan, XY Han, and David L Donoho. Prevalence of neural collapse during the terminal phase of deep learning training. Proceedings of the National Academy of Sciences, 117(40):24652–24663, 2020.

[20] Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, et al. Learning transferable visual models from natural language supervision. In International conference on machine learning, pages 8748–8763. PMLR, 2021.

[21] David E. Rumelhart, Geoffrey E. Hinton, and Ronald J. Williams. Learning Internal Representations by Error Propagation. 1985.

[22] Olga Russakovsky, Jia Deng, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang, Andrej Karpathy, Aditya Khosla, Michael Bernstein, Alexander C. Berg, and Li Fei-Fei. ImageNet Large Scale Visual Recognition Challenge. International Journal of Computer Vision (IJCV), 115(3):211–252, 2015.

[23] Shiori Sagawa, Pang Wei Koh, Tatsunori B Hashimoto, and Percy Liang. Distributionally robust neural networks for group shifts: On the importance of regularization for worst-case generalization. In International Conference on Learning Representations, 2019.

[24] Steffen Schneider, Evgenia Rusak, Luisa Eck, Oliver Bringmann, Wieland Brendel, and Matthias Bethge. Removing covariate shift improves robustness against common corruptions. CoRR, abs/2006.16971, 2020.

[25] Manli Shu, Weili Nie, De-An Huang, Zhiding Yu, Tom Goldstein, Anima Anandkumar, and Chaowei Xiao. Test-time prompt tuning for zero-shot generalization in vision-language models. Advances in Neural Information Processing Systems, 35:14274–14289, 2022.

[26] Vivienne Sze, Yu-Hsin Chen, Tien-Ju Yang, and Joel S Emer. Efficient processing of deep neural networks: A tutorial and survey. Proceedings of the IEEE, 105(12):2295–2329, 2017.

[27] Laurens Van der Maaten and Geoffrey Hinton. Visualizing data using t-SNE. Journal of machine learning research, 9(11), 2008.

[28] Dequan Wang, Evan Shelhamer, Shaoteng Liu, Bruno Olshausen, and Trevor Darrell. Tent: Fully test-time adaptation by entropy minimization. arXiv preprint arXiv:2006.10726.

[29] Qin Wang, Olga Fink, Luc Van Gool, and Dengxin Dai. Continual test-time domain adaptation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 7201–7211, 2022.

[30] E Weinan and Stephan Wojtowytsch. On the emergence of simplex symmetry in the final and penultimate layers of neural network classifiers. In Mathematical and Scientific Machine Learning, pages 270–290. PMLR, 2022.

[31] Jason Yosinski, Jeff Clune, Yoshua Bengio, and Hod Lipson. How transferable are features in deep neural networks? Advances in neural information processing systems, 27, 2014.

[32] Marvin Zhang, Sergey Levine, and Chelsea Finn. Memo: Test time robustness via adaptation and augmentation. Advances in neural information processing systems, 35:38629–38642, 2022.

[33] Taolin Zhang, Jinpeng Wang, Hang Guo, Tao Dai, Bin Chen, and Shu-Tao Xia. Boostadapter: Improving test-time adaptation via regional bootstrapping. arXiv preprint arXiv:2410.15430, 2024.

[34] Hao Zhao, Yuejiang Liu, Alexandre Alahi, and Tao Lin. On pitfalls of test-time adaptation. arXiv preprint arXiv:2306.03536, 2023.

[35] Zhisheng Zhong, Jiequan Cui, Yibo Yang, Xiaoyang Wu, Xiaojuan Qi, Xiangyu Zhang, and Jiaya Jia. Understanding imbalanced semantic segmentation through neural collapse. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 19550–19560, 2023.

[36] Jinxin Zhou, Chong You, Xiao Li, Kangning Liu, Sheng Liu, Qing Qu, and Zhihui Zhu. Are all losses created equal: A neural collapse perspective. Advances in Neural Information Processing Systems, 35:31697–31710, 2022.

[37] Kaiyang Zhou, Ziwei Liu, Yu Qiao, Tao Xiang, and Chen Change Loy. Domain generalization: A survey. IEEE Transactions on Pattern Analysis and Machine Intelligence, 45(4):4396–4415, 2022.

[38] Didi Zhu, Zexi Li, Min Zhang, Junkun Yuan, Jiashuo Liu, Kun Kuang, and Chao Wu. Neural collapse anchored prompt tuning for generalizable vision-language models. In Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pages 4631–4640, 2024.

[39] Zhihui Zhu, Tianyu Ding, Jinxin Zhou, Xiao Li, Chong You, Jeremias Sulam, and Qing Qu. A geometric analysis of neural collapse with unconstrained features. Advances in Neural Information Processing Systems, 34:29820–29834, 2021.