Neural Collapse in Test-Time Adaptation
Pith reviewed 2026-05-16 23:32 UTC · model grok-4.3
The pith
Sample-wise neural collapse shows that feature-classifier misalignment drives test-time adaptation failures under domain shifts.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
By extending neural collapse to the sample level, the work observes that a sample's feature embedding aligns closely with its corresponding classifier weight vector. This alignment breaks down during test-time adaptation, and the resulting sample-wise misalignment is the direct source of performance degradation, which worsens under larger distribution shifts. Because misalignment also makes pseudo-labels unreliable, restoring the alignment requires new targets that combine geometric proximity with predictive confidence.
What carries the argument
Sample-wise Alignment Collapse (NC3+), the per-sample geometric alignment between feature embeddings and classifier weights that holds in a trained model and breaks under domain-shifted adaptation.
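The review does not say how the per-sample alignment is measured. A minimal sketch, assuming the distance d_ij is the cosine distance between a sample's penultimate feature h_i and classifier weight w_j, and that "aligned" means w_{y_i} is the nearest weight and lies within a hypothetical tolerance `tol`; the paper's own distance may differ:

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def alignment_distances(feature: Sequence[float],
                        weights: List[Sequence[float]]) -> List[float]:
    """Hypothetical distance d_ij = 1 - cos(h_i, w_j) from one feature to every class weight."""
    return [1.0 - cosine(feature, w) for w in weights]


def is_sample_aligned(feature: Sequence[float], weights: List[Sequence[float]],
                      label: int, tol: float = 0.1) -> bool:
    """Assumed NC3+-style check: the sample's own class weight is its nearest weight
    and the distance d_{i,y_i} falls below `tol`."""
    d = alignment_distances(feature, weights)
    nearest = min(range(len(d)), key=d.__getitem__)
    return nearest == label and d[label] < tol
```

With two orthogonal class weights `[[1.0, 0.0], [0.0, 1.0]]`, the feature `[0.95, 0.05]` labeled 0 counts as aligned, while `[0.6, 0.8]` labeled 0 does not, because its nearest weight belongs to class 1.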
If this is right
- Realigning each sample's features to its classifier weight recovers accuracy lost during test-time adaptation.
- The hybrid targets reduce reliance on unreliable pseudo-labels when distribution shifts are large.
- Gains from the method should grow as the domain gap widens; the reported 14.52 percent improvement over Tent on ImageNet-C is consistent with this.
- The same geometric principle explains why standard pseudo-labeling schemes degrade and suggests replacing them with alignment-driven objectives.
Where Pith is reading between the lines
- The same sample-wise alignment view could be tested in unsupervised domain adaptation or continual learning to see whether misalignment is a general failure mode.
- If NC3+ is universal, future adaptation methods could add an explicit alignment loss term instead of relying solely on classification or entropy objectives.
- Measuring the degree of sample-wise collapse before adaptation might serve as a cheap diagnostic for how much a model will degrade on a new domain.
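The last two suggestions can be made concrete. A minimal sketch, assuming a bias-free linear classifier (logit_j = ⟨w_j, h⟩), cosine distance as the misalignment measure, and a hypothetical weight `lam` on the alignment term; it computes forward values only and is not NCTTA's objective:

```python
import math
from typing import List, Sequence


def cos_dist(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine distance 1 - cos(u, v) for non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return 1.0 - dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def softmax(z: Sequence[float]) -> List[float]:
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(x - m) for x in z]
    s = sum(e)
    return [x / s for x in e]


def logits_of(feature: Sequence[float], weights: List[Sequence[float]]) -> List[float]:
    """Bias-free linear classifier: logit_j = <w_j, h>."""
    return [sum(a * b for a, b in zip(w, feature)) for w in weights]


def misalignment_score(features: List[Sequence[float]],
                       weights: List[Sequence[float]]) -> float:
    """Hypothetical label-free diagnostic: mean cosine distance between each feature
    and the weight of its predicted (argmax-logit) class."""
    total = 0.0
    for h in features:
        z = logits_of(h, weights)
        pred = max(range(len(z)), key=z.__getitem__)
        total += cos_dist(h, weights[pred])
    return total / len(features)


def entropy_plus_alignment(feature: Sequence[float], weights: List[Sequence[float]],
                           lam: float = 0.5) -> float:
    """Forward value of a Tent-style entropy objective plus a hypothetical alignment
    term lam * d(h, w_pred); a real implementation would compute this in an autodiff
    framework and backpropagate through it."""
    p = softmax(logits_of(feature, weights))
    entropy = -sum(pi * math.log(pi) for pi in p if pi > 0.0)
    pred = max(range(len(p)), key=p.__getitem__)
    return entropy + lam * cos_dist(feature, weights[pred])
```

Computing `misalignment_score` on source data and on a new domain before any update would be the cheap diagnostic; whether it actually predicts degradation is exactly the open question.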
Load-bearing premise
That the observed sample-wise misalignment is the primary cause of adaptation degradation rather than a side effect of other factors, and that blending geometric proximity with model confidence produces more reliable targets than existing pseudo-label schemes.
What would settle it
A controlled experiment in which models are forced to maintain sample-wise feature-weight alignment during adaptation yet still suffer the same accuracy drop, or in which the hybrid targets improve accuracy without any measurable reduction in misalignment.
Original abstract
Test-Time Adaptation (TTA) enhances model robustness to out-of-distribution (OOD) data by updating the model online during inference, yet existing methods lack theoretical insights into the fundamental causes of performance degradation under domain shifts. Recently, Neural Collapse (NC) has been proposed as an emergent geometric property of deep neural networks (DNNs), providing valuable insights for TTA. In this work, we extend NC to the sample-wise level and discover a novel phenomenon termed Sample-wise Alignment Collapse (NC3+), demonstrating that a sample's feature embedding, obtained by a trained model, aligns closely with the corresponding classifier weight. Building on NC3+, we identify that the performance degradation stems from sample-wise misalignment in adaptation which exacerbates under larger distribution shifts. This indicates the necessity of realigning the feature embeddings with their corresponding classifier weights. However, the misalignment makes pseudo-labels unreliable under domain shifts. To address this challenge, we propose NCTTA, a novel feature-classifier alignment method with hybrid targets to mitigate the impact of unreliable pseudo-labels, which blends geometric proximity with predictive confidence. Extensive experiments demonstrate the effectiveness of NCTTA in enhancing robustness to domain shifts. For example, NCTTA outperforms Tent by 14.52% on ImageNet-C. Project page is publicly available at https://github.com/Cevaaa/NCTTA.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript extends Neural Collapse (NC) to the sample-wise level by introducing Sample-wise Alignment Collapse (NC3+), which shows that a sample's feature embedding aligns closely with its corresponding classifier weight. It argues that TTA performance degradation under domain shifts stems from sample-wise misalignment between features and weights (worsening with larger shifts), leading to unreliable pseudo-labels. To address this, the authors propose NCTTA, which uses hybrid targets blending geometric proximity and predictive confidence for realignment, and report large empirical gains such as +14.52% over Tent on ImageNet-C.
Significance. If the causal link between sample-wise misalignment and TTA degradation is established and the hybrid-target method proves robust, the work could supply a useful geometric lens on TTA failures and a practical adaptation technique. The reported gains on ImageNet-C are notable, but overall significance is limited by the absence of controls that isolate the proposed mechanism from other shift-induced effects.
major comments (2)
- [Abstract] The central claim that 'performance degradation stems from sample-wise misalignment in adaptation which exacerbates under larger distribution shifts' is presented as following from NC3+, yet no intervention is described that holds the distribution shift fixed while selectively altering misalignment (e.g., via controlled feature perturbation or weight adjustment). Without such a test, the causal status of misalignment versus correlated symptoms (feature degradation, pseudo-label noise) remains unproven.
- [Abstract] The hybrid-target construction in NCTTA is offered as the solution to unreliable pseudo-labels, but the manuscript supplies no ablation that isolates the geometric-proximity term from the predictive-confidence term, nor any comparison against stronger pseudo-labeling baselines under matched conditions. This leaves open whether the reported gains require the specific NC3+-motivated blend or would arise from any sufficiently stable labeling scheme.
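The intervention the first comment asks for can be expressed as a feature-space operation. A minimal sketch, assuming access to penultimate features and to the classifier weight of the sample's class: the hypothetical helper `perturb_to_cosine` rotates a feature, keeping its norm, until its cosine with that weight equals a chosen value, so misalignment is dialed in while the input distribution stays fixed.

```python
import math
from typing import List, Optional, Sequence


def _norm(v: Sequence[float]) -> float:
    return math.sqrt(sum(x * x for x in v))


def _orth_component(v: Sequence[float], unit: Sequence[float]) -> List[float]:
    """Component of v orthogonal to the unit vector `unit`."""
    proj = sum(a * b for a, b in zip(v, unit))
    return [a - proj * b for a, b in zip(v, unit)]


def perturb_to_cosine(h: Sequence[float], w: Sequence[float], target_cos: float,
                      fallback: Optional[Sequence[float]] = None) -> List[float]:
    """Hypothetical intervention: rotate feature h (keeping its norm) inside the plane
    spanned by w and h's orthogonal residual so that cos(h', w) == target_cos."""
    if not -1.0 <= target_cos <= 1.0:
        raise ValueError("target_cos must lie in [-1, 1]")
    w_norm = _norm(w)
    w_hat = [x / w_norm for x in w]
    resid = _orth_component(h, w_hat)
    if _norm(resid) < 1e-12:
        # h is (anti)parallel to w, so the rotation plane needs an extra direction.
        if fallback is None:
            raise ValueError("h is parallel to w; pass a fallback direction")
        resid = _orth_component(fallback, w_hat)
        if _norm(resid) < 1e-12:
            raise ValueError("fallback must not be parallel to w")
    r_norm = _norm(resid)
    u = [x / r_norm for x in resid]
    sin_part = math.sqrt(1.0 - target_cos ** 2)
    radius = _norm(h)
    return [radius * (target_cos * a + sin_part * b) for a, b in zip(w_hat, u)]
```

Sweeping `target_cos` from 1 toward 0 at one fixed corruption level, and recording pseudo-label accuracy and post-adaptation accuracy at each setting, would separate the effect of misalignment from that of the shift itself.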
minor comments (1)
- [Abstract] The term 'NC3+' is introduced without a concise recap of the standard NC1–NC4 properties; a one-sentence reminder of the prior collapse metrics would improve readability for readers unfamiliar with the NC literature.
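For reference, the four standard properties from Papyan, Han, and Donoho (2020), stated as limits over the terminal phase of training, can be summarized as follows. The notation is introduced here for the recap: $h_i$ is the feature of sample $i$ with label $y_i$, $\mu_c$ the class means, $\mu_G$ the global mean, $\Sigma_W$ and $\Sigma_B$ the within- and between-class covariances, $M$ the matrix of centered class means, and $W$ the classifier with rows $w_c$. The last line restates NC3+ with $d_{iy_i}$ left as the paper's per-sample feature-classifier distance, whose exact form the review does not give.

```latex
% M = [\mu_1 - \mu_G, \dots, \mu_C - \mu_G], \quad W = [w_1, \dots, w_C]^\top, \quad C = \text{number of classes}
\begin{aligned}
&\text{NC1 (variability collapse):} && \Sigma_W \Sigma_B^{\dagger} \to 0 \\
&\text{NC2 (simplex ETF):} && \|\mu_c - \mu_G\| - \|\mu_{c'} - \mu_G\| \to 0, \quad
  \cos\angle(\mu_c - \mu_G,\, \mu_{c'} - \mu_G) \to -\tfrac{1}{C-1} \quad (c \neq c') \\
&\text{NC3 (self-duality):} && \Big\| \tfrac{W^\top}{\|W\|_F} - \tfrac{M}{\|M\|_F} \Big\|_F \to 0 \\
&\text{NC4 (nearest class center):} && \arg\max_c \big(\langle w_c, h \rangle + b_c\big) \to \arg\min_c \|h - \mu_c\| \\
&\text{NC3+ (sample-wise, this paper):} && d_{i y_i} \to 0
\end{aligned}
```

NC3 aligns classifier weights with class means; NC3+ strengthens this to every individual feature.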
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on our manuscript. We address the concerns about establishing causality for sample-wise misalignment and the need for targeted ablations on the hybrid targets. Below we provide point-by-point responses and indicate planned revisions.
- Referee: [Abstract] The central claim that 'performance degradation stems from sample-wise misalignment in adaptation which exacerbates under larger distribution shifts' is presented as following from NC3+, yet no intervention is described that holds the distribution shift fixed while selectively altering misalignment (e.g., via controlled feature perturbation or weight adjustment). Without such a test, the causal status of misalignment versus correlated symptoms (feature degradation, pseudo-label noise) remains unproven.
Authors: We thank the referee for highlighting the importance of causal evidence. Our analysis across ImageNet-C severity levels and other shift benchmarks shows that sample-wise misalignment (via NC3+) increases monotonically with shift intensity and correlates strongly with TTA degradation, while NCTTA's targeted realignment yields consistent gains. This provides robust observational support for the mechanism. We agree a direct intervention would strengthen the claim further. In revision we will add a controlled experiment that perturbs feature embeddings to induce misalignment while holding the input distribution fixed, measuring effects on pseudo-label quality and adaptation performance. This will appear as a new analysis subsection. revision: partial
- Referee: [Abstract] The hybrid-target construction in NCTTA is offered as the solution to unreliable pseudo-labels, but the manuscript supplies no ablation that isolates the geometric-proximity term from the predictive-confidence term, nor any comparison against stronger pseudo-labeling baselines under matched conditions. This leaves open whether the reported gains require the specific NC3+-motivated blend or would arise from any sufficiently stable labeling scheme.
Authors: We appreciate this suggestion for isolating component contributions. The geometric-proximity term is directly derived from NC3+ to encourage feature-classifier alignment, while predictive confidence mitigates pseudo-label noise under shifts. In the revised manuscript we will add comprehensive ablations comparing (i) geometric-proximity only, (ii) predictive-confidence only, and (iii) the full hybrid NCTTA. We will also benchmark against stronger pseudo-labeling baselines (e.g., entropy-minimization variants and consistency-regularized self-training) under identical TTA protocols and report results in an expanded experimental table with discussion of why the NC3+-motivated blend is necessary for the observed gains. revision: partial
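The three ablation arms promised in the response can be framed as settings of a single blend weight. A minimal sketch, assuming the hybrid target is a convex combination, with hypothetical weight `alpha`, of a geometric distribution (softmax over negative cosine distances at a hypothetical temperature `tau`) and the model's own softmax prediction; NCTTA's actual blending rule is not given in this review:

```python
import math
from typing import Dict, List, Sequence


def softmax(z: Sequence[float], temperature: float = 1.0) -> List[float]:
    scaled = [x / temperature for x in z]
    m = max(scaled)  # subtract the max for numerical stability
    e = [math.exp(x - m) for x in scaled]
    s = sum(e)
    return [x / s for x in e]


def cos_dist(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    return 1.0 - dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def hybrid_target(feature: Sequence[float], weights: List[Sequence[float]],
                  alpha: float = 0.5, tau: float = 0.1) -> List[float]:
    """Hypothetical soft target: alpha * geometric + (1 - alpha) * confidence, where the
    geometric part is a softmax over negative feature-classifier cosine distances and
    the confidence part is the softmax of bias-free logits <w_j, h>."""
    logits = [sum(a * b for a, b in zip(w, feature)) for w in weights]
    confidence = softmax(logits)
    geometric = softmax([-cos_dist(feature, w) for w in weights], temperature=tau)
    return [alpha * g + (1.0 - alpha) * p for g, p in zip(geometric, confidence)]


# The three ablation arms from the response, expressed as values of the blend weight.
ABLATIONS: Dict[str, float] = {
    "geometric proximity only": 1.0,
    "predictive confidence only": 0.0,
    "full hybrid": 0.5,
}
```

Iterating over `ABLATIONS.items()` and training with `hybrid_target(h, W, alpha)` under an otherwise identical protocol would isolate each term; the value 0.5 for the full hybrid is a placeholder, not the paper's setting.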
Circularity Check
No circularity; claims rest on empirical observation of NC3+ and a proposed alignment method without self-referential derivations.
full rationale
The paper extends Neural Collapse to the sample-wise level via empirical discovery of Sample-wise Alignment Collapse (NC3+), attributes TTA degradation to misalignment based on observed correlations with distribution shifts, and introduces NCTTA using hybrid geometric-predictive targets. No equations, fitted parameters, or derivations are shown that reduce the claimed phenomenon or performance gains to inputs by construction. The abstract cites prior NC work as external foundation and presents new observations plus a practical fix; the derivation chain is self-contained against external benchmarks with no load-bearing self-citation or renaming of known results.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: neural collapse properties observed in trained DNNs on in-distribution data continue to be relevant under test-time domain shifts.
invented entities (1)
- Sample-wise Alignment Collapse (NC3+): no independent evidence
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean, theorem washburn_uniqueness_aczel (tag: unclear)
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Paper passage: "Theorem: Sample-wise Alignment Collapse (NC3+). During the TPT, the G-FCA distance d_{iy_i} ... converges to zero"
- IndisputableMonolith/Foundation/BranchSelection.lean, theorem branch_selection (tag: unclear)
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Paper passage: "NCTTA ... blends geometric proximity with predictive confidence ... L_NC(x_i) = ℓ({d_ij}_{j∈T_i}, {d_ij}_{j∉T_i})"
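The quoted loss leaves both ℓ and the construction of the target set T_i unspecified. A minimal sketch, assuming ℓ is a multi-positive softmax loss over negative distances with a hypothetical temperature `tau`, and that the caller supplies T_i (for instance, the classes favoured by a hybrid target):

```python
import math
from typing import Iterable, Sequence


def nc_loss(distances: Sequence[float], targets: Iterable[int], tau: float = 0.1) -> float:
    """Hypothetical instantiation of L_NC(x_i) = l({d_ij : j in T_i}, {d_ij : j not in T_i}).

    l is taken to be -log of the probability mass, under softmax(-d_ij / tau), that
    falls on the target set T_i. Shrinking distances to classes in T_i, or growing
    distances to classes outside it, lowers the loss."""
    target_set = set(targets)
    if not target_set:
        raise ValueError("target set T_i must be non-empty")
    scores = [-d / tau for d in distances]
    m = max(scores)  # stabilise the exponentials
    exps = [math.exp(s - m) for s in scores]
    positive_mass = sum(exps[j] for j in target_set)
    return -math.log(positive_mass / sum(exps))
```

This choice treats the first argument of ℓ as positives and the second as negatives, which matches the split in the quoted formula; any loss that is decreasing in the positive distances and increasing in the negative ones would fit the same template.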
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.