
arxiv: 2508.15568 · v8 · submitted 2025-08-21 · 💻 cs.CV · cs.LG

Backpropagation-Free Test-Time Adaptation via Probabilistic Gaussian Alignment

Pith reviewed 2026-05-18 21:23 UTC · model grok-4.3

classification 💻 cs.CV cs.LG
keywords test-time adaptation · distribution shift · Gaussian modeling · backpropagation-free · probabilistic inference · computer vision · zero-shot robustness

The pith

ADAPT reframes test-time adaptation as closed-form Gaussian inference on online-updated class means with a shared covariance, eliminating all gradient steps and source data needs.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that test-time adaptation under distribution shifts can be performed reliably by modeling unlabeled target features as draws from class-conditional Gaussians whose means are updated gradually and whose covariance is estimated once and shared across classes. This modeling supplies closed-form likelihoods for inference and calibrated predictions without backpropagation, iterative optimization, or access to source data. A sympathetic reader would care because most prior TTA methods either require gradients that prevent real-time use or lack explicit class-conditional modeling, leaving decision boundaries unreliable when shifts occur. The approach further adds lightweight CLIP-guided regularization and a historical bank to correct likelihood bias while supporting both streaming and batch target settings.

Core claim

We reframe TTA as a Gaussian probabilistic inference task by modeling class-conditional likelihoods using gradually updated class means and a shared covariance matrix. This enables closed-form, training-free inference. To correct potential likelihood bias, we introduce lightweight regularization guided by CLIP priors and a historical knowledge bank. ADAPT requires no source data, no gradient updates, and no full access to target data, supporting both online and transductive settings.
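
The closed-form inference described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `gaussian_posteriors`, the `ridge` stabiliser, and the uniform default prior are all hypothetical choices, and the CLIP-guided regularization and knowledge bank are omitted.

```python
import numpy as np

def gaussian_posteriors(x, means, cov, priors=None, ridge=1e-3):
    """Closed-form class posteriors under N(mu_k, Sigma) with one shared Sigma."""
    num_classes, dim = means.shape
    if priors is None:
        # Assumption of this sketch: uniform class prior when none is given.
        priors = np.full(num_classes, 1.0 / num_classes)
    # Assumption of this sketch: a small ridge keeps the covariance invertible.
    precision = np.linalg.inv(cov + ridge * np.eye(dim))
    # With a shared covariance the quadratic term in x cancels across classes,
    # so the class scores are linear in x.
    weights = means @ precision                                      # (K, d)
    bias = -0.5 * np.sum(weights * means, axis=1) + np.log(priors)   # (K,)
    logits = x @ weights.T + bias                                    # (N, K)
    logits -= logits.max(axis=1, keepdims=True)                      # stable softmax
    probs = np.exp(logits)
    return probs / probs.sum(axis=1, keepdims=True)

means = np.array([[2.0, 0.0], [-2.0, 0.0]])
cov = np.eye(2)
x = np.array([[1.5, 0.3], [-1.0, -0.2]])
probs = gaussian_posteriors(x, means, cov)
```

No gradient step or iterative solver appears anywhere in the function: one matrix inverse and two matrix products yield the posteriors, which is what makes the inference training-free.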

What carries the argument

Gaussian probabilistic inference that computes class likelihoods from online-updated per-class means and one shared covariance matrix estimated directly from unlabeled target features.
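
For reference, the standard shared-covariance Gaussian discriminant can be written out as below. The symbols ($\mu_k$ for the class means, $\Sigma$ for the shared covariance, $\pi_k$ for the class prior) are notation assumed here for illustration; the paper's own notation and any additional regularization terms may differ.

```latex
\begin{aligned}
\log p(y = k \mid x)
  &= \log \pi_k - \tfrac{1}{2}(x - \mu_k)^\top \Sigma^{-1} (x - \mu_k) + \text{const} \\
  &= \underbrace{\mu_k^\top \Sigma^{-1} x}_{\text{linear in } x}
     \;-\; \tfrac{1}{2}\,\mu_k^\top \Sigma^{-1} \mu_k
     \;+\; \log \pi_k
     \;-\; \underbrace{\tfrac{1}{2}\, x^\top \Sigma^{-1} x}_{\text{same for all } k}
     \;+\; \text{const}
\end{aligned}
```

Because the quadratic term $x^\top \Sigma^{-1} x$ is identical for every class, it drops out of the softmax, and the decision boundaries between classes are linear in the feature space.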

If this is right

  • Real-time inference becomes feasible on edge devices because no gradients or iterative optimization are required.
  • Both online streaming and transductive batch adaptation are supported with only partial or full unlabeled target batches.
  • Calibrated predictions improve under a wide range of distribution shifts without retraining or source replay.
  • Scalability increases because the method neither stores nor accesses source data.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same online Gaussian update pattern could be applied to other probabilistic heads such as normalizing flows or mixture models to relax the single-covariance assumption.
  • Connecting the historical knowledge bank to Bayesian updating would allow explicit uncertainty quantification over the running means.
  • The CLIP prior regularization suggests a broader pattern: using large-scale vision-language models as cheap, label-free anchors during test-time distribution alignment.

Load-bearing premise

Class-conditional feature distributions in the target domain can be adequately captured by gradually updated class means and a single shared covariance matrix estimated without any labels or source data.

What would settle it

Run the method on a benchmark whose test features exhibit strong non-Gaussian structure or class-conditional covariance differences; if accuracy or calibration falls below optimization-based TTA baselines, the modeling premise fails.
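
One way to probe the shared-covariance premise on a labeled held-out split is to measure how far each class covariance sits from the pooled one. The sketch below is a hypothetical diagnostic, not a procedure from the paper: the function name `covariance_heterogeneity` and the relative Frobenius distance as the score are assumptions, and it needs ground-truth labels, so it is an offline check rather than part of test-time adaptation.

```python
import numpy as np

def covariance_heterogeneity(features, labels):
    """Relative Frobenius distance of each class covariance from the pooled one."""
    classes = np.unique(labels)
    # Pooled within-class covariance: centre each class, then average.
    centered = np.vstack(
        [features[labels == c] - features[labels == c].mean(axis=0) for c in classes]
    )
    pooled = centered.T @ centered / (len(features) - len(classes))
    scores = {}
    for c in classes:
        class_cov = np.cov(features[labels == c], rowvar=False)
        scores[int(c)] = float(
            np.linalg.norm(class_cov - pooled) / np.linalg.norm(pooled)
        )
    return scores

# Synthetic example: two classes with very different scales.
rng = np.random.default_rng(0)
a = rng.normal(scale=1.0, size=(500, 4))
b = rng.normal(scale=3.0, size=(500, 4)) + 5.0
features = np.vstack([a, b])
labels = np.array([0] * 500 + [1] * 500)
scores = covariance_heterogeneity(features, labels)
```

Scores near zero suggest the shared-covariance model is a reasonable fit; large scores flag benchmarks where the premise is most likely to be stressed.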

Figures

Figures reproduced from arXiv: 2508.15568 by Hongyeob Kim, Huiling Liu, Sungeun Hong, Youjia Zhang, Youngeun Kim, Young-Geun Choi.

Figure 1: Overview of Online ADAPT. We perform TTA by modeling class-conditional feature …
Figure 2: Hyperparameter analysis.
Figure 3: Visualization of decision boundaries on ImageNet-A. The colors indicate different classes.
Figure 4: Comparison of covariance properties on ImageNet: (Left) shows Frobenius distance …
Figure 5: Results of few-shot classification across 30 datasets. We evaluate our method under both …
Figure 6: Performance comparison of proposed ADAPT on different VLMs.
Original abstract

Test-time adaptation (TTA) enhances the zero-shot robustness under distribution shifts by leveraging unlabeled test data during inference. Despite notable advances, several challenges still limit its broader applicability. First, most methods rely on backpropagation or iterative optimization, which limits scalability and hinders real-time deployment. Second, they lack explicit modeling of class-conditional feature distributions. This modeling is crucial for producing reliable decision boundaries and calibrated predictions, but it remains underexplored due to the lack of both source data and supervision at test time. In this paper, we propose ADAPT, an Advanced Distribution-Aware and backPropagation-free Test-time adaptation method. We reframe TTA as a Gaussian probabilistic inference task by modeling class-conditional likelihoods using gradually updated class means and a shared covariance matrix. This enables closed-form, training-free inference. To correct potential likelihood bias, we introduce lightweight regularization guided by CLIP priors and a historical knowledge bank. ADAPT requires no source data, no gradient updates, and no full access to target data, supporting both online and transductive settings. Extensive experiments across diverse benchmarks demonstrate that our method achieves state-of-the-art performance under a wide range of distribution shifts with superior scalability and robustness.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper proposes ADAPT, a backpropagation-free test-time adaptation method for improving zero-shot robustness under distribution shifts. It reframes TTA as closed-form Gaussian probabilistic inference by modeling class-conditional likelihoods with gradually updated class means and a single shared covariance matrix, both estimated from unlabeled target data without source data or gradients. Lightweight regularization via CLIP priors and a historical knowledge bank is introduced to correct likelihood bias. The method supports online and transductive settings and claims state-of-the-art performance with superior scalability across diverse benchmarks.

Significance. If the central claims hold, ADAPT would represent a meaningful advance in efficient TTA by eliminating optimization and backpropagation while providing explicit probabilistic modeling of class-conditional distributions. This could enable real-time deployment in resource-limited settings and improve calibration under shifts, provided the unsupervised mean updates and shared-covariance assumption prove robust.

major comments (3)
  1. [§3] §3 (Method), around the class-mean update rule: the unsupervised assignment of samples to classes for updating means relies on the model's own (potentially biased) predictions under distribution shift. This creates a risk of error accumulation that directly undermines the robustness and SOTA claims, yet no analysis or mitigation beyond the CLIP regularization is detailed to demonstrate stability from unreliable initial predictions.
  2. [Abstract and §4] Abstract and §4 (Experiments): the SOTA performance claim is asserted without reported error bars, ablation studies on the shared covariance assumption, or comparisons isolating the effect of the historical knowledge bank. This is load-bearing because the central contribution is the closed-form Gaussian inference under the stated assumptions.
  3. [§3.2] §3.2, covariance estimation: the single shared covariance matrix is estimated without labels or source data, but the paper does not address how class-specific scale differences (often amplified by shifts) are handled or why this does not degrade decision boundaries relative to per-class covariances.
minor comments (2)
  1. [§3] Notation for the Gaussian parameters (means and covariance) should be introduced with explicit equations early in the method section to clarify the closed-form inference steps.
  2. [§4] Figure captions and experimental tables would benefit from clearer indication of online vs. transductive settings and the exact benchmarks used for each result.
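
The error-accumulation concern in major comment 1 is easier to see with a concrete update rule. The sketch below assumes an exponential moving average driven by hard pseudo-labels with a confidence gate; the function name `update_class_means` and the parameters `momentum` and `min_conf` are hypothetical, and the paper's actual rule may differ.

```python
import numpy as np

def update_class_means(means, feature, probs, momentum=0.99, min_conf=0.5):
    """One online step: move the predicted class mean toward the new feature."""
    k = int(np.argmax(probs))          # pseudo-label from the model's own prediction
    if probs[k] < min_conf:            # assumption: skip low-confidence samples
        return means
    updated = means.copy()
    updated[k] = momentum * means[k] + (1.0 - momentum) * feature
    return updated

means = np.array([[1.0, 0.0], [0.0, 1.0]])
feature = np.array([3.0, 0.0])
confident = update_class_means(means, feature, np.array([0.9, 0.1]), momentum=0.9)
skipped = update_class_means(
    means, feature, np.array([0.45, 0.55]), momentum=0.9, min_conf=0.6
)
```

If the pseudo-label `k` is wrong, the wrong mean is pulled toward the sample, and later predictions inherit that drift. A high momentum and a confidence gate slow the effect but do not remove it, which is why the referee asks for an explicit stability analysis.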

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. We address each major comment below with clarifications on our approach and indicate planned revisions to strengthen the presentation and analysis.

Point-by-point responses
  1. Referee: [§3] §3 (Method), around the class-mean update rule: the unsupervised assignment of samples to classes for updating means relies on the model's own (potentially biased) predictions under distribution shift. This creates a risk of error accumulation that directly undermines the robustness and SOTA claims, yet no analysis or mitigation beyond the CLIP regularization is detailed to demonstrate stability from unreliable initial predictions.

    Authors: We agree that relying on the model's initial predictions for unsupervised class-mean updates introduces a risk of error accumulation under distribution shift. Our design mitigates this through gradual momentum-based updates, CLIP priors that serve as an external anchor to correct biased likelihoods, and the historical knowledge bank that accumulates and reuses more reliable statistics over time. These components are intended to limit drift even from imperfect early assignments. To make this robustness explicit, we will add a dedicated stability analysis in the revised manuscript, including plots of prediction consistency across update steps and sensitivity experiments under varying initial conditions. revision: yes

  2. Referee: [Abstract and §4] Abstract and §4 (Experiments): the SOTA performance claim is asserted without reported error bars, ablation studies on the shared covariance assumption, or comparisons isolating the effect of the historical knowledge bank. This is load-bearing because the central contribution is the closed-form Gaussian inference under the stated assumptions.

    Authors: We acknowledge that stronger statistical reporting and component-wise ablations would better substantiate the SOTA claims and the contribution of the closed-form Gaussian inference. In the revised version we will report all main results with error bars computed over multiple random seeds. We will also add an ablation comparing the shared covariance to alternatives (such as diagonal or limited per-class estimates) and a controlled study isolating the historical knowledge bank by removing or varying its contribution. These additions will directly address the load-bearing nature of the assumptions. revision: yes

  3. Referee: [§3.2] §3.2, covariance estimation: the single shared covariance matrix is estimated without labels or source data, but the paper does not address how class-specific scale differences (often amplified by shifts) are handled or why this does not degrade decision boundaries relative to per-class covariances.

    Authors: The shared covariance is chosen because, in the TTA regime, the number of samples per class is typically too small for stable per-class covariance estimation, which would lead to noisy or singular matrices. Pooling across classes provides a more reliable estimate of the overall feature distribution while the class means are updated individually. Although class-specific scales can differ under shifts, the combination of mean alignment and the shared covariance still produces effective probabilistic decision boundaries, as shown by our consistent outperformance of baselines. We will expand §3.2 with an explicit discussion of this design choice, its limitations, and supporting empirical evidence. revision: yes
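
The sample-size argument in the third response can be checked numerically. The sketch below uses synthetic features with illustrative sizes (64 dimensions, 10 classes, 16 samples per class), all of which are assumptions; it also pools with true class membership, whereas at test time only pseudo-labels would be available.

```python
import numpy as np

rng = np.random.default_rng(0)
dim, num_classes, per_class = 64, 10, 16   # fewer samples per class than dimensions

class_means = rng.normal(scale=3.0, size=(num_classes, dim))
features = [mu + rng.normal(size=(per_class, dim)) for mu in class_means]

# Per-class estimate: at most per_class - 1 non-zero eigenvalues, so it is singular.
per_class_cov = np.cov(features[0], rowvar=False)
per_class_rank = np.linalg.matrix_rank(per_class_cov)

# Pooled estimate: centre each class separately, then average over all samples.
centered = np.vstack([f - f.mean(axis=0) for f in features])
pooled_cov = centered.T @ centered / (len(centered) - num_classes)
pooled_rank = np.linalg.matrix_rank(pooled_cov)
```

With these sizes the per-class matrix cannot be inverted without regularization, while the pooled matrix is full rank. This supports the estimation argument only; it says nothing about whether the classes truly share a covariance, which is the referee's separate point.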

Circularity Check

0 steps flagged

Derivation chain is self-contained; no reductions to inputs by construction

full rationale

The paper reframes TTA as closed-form Gaussian probabilistic inference using gradually updated class means and a shared covariance, with lightweight regularization from external CLIP priors and a historical bank. These steps rely on explicit modeling assumptions and iterative updates from unlabeled target data rather than fitting parameters to a subset and renaming the output as a prediction. No self-citations, uniqueness theorems, or ansatz smuggling are invoked to justify core choices. The derivation does not reduce to its inputs by definition; the probabilistic alignment produces new decision boundaries from the estimated distributions. This is the common honest outcome for a method whose central claim rests on modeling choices that remain falsifiable against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The approach rests on the unverified assumption that target-domain features are well-modeled by per-class Gaussians with shared covariance; no free parameters are explicitly named in the abstract, and no new entities are postulated.

axioms (1)
  • domain assumption: Class-conditional distributions in the target domain are approximately Gaussian and can be tracked via running means and a shared covariance without labels.
    Invoked in the description of modeling class-conditional likelihoods using gradually updated class means and a shared covariance matrix.



Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Multi-modal Test-time Adaptation via Adaptive Probabilistic Gaussian Calibration

    cs.CV 2026-04 unverdicted novelty 6.0

    A probabilistic Gaussian model with adaptive contrastive asymmetry rectification improves multi-modal test-time adaptation by modeling category distributions and correcting modality asymmetry for better predictions un...
