Backpropagation-Free Test-Time Adaptation via Probabilistic Gaussian Alignment
Pith reviewed 2026-05-18 21:23 UTC · model grok-4.3
The pith
ADAPT reframes test-time adaptation as closed-form Gaussian inference over online-updated class means and a shared covariance, removing the need for gradient steps and source data.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We reframe TTA as a Gaussian probabilistic inference task by modeling class-conditional likelihoods using gradually updated class means and a shared covariance matrix. This enables closed-form, training-free inference. To correct potential likelihood bias, we introduce lightweight regularization guided by CLIP priors and a historical knowledge bank. ADAPT requires no source data, no gradient updates, and no full access to target data, supporting both online and transductive settings.
What carries the argument
Gaussian probabilistic inference that computes class likelihoods from online-updated per-class means and one shared covariance matrix estimated directly from unlabeled target features.
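For reference, the generic shared-covariance Gaussian classifier (classical linear discriminant analysis) has the closed form below. The symbols $\mu_k$ (class mean), $\Sigma$ (shared covariance), and $\pi_k$ (class prior) are notation assumed here for illustration; the paper's own notation and its regularization terms may differ.

```latex
\begin{aligned}
p(x \mid y = k) &= \mathcal{N}(x;\, \mu_k, \Sigma), \\
p(y = k \mid x) &= \frac{\pi_k \exp\!\big(w_k^\top x + b_k\big)}{\sum_{j=1}^{K} \pi_j \exp\!\big(w_j^\top x + b_j\big)}, \\
w_k &= \Sigma^{-1}\mu_k, \qquad b_k = -\tfrac{1}{2}\,\mu_k^\top \Sigma^{-1} \mu_k .
\end{aligned}
```

Because the covariance is shared, the quadratic term $-\tfrac{1}{2} x^\top \Sigma^{-1} x$ is identical for every class and cancels in the ratio, which is why the decision rule is linear in $x$.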
If this is right
- Real-time inference becomes feasible on edge devices because no gradients or iterative optimization are required.
- Both online streaming and transductive batch adaptation are supported, whether the method sees the unlabeled target data in partial batches or in full.
- Predictions become better calibrated under a wide range of distribution shifts, without retraining or source replay.
- Scalability improves because the method never stores or accesses source data.
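To illustrate why no gradient step is involved, the sketch below computes class posteriors in closed form with NumPy. It is a generic shared-covariance Gaussian classifier, not the paper's exact implementation; the function name `gaussian_posteriors` and the optional `log_prior` argument (standing in for a CLIP-derived prior) are assumptions made for illustration.

```python
import numpy as np


def gaussian_posteriors(x, means, cov, log_prior=None):
    """Closed-form class posteriors under a shared-covariance Gaussian model.

    x: (n, d) features, means: (K, d) class means, cov: (d, d) shared covariance.
    log_prior: optional (K,) or (n, K) log-prior; uniform if None.
    All names here are hypothetical, chosen for this sketch.
    """
    x = np.asarray(x, dtype=float)
    means = np.asarray(means, dtype=float)
    # w_k = Sigma^{-1} mu_k, obtained with a linear solve instead of an explicit inverse.
    w = np.linalg.solve(cov, means.T).T            # (K, d)
    # b_k = -0.5 * mu_k^T Sigma^{-1} mu_k
    b = -0.5 * np.sum(w * means, axis=1)           # (K,)
    logits = x @ w.T + b                           # (n, K)
    if log_prior is not None:
        logits = logits + np.asarray(log_prior, dtype=float)
    # Numerically stable softmax over classes.
    logits = logits - logits.max(axis=1, keepdims=True)
    probs = np.exp(logits)
    return probs / probs.sum(axis=1, keepdims=True)
```

Apart from the feature extractor, the work per batch is one linear solve and one matrix product, with no iterative optimization.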
Where Pith is reading between the lines
- The same online Gaussian update pattern could be applied to other probabilistic heads such as normalizing flows or mixture models to relax the single-covariance assumption.
- Connecting the historical knowledge bank to Bayesian updating would allow explicit uncertainty quantification over the running means.
- The CLIP prior regularization suggests a broader pattern: using large-scale vision-language models as cheap, label-free anchors during test-time distribution alignment.
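The Bayesian-updating suggestion above can be made concrete with the standard conjugate update for a Gaussian mean under a known covariance. This is a textbook identity offered as a sketch; the prior strength $\kappa_{k,0}$, the prior mean $m_{k,0}$, and the use of hard class assignments are assumptions, not details taken from the paper. The update is applied whenever a sample $x_t$ is assigned to class $k$.

```latex
\begin{aligned}
\mu_k &\sim \mathcal{N}\!\big(m_{k,0},\, \Sigma / \kappa_{k,0}\big), \\
\kappa_{k,t} &= \kappa_{k,t-1} + 1, \qquad
m_{k,t} = m_{k,t-1} + \frac{x_t - m_{k,t-1}}{\kappa_{k,t}}, \\
p(x \mid y = k,\, x_{1:t}) &= \mathcal{N}\!\big(x;\, m_{k,t},\, (1 + 1/\kappa_{k,t})\,\Sigma\big).
\end{aligned}
```

The factor $(1 + 1/\kappa_{k,t})$ widens the predictive distribution for classes with few assigned samples, which is where explicit uncertainty over the running means would enter.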
Load-bearing premise
Class-conditional feature distributions in the target domain can be adequately captured by gradually updated class means and a single shared covariance matrix estimated without any labels or source data.
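A minimal sketch of what such label-free tracking might look like follows. It assumes pseudo-labels taken from the model's own predictions, an exponential moving average with a hypothetical `momentum`, an identity-initialized covariance, and a small `shrinkage` term for invertibility; none of these specifics is confirmed by the source.

```python
import numpy as np


class RunningGaussianStats:
    """Hypothetical tracker for per-class means and one shared covariance."""

    def __init__(self, init_means, momentum=0.9, shrinkage=1e-3):
        self.means = np.array(init_means, dtype=float)   # (K, d)
        self.cov = np.eye(self.means.shape[1])           # assumed identity start
        self.momentum = momentum
        self.shrinkage = shrinkage

    def update(self, x, pseudo_labels):
        """Update statistics from one unlabeled batch and its pseudo-labels."""
        x = np.asarray(x, dtype=float)
        pseudo_labels = np.asarray(pseudo_labels)
        m = self.momentum
        # Move the mean of every class present in the batch toward its batch mean.
        for k in np.unique(pseudo_labels):
            batch_mean = x[pseudo_labels == k].mean(axis=0)
            self.means[k] = m * self.means[k] + (1.0 - m) * batch_mean
        # Pooled within-class scatter around the updated means.
        centered = x - self.means[pseudo_labels]
        batch_cov = centered.T @ centered / len(x)
        self.cov = m * self.cov + (1.0 - m) * batch_cov
        # Return a shrunk copy so downstream solves stay well conditioned.
        d = self.cov.shape[0]
        return self.means, self.cov + self.shrinkage * np.eye(d)
```

Wrong pseudo-labels pull a class mean toward the wrong cluster, so the premise holds only as far as the early predictions are mostly correct.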
What would settle it
Run the method on a benchmark whose test features exhibit strong non-Gaussian structure or class-conditional covariance differences; if accuracy or calibration falls below optimization-based TTA baselines, the modeling premise fails.
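One way to probe the premise before running a full benchmark is to measure how far each class covariance departs from the pooled one on a labelled split. The sketch below does this; the function name `covariance_heterogeneity` and the relative Frobenius metric are assumptions chosen for illustration, not part of the paper.

```python
import numpy as np


def covariance_heterogeneity(x, labels):
    """Relative Frobenius gap between each class covariance and the pooled one.

    Returns a dict {class_id: gap}; a gap near 0 supports a shared covariance.
    Labels are assumed to be integer class ids from a labelled evaluation split.
    """
    x = np.asarray(x, dtype=float)
    labels = np.asarray(labels)
    centered = np.empty_like(x)
    per_class = {}
    for k in np.unique(labels):
        mask = labels == k
        # Center each class on its own mean before measuring scatter.
        centered[mask] = x[mask] - x[mask].mean(axis=0)
        per_class[int(k)] = centered[mask].T @ centered[mask] / mask.sum()
    pooled = centered.T @ centered / len(x)
    denom = np.linalg.norm(pooled)
    return {k: float(np.linalg.norm(c - pooled) / denom) for k, c in per_class.items()}
```

Large gaps for several classes would suggest that a per-class or low-rank correction is worth testing against the shared-covariance model.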
Original abstract
Test-time adaptation (TTA) enhances the zero-shot robustness under distribution shifts by leveraging unlabeled test data during inference. Despite notable advances, several challenges still limit its broader applicability. First, most methods rely on backpropagation or iterative optimization, which limits scalability and hinders real-time deployment. Second, they lack explicit modeling of class-conditional feature distributions. This modeling is crucial for producing reliable decision boundaries and calibrated predictions, but it remains underexplored due to the lack of both source data and supervision at test time. In this paper, we propose ADAPT, an Advanced Distribution-Aware and backPropagation-free Test-time adaptation method. We reframe TTA as a Gaussian probabilistic inference task by modeling class-conditional likelihoods using gradually updated class means and a shared covariance matrix. This enables closed-form, training-free inference. To correct potential likelihood bias, we introduce lightweight regularization guided by CLIP priors and a historical knowledge bank. ADAPT requires no source data, no gradient updates, and no full access to target data, supporting both online and transductive settings. Extensive experiments across diverse benchmarks demonstrate that our method achieves state-of-the-art performance under a wide range of distribution shifts with superior scalability and robustness.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes ADAPT, a backpropagation-free test-time adaptation method for improving zero-shot robustness under distribution shifts. It reframes TTA as closed-form Gaussian probabilistic inference by modeling class-conditional likelihoods with gradually updated class means and a single shared covariance matrix, both estimated from unlabeled target data without source data or gradients. Lightweight regularization via CLIP priors and a historical knowledge bank is introduced to correct likelihood bias. The method supports online and transductive settings and claims state-of-the-art performance with superior scalability across diverse benchmarks.
Significance. If the central claims hold, ADAPT would represent a meaningful advance in efficient TTA by eliminating optimization and backpropagation while providing explicit probabilistic modeling of class-conditional distributions. This could enable real-time deployment in resource-limited settings and improve calibration under shifts, provided the unsupervised mean updates and shared-covariance assumption prove robust.
major comments (3)
- [§3] §3 (Method), around the class-mean update rule: the unsupervised assignment of samples to classes for updating means relies on the model's own (potentially biased) predictions under distribution shift. This creates a risk of error accumulation that directly undermines the robustness and SOTA claims, yet no analysis or mitigation beyond the CLIP regularization is detailed to demonstrate stability from unreliable initial predictions.
- [Abstract and §4] Abstract and §4 (Experiments): the SOTA performance claim is asserted without reported error bars, ablation studies on the shared covariance assumption, or comparisons isolating the effect of the historical knowledge bank. This is load-bearing because the central contribution is the closed-form Gaussian inference under the stated assumptions.
- [§3.2] §3.2, covariance estimation: the single shared covariance matrix is estimated without labels or source data, but the paper does not address how class-specific scale differences (often amplified by shifts) are handled or why this does not degrade decision boundaries relative to per-class covariances.
minor comments (2)
- [§3] Notation for the Gaussian parameters (means and covariance) should be introduced with explicit equations early in the method section to clarify the closed-form inference steps.
- [§4] Figure captions and experimental tables would benefit from clearer indication of online vs. transductive settings and the exact benchmarks used for each result.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed feedback. We address each major comment below with clarifications on our approach and indicate planned revisions to strengthen the presentation and analysis.
- Referee: [§3] §3 (Method), around the class-mean update rule: the unsupervised assignment of samples to classes for updating means relies on the model's own (potentially biased) predictions under distribution shift. This creates a risk of error accumulation that directly undermines the robustness and SOTA claims, yet no analysis or mitigation beyond the CLIP regularization is detailed to demonstrate stability from unreliable initial predictions.
Authors: We agree that relying on the model's initial predictions for unsupervised class-mean updates introduces a risk of error accumulation under distribution shift. Our design mitigates this through gradual momentum-based updates, CLIP priors that serve as an external anchor to correct biased likelihoods, and the historical knowledge bank that accumulates and reuses more reliable statistics over time. These components are intended to limit drift even from imperfect early assignments. To make this robustness explicit, we will add a dedicated stability analysis in the revised manuscript, including plots of prediction consistency across update steps and sensitivity experiments under varying initial conditions. revision: yes
- Referee: [Abstract and §4] Abstract and §4 (Experiments): the SOTA performance claim is asserted without reported error bars, ablation studies on the shared covariance assumption, or comparisons isolating the effect of the historical knowledge bank. This is load-bearing because the central contribution is the closed-form Gaussian inference under the stated assumptions.
Authors: We acknowledge that stronger statistical reporting and component-wise ablations would better substantiate the SOTA claims and the contribution of the closed-form Gaussian inference. In the revised version we will report all main results with error bars computed over multiple random seeds. We will also add an ablation comparing the shared covariance to alternatives (such as diagonal or limited per-class estimates) and a controlled study isolating the historical knowledge bank by removing or varying its contribution. These additions will directly address the load-bearing nature of the assumptions. revision: yes
- Referee: [§3.2] §3.2, covariance estimation: the single shared covariance matrix is estimated without labels or source data, but the paper does not address how class-specific scale differences (often amplified by shifts) are handled or why this does not degrade decision boundaries relative to per-class covariances.
Authors: The shared covariance is chosen because, in the TTA regime, the number of samples per class is typically too small for stable per-class covariance estimation, which would lead to noisy or singular matrices. Pooling across classes provides a more reliable estimate of the overall feature distribution while the class means are updated individually. Although class-specific scales can differ under shifts, the combination of mean alignment and the shared covariance still produces effective probabilistic decision boundaries, as shown by our consistent outperformance of baselines. We will expand §3.2 with an explicit discussion of this design choice, its limitations, and supporting empirical evidence. revision: yes
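The small-sample argument in the last response can be checked numerically. The sketch below compares the rank of per-class covariance estimates with the rank of the pooled within-class estimate when each class has fewer samples than feature dimensions; the function name `covariance_ranks` and the synthetic dimensions used to exercise it are assumptions for illustration only.

```python
import numpy as np


def covariance_ranks(x, labels):
    """Return (largest per-class covariance rank, pooled covariance rank)."""
    x = np.asarray(x, dtype=float)
    labels = np.asarray(labels)
    centered = np.empty_like(x)
    max_class_rank = 0
    for k in np.unique(labels):
        mask = labels == k
        # Center each class on its own mean.
        centered[mask] = x[mask] - x[mask].mean(axis=0)
        class_cov = centered[mask].T @ centered[mask] / mask.sum()
        max_class_rank = max(max_class_rank, int(np.linalg.matrix_rank(class_cov)))
    # Pooling the centered samples of all classes gives the shared estimate.
    pooled = centered.T @ centered / len(x)
    return max_class_rank, int(np.linalg.matrix_rank(pooled))
```

With n samples in a class, the class covariance has rank at most n - 1, so it is singular whenever n is at most the feature dimension. The pooled estimate has rank up to N - K for N samples and K classes, which is why pooling can yield an invertible matrix where per-class estimates cannot.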
Circularity Check
Derivation chain is self-contained; no reductions to inputs by construction
The paper reframes TTA as closed-form Gaussian probabilistic inference using gradually updated class means and a shared covariance, with lightweight regularization from external CLIP priors and a historical bank. These steps rely on explicit modeling assumptions and iterative updates from unlabeled target data rather than fitting parameters to a subset and renaming the output as a prediction. No self-citations, uniqueness theorems, or ansatz smuggling are invoked to justify core choices. The derivation does not reduce to its inputs by definition; the probabilistic alignment produces new decision boundaries from the estimated distributions. This is the common honest outcome for a method whose central claim rests on modeling choices that remain falsifiable against external benchmarks.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: Class-conditional distributions in the target domain are approximately Gaussian and can be tracked via running means and a shared covariance without labels.
Forward citations
Cited by 1 Pith paper
- Multi-modal Test-time Adaptation via Adaptive Probabilistic Gaussian Calibration
  A probabilistic Gaussian model with adaptive contrastive asymmetry rectification improves multi-modal test-time adaptation by modeling category distributions and correcting modality asymmetry for better predictions un...