Dual Distribution Estimation for Zero-shot Noisy Test-Time Adaptation with VLMs
Pith reviewed 2026-06-25 21:00 UTC · model grok-4.3
The pith
Training-free Gaussian modeling of frozen VLM features raises harmonic-mean accuracy on ImageNet by 3.70 percent and cuts FPR95 for OOD detection by 6.20 percent in zero-shot noisy test-time adaptation.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
DDE shifts the zero-shot NTTA paradigm from instance-level learning to training-free Gaussian distribution modeling. PFDE explicitly models class-wise inclusion and exclusion Gaussian distributions from test-batch features to formulate a calibrated contrastive score that robustly enhances ID accuracy. NLDE improves OOD identification by explicitly modeling the negative label distribution to mine highly discriminative labels and mitigate spurious correlations. On the large-scale ImageNet benchmark this yields a 3.70 percent improvement in harmonic mean accuracy and a 6.20 percent reduction in FPR95 for OOD detection while ensuring highly scalable and efficient online inference.
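The inclusion/exclusion idea can be made concrete with a minimal sketch. This is not the paper's implementation: it assumes pseudo-labels from zero-shot logits are already available, uses diagonal covariances, and takes the contrastive score to be the log-likelihood under a class's inclusion Gaussian minus the log-likelihood under its exclusion Gaussian. The function names, the variance floor `eps`, and the exact form of the score are hypothetical choices made for illustration.

```python
import math

def fit_diag_gaussian(rows, eps=1e-3):
    """Per-dimension mean and variance; eps is an assumed variance floor."""
    n, d = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    var = [sum((r[j] - mean[j]) ** 2 for r in rows) / n + eps for j in range(d)]
    return mean, var

def log_pdf(x, mean, var):
    """Log-density of a diagonal Gaussian evaluated at x."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))

def contrastive_scores(features, pseudo_labels, num_classes):
    """score[i][c] = log p_incl_c(x_i) - log p_excl_c(x_i) (assumed form).

    Requires every class to have at least one assigned and one
    unassigned sample in the batch; this sketch does not handle
    empty classes.
    """
    models = []
    for c in range(num_classes):
        incl = [f for f, y in zip(features, pseudo_labels) if y == c]
        excl = [f for f, y in zip(features, pseudo_labels) if y != c]
        models.append((fit_diag_gaussian(incl), fit_diag_gaussian(excl)))
    return [[log_pdf(x, *inc) - log_pdf(x, *exc) for inc, exc in models]
            for x in features]

# Toy 2-D "features" invented for illustration: two separated clusters.
features = [[0.0, 0.1], [0.2, -0.1], [-0.1, 0.0],
            [3.0, 3.1], [2.9, 3.0], [3.2, 2.8]]
pseudo_labels = [0, 0, 0, 1, 1, 1]
scores = contrastive_scores(features, pseudo_labels, num_classes=2)
predictions = [max(range(2), key=lambda c: s[c]) for s in scores]
```

On well-separated toy clusters the score simply recovers the pseudo-labels. The interesting regime, which this sketch does not cover, is when OOD samples contaminate the batch and the exclusion distribution has to absorb them.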
What carries the argument
Dual Distribution Estimation (DDE) via Positive Feature Distribution Estimation (PFDE) and Negative Label Distribution Estimation (NLDE), which fit class-wise Gaussians to frozen VLM test-batch features to produce contrastive inclusion/exclusion scores and mined negative labels.
If this is right
- Enables highly scalable and efficient online inference without retraining or post-hoc tuning.
- Maintains robustness in data-scarce scenarios while remaining zero-shot.
- Simultaneously improves in-distribution classification accuracy and out-of-distribution detection.
- Avoids overconfident misclassifications that arise from test-time discriminative training.
Where Pith is reading between the lines
- If the Gaussian modeling holds, the same dual-estimation structure could replace instance-level training loops in other online adaptation settings.
- A testable extension is to measure how performance changes when test-batch size shrinks below the point where reliable Gaussian fits become possible.
- The approach implies that single-batch feature statistics alone suffice for reliable ID/OOD separation, which would simplify deployment in streaming environments where labeled data never arrives.
- If the contrastive scores prove stable across domains, similar distribution estimation could be explored for non-vision modalities that also rely on frozen encoders.
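The batch-size extension above can be previewed with a small simulation. The sketch below uses synthetic one-dimensional draws from a standard normal, not VLM features, and the function name, trial count, and batch sizes are assumptions chosen for illustration. It only shows the general statistical effect that Gaussian parameter estimates degrade as the number of samples shrinks.

```python
import random
import statistics

def gaussian_fit_errors(batch_size, trials=2000, seed=0):
    """Average absolute error of the fitted mean and variance when a
    Gaussian is estimated from `batch_size` draws of N(0, 1)."""
    rng = random.Random(seed)
    mean_err, var_err = 0.0, 0.0
    for _ in range(trials):
        xs = [rng.gauss(0.0, 1.0) for _ in range(batch_size)]
        mean_err += abs(statistics.fmean(xs))            # true mean is 0
        var_err += abs(statistics.variance(xs) - 1.0)    # true variance is 1
    return mean_err / trials, var_err / trials

errors = {n: gaussian_fit_errors(n) for n in (2, 8, 64)}
for n, (m_err, v_err) in errors.items():
    print(f"batch={n:3d}  mean error={m_err:.3f}  variance error={v_err:.3f}")
```

In high-dimensional feature spaces the effect is stronger than in this one-dimensional toy, because a full sample covariance estimated from fewer samples than dimensions is singular.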
Load-bearing premise
Test-batch features from a frozen VLM can be reliably modeled as class-wise Gaussians whose inclusion/exclusion contrastive scores separate ID from OOD samples without any labeled supervision or post-hoc tuning.
What would settle it
Apply DDE to ImageNet with controlled mixtures of ID and OOD samples and check whether harmonic-mean accuracy gains of 3.70 percent and FPR95 reductions of 6.20 percent disappear relative to prior zero-shot NTTA baselines.
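For such a check, the two headline metrics could be computed along the following lines. This is a sketch under stated assumptions: the harmonic mean is taken over two accuracies (for example ID classification accuracy and OOD detection accuracy; the summary above does not spell out which two quantities the paper combines), and FPR95 follows the common convention of measuring the OOD false positive rate at the threshold that retains 95% of ID samples. Function names and the toy scores are hypothetical.

```python
import math

def harmonic_mean(a, b):
    """Harmonic mean of two accuracies in [0, 1]."""
    return 0.0 if a + b == 0 else 2 * a * b / (a + b)

def fpr_at_95_tpr(id_scores, ood_scores):
    """Fraction of OOD samples scoring at or above the threshold that
    keeps at least 95% of ID samples (higher score = more ID-like)."""
    ranked = sorted(id_scores, reverse=True)
    k = math.ceil(0.95 * len(ranked))   # number of ID samples to retain
    threshold = ranked[k - 1]
    return sum(s >= threshold for s in ood_scores) / len(ood_scores)

# Toy scores invented for illustration only; not results from the paper.
id_scores = [0.90, 0.80, 0.85, 0.70, 0.95, 0.60, 0.75, 0.88, 0.92, 0.65]
ood_scores = [0.10, 0.30, 0.62, 0.50, 0.20]
fpr95 = fpr_at_95_tpr(id_scores, ood_scores)
acc_h = harmonic_mean(0.5, 1.0)
```

With only ten ID scores, retaining 95% means retaining all ten, so the threshold is the lowest ID score; realistic evaluations use far larger sets.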
Original abstract
While test-time adaptation (TTA) empowers vision-language models to adapt without costly retraining, it remains highly vulnerable to out-of-distribution (OOD) outliers prevalent in real-world applications. This discrepancy motivates Noisy TTA (NTTA), an online task to filter noisy OOD samples on the fly while maximizing in-distribution (ID) classification accuracy. Existing zero-shot NTTA approaches typically rely on test-time discriminative training, leading to overconfident misclassifications and significantly degraded inference efficiency. To address these limitations, we propose a novel framework named Dual Distribution Estimation (DDE), shifting the zero-shot NTTA paradigm from instance-level learning to training-free Gaussian distribution modeling. DDE incorporates two novel modules: Positive Feature Distribution Estimation (PFDE) and Negative Label Distribution Estimation (NLDE). PFDE explicitly models class-wise inclusion and exclusion Gaussian distributions to formulate a calibrated contrastive score, robustly enhancing ID accuracy. In parallel, NLDE improves OOD identification by explicitly modeling the negative label distribution to mine highly discriminative labels, effectively mitigating spurious correlations. Extensive experiments show that on the large-scale ImageNet benchmark, DDE achieves an improvement of 3.70% in harmonic mean accuracy and reduces the FPR95 for OOD detection by 6.20%, while ensuring highly scalable and efficient online inference. Furthermore, DDE is zero-shot and training-free, demonstrating remarkable robustness in data-scarce scenarios. Codes are available at https://github.com/ZhuWenjie98/DDE.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes Dual Distribution Estimation (DDE) for zero-shot noisy test-time adaptation (NTTA) with vision-language models. It introduces Positive Feature Distribution Estimation (PFDE) to explicitly model class-wise inclusion/exclusion Gaussian distributions from frozen VLM test-batch features for a calibrated contrastive score, and Negative Label Distribution Estimation (NLDE) to model negative label distributions for mining discriminative labels and mitigating spurious correlations. The central empirical claim is a 3.70% gain in harmonic mean accuracy and 6.20% reduction in FPR95 on ImageNet, with the method positioned as training-free, post-hoc-tuning-free, and scalable for online inference.
Significance. If the Gaussian modeling assumptions hold without supervision, the shift from instance-level discriminative training to explicit dual distribution estimation could meaningfully improve robustness and efficiency in real-world NTTA settings, particularly for data-scarce or OOD-contaminated batches. The training-free nature and reported scalability are notable strengths if supported by reproducible code and ablations.
major comments (2)
- The PFDE module's reliance on modeling test-batch features as class-wise Gaussians (whose inclusion/exclusion contrastive scores drive ID accuracy) rests on unsupervised assignment of samples to classes; this creates a potential feedback loop when initial zero-shot logits misassign OOD-contaminated samples, directly undermining the claimed 3.70% harmonic mean gain. No derivation details, sensitivity analysis, or robustness checks against this circularity are evident from the abstract.
- The abstract reports quantitative gains (3.70% harmonic mean, 6.20% FPR95) but provides no error bars, statistical significance tests, ablation evidence on PFDE/NLDE components, or full experimental protocol, making it impossible to verify whether the improvements are load-bearing or sensitive to post-hoc choices.
minor comments (1)
- The abstract should explicitly name the VLM backbone, test-batch sizes, and any clustering or logit-thresholding steps used for initial class assignment in PFDE/NLDE.
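The feedback-loop concern in the first major comment can be illustrated with a toy experiment. The sketch below is an assumption-laden stand-in, not the paper's protocol: it uses two synthetic one-dimensional classes, flips a fraction of pseudo-labels between them, and measures how far the class-0 mean estimated from noisy labels drifts from the clean estimate. Real contamination would come from OOD samples rather than label flips between ID classes, and the function name and parameters are hypothetical.

```python
import random
import statistics

def class_mean_shift(noise_rate, n_per_class=500, seed=0):
    """Absolute drift of the class-0 mean when a fraction of
    pseudo-labels is flipped between two synthetic classes."""
    rng = random.Random(seed)
    xs = ([rng.gauss(0.0, 1.0) for _ in range(n_per_class)] +
          [rng.gauss(4.0, 1.0) for _ in range(n_per_class)])
    clean = [0] * n_per_class + [1] * n_per_class
    # Flip each label independently with probability noise_rate.
    noisy = [1 - y if rng.random() < noise_rate else y for y in clean]
    clean_mean = statistics.fmean([x for x, y in zip(xs, clean) if y == 0])
    noisy_mean = statistics.fmean([x for x, y in zip(xs, noisy) if y == 0])
    return abs(noisy_mean - clean_mean)

shifts = {rate: class_mean_shift(rate) for rate in (0.0, 0.1, 0.3)}
for rate, shift in shifts.items():
    print(f"pseudo-label noise={rate:.1f}  class-0 mean shift={shift:.3f}")
```

The drift grows with the noise rate, which is the mechanism behind the referee's worry: a Gaussian fitted on misassigned samples moves toward the contaminating cluster, and any score derived from it inherits that bias.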
Simulated Author's Rebuttal
We thank the referee for the thoughtful review and constructive feedback on our work. We address each major comment below with clarifications from the manuscript and commitments to revisions where appropriate.
Point-by-point responses
- Referee: The PFDE module's reliance on modeling test-batch features as class-wise Gaussians (whose inclusion/exclusion contrastive scores drive ID accuracy) rests on unsupervised assignment of samples to classes; this creates a potential feedback loop when initial zero-shot logits misassign OOD-contaminated samples, directly undermining the claimed 3.70% harmonic mean gain. No derivation details, sensitivity analysis, or robustness checks against this circularity are evident from the abstract.
  Authors: The manuscript details in Section 3.2 that PFDE fits class-wise Gaussians directly on the frozen VLM features of the test batch after initial zero-shot logit-based pseudo-labeling, then derives a calibrated contrastive score from the inclusion/exclusion distributions. This is not a closed feedback loop because the Gaussian parameters are estimated once per batch in a training-free manner and the contrastive score explicitly down-weights outliers via the exclusion component; NLDE further mitigates spurious assignments by mining discriminative negative labels. Derivation of the calibrated score appears in Equations (3)–(5). We agree that sensitivity analysis to initial assignment errors is not present and will add it (varying pseudo-label noise levels on ImageNet) in the revision.
  Revision: partial
- Referee: The abstract reports quantitative gains (3.70% harmonic mean, 6.20% FPR95) but provides no error bars, statistical significance tests, ablation evidence on PFDE/NLDE components, or full experimental protocol, making it impossible to verify whether the improvements are load-bearing or sensitive to post-hoc choices.
  Authors: The full manuscript provides the experimental protocol in Section 4.1, component ablations in Section 4.3 (showing PFDE and NLDE each contribute to the harmonic-mean gain), and results on multiple datasets beyond the abstract. However, the abstract itself omits error bars and significance tests. We will add per-run standard deviations, paired t-test results, and expanded ablation tables to both the abstract and main results in the revised version.
  Revision: yes
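The per-run standard deviations and paired t-test promised in the second response could be computed as in the sketch below. It assumes the method and the baseline are run on the same seeds or splits so that runs pair up one-to-one. The numbers are placeholders invented for illustration and are not results from the paper; the function name is hypothetical. Only the t statistic and degrees of freedom are computed, and the p-value would come from a t-distribution table or a statistics library.

```python
import math
import statistics

def paired_t_statistic(a, b):
    """t statistic and degrees of freedom for paired runs a and b.

    Undefined when all paired differences are identical
    (zero standard deviation).
    """
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    standard_error = statistics.stdev(diffs) / math.sqrt(n)
    return statistics.fmean(diffs) / standard_error, n - 1

# Placeholder accuracies for illustration only; NOT results from the paper.
method_runs = [71.0, 72.0, 73.0, 72.0, 72.0]
baseline_runs = [70.0, 70.0, 72.0, 70.0, 71.0]

t_stat, dof = paired_t_statistic(method_runs, baseline_runs)
method_std = statistics.stdev(method_runs)
baseline_std = statistics.stdev(baseline_runs)
```

Pairing by seed matters here: it removes run-to-run variance shared by both methods, so a consistent small gain can be significant even when the unpaired standard deviations overlap.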
Circularity Check
No significant circularity; derivation is self-contained distribution estimation
full rationale
The paper's core contribution is explicit Gaussian modeling (PFDE for class-wise inclusion/exclusion and NLDE for negative labels) directly from unlabeled test-batch VLM features to produce contrastive scores. This is the method itself rather than any prediction or result that reduces by construction to fitted inputs or prior self-citations. No equations or steps in the abstract or description exhibit self-definitional loops, fitted parameters renamed as predictions, or load-bearing self-citations. The approach is presented as training-free and zero-shot, with performance claims tied to empirical benchmarks rather than internal tautologies. This matches the default case of an honest non-finding.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption: Test-batch features from a frozen VLM follow class-conditional Gaussian distributions that can be estimated without labels.
- domain assumption: Negative label distributions mined from the test batch reduce spurious correlations for OOD detection.
invented entities (2)
- Positive Feature Distribution Estimation (PFDE): no independent evidence
- Negative Label Distribution Estimation (NLDE): no independent evidence