EGA: Adapting Frozen Encoders for Vector Search with Bounded Out-of-Distribution Degradation
Pith reviewed 2026-05-11 00:43 UTC · model grok-4.3
The pith
The EGA adapter uses self-limiting updates to refine seen classes while leaving unseen ones largely untouched in frozen-encoder vector search.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
EGA is a residual adapter whose three coupled principles induce a self-limiting dynamic: triplets that already satisfy a small margin stop producing gradients, so the adapter automatically stops updating where the local geometry is already correct, leaving unseen-class regions largely untouched while enabling full-capacity refinement of seen classes.
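The gradient-free condition in this claim can be written out explicitly. The loss form below follows the triplet-loss passage quoted further down this page; the zero-gradient statement is the standard property of a hinge, and the definition of the active triplet ratio $\rho(t)$ as the fraction of triplets with nonzero loss over a triplet set $\mathcal{T}$ is an assumption made here for illustration, not the paper's stated definition.

```latex
\ell(a,p,n) = \max\bigl(0,\; d(\hat z_a,\hat z_p) - d(\hat z_a,\hat z_n) + m\bigr)

\nabla_\theta\, \ell(a,p,n) = 0
\quad\text{whenever}\quad
d(\hat z_a,\hat z_n) - d(\hat z_a,\hat z_p) > m

\rho(t) = \frac{1}{|\mathcal{T}|} \sum_{(a,p,n)\in\mathcal{T}} \mathbf{1}\bigl[\ell(a,p,n) > 0\bigr]
```

Under this reading, 96.5% gradient-free triplets at convergence would correspond to $\rho \approx 0.035$.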
What carries the argument
The self-limiting dynamic produced by coupling zero initialization, local triplet loss, and hypersphere projection in Euclidean Geodesic Alignment, which drives gradient sparsity on satisfied triplets.
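A minimal sketch of how zero initialization and hypersphere projection interact, assuming a two-layer residual branch with a ReLU. The page does not give the adapter's architecture, so the class name `ResidualAdapter`, the weights `W_in` and `W_out`, and the layer sizes are hypothetical. What the sketch shows is that with a zero-initialized output layer, the adapted embedding equals the frozen embedding at step 0.

```python
import math
import random

def l2_normalize(v):
    """Project a vector onto the unit hypersphere."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def matvec(W, v):
    """Multiply a row-major matrix by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

class ResidualAdapter:
    """Hypothetical two-layer residual adapter; sizes are illustrative only."""

    def __init__(self, dim, hidden, seed=0):
        rng = random.Random(seed)
        self.W_in = [[rng.gauss(0.0, 0.1) for _ in range(dim)] for _ in range(hidden)]
        # Zero-initialized output layer: the residual branch contributes nothing at step 0.
        self.W_out = [[0.0] * hidden for _ in range(dim)]

    def __call__(self, z):
        h = [max(0.0, x) for x in matvec(self.W_in, z)]  # ReLU hidden layer
        r = matvec(self.W_out, h)                        # residual, all zeros at init
        return l2_normalize([zi + ri for zi, ri in zip(z, r)])

frozen = l2_normalize([0.3, -1.2, 0.5, 2.0])  # stand-in for a frozen-encoder embedding
adapter = ResidualAdapter(dim=4, hidden=8)
adapted = adapter(frozen)
```

Because training starts from the frozen geometry, any later movement of an embedding has to come from a triplet that still violates the margin.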
If this is right
- Worst-case Label Precision is highest on the four primary OOD splits and shows a consistent improvement on the fifth benchmark.
- At convergence 96.5 percent of triplets produce no gradient, so unseen-class regions remain largely untouched.
- The same bounded-degradation pattern holds when stronger backbones replace the original CLIP encoder.
- Gradient sparsity supplies an analytical justification for bounded OOD perturbation.
Where Pith is reading between the lines
- Local triplet objectives may be systematically preferable to global contrastive losses whenever preservation of certain geometric regions is required.
- The same self-limiting mechanism could be ported to text or multimodal encoders facing class-shift at deployment.
- Production vector-search pipelines could adopt EGA to reduce the risk of silent accuracy drops on novel queries.
- The gradient-sparsity statistic itself could serve as a lightweight monitor for OOD stability during adapter training.
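The monitoring idea in the last bullet can be sketched as follows. This assumes Euclidean distance and the hinge form quoted further down this page; the function names and the toy 2-D points (left unnormalized for readability) are illustrative and not taken from the paper.

```python
import math

def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def triplet_hinge(anchor, positive, negative, margin):
    """Hinge triplet loss: zero once the negative is farther than the positive by the margin."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)

def gradient_free_fraction(triplets, margin):
    """Fraction of (anchor, positive, negative) triplets whose hinge loss is exactly zero."""
    if not triplets:
        raise ValueError("need at least one triplet")
    inactive = sum(1 for a, p, n in triplets if triplet_hinge(a, p, n, margin) == 0.0)
    return inactive / len(triplets)

triplets = [
    ([1.0, 0.0], [0.9, 0.1], [-1.0, 0.0]),  # satisfied: negative is far away
    ([1.0, 0.0], [0.0, 1.0], [0.8, 0.2]),   # violated: negative is closer than positive
    ([0.0, 1.0], [0.1, 0.9], [0.0, -1.0]),  # satisfied
    ([0.0, 1.0], [0.0, 1.0], [1.0, 0.0]),   # satisfied
]
fraction = gradient_free_fraction(triplets, margin=0.1)  # 3 of 4 triplets are inactive
```

Logging this fraction once per epoch would be one way to watch for the plateau the paper reports; the margin value 0.1 here is a placeholder, since the page does not state the one used.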
Load-bearing premise
The three design principles together create a self-limiting update rule that automatically halts where local geometry is already correct, and this rule generalizes across OOD benchmarks and stronger backbones.
What would settle it
A new OOD benchmark in which EGA's worst-case label precision falls below the frozen baseline, or in which fewer than 90 percent of triplets are gradient-free at convergence, would falsify the central claim.
Original abstract
Vector search systems built on frozen vision encoders face queries from unseen classes at deployment, yet existing adapter training collapses under this shift: high-capacity adapters with global contrastive losses silently reassign unseen-class samples to wrong seen-class clusters, dropping worst-case Label Precision by over 40 points below the frozen baseline in our tests. We propose Euclidean Geodesic Alignment (EGA), a residual adapter that couples three principles: zero initialization, local triplet loss, and hypersphere projection. These collectively induce a self-limiting dynamic: triplets that already satisfy a small margin stop producing gradients, so the adapter automatically stops updating where the local geometry is already correct. Our experiments show that at convergence $96.5\%$ of triplets are gradient-free, leaving unseen-class regions largely untouched while still enabling full-capacity refinement of seen classes. Across five diverse out-of-distribution (OOD) benchmarks, EGA achieves the highest worst-case Label Precision on the four primary splits and a consistent improvement on the fifth. The design also transfers to stronger backbones in addition to CLIP, and we provide an analytical justification linking gradient sparsity to bounded OOD perturbation.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes Euclidean Geodesic Alignment (EGA), a residual adapter for frozen vision encoders that combines zero initialization, local triplet loss, and hypersphere projection. This design is claimed to induce a self-limiting dynamic in which triplets satisfying a small margin produce no gradients, resulting in 96.5% gradient-free triplets at convergence and thereby bounding out-of-distribution degradation on unseen classes. Experiments across five OOD benchmarks report that EGA attains the highest worst-case Label Precision on four primary splits (with consistent improvement on the fifth), transfers to stronger backbones, and is supported by an analytical justification linking gradient sparsity to bounded OOD perturbation.
Significance. If the analytical justification and empirical robustness hold, the result would be significant for practical vector-search systems that must adapt frozen encoders without access to deployment-time classes. The explicit attempt to derive a bound from the self-limiting property, the high reported gradient sparsity, and the transfer experiments constitute concrete strengths that distinguish the work from standard adapter training.
major comments (3)
- [Section 3.3 (analytical justification)] The central analytical justification (Section 3.3) links gradient sparsity measured on seen-class triplets to bounded perturbation of unseen-class regions, yet the adapter is a single global residual function. Because updates are driven exclusively by seen triplets, it remains to be shown that sparsity on the training distribution precludes non-local shifts in the embedding geometry of unseen classes; a concrete bound or invariance argument addressing this global coupling is required.
- [Table 2 and Section 4.2] Table 2 and the associated experimental protocol report highest worst-case Label Precision, but the 96.5% gradient-free statistic is computed only on training (seen-class) triplets. An auxiliary measurement or bound demonstrating that the same sparsity pattern holds, or at least does not increase perturbation, for held-out unseen-class samples would directly support the OOD claim.
- [Section 4.3 (ablations)] The three design principles are asserted to act collectively, yet no ablation isolates the contribution of hypersphere projection versus local triplet loss alone to the observed gradient sparsity and OOD preservation. Removing one principle at a time while keeping the others fixed would clarify whether the self-limiting dynamic is robust or depends on a specific combination.
minor comments (2)
- [Section 3.1] The definition of the triplet margin and its interaction with the zero-initialization scheme could be stated more explicitly with a single equation in the method section to avoid ambiguity when readers reproduce the gradient-free condition.
- [Figure 3] Figure 3 (embedding visualizations) would benefit from an additional panel showing the change in nearest-neighbor assignments for a few unseen-class queries before and after adaptation.
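The nearest-neighbor comparison suggested for Figure 3 could be computed along these lines. This is a brute-force sketch under the assumption that both queries and index vectors are re-embedded after adaptation; the function names and toy vectors are hypothetical.

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def top1(query, index):
    """Position of the index vector most similar to the query (brute force)."""
    return max(range(len(index)), key=lambda i: cosine(query, index[i]))

def changed_assignments(queries_before, queries_after, index_before, index_after):
    """Indices of queries whose top-1 neighbor differs before and after adaptation."""
    return [
        i
        for i, (qb, qa) in enumerate(zip(queries_before, queries_after))
        if top1(qb, index_before) != top1(qa, index_after)
    ]

index_before = [[1.0, 0.0], [0.0, 1.0]]
index_after = [[1.0, 0.0], [0.0, 1.0]]
queries_before = [[0.9, 0.1], [0.2, 0.8]]
queries_after = [[0.9, 0.1], [0.7, 0.3]]  # the second query drifts toward index item 0
changed = changed_assignments(queries_before, queries_after, index_before, index_after)
```

Restricting the queries to unseen-class samples would give the silent-reassignment count that the referee's suggested panel is meant to visualize.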
Simulated Author's Rebuttal
We thank the referee for the constructive comments and positive assessment of the work's potential significance. We address each major comment below and describe the revisions we will incorporate.
Point-by-point responses
- Referee: [Section 3.3 (analytical justification)] The central analytical justification (Section 3.3) links gradient sparsity measured on seen-class triplets to bounded perturbation of unseen-class regions, yet the adapter is a single global residual function. Because updates are driven exclusively by seen triplets, it remains to be shown that sparsity on the training distribution precludes non-local shifts in the embedding geometry of unseen classes; a concrete bound or invariance argument addressing this global coupling is required.
  Authors: We agree that the current analysis in Section 3.3 would benefit from an explicit treatment of global coupling. The existing derivation shows that the self-limiting triplet loss combined with zero initialization and hypersphere projection bounds the adapter's output norm, thereby limiting perturbation magnitude. To address the referee's concern, we will revise Section 3.3 to include an additional invariance argument: because the residual adapter is applied uniformly and the projection constrains all embeddings to the unit hypersphere, any local update on seen-class support induces only a Lipschitz-bounded shift that cannot arbitrarily rearrange distant unseen-class regions. A formal statement and short proof sketch will be added.
  Revision: yes
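One elementary piece of such an argument can be worked out directly. The bound below is an illustration, not the proof the authors promise: it assumes a unit-norm frozen embedding $z$, writes $r$ for the adapter's residual output at $z$ (notation introduced here), and shows that after projection the embedding moves by at most twice the residual norm.

```latex
\text{Let } \|z\| = 1,\quad z + r \neq 0,\quad \hat z = \frac{z + r}{\|z + r\|}. \text{ Then}

\|\hat z - z\|
\;\le\; \|\hat z - (z + r)\| + \|r\|
\;=\; \bigl|\,1 - \|z + r\|\,\bigr| + \|r\|
\;\le\; 2\,\|r\|,

\text{since } \bigl|\,1 - \|z + r\|\,\bigr| = \bigl|\,\|z\| - \|z + r\|\,\bigr| \le \|r\|.
```

This controls the shift at a point only in terms of the residual at that same point. It does not by itself show that the residual stays small on unseen-class inputs, which is the global-coupling question the referee raises.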
- Referee: [Table 2 and Section 4.2] Table 2 and the associated experimental protocol report highest worst-case Label Precision, but the 96.5% gradient-free statistic is computed only on training (seen-class) triplets. An auxiliary measurement or bound demonstrating that the same sparsity pattern holds, or at least does not increase perturbation, for held-out unseen-class samples would directly support the OOD claim.
  Authors: The 96.5% gradient-free statistic characterizes the training dynamics on seen-class triplets. To strengthen the link to OOD preservation, we will add an auxiliary measurement in the revised Section 4.2: we will evaluate the fraction of gradient-free triplets (using the same margin) on held-out samples drawn from the unseen classes in the OOD benchmarks. This will be reported alongside the existing worst-case Label Precision results to show that sparsity does not degrade for unseen regions.
  Revision: yes
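Alongside the sparsity fraction the authors propose, the referee's alternative ("does not increase perturbation") could be measured directly as embedding drift on held-out samples. The sketch below is a complementary measurement suggested here, not one described in the paper; the function names and toy vectors are hypothetical.

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def embedding_drift(frozen, adapted):
    """Per-sample drift, defined here as 1 - cosine similarity between frozen and adapted embeddings."""
    return [1.0 - cosine(f, a) for f, a in zip(frozen, adapted)]

def summarize(drift):
    """Mean and worst-case drift over a set of samples."""
    return {"mean": sum(drift) / len(drift), "max": max(drift)}

# Toy stand-ins for held-out unseen-class embeddings before and after adaptation.
frozen_unseen = [[1.0, 0.0], [0.0, 1.0]]
adapted_unseen = [[1.0, 0.0], [0.6, 0.8]]
summary = summarize(embedding_drift(frozen_unseen, adapted_unseen))
```

Reporting the maximum as well as the mean matches the paper's emphasis on worst-case behavior.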
- Referee: [Section 4.3 (ablations)] The three design principles are asserted to act collectively, yet no ablation isolates the contribution of hypersphere projection versus local triplet loss alone to the observed gradient sparsity and OOD preservation. Removing one principle at a time while keeping the others fixed would clarify whether the self-limiting dynamic is robust or depends on a specific combination.
  Authors: We concur that isolating the hypersphere projection from the local triplet loss would improve clarity. We will expand Section 4.3 with two new ablation configurations while retaining zero initialization: (i) local triplet loss without hypersphere projection, and (ii) hypersphere projection paired with a global contrastive loss in place of the local triplet loss. Both will report gradient sparsity at convergence and worst-case Label Precision on the OOD benchmarks, thereby quantifying the contribution of each principle to the self-limiting behavior.
  Revision: yes
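The two proposed ablations, plus the full method as a reference point, can be laid out as a small configuration grid. The field names and configuration labels below are hypothetical; only the three combinations themselves come from the authors' response.

```python
from collections import namedtuple

# Hypothetical config record; field names are illustrative, not from the paper.
AblationConfig = namedtuple(
    "AblationConfig", ["name", "zero_init", "loss", "hypersphere_projection"]
)

ABLATIONS = [
    AblationConfig("full_ega", True, "local_triplet", True),
    AblationConfig("no_projection", True, "local_triplet", False),          # ablation (i)
    AblationConfig("global_contrastive", True, "global_contrastive", True), # ablation (ii)
]

def differs_in_one_principle(base, other):
    """True when exactly one of the three design principles differs from the base config."""
    changed = sum([
        base.zero_init != other.zero_init,
        base.loss != other.loss,
        base.hypersphere_projection != other.hypersphere_projection,
    ])
    return changed == 1

one_at_a_time = all(differs_in_one_principle(ABLATIONS[0], c) for c in ABLATIONS[1:])
```

Note that this grid never switches off zero initialization, so it isolates two of the three principles; a fully one-at-a-time design, as the referee requests, would need a third ablation.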
Circularity Check
No significant circularity; derivation remains self-contained
The paper's core derivation rests on three explicitly stated design principles (zero initialization, local triplet loss, hypersphere projection) whose interaction is described as producing gradient sparsity on satisfied margins. This mechanism is a direct, standard consequence of the triplet loss formulation rather than a redefinition of the target OOD bound. The 96.5% gradient-free statistic is an empirical measurement on seen-class training triplets, not a fitted parameter renamed as a prediction. The link from sparsity to bounded OOD perturbation is asserted via a separate analytical justification whose content is not shown to collapse into the input definitions or prior self-citations. No load-bearing self-citation, uniqueness theorem, or ansatz smuggling appears in the provided text, so the central claim retains independent content beyond its own assumptions.
Axiom & Free-Parameter Ledger
free parameters (1)
- triplet margin
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Paper passage: triplet loss ... max(0, d(ẑ_a,ẑ_p)-d(ẑ_a,ẑ_n)+m) ... active triplet ratio ρ(t) ... 96.5% of triplets are gradient-free
- IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean · reality_from_one_distinction · unclear
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Paper passage: zero-initialized residual ... ℓ2 projection onto the unit hypersphere
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
A Theoretical Analysis of EGA’s OOD Stability
We provide a theoretical analysis to explain why the proposed EGA adapter limits the perturbation of unseen-class geometry, in contrast to global contrastive methods. The analysis is structured in five steps, proceedi...