EGA: Adapting Frozen Encoders for Vector Search with Bounded Out-of-Distribution Degradation

Dongfang Zhao

arxiv: 2605.05674 · v2 · submitted 2026-05-07 · 💻 cs.CV · cs.AI· cs.LG

EGA: Adapting Frozen Encoders for Vector Search with Bounded Out-of-Distribution Degradation

Dongfang Zhao This is my paper

Pith reviewed 2026-05-11 00:43 UTC · model grok-4.3

classification 💻 cs.CV cs.AIcs.LG

keywords Euclidean Geodesic Alignmentfrozen encodersout-of-distribution adaptationvector searchadapter trainingtriplet losslabel precisiongradient sparsity

0 comments

The pith

EGA adapter uses self-limiting updates to refine seen classes without harming unseen ones in frozen-encoder vector search.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Frozen vision encoders for vector search encounter queries from unseen classes at deployment, but standard adapter training with global losses reassigns those samples to wrong clusters and drops worst-case label precision by more than 40 points. EGA couples zero initialization, local triplet loss, and hypersphere projection so that triplets already satisfying a small margin produce no gradient; the adapter therefore stops changing geometry that is already correct. At convergence 96.5 percent of triplets are gradient-free, leaving unseen-class regions largely untouched while still allowing full refinement of seen classes. Across five diverse OOD benchmarks EGA records the highest worst-case label precision on four splits and a steady gain on the fifth, and the same pattern holds when stronger backbones replace CLIP.

Core claim

EGA is a residual adapter whose three coupled principles induce a self-limiting dynamic: triplets that already satisfy a small margin stop producing gradients, so the adapter automatically stops updating where the local geometry is already correct, leaving unseen-class regions largely untouched while enabling full-capacity refinement of seen classes.

What carries the argument

The self-limiting dynamic produced by coupling zero initialization, local triplet loss, and hypersphere projection in Euclidean Geodesic Alignment, which drives gradient sparsity on satisfied triplets.

If this is right

Worst-case label precision is highest on four of five primary OOD splits and improves on the fifth.
At convergence 96.5 percent of triplets produce no gradient, so unseen-class regions remain largely untouched.
The same bounded-degradation pattern holds when stronger backbones replace the original CLIP encoder.
Gradient sparsity supplies an analytical justification for bounded OOD perturbation.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Local triplet objectives may be systematically preferable to global contrastive losses whenever preservation of certain geometric regions is required.
The same self-limiting mechanism could be ported to text or multimodal encoders facing class-shift at deployment.
Production vector-search pipelines could adopt EGA to reduce the risk of silent accuracy drops on novel queries.
The gradient-sparsity statistic itself could serve as a lightweight monitor for OOD stability during adapter training.

Load-bearing premise

The three design principles together create a self-limiting update rule that automatically halts where local geometry is already correct and that this rule generalizes across OOD benchmarks and stronger backbones.

What would settle it

A new OOD benchmark in which EGA's worst-case label precision falls below the frozen baseline, or in which fewer than 90 percent of triplets are gradient-free at convergence, would falsify the central claim.

Figures

Figures reproduced from arXiv: 2605.05674 by Dongfang Zhao.

**Figure 1.** Figure 1: Label Precision@K on CIFAR-100 across K ∈ {1, 3, 5, 10} and nprobe ∈ {1, 5, 10}. 0.0 0.5 1.0 1.5 2.0 0 2 4 6 Density Latent Manifold (K = 1) Top 1 Neighbors Background 0.0 0.5 1.0 1.5 2.0 0 2 4 6 Latent Manifold (K = 2) Top 2 Neighbors Background 0.0 0.5 1.0 1.5 2.0 0 2 4 6 Latent Manifold (K = 4) Top 4 Neighbors Background 0.0 0.5 1.0 1.5 2.0 0 2 4 6 Latent Manifold (K = 8) Top 8 Neighbors Background 0.0 … view at source ↗

**Figure 2.** Figure 2: Distance distributions for Top-K neighbors vs. random background points on CIFAR-100. 4 Evaluation Experiments were conducted on Chameleon Cloud [17] with 256 AMD EPYC-7763 CPU cores, 512 GiB RAM, and an NVIDIA A100 GPU of 80 GB HBM2e. Datasets include CIFAR-100 and ImageNet-1K for in-distribution, and CIFAR-10, FGVC-Aircraft, Food-101, ImageNet-1K held-out classes, and Oxford-IIIT Pet for OOD evaluation. … view at source ↗

**Figure 3.** Figure 3: ANNS Recall@K versus nprobe on CIFAR-100. two distributions overlap substantially, which is the condition under which IVF-based ANN indexing degrades. After EGA, the two distributions separate cleanly across all K, and this separation drives the recall-efficiency improvement in view at source ↗

**Figure 4.** Figure 4: ID/OOD operating points of adapter training in the (ID, OOD) Label Precision plane. view at source ↗

**Figure 5.** Figure 5: Gradient sparsity as OOD protection. Route 1: Gradient sparsity (EGA). A triplet loss with margin m produces a non-zero gradient only when d(a, p) − d(a, n) + m > 0 view at source ↗

**Figure 6.** Figure 6: Label Precision@1 across three OOD benchmarks and three search budgets ( view at source ↗

read the original abstract

Vector search systems built on frozen vision encoders face queries from unseen classes at deployment, yet existing adapter training collapses under this shift: high-capacity adapters with global contrastive losses silently reassign unseen-class samples to wrong seen-class clusters, dropping worst-case Label Precision by over 40 points below the frozen baseline in our tests. We propose Euclidean Geodesic Alignment (EGA), a residual adapter that couples three principles: zero initialization, local triplet loss, and hypersphere projection. These collectively induce a self-limiting dynamic: triplets that already satisfy a small margin stop producing gradients, so the adapter automatically stops updating where the local geometry is already correct. Our experiments show that at convergence $96.5\%$ of triplets are gradient-free, leaving unseen-class regions largely untouched while still enabling full-capacity refinement of seen classes. Across five diverse out-of-distribution (OOD) benchmarks, EGA achieves the highest worst-case Label Precision on the four primary splits and a consistent improvement on the fifth. The design also transfers to stronger backbones in addition to CLIP, and we provide an analytical justification linking gradient sparsity to bounded OOD perturbation.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

EGA gives a practical adapter that limits damage to unseen classes via gradient sparsity, but the global residual still needs tighter proof that seen-only updates truly bound OOD shifts.

read the letter

The main thing to know is that this method trains a residual adapter on a frozen vision encoder so it improves seen-class geometry while mostly leaving unseen-class regions alone, and the experiments show it holds up on worst-case label precision across the OOD splits they tested. The 96.5% gradient-free triplets at convergence is the concrete observation that supports the self-limiting claim. What is new is the specific combination of zero initialization, local triplet loss, and hypersphere projection to produce that sparsity instead of the usual global contrastive training that pulls unseen samples into seen clusters. The paper does well by reporting consistent gains on five diverse benchmarks, including the highest worst-case numbers on four of them, and by showing the approach transfers to stronger backbones beyond CLIP. The analytical justification they provide for linking sparsity to bounded perturbation is a step in the right direction if the details check out. The soft spot is whether the global residual function really keeps unseen embeddings untouched. Even with sparse gradients on seen triplets, the single adapter applied everywhere could still induce non-local shifts in regions without training data, and the stress-test concern about this coupling is worth checking against the full derivation. The 96.5% figure is measured only on training triplets, so the independence from seen-class definitions needs to be clear. This is for engineers and researchers who run vector search on evolving image collections and need adapters that do not silently degrade on new classes. A reader focused on practical OOD robustness in embeddings will get usable design rules and numbers from it. The work has enough substance and addresses a real deployment issue to deserve a serious referee, even if the global-coupling argument needs more scrutiny in review.

Referee Report

3 major / 2 minor

Summary. The paper proposes Euclidean Geodesic Alignment (EGA), a residual adapter for frozen vision encoders that combines zero initialization, local triplet loss, and hypersphere projection. This design is claimed to induce a self-limiting dynamic in which triplets satisfying a small margin produce no gradients, resulting in 96.5% gradient-free triplets at convergence and thereby bounding out-of-distribution degradation on unseen classes. Experiments across five OOD benchmarks report that EGA attains the highest worst-case Label Precision on four primary splits (with consistent improvement on the fifth), transfers to stronger backbones, and is supported by an analytical justification linking gradient sparsity to bounded OOD perturbation.

Significance. If the analytical justification and empirical robustness hold, the result would be significant for practical vector-search systems that must adapt frozen encoders without access to deployment-time classes. The explicit attempt to derive a bound from the self-limiting property, the high reported gradient sparsity, and the transfer experiments constitute concrete strengths that distinguish the work from standard adapter training.

major comments (3)

[Section 3.3 (analytical justification)] The central analytical justification (Section 3.3) links gradient sparsity measured on seen-class triplets to bounded perturbation of unseen-class regions, yet the adapter is a single global residual function. Because updates are driven exclusively by seen triplets, it remains to be shown that sparsity on the training distribution precludes non-local shifts in the embedding geometry of unseen classes; a concrete bound or invariance argument addressing this global coupling is required.
[Table 2 and Section 4.2] Table 2 and the associated experimental protocol report highest worst-case Label Precision, but the 96.5% gradient-free statistic is computed only on training (seen-class) triplets. An auxiliary measurement or bound demonstrating that the same sparsity pattern holds, or at least does not increase perturbation, for held-out unseen-class samples would directly support the OOD claim.
[Section 4.3 (ablations)] The three design principles are asserted to act collectively, yet no ablation isolates the contribution of hypersphere projection versus local triplet loss alone to the observed gradient sparsity and OOD preservation. Removing one principle at a time while keeping the others fixed would clarify whether the self-limiting dynamic is robust or depends on a specific combination.

minor comments (2)

[Section 3.1] The definition of the triplet margin and its interaction with the zero-initialization scheme could be stated more explicitly with a single equation in the method section to avoid ambiguity when readers reproduce the gradient-free condition.
[Figure 3] Figure 3 (embedding visualizations) would benefit from an additional panel showing the change in nearest-neighbor assignments for a few unseen-class queries before and after adaptation.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive comments and positive assessment of the work's potential significance. We address each major comment below and describe the revisions we will incorporate.

read point-by-point responses

Referee: [Section 3.3 (analytical justification)] The central analytical justification (Section 3.3) links gradient sparsity measured on seen-class triplets to bounded perturbation of unseen-class regions, yet the adapter is a single global residual function. Because updates are driven exclusively by seen triplets, it remains to be shown that sparsity on the training distribution precludes non-local shifts in the embedding geometry of unseen classes; a concrete bound or invariance argument addressing this global coupling is required.

Authors: We agree that the current analysis in Section 3.3 would benefit from an explicit treatment of global coupling. The existing derivation shows that the self-limiting triplet loss combined with zero initialization and hypersphere projection bounds the adapter's output norm, thereby limiting perturbation magnitude. To address the referee's concern, we will revise Section 3.3 to include an additional invariance argument: because the residual adapter is applied uniformly and the projection constrains all embeddings to the unit hypersphere, any local update on seen-class support induces only a Lipschitz-bounded shift that cannot arbitrarily rearrange distant unseen-class regions. A formal statement and short proof sketch will be added. revision: yes
Referee: [Table 2 and Section 4.2] Table 2 and the associated experimental protocol report highest worst-case Label Precision, but the 96.5% gradient-free statistic is computed only on training (seen-class) triplets. An auxiliary measurement or bound demonstrating that the same sparsity pattern holds, or at least does not increase perturbation, for held-out unseen-class samples would directly support the OOD claim.

Authors: The 96.5% gradient-free statistic characterizes the training dynamics on seen-class triplets. To strengthen the link to OOD preservation, we will add an auxiliary measurement in the revised Section 4.2: we will evaluate the fraction of gradient-free triplets (using the same margin) on held-out samples drawn from the unseen classes in the OOD benchmarks. This will be reported alongside the existing worst-case Label Precision results to show that sparsity does not degrade for unseen regions. revision: yes
Referee: [Section 4.3 (ablations)] The three design principles are asserted to act collectively, yet no ablation isolates the contribution of hypersphere projection versus local triplet loss alone to the observed gradient sparsity and OOD preservation. Removing one principle at a time while keeping the others fixed would clarify whether the self-limiting dynamic is robust or depends on a specific combination.

Authors: We concur that isolating the hypersphere projection from the local triplet loss would improve clarity. We will expand Section 4.3 with two new ablation configurations while retaining zero initialization: (i) local triplet loss without hypersphere projection, and (ii) hypersphere projection paired with a global contrastive loss in place of the local triplet loss. Both will report gradient sparsity at convergence and worst-case Label Precision on the OOD benchmarks, thereby quantifying the contribution of each principle to the self-limiting behavior. revision: yes

Circularity Check

0 steps flagged

No significant circularity; derivation remains self-contained

full rationale

The paper's core derivation rests on three explicitly stated design principles (zero initialization, local triplet loss, hypersphere projection) whose interaction is described as producing gradient sparsity on satisfied margins. This mechanism is a direct, standard consequence of the triplet loss formulation rather than a redefinition of the target OOD bound. The 96.5% gradient-free statistic is an empirical measurement on seen-class training triplets, not a fitted parameter renamed as a prediction. The link from sparsity to bounded OOD perturbation is asserted via a separate analytical justification whose content is not shown to collapse into the input definitions or prior self-citations. No load-bearing self-citation, uniqueness theorem, or ansatz smuggling appears in the provided text, so the central claim retains independent content beyond its own assumptions.

Axiom & Free-Parameter Ledger

1 free parameters · 0 axioms · 0 invented entities

Only the abstract is available, so the ledger is populated from stated design choices; no explicit free parameters or invented entities are named, but the triplet margin and projection radius are implicit hyperparameters.

free parameters (1)

triplet margin
Standard hyperparameter in the local triplet loss; value not reported in abstract but required for the self-limiting dynamic to function.

pith-pipeline@v0.9.0 · 5497 in / 1209 out tokens · 49039 ms · 2026-05-11T00:43:34.671170+00:00 · methodology

Review history (2 revisions) →

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

triplet loss ... max(0, d(ẑ_a,ẑ_p)-d(ẑ_a,ẑ_n)+m) ... active triplet ratio ρ(t) ... 96.5% of triplets are gradient-free
IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean reality_from_one_distinction unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

zero-initialized residual ... ℓ2 projection onto the unit hypersphere

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

46 extracted references · 46 canonical work pages · 1 internal anchor

[1]

I-con: A unifying framework for representation learning

Shaden Naif Alshammari, John R Hershey, Axel Feldmann, William T Freeman, and Mark Hamilton. I-con: A unifying framework for representation learning. InThe Thirteenth Interna- tional Conference on Learning Representations (ICLR), 2025

work page 2025
[2]

Spann: highly-efficient billion-scale approximate nearest neighbor search

Qi Chen, Bing Zhao, Haidong Wang, Mingqin Li, Chuanjie Liu, Zengzhong Li, Mao Yang, and Jingdong Wang. Spann: highly-efficient billion-scale approximate nearest neighbor search. In Proceedings of the 35th International Conference on Neural Information Processing Systems, NIPS ’21, Red Hook, NY , USA, 2021. Curran Associates Inc

work page 2021
[3]

Navigable graphs for high-dimensional nearest neighbor search: Constructions and limits

Haya Diwan, Jinrui Gou, Cameron Musco, Christopher Musco, and Torsten Suel. Navigable graphs for high-dimensional nearest neighbor search: Constructions and limits. In A. Globerson, L. Mackey, D. Belgrave, A. Fan, U. Paquet, J. Tomczak, and C. Zhang, editors,Advances in Neural Information Processing Systems, volume 37, pages 59513–59531. Curran Associates...

work page 2024
[4]

Improve representation for imbalanced regression through geometric constraints

Zijian Dong, Yilei Wu, Chongyao Chen, Yingtian Zou, Yichi Zhang, and Juan Helen Zhou. Improve representation for imbalanced regression through geometric constraints. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2025

work page 2025
[5]

Mitigate the gap: Improving cross-modal alignment in CLIP

Sedigheh Eslami and Gerard de Melo. Mitigate the gap: Improving cross-modal alignment in CLIP. InThe Thirteenth International Conference on Learning Representations, 2025

work page 2025
[6]

SimCSE: Simple contrastive learning of sentence embeddings

Tianyu Gao, Xingcheng Yao, and Danqi Chen. SimCSE: Simple contrastive learning of sentence embeddings. In Marie-Francine Moens, Xuanjing Huang, Lucia Specia, and Scott Wen-tau Yih, editors,Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 6894–6910, Online and Punta Cana, Dominican Republic, November 2021. Assoc...

work page 2021
[7]

Achieving low-latency graph-based vector search via aligning best- first search algorithm with ssd

Hao Guo and Youyou Lu. Achieving low-latency graph-based vector search via aligning best- first search algorithm with ssd. In19th USENIX Symposium on Operating Systems Design and Implementation (OSDI), pages 171–186, Boston, MA, 2025. USENIX Association

work page 2025
[8]

Odinann: Direct insert for consistently stable performance in billion- scale graph-based vector search

Hao Guo and Youyou Lu. Odinann: Direct insert for consistently stable performance in billion- scale graph-based vector search. In24th USENIX Conference on File and Storage Technologies (FAST), Santa Clara, CA, 2026. USENIX Association

work page 2026
[9]

Parameter-efficient transfer learning for NLP

Neil Houlsby, Andrei Giurgiu, Stanislaw Jastrzebski, Bruna Morrone, Quentin De Laroussilhe, Andrea Gesmundo, Mona Attariyan, and Sylvain Gelly. Parameter-efficient transfer learning for NLP. In Kamalika Chaudhuri and Ruslan Salakhutdinov, editors,Proceedings of the 36th International Conference on Machine Learning, volume 97 ofProceedings of Machine Learn...

work page 2019
[10]

LoRA: Low-rank adaptation of large language models

Edward J Hu, yelong shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, and Weizhu Chen. LoRA: Low-rank adaptation of large language models. In International Conference on Learning Representations, 2022

work page 2022
[11]

Squeeze-and-excitation networks

Jie Hu, Li Shen, Samuel Albanie, Gang Sun, and Enhua Wu. Squeeze-and-excitation networks. IEEE Trans. Pattern Anal. Mach. Intell., 42(8):2011–2023, August 2020

work page 2011
[12]

Worst-case performance of popular approximate nearest neighbor search implementations: Guarantees and limitations

Piotr Indyk and Haike Xu. Worst-case performance of popular approximate nearest neighbor search implementations: Guarantees and limitations. In A. Oh, T. Naumann, A. Globerson, K. Saenko, M. Hardt, and S. Levine, editors,Advances in Neural Information Processing Systems, volume 36, pages 66239–66256. Curran Associates, Inc., 2023. 10

work page 2023
[13]

LoRANN: Low-rank matrix factorization for approximate nearest neighbor search

Elias Jääsaari, Ville Hyvönen, and Teemu Roos. LoRANN: Low-rank matrix factorization for approximate nearest neighbor search. InThe Thirty-eighth Annual Conference on Neural Information Processing Systems, 2024

work page 2024
[14]

Visual prompt tuning

Menglin Jia, Luming Tang, Bor-Chun Chen, Claire Cardie, Serge Belongie, Bharath Hariharan, and Ser-Nam Lim. Visual prompt tuning. InComputer Vision – ECCV 2022: 17th European Conference, Tel Aviv, Israel, October 23–27, 2022, Proceedings, Part XXXIII, page 709–727, Berlin, Heidelberg, 2022. Springer-Verlag

work page 2022
[15]

FineCLIP: Self-distilled region-based CLIP for better fine-grained understanding

Dong Jing, Xiaolong He, Yutian Luo, Nanyi Fei, Guoxing Yang, Wei Wei, Huiwen Zhao, and Zhiwu Lu. FineCLIP: Self-distilled region-based CLIP for better fine-grained understanding. In The Thirty-eighth Annual Conference on Neural Information Processing Systems, 2024

work page 2024
[16]

Billion-scale similarity search with gpus.IEEE Transactions on Big Data, 7(3):535–547, 2021

Jeff Johnson, Matthijs Douze, and Hervé Jégou. Billion-scale similarity search with gpus.IEEE Transactions on Big Data, 7(3):535–547, 2021

work page 2021
[17]

Gunawi, Cody Hammock, Joe Mambretti, Alexander Barnes, François Halbach, Alex Rocha, and Joe Stubbs

Kate Keahey, Jason Anderson, Zhuo Zhen, Pierre Riteau, Paul Ruth, Dan Stanzione, Mert Cevik, Jacob Colleran, Haryadi S. Gunawi, Cody Hammock, Joe Mambretti, Alexander Barnes, François Halbach, Alex Rocha, and Joe Stubbs. Lessons learned from the chameleon testbed. In Proceedings of the 2020 USENIX Annual Technical Conference (USENIX ATC ’20). USENIX Assoc...

work page 2020
[18]

Decoupled contrastive learning for federated learning, 2025

Hyungbin Kim, Incheol Baek, and Yon Dohn Chung. Decoupled contrastive learning for federated learning, 2025

work page 2025
[19]

Nicolaou, and Yannis Panagakis

Panagiotis Koromilas, Efthymios Georgiou, Giorgos Bouritsas, Theodoros Giannakopoulos, Mihalis A. Nicolaou, and Yannis Panagakis. A principled framework for multi-view contrastive learning, 2025

work page 2025
[20]

Degradation of feature space in continual learning, 2026

Chiara Lanza, Roberto Pereira, Marco Miozzo, Eduard Angelats, and Paolo Dini. Degradation of feature space in continual learning, 2026

work page 2026
[21]

Vision language models: A survey of 26k papers, 2025

Fengming Lin. Vision language models: A survey of 26k papers, 2025

work page 2025
[22]

Malkov and D

Yu A. Malkov and D. A. Yashunin. Efficient and robust approximate nearest neighbor search using hierarchical navigable small world graphs.IEEE Trans. Pattern Anal. Mach. Intell., 42(4):824–836, April 2020

work page 2020
[23]

Maxime Oquab, Timothée Darcet, Théo Moutakanni, Huy V . V o, Marc Szafraniec, Vasil Khalidov, Pierre Fernandez, Daniel HAZIZA, Francisco Massa, Alaaeldin El-Nouby, Mido Assran, Nicolas Ballas, Wojciech Galuba, Russell Howes, Po-Yao Huang, Shang-Wen Li, Ishan Misra, Michael Rabbat, Vasu Sharma, Gabriel Synnaeve, Hu Xu, Herve Jegou, Julien Mairal, Patrick L...

work page 2024
[24]

Compositional entailment learning for hyperbolic vision-language models

Avik Pal, Max van Spengler, Guido Maria D’Amely di Melendugno, Alessandro Flaborea, Fabio Galasso, and Pascal Mettes. Compositional entailment learning for hyperbolic vision-language models. InThe Thirteenth International Conference on Learning Representations, 2025

work page 2025
[25]

Learning transferable visual models from natural language supervision

Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agar- wal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, and Ilya Sutskever. Learning transferable visual models from natural language supervision. In Marina Meila and Tong Zhang, editors,Proceedings of the 38th International Conference on Machin...

work page 2021
[26]

Accept the modality gap: An exploration in the hyperbolic space

Sameera Ramasinghe, Violetta Shevchenko, Gil Avraham, and Ajanthan Thalaiyasingam. Accept the modality gap: An exploration in the hyperbolic space. In2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 27253–27262, 2024

work page 2024
[27]

$\boldsymbol{\lambda}$-orthogonality regularization for compatible representation learning

Simone Ricci, Niccolò Biondi, Federico Pernici, Ioannis Patras, and Alberto Del Bimbo. $\boldsymbol{\lambda}$-orthogonality regularization for compatible representation learning. InThe Thirty-ninth Annual Conference on Neural Information Processing Systems, 2026. 11

work page 2026
[28]

Cohen, and Alan Perotti

Daniele Savietto, Declan Campbell, André Panisson, Marco Nurisso, Giovanni Petri, Jonathan D. Cohen, and Alan Perotti. The geometry of representational failures in vision language models, 2026

work page 2026
[29]

Sevilla, Brittany N Dugger, and Chen-Nee Chuah

Shivam Rajendra Rai Sharma, Xiaoguang Zhu, Luca Cerny Oliveira, Kartik Patwari, La Rissa Vasquez, David Garcia, Louise Nicole C. Sevilla, Brittany N Dugger, and Chen-Nee Chuah. Benchmarking parameter efficient adaptation of vision language models on pathology. In NeurIPS 2025 Workshop for Imageomics: Discovering Biological Knowledge from Images Using AI, 2025

work page 2025
[30]

Diskann: fast accurate billion-point nearest neighbor search on a single node

Suhas Jayaram Subramanya, Devvrit, Rohan Kadekodi, Ravishankar Krishaswamy, and Har- sha Vardhan Simhadri. Diskann: fast accurate billion-point nearest neighbor search on a single node. InProceedings of the 33rd International Conference on Neural Information Processing Systems, 2019

work page 2019
[31]

Hangke Sui, Yuqing Wang, and Minh N. Do. Unicon: Unified framework for efficient contrastive alignment via kernels. InThe Fourteenth International Conference on Learning Representations, 2026

work page 2026
[32]

Representation Learning with Contrastive Predictive Coding

Aaron van den Oord, Yazhe Li, and Oriol Vinyals. Representation learning with contrastive predictive coding.arXiv preprint arXiv:1807.03748, 2018

work page internal anchor Pith review Pith/arXiv arXiv 2018
[33]

Rethinking graph masked autoencoders through alignment and uniformity

Liang Wang, Xiang Tao, Qiang Liu, Shu Wu, and Liang Wang. Rethinking graph masked autoencoders through alignment and uniformity. InProceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence and Thirty-Sixth Conference on Innovative Applications of Artificial Intelligence and Fourteenth Symposium on Educational Advances in Artificial Intel...

work page 2024
[34]

Understanding contrastive representation learning through alignment and uniformity on the hypersphere

Tongzhou Wang and Phillip Isola. Understanding contrastive representation learning through alignment and uniformity on the hypersphere. InProceedings of the 37th International Confer- ence on Machine Learning, ICML’20. JMLR.org, 2020

work page 2020
[35]

Mingqi Wu, Qiang Sun, and Archer Y . Yang. PCA++: How uniformity induces robustness to background noise in contrastive learning. InThe Thirty-ninth Annual Conference on Neural Information Processing Systems, 2026

work page 2026
[36]

Easemvc: Efficient dual selection mechanism for deep multi-view clustering

Baili Xiao, Zhibin Dong, Ke Liang, Suyuan Liu, Siwei Wang, Tianrui Liu, Xingchen Hu, En Zhu, and Xinwang Liu. Easemvc: Efficient dual selection mechanism for deep multi-view clustering. In2025 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 20716–20726, 2025

work page 2025
[37]

PROTOCOL: Partial optimal transport-enhanced contrastive learning for imbalanced multi-view clustering

Xuqian Xue, Yiming Lei, Qi Cai, Hongming Shan, and Junping Zhang. PROTOCOL: Partial optimal transport-enhanced contrastive learning for imbalanced multi-view clustering. In Forty-second International Conference on Machine Learning, 2025

work page 2025
[38]

Cspg: Crossing sparse proximity graphs for approximate nearest neighbor search

Ming Yang, Yuzheng Cai, and Weiguo Zheng. Cspg: Crossing sparse proximity graphs for approximate nearest neighbor search. In A. Globerson, L. Mackey, D. Belgrave, A. Fan, U. Paquet, J. Tomczak, and C. Zhang, editors,Advances in Neural Information Processing Systems, volume 37, pages 103076–103100. Curran Associates, Inc., 2024

work page 2024
[39]

Sigmoid loss for language image pre-training

Xiaohua Zhai, Basil Mustafa, Alexander Kolesnikov, and Lucas Beyer. Sigmoid loss for language image pre-training. In2023 IEEE/CVF International Conference on Computer Vision (ICCV), pages 11941–11952, 2023

work page 2023
[40]

Adding conditional control to text-to-image diffusion models

Lvmin Zhang, Anyi Rao, and Maneesh Agrawala. Adding conditional control to text-to-image diffusion models. In2023 IEEE/CVF International Conference on Computer Vision (ICCV), pages 3813–3824, 2023

work page 2023
[41]

Tip-adapter: Training-free adaption of clip for few-shot classification

Renrui Zhang, Wei Zhang, Rongyao Fang, Peng Gao, Kunchang Li, Jifeng Dai, Yu Qiao, and Hongsheng Li. Tip-adapter: Training-free adaption of clip for few-shot classification. In Computer Vision – ECCV 2022: 17th European Conference, Tel Aviv, Israel, October 23–27, 2022, Proceedings, Part XXXV, page 493–510, Berlin, Heidelberg, 2022. Springer-Verlag. 12

work page 2022
[42]

An information-theoretic regularizer for lossy neural image compression

Yingwen Zhang, Meng Wang, Xihua Sheng, Peilin Chen, Junru Li, Li Zhang, and Shiqi Wang. An information-theoretic regularizer for lossy neural image compression. InProceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 15573–15582, October 2025

work page 2025
[43]

Representation de- generation problem in prompt-based models for natural language understanding

Qingyan Zhao, Ruifang He, Jinpeng Zhang, Chang Liu, and Bo Wang. Representation de- generation problem in prompt-based models for natural language understanding. In Nicoletta Calzolari, Min-Yen Kan, Veronique Hoste, Alessandro Lenci, Sakriani Sakti, and Nianwen Xue, editors,Proceedings of the 2024 Joint International Conference on Computational Linguistic...

work page 2024
[44]

Conditional prompt learning for vision-language models

Kaiyang Zhou, Jingkang Yang, Chen Change Loy, and Ziwei Liu. Conditional prompt learning for vision-language models. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 16816–16825, 2022

work page 2022
[45]

Learning to prompt for vision-language models.Int

Kaiyang Zhou, Jingkang Yang, Chen Change Loy, and Ziwei Liu. Learning to prompt for vision-language models.Int. J. Comput. Vision, 130(9):2337–2348, September 2022

work page 2022
[46]

Cm 3: Calibrating multimodal recommendation, 2025

Xin Zhou, Yongjie Wang, and Zhiqi Shen. Cm 3: Calibrating multimodal recommendation, 2025. 13 A Theoretical Analysis of EGA’s OOD Stability We provide a theoretical analysis to explain why the proposed EGA adapter limits the perturbation of unseen-class geometry, in contrast to global contrastive methods. The analysis is structured in five steps, proceedi...

work page 2025

[1] [1]

I-con: A unifying framework for representation learning

Shaden Naif Alshammari, John R Hershey, Axel Feldmann, William T Freeman, and Mark Hamilton. I-con: A unifying framework for representation learning. InThe Thirteenth Interna- tional Conference on Learning Representations (ICLR), 2025

work page 2025

[2] [2]

Spann: highly-efficient billion-scale approximate nearest neighbor search

Qi Chen, Bing Zhao, Haidong Wang, Mingqin Li, Chuanjie Liu, Zengzhong Li, Mao Yang, and Jingdong Wang. Spann: highly-efficient billion-scale approximate nearest neighbor search. In Proceedings of the 35th International Conference on Neural Information Processing Systems, NIPS ’21, Red Hook, NY , USA, 2021. Curran Associates Inc

work page 2021

[3] [3]

Navigable graphs for high-dimensional nearest neighbor search: Constructions and limits

Haya Diwan, Jinrui Gou, Cameron Musco, Christopher Musco, and Torsten Suel. Navigable graphs for high-dimensional nearest neighbor search: Constructions and limits. In A. Globerson, L. Mackey, D. Belgrave, A. Fan, U. Paquet, J. Tomczak, and C. Zhang, editors,Advances in Neural Information Processing Systems, volume 37, pages 59513–59531. Curran Associates...

work page 2024

[4] [4]

Improve representation for imbalanced regression through geometric constraints

Zijian Dong, Yilei Wu, Chongyao Chen, Yingtian Zou, Yichi Zhang, and Juan Helen Zhou. Improve representation for imbalanced regression through geometric constraints. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2025

work page 2025

[5] [5]

Mitigate the gap: Improving cross-modal alignment in CLIP

Sedigheh Eslami and Gerard de Melo. Mitigate the gap: Improving cross-modal alignment in CLIP. InThe Thirteenth International Conference on Learning Representations, 2025

work page 2025

[6] [6]

SimCSE: Simple contrastive learning of sentence embeddings

Tianyu Gao, Xingcheng Yao, and Danqi Chen. SimCSE: Simple contrastive learning of sentence embeddings. In Marie-Francine Moens, Xuanjing Huang, Lucia Specia, and Scott Wen-tau Yih, editors,Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 6894–6910, Online and Punta Cana, Dominican Republic, November 2021. Assoc...

work page 2021

[7] [7]

Achieving low-latency graph-based vector search via aligning best- first search algorithm with ssd

Hao Guo and Youyou Lu. Achieving low-latency graph-based vector search via aligning best- first search algorithm with ssd. In19th USENIX Symposium on Operating Systems Design and Implementation (OSDI), pages 171–186, Boston, MA, 2025. USENIX Association

work page 2025

[8] [8]

Odinann: Direct insert for consistently stable performance in billion- scale graph-based vector search

Hao Guo and Youyou Lu. Odinann: Direct insert for consistently stable performance in billion- scale graph-based vector search. In24th USENIX Conference on File and Storage Technologies (FAST), Santa Clara, CA, 2026. USENIX Association

work page 2026

[9] [9]

Parameter-efficient transfer learning for NLP

Neil Houlsby, Andrei Giurgiu, Stanislaw Jastrzebski, Bruna Morrone, Quentin De Laroussilhe, Andrea Gesmundo, Mona Attariyan, and Sylvain Gelly. Parameter-efficient transfer learning for NLP. In Kamalika Chaudhuri and Ruslan Salakhutdinov, editors,Proceedings of the 36th International Conference on Machine Learning, volume 97 ofProceedings of Machine Learn...

work page 2019

[10] [10]

LoRA: Low-rank adaptation of large language models

Edward J Hu, yelong shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, and Weizhu Chen. LoRA: Low-rank adaptation of large language models. In International Conference on Learning Representations, 2022

work page 2022

[11] [11]

Squeeze-and-excitation networks

Jie Hu, Li Shen, Samuel Albanie, Gang Sun, and Enhua Wu. Squeeze-and-excitation networks. IEEE Trans. Pattern Anal. Mach. Intell., 42(8):2011–2023, August 2020

work page 2011

[12] [12]

Worst-case performance of popular approximate nearest neighbor search implementations: Guarantees and limitations

Piotr Indyk and Haike Xu. Worst-case performance of popular approximate nearest neighbor search implementations: Guarantees and limitations. In A. Oh, T. Naumann, A. Globerson, K. Saenko, M. Hardt, and S. Levine, editors,Advances in Neural Information Processing Systems, volume 36, pages 66239–66256. Curran Associates, Inc., 2023. 10

work page 2023

[13] [13]

LoRANN: Low-rank matrix factorization for approximate nearest neighbor search

Elias Jääsaari, Ville Hyvönen, and Teemu Roos. LoRANN: Low-rank matrix factorization for approximate nearest neighbor search. InThe Thirty-eighth Annual Conference on Neural Information Processing Systems, 2024

work page 2024

[14] [14]

Visual prompt tuning

Menglin Jia, Luming Tang, Bor-Chun Chen, Claire Cardie, Serge Belongie, Bharath Hariharan, and Ser-Nam Lim. Visual prompt tuning. InComputer Vision – ECCV 2022: 17th European Conference, Tel Aviv, Israel, October 23–27, 2022, Proceedings, Part XXXIII, page 709–727, Berlin, Heidelberg, 2022. Springer-Verlag

work page 2022

[15] [15]

FineCLIP: Self-distilled region-based CLIP for better fine-grained understanding

Dong Jing, Xiaolong He, Yutian Luo, Nanyi Fei, Guoxing Yang, Wei Wei, Huiwen Zhao, and Zhiwu Lu. FineCLIP: Self-distilled region-based CLIP for better fine-grained understanding. In The Thirty-eighth Annual Conference on Neural Information Processing Systems, 2024

work page 2024

[16] [16]

Billion-scale similarity search with gpus.IEEE Transactions on Big Data, 7(3):535–547, 2021

Jeff Johnson, Matthijs Douze, and Hervé Jégou. Billion-scale similarity search with gpus.IEEE Transactions on Big Data, 7(3):535–547, 2021

work page 2021

[17] [17]

Gunawi, Cody Hammock, Joe Mambretti, Alexander Barnes, François Halbach, Alex Rocha, and Joe Stubbs

Kate Keahey, Jason Anderson, Zhuo Zhen, Pierre Riteau, Paul Ruth, Dan Stanzione, Mert Cevik, Jacob Colleran, Haryadi S. Gunawi, Cody Hammock, Joe Mambretti, Alexander Barnes, François Halbach, Alex Rocha, and Joe Stubbs. Lessons learned from the chameleon testbed. In Proceedings of the 2020 USENIX Annual Technical Conference (USENIX ATC ’20). USENIX Assoc...

work page 2020

[18] [18]

Decoupled contrastive learning for federated learning, 2025

Hyungbin Kim, Incheol Baek, and Yon Dohn Chung. Decoupled contrastive learning for federated learning, 2025

work page 2025

[19] [19]

Nicolaou, and Yannis Panagakis

Panagiotis Koromilas, Efthymios Georgiou, Giorgos Bouritsas, Theodoros Giannakopoulos, Mihalis A. Nicolaou, and Yannis Panagakis. A principled framework for multi-view contrastive learning, 2025

work page 2025

[20] [20]

Degradation of feature space in continual learning, 2026

Chiara Lanza, Roberto Pereira, Marco Miozzo, Eduard Angelats, and Paolo Dini. Degradation of feature space in continual learning, 2026

work page 2026

[21] [21]

Vision language models: A survey of 26k papers, 2025

Fengming Lin. Vision language models: A survey of 26k papers, 2025

work page 2025

[22] [22]

Malkov and D

Yu A. Malkov and D. A. Yashunin. Efficient and robust approximate nearest neighbor search using hierarchical navigable small world graphs.IEEE Trans. Pattern Anal. Mach. Intell., 42(4):824–836, April 2020

work page 2020

[23] [23]

Maxime Oquab, Timothée Darcet, Théo Moutakanni, Huy V . V o, Marc Szafraniec, Vasil Khalidov, Pierre Fernandez, Daniel HAZIZA, Francisco Massa, Alaaeldin El-Nouby, Mido Assran, Nicolas Ballas, Wojciech Galuba, Russell Howes, Po-Yao Huang, Shang-Wen Li, Ishan Misra, Michael Rabbat, Vasu Sharma, Gabriel Synnaeve, Hu Xu, Herve Jegou, Julien Mairal, Patrick L...

work page 2024

[24] [24]

Compositional entailment learning for hyperbolic vision-language models

Avik Pal, Max van Spengler, Guido Maria D’Amely di Melendugno, Alessandro Flaborea, Fabio Galasso, and Pascal Mettes. Compositional entailment learning for hyperbolic vision-language models. InThe Thirteenth International Conference on Learning Representations, 2025

work page 2025

[25] [25]

Learning transferable visual models from natural language supervision

Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agar- wal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, and Ilya Sutskever. Learning transferable visual models from natural language supervision. In Marina Meila and Tong Zhang, editors,Proceedings of the 38th International Conference on Machin...

work page 2021

[26] [26]

Accept the modality gap: An exploration in the hyperbolic space

Sameera Ramasinghe, Violetta Shevchenko, Gil Avraham, and Ajanthan Thalaiyasingam. Accept the modality gap: An exploration in the hyperbolic space. In2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 27253–27262, 2024

work page 2024

[27] [27]

$\boldsymbol{\lambda}$-orthogonality regularization for compatible representation learning

Simone Ricci, Niccolò Biondi, Federico Pernici, Ioannis Patras, and Alberto Del Bimbo. $\boldsymbol{\lambda}$-orthogonality regularization for compatible representation learning. InThe Thirty-ninth Annual Conference on Neural Information Processing Systems, 2026. 11

work page 2026

[28] [28]

Cohen, and Alan Perotti

Daniele Savietto, Declan Campbell, André Panisson, Marco Nurisso, Giovanni Petri, Jonathan D. Cohen, and Alan Perotti. The geometry of representational failures in vision language models, 2026

work page 2026

[29] [29]

Sevilla, Brittany N Dugger, and Chen-Nee Chuah

Shivam Rajendra Rai Sharma, Xiaoguang Zhu, Luca Cerny Oliveira, Kartik Patwari, La Rissa Vasquez, David Garcia, Louise Nicole C. Sevilla, Brittany N Dugger, and Chen-Nee Chuah. Benchmarking parameter efficient adaptation of vision language models on pathology. In NeurIPS 2025 Workshop for Imageomics: Discovering Biological Knowledge from Images Using AI, 2025

work page 2025

[30] [30]

Diskann: fast accurate billion-point nearest neighbor search on a single node

Suhas Jayaram Subramanya, Devvrit, Rohan Kadekodi, Ravishankar Krishaswamy, and Har- sha Vardhan Simhadri. Diskann: fast accurate billion-point nearest neighbor search on a single node. InProceedings of the 33rd International Conference on Neural Information Processing Systems, 2019

work page 2019

[31] [31]

Hangke Sui, Yuqing Wang, and Minh N. Do. Unicon: Unified framework for efficient contrastive alignment via kernels. InThe Fourteenth International Conference on Learning Representations, 2026

work page 2026

[32] [32]

Representation Learning with Contrastive Predictive Coding

Aaron van den Oord, Yazhe Li, and Oriol Vinyals. Representation learning with contrastive predictive coding.arXiv preprint arXiv:1807.03748, 2018

work page internal anchor Pith review Pith/arXiv arXiv 2018

[33] [33]

Rethinking graph masked autoencoders through alignment and uniformity

Liang Wang, Xiang Tao, Qiang Liu, Shu Wu, and Liang Wang. Rethinking graph masked autoencoders through alignment and uniformity. InProceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence and Thirty-Sixth Conference on Innovative Applications of Artificial Intelligence and Fourteenth Symposium on Educational Advances in Artificial Intel...

work page 2024

[34] [34]

Understanding contrastive representation learning through alignment and uniformity on the hypersphere

Tongzhou Wang and Phillip Isola. Understanding contrastive representation learning through alignment and uniformity on the hypersphere. InProceedings of the 37th International Confer- ence on Machine Learning, ICML’20. JMLR.org, 2020

work page 2020

[35] [35]

Mingqi Wu, Qiang Sun, and Archer Y . Yang. PCA++: How uniformity induces robustness to background noise in contrastive learning. InThe Thirty-ninth Annual Conference on Neural Information Processing Systems, 2026

work page 2026

[36] [36]

Easemvc: Efficient dual selection mechanism for deep multi-view clustering

Baili Xiao, Zhibin Dong, Ke Liang, Suyuan Liu, Siwei Wang, Tianrui Liu, Xingchen Hu, En Zhu, and Xinwang Liu. Easemvc: Efficient dual selection mechanism for deep multi-view clustering. In2025 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 20716–20726, 2025

work page 2025

[37] [37]

PROTOCOL: Partial optimal transport-enhanced contrastive learning for imbalanced multi-view clustering

Xuqian Xue, Yiming Lei, Qi Cai, Hongming Shan, and Junping Zhang. PROTOCOL: Partial optimal transport-enhanced contrastive learning for imbalanced multi-view clustering. In Forty-second International Conference on Machine Learning, 2025

work page 2025

[38] [38]

Cspg: Crossing sparse proximity graphs for approximate nearest neighbor search

Ming Yang, Yuzheng Cai, and Weiguo Zheng. Cspg: Crossing sparse proximity graphs for approximate nearest neighbor search. In A. Globerson, L. Mackey, D. Belgrave, A. Fan, U. Paquet, J. Tomczak, and C. Zhang, editors,Advances in Neural Information Processing Systems, volume 37, pages 103076–103100. Curran Associates, Inc., 2024

work page 2024

[39] [39]

Sigmoid loss for language image pre-training

Xiaohua Zhai, Basil Mustafa, Alexander Kolesnikov, and Lucas Beyer. Sigmoid loss for language image pre-training. In2023 IEEE/CVF International Conference on Computer Vision (ICCV), pages 11941–11952, 2023

work page 2023

[40] [40]

Adding conditional control to text-to-image diffusion models

Lvmin Zhang, Anyi Rao, and Maneesh Agrawala. Adding conditional control to text-to-image diffusion models. In2023 IEEE/CVF International Conference on Computer Vision (ICCV), pages 3813–3824, 2023

work page 2023

[41] [41]

Tip-adapter: Training-free adaption of clip for few-shot classification

Renrui Zhang, Wei Zhang, Rongyao Fang, Peng Gao, Kunchang Li, Jifeng Dai, Yu Qiao, and Hongsheng Li. Tip-adapter: Training-free adaption of clip for few-shot classification. In Computer Vision – ECCV 2022: 17th European Conference, Tel Aviv, Israel, October 23–27, 2022, Proceedings, Part XXXV, page 493–510, Berlin, Heidelberg, 2022. Springer-Verlag. 12

work page 2022

[42] [42]

An information-theoretic regularizer for lossy neural image compression

Yingwen Zhang, Meng Wang, Xihua Sheng, Peilin Chen, Junru Li, Li Zhang, and Shiqi Wang. An information-theoretic regularizer for lossy neural image compression. InProceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 15573–15582, October 2025

work page 2025

[43] [43]

Representation de- generation problem in prompt-based models for natural language understanding

Qingyan Zhao, Ruifang He, Jinpeng Zhang, Chang Liu, and Bo Wang. Representation de- generation problem in prompt-based models for natural language understanding. In Nicoletta Calzolari, Min-Yen Kan, Veronique Hoste, Alessandro Lenci, Sakriani Sakti, and Nianwen Xue, editors,Proceedings of the 2024 Joint International Conference on Computational Linguistic...

work page 2024

[44] [44]

Conditional prompt learning for vision-language models

Kaiyang Zhou, Jingkang Yang, Chen Change Loy, and Ziwei Liu. Conditional prompt learning for vision-language models. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 16816–16825, 2022

work page 2022

[45] [45]

Learning to prompt for vision-language models.Int

Kaiyang Zhou, Jingkang Yang, Chen Change Loy, and Ziwei Liu. Learning to prompt for vision-language models.Int. J. Comput. Vision, 130(9):2337–2348, September 2022

work page 2022

[46] [46]

Cm 3: Calibrating multimodal recommendation, 2025

Xin Zhou, Yongjie Wang, and Zhiqi Shen. Cm 3: Calibrating multimodal recommendation, 2025. 13 A Theoretical Analysis of EGA’s OOD Stability We provide a theoretical analysis to explain why the proposed EGA adapter limits the perturbation of unseen-class geometry, in contrast to global contrastive methods. The analysis is structured in five steps, proceedi...

work page 2025