Segment Anything
Two Pith papers cite this work. Polarity classification of these citations is still being indexed.
Fields: cs.CV (2)
Years: 2026 (2)
Verdicts: UNVERDICTED (2)
Representative citing papers: 2
Citing papers

- Indexing Multimodal Language Models for Large-scale Image Retrieval
  Multimodal LLMs act as training-free similarity estimators for instance-level image retrieval by converting next-token probabilities from image-pair prompts into scores, combined with efficient indexing for scalability.

- WildPose: A Unified Framework for Robust Pose Estimation in the Wild
  WildPose unifies feedforward 3D features from MASt3R with differentiable bundle adjustment for robust monocular pose estimation across dynamic, static, and low-ego-motion scenes.