What does CLIP know about a red circle? Visual prompt engineering for VLMs
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: cs.CV (1)
Years: 2025 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers:
Personalization Toolkit: Training Free Personalization of Large Vision Language Models
Presents a training-free personalization toolkit for LVLMs that extracts features via vision foundation models, applies RAG for instance retrieval, and uses visual prompting for multi-concept adaptation on images and videos, claiming SOTA results on a new real-world benchmark.