E5-V produces strong universal multimodal embeddings from MLLMs trained solely on text pairs, often surpassing prior methods across retrieval and related tasks without multimodal fine-tuning.
Microsoft coco: Common objects in context
1 Pith paper cite this work. Polarity classification is still indexing.
1
Pith paper citing it
fields
cs.CL 1years
2024 1verdicts
UNVERDICTED 1representative citing papers
citing papers explorer
-
E5-V: Universal Embeddings with Multimodal Large Language Models
E5-V produces strong universal multimodal embeddings from MLLMs trained solely on text pairs, often surpassing prior methods across retrieval and related tasks without multimodal fine-tuning.