CoMa uses a compressed pre-training phase as a warm-up for contrastive learning to efficiently convert MLLMs into competitive multimodal embedding models, achieving new SOTA on MMEB with limited data.
1 Pith paper cites this work. Polarity classification is still being indexed.
Fields: cs.CV (1)
Years: 2025 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers:
Compressing then Matching: An Efficient Pre-training Paradigm for Multimodal Embedding