CodeBind uses a modality-shared-specific codebook and compositional vector quantization to decouple shared semantic features from modality-unique details, achieving state-of-the-art multimodal classification and retrieval across nine modalities without requiring fully paired data.
Teledyne FLIR
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: cs.CV (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers
CodeBind: Decoupled Representation Learning for Multimodal Alignment with Unified Compositional Codebook