arXiv preprint arXiv:2109.12178
1 Pith paper cites this work (field: cs.CV; year: 2026; verdict: unverdicted).

Representative citing paper:
Cross-Modal Masked Compositional Concept Modeling for Enhancing Visio-Linguistic Compositionality
MACCO applies cross-modal masked reconstruction of compositional concepts with inter- and intra-modal auxiliary objectives to improve visio-linguistic compositionality in VLMs.