mPLUG-2: A modularized multi-modal foundation model across text, image and video

Haiyang Xu, Qinghao Ye, Ming Yan, Yaya Shi, Jiabo Ye, Yuanhong Xu, Chenliang Li, Bin Bi, Qi Qian, Wei Wang, et al.

1 Pith paper cites this work. Polarity classification is still indexing.

representative citing papers

The More, the Merrier: Contrastive Fusion for Higher-Order Multimodal Alignment

cs.CV · 2025-11-26 · unverdicted · novelty 6.0

Contrastive Fusion (ConFu) adds a fused-modality contrastive term to jointly align individual modalities and their combinations, enabling capture of higher-order dependencies like XOR relations while preserving pairwise alignments.

citing papers explorer

Showing 1 of 1 citing paper.

The More, the Merrier: Contrastive Fusion for Higher-Order Multimodal Alignment
cs.CV · 2025-11-26 · unverdicted · none · ref 40
