Momentum contrast for unsupervised visual representation learning
2 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
dataset 1
method 1
citation-polarity summary
years
2026 2
verdicts
UNVERDICTED 2
representative citing papers
Spatial Prediction pretext task learns spatial structure in self-supervised learning by regressing relative position and scale between image views, yielding more structured representations and better generalization.
citing papers explorer
- Hidden in the Multiplicative Interaction: Uncovering Fragility in Multimodal Contrastive Learning
Multimodal contrastive learning using multilinear products is fragile to single bad modalities, and a gated version improves top-1 retrieval accuracy on synthetic and real trimodal data.
- Learning to Perceive "Where": Spatial Pretext Tasks for Robust Self-Supervised Learning
Spatial Prediction pretext task learns spatial structure in self-supervised learning by regressing relative position and scale between image views, yielding more structured representations and better generalization.