Found in translation: Learning robust joint representations by cyclic translations between modalities
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: cs.MM (1)
Years: 2025 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers
How Far Are We from Generating Missing Modalities with Foundation Models?
Evaluates 42 variants of foundation models across three formalized paradigms for missing modality reconstruction, identifies shortfalls in semantic extraction and validation, and introduces an agentic framework that reduces FID by at least 14% for images and MER by at least 10% for text.