Proceedings of the 2024
2 Pith papers cite this work. Polarity classification is still indexing.
representative citing papers
- Compose and Fuse: Revisiting the Foundational Bottlenecks in Multimodal Reasoning
The paper identifies task-composition and fusion bottlenecks as the main barriers in multimodal reasoning, with experiments showing extra modalities help only when they supply independent reasoning paths.
- Self-Captioning Multimodal Interaction Tuning: Amplifying Exploitable Redundancies for Robust Vision Language Models