The role of chain-of-thought in complex vision-language reasoning task. arXiv preprint arXiv:2311.09193
3 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
verdicts: UNVERDICTED 3
roles: background 1
polarities: background 1
representative citing papers
citing papers explorer
- Test-Time Matching: Unlocking Compositional Reasoning in Multimodal Models
  Introduces a group matching score for better evaluation of compositional reasoning and the Test-Time Matching (TTM) algorithm for unsupervised self-improvement in multimodal models, achieving SOTA gains that include surpassing GPT-4.1 and estimated human performance.
- Improving Reasoning in Vision-Language Models via Perception Verified Self-Training
  Perception-verified self-training with PerceptEval and two-stage curriculum learning improves VLM reasoning by up to 16% over standard self-training baselines.
- Multimodal Chain-of-Thought Reasoning: A Comprehensive Survey
  Provides the first comprehensive survey of multimodal chain-of-thought reasoning, covering foundational concepts, a taxonomy of methodologies, application analyses, challenges, and future directions.