EyeVLM benchmark finds that current VLMs underperform specialized visual models on gaze following and social gaze prediction, with fine-tuning narrowing but not closing the gap.
Chain of thought prompt tuning in vision language models,
4 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
roles
background 3polarities
background 3representative citing papers
This survey organizes the architectures, training strategies, data, evaluation methods, extensions, and challenges of Multimodal Large Language Models.
The paper provides the first comprehensive survey of multimodal chain-of-thought reasoning, including foundational concepts, a taxonomy of methodologies, application analyses, challenges, and future directions.
A survey paper providing an overview of Large Language Models, their background, and recent advances in the field.
citing papers explorer
-
Eyes on VLM: Benchmarking Gaze Following and Social Gaze Prediction in Vision Language Models
EyeVLM benchmark finds that current VLMs underperform specialized visual models on gaze following and social gaze prediction, with fine-tuning narrowing but not closing the gap.
-
A Survey on Multimodal Large Language Models
This survey organizes the architectures, training strategies, data, evaluation methods, extensions, and challenges of Multimodal Large Language Models.
-
Multimodal Chain-of-Thought Reasoning: A Comprehensive Survey
The paper provides the first comprehensive survey of multimodal chain-of-thought reasoning, including foundational concepts, a taxonomy of methodologies, application analyses, challenges, and future directions.
-
A Comprehensive Overview of Large Language Models
A survey paper providing an overview of Large Language Models, their background, and recent advances in the field.