OmniDrive-R1 boosts VLM reasoning score from 51.77% to 80.35% and answer accuracy from 37.81% to 73.62% on DriveLMM-o1 via reinforcement-driven interleaved multi-modal chain-of-thought with annotation-free grounding.
Compositional chain-of-thought prompting for large multimodal models
2 Pith papers cite this work. Polarity classification is still indexing.
2
Pith papers citing it
citation-role summary
background 1
citation-polarity summary
years
2025 2verdicts
UNVERDICTED 2roles
background 1polarities
background 1representative citing papers
AgroCoT is a new Chain-of-Thought VQA benchmark with 4759 samples to evaluate reasoning capabilities of vision-language models in agriculture.
citing papers explorer
-
OmniDrive-R1: Reinforcement-driven Interleaved Multi-modal Chain-of-Thought for Trustworthy Vision-Language Autonomous Driving
OmniDrive-R1 boosts VLM reasoning score from 51.77% to 80.35% and answer accuracy from 37.81% to 73.62% on DriveLMM-o1 via reinforcement-driven interleaved multi-modal chain-of-thought with annotation-free grounding.
-
AgroCoT: A Chain-of-Thought Benchmark for Evaluating Reasoning in Vision-Language Models for Agriculture
AgroCoT is a new Chain-of-Thought VQA benchmark with 4759 samples to evaluate reasoning capabilities of vision-language models in agriculture.