M3CoT: A novel benchmark for multi-domain multi-step multi-modal chain-of-thought

Qiguang Chen, Libo Qin, Jin Zhang, Zhi Chen, Xiao Xu, Wanxiang Che · 2024 · arXiv 2405.16473

7 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background: 2 · dataset: 2

citation-polarity summary

background: 2 · use dataset: 2

representative citing papers

Thinking Diffusion: Penalize and Guide Visual-Grounded Reasoning in Diffusion Multimodal Language Models

cs.AI · 2026-04-07 · unverdicted · novelty 6.0

Position and step penalty plus visual reasoning guidance fix premature answering and weak visual grounding in diffusion MLLMs, delivering up to 7.5% accuracy gains and over 3x speedup.

AIM-CoT: Active Information-driven Multimodal Chain-of-Thought for Vision-Language Reasoning

cs.CV · 2025-09-30 · unverdicted · novelty 6.0

AIM-CoT enhances interleaved multimodal chain-of-thought reasoning by adding context-enhanced attention generation, active visual probing via information foraging, and dynamic attention-shift triggering.

WebWatcher: Breaking New Frontier of Vision-Language Deep Research Agent

cs.IR · 2025-08-07 · unverdicted · novelty 6.0

WebWatcher introduces a vision-language deep research agent trained on synthetic multimodal trajectories and RL that outperforms baselines on VQA benchmarks, along with a new BrowseComp-VL evaluation.

InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models

cs.CV · 2025-04-14 · conditional · novelty 6.0

InternVL3-78B sets a new open-source SOTA of 72.2 on MMMU via native joint multimodal pre-training, V2PE, MPO, and test-time scaling while remaining competitive with proprietary models.

Enhancing the Reasoning Ability of Multimodal Large Language Models via Mixed Preference Optimization

cs.CL · 2024-11-15 · conditional · novelty 6.0

Mixed Preference Optimization with the MMPR dataset boosts multimodal CoT reasoning, lifting InternVL2-8B to 67.0 accuracy on MathVista (+8.7 points) and matching the 76B model.

Towards Robust Endogenous Reasoning: Unifying Drift Adaptation in Non-Stationary Tuning

cs.LG · 2026-04-17 · unverdicted · novelty 5.0

CPO++ adapts reinforcement fine-tuning of MLLMs to endogenous multi-modal concept drift through counterfactual reasoning and preference optimization, yielding better coherence and cross-domain robustness in safety-critical settings.

Targeted Exploration via Unified Entropy Control for Reinforcement Learning

cs.AI · 2026-04-16 · unverdicted · novelty 5.0

UEC-RL improves RL reasoning performance in LLMs and VLMs by activating exploration on hard prompts and stabilizing entropy, delivering a 37.9% relative gain over GRPO on Geometry3K.

citing papers explorer

Showing 7 of 7 citing papers.

Thinking Diffusion: Penalize and Guide Visual-Grounded Reasoning in Diffusion Multimodal Language Models
cs.AI · 2026-04-07 · unverdicted · none · ref 2

AIM-CoT: Active Information-driven Multimodal Chain-of-Thought for Vision-Language Reasoning
cs.CV · 2025-09-30 · unverdicted · none · ref 2

WebWatcher: Breaking New Frontier of Vision-Language Deep Research Agent
cs.IR · 2025-08-07 · unverdicted · none · ref 4

InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models
cs.CV · 2025-04-14 · conditional · none · ref 16

Enhancing the Reasoning Ability of Multimodal Large Language Models via Mixed Preference Optimization
cs.CL · 2024-11-15 · conditional · none · ref 16

Towards Robust Endogenous Reasoning: Unifying Drift Adaptation in Non-Stationary Tuning
cs.LG · 2026-04-17 · unverdicted · none · ref 40

Targeted Exploration via Unified Entropy Control for Reinforcement Learning
cs.AI · 2026-04-16 · unverdicted · none · ref 1
