Deep but reliable: Advancing multi-turn reasoning for thinking with images
2 Pith papers cite this work. Polarity classification is still indexing.
Fields: cs.CV (2)
Years: 2026 (2)
Verdicts: UNVERDICTED (2)
Roles: dataset (1)
Polarities: use dataset (1)
citing papers explorer
- Walk the Talk: Bridging the Reasoning-Action Gap for Thinking with Images via Multimodal Agentic Policy Optimization
  MAPO improves multimodal chain-of-thought reasoning by requiring explicit textual descriptions of visual tool results and using a novel advantage estimator that combines semantic alignment with task rewards.
- Mags-RL: Wearing Multimodal LLMs a Magnifying Glass via Agentic Reinforcement Learning For Complex Scene Reasoning
  Mags-RL uses agentic RL and a super-resolution agent for two-round reasoning in MLLMs, claiming gains on VSR, TallyQA, and GQA with a curriculum needing only 40 samples.