Staged post-training that first solidifies visual perception before visual and textual reasoning improves VLM accuracy and shortens reasoning traces on visual math and perception benchmarks.
arXiv preprint arXiv:2506.07905 , year=
3 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
roles
background 1polarities
background 1representative citing papers
DR-MMSearchAgent derives batch-wide trajectory advantages and uses differentiated Gaussian rewards to prevent premature collapse in multimodal agents, outperforming MMSearch-R1 by 8.4% on FVQA-test.
Survey that defines agentic RL for LLMs via POMDPs, introduces a taxonomy of planning/tool-use/memory/reasoning capabilities and domains, and compiles open environments from over 500 papers.
citing papers explorer
-
From Seeing to Thinking: Decoupling Perception and Reasoning Improves Post-Training of Vision-Language Models
Staged post-training that first solidifies visual perception before visual and textual reasoning improves VLM accuracy and shortens reasoning traces on visual math and perception benchmarks.
-
DR-MMSearchAgent: Deepening Reasoning in Multimodal Search Agents
DR-MMSearchAgent derives batch-wide trajectory advantages and uses differentiated Gaussian rewards to prevent premature collapse in multimodal agents, outperforming MMSearch-R1 by 8.4% on FVQA-test.
-
The Landscape of Agentic Reinforcement Learning for LLMs: A Survey
Survey that defines agentic RL for LLMs via POMDPs, introduces a taxonomy of planning/tool-use/memory/reasoning capabilities and domains, and compiles open environments from over 500 papers.