MM-Doc-R1 combines an agentic workflow with Similarity-based Policy Optimization (SPO) to achieve 10.4% higher performance than prior baselines on long-document visual question answering.
1 Pith paper cites this work. Polarity classification is still indexing.
fields: cs.CL (1)
years: 2026 (1)
verdicts: UNVERDICTED (1)
representative citing papers:
MM-Doc-R1: Training Agents for Long Document Visual Question Answering through Multi-turn Reinforcement Learning