VEPO improves RL for visual reasoning by multiplicatively coupling visual sensitivity with token entropy, outperforming entropy-only baselines by 2.28 points at 7B and 3.15 points at 3B scale.
Title resolution pending
1 Pith paper cite this work. Polarity classification is still indexing.
1
Pith paper citing it
fields
cs.AI 1years
2026 1verdicts
UNVERDICTED 1representative citing papers
citing papers explorer
-
Entropy Is Not Enough: Unlocking Effective Reinforcement Learning for Visual Reasoning via Vision-Anchored Token Selection
VEPO improves RL for visual reasoning by multiplicatively coupling visual sensitivity with token entropy, outperforming entropy-only baselines by 2.28 points at 7B and 3.15 points at 3B scale.