arxiv: 2604.09349 · 2 revisions
Visually-Guided Policy Optimization for Multimodal Reasoning