pith. machine review for the scientific record.

VTool-R1: VLMs Learn to Think with Images via Reinforcement Learning on Multimodal Tool Use

6 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 1 · baseline 1 · dataset 1

citation-polarity summary

fields

cs.CV 5 · cs.AI 1

years

2026: 5 · 2025: 1

verdicts

UNVERDICTED 6

representative citing papers

Training Multi-Image Vision Agents via End2End Reinforcement Learning

cs.CV · 2025-12-05 · unverdicted · novelty 7.0

IMAgent trains a multi-image vision agent purely through end-to-end RL, using visual reflection tools and a two-layer motion trajectory masking strategy. It reaches state-of-the-art results on single- and multi-image benchmarks and reveals how tool use affects attention.
