pith. sign in

Simple o3: To- wards interleaved vision-language reasoning

4 Pith papers cite this work. Polarity classification is still indexing.

4 Pith papers citing it

citation-role summary

background 1

citation-polarity summary

fields

cs.CV 4

years

2026 3 2025 1

verdicts

UNVERDICTED 4

roles

background 1

polarities

background 1

representative citing papers

Training Multi-Image Vision Agents via End2End Reinforcement Learning

cs.CV · 2025-12-05 · unverdicted · novelty 7.0

IMAgent trains a multi-image vision agent via pure end-to-end RL with visual reflection tools and a two-layer motion trajectory masking strategy, reaching SOTA on single- and multi-image benchmarks while revealing tool-use effects on attention.

MAG-3D: Multi-Agent Grounded Reasoning for 3D Understanding

cs.CV · 2026-04-10 · unverdicted · novelty 5.0

MAG-3D is a training-free multi-agent framework that coordinates planning, grounding, and coding agents with off-the-shelf VLMs to achieve grounded 3D reasoning and state-of-the-art benchmark results.

citing papers explorer

Showing 4 of 4 citing papers.