pith. sign in

hub

arXiv preprint arXiv:2501.05452 (2025) 5, 6

13 Pith papers cite this work. Polarity classification is still indexing.

13 Pith papers citing it

hub tools

citation-role summary

background 3 dataset 1

citation-polarity summary

years

2026 6 2025 7

representative citing papers

Training Multi-Image Vision Agents via End2End Reinforcement Learning

cs.CV · 2025-12-05 · unverdicted · novelty 7.0

IMAgent trains a multi-image vision agent via pure end-to-end RL with visual reflection tools and a two-layer motion trajectory masking strategy, reaching SOTA on single- and multi-image benchmarks while revealing tool-use effects on attention.

Latent Visual Reasoning

cs.CV · 2025-09-29 · unverdicted · novelty 7.0

Latent Visual Reasoning enables autoregressive generation of latent visual states that reconstruct critical image tokens, yielding gains on perception-heavy VQA benchmarks such as 71.67% on MMVP.

Grounded Reinforcement Learning for Visual Reasoning

cs.CV · 2025-05-29 · unverdicted · novelty 6.0

ViGoRL introduces visually grounded RL that anchors reasoning steps to image coordinates and uses multi-turn zooming to outperform standard RL and supervised baselines on spatial and GUI reasoning benchmarks.

Decompose, Look, and Reason: Reinforced Latent Reasoning for VLMs

cs.CL · 2026-04-08 · unverdicted · novelty 5.0

DLR is a new reinforced latent reasoning method for VLMs that decomposes queries, uses continuous visual latents, and outperforms text-only and multimodal CoT baselines on vision-centric benchmarks with better interpretability.

Multimodal Chain-of-Thought Reasoning: A Comprehensive Survey

cs.CV · 2025-03-16 · unverdicted · novelty 2.0

The paper provides the first comprehensive survey of multimodal chain-of-thought reasoning, including foundational concepts, a taxonomy of methodologies, application analyses, challenges, and future directions.

citing papers explorer

Showing 13 of 13 citing papers.