thinking with images

Shuoshuo Zhang, Yizhen Zhang, Jingjing Fu, Lei Song, Jiang Bian, Yujiu Yang, Rui Wang · 2025 · arXiv 2512.22120

2 Pith papers cite this work. Polarity classification is still indexing.

2 Pith papers citing it

representative citing papers

What to Ignore, What to React: Visually Robust RL Fine-Tuning of VLA Models

cs.RO · 2026-05-13 · conditional · novelty 6.0

PAIR-VLA adds invariance and sensitivity objectives over paired visual variants during PPO fine-tuning of VLA models, yielding 9-16% average gains on ManiSkill3 under distractors, textures, poses, viewpoints, and lighting shifts.

OpenVLThinkerV2: A Generalist Multimodal Reasoning Model for Multi-domain Visual Tasks

cs.CV · 2026-04-09 · unverdicted · novelty 5.0

OpenVLThinkerV2 applies a new Gaussian GRPO training objective with response and entropy shaping to outperform prior open-source and proprietary models on 18 visual reasoning benchmarks.

citing papers explorer

Showing 2 of 2 citing papers.

What to Ignore, What to React: Visually Robust RL Fine-Tuning of VLA Models cs.RO · 2026-05-13 · conditional · none · ref 40
PAIR-VLA adds invariance and sensitivity objectives over paired visual variants during PPO fine-tuning of VLA models, yielding 9-16% average gains on ManiSkill3 under distractors, textures, poses, viewpoints, and lighting shifts.
OpenVLThinkerV2: A Generalist Multimodal Reasoning Model for Multi-domain Visual Tasks cs.CV · 2026-04-09 · unverdicted · none · ref 41
OpenVLThinkerV2 applies a new Gaussian GRPO training objective with response and entropy shaping to outperform prior open-source and proprietary models on 18 visual reasoning benchmarks.

thinking with images

fields

years

verdicts

representative citing papers

citing papers explorer