Enhancing spatial reasoning in vision-language models via chain-of-thought prompting and reinforcement learning

2025 · arXiv 2507.13362

3 Pith papers cite this work. Polarity classification is still in progress.

representative citing papers

SPATIOROUTE: Dynamic Prompt Routing for Zero-Shot Spatial Reasoning

cs.CV · 2026-05-18 · unverdicted · novelty 6.0

SpatioRoute introduces dynamic prompt routing that improves zero-shot spatial VQA accuracy by up to 5% on the SQA3D benchmark across VLMs without 3D inputs or fine-tuning.

Does RLVR Extend Reasoning Boundaries? Investigating Capability Expansion in Vision-Language Models

cs.AI · 2025-11-01 · unverdicted · novelty 6.0

RLVR on synthetic mazes enables VLMs to solve spatial reasoning tasks unreachable by the base model and generalizes to real-world navigation benchmarks.

Eyes on VLM: Benchmarking Gaze Following and Social Gaze Prediction in Vision Language Models

cs.CV · 2026-05-19 · unverdicted · novelty 5.0 · 2 refs

EyeVLM benchmark finds that current VLMs underperform specialized visual models on gaze following and social gaze prediction, with fine-tuning narrowing but not closing the gap.

citing papers explorer

Showing 3 of 3 citing papers.

SPATIOROUTE: Dynamic Prompt Routing for Zero-Shot Spatial Reasoning · cs.CV · 2026-05-18 · unverdicted · none · ref 5
Does RLVR Extend Reasoning Boundaries? Investigating Capability Expansion in Vision-Language Models · cs.AI · 2025-11-01 · unverdicted · none · ref 16
Eyes on VLM: Benchmarking Gaze Following and Social Gaze Prediction in Vision Language Models · cs.CV · 2026-05-19 · unverdicted · none · ref 40 · 2 links
