Visual spatial reasoning

Fangyu Liu, Guy Emerson, Nigel Collier · 2023

3 Pith papers cite this work. Polarity classification is still indexing.

3 Pith papers citing it

browse 3 citing papers

representative citing papers

MuirBench: A Comprehensive Benchmark for Robust Multi-image Understanding

cs.CV · 2024-06-13 · conditional · novelty 7.0

MuirBench is a new benchmark showing that top multimodal LLMs struggle with robust multi-image understanding, with GPT-4o at 68% and open-source models below 33% accuracy.

Reinforcing Spatial Reasoning in Vision-Language Models with Interwoven Thinking and Visual Drawing

cs.CV · 2025-06-11 · unverdicted · novelty 6.0

VILASR integrates visual drawing operations with reasoning in LVLMs via cold-start synthetic training, reflective rejection sampling, and reinforcement learning, yielding an 18.4% average gain on spatial reasoning benchmarks.

MiniGPT-v2: large language model as a unified interface for vision-language multi-task learning

cs.CV · 2023-10-14 · unverdicted · novelty 5.0

MiniGPT-v2 adds unique task identifiers to a large language model so one system can perform image description, visual question answering, and visual grounding after three-stage training.

citing papers explorer

Showing 3 of 3 citing papers.

MuirBench: A Comprehensive Benchmark for Robust Multi-image Understanding cs.CV · 2024-06-13 · conditional · none · ref 33
MuirBench is a new benchmark showing that top multimodal LLMs struggle with robust multi-image understanding, with GPT-4o at 68% and open-source models below 33% accuracy.
Reinforcing Spatial Reasoning in Vision-Language Models with Interwoven Thinking and Visual Drawing cs.CV · 2025-06-11 · unverdicted · none · ref 38
VILASR integrates visual drawing operations with reasoning in LVLMs via cold-start synthetic training, reflective rejection sampling, and reinforcement learning, yielding an 18.4% average gain on spatial reasoning benchmarks.
MiniGPT-v2: large language model as a unified interface for vision-language multi-task learning cs.CV · 2023-10-14 · unverdicted · none · ref 25
MiniGPT-v2 adds unique task identifiers to a large language model so one system can perform image description, visual question answering, and visual grounding after three-stage training.

Visual spatial reasoning

fields

years

verdicts

representative citing papers

citing papers explorer