Title resolution pending

Estimate the center location of each instance within the provided categories, assuming the entire scene is represented by a 10x10 grid

2 Pith papers cite this work. Polarity classification is still indexing.

Title metadata for this work has not finished resolving. The hub is built from the citation graph; the title resolver will retry the DOI and OpenAlex lookups on its next pass.

representative citing papers

Unleashing Spatial Reasoning in Multimodal Large Language Models via Textual Representation Guided Reasoning

cs.CV · 2026-03-24 · unverdicted · novelty 6.0

TRACE prompting induces MLLMs to produce textual allocentric 3D representations from video, yielding consistent gains on spatial QA benchmarks across multiple model backbones.

Thinking in Space: How Multimodal Large Language Models See, Remember, and Recall Spaces

cs.CV · 2024-12-18 · unverdicted · novelty 6.0

MLLMs achieve competitive but subhuman performance on the new VSI-Bench for visual-spatial intelligence from videos, with spatial reasoning as the main bottleneck and explicit cognitive map generation improving distance estimation.

citing papers explorer

Showing 2 of 2 citing papers.

Unleashing Spatial Reasoning in Multimodal Large Language Models via Textual Representation Guided Reasoning cs.CV · 2026-03-24 · unverdicted · none · ref 7
Thinking in Space: How Multimodal Large Language Models See, Remember, and Recall Spaces cs.CV · 2024-12-18 · unverdicted · none · ref 110
