Pith Number

pith:B4C6YM3X

pith:2025:B4C6YM3XSS3A3KANRUD7FB4JRK

not attested · not anchored · not stored · refs resolved

SpaceR: Reinforcing MLLMs in Video Spatial Reasoning

Fandong Meng, Haoning Wu, Hao Zhou, Jie Zhou, Kun Ouyang, Xu Sun, Yi Liu, Yuanxin Liu

SpaceR uses RL with a map imagination step to lift open MLLMs above GPT-4o on video spatial reasoning.

arxiv:2504.01805 v2 · 2025-04-02 · cs.CV

Add to your LaTeX paper

\usepackage{pith}
\pithnumber{B4C6YM3XSS3A3KANRUD7FB4JRK}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp

2 Internet Archive

3 Author claim open

4 Citations open

5 Replications open

✓ Portable graph bundle live

The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.
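The merge algorithm itself is not described on this page. As a minimal sketch of how a merge can be made deterministic, the example below deduplicates events by id, sorts them into a total order, and folds them into a state. The event fields (`id`, `ts`, `field`, `value`), the last-writer-wins rule, and the sample values are all hypothetical and are not taken from the Pith schema.

```python
from typing import Any


def merge_events(*event_lists: list[dict[str, Any]]) -> dict[str, Any]:
    """Fold events from any number of mirrors into one state.

    Hypothetical event shape: {"id", "ts", "field", "value"}.
    Assumes that two events with the same id carry the same content.
    """
    # Deduplicate by event id so an event seen on two mirrors counts once.
    unique = {e["id"]: e for events in event_lists for e in events}
    # Total order: timestamp first, event id as the tie-breaker.
    ordered = sorted(unique.values(), key=lambda e: (e["ts"], e["id"]))
    state: dict[str, Any] = {}
    for event in ordered:
        # Last writer wins per field (an assumption, not the Pith rule).
        state[event["field"]] = event["value"]
    return state


# Two mirrors hold overlapping event sets in different orders.
mirror_a = [
    {"id": "e1", "ts": "2026-05-17T23:38:51Z", "field": "author_claim", "value": "open"},
]
mirror_b = [
    {"id": "e2", "ts": "2026-05-18T08:00:00Z", "field": "author_claim", "value": "claimed"},
    mirror_a[0],
]

state_ab = merge_events(mirror_a, mirror_b)
state_ba = merge_events(mirror_b, mirror_a)
print(state_ab == state_ba)
```

Because the result depends only on the set of events and not on the order in which mirrors deliver them, any mirror that holds the same bundle arrives at the same state.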

Claims

C1 · strongest claim

SpaceR surpasses the advanced GPT-4o by 11.6% accuracy on VSI-Bench and is on par with the leading proprietary model Gemini-2.0-Flash.

C2 · weakest assumption

That the map imagination mechanism inside SG-RLVR genuinely improves spatial reasoning rather than merely increasing the chance of producing benchmark-correct answers during RL training.

C3 · one line summary

SpaceR uses a new verifiable dataset and map-imagination-augmented RLVR to reach SOTA spatial reasoning accuracy in MLLMs, exceeding GPT-4o on VSI-Bench.

References

45 extracted · 45 resolved · 19 Pith anchors

[1] Stanislaw Antol, Aishwarya Agrawal, Jiasen Lu, Margaret Mitchell, Dhruv Batra, C Lawrence Zitnick, and Devi Parikh. 2015. Vqa: Visual question answering. In Proceedings of the IEEE international confe 2015

[2] Qwen2.5-VL Technical Report 2025 · arXiv:2502.13923

[3] Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling 2024 · arXiv:2412.05271

[4] Zhe Chen, Jiannan Wu, Wenhai Wang, Weijie Su, Guo Chen, Sen Xing, Muyan Zhong, Qinglong Zhang, Xizhou Zhu, Lewei Lu, et al. 2024. Internvl: Scaling up vision foundation models and aligning for generic 2024

[5] Angela Dai, Angel X Chang, Manolis Savva, Maciej Halber, Thomas Funkhouser, and Matthias Nießner

Formal links

2 machine-checked theorem links

Cited by

34 papers in Pith

4DThinker: Thinking with 4D Imagery for Dynamic Spatial Understanding

VISD: Enhancing Video Reasoning via Structured Self-Distillation

SpatialScore: Towards Comprehensive Evaluation for Spatial Intelligence

GeoWeaver: Grounding Visual Tokens with Geometric Evidence before Scene Reasoning

Cambrian-P: Pose-Grounded Video Understanding

Receipt and verification

First computed	2026-05-17T23:38:51.111830Z
Builder	pith-number-builder-2026-05-17-v1
Signature	Pith Ed25519 (`pith-v1-2026-05`) · public key
Schema	pith-number/v1.0

Canonical hash

0f05ec337794b60da80d8d07f287898abc34417ece15ac30e8d6a3525a778f74
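The values on this page are consistent with the Pith Number being the first 26 characters of the RFC 4648 Base32 encoding of this canonical hash. That relationship is an observation from the two values shown here, not a rule stated by the page, and the helper name `pith_id_from_hash` is made up for illustration. A short sketch to check it:

```python
import base64

# Canonical hash as listed in this record.
canonical_hash = "0f05ec337794b60da80d8d07f287898abc34417ece15ac30e8d6a3525a778f74"


def pith_id_from_hash(hex_digest: str, length: int = 26) -> str:
    """Base32-encode the raw digest bytes and keep the first `length` characters.

    The 26-character length is an assumption inferred from this record.
    """
    encoded = base64.b32encode(bytes.fromhex(hex_digest)).decode("ascii")
    return encoded[:length]


pith_id = pith_id_from_hash(canonical_hash)
print(pith_id)
```

If the observation holds, the printed value matches the identifier at the top of the record.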

Aliases

arxiv: 2504.01805 · arxiv_version: 2504.01805v2 · doi: 10.48550/arxiv.2504.01805 · pith_short_12: B4C6YM3XSS3A · pith_short_16: B4C6YM3XSS3A3KAN · pith_short_8: B4C6YM3X
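The short aliases listed here are prefixes of the full 26-character identifier. The sketch below parses the alias line into a dictionary and checks that prefix relationship. The parsing format (entries separated by ` · `, key and value separated by `: `) is assumed from how the line is printed on this page, and `parse_aliases` is a hypothetical helper.

```python
aliases_line = (
    "arxiv: 2504.01805 · arxiv_version: 2504.01805v2 · "
    "doi: 10.48550/arxiv.2504.01805 · pith_short_12: B4C6YM3XSS3A · "
    "pith_short_16: B4C6YM3XSS3A3KAN · pith_short_8: B4C6YM3X"
)
full_id = "B4C6YM3XSS3A3KANRUD7FB4JRK"


def parse_aliases(line: str) -> dict[str, str]:
    """Split 'key: value · key: value' into a dict (format assumed from this page)."""
    aliases = {}
    for part in line.split(" · "):
        key, _, value = part.partition(": ")
        aliases[key.strip()] = value.strip()
    return aliases


aliases = parse_aliases(aliases_line)

# Each pith_short_N alias should equal the first N characters of the full id.
prefix_checks = {
    key: value == full_id[: int(key.rsplit("_", 1)[1])]
    for key, value in aliases.items()
    if key.startswith("pith_short_")
}
print(prefix_checks)
```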

Agent API

Resolver JSON · Graph JSON · Events JSON · Schema · Signing key

Verify this Pith Number yourself

# Fetch the record, extract canonical_record, re-serialize it with sorted keys
# and compact separators, then print the SHA-256 of the resulting UTF-8 bytes.
curl -sH 'Accept: application/ld+json' https://pith.science/pith/B4C6YM3XSS3A3KANRUD7FB4JRK \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 0f05ec337794b60da80d8d07f287898abc34417ece15ac30e8d6a3525a778f74

Canonical record JSON

{
  "metadata": {
    "abstract_canon_sha256": "b1128aaafe23aa5d3764f0fa8ed20ff71f4e2a50c317e25e7b99a4babf951644",
    "cross_cats_sorted": [],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.CV",
    "submitted_at": "2025-04-02T15:12:17Z",
    "title_canon_sha256": "eb73df9b782bd59f39d2a63a195316f9c257dd8db174b480a6c146662349dca0"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2504.01805",
    "kind": "arxiv",
    "version": 2
  }
}
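The same check can be run offline against the canonical record shown above, without calling the resolver. This sketch applies the serialization settings from the verification command (sorted keys, compact separators, `ensure_ascii=False`, UTF-8) and compares the digest with the listed canonical hash. It assumes the JSON printed on this page is exactly the record the service hashes; the function name `canonical_sha256` is illustrative.

```python
import hashlib
import json

# Canonical record copied from this page.
record = {
    "metadata": {
        "abstract_canon_sha256": "b1128aaafe23aa5d3764f0fa8ed20ff71f4e2a50c317e25e7b99a4babf951644",
        "cross_cats_sorted": [],
        "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
        "primary_cat": "cs.CV",
        "submitted_at": "2025-04-02T15:12:17Z",
        "title_canon_sha256": "eb73df9b782bd59f39d2a63a195316f9c257dd8db174b480a6c146662349dca0",
    },
    "schema_version": "1.0",
    "source": {"id": "2504.01805", "kind": "arxiv", "version": 2},
}

# Canonical hash as listed in this record.
expected = "0f05ec337794b60da80d8d07f287898abc34417ece15ac30e8d6a3525a778f74"


def canonical_sha256(obj) -> str:
    """Serialize with sorted keys and compact separators, then hash the UTF-8 bytes."""
    payload = json.dumps(
        obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


digest = canonical_sha256(record)
print(digest)
print("matches listed hash:", digest == expected)
```

Sorting keys before hashing is what makes the digest independent of the key order in which a mirror happens to store the record.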