Pith Number

pith:B4C6YM3X

pith:2025:B4C6YM3XSS3A3KANRUD7FB4JRK
not attested · not anchored · not stored · refs resolved

SpaceR: Reinforcing MLLMs in Video Spatial Reasoning

Fandong Meng, Haoning Wu, Hao Zhou, Jie Zhou, Kun Ouyang, Xu Sun, Yi Liu, Yuanxin Liu

SpaceR uses RL with a map imagination step to lift open MLLMs above GPT-4o on video spatial reasoning.

arxiv:2504.01805 v2 · 2025-04-02 · cs.CV

Add to your LaTeX paper
\usepackage{pith}
\pithnumber{B4C6YM3XSS3A3KANRUD7FB4JRK}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open
4 Citations open
5 Replications open
Portable graph bundle live
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.

Claims

C1 · strongest claim

SpaceR surpasses the advanced GPT-4o by 11.6% accuracy on VSI-Bench and is on par with the leading proprietary model Gemini-2.0-Flash.

C2 · weakest assumption

That the map imagination mechanism inside SG-RLVR genuinely improves spatial reasoning rather than merely increasing the chance of producing benchmark-correct answers during RL training.

C3 · one-line summary

SpaceR uses a new verifiable dataset and map-imagination-augmented RLVR to reach SOTA spatial reasoning accuracy in MLLMs, exceeding GPT-4o on VSI-Bench.

References

45 extracted · 45 resolved · 19 Pith anchors


Formal links

2 machine-checked theorem links

Cited by

34 papers in Pith

Receipt and verification
First computed: 2026-05-17T23:38:51.111830Z
Builder: pith-number-builder-2026-05-17-v1
Signature: Pith Ed25519 (pith-v1-2026-05)
Schema: pith-number/v1.0

Canonical hash

0f05ec337794b60da80d8d07f287898abc34417ece15ac30e8d6a3525a778f74

Aliases

arxiv: 2504.01805 · arxiv_version: 2504.01805v2 · doi: 10.48550/arxiv.2504.01805 · pith_short_12: B4C6YM3XSS3A · pith_short_16: B4C6YM3XSS3A3KAN · pith_short_8: B4C6YM3X
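On this record, the identifier and its short aliases appear to be derivable from the canonical hash: the 26-character Pith Number equals the first 26 characters of the RFC 4648 Base32 encoding of the SHA-256 digest, and each pith_short_N alias is a prefix of that string. This is an observation from the values on this page, not a documented Pith rule, and the function names below are hypothetical. A minimal Python sketch under that assumption:

```python
import base64

# Values copied from this record.
CANONICAL_HASH = "0f05ec337794b60da80d8d07f287898abc34417ece15ac30e8d6a3525a778f74"
PITH_NUMBER = "B4C6YM3XSS3A3KANRUD7FB4JRK"


def pith_number_from_hash(hex_digest: str, length: int = 26) -> str:
    """Base32-encode the digest bytes and keep the first `length` characters.

    Assumption: inferred from this single record, not from Pith documentation.
    """
    encoded = base64.b32encode(bytes.fromhex(hex_digest)).decode("ascii")
    return encoded[:length]


def short_aliases(pith_number: str, sizes=(8, 12, 16)) -> dict:
    """Build the pith_short_N aliases as plain prefixes of the identifier."""
    return {f"pith_short_{n}": pith_number[:n] for n in sizes}


derived = pith_number_from_hash(CANONICAL_HASH)
print(derived == PITH_NUMBER)   # compares against the identifier on this page
print(short_aliases(derived))
```

If the comparison holds for other records as well, a short alias can be treated as a truncated view of the same hash-derived identifier rather than a separate key.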
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/B4C6YM3XSS3A3KANRUD7FB4JRK \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 0f05ec337794b60da80d8d07f287898abc34417ece15ac30e8d6a3525a778f74
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "b1128aaafe23aa5d3764f0fa8ed20ff71f4e2a50c317e25e7b99a4babf951644",
    "cross_cats_sorted": [],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.CV",
    "submitted_at": "2025-04-02T15:12:17Z",
    "title_canon_sha256": "eb73df9b782bd59f39d2a63a195316f9c257dd8db174b480a6c146662349dca0"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2504.01805",
    "kind": "arxiv",
    "version": 2
  }
}
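
The curl pipeline above can also be reproduced offline against the record as printed on this page. The sketch below assumes, as that pipeline suggests, that the canonical hash is SHA-256 over the record serialized with sorted keys, compact separators, and UTF-8 encoding; the helper names are hypothetical and not part of any Pith API.

```python
import hashlib
import json

# Canonical record as shown on this page.
RECORD_JSON = """
{
  "metadata": {
    "abstract_canon_sha256": "b1128aaafe23aa5d3764f0fa8ed20ff71f4e2a50c317e25e7b99a4babf951644",
    "cross_cats_sorted": [],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.CV",
    "submitted_at": "2025-04-02T15:12:17Z",
    "title_canon_sha256": "eb73df9b782bd59f39d2a63a195316f9c257dd8db174b480a6c146662349dca0"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2504.01805",
    "kind": "arxiv",
    "version": 2
  }
}
"""

# Hash stated on this page under "Canonical hash".
EXPECTED = "0f05ec337794b60da80d8d07f287898abc34417ece15ac30e8d6a3525a778f74"


def canonical_bytes(record: dict) -> bytes:
    """Serialize with sorted keys and no whitespace, mirroring the one-liner."""
    text = json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return text.encode("utf-8")


def canonical_hash(record: dict) -> str:
    """SHA-256 hex digest of the canonical serialization."""
    return hashlib.sha256(canonical_bytes(record)).hexdigest()


record = json.loads(RECORD_JSON)
digest = canonical_hash(record)
print(digest)
print("matches expected:", digest == EXPECTED)
```

Because keys are sorted before hashing, the digest does not depend on the key order or indentation of the JSON as displayed; only the parsed values matter.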