pith:3FBW3A4A
4DThinker: Thinking with 4D Imagery for Dynamic Spatial Understanding
4DThinker lets vision-language models simulate evolving scenes inside their latent space for dynamic spatial reasoning from monocular video.
arxiv:2605.05997 v2 · 2026-05-07 · cs.CV
Add to your LaTeX paper
\usepackage{pith}
\pithnumber{3FBW3A4AFRFRBSMCK3MCSDEUVD}
Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.
Claims
4DThinker is the first framework that enables VLMs to 'think with 4D' through dynamic latent mental imagery, and extensive experiments demonstrate that it consistently outperforms strong baselines on dynamic spatial reasoning benchmarks.
The central assumption is that the annotation-free 4D data synthesis pipeline produces sufficiently rich and accurate supervision signals, and that jointly training textual tokens with 4D latents via DIFT, while restricting 4DRL policy gradients to text tokens, yields stable and superior intrinsic dynamic reasoning without external geometric modules.
4DThinker enables VLMs to perform dynamic spatial reasoning by internally simulating 4D imagery in latent space, outperforming prior text-based and modular approaches.
Receipt and verification
| First computed | 2026-05-25T02:01:22.041210Z |
|---|---|
| Builder | pith-number-builder-2026-05-17-v1 |
| Signature | Pith Ed25519 (pith-v1-2026-05) · public key |
| Schema | pith-number/v1.0 |
Canonical hash
d9436d83802c4b10c98256d8290c94a8f835f836115044eb71be6654a214e369
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/3FBW3A4AFRFRBSMCK3MCSDEUVD \
| jq -c '.canonical_record' \
| python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: d9436d83802c4b10c98256d8290c94a8f835f836115044eb71be6654a214e369
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "f4b835433cc3360f396de4aeb7b3cf8f5c7766855e33bf2c12c6d76b1405e159",
    "cross_cats_sorted": [],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.CV",
    "submitted_at": "2026-05-07T10:48:46Z",
    "title_canon_sha256": "c76117be01773671c93740f35074247895a1389591a521ba58600e1e6ddd0340"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2605.05997",
    "kind": "arxiv",
    "version": 2
  }
}
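The canonicalization step in the verification one-liner can also be sketched offline in Python. The helper name `canonical_sha256` is hypothetical, not part of any Pith tooling. The serialization settings (sorted keys, compact separators, `ensure_ascii=False`) are copied from the one-liner, and the record literal is copied from the JSON shown on this page. Whether the printed digest equals the published canonical hash depends on the API returning exactly this record, which this sketch does not check.

```python
import hashlib
import json


def canonical_sha256(record: dict) -> str:
    """Serialize with sorted keys and compact separators, then SHA-256 the UTF-8 bytes.

    Hypothetical helper: it mirrors the settings of the page's python3 one-liner.
    """
    canonical = json.dumps(
        record,
        sort_keys=True,
        separators=(",", ":"),
        ensure_ascii=False,
    ).encode()
    return hashlib.sha256(canonical).hexdigest()


# Record copied from the "Canonical record JSON" shown on this page.
record = {
    "metadata": {
        "abstract_canon_sha256": "f4b835433cc3360f396de4aeb7b3cf8f5c7766855e33bf2c12c6d76b1405e159",
        "cross_cats_sorted": [],
        "license": "http://creativecommons.org/licenses/by/4.0/",
        "primary_cat": "cs.CV",
        "submitted_at": "2026-05-07T10:48:46Z",
        "title_canon_sha256": "c76117be01773671c93740f35074247895a1389591a521ba58600e1e6ddd0340",
    },
    "schema_version": "1.0",
    "source": {"id": "2605.05997", "kind": "arxiv", "version": 2},
}

digest = canonical_sha256(record)
print(digest)  # compare by eye against the canonical hash listed on this page
```

Because keys are sorted and whitespace is normalized before hashing, the indentation and key order of the displayed JSON do not affect the digest. Only the keys and values do.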