Pith Number

pith:HDPXO5DK

pith:2026:HDPXO5DKHMNGVLETG4KPSF3HKT
not attested · not anchored · not stored · refs resolved

Quantitative Video World Model Evaluation for Geometric-Consistency

Jiaxin Wu, Xueyan Zou, Yihao Pi, Yinling Zhang, Yuheng Li

PDI-Bench quantifies geometric coherence in generated videos by measuring projective residuals from 3D lifts of tracked points.

arxiv:2605.15185 v1 · 2026-05-14 · cs.CV · cs.AI

Add to your LaTeX paper
\usepackage{pith}
\pithnumber{HDPXO5DKHMNGVLETG4KPSF3HKT}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open
4 Citations open
5 Replications open
Portable graph bundle live · download bundle · merged state
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.

Claims

C1 · strongest claim

Across state-of-the-art video generators, PDI reveals consistent geometry-specific failure modes that common perceptual metrics do not capture, and it provides a diagnostic signal for progress toward physically grounded video generation and physical world models.

C2 · weakest assumption

That monocular 3D reconstruction from the generated video (via tools such as MegaSaM) produces sufficiently accurate world-space coordinates to diagnose the generator's own geometric errors rather than injecting reconstruction artifacts.

C3 · one-line summary

PDI-Bench computes 3D projective residuals from segmented and tracked points to quantify geometric inconsistency in AI-generated videos.
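
This record does not state the PDI formula, so the following is only an illustration of what a "projective residual" can mean. It is a minimal sketch of a generic pinhole reprojection residual: a 3D point (for example, one lifted from the video by a reconstruction tool) is projected back into the image and compared with the 2D position of the tracked point. The function names, the single-focal-length camera model, and the mean aggregation are all assumptions made for illustration, not the paper's method.

```python
import math

def project(point, focal, cx, cy):
    """Pinhole projection of a camera-space 3D point (x, y, z) to pixel coordinates.

    Assumes one focal length for both axes and no lens distortion (illustrative only).
    """
    x, y, z = point
    if z <= 0:
        raise ValueError("point must lie in front of the camera (z > 0)")
    return (focal * x / z + cx, focal * y / z + cy)

def reprojection_residual(point, observed, focal, cx, cy):
    """Pixel distance between the projected 3D point and the tracked 2D observation."""
    u, v = project(point, focal, cx, cy)
    return math.hypot(u - observed[0], v - observed[1])

def mean_residual(points, observations, focal, cx, cy):
    """Average residual over paired 3D points and 2D observations (hypothetical aggregate)."""
    residuals = [
        reprojection_residual(p, o, focal, cx, cy)
        for p, o in zip(points, observations)
    ]
    return sum(residuals) / len(residuals)
```

Under this sketch, a geometrically consistent video would yield small residuals, while the weakest-assumption caveat (C2) corresponds to the case where the 3D points themselves are wrong and inflate the residual for reasons unrelated to the generator.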

References

64 extracted · 64 resolved · 10 Pith anchors

[1] K. Allen, C. Doersch, G. Zhou, M. Suhail, D. Driess, I. Rocco, Y. Rubanova, T. Kipf, M. S. M. Sajjadi, K. Murphy, J. Carreira, and S. van Steenkiste. Direct motion models for assessing generated video
[2] URL https://arxiv.org/abs/2505.00209
[3] M. Asim, C. Wewer, T. Wimmer, B. Schiele, and J. E. Lenssen. Met3r: Measuring multi-view consistency in generated images, 2026. URL https://arxiv.org/abs/2501.06336 2026
[4] Videophy: Evaluating physical commonsense for video generation 2024
[5] Stable Video Diffusion: Scaling Latent Video Diffusion Models to Large Datasets 2023 · arXiv:2311.15127

Formal links

2 machine-checked theorem links

Receipt and verification
First computed 2026-05-17T21:40:25.107372Z
Last reissued 2026-05-17T21:57:18.491550Z
Builder pith-number-builder-2026-05-17-v1
Signature unsigned_v0
Schema pith-number/v1.0

Canonical hash

38df77746a3b1a6aac933714f9176754d8866f31af51a8593c473b3d23a607e7

Aliases

arxiv: 2605.15185 · arxiv_version: 2605.15185v1 · pith_short_12: HDPXO5DKHMNG · pith_short_16: HDPXO5DKHMNGVLET · pith_short_8: HDPXO5DK
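
The values on this page suggest how the identifiers relate: the short aliases are prefixes of the full Pith Number, and the Pith Number itself matches the first 26 characters of the standard Base32 encoding of the canonical hash. The sketch below checks that relationship for this record. It is an observation drawn from these particular values, not a documented rule of the schema, and the helper names are hypothetical.

```python
import base64

CANONICAL_HASH = "38df77746a3b1a6aac933714f9176754d8866f31af51a8593c473b3d23a607e7"
PITH_NUMBER = "HDPXO5DKHMNGVLETG4KPSF3HKT"

def base32_prefix(hex_digest, length):
    """Base32-encode (RFC 4648 alphabet) the raw digest bytes and keep the first `length` characters."""
    encoded = base64.b32encode(bytes.fromhex(hex_digest)).decode("ascii")
    return encoded[:length]

def short_aliases(pith_number, lengths=(8, 12, 16)):
    """Build the pith_short_N aliases by truncating the full number (assumed convention)."""
    return {f"pith_short_{n}": pith_number[:n] for n in lengths}

if __name__ == "__main__":
    print(base32_prefix(CANONICAL_HASH, 26) == PITH_NUMBER)
    print(short_aliases(PITH_NUMBER))
```

If the relationship holds in general, a client could sanity-check a Pith Number against a recomputed canonical hash without contacting the server.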
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/HDPXO5DKHMNGVLETG4KPSF3HKT \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 38df77746a3b1a6aac933714f9176754d8866f31af51a8593c473b3d23a607e7
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "2ad5c571057bcc27dfc05075d4d48a519b2cb4bb92f5b35c0bc663ec6d91284e",
    "cross_cats_sorted": [
      "cs.AI"
    ],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.CV",
    "submitted_at": "2026-05-14T17:59:04Z",
    "title_canon_sha256": "54a69f0e7bf5f6bfbca1d7fd0698c3fb77fc02d41bcdf1af94ce3fa33b5852f9"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2605.15185",
    "kind": "arxiv",
    "version": 1
  }
}
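
The verification command above canonicalizes the record by sorting keys and removing insignificant whitespace before hashing. The following is a minimal offline sketch of the same steps in Python, using the record exactly as displayed here. The function names are hypothetical, and the sketch does not assert that the displayed JSON reproduces the published hash; it only shows that the serialization is independent of key order.

```python
import hashlib
import json

def canonical_bytes(record):
    """Serialize with sorted keys and compact separators, mirroring the one-liner above."""
    text = json.dumps(record, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return text.encode("utf-8")

def canonical_sha256(record):
    """Hex SHA-256 digest of the canonical serialization."""
    return hashlib.sha256(canonical_bytes(record)).hexdigest()

# Record copied from the page; values are reproduced verbatim.
RECORD = {
    "metadata": {
        "abstract_canon_sha256": "2ad5c571057bcc27dfc05075d4d48a519b2cb4bb92f5b35c0bc663ec6d91284e",
        "cross_cats_sorted": ["cs.AI"],
        "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
        "primary_cat": "cs.CV",
        "submitted_at": "2026-05-14T17:59:04Z",
        "title_canon_sha256": "54a69f0e7bf5f6bfbca1d7fd0698c3fb77fc02d41bcdf1af94ce3fa33b5852f9",
    },
    "schema_version": "1.0",
    "source": {"id": "2605.15185", "kind": "arxiv", "version": 1},
}

if __name__ == "__main__":
    # Compare this digest with the canonical hash listed on the page.
    print(canonical_sha256(RECORD))
```

Sorting keys is what makes the digest reproducible: two mirrors that load the same record with different key order still serialize it to identical bytes.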