Pith Number

pith:WSWSFIL3

pith:2026:WSWSFIL3PWRKV7LT7CFFZNFAYW
not attested · not anchored · not stored · refs resolved

Entity-Centric World Models: Interaction-Aware Masking for Causal Video Prediction

Santosh Kumar Paidi

Motion-centric masking in self-supervised video models enables better learning of causal physical dynamics by focusing on interactions rather than static patches.

arxiv:2605.15466 v1 · 2026-05-14 · cs.CV

Add to your LaTeX paper
\usepackage{pith}
\pithnumber{WSWSFIL3PWRKV7LT7CFFZNFAYW}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1. Bitcoin timestamp
2. Internet Archive
3. Author claim: open
4. Citations: open
5. Replications: open
Portable graph bundle: live
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.

Claims

C1 · strongest claim

IA-JEPA achieves 14.26% accuracy on causal reasoning tasks on CLEVRER, a significant lead over the 3.22% achieved by standard patch-masked baselines, while inducing a higher-entropy latent space that linearizes physical energy (R²=0.43).

C2 · weakest assumption

The paper assumes that self-supervised, motion-centric masking that targets collisions and momentum transfers will force the model to reconstruct latent trajectories rather than static background features, as stated in the hypothesis in the abstract.

C3 · one-line summary

IA-JEPA applies interaction-aware masking to JEPA, raising causal reasoning accuracy on CLEVRER from 3.22% to 14.26% while producing a higher-entropy latent space that better aligns with physical energy.

References

35 extracted · 35 resolved · 3 Pith anchors

[1] Video generation models as world simulators. OpenAI Blog, 2024
[2] V-JEPA 2: Self-Supervised Video Models Enable Understanding, Prediction and Planning 2025 · arXiv:2506.09985
[3] CLEVRER: CoLlision Events for Video REpresentation and Reasoning 2019 · arXiv:1910.01442
[4] something something 2017
[5] Phyre: A new benchmark for physical reasoning 2019

Formal links

2 machine-checked theorem links

Receipt and verification
First computed: 2026-05-20T00:01:00.062944Z
Builder: pith-number-builder-2026-05-17-v1
Signature: Pith Ed25519 (pith-v1-2026-05) · public key
Schema: pith-number/v1.0

Canonical hash

b4ad22a17b7da2aafd73f88a5cb4a0c59698f0873f327fa9796da66835ee9852

Aliases

arxiv: 2605.15466 · arxiv_version: 2605.15466v1 · doi: 10.48550/arxiv.2605.15466 · pith_short_12: WSWSFIL3PWRK · pith_short_16: WSWSFIL3PWRKV7LT · pith_short_8: WSWSFIL3
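The identifiers on this page appear to be related in a simple way: the short aliases are prefixes of the full Pith Number, and the Pith Number itself matches the first 26 characters of the RFC 4648 Base32 encoding of the canonical hash. This relationship is inferred from the values shown in this record, not from a published specification, so the sketch below is a consistency check under that assumption. The helper names `pith_number_from_hash` and `short_alias` are hypothetical.

```python
import base64

# Values copied from this record.
CANONICAL_HASH = "b4ad22a17b7da2aafd73f88a5cb4a0c59698f0873f327fa9796da66835ee9852"
PITH_NUMBER = "WSWSFIL3PWRKV7LT7CFFZNFAYW"


def pith_number_from_hash(hex_digest: str, length: int = 26) -> str:
    """Base32-encode the digest bytes and keep the first `length` characters.

    Assumption: the Base32 derivation and the 26-character length are
    inferred from this record, not taken from a documented scheme.
    """
    encoded = base64.b32encode(bytes.fromhex(hex_digest)).decode("ascii")
    return encoded[:length]


def short_alias(pith_number: str, length: int) -> str:
    """Return a prefix alias such as pith_short_8, pith_short_12 or pith_short_16."""
    return pith_number[:length]


derived = pith_number_from_hash(CANONICAL_HASH)
print("derived Pith Number matches record:", derived == PITH_NUMBER)
for n in (8, 12, 16):
    print(f"pith_short_{n}: {short_alias(PITH_NUMBER, n)}")
```

If the assumption holds for other records as well, a mirror could recompute any alias from the canonical hash alone, without trusting the alias fields.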
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/WSWSFIL3PWRKV7LT7CFFZNFAYW \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: b4ad22a17b7da2aafd73f88a5cb4a0c59698f0873f327fa9796da66835ee9852
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "a5905b24c4e2218cf1e1ed35f581c36e863d935b7e1b197cab33bdc896d1b150",
    "cross_cats_sorted": [],
    "license": "http://creativecommons.org/licenses/by-nc-sa/4.0/",
    "primary_cat": "cs.CV",
    "submitted_at": "2026-05-14T23:10:04Z",
    "title_canon_sha256": "65f8a46fe3a9c8de7d95549e2bf65b5fc800aea31088f93438fcb4bdf24ebb1d"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2605.15466",
    "kind": "arxiv",
    "version": 1
  }
}
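The verification one-liner above can also be run offline against a saved copy of the canonical record. The following is a minimal sketch that mirrors the same canonicalization (sorted keys, compact separators, UTF-8 without ASCII escaping) and hashes the result with SHA-256. The function names `canonical_bytes` and `canonical_hash` are hypothetical, and whether the digest reproduces the published hash depends on the pasted record being equivalent to what the API returns.

```python
import hashlib
import json

# Canonical hash published in this record.
EXPECTED = "b4ad22a17b7da2aafd73f88a5cb4a0c59698f0873f327fa9796da66835ee9852"

# Canonical record as shown on this page.
RECORD_JSON = """
{
  "metadata": {
    "abstract_canon_sha256": "a5905b24c4e2218cf1e1ed35f581c36e863d935b7e1b197cab33bdc896d1b150",
    "cross_cats_sorted": [],
    "license": "http://creativecommons.org/licenses/by-nc-sa/4.0/",
    "primary_cat": "cs.CV",
    "submitted_at": "2026-05-14T23:10:04Z",
    "title_canon_sha256": "65f8a46fe3a9c8de7d95549e2bf65b5fc800aea31088f93438fcb4bdf24ebb1d"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2605.15466",
    "kind": "arxiv",
    "version": 1
  }
}
"""


def canonical_bytes(record) -> bytes:
    """Serialize with sorted keys and no whitespace, as in the shell one-liner."""
    text = json.dumps(record, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return text.encode("utf-8")


def canonical_hash(record) -> str:
    """Return the SHA-256 hex digest of the canonical serialization."""
    return hashlib.sha256(canonical_bytes(record)).hexdigest()


record = json.loads(RECORD_JSON)
digest = canonical_hash(record)
print(digest)
print("match" if digest == EXPECTED else "mismatch")
```

Sorting keys before hashing makes the digest independent of the key order in the stored JSON, which is what lets independent mirrors arrive at the same value.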