pith:GHNBNG4U
SANA-WM: Efficient Minute-Scale World Modeling with Hybrid Linear Diffusion Transformer
SANA-WM generates minute-scale 720p videos with camera control at 36 times higher throughput than prior open-source models.
arxiv:2605.15178 v1 · 2026-05-14 · cs.CV
Add to your LaTeX paper:
```latex
\usepackage{pith}
\pithnumber{GHNBNG4UEWLWPVU2NRB2RO2EY3}
```
Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.
Claims
SANA-WM achieves visual quality comparable to large-scale industrial baselines such as LingBot-World and HY-WorldPlay, while significantly improving efficiency... On our one-minute world-model benchmark, SANA-WM demonstrates stronger action-following accuracy than prior open-source baselines and achieves comparable visual quality at 36× higher throughput for scalable world modeling.
The robust annotation pipeline extracts accurate metric-scale 6-DoF camera poses from public videos to yield high-quality, spatiotemporally consistent action labels that enable effective training of the world model.
SANA-WM is a 2.6B-parameter efficient world model that synthesizes minute-scale 720p videos with 6-DoF camera control. It is trained on 213K public clips in 15 days on 64 H100 GPUs and runs on a single GPU at 36× higher throughput than prior open-source baselines.
Receipt and verification
| Field | Value |
|---|---|
| First computed | 2026-05-17T21:40:25.203865Z |
| Last reissued | 2026-05-17T21:57:18.558439Z |
| Builder | pith-number-builder-2026-05-17-v1 |
| Signature | unsigned_v0 |
| Schema | pith-number/v1.0 |
Canonical hash
31da169b94259767d69a6c43a8bb44c6d0d47bdb22323b83a8eea4144ede5b37
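In this record, the 26-character Pith Number matches the RFC 4648 base32 encoding of the first 130 bits of the canonical hash. The short form `pith:GHNBNG4U` is its first 8 characters (40 bits). This relationship is inferred from this single record, not from a published Pith specification, so the sketch below is an assumption. The helper name `pith_number_from_hash` is hypothetical.

```python
import base64


def pith_number_from_hash(canonical_hash_hex: str, length: int = 26) -> str:
    """Hypothetical helper: derive a Pith-style identifier from a SHA-256 hex digest.

    Assumption (inferred from this record only): the identifier is the
    RFC 4648 base32 text of the first `length * 5` bits of the digest.
    """
    digest = bytes.fromhex(canonical_hash_hex)
    if len(digest) != 32:
        raise ValueError("expected a 32-byte SHA-256 digest")
    if not 1 <= length <= 51:  # 51 chars * 5 bits = 255 bits, the most a 256-bit digest fully covers
        raise ValueError("length must be between 1 and 51")
    bits_needed = length * 5          # each base32 character encodes 5 bits
    n_bytes = -(-bits_needed // 8)    # ceiling division: whole bytes covering those bits
    encoded = base64.b32encode(digest[:n_bytes]).decode("ascii")
    # Keep only characters fully determined by real digest bits (this also drops '=' padding).
    return encoded[:length]


canonical_hash = "31da169b94259767d69a6c43a8bb44c6d0d47bdb22323b83a8eea4144ede5b37"
print(pith_number_from_hash(canonical_hash))     # GHNBNG4UEWLWPVU2NRB2RO2EY3
print(pith_number_from_hash(canonical_hash, 8))  # GHNBNG4U
```

Because each base32 character consumes exactly 5 bits, the short and long forms are prefixes of the same encoding. A record's short ID therefore always agrees with its full number.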
Verify this Pith Number yourself
```sh
curl -sH 'Accept: application/ld+json' https://pith.science/pith/GHNBNG4UEWLWPVU2NRB2RO2EY3 \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 31da169b94259767d69a6c43a8bb44c6d0d47bdb22323b83a8eea4144ede5b37
```
Canonical record JSON
```json
{
  "metadata": {
    "abstract_canon_sha256": "eef6e0529d268c02237d963884733f3ac70a5b621928b74e9f4c12eaa73e4cd1",
    "cross_cats_sorted": [],
    "license": "http://creativecommons.org/licenses/by-nc-sa/4.0/",
    "primary_cat": "cs.CV",
    "submitted_at": "2026-05-14T17:58:03Z",
    "title_canon_sha256": "ac7f02e2b426f77376b0a94085db9f39f21a7eac26efeb50d349b13165daa19a"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2605.15178",
    "kind": "arxiv",
    "version": 1
  }
}
```
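Readers without `jq` can reproduce the pipeline's canonicalization step in Python alone. This sketch assumes the API's `canonical_record` is exactly the object listed above. `canonical_sha256` is a hypothetical helper name, and the `json.dumps` arguments are copied from the one-liner.

```python
import hashlib
import json


def canonical_sha256(record: dict) -> str:
    """SHA-256 of a JSON object under the canonicalization used in the curl pipeline."""
    # Sorted keys (at every nesting level), no whitespace, and raw UTF-8
    # (ensure_ascii=False) yield one byte string per logical record.
    canonical = json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# The canonical record shown above, transcribed as a Python dict.
record = {
    "metadata": {
        "abstract_canon_sha256": "eef6e0529d268c02237d963884733f3ac70a5b621928b74e9f4c12eaa73e4cd1",
        "cross_cats_sorted": [],
        "license": "http://creativecommons.org/licenses/by-nc-sa/4.0/",
        "primary_cat": "cs.CV",
        "submitted_at": "2026-05-14T17:58:03Z",
        "title_canon_sha256": "ac7f02e2b426f77376b0a94085db9f39f21a7eac26efeb50d349b13165daa19a",
    },
    "schema_version": "1.0",
    "source": {"id": "2605.15178", "kind": "arxiv", "version": 1},
}

expected = "31da169b94259767d69a6c43a8bb44c6d0d47bdb22323b83a8eea4144ede5b37"
digest = canonical_sha256(record)
print(digest)
print("match" if digest == expected else "mismatch")
```

Sorting keys and fixing the separators makes the hash independent of how a server orders or spaces its JSON. Reordering the keys of `record` leaves the digest unchanged.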