Pith Number

pith:WUMYKSR7

pith:2025:WUMYKSR74ZAYOLM2FTE775KAD2
not attested · not anchored · not stored · refs resolved

DriveVLA-W0: World Models Amplify Data Scaling Law in Autonomous Driving

Bing Zhan, Chufeng Tang, Haochen Wang, Lue Fan, Lu Hou, Shuyao Shang, Weisong Liu, Xiaoman Wang, Yasong An, Yingyan Li, Yuntao Chen, Yuqi Wang, Zhaoxiang Zhang

Adding world modeling to predict future images lets vision-language-action models use large driving datasets more effectively and accelerate performance gains as data scales.

arxiv:2510.12796 v2 · 2025-10-14 · cs.CV · cs.AI

Add to your LaTeX paper
\usepackage{pith}
\pithnumber{WUMYKSR74ZAYOLM2FTE775KAD2}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open
4 Citations open
5 Replications open
Portable graph bundle live · download bundle · merged state
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.

Claims

C1 strongest claim

we propose DriveVLA-W0, a training paradigm that employs world modeling to predict future images. ... Crucially, it amplifies the data scaling law, showing that performance gains accelerate as the training dataset size increases.

C2 weakest assumption

That the added world modeling task of predicting future images supplies a dense, unbiased self-supervised signal that meaningfully utilizes unused model capacity without requiring extra labels or introducing new failure modes in driving dynamics.

C3 one-line summary

DriveVLA-W0 adds world modeling to predict future images in VLA models, overcoming sparse action supervision and amplifying data scaling laws on NAVSIM benchmarks and a large in-house dataset.

References

39 extracted · 39 resolved · 15 Pith anchors

[1] CoVLA: Comprehensive vision-language-action dataset for autonomous driving (2025)
[2] Qwen2.5-VL Technical Report · arXiv:2502.13923
[3] Scaling Laws of Motion Forecasting and Planning – Technical Report
[4] Vavim and vavam: Autonomous driving through video generative modeling
[6] $\pi_0$: A Vision-Language-Action Flow Model for General Robot Control · arXiv:2410.24164

Formal links

2 machine-checked theorem links

Cited by

27 papers in Pith

Receipt and verification
First computed 2026-05-17T23:38:14.789505Z
Builder pith-number-builder-2026-05-17-v1
Signature Pith Ed25519 (pith-v1-2026-05) · public key
Schema pith-number/v1.0

Canonical hash

b519854a3fe641872d9a2cc9fff5401e9cd4a492cba5a529daca1605f6176e25
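
For this record, the 26-character Pith Number matches the leading characters of the RFC 4648 base32 encoding of the canonical hash bytes. The sketch below checks that relationship for this one record. The page does not state it as the derivation rule, so treat it as an observation rather than a specification; `pith_number_from_hash` is a hypothetical helper name.

```python
import base64

# Values copied from this record.
CANONICAL_HASH = "b519854a3fe641872d9a2cc9fff5401e9cd4a492cba5a529daca1605f6176e25"
PITH_NUMBER = "WUMYKSR74ZAYOLM2FTE775KAD2"


def pith_number_from_hash(hex_digest: str, length: int = 26) -> str:
    """Hypothetical helper: base32-encode the digest bytes and keep a prefix.

    The prefix length of 26 is an assumption read off this record,
    not a documented rule.
    """
    encoded = base64.b32encode(bytes.fromhex(hex_digest)).decode("ascii")
    return encoded[:length]


derived = pith_number_from_hash(CANONICAL_HASH)
print(derived, derived == PITH_NUMBER)
```

If the relationship holds in general, anyone holding the canonical hash could recompute the identifier offline, without contacting the site.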

Aliases

arxiv: 2510.12796 · arxiv_version: 2510.12796v2 · doi: 10.48550/arxiv.2510.12796 · pith_short_12: WUMYKSR74ZAY · pith_short_16: WUMYKSR74ZAYOLM2 · pith_short_8: WUMYKSR7
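
The three `pith_short_*` aliases listed above are prefixes of the full Pith Number at 8, 12, and 16 characters. A minimal sketch of that relationship, assuming plain prefix truncation (`short_aliases` is a hypothetical helper, not part of any Pith API):

```python
PITH_NUMBER = "WUMYKSR74ZAYOLM2FTE775KAD2"


def short_aliases(pith_number: str, lengths=(8, 12, 16)) -> dict:
    """Hypothetical helper: build short aliases by truncating the full number."""
    return {f"pith_short_{n}": pith_number[:n] for n in lengths}


aliases = short_aliases(PITH_NUMBER)
for name, value in aliases.items():
    print(f"{name}: {value}")
```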
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/WUMYKSR74ZAYOLM2FTE775KAD2 \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: b519854a3fe641872d9a2cc9fff5401e9cd4a492cba5a529daca1605f6176e25
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "b0c36fc3151591d8a7b9a6ce74f682f50eb4bcf76c0dc2eaedbfb740b51fb4c2",
    "cross_cats_sorted": [
      "cs.AI"
    ],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.CV",
    "submitted_at": "2025-10-14T17:59:47Z",
    "title_canon_sha256": "0d6fd76fc2dd6307b1b71f8456c10fcefcf50453f3ad221fbbd9a0ed2deee3a4"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2510.12796",
    "kind": "arxiv",
    "version": 2
  }
}
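
The shell one-liner above can also be written as a standalone script that works offline from the record shown here. This is a sketch using the same canonicalization settings as that command (sorted keys, compact separators, `ensure_ascii=False`, UTF-8). Whether the record as transcribed on this page is byte-for-byte what the server returns is an assumption, so the script reports the comparison instead of asserting it. `canonical_sha256` is a hypothetical helper name.

```python
import hashlib
import json

# Canonical record as shown on this page.
RECORD_JSON = """
{
  "metadata": {
    "abstract_canon_sha256": "b0c36fc3151591d8a7b9a6ce74f682f50eb4bcf76c0dc2eaedbfb740b51fb4c2",
    "cross_cats_sorted": ["cs.AI"],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.CV",
    "submitted_at": "2025-10-14T17:59:47Z",
    "title_canon_sha256": "0d6fd76fc2dd6307b1b71f8456c10fcefcf50453f3ad221fbbd9a0ed2deee3a4"
  },
  "schema_version": "1.0",
  "source": {"id": "2510.12796", "kind": "arxiv", "version": 2}
}
"""

# Hash listed under "Canonical hash" on this page.
LISTED_HASH = "b519854a3fe641872d9a2cc9fff5401e9cd4a492cba5a529daca1605f6176e25"


def canonical_sha256(record: dict) -> str:
    """Serialize with sorted keys and no extra whitespace, then hash as UTF-8."""
    canonical = json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


record = json.loads(RECORD_JSON)
digest = canonical_sha256(record)
print(digest)
print("matches listed hash:", digest == LISTED_HASH)
```

Sorting keys and fixing the separators makes the digest independent of key order and whitespace in the input, which is what lets a mirror recompute the same value from its own copy of the record.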