Pith Number

pith:C32FX3ZN

pith:2025:C32FX3ZNYSA4UYVQ3E2E3C6UAA
Status: not attested · not anchored · not stored · refs resolved

GAIA-2: A Controllable Multi-View Generative World Model for Autonomous Driving

Anthony Hu, Elahe Arani, George Fedoseev, Gianluca Corrado, Jamie Shotton, Lloyd Russell, Lorenzo Bertoni

GAIA-2 generates high-resolution multi-camera driving videos from structured inputs like vehicle dynamics, agent positions, and road semantics.

arxiv:2503.20523 v1 · 2025-03-26 · cs.CV · cs.AI · cs.RO

Add to your LaTeX paper
\usepackage{pith}
\pithnumber{C32FX3ZNYSA4UYVQ3E2E3C6UAA}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open
4 Citations open
5 Replications open
Portable graph bundle live
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.

Claims

C1 · strongest claim

GAIA-2 supports controllable video generation conditioned on a rich set of structured inputs: ego-vehicle dynamics, agent configurations, environmental factors, and road semantics. It generates high-resolution, spatiotemporally consistent multi-camera videos across geographically diverse driving environments.

C2 · weakest assumption

That the generated videos are sufficiently realistic, consistent, and free of artifacts to serve as effective training data for autonomous driving systems without introducing biases or failures when transferred to real vehicles.

C3 · one-line summary

GAIA-2 is a controllable latent diffusion world model that produces spatiotemporally consistent multi-view videos for autonomous driving simulation across diverse geographies.

References

52 extracted · 52 resolved · 2 Pith anchors

[1] D. P. Kingma and M. Welling. Auto-Encoding Variational Bayes. Proceedings of the International Conference on Learning Representations (ICLR), 2014.
[2] Cosmos World Foundation Model Platform for Physical AI. arXiv:2501.03575, 2025.
[3] van den Oord, O. 2017.
[4] P. Esser, R. Rombach, and B. Ommer. Taming Transformers for High-Resolution Image Synthesis. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2021.
[5] GAIA-1: A Generative World Model for Autonomous Driving. arXiv:2309.17080, 2023.

Formal links

2 machine-checked theorem links

Cited by

32 papers in Pith

Receipt and verification
First computed: 2026-05-17T23:38:52.407613Z
Builder: pith-number-builder-2026-05-17-v1
Signature: Pith Ed25519 (pith-v1-2026-05)
Schema: pith-number/v1.0

Canonical hash

16f45bef2dc481ca62b0d9344d8bd40035eaae2c01e698f7c62fd2e37dd2db93

Aliases

arxiv: 2503.20523 · arxiv_version: 2503.20523v1 · doi: 10.48550/arxiv.2503.20523 · pith_short_12: C32FX3ZNYSA4 · pith_short_16: C32FX3ZNYSA4UYVQ · pith_short_8: C32FX3ZN
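On this record, the Pith Number appears to be derived directly from the canonical hash. Encoding the first 16 bytes of the canonical hash with unpadded RFC 4648 base32 reproduces C32FX3ZNYSA4UYVQ3E2E3C6UAA exactly, and the pith_short_8, pith_short_12 and pith_short_16 aliases are its 8-, 12- and 16-character prefixes. The sketch below checks that relationship using only the Python standard library. The rule is inferred from this single record rather than taken from a published Pith specification, so treat it as an assumption to confirm against the pith-number/v1.0 schema. The helper names `pith_number_from_hash` and `short_aliases` are hypothetical.

```python
import base64

# ASSUMPTION (inferred from this record only, not a documented Pith rule):
# Pith Number = unpadded RFC 4648 base32 of the first 16 bytes of the
# canonical SHA-256 hash.
CANONICAL_HASH = "16f45bef2dc481ca62b0d9344d8bd40035eaae2c01e698f7c62fd2e37dd2db93"


def pith_number_from_hash(hex_digest: str) -> str:
    """Base32-encode the first 16 bytes of a hex SHA-256 digest, dropping '=' padding."""
    digest = bytes.fromhex(hex_digest)
    if len(digest) != 32:
        raise ValueError("expected a 32-byte SHA-256 digest")
    # 16 bytes -> 26 base32 characters plus 6 '=' padding characters
    return base64.b32encode(digest[:16]).decode("ascii").rstrip("=")


def short_aliases(pith_number: str) -> dict:
    """Hypothetical helper: prefix aliases as listed on the record page."""
    return {f"pith_short_{n}": pith_number[:n] for n in (8, 12, 16)}


if __name__ == "__main__":
    number = pith_number_from_hash(CANONICAL_HASH)
    print(number)  # C32FX3ZNYSA4UYVQ3E2E3C6UAA
    print(short_aliases(number))
```

If the rule holds in general, a mirror could recompute both the Pith Number and its short aliases from the canonical record alone, with no separate lookup table.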
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/C32FX3ZNYSA4UYVQ3E2E3C6UAA \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 16f45bef2dc481ca62b0d9344d8bd40035eaae2c01e698f7c62fd2e37dd2db93
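The same check can be done in Python alone, without `curl` or `jq`. This is a minimal sketch. It assumes the JSON-LD response keeps the record under a top-level `canonical_record` key, as the `jq` filter implies, and it mirrors the one-liner's serialization settings: sorted keys, no whitespace, and raw UTF-8 (`ensure_ascii=False`). Under those assumptions, both approaches should hash identical bytes. The function names are illustrative.

```python
import hashlib
import json
import urllib.request

RECORD_URL = "https://pith.science/pith/C32FX3ZNYSA4UYVQ3E2E3C6UAA"


def canonical_bytes(record) -> bytes:
    """Serialize as the shell one-liner does: sorted keys, compact separators, UTF-8."""
    return json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")


def canonical_sha256(record) -> str:
    """Hex SHA-256 of the canonical serialization."""
    return hashlib.sha256(canonical_bytes(record)).hexdigest()


def fetch_canonical_record(url: str = RECORD_URL, timeout: float = 30.0) -> dict:
    """Fetch the JSON-LD document (assumed shape) and return its canonical_record member."""
    req = urllib.request.Request(url, headers={"Accept": "application/ld+json"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)["canonical_record"]
```

With network access, `canonical_sha256(fetch_canonical_record())` should return the canonical hash listed above if the served record is unchanged. Sorting keys and removing whitespace make the digest independent of how a particular server or mirror orders and pretty-prints the JSON.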
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "fb599b61127c4e4ed010c5302ce50dd0aec525513754619b7e36da42bc535029",
    "cross_cats_sorted": [
      "cs.AI",
      "cs.RO"
    ],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.CV",
    "submitted_at": "2025-03-26T13:11:35Z",
    "title_canon_sha256": "da0c2c4ad3f32678ee5c887c2e1236f46fc1af634d54bd819e57d92fa8043270"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2503.20523",
    "kind": "arxiv",
    "version": 1
  }
}