Infrastructure-Centric World Models: Bridging Temporal Depth and Spatial Breadth for Roadside Perception

1 Pith paper citing it
abstract

World models, generative AI systems that simulate how environments evolve, are transforming autonomous driving, yet all existing approaches adopt an ego-vehicle perspective, leaving the infrastructure viewpoint unexplored. We argue that infrastructure-centric world models offer a fundamentally complementary capability: the bird's-eye, multi-sensor, persistent viewpoint that roadside systems uniquely possess. Central to our thesis is a spatio-temporal complementarity: fixed roadside sensors excel at temporal depth, accumulating long-term behavioral distributions including rare safety-critical events, while vehicle-borne sensors excel at spatial breadth, sampling diverse scenes across large road networks. This paper presents a vision for Infrastructure-centric World Models (I-WM) in three phases: (I) generative scene understanding with quality-aware uncertainty propagation, (II) physics-informed predictive dynamics with multi-agent counterfactual reasoning, and (III) collaborative world models for V2X communication via latent space alignment. We propose a dual-layer architecture in which annotation-free perception serves as a multi-modal data engine feeding end-to-end generative world models, with a phased sensor strategy from LiDAR through 4D radar and signal phase data to event cameras. We establish a taxonomy of driving world model paradigms, position I-WM relative to LeCun's JEPA, Li Fei-Fei's spatial intelligence, and VLA architectures, and introduce Infrastructure VLA (I-VLA) as a novel unification of roadside perception, language commands, and traffic control actions. Our vision builds upon existing multi-LiDAR pipelines and identifies open-source foundations for each phase, providing a path toward infrastructure that understands and anticipates traffic.

representative citing papers

Bridging Local Observation and Global Simulation in Closed-Loop Traffic Modeling

cs.RO · 2026-06-30 · unverdicted · novelty 5.0

CRAFT reduces collisions by 31.2% and traffic violations by 33.2% in closed-loop traffic simulation by discovering context-induced failures in what-if rollouts and using a contextual preference evaluator to reweight autoregressive decoding toward globally coherent behaviors.
