OmniNWM: Omniscient Driving Navigation World Models

Baorui Peng; Bohan Li; Chao Ma; Dalong Du; Hao Zhao; Wenjun Zeng; Xianda Guo; Xin Jin; Yueming Jin; Zheng Zhu

arxiv: 2510.18313 · v6 · pith:MZTYFOJUnew · submitted 2025-10-21 · 💻 cs.CV

Bohan Li, Zhuang Ma, Dalong Du, Baorui Peng, Zhujin Liang, Zhenqiang Liu, Xianda Guo, Zheng Zhu, Chao Ma, Yueming Jin, Xin Jin, Hao Zhao, Wenjun Zeng

classification 💻 cs.CV

keywords omninwm, action, panoramic, across, control, world, camera, datasets

Autonomous driving world models are expected to work effectively across three core dimensions: state, action, and reward. However, existing methods are typically restricted to fragmented modality modeling, short-horizon drift, and imprecise action control, while lacking intrinsic mechanisms for policy evaluation. In this paper, we introduce OmniNWM, an Omniscient panoramic Navigation World Model that addresses all three dimensions within a consistent probabilistic framework. For State, OmniNWM generates panoramic videos of RGB, semantics, metric depth, and 3D occupancy, ensuring pixel-level alignment across modalities with joint distribution modeling. To mitigate autoregressive exposure bias, we propose a structured panoramic forcing strategy to stabilize long-horizon generation via stochastic manifold thickening. For Action, we introduce canonical geometric action encoding with normalized panoramic Plücker ray-maps. This representation decouples motion dynamics from sensor intrinsics, enabling precise, zero-shot trajectory control across heterogeneous datasets and camera configurations. For Reward, we derive intrinsic occupancy-grounded dense rewards directly from generated 3D volumes, establishing a reliable closed-loop simulation cycle for evaluating diverse planning agents. Extensive experiments demonstrate that OmniNWM achieves SOTA performance in generation fidelity and control precision, with remarkable zero-shot robustness to novel scenes on NuPlan and in-house datasets with distinct camera rigs. Project page is available at https://arlo0o.github.io/OmniNWM/.
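The action encoding named in the abstract builds on Plücker ray-maps. As a minimal sketch of the standard Plücker ray parameterization only (the paper's specific normalization and panoramic layout are not described in the abstract and are not reproduced here), each pixel can be mapped to a 6-vector made of a unit ray direction d and a moment m = o × d, where o is the camera center. The pinhole camera model and the names `plucker_ray` and `plucker_ray_map` are illustrative assumptions, not OmniNWM's code.

```python
import math

# Identity rotation: camera axes aligned with the world axes (assumed for the demo).
IDENTITY = ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0))


def cross(a, b):
    """Cross product of two 3-vectors given as tuples."""
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def rotate(rotation, v):
    """Apply a 3x3 rotation (tuple of rows) to a 3-vector."""
    return tuple(sum(rotation[i][j] * v[j] for j in range(3)) for i in range(3))


def plucker_ray(u, v, fx, fy, cx, cy, rotation, origin):
    """Return the 6-vector (d, m) for the ray through pixel (u, v).

    d is the unit ray direction in the world frame and m = origin x d
    is the moment. A pinhole model with focal lengths (fx, fy) and
    principal point (cx, cy) is assumed.
    """
    # Back-project the pixel to a direction in the camera frame.
    d_cam = ((u - cx) / fx, (v - cy) / fy, 1.0)
    # Rotate into the world frame and normalize to unit length.
    d_world = rotate(rotation, d_cam)
    norm = math.sqrt(sum(c * c for c in d_world))
    d = tuple(c / norm for c in d_world)
    # The moment ties the direction to the camera position.
    m = cross(origin, d)
    return d + m


def plucker_ray_map(width, height, fx, fy, cx, cy, rotation, origin):
    """Build a height x width grid of Plücker 6-vectors, one per pixel center."""
    return [
        [
            plucker_ray(u + 0.5, v + 0.5, fx, fy, cx, cy, rotation, origin)
            for u in range(width)
        ]
        for v in range(height)
    ]


if __name__ == "__main__":
    ray_map = plucker_ray_map(4, 2, 2.0, 2.0, 2.0, 1.0, IDENTITY, (1.0, 2.0, 0.0))
    print(ray_map[0][0])
```

In this parameterization the 6-vector describes the geometric ray itself rather than the pixel that produced it, so two cameras at the same pose with different focal lengths assign the same vector to the same ray. That is one plausible reading of how such an encoding could separate motion from sensor intrinsics. The abstract does not confirm the details.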

Forward citations

Cited by 9 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Learning Vision-Language-Action World Models for Autonomous Driving
cs.CV 2026-04 unverdicted novelty 7.0

VLA-World improves autonomous driving by using action-guided future image generation followed by reflective reasoning over the imagined scene to refine trajectories.

PanoWorld: Geometry-Consistent Panoramic Video World Modeling
cs.CV 2026-05 unverdicted novelty 6.0

PanoWorld adds depth consistency and trajectory consistency losses plus spherical adaptations to a pre-trained video model, plus a new PanoGeo dataset, to produce geometry-consistent 360 video.

SceneScribe-1M: A Large-Scale Video Dataset with Comprehensive Geometric and Semantic Annotations
cs.CV 2026-04 unverdicted novelty 6.0

SceneScribe-1M is a new dataset of 1 million videos with semantic text, camera parameters, dense depth, and consistent 3D point tracks to support monocular depth estimation, scene reconstruction, point tracking, and t...

ExploreVLA: Dense World Modeling and Exploration for End-to-End Autonomous Driving
cs.CV 2026-04 unverdicted novelty 6.0

ExploreVLA augments VLA driving models with future RGB and depth prediction for dense supervision and uses prediction uncertainty as a safety-gated intrinsic reward for RL-based exploration, reaching SOTA PDMS 93.7 on NAVSIM.

DriveLaW: Unifying Planning and Video Generation in a Latent Driving World
cs.CV 2025-12 unverdicted novelty 6.0

DriveLaW unifies video world modeling and trajectory planning by injecting video-generator latents into a diffusion planner, achieving SOTA video prediction and a new record on the NAVSIM planning benchmark.

ReWorld: Learning Better Representations for World Action Models
cs.CV 2026-06 unverdicted novelty 5.0

ReWorld applies future-predictive, cross-modal, and hard-negative supervision directly to intermediate representations in Video and Action DiTs for WAMs, reporting 23.9% FVD improvement and PDMS rise from 89.1 to 90.4...

RAD-2: Scaling Reinforcement Learning in a Generator-Discriminator Framework
cs.CV 2026-04 unverdicted novelty 5.0

RAD-2 uses a diffusion generator and RL discriminator to cut collision rates by 56% in closed-loop autonomous driving planning.

OpenWorldLib: A Unified Codebase and Definition of Advanced World Models
cs.CV 2026-04 unverdicted novelty 4.0

OpenWorldLib offers a standardized codebase and definition for world models that combine perception, interaction, and memory to understand and predict the world.

Towards Interactive Video World Modeling: Frontiers, Challenges, Benchmarks, and Future Trends
cs.CV 2026-05 unverdicted novelty 2.0

This survey reviews trends, challenges, benchmarks, and future directions in action-conditioned interactive world modeling for video and 3D generation.