
Mixed citations

ReSim: Reliable World Simulation for Autonomous Driving

Mixed citation behavior. Most common role is background (67%).

9 Pith papers citing it
Background 67% of classified citations
abstract

How can we reliably simulate future driving scenarios under a wide range of ego driving behaviors? Recent driving world models, developed exclusively on real-world driving data composed mainly of safe expert trajectories, struggle to follow hazardous or non-expert behaviors, which are rare in such data. This limitation restricts their applicability to tasks such as policy evaluation. In this work, we address this challenge by enriching real-world human demonstrations with diverse non-expert data collected from a driving simulator (e.g., CARLA), and building a controllable world model trained on this heterogeneous corpus. Starting with a video generator featuring a diffusion transformer architecture, we devise several strategies to effectively integrate conditioning signals and improve prediction controllability and fidelity. The resulting model, ReSim, enables Reliable Simulation of diverse open-world driving scenarios under various actions, including hazardous non-expert ones. To close the gap between high-fidelity simulation and applications that require reward signals to judge different actions, we introduce a Video2Reward module that estimates a reward from ReSim's simulated future. Our ReSim paradigm achieves up to 44% higher visual fidelity, improves controllability for both expert and non-expert actions by over 50%, and boosts planning and policy selection performance on NAVSIM by 2% and 25%, respectively.

citation-role summary

background: 4 · baseline: 2

fields

cs.CV: 7 · cs.RO: 2

years

2026: 8 · 2025: 1

representative citing papers

CoWorld-VLA: Thinking in a Multi-Expert World Model for Autonomous Driving

cs.CV · 2026-05-11 · unverdicted · novelty 6.0 · 2 refs

CoWorld-VLA extracts semantic, geometric, dynamic, and trajectory expert tokens from multi-source supervision and feeds them into a diffusion-based hierarchical planner, achieving competitive collision avoidance and trajectory accuracy on the NAVSIM v1 benchmark.

ReWorld: Learning Better Representations for World Action Models

cs.CV · 2026-06-25 · unverdicted · novelty 5.0

ReWorld applies future-predictive, cross-modal, and hard-negative supervision directly to intermediate representations in the Video and Action DiTs of world action models (WAMs), reporting a 23.9% FVD improvement and a PDMS increase from 89.1 to 90.4 on nuScenes and NAVSIM.
