DriveGAN: Towards a Controllable High-Quality Neural Simulation

Antonio Torralba; Jonah Philion; Sanja Fidler; Seung Wook Kim

arxiv: 2104.15060 · v1 · pith:7IPDXT2Vnew · submitted 2021-04-30 · 💻 cs.CV · cs.RO

keywords drivegan simulators action allows controls data different directly

Realistic simulators are critical for training and verifying robotics systems. While most contemporary simulators are hand-crafted, a scalable way to build simulators is to use machine learning to learn how the environment behaves in response to an action, directly from data. In this work, we aim to learn to simulate a dynamic environment directly in pixel-space, by watching unannotated sequences of frames and their associated action pairs. We introduce a novel high-quality neural simulator referred to as DriveGAN that achieves controllability by disentangling different components without supervision. In addition to steering controls, it includes controls for sampling scene features such as the weather and the locations of non-player objects. Because DriveGAN is fully differentiable, it also allows re-simulation of a given video sequence, letting an agent drive through a recorded scene again, possibly taking different actions. We train DriveGAN on multiple datasets, including 160 hours of real-world driving data. We showcase that our approach greatly surpasses the performance of previous data-driven simulators, and allows for new features not explored before.
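
The re-simulation idea in the abstract can be illustrated with a deliberately tiny sketch. It assumes a hypothetical 1-D simulator whose scene is a single scalar latent; the names `step`, `rollout`, and `infer_scene_latent` are illustrative and do not come from the paper, and none of this reflects DriveGAN's actual architecture or training procedure. It only shows why differentiability helps: the scene latent that explains a recorded sequence can be recovered by gradient descent, after which the same scene can be replayed with different actions.

```python
# Toy illustration only: a 1-D differentiable "simulator", not DriveGAN itself.

def step(state, action, scene_latent):
    """Hypothetical dynamics: next state = state + action + scene-dependent drift."""
    return state + action + scene_latent


def rollout(initial_state, actions, scene_latent):
    """Unroll the simulator over a sequence of actions."""
    states = [initial_state]
    for action in actions:
        states.append(step(states[-1], action, scene_latent))
    return states


def infer_scene_latent(recorded, actions, lr=0.01, iters=500):
    """Recover the scene latent by gradient descent on the reconstruction error.

    The default step size is an assumption that suits short sequences;
    longer sequences would need a smaller value.
    """
    latent = 0.0
    for _ in range(iters):
        predicted = rollout(recorded[0], actions, latent)
        # d(predicted[t]) / d(latent) = t, because the drift is added once per step.
        grad = sum(2.0 * (p - r) * t
                   for t, (p, r) in enumerate(zip(predicted, recorded)))
        latent -= lr * grad / len(recorded)
    return latent


# "Record" a sequence with a hidden scene latent, then recover it.
recorded_actions = [1.0, 1.0, 0.0, -1.0, 0.5]
recorded = rollout(0.0, recorded_actions, scene_latent=0.3)
latent = infer_scene_latent(recorded, recorded_actions)

# Re-simulate the same scene from the same start, taking different actions.
new_actions = [0.0, 0.0, 0.0, 0.0, 0.0]
resimulated = rollout(recorded[0], new_actions, latent)
```

In this toy setting the gradient is written out by hand; a learned pixel-space simulator would instead rely on automatic differentiation through the network.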

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

InfiniVerse: Occupancy Guided Unbounded Scene Generation for Autonomous Driving
cs.CV · 2026-06 · unverdicted · novelty 5.0

InfiniVerse reconstructs 3D occupancy from one frame, extends scenes autoregressively, converts to video via diffusion, and uses re-projection feedback to achieve SOTA FID 6.4 and FVD 67.97 on Waymo and nuScenes.