DriveGen3D: Boosting Feed-Forward Driving Scene Generation with Efficient Video Diffusion

Chaojun Ni; Donny Y. Chen; Duochao Shi; Guan Huang; Guosheng Zhao; Haoxiao Wang; Haoyun Li; Jiagang Zhu; Jiwen Lu; Weijie Wang

arxiv: 2510.15264 · v3 · pith:LBXX6LDBnew · submitted 2025-10-17 · 💻 cs.CV

Weijie Wang, Jiagang Zhu, Zeyu Zhang, Xiaofeng Wang, Zheng Zhu, Guosheng Zhao, Chaojun Ni, Haoxiao Wang, Guan Huang, Xinze Chen, Yukun Zhou, Wenkang Qin, Duochao Shi, Haoyun Li, Yicheng Xiao, Donny Y. Chen, Jiwen Lu

keywords: video, DriveGen3D, driving, generation, scene, synthesis, diffusion, dynamic

We present DriveGen3D, a novel framework for generating high-quality and highly controllable dynamic 3D driving scenes that addresses critical limitations in existing methodologies. Current approaches to driving scene synthesis either suffer from prohibitive computational demands for extended temporal generation, focus exclusively on prolonged video synthesis without 3D representation, or restrict themselves to static single-scene reconstruction. Our work bridges this methodological gap by integrating accelerated long-term video generation with large-scale dynamic scene reconstruction through multimodal conditional control. DriveGen3D introduces a unified pipeline consisting of two specialized components: FastDrive-DiT, an efficient video diffusion transformer for high-resolution, temporally coherent video synthesis under text and Bird's-Eye-View (BEV) layout guidance; and FastRecon3D, a feed-forward module that rapidly builds 3D Gaussian representations across time, ensuring spatial-temporal consistency. DriveGen3D enables the generation of long driving videos (up to $800\times424$ at $12$ FPS) and corresponding 3D scenes, achieving state-of-the-art results while maintaining efficiency.

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

VAG: Dual-Stream Video-Action Generation for Embodied Data Synthesis
cs.RO 2026-04 unverdicted novelty 6.0

VAG is a synchronized dual-stream flow-matching framework that generates aligned video-action pairs for synthetic embodied data synthesis and policy pretraining.

UniMesh: Unifying 3D Mesh Understanding and Generation
cs.CV 2026-04 unverdicted novelty 5.0

UniMesh unifies 3D mesh generation and understanding in one model via a Mesh Head interface, Chain of Mesh iterative editing, and an Actor-Evaluator self-reflection loop.