Extending large vision-language model for diverse interactive tasks in autonomous driving
2 Pith papers cite this work. Polarity classification is still indexing.
Pith papers citing it: 2
Fields: cs.CV (2)
Verdicts: UNVERDICTED (2)
Representative citing papers:
GaussianDWM uses 3D Gaussians with embedded linguistic features, language-guided sampling, and dual-condition generation for unified scene understanding and multi-modal output in driving world models.
Citing papers explorer
- HERMES++: Toward a Unified Driving World Model for 3D Scene Understanding and Generation
  HERMES++ unifies 3D scene understanding and future geometry prediction in driving scenes via BEV representations, LLM-enhanced queries, a temporal link, and joint geometric optimization.
- GaussianDWM: 3D Gaussian Driving World Model for Unified Scene Understanding and Multi-Modal Generation
  GaussianDWM uses 3D Gaussians with embedded linguistic features, language-guided sampling, and dual-condition generation for unified scene understanding and multi-modal output in driving world models.