BEVCALIB performs LiDAR-camera calibration from raw data by fusing camera and LiDAR bird's-eye view features with a novel feature selector and reports state-of-the-art accuracy on KITTI and NuScenes.
Bevworld: A multimodal world model for au- tonomous driving via unified bev latent space
8 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
representative citing papers
DriveFuture achieves SOTA results on NAVSIM by conditioning latent world model states on future predictions to directly inform trajectory planning.
HERMES++ unifies 3D scene understanding and future geometry prediction in driving scenes via BEV representations, LLM-enhanced queries, a temporal link, and joint geometric optimization.
The paper introduces a unified framework for world models that fully incorporates all cognitive functions from Cognitive Architecture Theory, highlights under-researched areas in motivation and meta-cognition, and proposes Epistemic World Models as a new category for scientific discovery agents.
ReSim is a controllable video world model trained on heterogeneous real and simulated driving data that achieves higher fidelity and controllability for both expert and non-expert actions, plus a Video2Reward module for estimating action quality from simulated futures.
PLAN-S decodes a style-conditioned four-channel semantic cost map from latent representations to bridge world models and planners in autonomous driving, reporting 0.55 m average L2 and 42% collision reduction on nuScenes plus PDMS gains on NAVSIM.
DriVerse is a generative model that simulates driving scenes from an image and trajectory using multimodal prompting and motion alignment, achieving better performance on nuScenes and Waymo datasets with minimal training.