Embodied understanding of driving scenarios

Yunsong Zhou, Linyan Huang, Qingwen Bu, Jia Zeng, Tianyu Li, Hang Qiu, Hongzi Zhu, Minyi Guo, Yu Qiao, Hongyang Li · 2024 · arXiv 2403.04593

3 Pith papers cite this work. Polarity classification is still indexing.

3 Pith papers citing it

read on arXiv browse 3 citing papers

representative citing papers

MME-RealWorld: Could Your Multimodal LLM Challenge High-Resolution Real-World Scenarios that are Difficult for Humans?

cs.CV · 2024-08-23 · conditional · novelty 8.0

MME-RealWorld is the largest manually annotated high-resolution benchmark for MLLMs, where even the best models achieve less than 60% accuracy on challenging real-world tasks.

AlphaDrive: Unleashing the Power of VLMs in Autonomous Driving via Reinforcement Learning and Reasoning

cs.CV · 2025-03-10 · unverdicted · novelty 7.0

AlphaDrive uses GRPO-based RL rewards and two-stage SFT+RL training on VLMs to improve autonomous driving planning performance and efficiency while producing emergent multimodal capabilities.

Senna: Bridging Large Vision-Language Models and End-to-End Autonomous Driving

cs.CV · 2024-10-29 · conditional · novelty 6.0

Senna decouples language-based high-level planning from an LVLM with low-level trajectory prediction from an E2E model, reporting 27% lower planning error and 33% lower collisions after pre-training on DriveX and fine-tuning on nuScenes.

citing papers explorer

Showing 3 of 3 citing papers.

MME-RealWorld: Could Your Multimodal LLM Challenge High-Resolution Real-World Scenarios that are Difficult for Humans? cs.CV · 2024-08-23 · conditional · none · ref 74
MME-RealWorld is the largest manually annotated high-resolution benchmark for MLLMs, where even the best models achieve less than 60% accuracy on challenging real-world tasks.
AlphaDrive: Unleashing the Power of VLMs in Autonomous Driving via Reinforcement Learning and Reasoning cs.CV · 2025-03-10 · unverdicted · none · ref 49
AlphaDrive uses GRPO-based RL rewards and two-stage SFT+RL training on VLMs to improve autonomous driving planning performance and efficiency while producing emergent multimodal capabilities.
Senna: Bridging Large Vision-Language Models and End-to-End Autonomous Driving cs.CV · 2024-10-29 · conditional · none · ref 29
Senna decouples language-based high-level planning from an LVLM with low-level trajectory prediction from an E2E model, reporting 27% lower planning error and 33% lower collisions after pre-training on DriveX and fine-tuning on nuScenes.

Embodied understanding of driving scenarios

fields

years

verdicts

representative citing papers

citing papers explorer