EMMA is an end-to-end multimodal LLM that converts camera data into trajectories, objects, and road graphs via text prompts and reports state-of-the-art motion planning on nuScenes plus competitive detection results on Waymo.
NAACL , year=
2 Pith papers cite this work. Polarity classification is still indexing.
2
Pith papers citing it
fields
cs.CV 2verdicts
UNVERDICTED 2representative citing papers
Adding register tokens to Vision Transformers eliminates high-norm background artifacts and raises state-of-the-art performance on dense visual prediction tasks.
citing papers explorer
-
EMMA: End-to-End Multimodal Model for Autonomous Driving
EMMA is an end-to-end multimodal LLM that converts camera data into trajectories, objects, and road graphs via text prompts and reports state-of-the-art motion planning on nuScenes plus competitive detection results on Waymo.
-
Vision Transformers Need Registers
Adding register tokens to Vision Transformers eliminates high-norm background artifacts and raises state-of-the-art performance on dense visual prediction tasks.