MambaBEV: An EV-based 3D detection model with Mamba2

Hao Wang; Jinxiang Wang; Ni Wang; Qichao Zhao; Zihan You


Accurate 3D object detection in autonomous driving relies on Bird's Eye View (BEV) perception and effective temporal fusion. However, existing fusion strategies based on convolutional layers or deformable self-attention struggle to model global context in BEV space, leading to reduced accuracy for large objects. To address this limitation, we propose MambaBEV, a novel BEV-based 3D object detection model that leverages Mamba2, an advanced state-space model (SSM) optimized for long-sequence processing. Our key contribution is TemporalMamba, a temporal fusion module that enhances global context modeling through a BEV feature discrete rearrangement mechanism tailored for sequential processing. In addition, we introduce a Mamba-based DETR head to improve multi-object representation. Evaluations on the nuScenes dataset demonstrate that MambaBEV-base achieves 51.7% NDS and 42.7% mAP. Furthermore, evaluation within an end-to-end autonomous driving paradigm validates its effectiveness in motion forecasting and planning. These results highlight the potential of state-space models for improving global context understanding and large-object detection in autonomous driving perception systems.
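
To make the idea of rearranging BEV features for sequential processing more concrete, here is a minimal sketch of how a 2D BEV feature grid can be flattened into 1D token sequences for a sequence model and then mapped back. It is not the paper's rearrangement mechanism, which the abstract does not specify. The four scan orders, the averaging step, and the names `bev_to_sequences` and `sequences_to_bev` are assumptions made for illustration only, and no actual SSM layer is applied.

```python
import numpy as np

def bev_to_sequences(bev):
    """Flatten a (H, W, C) BEV feature map into four 1D token sequences.

    Assumption: row-major, column-major, and their reversals are used
    as scan orders. The paper may use a different scheme.
    """
    h, w, c = bev.shape
    row_major = bev.reshape(h * w, c)                     # scan along x, then y
    col_major = bev.transpose(1, 0, 2).reshape(h * w, c)  # scan along y, then x
    return {
        "row": row_major,
        "row_rev": row_major[::-1],
        "col": col_major,
        "col_rev": col_major[::-1],
    }

def sequences_to_bev(seqs, h, w):
    """Undo each scan order and average the four results into (H, W, C)."""
    c = seqs["row"].shape[1]
    row = seqs["row"].reshape(h, w, c)
    row_rev = seqs["row_rev"][::-1].reshape(h, w, c)
    col = seqs["col"].reshape(w, h, c).transpose(1, 0, 2)
    col_rev = seqs["col_rev"][::-1].reshape(w, h, c).transpose(1, 0, 2)
    return (row + row_rev + col + col_rev) / 4.0

# Toy BEV grid: 2 rows, 3 columns, 4 channels.
bev = np.arange(2 * 3 * 4, dtype=np.float64).reshape(2, 3, 4)
seqs = bev_to_sequences(bev)
# In a real model each sequence would pass through an SSM block here.
restored = sequences_to_bev(seqs, 2, 3)
```

With no processing between the two calls, `restored` equals `bev`, which confirms that each scan order is inverted correctly. In a real temporal fusion module the sequences would be transformed before being merged, so the merged map would differ from the input.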

