MMF-BEV fuses camera and radar branches with deformable self- and cross-attention, outperforming unimodal baselines on the VoD 4D radar dataset through a two-stage training process.
Lift, splat, shoot: Encoding images from arbitrary camera rigs by implicitly unpro- jecting to 3d,
1 Pith paper cite this work. Polarity classification is still indexing.
1
Pith paper citing it
fields
cs.CV 1years
2026 1verdicts
UNVERDICTED 1representative citing papers
citing papers explorer
-
Multi-Modal Sensor Fusion using Hybrid Attention for Autonomous Driving
MMF-BEV fuses camera and radar branches with deformable self- and cross-attention, outperforming unimodal baselines on the VoD 4D radar dataset through a two-stage training process.