ConFusion reaches 59.1 mAP and 65.6 NDS on nuScenes validation by combining heterogeneous queries with QMix cross-attention and QSwap feature exchange.
arXiv preprint arXiv:2311.11722 (2023)
6 Pith papers cite this work. Polarity classification is still indexing.
Citation roles: background (1)
Citation polarities: background (1)
Citing papers

- Control Your Queries: Heterogeneous Query Interaction for Camera-Radar Fusion
  ConFusion reaches 59.1 mAP and 65.6 NDS on nuScenes validation by combining heterogeneous queries with QMix cross-attention and QSwap feature exchange.
- SimPB++: Simultaneously Detecting 2D and 3D Objects from Multiple Cameras
  SimPB++ unifies multi-view 2D perspective and 3D BEV object detection in one model via an interactive hybrid decoder, reporting state-of-the-art results on nuScenes and long-range detection up to 150 m on Argoverse2.
- Leveraging Previous-Traversal Point Cloud Map Priors for Camera-Based 3D Object Detection and Tracking
  DualViewMapDet fuses prior-traversal point cloud maps into camera features via dual perspective-view and bird's-eye-view encoding to improve 3D detection and tracking without LiDAR.
- CAM3DNet: Comprehensively mining the multi-scale features for 3D Object Detection with Multi-View Cameras
  CAM3DNet outperforms prior camera-based 3D detectors on nuScenes, Waymo and Argoverse by using three new modules to better mine multi-scale spatiotemporal features from 2D queries and pyramid maps.
- DVGT-2: Vision-Geometry-Action Model for Autonomous Driving at Scale
  DVGT-2 is a streaming vision-geometry-action model that jointly reconstructs dense 3D geometry and plans trajectories online, achieving better reconstruction than prior batch methods while transferring directly to planning benchmarks without fine-tuning.
- FocalAD: Local Motion Planning for End-to-End Autonomous Driving
  FocalAD adds an ego-local graph interactor and focal loss to prioritize decision-critical neighbors, yielding lower collision rates than prior methods on nuScenes, Bench2Drive, and especially the Adv-nuScenes robustness set.