In arXiv preprint arXiv:2203.17054
10 Pith papers cite this work.
fields: cs.CV (10)
citing papers explorer
- Learning Ego-Centric BEV Representations from a Perspective-Privileged View: Cross-View Supervision for Online HD Map Construction
  Cross-View Supervision transfers geometric and topological priors from ego-aligned overhead perspectives into camera-based BEV encoders via feature-space alignment, yielding up to 44% relative mAP gains at long range on nuScenes.
- SimPB++: Simultaneously Detecting 2D and 3D Objects from Multiple Cameras
  SimPB++ unifies multi-view 2D perspective and 3D BEV object detection in one model via an interactive hybrid decoder, reporting state-of-the-art results on nuScenes and long-range detection up to 150 m on Argoverse 2.
- CAM3DNet: Comprehensively mining the multi-scale features for 3D Object Detection with Multi-View Cameras
  CAM3DNet outperforms prior camera-based 3D detectors on nuScenes, Waymo and Argoverse by using three new modules to better mine multi-scale spatiotemporal features from 2D queries and pyramid maps.
- RQR3D: Reparametrizing the regression targets for BEV-based 3D object detection
  RQR3D reparametrizes oriented bounding box regression in BEV 3D detection as regressing a horizontal box plus corner offsets and achieves SOTA camera-radar performance on nuScenes with 67.5 NDS and 59.7 mAP.
- SemLT3D: Semantic-Guided Expert Distillation for Camera-only Long-Tailed 3D Object Detection
  SemLT3D introduces semantic-guided expert distillation with a language MoE module and CLIP projection to enrich features for long-tailed classes in camera-only 3D detection.
- Revisiting Token Compression for Accelerating ViT-based Sparse Multi-View 3D Object Detectors
  SEPatch3D accelerates ViT-based 3D object detectors up to 57% faster than StreamPETR via dynamic patch sizing and cross-granularity enhancement while keeping comparable accuracy on nuScenes and Argoverse 2.
- Not All Agents Matter: From Global Attention Dilution to Risk-Prioritized Game Planning
  GameAD models autonomous driving as a risk-prioritized game among agents via Risk-Aware Topology Anchoring, Minimax Risk-Aware Sparse Attention and related components, yielding safer trajectories than prior end-to-end methods on nuScenes and Bench2Drive.
- Multi-Modal Sensor Fusion using Hybrid Attention for Autonomous Driving
  MMF-BEV fuses camera and radar branches with deformable self- and cross-attention, outperforming unimodal baselines on the VoD 4D radar dataset through a two-stage training process.
- BEVPredFormer: Spatio-temporal Attention for BEV Instance Prediction in Autonomous Driving
  BEVPredFormer uses attention-based temporal processing and 3D camera projection to match or exceed prior methods on nuScenes for BEV instance prediction.
- Fast-BEV++: Fast by Algorithm, Deployable by Design
  Fast-BEV++ achieves at least 3x speedup over Fast-BEV, a new SOTA of 0.488 NDS on nuScenes 3D detection, and over 134 FPS inference by redesigning the core transformation pipeline and adding a learnable depth module.