Recognition: 2 theorem links
BEVDet: High-performance Multi-camera 3D Object Detection in Bird-Eye-View
Pith reviewed 2026-05-15 14:30 UTC · model grok-4.3
The pith
BEVDet detects 3D objects in bird-eye-view by reusing standard modules plus custom data augmentation and upgraded NMS.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
BEVDet performs 3D object detection in bird-eye-view by reusing existing modules, with performance substantially improved by an exclusive data augmentation strategy and an upgraded non-maximum suppression strategy. On the nuScenes validation set, BEVDet-Base reaches 39.3 percent mAP and 47.2 percent NDS, exceeding all prior published results, while BEVDet-Tiny matches FCOS3D accuracy at 11 percent of the compute and 9.2 times the speed.
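For scale, the ratios implied by those numbers (a back-of-envelope reading, not figures the paper states directly) put FCOS3D at roughly:

```latex
% Implied FCOS3D budget and speed from the abstract's ratios:
\[
\text{GFLOPs}_{\text{FCOS3D}} \approx \frac{215.3}{0.11} \approx 1957,
\qquad
\text{FPS}_{\text{FCOS3D}} \approx \frac{15.6}{9.2} \approx 1.7 .
\]
```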
What carries the argument
The bird-eye-view (BEV) detection framework, which transforms multi-camera images into a unified top-down representation and applies standard detection heads enhanced by custom augmentation and NMS upgrades.
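To make that shape flow concrete, here is a minimal sketch of a lift-splat-style view transformation in the spirit of reference [33] below, which BEVDet's view transformer builds on. The tensor layout, grid size, and the `pix2ego` camera-geometry helper are illustrative assumptions, not the released implementation:

```python
# Minimal sketch of the BEV pipeline's view transform. NOT BEVDet's code:
# shapes, depth-bin handling, and helper names are illustrative assumptions.
import numpy as np

def lift_splat(feats, depth_probs, pix2ego, grid=(128, 128), cell=0.8):
    """Project per-camera image features into a single BEV grid.

    feats:       (N_cam, H, W, C)  image-encoder features
    depth_probs: (N_cam, H, W, D)  per-pixel categorical depth distribution
    pix2ego:     hypothetical callable (cam, u, v, d_bin) -> (x, y) ego-frame
                 ground coords; stands in for intrinsics/extrinsics
    """
    n_cam, h, w, c = feats.shape
    d_bins = depth_probs.shape[-1]
    bev = np.zeros((grid[0], grid[1], c))
    for cam in range(n_cam):
        for v in range(h):
            for u in range(w):
                for d in range(d_bins):
                    x, y = pix2ego(cam, u, v, d)          # lift pixel to 3D
                    i = int(x / cell) + grid[0] // 2
                    j = int(y / cell) + grid[1] // 2
                    if 0 <= i < grid[0] and 0 <= j < grid[1]:
                        # splat: weight the feature by its depth confidence
                        bev[i, j] += depth_probs[cam, v, u, d] * feats[cam, v, u]
    return bev  # a BEV encoder and standard detection head consume this
```

Each camera feature is spread along its predicted depth distribution and accumulated into the shared ego-frame grid; everything downstream is a conventional detector.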
If this is right
- BEVDet-Tiny achieves 31.2 percent mAP and 39.2 percent NDS at 15.6 FPS using only 215.3 GFLOPs.
- BEVDet-Base surpasses FCOS3D by 9.8 percent mAP and 10.0 percent NDS at similar inference speed.
- Detection in BEV space simplifies integration with downstream route planning, since most planning targets are defined in the same top-down coordinate frame as the ego vehicle's motion.
- Performance lifts derive mainly from the augmentation pipeline and NMS changes rather than architectural novelty; a sketch of the NMS idea follows this list.
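As a concrete reading of that last point, here is a hedged sketch of a class-wise "Scale-NMS"-style upgrade in BEV: boxes are inflated by a per-class factor before the overlap test, so small classes such as pedestrians and traffic cones produce usable IoU values. The factors, the axis-aligned BEV IoU, and all names are illustrative assumptions, not the paper's exact procedure:

```python
# Hedged sketch of class-wise scale-then-suppress NMS in BEV.
import numpy as np

def scale_nms(centers, sizes, scores, classes, factors, iou_thr=0.5):
    """centers (N,2), sizes (N,2) BEV extents, scores (N,), classes (N,) ints.
    factors: per-class size multipliers used only for the overlap test."""
    sizes = sizes * np.asarray(factors)[classes][:, None]  # inflate per class
    order = np.argsort(-scores)
    keep = []
    while order.size:
        i, order = order[0], order[1:]
        keep.append(int(i))
        if not order.size:
            break
        # axis-aligned BEV IoU between box i and the remaining boxes
        lo = np.maximum(centers[i] - sizes[i] / 2, centers[order] - sizes[order] / 2)
        hi = np.minimum(centers[i] + sizes[i] / 2, centers[order] + sizes[order] / 2)
        inter = np.prod(np.clip(hi - lo, 0, None), axis=1)
        union = sizes[i].prod() + sizes[order].prod(axis=1) - inter
        order = order[inter / union <= iou_thr]
    return keep
```

For example, `factors=[1.0, 3.0]` would triple the footprint of class-1 boxes for suppression purposes only; the reported box sizes are unchanged.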
Where Pith is reading between the lines
- The same augmentation and NMS adjustments could be transferred to other view-transformation methods to improve accuracy without new network designs.
- If the gains hold across varied sensor rigs, the paradigm may reduce reliance on bespoke 3D architectures in favor of careful data and post-processing choices.
- Validation on additional benchmarks with different lighting or traffic patterns would test whether the reported margins persist outside nuScenes.
Load-bearing premise
The custom data augmentation strategy and upgraded non-maximum suppression produce reliable gains on new datasets and environments without introducing biases or requiring extensive per-dataset retuning.
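A minimal sketch of the consistency this premise demands: any transform applied to the BEV feature map during augmentation must also be applied to the targets, otherwise the augmentation itself injects label bias. Restricting to exact 90-degree rotations, and the function name below, are assumptions for brevity, not the paper's augmentation schedule:

```python
# Illustrative BEV-space augmentation: rotate features and targets together.
import numpy as np

def augment_bev(bev, boxes, angle_deg):
    """Rotate a square BEV feature map and its boxes by the same multiple of
    90 degrees. bev: (H, H, C); boxes: (N, 3) = (x, y, yaw), with (x, y)
    centred on the grid middle."""
    assert bev.shape[0] == bev.shape[1] and angle_deg % 90 == 0
    k = (angle_deg // 90) % 4
    theta = np.deg2rad(90 * k)
    rot = np.array([[np.cos(theta), -np.sin(theta)],
                    [np.sin(theta),  np.cos(theta)]])
    bev_aug = np.rot90(bev, k=k, axes=(0, 1))   # counter-clockwise grid rotation
    boxes_aug = boxes.copy()
    boxes_aug[:, :2] = boxes[:, :2] @ rot.T     # rotate box centres identically
    boxes_aug[:, 2] = boxes[:, 2] + theta       # keep headings consistent
    return bev_aug, boxes_aug
```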
What would settle it
A drop in BEVDet mAP below the level of FCOS3D when evaluated on a different multi-camera dataset with altered camera configurations or weather conditions would show the gains do not generalize.
read the original abstract
Autonomous driving perceives its surroundings for decision making, which is one of the most complex scenarios in visual perception. The success of paradigm innovation in solving the 2D object detection task inspires us to seek an elegant, feasible, and scalable paradigm for fundamentally pushing the performance boundary in this area. To this end, we contribute the BEVDet paradigm in this paper. BEVDet performs 3D object detection in Bird-Eye-View (BEV), where most target values are defined and route planning can be handily performed. We merely reuse existing modules to build its framework but substantially develop its performance by constructing an exclusive data augmentation strategy and upgrading the Non-Maximum Suppression strategy. In the experiment, BEVDet offers an excellent trade-off between accuracy and time-efficiency. As a fast version, BEVDet-Tiny scores 31.2% mAP and 39.2% NDS on the nuScenes val set. It is comparable with FCOS3D, but requires just 11% computational budget of 215.3 GFLOPs and runs 9.2 times faster at 15.6 FPS. Another high-precision version dubbed BEVDet-Base scores 39.3% mAP and 47.2% NDS, significantly exceeding all published results. With a comparable inference speed, it surpasses FCOS3D by a large margin of +9.8% mAP and +10.0% NDS. The source code is publicly available for further research at https://github.com/HuangJunJie2017/BEVDet .
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes the BEVDet paradigm for multi-camera 3D object detection performed directly in Bird-Eye-View (BEV). It reuses standard modules for the core pipeline but introduces an exclusive data augmentation strategy and an upgraded Non-Maximum Suppression procedure. On the nuScenes validation set, BEVDet-Tiny reports 31.2% mAP / 39.2% NDS at 15.6 FPS while BEVDet-Base reaches 39.3% mAP / 47.2% NDS, exceeding FCOS3D by +9.8% mAP and +10.0% NDS at comparable speed.
Significance. If the performance margins are reproducible and attributable to the proposed components, the work shows that large gains in multi-view 3D detection are obtainable without architectural novelty, simply by refining training and post-processing. This would be valuable for practical autonomous-driving stacks that already rely on BEV representations and need strong accuracy–latency trade-offs.
major comments (2)
- [Section 3] The central performance claims rest on an 'exclusive data augmentation strategy' and an 'upgraded NMS'. No ablation tables isolate the contribution of either component (e.g., standard augmentations + vanilla NMS vs. the proposed versions) while holding the rest of the pipeline fixed. Without these controlled experiments the reported +9.8% mAP margin cannot be confidently attributed to the new elements rather than to hyper-parameter tuning or other unreported factors; a sketch of the requested grid follows this list.
- [Section 4, Experiments] All quantitative results are reported solely on the nuScenes validation split. No cross-dataset evaluation (e.g., on KITTI or Waymo) or additional nuScenes splits is provided to test whether the custom augmentation and NMS strategies generalize or introduce dataset-specific biases.
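A sketch of the 2x2 grid the first comment asks for, varying only the two contested components while freezing everything else; the config keys and the surrounding train/evaluate harness are hypothetical placeholders:

```python
# Controlled 2x2 ablation: only augmentation and NMS vary across runs.
from itertools import product

FIXED = dict(backbone="swin-tiny", bev_encoder="resnet", head="centerpoint",
             schedule="20ep")  # held constant across all four runs

def ablation_grid():
    for aug, nms in product(["standard", "exclusive"], ["vanilla", "upgraded"]):
        yield {**FIXED, "augmentation": aug, "nms": nms}

for cfg in ablation_grid():
    print(cfg)  # each cfg would be trained and scored on the same val split
```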
minor comments (2)
- [Abstract and Section 4] The abstract and Section 4 state that source code is released, yet the manuscript does not specify which exact hyper-parameters, augmentation schedules, or NMS thresholds are used in the released implementation; this should be clarified for reproducibility.
- [Figure 2] The figure and its accompanying text would benefit from an explicit diagram or table contrasting the proposed augmentation pipeline with the standard one used by prior BEV methods.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback, which helps strengthen the attribution of our results and the evaluation of generalization. We address each major comment below and will revise the manuscript to incorporate the suggested improvements where feasible.
read point-by-point responses
-
Referee: [Section 3] The central performance claims rest on an 'exclusive data augmentation strategy' and an 'upgraded NMS'. No ablation tables isolate the contribution of either component (e.g., standard augmentations + vanilla NMS vs. the proposed versions) while holding the rest of the pipeline fixed. Without these controlled experiments the reported +9.8% mAP margin cannot be confidently attributed to the new elements rather than to hyper-parameter tuning or other unreported factors.
Authors: We agree that explicit, controlled ablations are required to isolate the contributions of the exclusive data augmentation strategy and upgraded NMS. In the revised manuscript we will add dedicated ablation tables that compare (i) standard augmentations versus our proposed strategy and (ii) vanilla NMS versus the upgraded procedure, while keeping the remainder of the pipeline (backbone, BEV encoder, detection head, training schedule) fixed. These experiments will be run on the same nuScenes validation split to directly quantify the gains attributable to each component. revision: yes
-
Referee: [Section 4, Experiments] All quantitative results are reported solely on the nuScenes validation split. No cross-dataset evaluation (e.g., on KITTI or Waymo) or additional nuScenes splits is provided to test whether the custom augmentation and NMS strategies generalize or introduce dataset-specific biases.
Authors: We acknowledge that reporting only on the nuScenes validation split limits the assessment of generalization. In the revision we will add results on the official nuScenes test set (via the evaluation server) to provide an additional held-out evaluation. We will also include a short discussion of potential dataset-specific effects of the augmentation and NMS choices. Full cross-dataset experiments on KITTI or Waymo would require substantial additional engineering and compute; if they cannot be completed within the revision window we will explicitly note this as a limitation and list it as future work. revision: partial
Circularity Check
No circularity: BEVDet is an empirical reuse of modules plus custom augmentation/NMS, evaluated directly on nuScenes.
full rationale
The paper presents BEVDet as a paradigm that reuses existing modules for BEV-based 3D detection and improves results via an exclusive data augmentation strategy and upgraded NMS. All reported metrics (mAP, NDS on nuScenes val) are direct empirical measurements on a standard split using conventional evaluation protocols. No equations, derivations, or predictions are defined that reduce by construction to fitted parameters, self-citations, or renamed inputs. The central claims rest on implementation details and benchmark scores rather than any self-referential mathematical chain.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Camera images can be reliably transformed into a consistent bird's-eye-view feature map using existing geometric projection techniques (a worked form of this geometry follows below).
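A worked form of the geometry behind this assumption, using standard pinhole notation rather than the paper's own: with intrinsics K and camera-to-ego extrinsics (R, t), a pixel (u, v) at hypothesized depth d lifts to an ego-frame point, which is then binned into a BEV cell of side Δ.

```latex
% Standard pinhole lifting and BEV binning; notation is ours, not the paper's.
\[
\mathbf{p}_{\mathrm{ego}} \;=\; R \,\bigl( d \, K^{-1} [u,\; v,\; 1]^{\mathsf{T}} \bigr) \;+\; \mathbf{t},
\qquad
(i,\, j) \;=\; \Bigl( \Bigl\lfloor \tfrac{x_{\mathrm{ego}}}{\Delta} \Bigr\rfloor,\;
                     \Bigl\lfloor \tfrac{y_{\mathrm{ego}}}{\Delta} \Bigr\rfloor \Bigr).
\]
```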
Lean theorems connected to this paper
-
Foundation.DAlembert.Inevitability.bilinear_family_forced · unclear
unclear: Relation between the paper passage and the cited Recognition theorem.
We merely reuse existing modules to build its framework but substantially develop its performance by constructing an exclusive data augmentation strategy and upgrading the Non-Maximum Suppression strategy.
-
Foundation.DimensionForcing.dimension_forced · unclear
unclear: Relation between the paper passage and the cited Recognition theorem.
BEVDet performs 3D object detection in Bird-Eye-View (BEV), where most target values are defined and route planning can be handily performed.
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Forward citations
Cited by 20 Pith papers
-
Still Camouflage, Moving Illusion: View-Induced Trajectory Manipulation in Autonomous Driving
Static adversarial camouflage exploits natural view-angle changes during relative motion to induce consistent feature drift in AV perception, leading to incorrect trajectory predictions and unnecessary braking.
-
EgoEV-HandPose: Egocentric 3D Hand Pose Estimation and Gesture Recognition with Stereo Event Cameras
EgoEV-HandPose uses stereo event cameras and a bird's-eye-view fusion module to achieve 30.54 mm MPJPE and 86.87% gesture accuracy on a new large-scale egocentric dataset, outperforming prior RGB and event methods esp...
-
PointForward: Feedforward Driving Reconstruction through Point-Aligned Representations
PointForward uses sparse world-space 3D queries and scene graphs to deliver consistent single-pass reconstruction of dynamic driving scenes via point-aligned representations.
-
SoK: The Next Frontier in AV Security: Systematizing Perception Attacks and the Emerging Threat of Multi-Sensor Fusion
The paper organizes perception attacks on AVs into a new taxonomy, identifies gaps in fusion-aware defenses, and validates one cross-sensor vulnerability with a proof-of-concept simulation.
-
Efficient Multi-View 3D Object Detection by Dynamic Token Selection and Fine-Tuning
Dynamic token selection and training only 1.6 million parameters instead of over 300 million reduces computation by 48-55% and improves accuracy over prior state-of-the-art on the NuScenes dataset.
-
DinoRADE: Full Spectral Radar-Camera Fusion with Vision Foundation Model Features for Multi-class Object Detection in Adverse Weather
DinoRADE reports a radar-centered multi-class detection pipeline that fuses dense radar tensors with DINOv3 features via deformable attention and outperforms prior radar-camera methods by 12.1% on the K-Radar dataset ...
-
TopoMaskV3: 3D Mask Head with Dense Offset and Height Predictions for Road Topology Understanding
TopoMaskV3 adds dense offset and height heads to produce standalone 3D road centerlines from masks and reports 28.5 OLS on a new geographically disjoint long-range benchmark.
-
Height-Guided Projection Reparameterization for Camera-LiDAR Occupancy
HiPR improves 3D occupancy prediction by adaptively reparameterizing projection sampling ranges using LiDAR height priors instead of fixed uniform pillars.
-
SimPB++: Simultaneously Detecting 2D and 3D Objects from Multiple Cameras
SimPB++ unifies multi-view 2D perspective and 3D BEV object detection in one model via an interactive hybrid decoder, reporting state-of-the-art results on nuScenes and long-range detection up to 150 m on Argoverse2.
-
OneDrive: Unified Multi-Paradigm Driving with Vision-Language-Action Models
OneDrive unifies heterogeneous decoding in a single VLM transformer decoder for end-to-end driving, achieving 0.28 L2 error and 0.18 collision rate on nuScenes plus 86.8 PDMS on NAVSIM.
-
CAM3DNet: Comprehensively mining the multi-scale features for 3D Object Detection with Multi-View Cameras
CAM3DNet outperforms prior camera-based 3D detectors on nuScenes, Waymo and Argoverse by using three new modules to better mine multi-scale spatiotemporal features from 2D queries and pyramid maps.
-
ESCAPE: Episodic Spatial Memory and Adaptive Execution Policy for Long-Horizon Mobile Manipulation
ESCAPE combines spatio-temporal fusion mapping for depth-free 3D memory with a memory-driven grounding module and adaptive execution policy to reach 65.09% success on ALFRED test-seen long-horizon mobile manipulation tasks.
-
DVGT-2: Vision-Geometry-Action Model for Autonomous Driving at Scale
DVGT-2 is a streaming vision-geometry-action model that jointly reconstructs dense 3D geometry and plans trajectories online, achieving better reconstruction than prior batch methods while transferring directly to pla...
-
R4Det: 4D Radar-Camera Fusion for High-Performance 3D Object Detection
R4Det fuses 4D radar and camera inputs via panoramic depth fusion, deformable gated temporal fusion without ego pose, and instance-guided refinement to reach state-of-the-art 3D detection on TJ4DRadSet and VoD.
-
InterFuserDVS: Event-Enhanced Sensor Fusion for Safe RL-Based Decision Making
Integrating DVS event data into InterFuser through token fusion yields a driving score of 77.2 and 100% route completion on CARLA benchmarks, indicating improved robustness in dynamic conditions.
-
SemLT3D: Semantic-Guided Expert Distillation for Camera-only Long-Tailed 3D Object Detection
SemLT3D introduces semantic-guided expert distillation with a language MoE module and CLIP projection to enrich features for long-tailed classes in camera-only 3D detection.
-
Radar-Camera BEV Multi-Task Learning with Cross-Task Attention Bridge for Joint 3D Detection and Segmentation
CTAB exchanges features between detection and segmentation via multi-scale deformable attention in BEV space, yielding segmentation gains on 7 nuScenes classes at neutral detection cost.
-
Not All Agents Matter: From Global Attention Dilution to Risk-Prioritized Game Planning
GameAD models autonomous driving as a risk-prioritized game among agents via Risk-Aware Topology Anchoring, Minimax Risk-Aware Sparse Attention and related components, yielding safer trajectories than prior end-to-end...
-
Multi-Modal Sensor Fusion using Hybrid Attention for Autonomous Driving
MMF-BEV fuses camera and radar branches with deformable self- and cross-attention, outperforming unimodal baselines on the VoD 4D radar dataset through a two-stage training process.
-
BEVPredFormer: Spatio-temporal Attention for BEV Instance Prediction in Autonomous Driving
BEVPredFormer uses attention-based temporal processing and 3D camera projection to match or exceed prior methods on nuScenes for BEV instance prediction.
Reference graph
Works this paper leans on
- [1] Caesar, H., Bankiti, V., Lang, A.H., Vora, S., Liong, V.E., Xu, Q., Krishnan, A., Pan, Y., Baldan, G., Beijbom, O.: nuScenes: A multimodal dataset for autonomous driving. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 11621–11631 (2020)
- [2] Cai, Z., Vasconcelos, N.: Cascade R-CNN: High Quality Object Detection and Instance Segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence (2019)
- [3] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-End Object Detection with Transformers. In: Proceedings of the European Conference on Computer Vision. pp. 213–229. Springer (2020)
- [4] Chen, K., Pang, J., Wang, J., Xiong, Y., Li, X., Sun, S., Feng, W., Liu, Z., Shi, J., Ouyang, W., et al.: Hybrid Task Cascade for Instance Segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 4974–4983 (2019)
- [5] Chen, Y., Liu, S., Shen, X., Jia, J.: DSGN: Deep Stereo Geometry Network for 3D Object Detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 12536–12545 (2020)
- [6] Contributors, M.: MMDetection3D: OpenMMLab next-generation platform for general 3D object detection. https://github.com/open-mmlab/mmdetection3d (2020)
- [7] Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., et al.: An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale. In: Proceedings of the International Conference on Learning Representations (2020)
- [8] Gao, S., Cheng, M.M., Zhao, K., Zhang, X.Y., Yang, M.H., Torr, P.H.: Res2Net: A New Multi-scale Backbone Architecture. IEEE Transactions on Pattern Analysis and Machine Intelligence (2019)
- [9] Geiger, A., Lenz, P., Urtasun, R.: Are we ready for Autonomous Driving? The KITTI Vision Benchmark Suite. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2012)
- [10] Ghiasi, G., Lin, T.Y., Le, Q.V.: NAS-FPN: Learning Scalable Feature Pyramid Architecture for Object Detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 7036–7045 (2019)
- [11] Girshick, R., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 580–587 (2014)
- [12] Guizilini, V., Ambrus, R., Pillai, S., Raventos, A., Gaidon, A.: 3D Packing for Self-Supervised Monocular Depth Estimation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 2485–2494 (2020)
- [13] He, K., Gkioxari, G., Dollár, P., Girshick, R.: Mask R-CNN. In: Proceedings of the International Conference on Computer Vision. pp. 2961–2969 (2017)
- [14] He, K., Zhang, X., Ren, S., Sun, J.: Deep Residual Learning for Image Recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 770–778 (2016)
- [15] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications. arXiv preprint arXiv:1704.04861 (2017)
- [16] Huang, G., Liu, Z., Van Der Maaten, L., Weinberger, K.Q.: Densely Connected Convolutional Networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 4700–4708 (2017)
- [17] Huang, J., Zhu, Z., Guo, F., Huang, G.: The Devil is in the Details: Delving into Unbiased Data Processing for Human Pose Estimation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 5700–5709 (2020)
- [18] Kirillov, A., Wu, Y., He, K., Girshick, R.: PointRend: Image Segmentation as Rendering. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 9799–9808 (2020)
- [19] Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet Classification with Deep Convolutional Neural Networks. Advances in Neural Information Processing Systems 25, 1097–1105 (2012)
- [20] Kumar, A., Brazil, G., Liu, X.: GrooMeD-NMS: Grouped Mathematically Differentiable NMS for Monocular 3D Object Detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 8973–8983 (2021)
- [21] Lang, A.H., Vora, S., Caesar, H., Zhou, L., Yang, J., Beijbom, O.: PointPillars: Fast Encoders for Object Detection from Point Clouds. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 12697–12705 (2019)
- [22] Lin, T.Y., Dollár, P., Girshick, R., He, K., Hariharan, B., Belongie, S.: Feature Pyramid Networks for Object Detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 2117–2125 (2017)
- [23] Lin, T.Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal Loss for Dense Object Detection. In: Proceedings of the International Conference on Computer Vision. pp. 2980–2988 (2017)
- [24] Lin, T.Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Dollár, P., Zitnick, C.L.: Microsoft COCO: Common Objects in Context. In: Proceedings of the European Conference on Computer Vision. pp. 740–755. Springer (2014)
- [25] Liu, S., Qi, L., Qin, H., Shi, J., Jia, J.: Path Aggregation Network for Instance Segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 8759–8768 (2018)
- [26] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin Transformer: Hierarchical Vision Transformer using Shifted Windows. In: Proceedings of the International Conference on Computer Vision. pp. 10012–10022 (2021)
- [27] Liu, Z., Zhou, D., Lu, F., Fang, J., Zhang, L.: AutoShape: Real-Time Shape-Aware Monocular 3D Object Detection. In: Proceedings of the International Conference on Computer Vision. pp. 15641–15650 (2021)
- [28] Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. In: Proceedings of the International Conference on Learning Representations (2019)
- [29] Lu, Y., Ma, X., Yang, L., Zhang, T., Liu, Y., Chu, Q., Yan, J., Ouyang, W.: Geometry Uncertainty Projection Network for Monocular 3D Object Detection. In: Proceedings of the International Conference on Computer Vision. pp. 3111–3121 (2021)
- [30] Nabati, R., Qi, H.: CenterFusion: Center-based Radar and Camera Fusion for 3D Object Detection. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision. pp. 1527–1536 (2021)
- [31] Pan, B., Sun, J., Leung, H.Y.T., Andonian, A., Zhou, B.: Cross-View Semantic Segmentation for Sensing Surroundings. IEEE Robotics and Automation Letters 5(3), 4867–4873 (2020)
- [32] Park, D., Ambrus, R., Guizilini, V., Li, J., Gaidon, A.: Is Pseudo-Lidar needed for Monocular 3D Object detection? In: Proceedings of the International Conference on Computer Vision. pp. 3142–3152 (2021)
- [33] Philion, J., Fidler, S.: Lift, Splat, Shoot: Encoding Images from Arbitrary Camera Rigs by Implicitly Unprojecting to 3D. In: Proceedings of the European Conference on Computer Vision. pp. 194–210. Springer (2020)
- [34] Qi, C.R., Yi, L., Su, H., Guibas, L.J.: PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space. In: Proceedings of the 31st International Conference on Neural Information Processing Systems. pp. 5105–5114 (2017)
- [35] Radosavovic, I., Kosaraju, R.P., Girshick, R., He, K., Dollár, P.: Designing Network Design Spaces. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 10428–10436 (2020)
- [36] Reading, C., Harakeh, A., Chae, J., Waslander, S.L.: Categorical Depth Distribution Network for Monocular 3D Object Detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 8555–8564 (2021)
- [37] Redmon, J., Divvala, S., Girshick, R., Farhadi, A.: You Only Look Once: Unified, Real-Time Object Detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 779–788 (2016)
- [38] Ren, S., He, K., Girshick, R., Sun, J.: Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. Advances in Neural Information Processing Systems 28, 91–99 (2015)
- [39] Roddick, T., Cipolla, R.: Predicting Semantic Map Representations from Images using Pyramid Occupancy Networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 11138–11147 (2020)
- [40] Rosenfeld, A., Thurston, M.: Edge and Curve Detection for Visual Scene Analysis. IEEE Transactions on Computers 100(5), 562–569 (1971)
- [41] Shao, S., Li, Z., Zhang, T., Peng, C., Yu, G., Zhang, X., Li, J., Sun, J.: Objects365: A Large-scale, High-quality Dataset for Object Detection. In: Proceedings of the International Conference on Computer Vision. pp. 8430–8439 (2019)
- [42] Simonelli, A., Bulo, S.R., Porzi, L., López-Antequera, M., Kontschieder, P.: Disentangling Monocular 3D Object Detection. In: Proceedings of the International Conference on Computer Vision. pp. 1991–1999 (2019)
- [43] Sun, K., Xiao, B., Liu, D., Wang, J.: Deep High-Resolution Representation Learning for Human Pose Estimation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 5693–5703 (2019)
- [44] Sun, P., Kretzschmar, H., Dotiwalla, X., Chouard, A., Patnaik, V., Tsui, P., Guo, J., Zhou, Y., Chai, Y., Caine, B., et al.: Scalability in Perception for Autonomous Driving: Waymo Open Dataset. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 2446–2454 (2020)
- [45] Tan, M., Le, Q.: EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks. In: Proceedings of the International Conference on Machine Learning. pp. 6105–6114. PMLR (2019)
- [46] Tian, Z., Shen, C., Chen, H., He, T.: FCOS: Fully Convolutional One-Stage Object Detection. In: Proceedings of the International Conference on Computer Vision. pp. 9627–9636 (2019)
- [47] Wang, L., Du, L., Ye, X., Fu, Y., Guo, G., Xue, X., Feng, J., Zhang, L.: Depth-conditioned Dynamic Message Propagation for Monocular 3D Object Detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 454–463 (2021)
- [48] Wang, L., Zhang, L., Zhu, Y., Zhang, Z., He, T., Li, M., Xue, X.: Progressive Coordinate Transforms for Monocular 3D Object Detection. In: Advances in Neural Information Processing Systems (2021)
- [49] Wang, T., Zhu, X., Pang, J., Lin, D.: FCOS3D: Fully Convolutional One-Stage Monocular 3D Object Detection. arXiv preprint arXiv:2104.10956 (2021)
- [50] Wang, T., Zhu, X., Pang, J., Lin, D.: Probabilistic and Geometric Depth: Detecting Objects in Perspective. arXiv preprint arXiv:2107.14160 (2021)
- [51] Wang, Y., Guizilini, V., Zhang, T., Wang, Y., Zhao, H., Solomon, J.: DETR3D: 3D Object Detection from Multi-view Images via 3D-to-2D Queries. arXiv preprint arXiv:2110.06922 (2021)
- [52] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified Perceptual Parsing for Scene Understanding. In: Proceedings of the European Conference on Computer Vision. pp. 418–434 (2018)
- [53] Yan, Y., Mao, Y., Li, B.: SECOND: Sparsely Embedded Convolutional Detection. Sensors 18(10), 3337 (2018)
- [54] Yang, W., Li, Q., Liu, W., Yu, Y., Ma, Y., He, S., Pan, J.: Projecting Your View Attentively: Monocular Road Scene Layout Estimation via Cross-View Transformation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 15536–15545 (2021)
- [55] Yang, Z., Sun, Y., Liu, S., Jia, J.: 3DSSD: Point-based 3D Single Stage Object Detector. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 11040–11048 (2020)
- [56] Yin, T., Zhou, X., Krahenbuhl, P.: Center-based 3D Object Detection and Tracking. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 11784–11793 (2021)
- [57] Zhang, S., Chi, C., Yao, Y., Lei, Z., Li, S.Z.: Bridging the Gap Between Anchor-based and Anchor-free Detection via Adaptive Training Sample Selection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 9759–9768 (2020)
- [58] Zhang, Y., Lu, J., Zhou, J.: Objects are Different: Flexible Monocular 3D Object Detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 3289–3298 (2021)
- [59] Zhou, X., Wang, D., Krähenbühl, P.: Objects as Points. arXiv preprint arXiv:1904.07850 (2019)
- [60] Zhou, Y., Tuzel, O.: VoxelNet: End-to-End Learning for Point Cloud Based 3D Object Detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 4490–4499 (2018)
- [61] Zhou, Y., He, Y., Zhu, H., Wang, C., Li, H., Jiang, Q.: Monocular 3D Object Detection: An Extrinsic Parameter Free Approach. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 7556–7566 (2021)
- [62] Zhu, B., Jiang, Z., Zhou, X., Li, Z., Yu, G.: Class-balanced Grouping and Sampling for Point Cloud 3D Object Detection. arXiv preprint arXiv:1908.09492 (2019)
- [63] Zhu, X., Ma, Y., Wang, T., Xu, Y., Shi, J., Lin, D.: SSN: Shape Signature Networks for Multi-class Object Detection from Point Clouds. In: European Conference on Computer Vision. pp. 581–597. Springer (2020)
- [64] Zou, Z., Ye, X., Du, L., Cheng, X., Tan, X., Zhang, L., Feng, J., Xue, X., Ding, E.: The Devil Is in the Task: Exploiting Reciprocal Appearance-Localization Features for Monocular 3D Object Detection. In: Proceedings of the International Conference on Computer Vision. pp. 2713–2722 (2021)
discussion (0)