RQR3D: Reparametrizing the regression targets for BEV-based 3D object detection

Cem Tarhan; Ozsel Kilinc

arXiv: 2505.17732 · v2 · submitted 2025-05-23 · cs.CV · cs.AI · cs.LG


Pith reviewed 2026-05-19 13:35 UTC · model grok-4.3

classification: cs.CV · cs.AI · cs.LG

keywords: 3D object detection · BEV perception · oriented bounding box · radar camera fusion · nuScenes dataset · keypoint regression · autonomous driving


The pith

RQR3D reparametrizes BEV 3D regression targets using restricted quadrilateral offsets to avoid discontinuities in angle-based losses.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Existing BEV 3D object detectors estimate oriented bounding boxes using angles, which leads to discontinuous loss functions during training. The paper proposes to instead regress the axis-aligned bounding box that encloses the oriented one along with the offsets from its corners to the oriented box corners. This reparametrization converts the task into keypoint regression, which the authors integrate into an anchor-free detector. Combined with a simple 2D convolution-based radar fusion, it delivers state-of-the-art results on nuScenes for camera-radar 3D detection. Readers interested in autonomous driving would care because lower errors in object position and orientation support more reliable planning and safety.

Core claim

The restricted quadrilateral representation defines the 3D regression targets as the smallest horizontal (axis-aligned) bounding box that encloses the oriented box, plus the offsets between the corners of the two boxes. This turns oriented object detection into a keypoint regression task. The authors report that an anchor-free single-stage detector built on it reaches 67.5 NDS and 59.7 mAP on nuScenes while reducing translation and orientation errors.

What carries the argument

Restricted quadrilateral representation (RQR), which regresses an enclosing horizontal box and four corner offsets to represent rotated 3D objects without explicit angle prediction.
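The idea of an enclosing horizontal box plus offsets can be made concrete with a toy encoding. The sketch below is only an illustration under stated assumptions: the function names, the `(xc, yc, w, l, yaw)` input convention, and the two "touch offsets" `a` and `b` are hypothetical choices made here, not the paper's actual target definition. It relies on the central symmetry of a rectangle, so two offsets are enough to rebuild all four corners.

```python
import math

def oriented_corners(xc, yc, w, l, yaw):
    """BEV corners of a box whose length l runs along its heading (yaw in radians)."""
    c, s = math.cos(yaw), math.sin(yaw)
    half = [(l / 2, w / 2), (l / 2, -w / 2), (-l / 2, -w / 2), (-l / 2, w / 2)]
    return [(xc + dx * c - dy * s, yc + dx * s + dy * c) for dx, dy in half]

def encode_rqr_sketch(xc, yc, w, l, yaw):
    """Return (xmin, ymin, xmax, ymax, a, b): enclosing box plus two touch offsets.

    a: where the topmost corner touches the top edge, measured from xmin.
    b: where the rightmost corner touches the right edge, measured from ymin.
    Both scalars are an illustrative assumption, not the paper's targets.
    """
    pts = oriented_corners(xc, yc, w, l, yaw)
    xs, ys = [p[0] for p in pts], [p[1] for p in pts]
    xmin, ymin, xmax, ymax = min(xs), min(ys), max(xs), max(ys)
    top = max(pts, key=lambda p: p[1])
    right = max(pts, key=lambda p: p[0])
    return xmin, ymin, xmax, ymax, top[0] - xmin, right[1] - ymin

def decode_rqr_sketch(xmin, ymin, xmax, ymax, a, b):
    """Rebuild the four corners; bottom and left follow from central symmetry."""
    return [(xmin + a, ymax), (xmax, ymin + b), (xmax - a, ymin), (xmin, ymax - b)]

box = (0.0, 0.0, 2.0, 4.0, math.pi / 6)  # xc, yc, w, l, yaw
code = encode_rqr_sketch(*box)
recovered = decode_rqr_sketch(*code)
```

No angle appears in the encoded tuple, which is the point of the reparametrization. This toy version does not handle the axis-aligned case, where two corners tie for "topmost" or "rightmost" and a tie-breaking rule would be needed.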

If this is right

The proposed method achieves state-of-the-art camera-radar 3D object detection on nuScenes.
Translation and orientation errors are reduced compared to prior approaches.
The representation is compatible with different object detection frameworks.
The simplified radar fusion backbone uses standard 2D convolutions for efficiency without voxel or sparse operations.
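As a rough picture of what "standard 2D convolutions without voxel or sparse operations" can look like, the following sketch scatters radar points into a dense BEV grid and runs a plain 3x3 convolution over it. Everything here is an assumption made for illustration: the helper names `rasterize` and `conv2d_same`, the single scalar feature per point, the summing of points per cell, and the grid extents are hypothetical and are not taken from the paper.

```python
def rasterize(points, x_range, y_range, cell):
    """Scatter (x, y, feature) radar points into a dense BEV grid, summing per cell."""
    nx = int(round((x_range[1] - x_range[0]) / cell))
    ny = int(round((y_range[1] - y_range[0]) / cell))
    grid = [[0.0] * nx for _ in range(ny)]
    for x, y, f in points:
        ix = int((x - x_range[0]) // cell)
        iy = int((y - y_range[0]) // cell)
        if 0 <= ix < nx and 0 <= iy < ny:  # drop points outside the BEV extent
            grid[iy][ix] += f
    return grid

def conv2d_same(grid, kernel):
    """Dense 3x3 cross-correlation with zero padding (output has the input's size)."""
    h, w = len(grid), len(grid[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    ii, jj = i + di, j + dj
                    if 0 <= ii < h and 0 <= jj < w:
                        acc += kernel[di + 1][dj + 1] * grid[ii][jj]
            out[i][j] = acc
    return out

# Hypothetical radar returns: (x, y, feature); the third lies outside the grid.
radar_points = [(0.5, 0.5, 1.0), (2.5, 1.5, 2.0), (9.0, 9.0, 5.0)]
bev = rasterize(radar_points, (0.0, 4.0), (0.0, 4.0), 1.0)
features = conv2d_same(bev, [[1.0] * 3 for _ in range(3)])
```

A real backbone would use a deep-learning framework, several feature channels, and learned kernels; the sketch only shows that a rasterized radar map can be processed with ordinary dense operations.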

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

This corner-offset approach may extend naturally to 2D aerial oriented detection where similar angle issues arise.
The lightweight fusion design could enable faster inference in resource-constrained autonomous systems.
Testing the method on additional datasets would help confirm its robustness across different sensor configurations.

Load-bearing premise

The claim rests on the premise that angle discontinuities are the main limiting factor for BEV 3D detection performance and that the quadrilateral offset encoding resolves them cleanly.

What would settle it

Training both the angle-based and offset-based models on the same data and plotting the loss values specifically for objects oriented near 45 degrees or other discontinuity points would reveal whether the new representation produces smoother optimization.
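A small numeric check can show what such a comparison is looking for. The sketch below is a toy illustration, not the paper's loss or experiment: it uses the wrap-around at ±π as one well-known discontinuity of a raw yaw parametrization, and the helper names, box size, and unweighted L1 losses are all assumptions made here. Two boxes with nearly identical headings get a large naive angle loss but a small corner-based loss.

```python
import math

def oriented_corners(xc, yc, w, l, yaw):
    """BEV corners of a box whose length l runs along its heading (yaw in radians)."""
    c, s = math.cos(yaw), math.sin(yaw)
    half = [(l / 2, w / 2), (l / 2, -w / 2), (-l / 2, -w / 2), (-l / 2, w / 2)]
    return [(xc + dx * c - dy * s, yc + dx * s + dy * c) for dx, dy in half]

def angle_l1(yaw_a, yaw_b):
    """Naive L1 loss on raw yaw values, with no wrap-around handling."""
    return abs(yaw_a - yaw_b)

def corner_l1(yaw_a, yaw_b, w=2.0, l=4.0):
    """Mean L1 distance between index-matched corners of two boxes at the origin."""
    pa = oriented_corners(0.0, 0.0, w, l, yaw_a)
    pb = oriented_corners(0.0, 0.0, w, l, yaw_b)
    total = sum(abs(ax - bx) + abs(ay - by) for (ax, ay), (bx, by) in zip(pa, pb))
    return total / 4

eps = 0.01
yaw_a, yaw_b = math.pi - eps, -math.pi + eps  # nearly identical headings
angle_gap = angle_l1(yaw_a, yaw_b)    # close to 2*pi despite the tiny physical change
corner_gap = corner_l1(yaw_a, yaw_b)  # stays small
```

Sweeping `eps` toward zero makes the contrast clearer: the naive angle loss approaches 2π while the corner loss goes to zero. Whether the same holds for the paper's actual targets and losses is exactly what the proposed experiment would have to establish.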

Figures

Figures reproduced from arXiv: 2505.17732 by Cem Tarhan, Ozsel Kilinc.

**Figure 1.** a) First, we identify the smallest axis-aligned 2D bounding box (yellow) that encapsulates the oriented box (orange). We …

**Figure 2.** a) The overall model architecture, comprising an image backbone, radar backbone, 2D to BEV projection module, temporal …

**Figure 3.** Orientation stability of RQR3D head (left) compared to …

Original abstract

Accurate, fast, and reliable 3D perception is essential for autonomous driving. Recently, bird's-eye view (BEV)-based perception approaches have emerged as superior alternatives to perspective-based solutions, offering enhanced spatial understanding and more natural outputs for planning. Existing BEV-based 3D object detection methods, typically using an angle-based representation, directly estimate the size and orientation of rotated bounding boxes. We observe that BEV-based 3D object detection is analogous to aerial oriented object detection, where angle-based methods are known to suffer from discontinuities in their loss functions. Drawing inspiration from this domain, we propose **R**estricted **Q**uadrilateral **R**epresentation to define **3D** regression targets. RQR3D regresses the smallest horizontal bounding box encapsulating the oriented box, along with the offsets between the corners of these two boxes, thereby transforming the oriented object detection problem into a keypoint regression task. We employ RQR3D within an anchor-free single-stage object detection method achieving state-of-the-art performance. We show that the proposed architecture is compatible with different object detection approaches. Furthermore, we introduce a simplified radar fusion backbone that applies standard 2D convolutions to radar features. This backbone leverages the inherent 2D structure of the data for efficient and geometrically consistent processing without over-parameterization, thereby eliminating the need for voxel grouping and sparse convolutions. Extensive evaluations on the nuScenes dataset show that RQR3D achieves SotA camera-radar 3D object detection performance despite its lightweight design, reaching 67.5 NDS and 59.7 mAP with reduced translation and orientation errors, which are crucial for safe autonomous driving.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

RQR3D reparametrizes BEV 3D boxes as an enclosing axis-aligned box plus corner offsets to sidestep angle loss discontinuities, with a simple 2D-conv radar fusion claiming 67.5 NDS on nuScenes.


The one or two things to know: This paper proposes RQR3D, which reparametrizes the regression targets for BEV 3D object detection by using a restricted quadrilateral representation. Instead of directly predicting angles for oriented boxes, it regresses the smallest horizontal bounding box and the offsets to the corners of the oriented box, turning it into a keypoint regression problem. They also introduce a simplified radar fusion backbone using standard 2D convolutions on radar features instead of voxel grouping or sparse ops.

Referee Report

3 major / 2 minor

Summary. The paper proposes RQR3D, a Restricted Quadrilateral Representation for reparametrizing the regression targets in BEV-based 3D object detection. Drawing an analogy to aerial oriented object detection, it replaces direct angle regression with regression of the smallest enclosing axis-aligned box plus four corner offsets, converting oriented box detection into a keypoint regression task. The method is embedded in an anchor-free single-stage detector augmented by a simplified radar fusion backbone that uses standard 2D convolutions, and it reports state-of-the-art camera-radar 3D detection results on nuScenes (67.5 NDS, 59.7 mAP) with reduced translation and orientation errors.

Significance. If the performance claims are substantiated by detailed ablations and error analysis, the reparametrization could provide a practical way to mitigate periodicity and discontinuity issues in angle-based losses for BEV detectors, which are important for downstream planning in autonomous driving. The lightweight radar backbone is a secondary contribution that avoids voxel grouping and sparse convolutions while preserving geometric consistency.

major comments (3)

[§3.2] (RQR3D formulation): the claim that regressing the smallest enclosing axis-aligned box plus corner offsets eliminates discontinuities is not accompanied by a proof or explicit verification that the composite loss remains continuous and that the mapping from the four offsets to yaw is bijective for all valid oriented boxes when height and z-center are regressed independently.
[§5] (Experiments): the reported reductions in translation and orientation errors on nuScenes are not supported by an ablation that isolates the effect of the RQR3D representation from the new 2D-convolution radar backbone; without this separation it is unclear whether the gains are attributable to the reparametrization or to the backbone change.
[Table 1] (main results): the SOTA comparison does not indicate whether competing methods were re-trained with the same simplified radar backbone or whether the RQR3D head was substituted into existing detectors while keeping all other components fixed, weakening the claim of compatibility and superiority.

minor comments (2)

[Figure 2] the diagram illustrating the quadrilateral offsets would benefit from explicit labels for the four corner vectors and the resulting oriented box to make the geometric consistency constraints visually clear.
[§4.3] (loss function): the weighting between the keypoint offset loss and the separate height/z regression terms is not stated; a brief sensitivity analysis would help confirm robustness.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed comments. We address each major comment below and indicate planned revisions to strengthen the manuscript.


Referee: [§3.2] (RQR3D formulation): the claim that regressing the smallest enclosing axis-aligned box plus corner offsets eliminates discontinuities is not accompanied by a proof or explicit verification that the composite loss remains continuous and that the mapping from the four offsets to yaw is bijective for all valid oriented boxes when height and z-center are regressed independently.

Authors: We appreciate the referee's observation. The RQR3D formulation is motivated by converting oriented box regression into a keypoint regression task to sidestep periodicity and discontinuity issues inherent in direct angle regression. We agree that an explicit verification would improve rigor. In the revised manuscript, we will expand §3.2 with a formal analysis proving continuity of the composite loss and bijectivity of the offset-to-yaw mapping for all valid oriented boxes, accounting for independent regression of height and z-center.
Revision: yes

Referee: [§5] (Experiments): the reported reductions in translation and orientation errors on nuScenes are not supported by an ablation that isolates the effect of the RQR3D representation from the new 2D-convolution radar backbone; without this separation it is unclear whether the gains are attributable to the reparametrization or to the backbone change.

Authors: We acknowledge the value of isolating contributions for clear attribution. The current experiments report overall performance of the integrated system. To address this, we will add an ablation in the revised §5 that applies the RQR3D representation to a detector using the prior radar backbone, allowing direct comparison against the full proposed method to separate the effects of the reparametrization from the backbone changes.
Revision: yes

Referee: [Table 1] (main results): the SOTA comparison does not indicate whether competing methods were re-trained with the same simplified radar backbone or whether the RQR3D head was substituted into existing detectors while keeping all other components fixed, weakening the claim of compatibility and superiority.

Authors: We thank the referee for this clarification request. Table 1 primarily reports published results of competing methods. To better support the compatibility claim, we will include additional experiments in the revision where the RQR3D head is substituted into existing detectors with other components held fixed, reporting the resulting performance to isolate the contribution of the representation.
Revision: yes

Circularity Check

0 steps flagged

Independent geometric reparametrization with no load-bearing circularity


The paper's derivation introduces a Restricted Quadrilateral Representation by re-expressing oriented BEV boxes as the smallest enclosing axis-aligned box plus four corner offsets, converting angle regression into keypoint offsets. This mapping is defined directly from geometry and evaluated on the external nuScenes benchmark; no equation reduces a claimed prediction to a fitted input by construction, and no uniqueness theorem or ansatz is imported via self-citation to force the result. The central performance claims rest on empirical results rather than self-referential definitions, yielding only a minor (non-load-bearing) circularity score.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The central claim rests on the domain assumption that angle discontinuities are the dominant issue and on the empirical superiority of the new representation; the RQR3D construct itself is the main added element.

axioms (1)

domain assumption: Angle-based representations for oriented bounding boxes suffer from discontinuities in their loss functions
Explicitly stated in the abstract as the motivation drawn from aerial oriented object detection.

invented entities (1)

Restricted Quadrilateral Representation (RQR3D): no independent evidence
purpose: To define 3D regression targets by regressing the smallest horizontal bounding box and corner offsets
New representation introduced by the paper to transform oriented detection into keypoint regression.

pith-pipeline@v0.9.0 · 5858 in / 1532 out tokens · 66146 ms · 2026-05-19T13:35:21.725448+00:00 · methodology


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/AlexanderDuality.lean · alexander_duality_circle_linking · relation: unclear

We observe that BEV-based 3D object detection is analogous to aerial oriented object detection, where angle-based methods are known to suffer from discontinuities in their loss functions. ... RQR3D regresses the smallest horizontal bounding box ... along with the offsets between the corners

IndisputableMonolith/Foundation/DimensionForcing.lean · reality_from_one_distinction · relation: unclear

RQR3D uses (xmin, ymin, xmax, ymax) as the bounding box regression target and (u, v, arg min u, arg min v, dx, dy, zctr, h) as the keypoint targets.

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Control Your Queries: Heterogeneous Query Interaction for Camera-Radar Fusion
cs.CV 2026-04 unverdicted novelty 7.0

ConFusion reaches 59.1 mAP and 65.6 NDS on nuScenes validation by combining heterogeneous queries with QMix cross-attention and QSwap feature exchange.

Reference graph

Works this paper leans on

49 extracted references · 49 canonical work pages · cited by 1 Pith paper · 1 internal anchor

[1] A. W. Harley, Z. Fang, J. Li, R. Ambrus, and K. Fragkiadaki, “Simple-bev: What really matters for multi-sensor bev perception?,” in 2023 IEEE International Conference on Robotics and Automation (ICRA), pp. 2759–2765, IEEE, 2023.
[2] E. Xie, Z. Yu, D. Zhou, J. Philion, A. Anandkumar, S. Fidler, P. Luo, and J. M. Alvarez, “M²bev: Multi-camera joint 3d detection and segmentation with unified birds-eye view representation,” 2022.
[3] A. Saha, O. Mendez, C. Russell, and R. Bowden, “Translating images into maps,” in 2022 International conference on robotics and automation (ICRA), pp. 9200–9206, IEEE, 2022.
[4] J. Philion and S. Fidler, “Lift, splat, shoot: Encoding images from arbitrary camera rigs by implicitly unprojecting to 3d,” in Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part XIV 16, pp. 194–210, Springer,
[5] Y. Li, Z. Ge, G. Yu, J. Yang, Z. Wang, Y. Shi, J. Sun, and Z. Li, “Bevdepth: Acquisition of reliable depth for multi-view 3d object detection,” in Proceedings of the AAAI conference on artificial intelligence, vol. 37, pp. 1477–1485, 2023.
[6] Z. Li, W. Wang, H. Li, E. Xie, C. Sima, T. Lu, Q. Yu, and J. Dai, “Bevformer: learning bird’s-eye-view representation from lidar-camera via spatiotemporal transformers,” IEEE Transactions on Pattern Analysis and Machine Intelligence, 2024.
[7] Y. Wang, V. C. Guizilini, T. Zhang, Y. Wang, H. Zhao, and J. Solomon, “Detr3d: 3d object detection from multi-view images via 3d-to-2d queries,” in Conference on Robot Learning, pp. 180–191, PMLR, 2022.
[8] Y. Kim, J. Shin, S. Kim, I.-J. Lee, J. W. Choi, and D. Kum, “Crn: Camera radar net for accurate, robust, efficient 3d perception,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 17615–17626, 2023.
[9] J. Kim, M. Seong, and J. W. Choi, “Crt-fusion: Camera, radar, temporal fusion using motion information for 3d object detection,” Advances in Neural Information Processing Systems, vol. 37, pp. 108625–108648, 2024.
[10] P. Wolters, J. Gilg, T. Teepe, F. Herzog, A. Laouichi, M. Hofmann, and G. Rigoll, “Unleashing hydra: Hybrid fusion, depth consistency and radar for unified 3d perception,” 2025.
[11] Z. Lin, Z. Liu, Z. Xia, X. Wang, Y. Wang, S. Qi, Y. Dong, N. Dong, L. Zhang, and C. Zhu, “Rcbevdet: radar-camera fusion in bird’s eye view for 3d object detection,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 14928–14937, 2024.
[12] C. Yang, Y. Chen, H. Tian, C. Tao, X. Zhu, Z. Zhang, G. Huang, H. Li, Y. Qiao, L. Lu, et al., “Bevformer v2: Adapting modern image backbones to bird’s-eye-view recognition via perspective supervision,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 17830–17839, 2023.
[13] J. Huang, G. Huang, Z. Zhu, Y. Ye, and D. Du, “Bevdet: High-performance multi-camera 3d object detection in bird-eye-view,” arXiv preprint arXiv:2112.11790,
[14] J. Huang and G. Huang, “Bevdet4d: Exploit temporal cues in multi-camera 3d object detection,” arXiv preprint arXiv:2203.17054, 2022.
[15] Y. Liu, T. Wang, X. Zhang, and J. Sun, “Petr: Position embedding transformation for multi-view 3d object detection,” in European conference on computer vision, pp. 531–548, Springer, 2022.
[16] Y. Liu, J. Yan, F. Jia, S. Li, A. Gao, T. Wang, and X. Zhang, “Petrv2: A unified framework for 3d perception from multi-camera images,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 3262–3272, 2023.
[17] Z. Zong, D. Jiang, G. Song, Z. Xue, J. Su, H. Li, and Y. Liu, “Temporal enhanced training of multi-view 3d object detector via historical object prediction,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 3781–3790, 2023.
[18] F. Liu, T. Huang, Q. Zhang, H. Yao, C. Zhang, F. Wan, Q. Ye, and Y. Zhou, “Ray denoising: Depth-aware hard negative sampling for multi-view 3d object detection,” in European Conference on Computer Vision, pp. 200–217, Springer, 2024.
[19] H. Liu, Y. Teng, T. Lu, H. Wang, and L. Wang, “Sparsebev: High-performance sparse 3d object detection from multi-camera videos,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 18580–18590, 2023.
[20] T. Yin, X. Zhou, and P. Krahenbuhl, “Center-based 3d object detection and tracking,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp. 11784–11793, 2021.
[21] X. Yang and J. Yan, “Arbitrary-oriented object detection with circular smooth label,” in Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part VIII 16, pp. 677–694, Springer, 2020.
[22] X. Pan, Y. Ren, K. Sheng, W. Dong, H. Yuan, X. Guo, C. Ma, and C. Xu, “Dynamic refinement network for oriented and densely packed object detection,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp. 11207–11216, 2020.
[23] X. Xie, G. Cheng, J. Wang, X. Yao, and J. Han, “Oriented r-cnn for object detection,” in Proceedings of the IEEE/CVF international conference on computer vision, pp. 3520–3529, 2021.
[24] J. Han, J. Ding, N. Xue, and G.-S. Xia, “Redet: A rotation-equivariant detector for aerial object detection,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp. 2786–2795,
[25] Z. Zhao, Q. Xue, Y. He, Y. Bai, X. Wei, and Y. Gong, “Projecting points to axes: Oriented object detection via point-axis representation,” in European Conference on Computer Vision, pp. 161–179, Springer, 2024.
[26] W. Li, Y. Chen, K. Hu, and J. Zhu, “Oriented reppoints for aerial object detection,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp. 1829–1838, 2022.
[27] Y. Yan, Y. Mao, and B. Li, “Second: Sparsely embedded convolutional detection,” Sensors, vol. 18, no. 10, p. 3337, 2018.
[28] A. H. Lang, S. Vora, H. Caesar, L. Zhou, J. Yang, and O. Beijbom, “Pointpillars: Fast encoders for object detection from point clouds,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp. 12697–12705, 2019.
[29] G. Shi, R. Li, and C. Ma, “Pillarnet: Real-time and high-performance pillar-based 3d object detection,” in European Conference on Computer Vision, pp. 35–52, Springer, 2022.
[30] J. Park, C. Xu, S. Yang, K. Keutzer, K. Kitani, M. Tomizuka, and W. Zhan, “Time will tell: New outlooks and a baseline for temporal multi-view 3d object detection,” arXiv preprint arXiv:2210.02443, 2022.
[31] S. Wang, Y. Liu, T. Wang, Y. Li, and X. Zhang, “Exploring object-centric temporal modeling for efficient multi-view 3d object detection,” in Proceedings of the IEEE/CVF international conference on computer vision, pp. 3621–3631, 2023.
[32] Y. Long, A. Kumar, X. Liu, and D. Morris, “Riccardo: Radar hit prediction and convolution for camera-radar 3d object detection,” in Proceedings of the Computer Vision and Pattern Recognition Conference, pp. 22276–22285, 2025.
[33] X. Chu, J. Deng, G. You, Y. Duan, H. Li, and Y. Zhang, “Racformer: Towards high-quality 3d object detection via query-based radar-camera fusion,” in Proceedings of the Computer Vision and Pattern Recognition Conference, pp. 17081–17091, 2025.
[34] Z. Tian, C. Shen, H. Chen, and T. He, “Fcos: Fully convolutional one-stage object detection,” in Proceedings of the IEEE/CVF international conference on computer vision, pp. 9627–9636, 2019.
[35] T.-Y. Lin, P. Goyal, R. Girshick, K. He, and P. Dollár, “Focal loss for dense object detection,” in Proceedings of the IEEE international conference on computer vision, pp. 2980–2988, 2017.
[36] B. Zhu, Z. Jiang, X. Zhou, Z. Li, and G. Yu, “Class-balanced grouping and sampling for point cloud 3d object detection,” arXiv preprint arXiv:1908.09492,
[37] D. Hunt, S. Luo, A. Khazraei, X. Zhang, S. Hallyburton, T. Chen, and M. Pajic, “Radcloud: Real-time high-resolution point cloud generation using low-cost radars for aerial and ground vehicles,” in 2024 IEEE International Conference on Robotics and Automation (ICRA), pp. 12269–12275, IEEE, 2024.
[38] O. Schumann, M. Hahn, N. Scheiner, F. Weishaupt, J. F. Tilly, J. Dickmann, and C. Wöhler, “Radarscenes: A real-world radar point cloud data set for automotive applications,” in 2021 IEEE 24th International Conference on Information Fusion (FUSION), pp. 1–8, IEEE,
[39] N. Scheiner, F. Kraus, N. Appenrodt, J. Dickmann, and B. Sick, “Object detection for automotive radar point clouds–a comparison,” AI Perspectives, vol. 3, no. 1, p. 6, 2021.
[40] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 770–778, 2016.
[41] I. Radosavovic, R. P. Kosaraju, R. Girshick, K. He, and P. Dollár, “Designing network design spaces,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp. 10428–10436, 2020.
[42] M. Tan, R. Pang, and Q. V. Le, “Efficientdet: Scalable and efficient object detection,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp. 10781–10790, 2020.
[43] W. Wang, J. Dai, Z. Chen, Z. Huang, Z. Li, X. Zhu, X. Hu, T. Lu, L. Lu, H. Li, et al., “Internimage: Exploring large-scale vision foundation models with deformable convolutions,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp. 14408–14419, 2023.
[44] H. Caesar, V. Bankiti, A. H. Lang, S. Vora, V. E. Liong, Q. Xu, A. Krishnan, Y. Pan, G. Baldan, and O. Beijbom, “nuscenes: A multimodal dataset for autonomous driving,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp. 11621–11631, 2020.
[45] A. Hu, Z. Murez, N. Mohan, S. Dudas, J. Hawke, V. Badrinarayanan, R. Cipolla, and A. Kendall, “Fiery: Future instance prediction in bird’s-eye view from surround monocular cameras,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 15273–15282, 2021.
[46] X. Chen, T. Zhang, Y. Wang, Y. Wang, and H. Zhao, “Futr3d: A unified sensor fusion framework for 3d detection,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp. 172–181, 2023.

RQR3D: Reparametrizing the regression targets for BEV-based 3D object detection (Supplementary Material)
[47] Obtaining the 3D bounding box parameters. Evaluation tool from nuScenes [44] requires 3D bounding boxes to be defined as (xctr, yctr, zctr, w, l, h, θ). In this section, we provide the details about how we obtain this representation using RQR3D outputs, (xmin, ymin, xmax, ymax) and (u, v, arg min u, arg min v, dx, dy, zctr, h). Recalling that u represents...
[48] Implementation Details. All models are trained using a batch size of 8 and an initial learning rate of 7.5×10^−5, optimized using the Adam optimizer. Training is conducted over 20 epochs with a multi-step learning rate schedule: the learning rate is reduced by a factor of 10 at epochs 15 and 18. The bird’s eye view (BEV) representation covers a spatial...
[49]

Additional Experiments

8.1. Projection Methods

In Table 8, we evaluate the contribution of various projection methods to the overall performance. Our baseline model employs Lift-Splat projection with BEVDepth’s depth distribution module, denoted as DN. We compare this baseline with three different versions: i) Lift-Splat projection with a simpler dept...

[1]

Simple-bev: What really matters for multi-sensor bev perception?

A. W. Harley, Z. Fang, J. Li, R. Ambrus, and K. Fragkiadaki, “Simple-bev: What really matters for multi-sensor bev perception?,” in 2023 IEEE International Conference on Robotics and Automation (ICRA), pp. 2759–2765, IEEE, 2023.

[2]

M²bev: Multi-camera joint 3d detection and segmentation with unified birds-eye view representation

E. Xie, Z. Yu, D. Zhou, J. Philion, A. Anandkumar, S. Fidler, P. Luo, and J. M. Alvarez, “M²bev: Multi-camera joint 3d detection and segmentation with unified birds-eye view representation,” 2022.

[3]

Translating images into maps

A. Saha, O. Mendez, C. Russell, and R. Bowden, “Translating images into maps,” in 2022 International conference on robotics and automation (ICRA), pp. 9200–9206, IEEE, 2022.

[4]

Lift, splat, shoot: Encoding images from arbitrary camera rigs by implicitly unprojecting to 3d

J. Philion and S. Fidler, “Lift, splat, shoot: Encoding images from arbitrary camera rigs by implicitly unprojecting to 3d,” in Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part XIV 16, pp. 194–210, Springer,

[5]

Bevdepth: Acquisition of reliable depth for multi-view 3d object detection

Y. Li, Z. Ge, G. Yu, J. Yang, Z. Wang, Y. Shi, J. Sun, and Z. Li, “Bevdepth: Acquisition of reliable depth for multi-view 3d object detection,” in Proceedings of the AAAI conference on artificial intelligence, vol. 37, pp. 1477–1485, 2023.

[6]

Bevformer: learning bird’s-eye-view representation from lidar-camera via spatiotemporal transformers

Z. Li, W. Wang, H. Li, E. Xie, C. Sima, T. Lu, Q. Yu, and J. Dai, “Bevformer: learning bird’s-eye-view representation from lidar-camera via spatiotemporal transformers,” IEEE Transactions on Pattern Analysis and Machine Intelligence, 2024.

[7]

Detr3d: 3d object detection from multi-view images via 3d-to-2d queries

Y. Wang, V. C. Guizilini, T. Zhang, Y. Wang, H. Zhao, and J. Solomon, “Detr3d: 3d object detection from multi-view images via 3d-to-2d queries,” in Conference on Robot Learning, pp. 180–191, PMLR, 2022.

[8]

Crn: Camera radar net for accurate, robust, efficient 3d perception

Y. Kim, J. Shin, S. Kim, I.-J. Lee, J. W. Choi, and D. Kum, “Crn: Camera radar net for accurate, robust, efficient 3d perception,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 17615–17626, 2023.

[9]

Crt-fusion: Camera, radar, temporal fusion using motion information for 3d object detection

J. Kim, M. Seong, and J. W. Choi, “Crt-fusion: Camera, radar, temporal fusion using motion information for 3d object detection,” Advances in Neural Information Processing Systems, vol. 37, pp. 108625–108648, 2024.

[10]

Unleashing hydra: Hybrid fusion, depth consistency and radar for unified 3d perception

P. Wolters, J. Gilg, T. Teepe, F. Herzog, A. Laouichi, M. Hofmann, and G. Rigoll, “Unleashing hydra: Hybrid fusion, depth consistency and radar for unified 3d perception,” 2025.

[11]

Rcbevdet: radar-camera fusion in bird’s eye view for 3d object detection

Z. Lin, Z. Liu, Z. Xia, X. Wang, Y. Wang, S. Qi, Y. Dong, N. Dong, L. Zhang, and C. Zhu, “Rcbevdet: radar-camera fusion in bird’s eye view for 3d object detection,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 14928–14937, 2024.

[12]

Bevformer v2: Adapting modern image backbones to bird’s-eye-view recognition via perspective supervision

C. Yang, Y. Chen, H. Tian, C. Tao, X. Zhu, Z. Zhang, G. Huang, H. Li, Y. Qiao, L. Lu, et al., “Bevformer v2: Adapting modern image backbones to bird’s-eye-view recognition via perspective supervision,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 17830–17839, 2023.

[13]

BEVDet: High-performance Multi-camera 3D Object Detection in Bird-Eye-View

J. Huang, G. Huang, Z. Zhu, Y. Ye, and D. Du, “Bevdet: High-performance multi-camera 3d object detection in bird-eye-view,” arXiv preprint arXiv:2112.11790,

[14]

Bevdet4d: Exploit temporal cues in multi-camera 3d object detection

J. Huang and G. Huang, “Bevdet4d: Exploit temporal cues in multi-camera 3d object detection,” arXiv preprint arXiv:2203.17054, 2022.

[15]

Petr: Position embedding transformation for multi-view 3d object detection

Y. Liu, T. Wang, X. Zhang, and J. Sun, “Petr: Position embedding transformation for multi-view 3d object detection,” in European conference on computer vision, pp. 531–548, Springer, 2022.

[16]

Petrv2: A unified framework for 3d perception from multi-camera images

Y. Liu, J. Yan, F. Jia, S. Li, A. Gao, T. Wang, and X. Zhang, “Petrv2: A unified framework for 3d perception from multi-camera images,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 3262–3272, 2023.

[17]

Temporal enhanced training of multi-view 3d object detector via historical object prediction

Z. Zong, D. Jiang, G. Song, Z. Xue, J. Su, H. Li, and Y. Liu, “Temporal enhanced training of multi-view 3d object detector via historical object prediction,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 3781–3790, 2023.

[18]

Ray denoising: Depth-aware hard negative sampling for multi-view 3d object detection

F. Liu, T. Huang, Q. Zhang, H. Yao, C. Zhang, F. Wan, Q. Ye, and Y. Zhou, “Ray denoising: Depth-aware hard negative sampling for multi-view 3d object detection,” in European Conference on Computer Vision, pp. 200–217, Springer, 2024.

[19]

Sparsebev: High-performance sparse 3d object detection from multi-camera videos

H. Liu, Y. Teng, T. Lu, H. Wang, and L. Wang, “Sparsebev: High-performance sparse 3d object detection from multi-camera videos,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 18580–18590, 2023.

[20]

Center-based 3d object detection and tracking

T. Yin, X. Zhou, and P. Krahenbuhl, “Center-based 3d object detection and tracking,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp. 11784–11793, 2021.

[21]

Arbitrary-oriented object detection with circular smooth label

X. Yang and J. Yan, “Arbitrary-oriented object detection with circular smooth label,” in Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part VIII 16, pp. 677–694, Springer, 2020.

[22]

Dynamic refinement network for oriented and densely packed object detection

X. Pan, Y. Ren, K. Sheng, W. Dong, H. Yuan, X. Guo, C. Ma, and C. Xu, “Dynamic refinement network for oriented and densely packed object detection,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp. 11207–11216, 2020.

[23]

Oriented r-cnn for object detection

X. Xie, G. Cheng, J. Wang, X. Yao, and J. Han, “Oriented r-cnn for object detection,” in Proceedings of the IEEE/CVF international conference on computer vision, pp. 3520–3529, 2021.

[24]

Redet: A rotation-equivariant detector for aerial object detection

J. Han, J. Ding, N. Xue, and G.-S. Xia, “Redet: A rotation-equivariant detector for aerial object detection,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp. 2786–2795,

[25]

Projecting points to axes: Oriented object detection via point-axis representation

Z. Zhao, Q. Xue, Y. He, Y. Bai, X. Wei, and Y. Gong, “Projecting points to axes: Oriented object detection via point-axis representation,” in European Conference on Computer Vision, pp. 161–179, Springer, 2024.

[26]

Oriented reppoints for aerial object detection

W. Li, Y. Chen, K. Hu, and J. Zhu, “Oriented reppoints for aerial object detection,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp. 1829–1838, 2022.

[27]

Second: Sparsely embedded convolutional detection

Y. Yan, Y. Mao, and B. Li, “Second: Sparsely embedded convolutional detection,” Sensors, vol. 18, no. 10, p. 3337, 2018.

[28]

Pointpillars: Fast encoders for object detection from point clouds

A. H. Lang, S. Vora, H. Caesar, L. Zhou, J. Yang, and O. Beijbom, “Pointpillars: Fast encoders for object detection from point clouds,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp. 12697–12705, 2019.

[29]

Pillarnet: Real-time and high-performance pillar-based 3d object detection

G. Shi, R. Li, and C. Ma, “Pillarnet: Real-time and high-performance pillar-based 3d object detection,” in European Conference on Computer Vision, pp. 35–52, Springer, 2022.

[30]

Time will tell: New outlooks and a baseline for temporal multi-view 3d object detection

J. Park, C. Xu, S. Yang, K. Keutzer, K. Kitani, M. Tomizuka, and W. Zhan, “Time will tell: New outlooks and a baseline for temporal multi-view 3d object detection,” arXiv preprint arXiv:2210.02443, 2022.

[31]

Exploring object-centric temporal modeling for efficient multi-view 3d object detection

S. Wang, Y. Liu, T. Wang, Y. Li, and X. Zhang, “Exploring object-centric temporal modeling for efficient multi-view 3d object detection,” in Proceedings of the IEEE/CVF international conference on computer vision, pp. 3621–3631, 2023.

[32]

Riccardo: Radar hit prediction and convolution for camera-radar 3d object detection

Y. Long, A. Kumar, X. Liu, and D. Morris, “Riccardo: Radar hit prediction and convolution for camera-radar 3d object detection,” in Proceedings of the Computer Vision and Pattern Recognition Conference, pp. 22276–22285, 2025.

[33]

Racformer: Towards high-quality 3d object detection via query-based radar-camera fusion

X. Chu, J. Deng, G. You, Y. Duan, H. Li, and Y. Zhang, “Racformer: Towards high-quality 3d object detection via query-based radar-camera fusion,” in Proceedings of the Computer Vision and Pattern Recognition Conference, pp. 17081–17091, 2025.

[34]

Fcos: Fully convolutional one-stage object detection

Z. Tian, C. Shen, H. Chen, and T. He, “Fcos: Fully convolutional one-stage object detection,” in Proceedings of the IEEE/CVF international conference on computer vision, pp. 9627–9636, 2019.

[35]

Focal loss for dense object detection

T.-Y. Lin, P. Goyal, R. Girshick, K. He, and P. Dollár, “Focal loss for dense object detection,” in Proceedings of the IEEE international conference on computer vision, pp. 2980–2988, 2017.

[36]

Class-balanced grouping and sampling for point cloud 3d object detection

B. Zhu, Z. Jiang, X. Zhou, Z. Li, and G. Yu, “Class-balanced grouping and sampling for point cloud 3d object detection,” arXiv preprint arXiv:1908.09492,

[37]

Radcloud: Real-time high-resolution point cloud generation using low-cost radars for aerial and ground vehicles

D. Hunt, S. Luo, A. Khazraei, X. Zhang, S. Hallyburton, T. Chen, and M. Pajic, “Radcloud: Real-time high-resolution point cloud generation using low-cost radars for aerial and ground vehicles,” in 2024 IEEE International Conference on Robotics and Automation (ICRA), pp. 12269–12275, IEEE, 2024.

[38]

Radarscenes: A real-world radar point cloud data set for automotive applications

O. Schumann, M. Hahn, N. Scheiner, F. Weishaupt, J. F. Tilly, J. Dickmann, and C. Wöhler, “Radarscenes: A real-world radar point cloud data set for automotive applications,” in 2021 IEEE 24th International Conference on Information Fusion (FUSION), pp. 1–8, IEEE,

[39]

Object detection for automotive radar point clouds–a comparison

N. Scheiner, F. Kraus, N. Appenrodt, J. Dickmann, and B. Sick, “Object detection for automotive radar point clouds–a comparison,” AI Perspectives, vol. 3, no. 1, p. 6, 2021.

[40]

Deep residual learning for image recognition

K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 770–778, 2016.

[41]

Designing network design spaces

I. Radosavovic, R. P. Kosaraju, R. Girshick, K. He, and P. Dollár, “Designing network design spaces,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp. 10428–10436, 2020.

[42]

Efficientdet: Scalable and efficient object detection

M. Tan, R. Pang, and Q. V. Le, “Efficientdet: Scalable and efficient object detection,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp. 10781–10790, 2020.

[43]

Internimage: Exploring large-scale vision foundation models with deformable convolutions

W. Wang, J. Dai, Z. Chen, Z. Huang, Z. Li, X. Zhu, X. Hu, T. Lu, L. Lu, H. Li, et al., “Internimage: Exploring large-scale vision foundation models with deformable convolutions,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp. 14408–14419, 2023.

[44]

nuscenes: A multimodal dataset for autonomous driving

H. Caesar, V. Bankiti, A. H. Lang, S. Vora, V. E. Liong, Q. Xu, A. Krishnan, Y. Pan, G. Baldan, and O. Beijbom, “nuscenes: A multimodal dataset for autonomous driving,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp. 11621–11631, 2020.

[45]

Fiery: Future instance prediction in bird’s-eye view from surround monocular cameras

A. Hu, Z. Murez, N. Mohan, S. Dudas, J. Hawke, V. Badrinarayanan, R. Cipolla, and A. Kendall, “Fiery: Future instance prediction in bird’s-eye view from surround monocular cameras,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 15273–15282, 2021.

[46]

Futr3d: A unified sensor fusion framework for 3d detection

X. Chen, T. Zhang, Y. Wang, Y. Wang, and H. Zhao, “Futr3d: A unified sensor fusion framework for 3d detection,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp. 172–181, 2023.

RQR3D: Reparametrizing the regression targets for BEV-based 3D object detection Supplementary Material

[47]

Obtaining the 3D bounding box parameters

The evaluation tool from nuScenes [44] requires 3D bounding boxes to be defined as (xctr, yctr, zctr, w, l, h, θ). In this section, we provide the details about how we obtain this representation using RQR3D outputs, (xmin, ymin, xmax, ymax) and (u, v, arg min u, arg min v dx, dy, zctr, h). Recalling that u represents...
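To make the two representations named above concrete, the following minimal sketch goes in the forward direction only: from a BEV oriented box (xctr, yctr, w, l, θ) to the smallest axis-aligned box (xmin, ymin, xmax, ymax) that encloses it. It is not the authors' decoding procedure, which is elided in the text. The function name and the convention that l runs along the heading direction and w across it are assumptions made for illustration.

```python
import math


def enclosing_axis_aligned_box(x_ctr, y_ctr, w, l, theta):
    """Return (x_min, y_min, x_max, y_max) enclosing a rotated BEV box.

    Assumption (not from the paper): l is the extent along the heading
    direction theta, w is the extent perpendicular to it.
    """
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    xs, ys = [], []
    # Visit the four corners in the box's local frame, then rotate and shift.
    for sx in (-0.5, 0.5):
        for sy in (-0.5, 0.5):
            dx, dy = sx * l, sy * w
            xs.append(x_ctr + dx * cos_t - dy * sin_t)
            ys.append(y_ctr + dx * sin_t + dy * cos_t)
    return min(xs), min(ys), max(xs), max(ys)


if __name__ == "__main__":
    # A 4 m long, 2 m wide box at the origin, heading along +x.
    print(enclosing_axis_aligned_box(0.0, 0.0, 2.0, 4.0, 0.0))
```

With θ = 0 the enclosing box coincides with the oriented box; as θ grows, the gap between the two sets of corners is what corner-offset regression has to capture.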

[48]

Implementation Details

All models are trained using a batch size of 8 and an initial learning rate of 7.5×10⁻⁵, optimized using the Adam optimizer. Training is conducted over 20 epochs with a multi-step learning rate schedule: the learning rate is reduced by a factor of 10 at epochs 15 and 18. The bird’s eye view (BEV) representation covers a spatial...
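The multi-step schedule described above can be sketched in a few lines. This is an illustration under stated assumptions rather than the authors' training code: the function name is hypothetical, epochs are assumed to be zero-indexed, and each reduction is assumed to take effect from the milestone epoch onward.

```python
def learning_rate(epoch, base_lr=7.5e-5, milestones=(15, 18), gamma=0.1):
    """Learning rate for a given epoch under a multi-step schedule.

    Assumptions (not from the paper): epochs are zero-indexed and the
    rate drops by `gamma` starting at each milestone epoch.
    """
    drops = sum(1 for m in milestones if epoch >= m)
    return base_lr * gamma ** drops


if __name__ == "__main__":
    # Print the rate for each of the 20 training epochs.
    for epoch in range(20):
        print(epoch, learning_rate(epoch))
```

Under these assumptions the rate stays at its initial value for the first 15 epochs, is ten times smaller for the next three, and a hundred times smaller for the final two.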

[49]

Additional Experiments

8.1. Projection Methods

In Table 8, we evaluate the contribution of various projection methods to the overall performance. Our baseline model employs Lift-Splat projection with BEVDepth’s depth distribution module, denoted as DN. We compare this baseline with three different versions: i) Lift-Splat projection with a simpler dept...