A Resource Efficient Fusion Network for Object Detection in Bird's-Eye View using Camera and Raw Radar Data

Gijs Dubbelman; Kavin Chandrasekaran; Pavol Jancura; Sorin Grigorescu

arxiv: 2411.13311 · v1 · submitted 2024-11-20 · 💻 cs.CV · cs.AI

Pith reviewed 2026-05-23 08:11 UTC · model grok-4.3

keywords: object detection · sensor fusion · bird's-eye view · camera · radar · range-Doppler spectrum · RADIal dataset · autonomous driving

The pith

Fusing camera bird's-eye-view features with range-azimuth features recovered from raw radar range-Doppler spectrum achieves competitive object detection accuracy at lower computational cost.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes processing camera images through an encoder-decoder to extract features in the bird's-eye-view polar domain while feeding the raw radar range-Doppler spectrum into a separate decoder that recovers range-azimuth features. These two feature maps are fused to drive object detection without performing conventional radar signal processing steps such as point cloud generation. The method is evaluated on the RADIal dataset against prior fusion approaches both for detection accuracy and for metrics of computational complexity. A sympathetic reader would care because cameras supply semantic detail while radar operates in poor weather, yet most existing fusions incur heavy radar preprocessing costs that this direct-spectrum route aims to sidestep.

Core claim

The central claim is that object detection in bird's-eye view can be performed by transforming camera images into the BEV polar domain and extracting features with a dedicated encoder-decoder architecture, recovering range-azimuth features from the raw range-Doppler radar spectrum via a radar decoder, and fusing the two resulting maps to reach detection performance competitive with existing methods while lowering computational complexity on the RADIal dataset.

What carries the argument

The camera BEV-polar encoder-decoder paired with the radar decoder that reconstructs range-azimuth features directly from the raw range-Doppler input; their outputs are fused for detection.
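As a rough illustration of what fusing the two maps requires, the sketch below joins a camera feature map and a radar feature map along the channel axis. The summary does not say which fusion operator the paper uses, so channel concatenation is an assumption here, and the names `fuse_by_concatenation` and `feature_shape` are hypothetical. The point it shows is that both branches must land on the same range-azimuth grid before they can be combined.

```python
from typing import List, Tuple

# A feature map is indexed as [channel][range_bin][azimuth_bin].
FeatureMap = List[List[List[float]]]


def feature_shape(fm: FeatureMap) -> Tuple[int, int, int]:
    """Return (channels, range_bins, azimuth_bins) of a nested-list feature map."""
    return (len(fm), len(fm[0]), len(fm[0][0]))


def fuse_by_concatenation(camera_ra: FeatureMap, radar_ra: FeatureMap) -> FeatureMap:
    """Stack camera and radar channels (assumed fusion operator, not the paper's)."""
    _, r_cam, a_cam = feature_shape(camera_ra)
    _, r_rad, a_rad = feature_shape(radar_ra)
    if (r_cam, a_cam) != (r_rad, a_rad):
        # Fusion only makes sense when both maps share one range-azimuth grid.
        raise ValueError("camera and radar maps must share the same range-azimuth grid")
    # Channel concatenation: camera channels first, then radar channels.
    return camera_ra + radar_ra


# Hypothetical sizes: 2 camera channels and 1 radar channel on a 3 x 4 grid.
camera = [[[0.0] * 4 for _ in range(3)] for _ in range(2)]
radar = [[[1.0] * 4 for _ in range(3)] for _ in range(1)]
fused = fuse_by_concatenation(camera, radar)
print(feature_shape(fused))
```

A detection head would then consume the fused map; any learned weighting between the two modalities is outside what this sketch covers.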

If this is right

Object detection proceeds without conventional radar point-cloud extraction or signal processing.
Detection accuracy remains competitive with prior camera-radar fusion methods on the RADIal dataset.
Overall computational complexity is reduced relative to methods that ingest processed radar data.
The raw-spectrum route supplies sufficient information for the fusion step to succeed.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The efficiency gain could support higher frame-rate operation on embedded vehicle hardware.
The same raw-spectrum decoder might be tested on other radar datasets to check whether the accuracy-complexity trade-off generalizes.
Because radar remains functional when cameras are degraded by weather, the fusion could be examined for robustness in rain or fog even though the paper reports only nominal conditions.

Load-bearing premise

The raw range-Doppler spectrum contains enough semantic and structural information that a dedicated decoder can recover usable range-azimuth features for fusion with camera features.

What would settle it

Running the proposed network on the RADIal dataset and measuring detection accuracy below existing fusion baselines or computational metrics above those baselines would falsify the central claim.

Figures

Figures reproduced from arXiv: 2411.13311 by Gijs Dubbelman, Kavin Chandrasekaran, Pavol Jancura, Sorin Grigorescu.

**Figure 1.** Architecture Overview: The image processing pipeline first transforms the camera image into Bird’s-Eye View (BEV). Subsequently, the resultant BEV undergoes conversion into polar representation, directly mapping to the Range-Azimuth (RA) image. Object detection is performed on RA image features fused with radar features from the radar decoder. The predictions obtained in the RA view are shown in the camera…

**Figure 2.** Image Processing Pipeline: The objects in the frame (four cars) marked in different colors are reflected in the BEV Cartesian and Polar pixel images. The origin is at the bottom center. The azimuth (θ), range (r) ground truth polar coordinates are marked for reference. r denotes the distance from the objects to the ego vehicle (in meters); θ represents the angle at which the objects are located in degrees.…

**Figure 3.** The camera only and radar only encoder contains four ResNet-50-like blocks with a pre-encoder block. The features from each of those blocks are named x0, x1, x2, x3, and x4. The thick blue curved arrow takes the encoder’s output to the decoder’s input in order to expand the input feature maps to higher resolutions. The dotted lines represent the skip connections used to preserve spatial information. The fe…

**Figure 4.** Qualitative detection results from the proposed fusion model. The predictions obtained in the RA view (represented…

**Figure 5.** The prediction in blue and the ground truth in green are shown in (a) front-view camera and (b) BEV Polar image. Zoom in to better visualize.

Original abstract

Cameras can be used to perceive the environment around the vehicle, while affordable radar sensors are popular in autonomous driving systems as they can withstand adverse weather conditions unlike cameras. However, radar point clouds are sparser with low azimuth and elevation resolution that lack semantic and structural information of the scenes, resulting in generally lower radar detection performance. In this work, we directly use the raw range-Doppler (RD) spectrum of radar data, thus avoiding radar signal processing. We independently process camera images within the proposed comprehensive image processing pipeline. Specifically, first, we transform the camera images to Bird's-Eye View (BEV) Polar domain and extract the corresponding features with our camera encoder-decoder architecture. The resultant feature maps are fused with Range-Azimuth (RA) features, recovered from the RD spectrum input from the radar decoder to perform object detection. We evaluate our fusion strategy with other existing methods not only in terms of accuracy but also on computational complexity metrics on RADIal dataset.
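The BEV-to-polar step described in the abstract can be pictured as a resampling of a Cartesian top-down grid onto range and azimuth bins. The following is a minimal sketch under stated assumptions, not the paper's pipeline: it uses nearest-neighbour lookup, places the ego vehicle at the bottom centre of the grid (as in the Figure 2 caption), treats azimuth zero as straight ahead with positive angles to the right, and assumes the grid height spans the full maximum range. The function name `bev_cartesian_to_polar` and all parameters are hypothetical.

```python
import math


def bev_cartesian_to_polar(bev, max_range_m, n_range_bins, n_azimuth_bins, fov_deg=180.0):
    """Resample a Cartesian BEV grid (list of rows) into a [range][azimuth] grid.

    Assumptions: row 0 is the farthest row, the origin sits at the bottom
    centre, pixels are square, and the grid height covers max_range_m.
    """
    height, width = len(bev), len(bev[0])
    metres_per_pixel = max_range_m / height
    polar = [[0.0] * n_azimuth_bins for _ in range(n_range_bins)]
    for ri in range(n_range_bins):
        # Range at the centre of this bin, in metres.
        r = (ri + 0.5) * max_range_m / n_range_bins
        for ai in range(n_azimuth_bins):
            # Azimuth at the centre of this bin; 0 degrees points straight ahead.
            theta_deg = -fov_deg / 2 + (ai + 0.5) * fov_deg / n_azimuth_bins
            theta = math.radians(theta_deg)
            x = r * math.sin(theta)  # lateral offset, positive to the right
            y = r * math.cos(theta)  # forward distance
            col = math.floor(width / 2 + x / metres_per_pixel)
            row = height - 1 - math.floor(y / metres_per_pixel)
            # Samples that fall outside the Cartesian grid stay at zero.
            if 0 <= row < height and 0 <= col < width:
                polar[ri][ai] = bev[row][col]
    return polar


# Hypothetical 4 m x 4 m grid with one occupied cell far ahead and to the right.
grid = [[0.0] * 4 for _ in range(4)]
grid[0][3] = 1.0
polar_grid = bev_cartesian_to_polar(grid, max_range_m=4.0, n_range_bins=4, n_azimuth_bins=4)
print(polar_grid[3])
```

Nearest-neighbour lookup keeps the sketch dependency-free; an interpolating resampler would give smoother polar images at the cost of extra computation.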

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

Abstract-only view shows an incremental camera-radar BEV fusion approach using raw RD spectra, but the lack of any results leaves the accuracy and complexity claims unverified.

The paper outlines a fusion network that takes raw radar range-Doppler spectra, runs them through a decoder to recover range-azimuth features, transforms camera images into BEV polar space with its own encoder-decoder, fuses the two feature sets, and performs object detection. It evaluates the approach on the RADIal dataset and claims competitive accuracy at lower computational cost than existing methods while avoiding conventional radar point-cloud processing.

The direct use of the raw RD spectrum is the clearest departure from common practice. That choice could reduce pipeline steps and retain more signal detail, which aligns with the goal of keeping radar useful in adverse weather. The camera-to-BEV-polar path is a concrete implementation detail that might help alignment. Both accuracy and complexity are mentioned as evaluation axes, which is a practical framing for autonomous-driving work.

The main issue is that none of the claims can be checked. The abstract supplies no numbers, no ablation tables, no baseline comparisons, and no error breakdowns, so it is impossible to tell whether the fusion actually improves anything or whether the raw-spectrum assumption holds. The approach also sounds close to prior camera-radar BEV fusion work the abstract itself references, which makes the novelty modest at best.

This is the kind of paper that might interest engineers who build multi-sensor stacks and want a concrete recipe rather than a theoretical advance. A reader already working on radar-camera fusion could pull useful architecture choices from the full version if the experiments turn out to be careful. I would send it to peer review once the full paper is available, provided the experiments include proper ablations and fair complexity measurements; without those it would be hard to judge the contribution.

Referee Report

1 major / 0 minor

Summary. The manuscript proposes a fusion network for object detection in bird's-eye view that processes camera images through a BEV-polar transformation and encoder-decoder pipeline, recovers range-azimuth features from the raw radar range-Doppler spectrum via a dedicated decoder, and fuses the resulting feature maps to perform detection. It claims this strategy achieves competitive accuracy while reducing computational complexity relative to existing methods, with evaluation on the RADIal dataset.

Significance. If the empirical claims hold, the work could contribute to resource-efficient multi-modal perception for autonomous driving by avoiding conventional radar signal processing and directly ingesting raw spectra. The dual focus on accuracy and complexity metrics addresses a relevant practical constraint. However, the provided manuscript contains only an abstract with no quantitative results, architecture details, ablations, or comparisons, so no assessment of actual significance is possible.

major comments (1)

[Abstract] The central claim that the proposed camera-radar fusion 'performs object detection with competitive accuracy and reduced computational complexity' cannot be evaluated because the manuscript supplies no accuracy metrics, complexity numbers (e.g., FLOPs, latency), baseline comparisons, ablation studies, or error analysis. This absence directly prevents verification of the result and of the assumption that raw RD-spectrum-derived RA features supply sufficient semantic information when fused with BEV-polar camera features.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their review. We address the major comment below and note that the current submission consists solely of the abstract, as indicated in the provided materials.

Point-by-point responses

Referee: [Abstract] The central claim that the proposed camera-radar fusion 'performs object detection with competitive accuracy and reduced computational complexity' cannot be evaluated because the manuscript supplies no accuracy metrics, complexity numbers (e.g., FLOPs, latency), baseline comparisons, ablation studies, or error analysis. This absence directly prevents verification of the result and of the assumption that raw RD-spectrum-derived RA features supply sufficient semantic information when fused with BEV-polar camera features.

Authors: We agree that the abstract as provided does not contain the quantitative results, metrics, or analyses needed to substantiate the claims. The full manuscript will be expanded to include accuracy metrics (e.g., mAP, precision/recall), computational complexity measures (FLOPs, parameters, inference latency), direct comparisons against existing camera-radar fusion baselines, ablation studies on the fusion components, and error analysis, all evaluated on the RADIal dataset. These additions will enable verification of the performance claims and the sufficiency of the raw RD-derived RA features.

Revision: yes
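For readers unfamiliar with the metrics named in this exchange, the sketch below shows how precision, recall and F1 follow from detection counts, and how the parameter count and multiply-accumulate (MAC) count of a single 2D convolution layer are usually tallied. All function names and the example layer sizes are hypothetical and are not taken from the paper. Conventions also vary: FLOPs are often reported as roughly twice the MAC count, so figures from different sources are only comparable when the convention is stated.

```python
def precision_recall_f1(true_pos, false_pos, false_neg):
    """Return (precision, recall, F1) from detection counts; 0.0 when undefined."""
    predicted = true_pos + false_pos
    actual = true_pos + false_neg
    precision = true_pos / predicted if predicted else 0.0
    recall = true_pos / actual if actual else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def conv2d_params(in_channels, out_channels, kernel_size, bias=True):
    """Learnable parameters of a square-kernel 2D convolution."""
    weights = kernel_size * kernel_size * in_channels * out_channels
    return weights + (out_channels if bias else 0)


def conv2d_macs(in_channels, out_channels, kernel_size, out_height, out_width):
    """Multiply-accumulates for one forward pass (bias additions not counted)."""
    per_output_pixel = kernel_size * kernel_size * in_channels * out_channels
    return per_output_pixel * out_height * out_width


# Hypothetical example: 8 correct detections, 2 false alarms, 2 misses.
print(precision_recall_f1(8, 2, 2))
# Hypothetical layer: 7x7 kernel, 3 input channels, 64 output channels, 112x112 output.
print(conv2d_params(3, 64, 7), conv2d_macs(3, 64, 7, 112, 112))
```

Summing these per-layer counts over a whole network gives the kind of complexity figure the referee asks for; measured latency additionally depends on hardware and implementation.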

Circularity Check

0 steps flagged

No circularity in abstract; derivation chain absent

full rationale

Only the abstract is available and it contains no equations, fitted parameters, predictions, or self-citations. The text describes an architecture (raw RD spectrum to RA features via decoder, camera to BEV-polar features, fusion for detection) without any claimed derivation that reduces outputs to inputs by construction. This is the most common honest finding when no load-bearing mathematical steps are present.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only input supplies no explicit free parameters, axioms, or invented entities; the central claim rests on unstated assumptions about feature sufficiency and dataset representativeness that cannot be enumerated from the given text.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

REFNet++: Multi-Task Efficient Fusion of Camera and Radar Sensor Data in Bird's-Eye Polar View
cs.CV · 2026-05 · unverdicted · novelty 4.0

REFNet++ aligns raw camera images and radar range-Doppler data into a shared bird's-eye polar view using variational encoders for multi-task vehicle detection and free space segmentation on the RADIal dataset.


[1] [1]

Sensor and Sensor Fusion Technology in Autonomous Vehicles: A Review,

D. J. Yeong, G. Velasco-Hernandez, J. Barry, and J. Walsh, “Sensor and Sensor Fusion Technology in Autonomous Vehicles: A Review,” Sensors, vol. 21, p. 2140, Mar. 2021

work page 2021

[2] [2]

Radat- ron: Accurate Detection Using Multi-resolution Cascaded MIMO Radar,

S. Madani, J. Guan, W. Ahmed, S. Gupta, and H. Hassanieh, “Radat- ron: Accurate Detection Using Multi-resolution Cascaded MIMO Radar,” in Computer Vision – ECCV 2022 (S. Avidan, G. Brostow, M. Ciss´e, G. M. Farinella, and T. Hassner, eds.), vol. 13699, pp. 160– 178, Cham: Springer Nature Switzerland, 2022. Series Title: Lecture Notes in Computer Science

work page 2022

[3] M. Richards, Principles of modern radar. SciTech Pub., 2010

[4] T.-Y. Lim and A. Ansari, “Radar and Camera Early Fusion for Vehicle Detection in Advanced Driver Assistance Systems,” in NeurIPS Machine Learning for Autonomous Driving Workshop, 2019

[5] J. Philion and S. Fidler, “Lift, Splat, Shoot: Encoding Images from Arbitrary Camera Rigs by Implicitly Unprojecting to 3D,” in Computer Vision – ECCV 2020 (A. Vedaldi, H. Bischof, T. Brox, and J.-M. Frahm, eds.), (Cham), pp. 194–210, Springer International Publishing, 2020

[6] T. Roddick, A. Kendall, and R. Cipolla, “Orthographic Feature Transform for Monocular 3D Object Detection,” Nov. 2018. arXiv:1811.08188 [cs]

[7] B. Zhou and P. Krähenbühl, “Cross-view Transformers for real-time Map-view Semantic Segmentation,” in IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 13750–13759, CVPR, June 2022

[8] Y. Liu, T. Wang, X. Zhang, and J. Sun, “PETR: Position Embedding Transformation for Multi-view 3D Object Detection,” in Computer Vision – ECCV 2022: 17th European Conference, Tel Aviv, Israel, October 23–27, 2022, Proceedings, Part XXVII, (Berlin, Heidelberg), pp. 531–548, Springer-Verlag, Oct. 2022

[9] Y. Liu, J. Yan, F. Jia, S. Li, A. Gao, T. Wang, X. Zhang, and J. Sun, “PETRv2: A Unified Framework for 3D Perception from Multi-Camera Images,” Nov. 2022. arXiv:2206.01256 [cs]

[10] Z. Liu, H. Tang, A. Amini, X. Yang, H. Mao, D. L. Rus, and S. Han, “BEVFusion: Multi-Task Multi-Sensor Fusion with Unified Bird’s-Eye View Representation,” in 2023 IEEE International Conference on Robotics and Automation (ICRA), pp. 2774–2781, May 2023

[11] Z. Li, W. Wang, H. Li, E. Xie, C. Sima, T. Lu, Y. Qiao, and J. Dai, “BEVFormer: Learning Bird’s-Eye-View Representation from Multi-camera Images via Spatiotemporal Transformers,” in Computer Vision – ECCV 2022, (Cham), pp. 1–18, Springer Nature Switzerland, 2022

[12] C. Yang, Y. Chen, H. Tian, C. Tao, X. Zhu, Z. Zhang, G. Huang, H. Li, Y. Qiao, L. Lu, J. Zhou, and J. Dai, “BEVFormer v2: Adapting Modern Image Backbones to Bird’s-Eye-View Recognition via Perspective Supervision,” in 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 17830–17839, June 2023. ISSN: 2575-7075

[13] Y. Li, H. Bao, Z. Ge, J. Yang, J. Sun, and Z. Li, “BEVStereo: Enhancing Depth Estimation in Multi-view 3D Object Detection with Dynamic Temporal Stereo,” Sept. 2022. arXiv:2209.10248 [cs]

[14] Z. Wang, C. Min, Z. Ge, Y. Li, Z. Li, H. Yang, and D. Huang, “STS: Surround-view Temporal Stereo for Multi-view 3D Detection,” Aug. 2022. arXiv:2208.10145 [cs]

[16] H. Wang, H. Tang, S. Shi, A. Li, Z. Li, B. Schiele, and L. Wang, “UniTR: A Unified and Efficient Multi-Modal Transformer for Bird’s-Eye-View Representation,” Aug. 2023. arXiv:2308.07732 [cs]

[17] J. Rebut, A. Ouaknine, W. Malik, and P. Pérez, “Raw High-Definition Radar for Multi-Task Learning,” Apr. 2022. arXiv:2112.10646 [cs, eess]

[18] A. Ouaknine, A. Newson, J. Rebut, F. Tupin, and P. Pérez, “CARRADA Dataset: Camera and Automotive Radar with Range-Angle-Doppler Annotations,” May 2021. arXiv:2005.01456 [cs]

[19] A. Zhang, F. E. Nowruzi, and R. Laganiere, “RADDet: Range-Azimuth-Doppler based Radar Object Detection for Dynamic Road Users,” in 2021 18th Conference on Robots and Vision (CRV), pp. 95–102, May 2021

[20] M. Sheeny, E. De Pellegrin, S. Mukherjee, A. Ahrabian, S. Wang, and A. Wallace, “RADIATE: A Radar Dataset for Automotive Perception in Bad Weather,” Apr. 2021. arXiv:2010.09076 [cs]

[21] M. Mostajabi, C. M. Wang, D. Ranjan, and G. Hsyu, “High Resolution Radar Dataset for Semi-Supervised Learning of Dynamic Objects,” in 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pp. 450–457, June 2020. ISSN: 2160-7516

[22] T.-Y. Lim, S. A. Markowitz, and M. N. Do, “RaDICaL: A Synchronized FMCW Radar, Depth, IMU and RGB Camera Data Dataset With Low-Level FMCW Radar Signals,” IEEE Journal of Selected Topics in Signal Processing, vol. 15, pp. 941–953, June 2021. Conference Name: IEEE Journal of Selected Topics in Signal Processing

[23] D.-H. Paek, S.-H. Kong, and K. T. Wijaya, “K-Radar: 4D Radar Object Detection for Autonomous Driving in Various Weather Conditions,” Nov. 2023. arXiv:2206.08171 [cs]

[24] B. Major, D. Fontijne, A. Ansari, R. T. Sukhavasi, R. Gowaikar, M. Hamilton, S. Lee, S. Grzechnik, and S. Subramanian, “Vehicle Detection With Automotive Radar Using Deep Learning on Range-Azimuth-Doppler Tensors,” in 2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW), pp. 924–932, Oct. 2019. ISSN: 2473-9944

[25] A. Palffy, J. Dong, J. F. P. Kooij, and D. M. Gavrila, “CNN based Road User Detection using the 3D Radar Cube,” IEEE Robotics and Automation Letters, vol. 5, pp. 1263–1270, Apr. 2020. arXiv:2004.12165 [cs]

[26] G. Zhang, H. Li, and F. Wenger, “Object Detection and 3d Estimation Via an FMCW Radar Using a Fully Convolutional Network,” in ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 4487–4491, May 2020. ISSN: 2379-190X

[27] Y. Wang, Z. Jiang, Y. Li, J.-N. Hwang, G. Xing, and H. Liu, “RODNet: A Real-Time Radar Object Detection Network Cross-Supervised by Camera-Radar Fused Object 3D Localization,” IEEE Journal of Selected Topics in Signal Processing, vol. 15, pp. 954–967, June 2021. arXiv:2102.05150 [cs, eess]

[28] J. Giroux, M. Bouchard, and R. Laganiere, “T-FFTRadNet: Object Detection with Swin Vision Transformers from Raw ADC Radar Signals,” Mar. 2023. arXiv:2303.16940 [cs]

[29] B. Yang, I. Khatri, M. Happold, and C. Chen, “ADCNet: Learning from Raw Radar Data via Distillation,” Dec. 2023. arXiv:2303.11420 [cs, eess]

[30] S. Chadwick, W. Maddern, and P. Newman, “Distant Vehicle Detection Using Radar and Vision,” May 2019. arXiv:1901.10951 [cs]

[31] F. Nobis, M. Geisslinger, M. Weber, J. Betz, and M. Lienkamp, “A Deep Learning-based Radar and Camera Sensor Fusion Architecture for Object Detection,” May 2020. arXiv:2005.07431 [cs]

[32] R. Nabati and H. Qi, “Radar-Camera Sensor Fusion for Joint Object Detection and Distance Estimation in Autonomous Vehicles,” Sept. 2020. arXiv:2009.08428 [cs]

[34] R. Nabati and H. Qi, “CenterFusion: Center-based Radar and Camera Fusion for 3D Object Detection,” in 2021 IEEE Winter Conference on Applications of Computer Vision (WACV), pp. 1526–1535, Jan. 2021. arXiv:2011.04841 [cs]

[35] Y. Kim, J. W. Choi, and D. Kum, “GRIF Net: Gated Region of Interest Fusion Network for Robust 3D Object Detection from Radar Point Cloud and Monocular Image,” in 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 10857–10864, Oct. 2020. ISSN: 2153-0866

[36] Y. Kim, J. Shin, S. Kim, I.-J. Lee, J. W. Choi, and D. Kum, “CRN: Camera Radar Net for Accurate, Robust, Efficient 3D Perception,” Dec. 2023. arXiv:2304.00670 [cs]

[37] J. Zhang, M. Zhang, Z. Fang, Y. Wang, X. Zhao, and S. Pu, “RVDet: Feature-level Fusion of Radar and Camera for Object Detection,” in 2021 IEEE International Intelligent Transportation Systems Conference (ITSC), pp. 2822–2828, Sept. 2021

[38] Z. Wu, G. Chen, Y. Gan, L. Wang, and J. Pu, “MVFusion: Multi-View 3D Object Detection with Semantic-aligned Radar and Camera Fusion,” Feb. 2023. arXiv:2302.10511 [cs]

[39] J. Kim, Y. Kim, and D. Kum, “Low-level Sensor Fusion Network for 3D Vehicle Detection using Radar Range-Azimuth Heatmap and Monocular Image,” in Proceedings of the Asian Conference on Computer Vision (ACCV), 2020

[40] Y. Kim, S. Kim, J. W. Choi, and D. Kum, “CRAFT: Camera-Radar 3D Object Detection with Spatio-Contextual Fusion Transformer,” Nov. arXiv:2209.06535 [cs]

[42] Y. Jin, A. Deligiannis, J.-C. Fuentes-Michel, and M. Vossiek, “Cross-Modal Supervision-Based Multitask Learning With Automotive Radar Raw Data,” IEEE Transactions on Intelligent Vehicles, vol. 8, pp. 3012–3025, Apr. 2023. Conference Name: IEEE Transactions on Intelligent Vehicles

[43] L. Liu, S. Zhi, Z. Du, L. Liu, X. Zhang, K. Huo, and W. Jiang, “ROFusion: Efficient Object Detection using Hybrid Point-wise Radar-Optical Fusion,” July 2023. arXiv:2307.08233 [cs]

[44] Y. Liu, F. Wang, N. Wang, and Z.-X. Zhang, “Echoes Beyond Points: Unleashing the Power of Raw Radar Data in Multi-modality Fusion,” Advances in Neural Information Processing Systems, vol. 36, pp. 53964–53982, Dec. 2023

[45] Y. Ma, T. Wang, X. Bai, H. Yang, Y. Hou, Y. Wang, Y. Qiao, R. Yang, D. Manocha, and X. Zhu, “Vision-Centric BEV Perception: A Survey,” June 2023. arXiv:2208.02797 [cs]

[46] Y. Jiang, L. Zhang, Z. Miao, X. Zhu, J. Gao, W. Hu, and Y.-G. Jiang, “PolarFormer: Multi-camera 3D Object Detection with Polar Transformer,” Jan. 2023. arXiv:2206.15398 [cs]

[47] “Transform image to bird’s-eye view - MATLAB transformImage.”

[48] “scipy.ndimage.map_coordinates — SciPy v1.12.0 Manual.”

[49] B. J. Donnet and I. D. Longstaff, “MIMO Radar, Techniques and Opportunities,” in 2006 European Radar Conference, pp. 112–115, Sept. 2006

[50] K. He, X. Zhang, S. Ren, and J. Sun, “Deep Residual Learning for Image Recognition,” in 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778, June 2016. ISSN: 1063-6919

[51] S. Yao, R. Guan, X. Huang, Z. Li, X. Sha, Y. Yue, E. G. Lim, H. Seo, K. L. Man, X. Zhu, and Y. Yue, “Radar-Camera Fusion for Object Detection and Semantic Segmentation in Autonomous Driving: A Comprehensive Review,” IEEE Transactions on Intelligent Vehicles, pp. 1–40, 2023. arXiv:2304.10410 [cs]

[52] J.-w. Hu, B.-y. Zheng, C. Wang, C.-h. Zhao, X.-l. Hou, Q. Pan, and Z. Xu, “A survey on multi-sensor fusion based obstacle detection for intelligent ground vehicles in off-road environments,” Frontiers of Information Technology & Electronic Engineering, vol. 21, pp. 675–692, May 2020

[53] Z. Wang, Y. Wu, and Q. Niu, “Multi-Sensor Fusion in Automated Driving: A Survey,” IEEE Access, vol. 8, pp. 2847–2868, 2020

[54] D. P. Kingma and J. Ba, “Adam: A Method for Stochastic Optimization,” Jan. 2017. arXiv:1412.6980 [cs]

[55] T.-Y. Lin, P. Goyal, R. Girshick, K. He, and P. Dollár, “Focal Loss for Dense Object Detection,” Feb. 2018. arXiv:1708.02002 [cs]

[56] B. Yang, W. Luo, and R. Urtasun, “PIXOR: Real-time 3D Object Detection from Point Clouds,” Mar. 2019. arXiv:1902.06326 [cs]

[57] M. Tan and Q. V. Le, “EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks,” Sept. 2020. arXiv:1905.11946 [cs, stat]

[58] L. Wang, R. Li, C. Zhang, S. Fang, C. Duan, X. Meng, and P. M. Atkinson, “UNetFormer: A UNet-like transformer for efficient semantic segmentation of remote sensing urban scene imagery,” ISPRS Journal of Photogrammetry and Remote Sensing, vol. 190, pp. 196–214, Aug. 2022