pith. machine review for the scientific record. sign in

arxiv: 2605.11824 · v1 · submitted 2026-05-12 · 💻 cs.CV · cs.AI

Recognition: no theorem link

REFNet++: Multi-Task Efficient Fusion of Camera and Radar Sensor Data in Bird's-Eye Polar View

Authors on Pith no claims yet

Pith reviewed 2026-05-13 07:38 UTC · model grok-4.3

classification 💻 cs.CV cs.AI
keywords multimodal sensor fusioncamera-radar fusionbird's-eye viewpolar domainvehicle detectionfree space segmentationRADIal datasetencoder-decoder
0
0 comments X

The pith

Aligning camera front-view images and radar range-Doppler spectra in a shared bird's-eye polar domain enables efficient and accurate sensor fusion for detection and segmentation.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper introduces REFNet++, which fuses camera and radar data by mapping both into a common bird's-eye view polar representation. A variational encoder-decoder transforms front-view camera images to this polar domain, while a radar encoder-decoder extracts angle information from the raw range-Doppler spectrum to create range-azimuth features. The aligned modalities support multi-task learning for vehicle detection and free space segmentation. Evaluation on the RADIal dataset shows gains over existing methods in both accuracy and efficiency. Readers care because this approach addresses the limitations of radar's noise and camera's weather sensitivity without heavy computation.

Core claim

The paper claims that by using a variational encoder-decoder to project camera data from front view to bird's-eye polar view and a radar encoder-decoder to recover azimuth from range-Doppler data, both sensors can be represented in a compatible domain. This facilitates robust multimodal fusion that improves vehicle detection and free space segmentation performance compared to state-of-the-art approaches on the RADIal dataset.

What carries the argument

The variational encoder-decoder for camera-to-BEV-polar transformation and the radar encoder-decoder for RD-to-RA feature recovery.

If this is right

  • Vehicle detection benefits from radar's robustness combined with camera's detail in the aligned domain.
  • Free space segmentation achieves higher precision due to the unified representation.
  • The fusion strategy reduces computational cost while maintaining or improving accuracy over prior methods.
  • Multi-task training on detection and segmentation shares the aligned features efficiently.
  • Performance is validated on real-world radar-camera data from the RADIal dataset.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Extending this alignment to include additional sensors like LiDAR could further enhance perception robustness.
  • The polar domain choice may particularly aid in handling varying object distances and angles in driving scenes.
  • If the transformation generalizes well, it could be applied to other autonomous driving datasets beyond RADIal.

Load-bearing premise

The variational encoder-decoder accurately learns the transformation of front-view camera data into the bird's-eye view polar domain without significant information loss or errors.

What would settle it

A direct comparison of the transformed camera BEV polar features against actual radar range-azimuth data showing consistent misalignment in object locations or shapes would falsify the claim of effective alignment.

Figures

Figures reproduced from arXiv: 2605.11824 by Gijs Dubbelman, Kavin Chandrasekaran, Pavol Jancura, Sorin Grigorescu.

Figure 1
Figure 1. Figure 1: Architectural Overview: The variational encoder￾decoder architecture learns the transformation from the front￾view camera image to BEV, which corresponds to the RA domain. On the other hand, the radar encoder-decoder ar￾chitecture learns to recover the angle information from the complex range-Doppler (RD) input, producing RA features. Free space segmentation and vehicle detection are performed on the resul… view at source ↗
Figure 2
Figure 2. Figure 2: Multitask resource-efficient fusion architecture: The input to the radar only network is the range-Doppler (RD) data, while the camera only network intakes front-view camera images. x0 to x4 are the feature maps from the respective encoder blocks. The encoder is connected to the decoder by a thick blue arrow, by which the encoded features are upscaled to higher resolutions. The skip connections are shown a… view at source ↗
Figure 2
Figure 2. Figure 2: It can be noticed that there are four convolutional [PITH_FULL_IMAGE:figures/full_fig_p004_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: Qualitative results on samples from the test set. (a) [PITH_FULL_IMAGE:figures/full_fig_p006_3.png] view at source ↗
read the original abstract

A realistic view of the vehicle's surroundings is generally offered by camera sensors, which is crucial for environmental perception. Affordable radar sensors, on the other hand, are becoming invaluable due to their robustness in variable weather conditions. However, because of their noisy output and reduced classification capability, they work best when combined with other sensor data. Specifically, we address the challenge of multimodal sensor fusion by aligning radar and camera data in a unified domain, prioritizing not only accuracy, but also computational efficiency. Our work leverages the raw range-Doppler (RD) spectrum from radar and front-view camera images as inputs. To enable effective fusion, we employ a variational encoder-decoder architecture that learns the transformation of front-view camera data into the Bird's-Eye View (BEV) polar domain. Concurrently, a radar encoder-decoder learns to recover the angle information from the RD data that produce Range-Azimuth (RA) features. This alignment ensures that both modalities are represented in a compatible domain, facilitating robust and efficient sensor fusion. We evaluated our fusion strategy for vehicle detection and free space segmentation against state-of-the-art methods using the RADIal dataset.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 3 minor

Summary. The manuscript introduces REFNet++, a multi-task architecture for fusing raw range-Doppler radar spectra and front-view camera images in a shared Bird's-Eye View (BEV) polar coordinate frame. A variational encoder-decoder learns the camera-to-BEV-polar transformation, while a separate encoder-decoder recovers azimuth information from radar RD data to produce RA features. These aligned features are fused for simultaneous vehicle detection and free-space segmentation tasks, with performance evaluated on the RADIal dataset against existing state-of-the-art fusion methods.

Significance. If the quantitative results, training details, loss formulations, and ablation tables in the full manuscript hold, this work offers a meaningful contribution to efficient multi-modal sensor fusion for autonomous vehicles. Aligning modalities in the polar BEV domain addresses a practical compatibility issue between camera perspective and radar polar representations, and the variational approach for unsupervised view transformation is a standard and well-motivated choice when direct BEV supervision is unavailable. The multi-task design and emphasis on computational efficiency could support real-time perception pipelines, with the RADIal evaluation providing a relevant public benchmark.

major comments (2)
  1. [§4.2] §4.2, fusion module description: the claim that the polar BEV alignment enables 'robust' fusion in variable weather rests on the assumption that the variational camera-to-BEV mapping preserves sufficient spatial accuracy; however, no quantitative evaluation of transformation error (e.g., reprojection metrics on held-out calibration data) is provided, which is load-bearing for the robustness assertion.
  2. [Table 3] Table 3, detection results row: the reported AP improvement over the strongest baseline is given without standard deviation across multiple training runs or statistical significance testing; this weakens the cross-method comparison when claiming superiority.
minor comments (3)
  1. [Abstract] Abstract: the efficiency claim is stated qualitatively; adding one sentence with key metrics (e.g., FPS or parameter count relative to baselines) would strengthen the summary.
  2. [§3.1] §3.1: the notation for the variational posterior q(z|x) and prior p(z) should be introduced with a brief reminder of the ELBO objective to aid readers unfamiliar with the exact formulation used.
  3. [Figure 2] Figure 2: the radar RD-to-RA decoder branch diagram would benefit from explicit arrows indicating the angle recovery step and the final polar coordinate grid size.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the positive assessment and constructive feedback. We address each major comment below.

read point-by-point responses
  1. Referee: [§4.2] §4.2, fusion module description: the claim that the polar BEV alignment enables 'robust' fusion in variable weather rests on the assumption that the variational camera-to-BEV mapping preserves sufficient spatial accuracy; however, no quantitative evaluation of transformation error (e.g., reprojection metrics on held-out calibration data) is provided, which is load-bearing for the robustness assertion.

    Authors: We thank the referee for this observation. The robustness claim for fusion in variable weather is grounded in the radar modality's inherent resilience to adverse conditions (fog, rain, etc.), with the camera providing complementary high-resolution cues primarily under clear weather; the shared polar BEV domain simply enables their compatible fusion. The variational encoder-decoder is trained end-to-end using the multi-task detection and segmentation objectives on RADIal, so alignment quality is validated indirectly via task performance rather than explicit reprojection error. The RADIal dataset lacks held-out calibration data or direct BEV ground truth that would support quantitative transformation metrics. We will therefore add a dedicated limitations paragraph and qualitative feature-alignment visualizations in the revised §4.2. revision: partial

  2. Referee: [Table 3] Table 3, detection results row: the reported AP improvement over the strongest baseline is given without standard deviation across multiple training runs or statistical significance testing; this weakens the cross-method comparison when claiming superiority.

    Authors: We agree that reporting variability and significance testing would strengthen the claims. Our original experiments used a fixed random seed for reproducibility across all methods. We will rerun the detection experiments for REFNet++ and the strongest baselines with three independent seeds, report mean AP ± standard deviation in the revised Table 3, and add a brief note on statistical significance in the caption. revision: yes

Circularity Check

0 steps flagged

No significant circularity; derivation is self-contained

full rationale

The paper describes a variational encoder-decoder pipeline that learns camera front-view to BEV-polar transformation and radar RD-to-RA recovery, then fuses the resulting features for downstream tasks. Training occurs on the external RADIal dataset with reported quantitative metrics for vehicle detection and free-space segmentation. No load-bearing step reduces by construction to a fitted parameter, self-citation chain, or renamed input; the architecture outputs are produced by standard learned mappings rather than being presupposed in the inputs. The central claim therefore rests on empirical performance rather than definitional equivalence.

Axiom & Free-Parameter Ledger

1 free parameters · 1 axioms · 0 invented entities

Abstract-only review yields incomplete ledger. The central claim rests on the unproven ability of variational networks to perform accurate domain transformations; typical neural-network training involves many fitted parameters whose impact cannot be assessed here.

free parameters (1)
  • variational network weights and hyperparameters
    Learned during training on RADIal data; exact values and selection process not specified in abstract.
axioms (1)
  • domain assumption Raw range-Doppler spectrum contains recoverable angle information via learned decoder
    Invoked by the radar encoder-decoder design in the abstract.

pith-pipeline@v0.9.0 · 5522 in / 1248 out tokens · 72815 ms · 2026-05-13T07:38:33.250128+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

46 extracted references · 46 canonical work pages · 2 internal anchors

  1. [1]

    A Survey of Deep Learning Applications to Autonomous Vehicle Control,

    S. Kuutti, R. Bowden, Y . Jin, P. Barber, and S. Fallah, “A Survey of Deep Learning Applications to Autonomous Vehicle Control,” Dec

  2. [2]

    arXiv:1912.10773 [cs, eess, stat]

  3. [3]

    A comprehensive review of embedded systems in autonomous vehicles: Trends, challenges, and future directions,

    S. Sonko, E. A. Etukudoh, K. I. Ibekwe, V . I. Ilojianya, and C. D. Daudu, “A comprehensive review of embedded systems in autonomous vehicles: Trends, challenges, and future directions,”World Journal of Advanced Research and Reviews, vol. 21, no. 1, pp. 2009–2020, 2024. Last Modified: 2024-01-25T08:14+05:30 Publisher: World Journal of Advanced Research an...

  4. [4]

    Radat- ron: Accurate Detection Using Multi-resolution Cascaded MIMO Radar,

    S. Madani, J. Guan, W. Ahmed, S. Gupta, and H. Hassanieh, “Radat- ron: Accurate Detection Using Multi-resolution Cascaded MIMO Radar,” inComputer Vision – ECCV 2022(S. Avidan, G. Brostow, M. Ciss´e, G. M. Farinella, and T. Hassner, eds.), vol. 13699, pp. 160– 178, Cham: Springer Nature Switzerland, 2022. Series Title: Lecture Notes in Computer Science

  5. [5]

    C4RFNet: Camera and 4D-Radar Fusion Network for Point Cloud Enhancement,

    W. Wang, W. Wang, X. Yu, and W. Zhang, “C4RFNet: Camera and 4D-Radar Fusion Network for Point Cloud Enhancement,”IEEE Sensors Journal, vol. 25, pp. 7596–7610, Feb. 2025

  6. [6]

    BEVFormer v2: Adapting Modern Image Backbones to Bird’s-Eye-View Recognition via Perspective Supervision,

    C. Yang, Y . Chen, H. Tian, C. Tao, X. Zhu, Z. Zhang, G. Huang, H. Li, Y . Qiao, L. Lu, J. Zhou, and J. Dai, “BEVFormer v2: Adapting Modern Image Backbones to Bird’s-Eye-View Recognition via Perspective Supervision,” in2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 17830–17839, June 2023. ISSN: 2575-7075

  7. [7]

    Radar and Camera Early Fusion for Vehicle Detection in Advanced Driver Assistance Systems,

    T.-Y . Lim and A. Ansari, “Radar and Camera Early Fusion for Vehicle Detection in Advanced Driver Assistance Systems,” inNeurIPS Ma- chine Learning for Autonomous Driving Workshop, 2019

  8. [8]

    Bridging the View Disparity Between Radar and Camera Features for Multi- Modal Fusion 3D Object Detection,

    T. Zhou, J. Chen, Y . Shi, K. Jiang, M. Yang, and D. Yang, “Bridging the View Disparity Between Radar and Camera Features for Multi- Modal Fusion 3D Object Detection,”IEEE Transactions on Intelligent Vehicles, vol. 8, pp. 1523–1535, Feb. 2023

  9. [9]

    EA-LSS: Edge-aware Lift-splat-shot Framework for 3D BEV Object Detection,

    H. Hu, F. Wang, J. Su, Y . Wang, L. Hu, W. Fang, J. Xu, and Z. Zhang, “EA-LSS: Edge-aware Lift-splat-shot Framework for 3D BEV Object Detection,” Aug. 2023. arXiv:2303.17895 [cs]

  10. [10]

    Cross-view Transformers for real- time Map-view Semantic Segmentation,

    B. Zhou and P. Kr ¨ahenb¨uhl, “Cross-view Transformers for real- time Map-view Semantic Segmentation,” inIEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 13750–13759, CVPR, June 2022

  11. [11]

    BEVFusion: Multi-Task Multi-Sensor Fusion with Unified Bird’s- Eye View Representation,

    Z. Liu, H. Tang, A. Amini, X. Yang, H. Mao, D. L. Rus, and S. Han, “BEVFusion: Multi-Task Multi-Sensor Fusion with Unified Bird’s- Eye View Representation,” in2023 IEEE International Conference on Robotics and Automation (ICRA), pp. 2774–2781, May 2023

  12. [12]

    Raw High-Definition Radar for Multi-Task Learning,

    J. Rebut, A. Ouaknine, W. Malik, and P. P ´erez, “Raw High-Definition Radar for Multi-Task Learning,” inIEEE/CVF Conference on Com- puter Vision and Pattern Recognition, pp. 17021–17030, IEEE, 2022

  13. [13]

    A Resource Efficient Fusion Network for Object Detection in Bird's-Eye View using Camera and Raw Radar Data

    K. Chandrasekaran, S. Grigorescu, G. Dubbelman, and P. Jancura, “A Resource Efficient Fusion Network for Object Detection in Bird’s- Eye View using Camera and Raw Radar Data,” inIEEE Intelligent Transportation Systems Conference (ITSC) 2024, IEEE, Nov. 2024. arXiv:2411.13311 [cs]

  14. [14]

    T-FFTRadNet: Object Detection with Swin Vision Transformers from Raw ADC Radar Signals,

    J. Giroux, M. Bouchard, and R. Laganiere, “T-FFTRadNet: Object Detection with Swin Vision Transformers from Raw ADC Radar Signals,” Mar. 2023. arXiv:2303.16940 [cs]

  15. [15]

    ADCNet: Learning from Raw Radar Data via Distillation,

    B. Yang, I. Khatri, M. Happold, and C. Chen, “ADCNet: Learning from Raw Radar Data via Distillation,” Dec. 2023. arXiv:2303.11420 [cs, eess]

  16. [16]

    Cross- Modal Supervision-Based Multitask Learning With Automotive Radar Raw Data,

    Y . Jin, A. Deligiannis, J.-C. Fuentes-Michel, and M. V ossiek, “Cross- Modal Supervision-Based Multitask Learning With Automotive Radar Raw Data,”IEEE Transactions on Intelligent Vehicles, vol. 8, pp. 3012–3025, Apr. 2023. Conference Name: IEEE Transactions on Intelligent Vehicles

  17. [17]

    SparseRadNet: Sparse Perception Neural Network on Subsampled Radar Data,

    J. Wu, M. Meuter, M. Schoeler, and M. Rottmann, “SparseRadNet: Sparse Perception Neural Network on Subsampled Radar Data,” July

  18. [18]

    arXiv:2406.10600 [cs]

  19. [19]

    ROFusion: Efficient Object Detection using Hybrid Point-wise Radar- Optical Fusion,

    L. Liu, S. Zhi, Z. Du, L. Liu, X. Zhang, K. Huo, and W. Jiang, “ROFusion: Efficient Object Detection using Hybrid Point-wise Radar- Optical Fusion,” July 2023. arXiv:2307.08233 [cs]

  20. [20]

    Echoes Beyond Points: Unleashing the Power of Raw Radar Data in Multi-modality Fusion,

    Y . Liu, F. Wang, N. Wang, and Z.-X. Zhang, “Echoes Beyond Points: Unleashing the Power of Raw Radar Data in Multi-modality Fusion,”Advances in Neural Information Processing Systems, vol. 36, pp. 53964–53982, Dec. 2023

  21. [21]

    A Deep Automotive Radar Detector Using the RaDelft Dataset,

    I. Roldan, A. Palffy, J. F. P. Kooij, D. M. Gavrila, F. Fioranelli, and A. Yarovoy, “A Deep Automotive Radar Detector Using the RaDelft Dataset,”IEEE Transactions on Radar Systems, vol. 2, pp. 1062–1075, 2024

  22. [22]

    RaDICaL: A Synchron- ized FMCW Radar, Depth, IMU and RGB Camera Data Dataset With Low-Level FMCW Radar Signals,

    T.-Y . Lim, S. A. Markowitz, and M. N. Do, “RaDICaL: A Synchron- ized FMCW Radar, Depth, IMU and RGB Camera Data Dataset With Low-Level FMCW Radar Signals,”IEEE Journal of Selected Topics in Signal Processing, vol. 15, pp. 941–953, June 2021. Conference Name: IEEE Journal of Selected Topics in Signal Processing

  23. [23]

    CAR- RADA Dataset: Camera and Automotive Radar with Range-Angle- Doppler Annotations,

    A. Ouaknine, A. Newson, J. Rebut, F. Tupin, and P. P ´erez, “CAR- RADA Dataset: Camera and Automotive Radar with Range-Angle- Doppler Annotations,” May 2021. arXiv:2005.01456 [cs]

  24. [24]

    RADDet: Range- Azimuth-Doppler based Radar Object Detection for Dynamic Road Users,

    A. Zhang, F. E. Nowruzi, and R. Laganiere, “RADDet: Range- Azimuth-Doppler based Radar Object Detection for Dynamic Road Users,” in2021 18th Conference on Robots and Vision (CRV), pp. 95– 102, May 2021

  25. [25]

    RADIATE: A Radar Dataset for Automotive Perception in Bad Weather,

    M. Sheeny, E. De Pellegrin, S. Mukherjee, A. Ahrabian, S. Wang, and A. Wallace, “RADIATE: A Radar Dataset for Automotive Perception in Bad Weather,” Apr. 2021. arXiv:2010.09076 [cs]

  26. [26]

    High Resolution Radar Dataset for Semi-Supervised Learning of Dynamic Objects,

    M. Mostajabi, C. M. Wang, D. Ranjan, and G. Hsyu, “High Resolution Radar Dataset for Semi-Supervised Learning of Dynamic Objects,” in2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pp. 450–457, June 2020. ISSN: 2160-7516

  27. [27]

    Radar-Camera Fusion for Object Detection and Semantic Segmentation in Autonomous Driving: A Comprehensive Review,

    S. Yao, R. Guan, X. Huang, Z. Li, X. Sha, Y . Yue, E. G. Lim, H. Seo, K. L. Man, X. Zhu, and Y . Yue, “Radar-Camera Fusion for Object Detection and Semantic Segmentation in Autonomous Driving: A Comprehensive Review,”IEEE Transactions on Intelligent Vehicles, pp. 1–40, 2023. arXiv:2304.10410 [cs]

  28. [28]

    Multi-Sensor Fusion in Automated Driving: A Survey,

    Z. Wang, Y . Wu, and Q. Niu, “Multi-Sensor Fusion in Automated Driving: A Survey,”IEEE Access, vol. 8, pp. 2847–2868, 2020

  29. [29]

    K-Radar: 4D Radar Object Detection for Autonomous Driving in Various Weather Conditions,

    D.-H. Paek, S.-H. Kong, and K. T. Wijaya, “K-Radar: 4D Radar Object Detection for Autonomous Driving in Various Weather Conditions,” Nov. 2023. arXiv:2206.08171 [cs]

  30. [30]

    CNN based Road User Detection using the 3D Radar Cube,

    A. Palffy, J. Dong, J. F. P. Kooij, and D. M. Gavrila, “CNN based Road User Detection using the 3D Radar Cube,”IEEE Robotics and Auto- mation Letters, vol. 5, pp. 1263–1270, Apr. 2020. arXiv:2004.12165 [cs]

  31. [31]

    Object Detection and 3d Estimation Via an FMCW Radar Using a Fully Convolutional Network,

    G. Zhang, H. Li, and F. Wenger, “Object Detection and 3d Estimation Via an FMCW Radar Using a Fully Convolutional Network,” in ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 4487–4491, May 2020. ISSN: 2379-190X

  32. [32]

    RODNet: A Real-Time Radar Object Detection Network Cross-Supervised by Camera-Radar Fused Object 3D Localization,

    Y . Wang, Z. Jiang, Y . Li, J.-N. Hwang, G. Xing, and H. Liu, “RODNet: A Real-Time Radar Object Detection Network Cross-Supervised by Camera-Radar Fused Object 3D Localization,”IEEE Journal of Se- lected Topics in Signal Processing, vol. 15, pp. 954–967, June 2021. arXiv:2102.05150 [cs, eess]

  33. [33]

    Radar and Camera Fusion for Object Detection and Tracking: A Comprehensive Survey,

    K. Shi, S. He, Z. Shi, A. Chen, Z. Xiong, J. Chen, and J. Luo, “Radar and Camera Fusion for Object Detection and Tracking: A Comprehensive Survey,” Oct. 2024. arXiv:2410.19872 [cs]

  34. [34]

    Hvdetfusion: A simple and robust camera-radar fusion framework.arXiv preprint arXiv:2307.11323, 2023

    K. Lei, Z. Chen, S. Jia, and X. Zhang, “HVDetFusion: A Simple and Robust Camera-Radar Fusion Framework,” July 2023. arXiv:2307.11323 [cs]

  35. [35]

    RADIANT: Radar-Image Association Network for 3D Object Detec- tion,

    Y . Long, A. Kumar, D. Morris, X. Liu, M. Castro, and P. Chakravarty, “RADIANT: Radar-Image Association Network for 3D Object Detec- tion,”Proceedings of the AAAI Conference on Artificial Intelligence, vol. 37, pp. 1808–1816, June 2023. Number: 2

  36. [36]

    RCM-Fusion: Radar-Camera Multi-Level Fusion for 3D Object Detection,

    J. Kim, M. Seong, G. Bang, D. Kum, and J. W. Choi, “RCM-Fusion: Radar-Camera Multi-Level Fusion for 3D Object Detection,” May

  37. [37]

    arXiv:2307.10249 [cs]

  38. [38]

    Semantic Segmentation-Based Occupancy Grid Map Learning With Automotive Radar Raw Data,

    Y . Jin, M. Hoffmann, A. Deligiannis, J.-C. Fuentes-Michel, and M. V ossiek, “Semantic Segmentation-Based Occupancy Grid Map Learning With Automotive Radar Raw Data,”IEEE Transactions on Intelligent Vehicles, vol. 9, pp. 216–230, Jan. 2024. Conference Name: IEEE Transactions on Intelligent Vehicles

  39. [39]

    TransRadar: Adaptive- Directional Transformer for Real-Time Multi-View Radar Semantic Segmentation,

    Y . Dalbah, J. Lahoud, and H. Cholakkal, “TransRadar: Adaptive- Directional Transformer for Real-Time Multi-View Radar Semantic Segmentation,” in2024 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), (Waikoloa, HI, USA), pp. 352–361, IEEE, Jan. 2024

  40. [40]

    Exploring Radar Data Representations in Autonomous Driving: A Comprehensive Review,

    S. Yao, R. Guan, Z. Peng, C. Xu, Y . Shi, W. Ding, E. G. Lim, Y . Yue, H. Seo, K. L. Man, J. Ma, X. Zhu, and Y . Yue, “Exploring Radar Data Representations in Autonomous Driving: A Comprehensive Review,” Apr. 2024. arXiv:2312.04861 [cs]

  41. [41]

    MIMO Radar, Techniques and Opportunities,

    B. J. Donnet and I. D. Longstaff, “MIMO Radar, Techniques and Opportunities,” in2006 European Radar Conference, pp. 112–115, Sept. 2006

  42. [42]

    Monocular Semantic Occupancy Grid Mapping with Convolutional Variational Encoder-Decoder Networks,

    C. Lu, M. J. G. v. d. Molengraft, and G. Dubbelman, “Monocular Semantic Occupancy Grid Mapping with Convolutional Variational Encoder-Decoder Networks,”IEEE Robotics and Automation Letters, vol. 4, pp. 445–452, Apr. 2019. arXiv:1804.02176 [cs]

  43. [43]

    Epanechnikov Variational Autoencoder,

    T. Qin and W.-M. Huang, “Epanechnikov Variational Autoencoder,” May 2024. arXiv:2405.12783 [stat]

  44. [44]

    torch.nn.functional.interpolate — PyTorch 2.5 documentation

    “torch.nn.functional.interpolate — PyTorch 2.5 documentation.”

  45. [45]

    Focal Loss for Dense Object Detection

    T.-Y . Lin, P. Goyal, R. Girshick, K. He, and P. Doll ´ar, “Focal Loss for Dense Object Detection,” Feb. 2018. arXiv:1708.02002 [cs]

  46. [46]

    Adam: A Method for Stochastic Optimization

    D. P. Kingma and J. Ba, “Adam: A Method for Stochastic Optimiza- tion,” Jan. 2017. arXiv:1412.6980 [cs]