A Resource Efficient Fusion Network for Object Detection in Bird's-Eye View using Camera and Raw Radar Data
Pith reviewed 2026-05-23 08:11 UTC · model grok-4.3
The pith
Fusing camera bird's-eye-view features with range-azimuth features recovered from the raw radar range-Doppler spectrum achieves competitive object detection accuracy at lower computational cost.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central claim is that object detection in bird's-eye view can be performed by (1) transforming camera images into the BEV polar domain and extracting features with a dedicated encoder-decoder architecture, (2) recovering range-azimuth features from the raw range-Doppler radar spectrum with a radar decoder, and (3) fusing the two resulting feature maps. On the RADIal dataset, this is claimed to reach detection performance competitive with existing methods while lowering computational complexity.
What carries the argument
The camera BEV-polar encoder-decoder paired with the radar decoder that reconstructs range-azimuth features directly from the raw range-Doppler input; their outputs are fused for detection.
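The review does not say how the two feature maps are combined. The following is a minimal sketch that assumes both maps already share one (range, azimuth) grid and that fusion is channel concatenation followed by a 1×1 convolution and ReLU. The operator, the function name `fuse_polar_features`, and the weights are hypothetical and are not the paper's method.

```python
import numpy as np

def fuse_polar_features(cam_feat, rad_feat, weight, bias):
    """Hypothetical fusion: channel concatenation, then a 1x1 convolution and ReLU.

    cam_feat: (C_cam, R, A) camera features already resampled to the BEV-polar grid.
    rad_feat: (C_rad, R, A) range-azimuth features from a radar decoder.
    weight:   (C_out, C_cam + C_rad) 1x1 kernel (stand-in for learned parameters).
    bias:     (C_out,)
    """
    if cam_feat.shape[1:] != rad_feat.shape[1:]:
        raise ValueError("both modalities must share the same (range, azimuth) grid")
    stacked = np.concatenate([cam_feat, rad_feat], axis=0)  # (C_cam + C_rad, R, A)
    # A 1x1 convolution is the same linear map over channels applied at every cell.
    fused = np.einsum("oc,cra->ora", weight, stacked) + bias[:, None, None]
    return np.maximum(fused, 0.0)  # ReLU
```

Because fusion happens cell by cell, the approach relies entirely on the camera branch and the radar decoder producing spatially aligned polar grids.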
If this is right
- Object detection proceeds without conventional radar point-cloud extraction or signal processing.
- Detection accuracy remains competitive with prior camera-radar fusion methods on the RADIal dataset.
- Overall computational complexity is reduced relative to methods that ingest processed radar data.
- The raw-spectrum route supplies sufficient information for the fusion step to succeed.
Where Pith is reading between the lines
- The efficiency gain could support higher frame-rate operation on embedded vehicle hardware.
- The same raw-spectrum decoder might be tested on other radar datasets to check whether the accuracy-complexity trade-off generalizes.
- Because radar remains functional when cameras are degraded by weather, the fusion could be examined for robustness in rain or fog even though the paper reports only nominal conditions.
Load-bearing premise
The raw range-Doppler spectrum contains enough semantic and structural information that a dedicated decoder can recover usable range-azimuth features for fusion with camera features.
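For context on this premise, the conventional route from a per-channel RD spectrum to a range-azimuth map is an angle FFT across receive channels followed by integration over Doppler. This route shows that azimuth information is carried in the inter-channel phase of the RD data, which a learned radar decoder must exploit instead. The sketch below assumes a uniform linear virtual array with half-wavelength spacing; real MIMO layouts may differ. `rd_to_ra` is a hypothetical helper and is not part of the paper.

```python
import numpy as np

def rd_to_ra(rd_cube, n_az=64):
    """Conventional range-azimuth map from a per-channel range-Doppler cube.

    rd_cube: complex array (n_range, n_doppler, n_rx), one RD spectrum per virtual
             channel of an assumed uniform linear array with lambda/2 spacing.
    Returns (ra_map, sin_az): ra_map is (n_range, n_az); sin_az is sin(azimuth) per column.
    """
    # Angle FFT over the antenna axis, zero-padded to n_az bins, zero angle centred.
    angle_spec = np.fft.fftshift(np.fft.fft(rd_cube, n=n_az, axis=2), axes=2)
    # Non-coherent integration over Doppler collapses range-azimuth-Doppler to range-azimuth.
    ra_map = np.abs(angle_spec).sum(axis=1)
    # With lambda/2 spacing, spatial frequency f (cycles per element) gives sin(az) = 2 f.
    sin_az = 2.0 * (np.arange(n_az) - n_az // 2) / n_az
    return ra_map, sin_az
```

For example, a single target at sin(azimuth) = 0.5 appears as a peak in column 48 of a 64-bin map.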
What would settle it
Running the proposed network on the RADIal dataset and measuring detection accuracy clearly below existing fusion baselines, or computational cost (e.g., FLOPs, parameters, latency) above those baselines, would falsify the central claim.
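The accuracy side of this test can be computed by matching BEV detections to ground truth by center distance and counting true positives. This is only a sketch. The greedy matching rule and the 2 m radius are assumptions, not the RADIal evaluation protocol, and mAP would additionally sweep the score threshold.

```python
import numpy as np

def precision_recall(pred_xy, scores, gt_xy, max_dist=2.0):
    """Greedy center-distance matching of BEV detections to ground-truth objects.

    pred_xy: (N, 2) predicted centers in metres; scores: (N,) confidences.
    gt_xy:   (M, 2) ground-truth centers. max_dist is a hypothetical match radius in metres.
    """
    pred_xy = np.asarray(pred_xy, dtype=float).reshape(-1, 2)
    gt_xy = np.asarray(gt_xy, dtype=float).reshape(-1, 2)
    matched = np.zeros(len(gt_xy), dtype=bool)
    tp = 0
    # Visit predictions from most to least confident.
    for i in np.argsort(-np.asarray(scores, dtype=float)):
        if matched.all():
            break  # every ground-truth object is already taken
        d = np.linalg.norm(gt_xy - pred_xy[i], axis=1)
        d[matched] = np.inf  # each ground-truth object can be matched at most once
        j = int(np.argmin(d))
        if d[j] <= max_dist:
            matched[j] = True
            tp += 1
    fp = len(pred_xy) - tp
    fn = len(gt_xy) - tp
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```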
Original abstract
Cameras can be used to perceive the environment around the vehicle, while affordable radar sensors are popular in autonomous driving systems because, unlike cameras, they can withstand adverse weather conditions. However, radar point clouds are sparse, have low azimuth and elevation resolution, and lack semantic and structural information about the scene, which generally results in lower radar detection performance. In this work, we directly use the raw range-Doppler (RD) spectrum of radar data, thus avoiding radar signal processing. We independently process camera images within the proposed comprehensive image processing pipeline. Specifically, we first transform the camera images to the Bird's-Eye View (BEV) Polar domain and extract the corresponding features with our camera encoder-decoder architecture. The resulting feature maps are fused with Range-Azimuth (RA) features, which the radar decoder recovers from the RD spectrum input, to perform object detection. We evaluate our fusion strategy against other existing methods on the RADIal dataset, in terms of both accuracy and computational complexity metrics.
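The abstract does not specify how images are mapped to the BEV polar domain. A common approach is inverse perspective mapping. Each (range, azimuth) cell is placed on a flat ground plane and projected through a pinhole camera, and the image is sampled at that point with `scipy.ndimage.map_coordinates`. The sketch below rests on several assumptions: flat ground, zero camera pitch and roll, the intrinsics, and the helper name `image_to_bev_polar`.

```python
import numpy as np
from scipy.ndimage import map_coordinates

def image_to_bev_polar(image, ranges, azimuths, fx, fy, cx, cy, cam_height):
    """Resample a grayscale image onto a (range, azimuth) ground-plane grid.

    Assumes flat ground and a forward-looking pinhole camera with zero pitch and roll,
    mounted cam_height metres above the ground. Azimuths are radians, positive to the left.
    """
    r, phi = np.meshgrid(ranges, azimuths, indexing="ij")  # (n_range, n_az)
    forward = r * np.cos(phi)  # ground X in metres
    left = r * np.sin(phi)     # ground Y in metres
    # Camera frame: x right, y down, z forward; the ground lies cam_height below the camera.
    x_c, y_c, z_c = -left, np.full_like(r, cam_height), forward
    valid = z_c > 1e-6  # only points in front of the camera project into the image
    z_safe = np.where(valid, z_c, 1.0)
    u = fx * x_c / z_safe + cx  # image column
    v = fy * y_c / z_safe + cy  # image row
    # Bilinear sampling; cells projecting outside the image read as 0.
    bev = map_coordinates(image, [v, u], order=1, mode="constant", cval=0.0)
    return np.where(valid, bev, 0.0)
```

A colour image would be handled by applying the same coordinates to each channel.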
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes a fusion network for object detection in bird's-eye view that processes camera images through a BEV-polar transformation and encoder-decoder pipeline, recovers range-azimuth features from the raw radar range-Doppler spectrum via a dedicated decoder, and fuses the resulting feature maps to perform detection. It claims this strategy achieves competitive accuracy while reducing computational complexity relative to existing methods, with evaluation on the RADIal dataset.
Significance. If the empirical claims hold, the work could contribute to resource-efficient multi-modal perception for autonomous driving by avoiding conventional radar signal processing and directly ingesting raw spectra. The dual focus on accuracy and complexity metrics addresses a relevant practical constraint. However, the provided manuscript contains only an abstract with no quantitative results, architecture details, ablations, or comparisons, so no assessment of actual significance is possible.
major comments (1)
- [Abstract] Abstract: The central claim that the proposed camera-radar fusion 'performs object detection with competitive accuracy and reduced computational complexity' cannot be evaluated because the manuscript supplies no accuracy metrics, complexity numbers (e.g., FLOPs, latency), baseline comparisons, ablation studies, or error analysis. This absence directly prevents verification of the result and of the assumption that raw RD-spectrum-derived RA features supply sufficient semantic information when fused with BEV-polar camera features.
Simulated Author's Rebuttal
We thank the referee for their review. We address the major comment below and note that the current submission consists solely of the abstract, as indicated in the provided materials.
Point-by-point responses
- Referee: [Abstract] Abstract: The central claim that the proposed camera-radar fusion 'performs object detection with competitive accuracy and reduced computational complexity' cannot be evaluated because the manuscript supplies no accuracy metrics, complexity numbers (e.g., FLOPs, latency), baseline comparisons, ablation studies, or error analysis. This absence directly prevents verification of the result and of the assumption that raw RD-spectrum-derived RA features supply sufficient semantic information when fused with BEV-polar camera features.
Authors: We agree that the abstract as provided does not contain the quantitative results, metrics, or analyses needed to substantiate the claims. The full manuscript will be expanded to include accuracy metrics (e.g., mAP, precision/recall), computational complexity measures (FLOPs, parameters, inference latency), direct comparisons against existing camera-radar fusion baselines, ablation studies on the fusion components, and error analysis, all evaluated on the RADIal dataset. These additions will enable verification of the performance claims and the sufficiency of the raw RD-derived RA features. revision: yes
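Parameter counts and FLOPs of the kind promised here can be computed analytically from layer shapes. The sketch below covers convolution layers only, and the layer specifications are hypothetical. Latency still has to be measured on the target hardware.

```python
from dataclasses import dataclass

@dataclass
class Conv2dSpec:
    """Shape of one 2-D convolution layer (hypothetical example values)."""
    c_in: int
    c_out: int
    kernel: int
    h_out: int
    w_out: int
    groups: int = 1
    bias: bool = True

def conv_cost(spec):
    """Return (parameters, multiply-accumulates) for one 2-D convolution layer."""
    if spec.c_in % spec.groups or spec.c_out % spec.groups:
        raise ValueError("channels must be divisible by groups")
    per_filter = (spec.c_in // spec.groups) * spec.kernel * spec.kernel
    params = spec.c_out * per_filter + (spec.c_out if spec.bias else 0)
    macs = spec.c_out * spec.h_out * spec.w_out * per_filter
    return params, macs

def network_cost(layers):
    """Sum parameters and MACs over layers; FLOPs reported here as 2 x MACs."""
    params = sum(conv_cost(layer)[0] for layer in layers)
    macs = sum(conv_cost(layer)[1] for layer in layers)
    return {"params": params, "macs": macs, "flops": 2 * macs}
```

Some papers report MACs and others report 2×MACs as "FLOPs". A fair complexity comparison needs to state which convention is used.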
Circularity Check
No circularity in abstract; derivation chain absent
Full rationale
Only the abstract is available, and it contains no equations, fitted parameters, predictions, or self-citations. It describes an architecture (raw RD spectrum to RA features via a decoder, camera images to BEV-polar features, and fusion for detection) but claims no derivation that reduces outputs to inputs by construction. This is the usual finding when no load-bearing mathematical steps are present.
Forward citations
Cited by 1 Pith paper
- REFNet++: Multi-Task Efficient Fusion of Camera and Radar Sensor Data in Bird's-Eye Polar View
REFNet++ aligns raw camera images and radar range-Doppler data into a shared bird's-eye polar view using variational encoders for multi-task vehicle detection and free space segmentation on the RADIal dataset.