mRadNet: A Compact Radar Object Detector with MetaFormer
Pith reviewed 2026-05-21 22:12 UTC · model grok-4.3
The pith
mRadNet achieves state-of-the-art radar object detection on the CRUW dataset with the fewest parameters and lowest FLOPs.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The authors present mRadNet, a U-Net-style network whose standard blocks are replaced by MetaFormer units. These units use separable-convolution and attention token mixers to extract local and global features, and are paired with streamlined token embedding and merging steps. The authors report that this design yields higher object detection performance on the CRUW dataset than existing approaches while requiring the smallest parameter count and the fewest FLOPs.
What carries the argument
MetaFormer blocks that combine separable convolution for local feature capture with attention for global token mixing inside a U-Net backbone, supported by efficient token embedding and merging strategies.
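For orientation, the generic MetaFormer block from the MetaFormer literature can be written as two residual sub-blocks. This is the general template only, not mRadNet's exact layer definition; the symbols X, Y, Z, W_1, W_2, the activation sigma, and the two normalization layers are notation assumed here for illustration.

```latex
\begin{aligned}
Y &= X + \mathrm{TokenMixer}\big(\mathrm{Norm}_1(X)\big), \\
Z &= Y + \sigma\big(\mathrm{Norm}_2(Y)\, W_1\big)\, W_2 .
\end{aligned}
```

In this template, the summary above implies that the TokenMixer slot is filled either by a separable convolution (local mixing) or by attention (global mixing), while the second sub-block is a channel MLP.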
If this is right
- Real-time object detection can run on embedded hardware with tight limits on memory and processing speed.
- Automotive radar systems gain a practical option that maintains robustness against rain and fog.
- Lower computational cost opens deployment to a wider range of advanced driver assistance platforms.
- Detection performance improves without increasing the hardware demands typical of earlier radar models.
Where Pith is reading between the lines
- The same block and token design choices may transfer to object detection tasks using other radar frequencies or sensor combinations.
- Further validation across varied driving environments would clarify how well the efficiency holds outside the CRUW collection.
- Pairing mRadNet with camera or lidar inputs could produce fused systems that retain the reported compactness.
Load-bearing premise
That the accuracy gains observed on the CRUW dataset arise primarily from the MetaFormer blocks and token strategies rather than from dataset-specific tuning or training details.
What would settle it
Running mRadNet on a separate automotive radar dataset collected under different conditions and measuring whether detection accuracy remains higher than prior models while parameter count and FLOPs stay lower.
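The test described above amounts to a dominance check over three axes. The following is a minimal sketch of that check, assuming each model is summarized by one accuracy score, a parameter count, and a FLOPs figure; the `Result` class, the function names, and all numbers are hypothetical placeholders and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Result:
    name: str
    accuracy: float  # higher is better (for example, average precision)
    params_m: float  # parameters in millions, lower is better
    gflops: float    # compute per inference, lower is better


def dominates(a: Result, b: Result) -> bool:
    """True if a is no worse than b on every axis and strictly better on one."""
    no_worse = (a.accuracy >= b.accuracy
                and a.params_m <= b.params_m
                and a.gflops <= b.gflops)
    better = (a.accuracy > b.accuracy
              or a.params_m < b.params_m
              or a.gflops < b.gflops)
    return no_worse and better


def claim_holds(candidate: Result, baselines: list) -> bool:
    """True if the candidate dominates every baseline on the new dataset."""
    return all(dominates(candidate, b) for b in baselines)


# Placeholder values for illustration only; not measurements from any paper.
candidate = Result("candidate", accuracy=0.80, params_m=0.5, gflops=1.0)
baselines = [
    Result("baseline_a", accuracy=0.78, params_m=2.0, gflops=4.0),
    Result("baseline_b", accuracy=0.79, params_m=0.6, gflops=1.0),
]
print(claim_holds(candidate, baselines))
```

A single baseline that is smaller or cheaper than the candidate, even if less accurate, makes `claim_holds` return `False`, which mirrors how strict the "fewest parameters and lowest FLOPs" wording is.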
Original abstract
Frequency-modulated continuous wave radars have gained increasing popularity in the automotive industry. Their robustness against adverse weather conditions makes them a suitable choice for radar object detection in advanced driver assistance systems. These real-time embedded systems have requirements for the compactness and efficiency of the model, which have been largely overlooked in previous work. In this work, we propose mRadNet, a novel radar object detection model with compactness in mind. mRadNet employs a U-net style architecture with MetaFormer blocks, in which separable convolution and attention token mixers are used to capture both local and global features effectively. More efficient token embedding and merging strategies are introduced to further facilitate the lightweight design. The performance of mRadNet is validated on the CRUW dataset, improving state-of-the-art performance with the fewest parameters and the lowest FLOPs.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes mRadNet, a compact U-Net-style radar object detector that integrates MetaFormer blocks using separable-convolution and attention token mixers together with custom token-embedding and token-merging strategies. It reports that this architecture achieves state-of-the-art detection performance on the CRUW dataset while using the fewest parameters and the lowest FLOPs of any compared method.
Significance. If the efficiency-accuracy trade-off is rigorously demonstrated, the work would be relevant for real-time automotive radar perception on embedded hardware, where model size and compute directly affect deployment feasibility under adverse weather. The combination of MetaFormer mixers with radar-specific token handling is a plausible direction, but its impact cannot yet be evaluated without the missing quantitative evidence.
major comments (3)
- [Experimental Results / §4] The abstract and experimental section assert SOTA improvement together with lowest parameter count and FLOPs on CRUW, yet supply no numerical metrics (mAP, precision-recall, etc.), no baseline comparison table, and no ablation isolating the separable-convolution versus attention mixers or the new token strategies. Without these data the central performance claim cannot be verified.
- [Experiments and Discussion] The manuscript evaluates only on the single CRUW dataset. To attribute gains to the MetaFormer architecture rather than dataset-specific tuning or undisclosed preprocessing, identical training protocols, data-augmentation details, and results on at least one additional radar dataset or cross-validation split are required.
- [§3.2–3.3] The description of the token-embedding and token-merging modules is qualitative; explicit parameter-count derivations or FLOPs equations comparing the proposed blocks to standard MetaFormer or U-Net counterparts would be needed to substantiate the “fewest parameters / lowest FLOPs” claim.
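The kind of derivation requested in the last comment can be illustrated with the textbook parameter and multiply-accumulate (MAC) counts for a standard convolution versus a depthwise separable one. This is a sketch under stated assumptions: it models a plain depthwise k x k convolution followed by a pointwise 1 x 1 convolution, which may differ from the separable-convolution mixer actually used in mRadNet, and the function names and channel sizes are illustrative.

```python
def conv2d_params(c_in: int, c_out: int, k: int, bias: bool = False) -> int:
    """Weights in a standard k x k convolution."""
    return c_in * c_out * k * k + (c_out if bias else 0)


def separable_conv2d_params(c_in: int, c_out: int, k: int, bias: bool = False) -> int:
    """Depthwise k x k convolution followed by a pointwise 1 x 1 convolution."""
    depthwise = c_in * k * k + (c_in if bias else 0)
    pointwise = c_in * c_out + (c_out if bias else 0)
    return depthwise + pointwise


def conv2d_macs(c_in: int, c_out: int, k: int, h_out: int, w_out: int) -> int:
    """Multiply-accumulates of a standard convolution over an h_out x w_out output."""
    return c_in * c_out * k * k * h_out * w_out


def separable_conv2d_macs(c_in: int, c_out: int, k: int, h_out: int, w_out: int) -> int:
    """Multiply-accumulates of the depthwise plus pointwise pair."""
    return (c_in * k * k + c_in * c_out) * h_out * w_out


# Illustrative sizes (assumed, not from the paper): 64 -> 64 channels, 3 x 3 kernel.
standard = conv2d_params(64, 64, 3)
separable = separable_conv2d_params(64, 64, 3)
print(standard, separable, separable / standard)
```

Without bias terms, the ratio of separable to standard cost is 1/c_out + 1/k^2 for both parameters and MACs, so the saving grows with the output channel count and the kernel size.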
minor comments (2)
- [Abstract] The abstract states “improving state-of-the-art performance” without naming the exact metrics or the magnitude of improvement.
- [Figures] Figure captions and axis labels in any architecture or result plots should explicitly state the dataset, metric, and compared methods.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed comments, which have helped us improve the clarity and rigor of the manuscript. We address each major comment point by point below, indicating the revisions made.
-
Referee: [Experimental Results / §4] The abstract and experimental section assert SOTA improvement together with lowest parameter count and FLOPs on CRUW, yet supply no numerical metrics (mAP, precision-recall, etc.), no baseline comparison table, and no ablation isolating the separable-convolution versus attention mixers or the new token strategies. Without these data the central performance claim cannot be verified.
Authors: We acknowledge that the initial submission presented the performance claims without sufficient numerical detail or supporting tables. In the revised manuscript we have added a dedicated results table reporting mAP, precision, recall, and F1 scores for mRadNet against all compared baselines on CRUW, together with explicit parameter counts and FLOPs. We have also inserted an ablation study that isolates the separable-convolution mixer, the attention token mixer, and the proposed token-embedding/merging modules, with quantitative metrics showing their individual contributions to accuracy and efficiency.
Revision: yes
-
Referee: [Experiments and Discussion] The manuscript evaluates only on the single CRUW dataset. To attribute gains to the MetaFormer architecture rather than dataset-specific tuning or undisclosed preprocessing, identical training protocols, data-augmentation details, and results on at least one additional radar dataset or cross-validation split are required.
Authors: We agree that broader evaluation strengthens attribution of gains to the architecture. The revised manuscript now includes a detailed description of the training protocol and data-augmentation pipeline. We have added k-fold cross-validation results on CRUW to demonstrate robustness to data partitioning. While CRUW remains the primary public benchmark for automotive radar object detection and few directly comparable annotated datasets exist, we have expanded the discussion to address potential dataset-specific effects and preprocessing choices.
Revision: partial
-
Referee: [§3.2–3.3] The description of the token-embedding and token-merging modules is qualitative; explicit parameter-count derivations or FLOPs equations comparing the proposed blocks to standard MetaFormer or U-Net counterparts would be needed to substantiate the “fewest parameters / lowest FLOPs” claim.
Authors: We thank the referee for this suggestion. Sections 3.2 and 3.3 have been revised to include explicit parameter-count formulas and FLOPs derivations for the token-embedding and token-merging modules. These derivations are directly compared with the corresponding operations in a standard MetaFormer block and a conventional U-Net encoder, confirming the efficiency advantages of the proposed lightweight strategies.
Revision: yes
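The k-fold cross-validation mentioned in the second response is most meaningful for sequential radar data when folds are formed from whole sequences, so that neighbouring frames never straddle the train/validation boundary. The sketch below assumes that approach; the function name, the choice of k = 5, and the fixed seed are illustrative assumptions, and the count of 40 sequences is taken from the dataset excerpt quoted in the reference graph further down this page.

```python
import random


def kfold_by_sequence(sequence_ids, k, seed=0):
    """Split sequence IDs into k folds and return (train, val) pairs.

    Whole sequences are assigned to folds, so frames from one recording
    never appear in both the training and the validation set.
    """
    ids = list(sequence_ids)
    random.Random(seed).shuffle(ids)           # deterministic shuffle
    folds = [ids[i::k] for i in range(k)]      # round-robin assignment
    splits = []
    for i in range(k):
        val = folds[i]
        train = [s for j, fold in enumerate(folds) if j != i for s in fold]
        splits.append((train, val))
    return splits


# 40 annotated sequences, split into 5 folds (k is an assumed value).
splits = kfold_by_sequence(range(40), k=5)
for train, val in splits:
    print(len(train), len(val))
```

Each of the five splits here trains on 32 sequences and validates on 8, and every sequence is used for validation exactly once.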
Circularity Check
No circularity: empirical validation on external CRUW dataset
The paper introduces mRadNet as a U-Net-style architecture incorporating MetaFormer blocks with separable-convolution and attention mixers plus custom token embedding/merging. Performance claims rest on direct comparison against baselines on the publicly available CRUW dataset, reporting parameter count, FLOPs, and detection metrics. No equations, fitted parameters, or self-citations are presented as load-bearing derivations that reduce to the inputs by construction. The architecture choices are motivated by efficiency considerations rather than any self-referential theorem or renaming of prior results. This is a standard empirical ML proposal whose central claims are falsifiable against the external benchmark.
Axiom & Free-Parameter Ledger
Lean theorems connected to this paper
-
IndisputableMonolith/Cost/FunctionalEquation.lean, washburn_uniqueness_aczel (tag: unclear)
Relation between the paper passage and the cited Recognition theorem: unclear.
We validate mRadNet on the CRUW dataset, where it improves state-of-the-art performance with lower parameter count and FLOPs
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
-
[1]
mRadNet: A Compact Radar Object Detector with MetaFormer
INTRODUCTION With recent advances in advanced driver assistance systems (ADAS), Frequency Modulated Continuous Wave (FMCW) radar has gained increasing popularity in the automotive industry [1]. Its ability to provide accurate distance and velocity measurements makes it a suitable choice for radar object detection (ROD) tasks. Since it operates in the ...
-
[2]
METHODOLOGY 2.1. Overall Architecture. The overall architecture of mRadNet is shown in Figure 2 (a). mRadNet adopts a U-net style architecture, which consists of an encoder and a decoder with skip connections. The network takes a batch of 5D tensors as input, consisting of multiple frames of complex RF images, each with several chirps, the real and imagina...
-
[3]
EXPERIMENTS 3.1. Dataset. Our model is trained and evaluated on the CRUW dataset [3], a large-scale automotive radar object detection dataset containing 400K frames of camera-radar data sampled at 30Hz, of which a subset of 41K annotated frames (40 sequences) and 11K unannotated frames (10 sequences) are made public through the ROD2021 challenge. We di...
-
[4]
CONCLUSION In this paper, we propose mRadNet, a compact MetaFormer-based architecture for radar object detection tasks. By integrating convolution and attention token mixers, mRadNet effectively captures both local and global features. Its U-Net-style design produces hierarchical representations that preserve fine-grained details while encoding hi...
-
[5]
L. Fan, J. Wang, Y. Chang, Y. Li, Y. Wang, and D. Cao, “4d mmwave radar for autonomous driving perception: A comprehensive survey,” IEEE Transactions on Intelligent Vehicles, vol. 9, no. 4, pp. 4606–4620, 2024
-
[6]
A. Venon, Y. Dupuis, P. Vasseur, and P. Merriaux, “Millimeter wave fmcw radars for perception, recognition and localization in automotive applications: A survey,” IEEE Transactions on Intelligent Vehicles, vol. 7, no. 3, pp. 533–555, 2022
-
[7]
Y. Wang, Z. Jiang, Y. Li, J.N. Hwang, G. Xing, and H. Liu, “Rodnet: A real-time radar object detection network cross-supervised by camera-radar fused object 3d localization,” IEEE Journal of Selected Topics in Signal Processing, vol. 15, no. 4, pp. 954–967, 2021
-
[8]
J. Giroux, M. Bouchard, and R. Laganiere, “T-fftradnet: Object detection with swin vision transformers from raw adc radar signals,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023, pp. 4030–4039
-
[9]
G. Zhang, H. Li, and F. Wenger, “Object detection and 3d estimation via an fmcw radar using a fully convolutional network,” in IEEE International Conference on Acoustics, Speech and Signal Processing 2020 (ICASSP). IEEE, 2020, pp. 4487–4491
-
[10]
A. Zhang, F.E. Nowruzi, and R. Laganiere, “Raddet: Range-azimuth-doppler based radar object detection for dynamic road users,” in 2021 18th Conference on Robots and Vision (CRV). IEEE, 2021, pp. 95–102
-
[11]
J. Rebut, A. Ouaknine, W. Malik, and P. Pérez, “Raw high-definition radar for multi-task learning,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, pp. 17021–17030
-
[12]
L. Cheng and S. Cao, “Transrad: Retentive vision transformer for enhanced radar object detection,” IEEE Transactions on Radar Systems, 2025 (Early Access)
-
[13]
T. Jiang, L. Zhuang, Q. An, J. Wang, K. Xiao, and A. Wang, “T-rodnet: Transformer for vehicular millimeter-wave radar object detection,” IEEE Transactions on Instrumentation and Measurement, vol. 72, pp. 1–12, 2022
-
[14]
Y. Wu, J. Liu, G. Jiang, W. Liu, and D. Orlando, “Mask-radarnet: Enhancing transformer with spatial-temporal semantic context for radar object detection in autonomous driving,” arXiv preprint arXiv:2412.15595, 2024
-
[15]
L. Zhuang, Y. Yao, and N. Li, “Rc-rosnet: Fusing 3d radar range-angle heat maps and camera images for radar object segmentation,” IEEE Transactions on Circuits and Systems for Video Technology, 2025 (Early Access)
-
[16]
Z. Liu, Y. Lin, Y. Cao, H. Hu, Y. Wei, Z. Zhang, S. Lin, and B. Guo, “Swin transformer: Hierarchical vision transformer using shifted windows,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2021, pp. 10012–10022
-
[17]
W. Yu, C. Si, P. Zhou, M. Luo, Y. Zhou, J. Feng, S. Yan, and X. Wang, “Metaformer baselines for vision,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 46, no. 2, pp. 896–912, 2023
-
[18]
B. Kang, S. Moon, Y. Cho, H. Yu, and S.J. Kang, “Metaseg: Metaformer-based global contexts-aware network for efficient semantic segmentation,” in Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2024, pp. 434–443
-
[19]
X. Chen, F. Ma, Y. Wu, B. Han, L. Luo, and S.A. Biancardo, “Mfmdepth: Metaformer-based monocular metric depth estimation for distance measurement in ports,” Computers & Industrial Engineering, vol. 207, pp. 1–13, 2025
-
[20]
J.H. Kim and G.R. Kwon, “Efficient classification of photovoltaic module defects in infrared images,” IEEE Signal Processing Letters, vol. 99, pp. 1–5, 2025
-
[21]
W. Xu, P. Lu, and Y. Zhao, “E-rodnet: lightweight approach to object detection by vehicular millimeter-wave radar,” IEEE Sensors Journal, vol. 24, pp. 33091–33100, 2024
-
[22]
O. Ronneberger, P. Fischer, and T. Brox, “U-net: Convolutional networks for biomedical image segmentation,” in International Conference on Medical Image Computing and Computer-assisted Intervention. Springer, 2015, pp. 234–241
-
[23]
A. Dosovitskiy, L. Beyer, A. Kolesnikov, D. Weissenborn, X. Zhai, T. Unterthiner, M. Dehghani, M. Minderer, G. Heigold, S. Gelly, et al., “An image is worth 16x16 words: Transformers for image recognition at scale,” arXiv preprint arXiv:2010.11929, 2020
-
[24]
B. Heo, S. Chun, S.J. Oh, D. Han, S. Yun, G. Kim, Y. Uh, and J.W. Ha, “Adamp: Slowing down the slowdown for momentum optimizers on scale-invariant weights,” arXiv preprint arXiv:2006.08217, 2020