mRadNet: A Compact Radar Object Detector with MetaFormer

Fahed Hassanat; Huaiyu Chen; Martin Bouchard; Robert Laganiere

arxiv: 2509.16223 · v3 · pith:JEYDETQVnew · submitted 2025-09-11 · 📡 eess.SP · cs.CV

mRadNet: A Compact Radar Object Detector with MetaFormer

Huaiyu Chen , Fahed Hassanat , Robert Laganiere , Martin Bouchard This is my paper

Pith reviewed 2026-05-21 22:12 UTC · model grok-4.3

classification 📡 eess.SP cs.CV

keywords radar object detectionMetaFormercompact modelU-Net architectureCRUW datasetautomotive radartoken mixersfrequency-modulated continuous wave

0 comments

The pith

mRadNet achieves state-of-the-art radar object detection on the CRUW dataset with the fewest parameters and lowest FLOPs.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents mRadNet as a compact model designed for detecting objects using frequency-modulated continuous wave radars in automotive settings. It structures the network as a U-Net that inserts MetaFormer blocks to handle both local details through separable convolutions and broader context through attention-based token mixing. Additional changes to how tokens are embedded and merged keep the overall size and computation small. Tests on the CRUW dataset show it exceeds earlier methods in detection quality while using less memory and processing power. Such efficiency matters for real-time systems that must operate on limited hardware inside vehicles and remain reliable in poor weather.

Core claim

The authors develop mRadNet as a U-Net style network that replaces standard blocks with MetaFormer units employing separable convolutions and attention token mixers to extract local and global features, paired with streamlined token embedding and merging steps, and demonstrate that this yields higher object detection performance on the CRUW dataset than existing approaches while requiring the smallest parameter count and fewest FLOPs.

What carries the argument

MetaFormer blocks that combine separable convolution for local feature capture with attention for global token mixing inside a U-Net backbone, supported by efficient token embedding and merging strategies.

If this is right

Real-time object detection can run on embedded hardware with tight limits on memory and processing speed.
Automotive radar systems gain a practical option that maintains robustness against rain and fog.
Lower computational cost opens deployment to a wider range of advanced driver assistance platforms.
Detection performance improves without increasing the hardware demands typical of earlier radar models.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same block and token design choices may transfer to object detection tasks using other radar frequencies or sensor combinations.
Further validation across varied driving environments would clarify how well the efficiency holds outside the CRUW collection.
Pairing mRadNet with camera or lidar inputs could produce fused systems that retain the reported compactness.

Load-bearing premise

That the accuracy gains observed on the CRUW dataset arise primarily from the MetaFormer blocks and token strategies rather than from dataset-specific tuning or training details.

What would settle it

Running mRadNet on a separate automotive radar dataset collected under different conditions and measuring whether detection accuracy remains higher than prior models while parameter count and FLOPs stay lower.

read the original abstract

Frequency-modulated continuous wave radars have gained increasing popularity in the automotive industry. Their robustness against adverse weather conditions makes it a suitable choice for radar object detection in advanced driver assistance systems. These real-time embedded systems have requirements for the compactness and efficiency of the model, which have been largely overlooked in previous work. In this work, we propose mRadNet, a novel radar object detection model with compactness in mind. mRadNet employs a U-net style architecture with MetaFormer blocks, in which separable convolution and attention token mixers are used to capture both local and global features effectively. More efficient token embedding and merging strategies are introduced to further facilitate the lightweight design. The performance of mRadNet is validated on the CRUW dataset, improving state-of-the-art performance with the fewest parameters and the lowest FLOPs.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

mRadNet puts a MetaFormer twist on U-Net for compact FMCW radar detection and claims efficiency gains on CRUW, but the single-dataset setup leaves the architecture contribution unproven.

read the letter

The core of this paper is a U-Net backbone that replaces standard blocks with MetaFormer mixers using separable convolutions for local features and attention for global ones, plus lighter token embedding and merging steps to cut parameters and FLOPs. It targets the real constraint in automotive radar: models that run on embedded hardware while handling adverse weather better than cameras or lidar. That focus is useful because most prior radar detectors chased accuracy without worrying about size or speed for ADAS chips. The design choices look sensible for balancing the two scales of features radar needs. The CRUW results are presented as beating prior work on both accuracy and efficiency, which at least shows the model is competitive in that narrow setting. The main weakness is that all the performance numbers sit on one dataset with no visible ablations or matched-training comparisons. Without those, it is hard to tell whether the reported edge comes from the MetaFormer and token changes or from extra tuning and preprocessing that the baselines did not receive. The stress-test note is accurate here; the attribution to the new components stays untested until the experiments isolate them. Readers working on embedded radar perception will find the architecture description worth a look for ideas on keeping models small. Anyone planning to cite or extend the work will need the full tables and controls first. The paper is coherent enough on its own terms to go to referees, mainly because the efficiency goal is concrete and the radar application is timely, even if the evidence needs tightening.

Referee Report

3 major / 2 minor

Summary. The paper proposes mRadNet, a compact U-Net-style radar object detector that integrates MetaFormer blocks using separable-convolution and attention token mixers together with custom token-embedding and token-merging strategies. It reports that this architecture achieves state-of-the-art detection performance on the CRUW dataset while using the fewest parameters and the lowest FLOPs of any compared method.

Significance. If the efficiency-accuracy trade-off is rigorously demonstrated, the work would be relevant for real-time automotive radar perception on embedded hardware, where model size and compute directly affect deployment feasibility under adverse weather. The combination of MetaFormer mixers with radar-specific token handling is a plausible direction, but its impact cannot yet be evaluated without the missing quantitative evidence.

major comments (3)

[Experimental Results / §4] The abstract and experimental section assert SOTA improvement together with lowest parameter count and FLOPs on CRUW, yet supply no numerical metrics (mAP, precision-recall, etc.), no baseline comparison table, and no ablation isolating the separable-convolution versus attention mixers or the new token strategies. Without these data the central performance claim cannot be verified.
[Experiments and Discussion] The manuscript evaluates only on the single CRUW dataset. To attribute gains to the MetaFormer architecture rather than dataset-specific tuning or undisclosed preprocessing, identical training protocols, data-augmentation details, and results on at least one additional radar dataset or cross-validation split are required.
[§3.2–3.3] The description of the token-embedding and token-merging modules is qualitative; explicit parameter-count derivations or FLOPs equations comparing the proposed blocks to standard MetaFormer or U-Net counterparts would be needed to substantiate the “fewest parameters / lowest FLOPs” claim.

minor comments (2)

[Abstract] The abstract states “improving state-of-the-art performance” without naming the exact metrics or the magnitude of improvement.
[Figures] Figure captions and axis labels in any architecture or result plots should explicitly state the dataset, metric, and compared methods.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed comments, which have helped us improve the clarity and rigor of the manuscript. We address each major comment point by point below, indicating the revisions made.

read point-by-point responses

Referee: [Experimental Results / §4] The abstract and experimental section assert SOTA improvement together with lowest parameter count and FLOPs on CRUW, yet supply no numerical metrics (mAP, precision-recall, etc.), no baseline comparison table, and no ablation isolating the separable-convolution versus attention mixers or the new token strategies. Without these data the central performance claim cannot be verified.

Authors: We acknowledge that the initial submission presented the performance claims without sufficient numerical detail or supporting tables. In the revised manuscript we have added a dedicated results table reporting mAP, precision, recall, and F1 scores for mRadNet against all compared baselines on CRUW, together with explicit parameter counts and FLOPs. We have also inserted an ablation study that isolates the separable-convolution mixer, the attention token mixer, and the proposed token-embedding/merging modules, with quantitative metrics showing their individual contributions to accuracy and efficiency. revision: yes
Referee: [Experiments and Discussion] The manuscript evaluates only on the single CRUW dataset. To attribute gains to the MetaFormer architecture rather than dataset-specific tuning or undisclosed preprocessing, identical training protocols, data-augmentation details, and results on at least one additional radar dataset or cross-validation split are required.

Authors: We agree that broader evaluation strengthens attribution of gains to the architecture. The revised manuscript now includes a detailed description of the training protocol and data-augmentation pipeline. We have added k-fold cross-validation results on CRUW to demonstrate robustness to data partitioning. While CRUW remains the primary public benchmark for automotive radar object detection and few directly comparable annotated datasets exist, we have expanded the discussion to address potential dataset-specific effects and preprocessing choices. revision: partial
Referee: [§3.2–3.3] The description of the token-embedding and token-merging modules is qualitative; explicit parameter-count derivations or FLOPs equations comparing the proposed blocks to standard MetaFormer or U-Net counterparts would be needed to substantiate the “fewest parameters / lowest FLOPs” claim.

Authors: We thank the referee for this suggestion. Sections 3.2 and 3.3 have been revised to include explicit parameter-count formulas and FLOPs derivations for the token-embedding and token-merging modules. These derivations are directly compared with the corresponding operations in a standard MetaFormer block and a conventional U-Net encoder, confirming the efficiency advantages of the proposed lightweight strategies. revision: yes

Circularity Check

0 steps flagged

No circularity: empirical validation on external CRUW dataset

full rationale

The paper introduces mRadNet as a U-Net-style architecture incorporating MetaFormer blocks with separable-convolution and attention mixers plus custom token embedding/merging. Performance claims rest on direct comparison against baselines on the publicly available CRUW dataset, reporting parameter count, FLOPs, and detection metrics. No equations, fitted parameters, or self-citations are presented as load-bearing derivations that reduce to the inputs by construction. The architecture choices are motivated by efficiency considerations rather than any self-referential theorem or renaming of prior results. This is a standard empirical ML proposal whose central claims are falsifiable against the external benchmark.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Review based solely on abstract; no explicit free parameters, axioms, or invented entities are described. The architecture relies on standard deep-learning components (U-Net, separable convolution, attention) whose properties are assumed from prior literature.

pith-pipeline@v0.9.0 · 5674 in / 1288 out tokens · 53231 ms · 2026-05-21T22:12:12.847431+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

We validate mRadNet on the CRUW dataset, where it improves state-of-the-art performance with lower parameter count and FLOPs

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

24 extracted references · 24 canonical work pages · 2 internal anchors

[1]

mRadNet: A Compact Radar Object Detector with MetaFormer

INTRODUCTION With recent advances in advanced driver assistance systems (ADAS), Frequency Modulated Continuous Wave (FMCW) radar has gained increasing popularity in the automotive in- dustry [1]. Its ability to provide accurate distance and veloc- ity measurements makes it a suitable choice for radar object detection (ROD) tasks. Since it operates in the ...

work page internal anchor Pith review Pith/arXiv arXiv 2021
[2]

Overall Architecture The overall architecture of mRadNet is shown in Figure 2 (a)

METHODOLOGY 2.1. Overall Architecture The overall architecture of mRadNet is shown in Figure 2 (a). mRadNet adopts a U-net style architecture, which consists of an encoder and a decoder with skip connections. The network takes a batch of 5D tensors as input, consisting of multiple frames of complex RF images, each with several chirps, the real and imagina...

work page
[3]

EXPERIMENTS 3.1. Dataset Our model is trained and evaluated on the CRUW dataset [3], a large-scale automotive radar object detection dataset con- taining 400K frames of camera-radar data sampled at 30Hz, of which a subset of 41K annotated frames (40 sequences) and 11K unannotated frames (10 sequences) are made pub- lic through the ROD2021 challenge. We di...

work page arXiv
[4]

By inte- grating convolution and attention token mixers, mRadNet ef- fectively captures both local and global features

CONCLUSION In this paper, we propose mRadNet, a compact MetaFormer- based architecture for radar object detection tasks. By inte- grating convolution and attention token mixers, mRadNet ef- fectively captures both local and global features. Its U-Net- style design produces hierarchical representations that pre- serve fine-grained details while encoding hi...

work page
[5]

4d mmwave radar for autonomous driving perception: A comprehensive survey,

L. Fan, J. Wang, Y . Chang, Y . Li, Y . Wang, and D. Cao, “4d mmwave radar for autonomous driving perception: A comprehensive survey,”IEEE Transactions on Intel- ligent Vehicles, vol. 9, no. 4, pp. 4606–4620, 2024

work page 2024
[6]

Mil- limeter wave fmcw radars for perception, recognition and localization in automotive applications: A survey,

A. Venon, Y . Dupuis, P. Vasseur, and P. Merriaux, “Mil- limeter wave fmcw radars for perception, recognition and localization in automotive applications: A survey,” IEEE Transactions on Intelligent Vehicles, vol. 7, no. 3, pp. 533–555, 2022

work page 2022
[7]

Rodnet: A real-time radar object detection net- work cross-supervised by camera-radar fused object 3d localization,

Y . Wang, Z. Jiang, Y . Li, J.N. Hwang, G. Xing, and H. Liu, “Rodnet: A real-time radar object detection net- work cross-supervised by camera-radar fused object 3d localization,”IEEE Journal of Selected Topics in Signal Processing, vol. 15, no. 4, pp. 954–967, 2021

work page 2021
[8]

T-fftradnet: Object detection with swin vision transformers from raw adc radar signals,

J. Giroux, M. Bouchard, and R. Laganiere, “T-fftradnet: Object detection with swin vision transformers from raw adc radar signals,” inProceedings of the IEEE/CVF In- ternational Conference on Computer Vision, 2023, pp. 4030–4039

work page 2023
[9]

Object detection and 3d estimation via an fmcw radar using a fully con- volutional network,

G. Zhang, H. Li, and F. Wenger, “Object detection and 3d estimation via an fmcw radar using a fully con- volutional network,” inIEEE International Confer- ence on Acoustics, Speech and Signal Processing 2020 (ICASSP). IEEE, 2020, pp. 4487–4491

work page 2020
[10]

Rad- det: Range-azimuth-doppler based radar object detec- tion for dynamic road users,

A. Zhang, F.E. Nowruzi, and R. Laganiere, “Rad- det: Range-azimuth-doppler based radar object detec- tion for dynamic road users,” in2021 18th Conference on Robots and Vision (CRV). IEEE, 2021, pp. 95–102

work page 2021
[11]

Raw high-definition radar for multi-task learning,

J. Rebut, A. Ouaknine, W. Malik, and P. P ´erez, “Raw high-definition radar for multi-task learning,” inPro- ceedings of the IEEE/CVF Conference on Computer Vi- sion and Pattern Recognition, 2022, pp. 17021–17030

work page 2022
[12]

Transrad: Retentive vision transformer for enhanced radar object detection,

L. Cheng and S. Cao, “Transrad: Retentive vision transformer for enhanced radar object detection,”IEEE Transactions on Radar Systems, 2025 (Early Access)

work page 2025
[13]

T-rodnet: Transformer for vehicular millimeter-wave radar object detection,

T. Jiang, L. Zhuang, Q. An, J. Wang, K. Xiao, and A. Wang, “T-rodnet: Transformer for vehicular millimeter-wave radar object detection,”IEEE Trans- actions on Instrumentation and Measurement, vol. 72, pp. 1–12, 2022

work page 2022
[14]

Mask-radarnet: Enhancing transformer with spatial- temporal semantic context for radar object detection in autonomous driving,

Y . Wu, J. Liu, G. Jiang, W. Liu, and D. Orlando, “Mask-radarnet: Enhancing transformer with spatial- temporal semantic context for radar object detection in autonomous driving,”arXiv preprint arXiv:2412.15595, 2024

work page arXiv 2024
[15]

Rc-rosnet: Fusing 3d radar range-angle heat maps and camera images for radar object segmentation,

L. Zhuang, Y . Yao, and N. Li, “Rc-rosnet: Fusing 3d radar range-angle heat maps and camera images for radar object segmentation,”IEEE Transactions on Cir- cuits and Systems for Video Technology, 2025 (Early Access)

work page 2025
[16]

Swin transformer: Hierarchical vision transformer using shifted windows,

Z. Liu, Y . Lin, Y . Cao, H. Hu, Y . Wei, Z. Zhang, S. Lin, and B. Guo, “Swin transformer: Hierarchical vision transformer using shifted windows,” inProceedings of the IEEE/CVF International Conference on Computer Vision, 2021, pp. 10012–10022

work page 2021
[17]

Metaformer baselines for vision,

W. Yu, C. Si, P. Zhou, M. Luo, Y . Zhou, J. Feng, S. Yan, and X. Wang, “Metaformer baselines for vision,”IEEE Transactions on Pattern Analysis and Machine Intelli- gence, vol. 46, no. 2, pp. 896–912, 2023

work page 2023
[18]

Metaseg: Metaformer-based global contexts-aware network for efficient semantic segmentation,

B. Kang, S. Moon, Y . Cho, H. Yu, and S.J. Kang, “Metaseg: Metaformer-based global contexts-aware network for efficient semantic segmentation,” inPro- ceedings of the IEEE/CVF Winter Conference on Appli- cations of Computer Vision, 2024, pp. 434–443

work page 2024
[19]

Mfmdepth: Metaformer-based monocular met- ric depth estimation for distance measurement in ports,

X. Chen, F. Ma, Y . Wu, B. Han, L. Luo, and S.A. Bian- cardo, “Mfmdepth: Metaformer-based monocular met- ric depth estimation for distance measurement in ports,” Computers & Industrial Engineering, vol. 207, pp. 1– 13, 2025

work page 2025
[20]

Efficient classification of photovoltaic module defects in infrared images,

J.H. Kim and G.R. Kwon, “Efficient classification of photovoltaic module defects in infrared images,”IEEE Signal Processing Letters, vol. 99, pp. 1–5, 2025

work page 2025
[21]

E-rodnet: lightweight approach to object detection by vehicular millimeter- wave radar,

W. Xu, P. Lu, and Y . Zhao, “E-rodnet: lightweight approach to object detection by vehicular millimeter- wave radar,”IEEE Sensors Journal, vol. 24, pp. 33091– 33100, 2024

work page 2024
[22]

U-net: Convo- lutional networks for biomedical image segmentation,

O. Ronneberger, P. Fischer, and T. Brox, “U-net: Convo- lutional networks for biomedical image segmentation,” inInternational Conference on Medical Image Comput- ing and Computer-assisted Intervention. Springer, 2015, pp. 234–241

work page 2015
[23]

An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale

A. Dosovitskiy, L. Beyer, A. Kolesnikov, D. Weis- senborn, X. Zhai, T. Unterthiner, M. Dehghani, M. Min- derer, G. Heigold, S. Gelly, et al., “An image is worth 16x16 words: Transformers for image recognition at scale,”arXiv preprint arXiv:2010.11929, 2020

work page internal anchor Pith review Pith/arXiv arXiv 2010
[24]

Adamp: Slowing down the slowdown for momentum optimizers on scale-invariant weights,

B. Heo, S. Chun, S.J. Oh, D. Han, S. Yun, G. Kim, Y . Uh, and J.W. Ha, “Adamp: Slowing down the slowdown for momentum optimizers on scale-invariant weights,”arXiv preprint arXiv:2006.08217, 2020

work page arXiv 2006

[1] [1]

mRadNet: A Compact Radar Object Detector with MetaFormer

INTRODUCTION With recent advances in advanced driver assistance systems (ADAS), Frequency Modulated Continuous Wave (FMCW) radar has gained increasing popularity in the automotive in- dustry [1]. Its ability to provide accurate distance and veloc- ity measurements makes it a suitable choice for radar object detection (ROD) tasks. Since it operates in the ...

work page internal anchor Pith review Pith/arXiv arXiv 2021

[2] [2]

Overall Architecture The overall architecture of mRadNet is shown in Figure 2 (a)

METHODOLOGY 2.1. Overall Architecture The overall architecture of mRadNet is shown in Figure 2 (a). mRadNet adopts a U-net style architecture, which consists of an encoder and a decoder with skip connections. The network takes a batch of 5D tensors as input, consisting of multiple frames of complex RF images, each with several chirps, the real and imagina...

work page

[3] [3]

EXPERIMENTS 3.1. Dataset Our model is trained and evaluated on the CRUW dataset [3], a large-scale automotive radar object detection dataset con- taining 400K frames of camera-radar data sampled at 30Hz, of which a subset of 41K annotated frames (40 sequences) and 11K unannotated frames (10 sequences) are made pub- lic through the ROD2021 challenge. We di...

work page arXiv

[4] [4]

By inte- grating convolution and attention token mixers, mRadNet ef- fectively captures both local and global features

CONCLUSION In this paper, we propose mRadNet, a compact MetaFormer- based architecture for radar object detection tasks. By inte- grating convolution and attention token mixers, mRadNet ef- fectively captures both local and global features. Its U-Net- style design produces hierarchical representations that pre- serve fine-grained details while encoding hi...

work page

[5] [5]

4d mmwave radar for autonomous driving perception: A comprehensive survey,

L. Fan, J. Wang, Y . Chang, Y . Li, Y . Wang, and D. Cao, “4d mmwave radar for autonomous driving perception: A comprehensive survey,”IEEE Transactions on Intel- ligent Vehicles, vol. 9, no. 4, pp. 4606–4620, 2024

work page 2024

[6] [6]

Mil- limeter wave fmcw radars for perception, recognition and localization in automotive applications: A survey,

A. Venon, Y . Dupuis, P. Vasseur, and P. Merriaux, “Mil- limeter wave fmcw radars for perception, recognition and localization in automotive applications: A survey,” IEEE Transactions on Intelligent Vehicles, vol. 7, no. 3, pp. 533–555, 2022

work page 2022

[7] [7]

Rodnet: A real-time radar object detection net- work cross-supervised by camera-radar fused object 3d localization,

Y . Wang, Z. Jiang, Y . Li, J.N. Hwang, G. Xing, and H. Liu, “Rodnet: A real-time radar object detection net- work cross-supervised by camera-radar fused object 3d localization,”IEEE Journal of Selected Topics in Signal Processing, vol. 15, no. 4, pp. 954–967, 2021

work page 2021

[8] [8]

T-fftradnet: Object detection with swin vision transformers from raw adc radar signals,

J. Giroux, M. Bouchard, and R. Laganiere, “T-fftradnet: Object detection with swin vision transformers from raw adc radar signals,” inProceedings of the IEEE/CVF In- ternational Conference on Computer Vision, 2023, pp. 4030–4039

work page 2023

[9] [9]

Object detection and 3d estimation via an fmcw radar using a fully con- volutional network,

G. Zhang, H. Li, and F. Wenger, “Object detection and 3d estimation via an fmcw radar using a fully con- volutional network,” inIEEE International Confer- ence on Acoustics, Speech and Signal Processing 2020 (ICASSP). IEEE, 2020, pp. 4487–4491

work page 2020

[10] [10]

Rad- det: Range-azimuth-doppler based radar object detec- tion for dynamic road users,

A. Zhang, F.E. Nowruzi, and R. Laganiere, “Rad- det: Range-azimuth-doppler based radar object detec- tion for dynamic road users,” in2021 18th Conference on Robots and Vision (CRV). IEEE, 2021, pp. 95–102

work page 2021

[11] [11]

Raw high-definition radar for multi-task learning,

J. Rebut, A. Ouaknine, W. Malik, and P. P ´erez, “Raw high-definition radar for multi-task learning,” inPro- ceedings of the IEEE/CVF Conference on Computer Vi- sion and Pattern Recognition, 2022, pp. 17021–17030

work page 2022

[12] [12]

Transrad: Retentive vision transformer for enhanced radar object detection,

L. Cheng and S. Cao, “Transrad: Retentive vision transformer for enhanced radar object detection,”IEEE Transactions on Radar Systems, 2025 (Early Access)

work page 2025

[13] [13]

T-rodnet: Transformer for vehicular millimeter-wave radar object detection,

T. Jiang, L. Zhuang, Q. An, J. Wang, K. Xiao, and A. Wang, “T-rodnet: Transformer for vehicular millimeter-wave radar object detection,”IEEE Trans- actions on Instrumentation and Measurement, vol. 72, pp. 1–12, 2022

work page 2022

[14] [14]

Mask-radarnet: Enhancing transformer with spatial- temporal semantic context for radar object detection in autonomous driving,

Y . Wu, J. Liu, G. Jiang, W. Liu, and D. Orlando, “Mask-radarnet: Enhancing transformer with spatial- temporal semantic context for radar object detection in autonomous driving,”arXiv preprint arXiv:2412.15595, 2024

work page arXiv 2024

[15] [15]

Rc-rosnet: Fusing 3d radar range-angle heat maps and camera images for radar object segmentation,

L. Zhuang, Y . Yao, and N. Li, “Rc-rosnet: Fusing 3d radar range-angle heat maps and camera images for radar object segmentation,”IEEE Transactions on Cir- cuits and Systems for Video Technology, 2025 (Early Access)

work page 2025

[16] [16]

Swin transformer: Hierarchical vision transformer using shifted windows,

Z. Liu, Y . Lin, Y . Cao, H. Hu, Y . Wei, Z. Zhang, S. Lin, and B. Guo, “Swin transformer: Hierarchical vision transformer using shifted windows,” inProceedings of the IEEE/CVF International Conference on Computer Vision, 2021, pp. 10012–10022

work page 2021

[17] [17]

Metaformer baselines for vision,

W. Yu, C. Si, P. Zhou, M. Luo, Y . Zhou, J. Feng, S. Yan, and X. Wang, “Metaformer baselines for vision,”IEEE Transactions on Pattern Analysis and Machine Intelli- gence, vol. 46, no. 2, pp. 896–912, 2023

work page 2023

[18] [18]

Metaseg: Metaformer-based global contexts-aware network for efficient semantic segmentation,

B. Kang, S. Moon, Y . Cho, H. Yu, and S.J. Kang, “Metaseg: Metaformer-based global contexts-aware network for efficient semantic segmentation,” inPro- ceedings of the IEEE/CVF Winter Conference on Appli- cations of Computer Vision, 2024, pp. 434–443

work page 2024

[19] [19]

Mfmdepth: Metaformer-based monocular met- ric depth estimation for distance measurement in ports,

X. Chen, F. Ma, Y . Wu, B. Han, L. Luo, and S.A. Bian- cardo, “Mfmdepth: Metaformer-based monocular met- ric depth estimation for distance measurement in ports,” Computers & Industrial Engineering, vol. 207, pp. 1– 13, 2025

work page 2025

[20] [20]

Efficient classification of photovoltaic module defects in infrared images,

J.H. Kim and G.R. Kwon, “Efficient classification of photovoltaic module defects in infrared images,”IEEE Signal Processing Letters, vol. 99, pp. 1–5, 2025

work page 2025

[21] [21]

E-rodnet: lightweight approach to object detection by vehicular millimeter- wave radar,

W. Xu, P. Lu, and Y . Zhao, “E-rodnet: lightweight approach to object detection by vehicular millimeter- wave radar,”IEEE Sensors Journal, vol. 24, pp. 33091– 33100, 2024

work page 2024

[22] [22]

U-net: Convo- lutional networks for biomedical image segmentation,

O. Ronneberger, P. Fischer, and T. Brox, “U-net: Convo- lutional networks for biomedical image segmentation,” inInternational Conference on Medical Image Comput- ing and Computer-assisted Intervention. Springer, 2015, pp. 234–241

work page 2015

[23] [23]

An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale

A. Dosovitskiy, L. Beyer, A. Kolesnikov, D. Weis- senborn, X. Zhai, T. Unterthiner, M. Dehghani, M. Min- derer, G. Heigold, S. Gelly, et al., “An image is worth 16x16 words: Transformers for image recognition at scale,”arXiv preprint arXiv:2010.11929, 2020

work page internal anchor Pith review Pith/arXiv arXiv 2010

[24] [24]

Adamp: Slowing down the slowdown for momentum optimizers on scale-invariant weights,

B. Heo, S. Chun, S.J. Oh, D. Han, S. Yun, G. Kim, Y . Uh, and J.W. Ha, “Adamp: Slowing down the slowdown for momentum optimizers on scale-invariant weights,”arXiv preprint arXiv:2006.08217, 2020

work page arXiv 2006