DFIR-DETR: Frequency-Domain Iterative Refinement and Dynamic Feature Aggregation for Small Object Detection
Pith reviewed 2026-05-25 07:40 UTC · model grok-4.3
The pith
DFIR-DETR fixes uniform attention, norm drift, and high-frequency loss in RT-DETR to raise small-object detection accuracy.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
DFIR-DETR traces each proposed module back to a specific, measurable deficiency in the RT-DETR baseline: uniform attention that ignores spatial complexity, norm drift that destabilises upsampled features, and spatial convolutions that progressively suppress the high-frequency components small objects depend on. With 11.7M parameters and 47.2 GFLOPs, it reports 92.9 percent mAP50 on NEU-DET and 51.6 percent mAP50 on VisDrone, which the authors present as consistent gains across two qualitatively different detection domains.
What carries the argument
Frequency-Domain Iterative Refinement and Dynamic Feature Aggregation modules, each explicitly linked to one of the three listed deficiencies in RT-DETR.
If this is right
- Small objects in cluttered or low-resolution scenes become reliably detectable without increasing model capacity.
- The same module-to-deficiency tracing method can be applied to other transformer-based detectors that share the RT-DETR backbone and neck structure.
- Detection pipelines for industrial inspection and drone imagery can adopt the architecture while staying within tight compute limits.
- High-frequency preservation techniques may reduce the need for deeper backbones when the task depends on edge detail.
Where Pith is reading between the lines
- If the frequency-domain module proves portable, it could be inserted into other vision transformers to protect fine detail without custom redesign.
- Dynamic feature aggregation might offer a general alternative to fixed attention patterns in tasks where object scale varies sharply within a single image.
- The reported efficiency numbers suggest the approach could support real-time small-object detection on edge hardware once integrated with existing deployment frameworks.
Load-bearing premise
The three listed deficiencies in RT-DETR are the main causes of weak small-object performance and are directly mitigated by the proposed modules.
What would settle it
An ablation study in which removing the frequency-domain refinement or dynamic aggregation module produces no drop in small-object mAP on NEU-DET or VisDrone while the full model still meets the reported parameter and FLOP budget.
Original abstract
Small object detection in complex scenes exposes a fundamental tension in neural network design: backbone attention distributes computation uniformly regardless of content, pyramid necks inflate activation magnitudes during upsampling without norm compensation, and bottleneck convolutions progressively smooth high-frequency edge components through accumulated spatial filtering. In response, we develop DFIR-DETR by tracing each proposed module back to a specific, measurable deficiency in the RT-DETR baseline: uniform attention that ignores spatial complexity, norm drift that destabilises upsampled features, and spatial convolutions that progressively suppress the high-frequency components small objects depend on. On NEU-DET and VisDrone, DFIR-DETR achieves 92.9% and 51.6% mAP50 with only 11.7M parameters and 47.2 GFLOPs, demonstrating consistent gains across two qualitatively different detection domains.
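The abstract's third claim, that accumulated spatial filtering smooths away high-frequency content, can be illustrated with a small sketch. This is not the authors' code: the function names (`dft2`, `high_freq_fraction`, `box_blur`), the 3x3 mean filter used as a stand-in for a smoothing convolution, the random 16x16 map, and the 0.25 cycles/pixel cutoff are all assumptions chosen for illustration. It uses only the Python standard library; on real feature maps a fast FFT routine would be the practical choice.

```python
import cmath
import math
import random


def dft1(v):
    """Naive 1-D discrete Fourier transform (O(n^2), fine for tiny maps)."""
    n = len(v)
    return [sum(v[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def dft2(img):
    """2-D DFT of a square map: transform rows first, then columns."""
    n = len(img)
    rows = [dft1(r) for r in img]                       # rows[y][kx]
    cols = [dft1([rows[y][kx] for y in range(n)])       # cols[kx][ky]
            for kx in range(n)]
    return [[cols[kx][ky] for kx in range(n)] for ky in range(n)]


def high_freq_fraction(img, cutoff=0.25):
    """Share of non-DC spectral energy at radial frequency above `cutoff`."""
    n = len(img)
    spec = dft2(img)
    high = total = 0.0
    for ky in range(n):
        for kx in range(n):
            if kx == 0 and ky == 0:
                continue  # skip the DC term (the mean activation)
            # Map DFT indices to signed frequencies in cycles per pixel.
            fx = (kx if kx <= n // 2 else kx - n) / n
            fy = (ky if ky <= n // 2 else ky - n) / n
            energy = abs(spec[ky][kx]) ** 2
            total += energy
            if math.hypot(fx, fy) > cutoff:
                high += energy
    return high / total


def box_blur(img):
    """3x3 mean filter with wrap-around borders (assumed smoothing stand-in)."""
    n = len(img)
    return [[sum(img[(y + dy) % n][(x + dx) % n]
                 for dy in (-1, 0, 1) for dx in (-1, 0, 1)) / 9.0
             for x in range(n)] for y in range(n)]


random.seed(0)
fmap = [[random.gauss(0.0, 1.0) for _ in range(16)] for _ in range(16)]
fractions = []
for _ in range(4):  # measure after 0, 1, 2 and 3 smoothing passes
    fractions.append(high_freq_fraction(fmap))
    fmap = box_blur(fmap)
print(fractions)
```

On this toy input the high-frequency share shrinks with every pass. Whether RT-DETR's bottleneck convolutions behave like a mean filter is exactly the kind of question the diagnostic would need to answer on real feature maps.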
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes DFIR-DETR, an extension of RT-DETR for small-object detection. It identifies three specific deficiencies in the RT-DETR baseline (uniform attention that ignores spatial complexity, norm drift during upsampling, and progressive high-frequency suppression by spatial convolutions) and introduces frequency-domain iterative refinement plus dynamic feature aggregation modules to address them. On NEU-DET and VisDrone the model is reported to reach 92.9% and 51.6% mAP50 respectively while using 11.7M parameters and 47.2 GFLOPs.
Significance. If the performance gains can be shown to arise from the hypothesized module-level corrections rather than uncontrolled capacity or training differences, the work would supply a concrete, efficiency-aware route to improving high-frequency detail preservation in DETR-style detectors for industrial and aerial imagery.
major comments (2)
- Abstract: the claim that uniform attention, norm drift, and high-frequency suppression are the dominant causes of weak small-object performance is asserted without any quantitative diagnostics (attention entropy, per-stage feature-norm statistics, or Fourier spectra of feature maps) that would confirm the deficiencies exist at the claimed severity in the RT-DETR baseline.
- Abstract: the reported mAP50 figures are presented without ablation tables, controlled baseline comparisons, or statistical tests that isolate the contribution of each proposed module while holding parameter count and other architectural choices fixed; consequently the causal link between the modules and the observed gains cannot be evaluated.
minor comments (1)
- Abstract: baseline RT-DETR mAP50 numbers on the same two datasets are not supplied, preventing immediate assessment of the magnitude of improvement.
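Two of the diagnostics the referee asks for, attention entropy and per-stage feature-norm statistics, are simple to compute once attention rows and feature maps are exported from the baseline. The sketch below is a minimal illustration under stated assumptions: the function names are hypothetical, an attention row is a plain list of weights summing to 1, and a feature map is a list of per-position channel vectors. It does not reflect how RT-DETR or DFIR-DETR store these tensors.

```python
import math


def attention_entropy(weights):
    """Shannon entropy (in nats) of one attention row that sums to 1."""
    return -sum(w * math.log(w) for w in weights if w > 0.0)


def normalized_entropy(weights):
    """Entropy divided by log(n): 1.0 is fully uniform, 0.0 is one-hot."""
    n = len(weights)
    return attention_entropy(weights) / math.log(n) if n > 1 else 0.0


def mean_l2_norm(features):
    """Mean L2 norm over positions; `features` is a list of channel vectors."""
    return sum(math.sqrt(sum(c * c for c in vec)) for vec in features) / len(features)


def norm_drift(stages):
    """Ratio of each stage's mean norm to the first stage's (1.0 = no drift)."""
    base = mean_l2_norm(stages[0])
    return [mean_l2_norm(stage) / base for stage in stages]
```

A normalized entropy close to 1.0 across most queries would support the "uniform attention" claim, and drift ratios that grow along the upsampling path would support the "norm drift" claim. Values near 0 or ratios near 1.0 would argue against them.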
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on the motivation and empirical validation of our proposed modules. We address each major comment below and will revise the manuscript to strengthen the supporting evidence.
- Referee: Abstract: the claim that uniform attention, norm drift, and high-frequency suppression are the dominant causes of weak small-object performance is asserted without any quantitative diagnostics (attention entropy, per-stage feature-norm statistics, or Fourier spectra of feature maps) that would confirm the deficiencies exist at the claimed severity in the RT-DETR baseline.
  Authors: We acknowledge that the abstract states these deficiencies without accompanying quantitative diagnostics. The design of each module was motivated by observed behaviors during development of the RT-DETR baseline, but explicit metrics such as attention entropy, feature-norm statistics, or Fourier spectra were not reported. In the revised manuscript we will add these diagnostic analyses on the baseline to substantiate the claimed severity of each issue.
  Revision: yes
- Referee: Abstract: the reported mAP50 figures are presented without ablation tables, controlled baseline comparisons, or statistical tests that isolate the contribution of each proposed module while holding parameter count and other architectural choices fixed; consequently the causal link between the modules and the observed gains cannot be evaluated.
  Authors: The current manuscript presents end-to-end results on NEU-DET and VisDrone but does not include module-level ablations with parameter-controlled baselines or statistical significance tests. We agree that such experiments are necessary to establish the contribution of each component. The revised version will incorporate detailed ablation tables that isolate the frequency-domain iterative refinement and dynamic feature aggregation modules while keeping parameter count and training settings fixed.
  Revision: yes
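One way the promised significance tests could look is a paired sign-flip permutation test over matched training seeds, comparing the full model with a variant that has one module removed. The sketch below is an assumption about a reasonable procedure, not something the manuscript or the rebuttal specifies. The function name and its arguments are hypothetical, and no scores from the paper are used.

```python
from itertools import product


def paired_permutation_test(full_scores, ablated_scores):
    """Exact two-sided sign-flip test on paired per-seed scores.

    Returns (mean difference, p-value). Under the null hypothesis that the
    removed module has no effect, each paired difference is equally likely
    to be positive or negative, so every sign pattern is enumerated.
    """
    if len(full_scores) != len(ablated_scores):
        raise ValueError("scores must be paired seed by seed")
    diffs = [f - a for f, a in zip(full_scores, ablated_scores)]
    n = len(diffs)
    if n == 0 or n > 20:
        raise ValueError("exact enumeration expects 1 to 20 pairs")
    observed = sum(diffs) / n
    extreme = 0
    for signs in product((1, -1), repeat=n):
        flipped = sum(s * d for s, d in zip(signs, diffs)) / n
        # Small tolerance so floating-point ties count as "at least as extreme".
        if abs(flipped) >= abs(observed) - 1e-12:
            extreme += 1
    return observed, extreme / (2 ** n)
```

With n paired runs the smallest attainable p-value is 2 / 2^n, so five seeds cannot go below 0.0625. That gives a lower bound on how many seeds an ablation table needs before a claim of significance is even possible.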
Circularity Check
No circularity; purely empirical architecture claims
The paper introduces DFIR-DETR as an empirical modification of RT-DETR, motivated by three listed deficiencies and validated solely by reported mAP50 numbers on NEU-DET and VisDrone. No equations, derivations, fitted parameters presented as predictions, or self-citation chains appear in the provided text. The performance figures are measurements, not outputs that reduce to the inputs by construction, so the derivation chain is self-contained against external benchmarks.