Recognition: no theorem link
SFFNet: Synergistic Feature Fusion Network With Dual-Domain Edge Enhancement for UAV Image Object Detection
Pith reviewed 2026-05-13 20:46 UTC · model grok-4.3
The pith
SFFNet improves UAV object detection by fusing frequency and spatial domain edges to separate small targets from background noise.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central claim is that dual-domain edge extraction in the MDDC module, combined with the SFPN's adaptive geometric and semantic fusion through deformable convolutions and long-range perception, reliably isolates multi-scale objects from UAV clutter and yields higher detection accuracy than prior single-domain or standard pyramid approaches.
What carries the argument
The multi-scale dynamic dual-domain coupling (MDDC) module that performs dual-driven edge extraction in frequency and spatial domains to separate objects from noise, and the synergistic feature pyramid network (SFPN) that employs linear deformable convolutions plus a wide-area perception module (WPM) to handle irregular shapes and context.
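The dual-driven extraction can be pictured as two complementary edge maps fused per pixel. The following numpy sketch is illustrative only, not the paper's implementation: the frequency cutoff, finite-difference kernels, and the weighted-sum fusion rule are all assumptions.

```python
import numpy as np

def frequency_edges(img, cutoff=4):
    """High-pass filter in the frequency domain: zero out low frequencies."""
    f = np.fft.fftshift(np.fft.fft2(img))
    h, w = img.shape
    cy, cx = h // 2, w // 2
    f[cy - cutoff:cy + cutoff, cx - cutoff:cx + cutoff] = 0  # drop smooth background
    return np.abs(np.fft.ifft2(np.fft.ifftshift(f)))

def spatial_edges(img):
    """Gradient-magnitude edges from central finite differences (Sobel-like)."""
    gx = np.zeros_like(img)
    gy = np.zeros_like(img)
    gx[:, 1:-1] = img[:, 2:] - img[:, :-2]
    gy[1:-1, :] = img[2:, :] - img[:-2, :]
    return np.hypot(gx, gy)

def dual_domain_edges(img, alpha=0.5):
    """Fuse the two normalized edge maps; alpha weights frequency vs. spatial."""
    fe, se = frequency_edges(img), spatial_edges(img)
    norm = lambda x: x / (x.max() + 1e-8)
    return alpha * norm(fe) + (1 - alpha) * norm(se)
```

On a flat background with a small bright patch, the fused map responds at the patch boundary while the uniform background stays near zero, which is the separation behavior the MDDC module is claimed to provide at multiple scales.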
If this is right
- The design enables detectors that maintain accuracy across widely different object sizes typical in aerial views.
- Resource-constrained applications can use the smaller N or S variants without complete loss of the dual-domain benefit.
- Long-range contextual associations reduce errors on objects partially obscured by clutter or at unusual angles.
- The overall architecture supports deployment in varied UAV missions by offering a family of models rather than a single fixed network.
Where Pith is reading between the lines
- The dual-domain separation technique could be tested on satellite or ground-based surveillance imagery where background noise similarly overwhelms small targets.
- Ablation studies isolating frequency versus spatial contributions would clarify which domain drives most of the reported gain.
- If the wide-area perception module generalizes, it might be combined with other pyramid networks to improve detection in non-aerial cluttered scenes.
Load-bearing premise
That the dual-domain edge extraction and deformable fusion steps will continue to distinguish object boundaries from background noise even in new UAV scenes not seen during training.
What would settle it
An experiment that disables the MDDC module or the SFPN and measures whether average precision on VisDrone falls below the performance of existing methods without these components.
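Operationally, such an ablation reduces to measuring the AP drop of each disabled-module variant against the full model. A minimal sketch, where the variant names and all AP values below the paper's reported 36.8 are hypothetical placeholders, not results from the paper:

```python
def ablation_deltas(ap_by_variant, full_key="full"):
    """Return the AP drop of each ablated variant relative to the full model."""
    base = ap_by_variant[full_key]
    return {k: round(base - v, 2) for k, v in ap_by_variant.items() if k != full_key}

# Hypothetical numbers for illustration only.
aps = {"full": 36.8, "no_MDDC": 34.1, "no_SFPN": 33.5}
print(ablation_deltas(aps))  # → {'no_MDDC': 2.7, 'no_SFPN': 3.3}
```

The question would be settled if the ablated deltas shrink the model's AP to at or below the best prior method under an otherwise identical training recipe.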
Original abstract
Object detection in unmanned aerial vehicle (UAV) images remains a highly challenging task, primarily due to the complexity of background noise and the imbalance of target scales. Traditional methods struggle to effectively separate objects from intricate backgrounds and fail to fully leverage the rich multi-scale information contained within images. To address these issues, we have developed a synergistic feature fusion network (SFFNet) with dual-domain edge enhancement specifically tailored for object detection in UAV images. First, the multi-scale dynamic dual-domain coupling (MDDC) module is designed. This component introduces a dual-driven edge extraction architecture that operates in both the frequency and spatial domains, enabling effective decoupling of multi-scale object edges from background noise. Second, to further enhance the representation capability of the model's neck in terms of both geometric and semantic information, a synergistic feature pyramid network (SFPN) is proposed. SFPN leverages linear deformable convolutions to adaptively capture irregular object shapes and establishes long-range contextual associations around targets through the designed wide-area perception module (WPM). Moreover, to adapt to various applications and resource-constrained scenarios, six detectors of different scales (N/S/M/B/L/X) are designed. Experiments on two challenging aerial datasets (VisDrone and UAVDT) demonstrate the outstanding performance of SFFNet-X, achieving 36.8 AP and 20.6 AP, respectively. The lightweight models (N/S) also maintain a balance between detection accuracy and parameter efficiency. The code will be available at https://github.com/CQNU-ZhangLab/SFFNet.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes SFFNet, a network for object detection in UAV imagery that addresses scale imbalance and background clutter via two main components: the Multi-scale Dynamic Dual-domain Coupling (MDDC) module, which performs edge decoupling in both frequency and spatial domains, and the Synergistic Feature Pyramid Network (SFPN), which uses linear deformable convolutions plus a Wide-area Perception Module (WPM) to capture irregular shapes and long-range context. Six scaled variants (N/S/M/B/L/X) are introduced; the largest (SFFNet-X) is reported to reach 36.8 AP on VisDrone and 20.6 AP on UAVDT, with lighter variants balancing accuracy and efficiency. Code release is promised.
Significance. If the reported gains are reproducible, the dual-domain edge enhancement and adaptive pyramid design could meaningfully advance detection under the specific constraints of UAV imagery (small objects, heavy clutter, extreme scale variation). The availability of multiple model scales and the commitment to release code are practical strengths that would aid adoption and further research.
Major comments (3)
- [Experiments] Experiments section: the central performance claims (36.8 AP on VisDrone, 20.6 AP on UAVDT) are given as single-point estimates without error bars, standard deviations across random seeds, or a complete training protocol (optimizer schedule, data-augmentation details, input resolution, etc.). This prevents verification of whether the improvements are statistically reliable or sensitive to implementation choices.
- [Ablation studies] Ablation studies: no quantitative breakdown is provided that isolates the contribution of the frequency-domain branch versus the spatial-domain branch inside MDDC, or of the WPM versus the deformable-convolution path inside SFPN. Without these controlled ablations, it is impossible to confirm that the dual-domain coupling and synergistic fusion are the load-bearing reasons for the reported AP gains rather than other factors (backbone choice, training recipe).
- [Comparison tables] Comparison tables: the baseline detectors against which SFFNet-X is evaluated are not described with identical training settings or hyper-parameters, making it unclear whether the 36.8 / 20.6 AP numbers reflect architectural superiority or differences in optimization.
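The seed-variance concern in the first comment has a standard remedy: report mean AP and sample standard deviation across seeds. A sketch using only the stdlib, with hypothetical per-seed AP values for illustration:

```python
import statistics

def summarize_runs(ap_values):
    """Mean and sample standard deviation of AP across random seeds."""
    mean = statistics.mean(ap_values)
    std = statistics.stdev(ap_values) if len(ap_values) > 1 else 0.0
    return round(mean, 2), round(std, 2)

# Hypothetical per-seed APs for illustration only.
print(summarize_runs([36.8, 36.5, 37.0]))  # → (36.77, 0.25)
```

An improvement claim is then credible when the gap to the strongest baseline exceeds a few standard deviations, rather than resting on a single-point estimate.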
Minor comments (2)
- [Figures] Figure captions and axis labels in the architecture diagrams use inconsistent font sizes and occasionally omit units or module dimensions, reducing readability.
- [Method] The notation for the wide-area perception module (WPM) is introduced without an explicit equation or pseudocode, forcing the reader to infer its exact operation from the textual description alone.
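In the absence of an explicit equation, one plausible reading of the WPM, in line with the large-kernel context designs the paper builds on, is a wide-window context descriptor that gates the input features. The window size, mean pooling, and sigmoid gating below are the editor's assumptions, not the authors' definition:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def wide_area_context(feat, k=7):
    """Per-channel k x k mean (stride 1, zero padding) as a long-range
    context descriptor over a (C, H, W) feature map."""
    c, h, w = feat.shape
    p = k // 2
    padded = np.pad(feat, ((0, 0), (p, p), (p, p)))
    windows = sliding_window_view(padded, (k, k), axis=(1, 2))  # (C, H, W, k, k)
    return windows.mean(axis=(-1, -2))

def wpm_sketch(feat, k=7):
    """Gate the input by its wide-area context (sigmoid attention)."""
    ctx = wide_area_context(feat, k)
    gate = 1.0 / (1.0 + np.exp(-ctx))
    return feat * gate
```

Pseudocode of this kind in the manuscript would let readers verify the module's receptive field and cost without reverse-engineering the prose.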
Simulated Author's Rebuttal
We thank the referee for the constructive comments, which highlight important aspects of experimental rigor. We will revise the manuscript to provide fuller details on training protocols, expanded ablations, and clarified comparisons while preserving the core contributions of the MDDC and SFPN modules.
Point-by-point responses
Referee: [Experiments] Experiments section: the central performance claims (36.8 AP on VisDrone, 20.6 AP on UAVDT) are given as single-point estimates without error bars, standard deviations across random seeds, or a complete training protocol (optimizer schedule, data-augmentation details, input resolution, etc.). This prevents verification of whether the improvements are statistically reliable or sensitive to implementation choices.
Authors: We agree that additional experimental details are necessary for reproducibility. In the revised manuscript we will add a dedicated subsection describing the full training protocol, including the optimizer and schedule, data-augmentation pipeline, and input resolutions used for all reported results. We will also perform the main experiments across multiple random seeds and report mean AP values together with standard deviations on both VisDrone and UAVDT to quantify statistical reliability. revision: yes
Referee: [Ablation studies] Ablation studies: no quantitative breakdown is provided that isolates the contribution of the frequency-domain branch versus the spatial-domain branch inside MDDC, or of the WPM versus the deformable-convolution path inside SFPN. Without these controlled ablations, it is impossible to confirm that the dual-domain coupling and synergistic fusion are the load-bearing reasons for the reported AP gains rather than other factors (backbone choice, training recipe).
Authors: We accept that more granular ablations are required to isolate component contributions. The revised paper will include new controlled ablation tables that separately measure the performance impact of the frequency-domain branch versus the spatial-domain branch within MDDC, and of the Wide-area Perception Module versus the linear deformable convolution path within SFPN, all under otherwise identical settings. revision: yes
Referee: [Comparison tables] Comparison tables: the baseline detectors against which SFFNet-X is evaluated are not described with identical training settings or hyper-parameters, making it unclear whether the 36.8 / 20.6 AP numbers reflect architectural superiority or differences in optimization.
Authors: We will revise the comparison section to state explicitly that every baseline detector was re-trained from scratch using the identical training recipe, hyper-parameters, data splits, and augmentation strategy employed for SFFNet. Any unavoidable differences arising from original public implementations will be noted. revision: yes
Circularity Check
No significant circularity in empirical architecture proposal
full rationale
The paper is an empirical architecture design for UAV object detection. It introduces MDDC and SFPN modules motivated by challenges of scale imbalance and background clutter, then reports measured AP scores on VisDrone and UAVDT. No equations, derivations, or predictions reduce the reported performance to fitted parameters, self-definitions, or self-citation chains by construction. The central claims rest on standard benchmark experiments rather than any internal reduction of outputs to inputs.