arxiv: 2409.06490 · v7 · submitted 2024-09-09 · 💻 cs.CV · stat.AP

UAVDB: Point-Guided Masks for UAV Detection and Segmentation

Pith reviewed 2026-05-23 20:30 UTC · model grok-4.3

classification 💻 cs.CV stat.AP
keywords: UAV detection, segmentation, weak supervision, dataset, point-guided annotation, bounding boxes, multi-view video

The pith

A UAV dataset is constructed from video trajectory points via intensity convergence to produce boxes and masks without manual labeling.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces UAVDB, a benchmark dataset for UAV detection and segmentation that captures objects across extreme scale variations from fixed-camera multi-view videos. It builds the dataset through a point-guided pipeline that applies Patch Intensity Convergence to convert trajectory points into bounding boxes and then uses SAM2 to add segmentation masks. The approach seeks to solve the problem of limited scale and labeling effort in existing UAV datasets. A sympathetic reader would care because accurate automated annotation could support larger training sets for surveillance and airspace monitoring tasks.

Core claim

The paper claims that Patch Intensity Convergence combined with SAM2 yields higher IoU than existing annotation techniques, and that this pipeline makes UAVDB possible: a dataset of multi-scale UAV instances accompanied by YOLO detector baselines.

What carries the argument

Patch Intensity Convergence (PIC), a lightweight method that converts trajectory points into bounding boxes by analyzing patch intensities.
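
The summary does not spell out how PIC analyzes patch intensities, so the following is only one plausible reading and not the paper's method. It grows a square patch around a trajectory point and stops once the mean patch intensity changes by less than a tolerance. The function name `pic_box`, the parameters `max_half` and `tol`, and the stopping rule itself are all assumptions made for illustration.

```python
import numpy as np

def pic_box(gray, cx, cy, max_half=32, tol=1.0):
    """Hypothetical patch-growing rule (an assumption, not the paper's exact PIC).

    Grows a square patch centred on the trajectory point (cx, cy) and keeps
    the last patch size whose mean intensity still differed from the
    previous size by at least `tol`.
    Returns (x0, y0, x1, y1) with exclusive upper bounds.
    """
    h, w = gray.shape
    best_half = 1
    prev_mean = None
    for half in range(1, max_half + 1):
        # Clip the candidate patch to the image borders.
        y0, y1 = max(cy - half, 0), min(cy + half + 1, h)
        x0, x1 = max(cx - half, 0), min(cx + half + 1, w)
        mean = float(gray[y0:y1, x0:x1].mean())
        if prev_mean is not None and abs(mean - prev_mean) < tol:
            break  # growing further no longer changes the patch mean
        best_half, prev_mean = half, mean
    return (max(cx - best_half, 0), max(cy - best_half, 0),
            min(cx + best_half + 1, w), min(cy + best_half + 1, h))

# Synthetic target: intensity rises linearly from 0 at the centre to a
# background of 100 at a Chebyshev distance of 5 pixels.
yy, xx = np.mgrid[0:41, 0:41]
dist = np.maximum(np.abs(yy - 20), np.abs(xx - 20))
image = np.minimum(20 * dist, 100).astype(float)
box = pic_box(image, cx=20, cy=20, tol=8.0)
```

Under this rule the tolerance trades tightness against robustness: a larger `tol` stops earlier and gives a tighter box, while a perfectly uniform region stops at the minimal 3x3 patch. The actual PIC criterion may differ.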

If this is right

  • Large UAV datasets can be built at lower labeling cost while retaining precise spatial localization.
  • Segmentation masks become available alongside detection labels for multi-task model training.
  • YOLO-based detectors gain concrete baselines on UAVs ranging from clear objects to single-pixel instances.
  • Future annotation pipelines can start from trajectory points rather than full manual box drawing.
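
To make the link between generated boxes and YOLO baselines concrete, here is a minimal sketch that turns a pixel-space box into a text label line, assuming the widely used YOLO convention of `class x_center y_center width height` normalized by image size. The helper name `to_yolo_line` and the single class id `0` are illustrative assumptions; the summary does not state which label format the paper uses.

```python
def to_yolo_line(box, img_w, img_h, class_id=0):
    """Convert a pixel box (x0, y0, x1, y1) into a normalized YOLO-style label line.

    Assumes the common `class xc yc w h` layout with values in [0, 1].
    """
    x0, y0, x1, y1 = box
    xc = (x0 + x1) / 2 / img_w   # box centre, normalized by image width
    yc = (y0 + y1) / 2 / img_h   # box centre, normalized by image height
    bw = (x1 - x0) / img_w       # box width, normalized
    bh = (y1 - y0) / img_h       # box height, normalized
    return f"{class_id} {xc:.6f} {yc:.6f} {bw:.6f} {bh:.6f}"

line = to_yolo_line((100, 50, 300, 150), img_w=400, img_h=200)
```

For near single-pixel targets the normalized width and height become very small, so the number of decimals written to the label file matters more than it does for large objects.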

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same point-to-box conversion could annotate other small moving targets in fixed-camera video if trajectory data is present.
  • Swapping SAM2 for alternative mask generators might improve results under specific lighting or motion conditions.
  • UAVDB instances near single-pixel size offer a direct test for how current detectors handle the smallest detectable objects.

Load-bearing premise

Trajectory points from the source multi-view video are accurate and dense enough to generate reliable bounding boxes without additional manual correction.

What would settle it

Manually creating ground-truth boxes on a held-out subset of frames and finding that PIC-generated boxes have substantially lower IoU than the reported values would falsify the performance claim.
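
The check described above reduces to computing IoU between each manual box and the corresponding generated box. A minimal sketch follows, assuming axis-aligned boxes given as `(x0, y0, x1, y1)` with exclusive upper bounds; the function names and the sample boxes are illustrative and not taken from the paper.

```python
def box_iou(a, b):
    """Intersection over union of two axis-aligned boxes (x0, y0, x1, y1)."""
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(ix1 - ix0, 0) * max(iy1 - iy0, 0)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def mean_iou(manual_boxes, generated_boxes):
    """Average IoU over paired boxes from a held-out validation subset."""
    scores = [box_iou(m, g) for m, g in zip(manual_boxes, generated_boxes)]
    return sum(scores) / len(scores)

# Made-up example pairs: one exact match and one box shifted by a pixel.
manual = [(10, 10, 20, 20), (0, 0, 2, 2)]
generated = [(10, 10, 20, 20), (1, 0, 3, 2)]
score = mean_iou(manual, generated)
```

Comparing `score` on the held-out subset with the IoU reported in the paper would show whether the generated boxes hold up against independent annotation.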

Figures

Figures reproduced from arXiv: 2409.06490 by Yu-Hsi Chen.

Figure 1: UAV trajectory captured by Camera 3 in Dataset 4 at …
Figure 2: Top: Comparison of bounding box outputs from multiple methods, including fixed-size, image thresholding, …
Figure 3: Stepwise illustration of the PIC process across datasets …
Figure 4: Validation performance curves of YOLOv8 …
Figure 5: Sequential tracking results predicted by YOLOv12n-seg …
Original abstract

Accurate detection of Unmanned Aerial Vehicles (UAVs) is critical for surveillance, security, and airspace monitoring. However, existing datasets remain limited in scale, resolution, and the ability to capture objects across extreme size variations. To address these challenges, we present UAVDB, a benchmark dataset for UAV detection and segmentation, constructed via a point-guided weak supervision pipeline. We introduce Patch Intensity Convergence (PIC), a lightweight annotation method that converts trajectory points into bounding boxes, eliminating the need for manual labeling while preserving precise spatial localization. Building upon these annotations, we further generate segmentation masks using SAM2, enriching the dataset with multi-task labels. UAVDB consists of RGB frames from a fixed-camera multi-view video dataset, capturing UAVs across scales ranging from clearly visible objects to near single-pixel instances under diverse conditions. Quantitative results show that PIC combined with SAM2 outperforms existing annotation techniques in terms of IoU. Furthermore, we benchmark YOLO-based detectors on UAVDB, establishing baselines for future research.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 2 minor

Summary. The paper presents UAVDB, a benchmark dataset for UAV detection and segmentation constructed from fixed-camera multi-view video via a point-guided weak supervision pipeline. It introduces Patch Intensity Convergence (PIC) to convert trajectory points into bounding boxes without manual labeling, then applies SAM2 to generate segmentation masks. The central claims are that PIC+SAM2 outperforms prior annotation techniques on IoU and that YOLO-based detectors provide useful baselines on this dataset spanning extreme scale variations including near-single-pixel UAVs.

Significance. If the IoU gains are reproducible and the trajectory seeds prove reliable, the work supplies a practical, scalable annotation route for small-object UAV data that existing manual or fully supervised pipelines struggle to produce at volume. The resulting multi-task labels and YOLO baselines could serve as a reference point for future surveillance and airspace-monitoring research.

major comments (1)
  1. [§4] §4 (Experiments) and the quantitative IoU claim: the headline result that PIC+SAM2 outperforms existing annotation techniques rests on the unvalidated premise that the source trajectory points are sufficiently accurate and dense to serve as faithful seeds for PIC box generation. No manual ground-truth comparison, density statistics, or subset validation against independent location annotations is described; for near-single-pixel UAVs any systematic offset would render the reported IoU advantage non-diagnostic.
minor comments (2)
  1. [Abstract] The abstract states an IoU improvement but supplies no numerical values, error bars, or dataset-split details; these should be added for immediate readability.
  2. [Methods] Notation for PIC parameters and the exact conversion from trajectory points to boxes should be formalized with an equation or pseudocode in the methods section.
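
A short worked example shows why the major comment singles out near single-pixel UAVs. Assume, purely for illustration, two axis-aligned square boxes of side $w$ pixels, identical except for a shift of $d$ pixels along one axis. These symbols and numbers are not from the paper.

```latex
\[
\begin{aligned}
\mathrm{IoU}(w, d) &= \frac{w\,(w - d)}{2w^{2} - w\,(w - d)} = \frac{w - d}{w + d}, \qquad 0 \le d \le w, \\
\mathrm{IoU}(2, 1) &= \tfrac{1}{3} \approx 0.33, \qquad \mathrm{IoU}(40, 1) = \tfrac{39}{41} \approx 0.95 .
\end{aligned}
\]
```

A one-pixel offset that is negligible for a 40-pixel target cuts IoU to about one third for a 2-pixel target, so small systematic errors in the seed points can dominate IoU comparisons at the smallest scales.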

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive feedback. The concern about validating the source trajectory points is well-taken, and we address it directly below with a commitment to strengthen the manuscript.

Point-by-point responses
  1. Referee: [§4] §4 (Experiments) and the quantitative IoU claim: the headline result that PIC+SAM2 outperforms existing annotation techniques rests on the unvalidated premise that the source trajectory points are sufficiently accurate and dense to serve as faithful seeds for PIC box generation. No manual ground-truth comparison, density statistics, or subset validation against independent location annotations is described; for near-single-pixel UAVs any systematic offset would render the reported IoU advantage non-diagnostic.

    Authors: The trajectory points originate from the source fixed-camera multi-view video dataset, which supplies them as part of its original construction. We agree that the current manuscript does not include explicit validation (density statistics or manual ground-truth comparison), which is a limitation for claims involving near-single-pixel objects. In the revised version we will add to §4: (i) point-density statistics stratified by UAV scale, and (ii) a validation subset in which independent manual location annotations are compared against the source points and the resulting PIC boxes. This will directly test whether systematic offsets exist and whether the reported IoU advantage remains diagnostic. revision: yes
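
The validation promised in the rebuttal could be summarized roughly as follows. This is a sketch under stated assumptions: source points and independent manual points are paired per frame, UAV scale is approximated by box side length in pixels, and the bin edges `(4, 16, 64)` are arbitrary. The function name and output layout are hypothetical.

```python
import numpy as np

def offset_stats_by_scale(src_pts, manual_pts, sizes, edges=(4, 16, 64)):
    """Per-scale-bin statistics of the pixel offset between paired points.

    Bin index 0 holds sizes below edges[0]; bin len(edges) holds sizes at or
    above edges[-1]. Empty bins are omitted from the result.
    """
    src = np.asarray(src_pts, dtype=float)
    man = np.asarray(manual_pts, dtype=float)
    sizes = np.asarray(sizes, dtype=float)
    err = np.linalg.norm(src - man, axis=1)   # Euclidean offset in pixels
    bins = np.digitize(sizes, edges)          # scale bin for each sample
    stats = {}
    for b in range(len(edges) + 1):
        sel = bins == b
        if sel.any():
            stats[b] = {
                "count": int(sel.sum()),
                "mean_offset": float(err[sel].mean()),
                "max_offset": float(err[sel].max()),
            }
    return stats

# Made-up sample: three frames with source points, manual points, and box sizes.
stats = offset_stats_by_scale(
    src_pts=[(0, 0), (10, 10), (50, 50)],
    manual_pts=[(3, 4), (10, 10), (50, 52)],
    sizes=[2, 10, 100],
)
```

A mean offset in the smallest bin that is comparable to the box size itself would indicate that IoU figures for those instances are not diagnostic.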

Circularity Check

0 steps flagged

No circularity: empirical dataset construction without derivation chain

full rationale

The paper introduces UAVDB via a point-guided pipeline (PIC for boxes from trajectories, SAM2 for masks) and reports empirical IoU benchmarks plus YOLO baselines. No equations, parameter fitting, predictions, or self-citation chains appear in the provided text. The central claims rest on the described annotation process and external comparisons rather than any reduction of outputs to inputs by construction. This matches the reader's 0.0 assessment, so reporting no circularity is the honest finding.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Dataset construction paper with no free parameters, mathematical axioms, or invented entities; relies on standard computer-vision tools (SAM2) and the existence of trajectory points in the source video.
