UAVDB: Point-Guided Masks for UAV Detection and Segmentation

Yu-Hsi Chen

arXiv: 2409.06490 · v7 · submitted 2024-09-09 · cs.CV · stat.AP

Pith reviewed 2026-05-23 20:30 UTC · model grok-4.3

keywords: UAV detection, segmentation, weak supervision, dataset, point-guided annotation, bounding boxes, multi-view video

The pith

A UAV dataset is constructed from video trajectory points via intensity convergence to produce boxes and masks without manual labeling.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces UAVDB, a benchmark dataset for UAV detection and segmentation that captures objects across extreme scale variations from fixed-camera multi-view videos. It builds the dataset through a point-guided pipeline that applies Patch Intensity Convergence to convert trajectory points into bounding boxes and then uses SAM2 to add segmentation masks. The approach seeks to solve the problem of limited scale and labeling effort in existing UAV datasets. A sympathetic reader would care because accurate automated annotation could support larger training sets for surveillance and airspace monitoring tasks.

Core claim

The paper claims that Patch Intensity Convergence combined with SAM2 achieves higher IoU than existing annotation techniques, and that this pipeline makes it possible to build UAVDB, a dataset of multi-scale UAV instances accompanied by YOLO detector baselines.

What carries the argument

Patch Intensity Convergence (PIC), a lightweight method that converts trajectory points into bounding boxes by analyzing patch intensities.
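
The summary names PIC but does not spell out its steps, so the following is only one plausible reading of "converting a trajectory point into a box by analyzing patch intensities", not the author's implementation. The function name `pic_box_sketch`, the parameters `max_r` and `tol`, and the stopping rule (grow a square patch until its outer ring matches a median background estimate) are all assumptions made for illustration.

```python
import numpy as np


def pic_box_sketch(gray, x, y, max_r=32, tol=10.0):
    """Hypothetical point-to-box rule: grow a square patch around (x, y)
    until every pixel on its outer ring is within `tol` of the background.

    Returns (x_min, y_min, x_max, y_max) as inclusive pixel indices.
    """
    gray = np.asarray(gray, dtype=np.float64)
    h, w = gray.shape

    # Assumed background model: median intensity of the largest search window.
    win = gray[max(0, y - max_r):min(h, y + max_r + 1),
               max(0, x - max_r):min(w, x + max_r + 1)]
    background = np.median(win)

    for r in range(1, max_r + 1):
        top, bottom = max(0, y - r), min(h - 1, y + r)
        left, right = max(0, x - r), min(w - 1, x + r)
        patch = gray[top:bottom + 1, left:right + 1]
        # Outer ring of the current patch (corner pixels appear twice, which is harmless).
        ring = np.concatenate([patch[0, :], patch[-1, :], patch[:, 0], patch[:, -1]])
        if np.abs(ring - background).max() < tol:
            # The ring is pure background, so the object lies strictly inside it.
            return (left + 1, top + 1, right - 1, bottom - 1)

    # No convergence within max_r: fall back to the largest patch tried.
    return (left, top, right, bottom)


# Synthetic frame: dark background with one bright 7x9 rectangle.
frame = np.zeros((64, 64))
frame[28:35, 27:36] = 200.0
demo_box = pic_box_sketch(frame, x=31, y=31)
```

Because the patch is square and centered on the point, the resulting box encloses the object but is not tight for elongated targets or off-center points; a real implementation would need to address both.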

If this is right

Large UAV datasets can be built at lower labeling cost while retaining precise spatial localization.
Segmentation masks become available alongside detection labels for multi-task model training.
YOLO-based detectors gain concrete baselines on UAVs ranging from clear objects to single-pixel instances.
Future annotation pipelines can start from trajectory points rather than full manual box drawing.
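
Training YOLO-style detectors on generated boxes requires writing them in a label format the detector accepts. The source does not state how UAVDB labels are stored, so this is a sketch that assumes the Ultralytics-style detection layout (`class x_center y_center width height`, each normalized by the image size); the function name and the example frame size are hypothetical.

```python
def to_yolo_label(box, img_w, img_h, class_id=0):
    """Format a pixel box (x_min, y_min, x_max, y_max) as one YOLO label line."""
    x0, y0, x1, y1 = box
    cx = (x0 + x1) / 2.0 / img_w   # normalized box center, x
    cy = (y0 + y1) / 2.0 / img_h   # normalized box center, y
    bw = (x1 - x0) / img_w         # normalized box width
    bh = (y1 - y0) / img_h         # normalized box height
    return f"{class_id} {cx:.6f} {cy:.6f} {bw:.6f} {bh:.6f}"


label_line = to_yolo_label((100, 50, 300, 150), img_w=1000, img_h=500)
```

For near single-pixel targets the normalized width and height become very small (a 1-pixel-wide box in a 1920-pixel-wide frame is about 0.00052), which is why the sketch keeps six decimal places.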

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same point-to-box conversion could annotate other small moving targets in fixed-camera video if trajectory data is present.
Swapping SAM2 for alternative mask generators might improve results under specific lighting or motion conditions.
UAVDB instances near single-pixel size offer a direct test for how current detectors handle the smallest detectable objects.

Load-bearing premise

Trajectory points from the source multi-view video are accurate and dense enough to generate reliable bounding boxes without additional manual correction.

What would settle it

Manually creating ground-truth boxes on a held-out subset of frames and finding that PIC-generated boxes have substantially lower IoU than the reported values would falsify the performance claim.
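
The check described above reduces to computing IoU between PIC-generated boxes and manually drawn boxes on the same frames. A minimal sketch, assuming boxes are `(x_min, y_min, x_max, y_max)` tuples in continuous pixel coordinates; the function names are illustrative and not taken from the paper.

```python
def box_iou(a, b):
    """Intersection over union of two axis-aligned boxes."""
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix1 - ix0) * max(0.0, iy1 - iy0)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def mean_iou(pred_boxes, gt_boxes):
    """Average IoU over paired boxes (one generated box per manual box)."""
    scores = [box_iou(p, g) for p, g in zip(pred_boxes, gt_boxes)]
    if not scores:
        raise ValueError("no box pairs to compare")
    return sum(scores) / len(scores)


# A one-pixel horizontal shift applied to a 2x2 box.
tiny_shift_iou = box_iou((0, 0, 2, 2), (1, 0, 3, 2))
```

The last line shows why small targets are a demanding test: shifting a 2×2 box by a single pixel already drops IoU to 1/3, so even small systematic offsets would show up clearly in a held-out comparison.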

Figures

Figures reproduced from arXiv: 2409.06490 by Yu-Hsi Chen.

**Figure 2.** Top: Comparison of bounding box outputs from multiple methods, including fixed-size, image thresholding […]

**Figure 3.** Stepwise illustration of the PIC process across datasets

**Figure 4.** Validation performance curves of YOLOv8 […]

**Figure 5.** Sequential tracking results predicted by YOLOv12n-seg […]

Original abstract

Accurate detection of Unmanned Aerial Vehicles (UAVs) is critical for surveillance, security, and airspace monitoring. However, existing datasets remain limited in scale, resolution, and the ability to capture objects across extreme size variations. To address these challenges, we present UAVDB, a benchmark dataset for UAV detection and segmentation, constructed via a point-guided weak supervision pipeline. We introduce Patch Intensity Convergence (PIC), a lightweight annotation method that converts trajectory points into bounding boxes, eliminating the need for manual labeling while preserving precise spatial localization. Building upon these annotations, we further generate segmentation masks using SAM2, enriching the dataset with multi-task labels. UAVDB consists of RGB frames from a fixed-camera multi-view video dataset, capturing UAVs across scales ranging from clearly visible objects to near single-pixel instances under diverse conditions. Quantitative results show that PIC combined with SAM2 outperforms existing annotation techniques in terms of IoU. Furthermore, we benchmark YOLO-based detectors on UAVDB, establishing baselines for future research.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

UAVDB adds a new labeled set for tiny UAVs via point-to-box PIC and SAM2 masks, but the annotation quality rests on unvalidated trajectory points and the results section gives no numbers or protocol.

The core thing here is a new dataset UAVDB drawn from fixed-camera multi-view video, with UAVs ranging down to near-single-pixel size. They convert trajectory points into boxes using their Patch Intensity Convergence method, then feed those to SAM2 for masks, and run YOLO baselines on the result. That pipeline and the resulting multi-task labels are the actual new pieces; prior UAV datasets are cited as smaller or less varied in scale.

Referee Report

1 major / 2 minor

Summary. The paper presents UAVDB, a benchmark dataset for UAV detection and segmentation constructed from fixed-camera multi-view video via a point-guided weak supervision pipeline. It introduces Patch Intensity Convergence (PIC) to convert trajectory points into bounding boxes without manual labeling, then applies SAM2 to generate segmentation masks. The central claims are that PIC+SAM2 outperforms prior annotation techniques on IoU and that YOLO-based detectors provide useful baselines on this dataset spanning extreme scale variations including near-single-pixel UAVs.

Significance. If the IoU gains are reproducible and the trajectory seeds prove reliable, the work supplies a practical, scalable annotation route for small-object UAV data that existing manual or fully supervised pipelines struggle to produce at volume. The resulting multi-task labels and YOLO baselines could serve as a reference point for future surveillance and airspace-monitoring research.

major comments (1)

[§4] §4 (Experiments) and the quantitative IoU claim: the headline result that PIC+SAM2 outperforms existing annotation techniques rests on the unvalidated premise that the source trajectory points are sufficiently accurate and dense to serve as faithful seeds for PIC box generation. No manual ground-truth comparison, density statistics, or subset validation against independent location annotations is described; for near-single-pixel UAVs any systematic offset would render the reported IoU advantage non-diagnostic.

minor comments (2)

[Abstract] The abstract states an IoU improvement but supplies no numerical values, error bars, or dataset-split details; these should be added for immediate readability.
[Methods] Notation for PIC parameters and the exact conversion from trajectory points to boxes should be formalized with an equation or pseudocode in the methods section.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive feedback. The concern about validating the source trajectory points is well-taken, and we address it directly below with a commitment to strengthen the manuscript.

Referee: [§4] §4 (Experiments) and the quantitative IoU claim: the headline result that PIC+SAM2 outperforms existing annotation techniques rests on the unvalidated premise that the source trajectory points are sufficiently accurate and dense to serve as faithful seeds for PIC box generation. No manual ground-truth comparison, density statistics, or subset validation against independent location annotations is described; for near-single-pixel UAVs any systematic offset would render the reported IoU advantage non-diagnostic.

Authors: The trajectory points originate from the source fixed-camera multi-view video dataset, which supplies them as part of its original construction. We agree that the current manuscript does not include explicit validation (density statistics or manual ground-truth comparison), which is a limitation for claims involving near-single-pixel objects. In the revised version we will add to §4: (i) point-density statistics stratified by UAV scale, and (ii) a validation subset in which independent manual location annotations are compared against the source points and the resulting PIC boxes. This will directly test whether systematic offsets exist and whether the reported IoU advantage remains diagnostic.

Revision: yes
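
The offset check promised in the rebuttal could be prototyped roughly as follows. This is a sketch under stated assumptions: source points and manual points are paired per frame, UAV scale is summarized by one box side length in pixels, and the bin edges `(0, 8, 32, inf)` are arbitrary placeholders rather than values from the paper.

```python
import numpy as np


def offset_by_scale(source_pts, manual_pts, box_sizes, bin_edges=(0, 8, 32, np.inf)):
    """Summarize point offsets (source minus manual) per UAV-size bin."""
    source_pts = np.asarray(source_pts, dtype=float)
    manual_pts = np.asarray(manual_pts, dtype=float)
    box_sizes = np.asarray(box_sizes, dtype=float)

    diffs = source_pts - manual_pts          # signed (dx, dy) per frame
    dist = np.linalg.norm(diffs, axis=1)     # Euclidean offset in pixels

    stats = {}
    for lo, hi in zip(bin_edges[:-1], bin_edges[1:]):
        mask = (box_sizes >= lo) & (box_sizes < hi)
        if mask.any():
            stats[(lo, hi)] = {
                "count": int(mask.sum()),
                "mean_offset_px": float(dist[mask].mean()),
                # A mean shift far from (0, 0) suggests a systematic bias.
                "mean_shift_xy": tuple(float(v) for v in diffs[mask].mean(axis=0)),
            }
    return stats


report = offset_by_scale(
    source_pts=[(10, 10), (20, 20), (100, 100)],
    manual_pts=[(11, 10), (21, 20), (100, 104)],
    box_sizes=[4, 6, 40],
)
```

Reporting the signed mean shift alongside the mean distance separates random jitter, which averages toward zero, from a consistent displacement, which is the case the referee flags as most damaging for near-single-pixel targets.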

Circularity Check

0 steps flagged

No circularity: empirical dataset construction without derivation chain

The paper introduces UAVDB via a point-guided pipeline (PIC for boxes from trajectories, SAM2 for masks) and reports empirical IoU benchmarks plus YOLO baselines. No equations, parameter fitting, predictions, or self-citation chains appear in the provided text. The central claims rest on the described annotation process and external comparisons rather than any reduction of outputs to inputs by construction. This matches the reader's 0.0 assessment; honest non-finding applies.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Dataset construction paper with no free parameters, mathematical axioms, or invented entities; relies on standard computer-vision tools (SAM2) and the existence of trajectory points in the source video.

pith-pipeline@v0.9.0 · 5696 in / 1125 out tokens · 21552 ms · 2026-05-23T20:30:34.351300+00:00 · methodology

Reference graph

Works this paper leans on

87 extracted references · 87 canonical work pages · 4 internal anchors

[1]

Bounding box priors for cell detection with point anno- tations

Hari Om Aggrawal, Dipam Goswami, and Vinti Agarwal. Bounding box priors for cell detection with point anno- tations. In 2023 IEEE 20th International Symposium on Biomedical Imaging (ISBI), pages 1–4. IEEE, 2023. 3

work page 2023
[2]

Drone dataset: Amateur un- manned air vehicle detection

Mehmet C ¸ agrı Aksoy, Alp Sezer Orak, Hasan Mertcan ¨Ozkan, and Bilgin Selimoglu. Drone dataset: Amateur un- manned air vehicle detection. Mendeley Data, 4:2019, 2019. 2

work page 2019
[3]

Image Segmentation by Using Threshold Techniques

Salem Saleh Al-Amri, Namdeo V Kalyankar, et al. Image segmentation by using threshold techniques. arXiv preprint arXiv:1005.4020, 2010. 3, 4, 5

work page internal anchor Pith review Pith/arXiv arXiv 2010
[4]

mta dataset

aydin. mta dataset. https://universe.roboflow. com/aydin/mta-rwowu , 2024. visited on 2025-07-16. 2

work page 2024
[5]

Okutama-action: An aerial view video dataset for concurrent human action detection

Mohammadamin Barekatain, Miquel Mart ´ı, Hsueh-Fu Shih, Samuel Murray, Kotaro Nakayama, Yutaka Matsuo, and Hel- mut Prendinger. Okutama-action: An aerial view video dataset for concurrent human action detection. In Proceed- ings of the IEEE conference on computer vision and pattern recognition workshops, pages 28–35, 2017. 1

work page 2017
[6]

Au-air: A multi-modal un- manned aerial vehicle dataset for low altitude traffic surveil- lance

Ilker Bozcan and Erdal Kayacan. Au-air: A multi-modal un- manned aerial vehicle dataset for low altitude traffic surveil- lance. In 2020 IEEE International Conference on Robotics and Automation (ICRA), pages 8504–8510. IEEE, 2020. 1

work page 2020
[7]

Leveraging point annotations in segmentation learning with boundary loss

Eva Breznik, Hoel Kervadec, Filip Malmberg, Joel Kullberg, H˚akan Ahlstr ¨om, Marleen de Bruijne, and Robin Strand. Leveraging point annotations in segmentation learning with boundary loss. In International Conference on Pattern Recognition, pages 194–210. Springer, 2024. 3

work page 2024
[8]

End-to- end object detection with transformers

Nicolas Carion, Francisco Massa, Gabriel Synnaeve, Nicolas Usunier, Alexander Kirillov, and Sergey Zagoruyko. End-to- end object detection with transformers. In European confer- ence on computer vision, pages 213–229. Springer, 2020. 1

work page 2020
[9]

Points as queries: Weakly semi-supervised object 7 Figure 5

Liangyu Chen, Tong Yang, Xiangyu Zhang, Wei Zhang, and Jian Sun. Points as queries: Weakly semi-supervised object 7 Figure 5. Sequential tracking results predicted by YOLOv12n-seg [60] on the entirely unseen Dataset 5. Top: Camera 3. Bottom: Camera

work page
[10]

detection by points

Left to right shows consecutive video frames. detection by points. In Proceedings of the IEEE/CVF con- ference on computer vision and pattern recognition , pages 8823–8832, 2021. 3

work page 2021
[11]

P2object: Single point supervised object detection and in- stance segmentation

Pengfei Chen, Xuehui Yu, Xumeng Han, Kuiran Wang, Guorong Li, Lingxi Xie, Zhenjun Han, and Jianbin Jiao. P2object: Single point supervised object detection and in- stance segmentation. International Journal of Computer Vi- sion, pages 1–25, 2025. 3

work page 2025
[12]

Drone dataset

ConcordiaNA VLab. Drone dataset. https : / / universe . roboflow . com / concordianavlab / drone-9ab2n, 2023. visited on 2025-07-16. 2

work page 2023
[13]

Weakly semi-supervised infrared small target de- tection guided by point labels

Xiaolong Cui, Xingxiu Li, Panlong Wu, Shan He, and Ruo- han Zhao. Weakly semi-supervised infrared small target de- tection guided by point labels. IEEE Transactions on Geo- science and Remote Sensing, 2025. 3

work page 2025
[14]

At- tentional Local Contrast Networks for Infrared Small Target Detection

Yimian Dai, Yiquan Wu, Fei Zhou, and Kobus Barnard. At- tentional Local Contrast Networks for Infrared Small Target Detection. IEEE Transactions on Geoscience and Remote Sensing, pages 1–12, 2021. 2

work page 2021
[15]

Asymmetric contextual modulation for infrared small target detection

Yimian Dai, Yiquan Wu, Fei Zhou, and Kobus Barnard. Asymmetric contextual modulation for infrared small target detection. In IEEE Winter Conference on Applications of Computer Vision, WACV 2021, 2021

work page 2021
[16]

One-Stage Cascade Refinement Networks for Infrared Small Target Detection

Yimian Dai, Xiang Li, Fei Zhou, Yulei Qian, Yaohong Chen, and Jian Yang. One-Stage Cascade Refinement Networks for Infrared Small Target Detection. IEEE Transactions on Geoscience and Remote Sensing, pages 1–17, 2023. 2

work page 2023
[17]

Object detection in aerial im- ages: A large-scale benchmark and challenges

Jian Ding, Nan Xue, Gui-Song Xia, Xiang Bai, Wen Yang, Michael Ying Yang, Serge Belongie, Jiebo Luo, Mihai Datcu, Marcello Pelillo, et al. Object detection in aerial im- ages: A large-scale benchmark and challenges. IEEE trans- actions on pattern analysis and machine intelligence, 44(11): 7778–7796, 2021. 1

work page 2021
[18]

Drone dataset

Drone. Drone dataset. https : / / universe . roboflow . com / drone - blb9h / drone - evttd ,

work page
[20]

The unmanned aerial vehicle benchmark: Object detection and tracking

Dawei Du, Yuankai Qi, Hongyang Yu, Yifan Yang, Kaiwen Duan, Guorong Li, Weigang Zhang, Qingming Huang, and Qi Tian. The unmanned aerial vehicle benchmark: Object detection and tracking. In Proceedings of the European con- ference on computer vision (ECCV) , pages 370–386, 2018. 1

work page 2018
[21]

Caniget- theuploadactuallyworking dataset

flippinggreatwodgesofdroneimages1. Caniget- theuploadactuallyworking dataset. https : / / universe . roboflow . com / flippinggreatwodgesofdroneimages1 / canigettheuploadactuallyworking, 2022. visited on 2025-07-16. 2

work page 2022
[22]

Leveraging imagery data with spatial point prior for weakly semi-supervised 3d object detection

Hongzhi Gao, Zheng Chen, Zehui Chen, Lin Chen, Jiaming Liu, Shanghang Zhang, and Feng Zhao. Leveraging imagery data with spatial point prior for weakly semi-supervised 3d object detection. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 1797–1805, 2024. 3

work page 2024
[23]

Point-teaching: weakly semi- supervised object detection with point annotations

Yongtao Ge, Qiang Zhou, Xinlong Wang, Chunhua Shen, Zhibin Wang, and Hao Li. Point-teaching: weakly semi- supervised object detection with point annotations. In Pro- ceedings of the AAAI Conference on Artificial Intelligence , pages 667–675, 2023. 3

work page 2023
[24]

mobile net dataset

Ganta Gourish. mobile net dataset. https://universe. roboflow . com / ganta - gourish / mobile - net,

work page
[25]

visited on 2025-07-16. 2

work page 2025
[26]

Yolomg: Vision- based drone-to-drone detection with appearance and pixel- level motion fusion

Hanqing Guo, Xiuxiu Lin, and Shiyu Zhao. Yolomg: Vision- based drone-to-drone detection with appearance and pixel- level motion fusion. arXiv preprint arXiv:2503.07115, 2025. 2

work page arXiv 2025
[27]

Drone- based object counting by spatially regularized regional pro- posal network

Meng-Ru Hsieh, Yen-Liang Lin, and Winston H Hsu. Drone- based object counting by spatially regularized regional pro- posal network. In Proceedings of the IEEE international conference on computer vision, pages 4145–4153, 2017. 1

work page 2017
[28]

Anti-uav410: A thermal infrared benchmark and customized scheme for tracking drones in the wild

Bo Huang, Jianan Li, Junjie Chen, Gang Wang, Jian Zhao, and Tingfa Xu. Anti-uav410: A thermal infrared benchmark and customized scheme for tracking drones in the wild. T- PAMI, 2023. 2

work page 2023
[29]

Anti-uav: a large-scale benchmark for vision-based uav tracking

Nan Jiang, Kuiran Wang, Xiaoke Peng, Xuehui Yu, Qiang Wang, Junliang Xing, Guorong Li, Qixiang Ye, Jianbin Jiao, Zhenjun Han, et al. Anti-uav: a large-scale benchmark for vision-based uav tracking. T-MM, 2021. 2

work page 2021
[30]

Ultralytics yolo11, 2024

Glenn Jocher and Jing Qiu. Ultralytics yolo11, 2024. 1, 2, 5, 6, 7 8

work page 2024
[31]

YOLO by Ultralytics, 2023

Glenn Jocher, Ayush Chaurasia, and Jing Qiu. YOLO by Ultralytics, 2023. 1, 2, 5, 6, 7

work page 2023
[32]

dron3 dataset

Aniket Jog. dron3 dataset. https : / / universe . roboflow.com/aniket-jog-0whc0/dron3 , 2023. visited on 2025-07-16. 2

work page 2023
[33]

Dronesurf: Benchmark dataset for drone-based face recognition

Isha Kalra, Maneet Singh, Shruti Nagpal, Richa Singh, Mayank Vatsa, and PB Sujit. Dronesurf: Benchmark dataset for drone-based face recognition. In2019 14th IEEE Interna- tional Conference on Automatic Face & Gesture Recognition (FG 2019), pages 1–7. IEEE, 2019. 1

work page 2019
[34]

Sky monitoring system for flying object detection us- ing 4k resolution camera

Takehiro Kashiyama, Hideaki Sobue, and Yoshihide Seki- moto. Sky monitoring system for flying object detection us- ing 4k resolution camera. Sensors, 20(24):7071, 2020. 2

work page 2020
[35]

The devil is in the points: Weakly semi- supervised instance segmentation via point-guided mask rep- resentation

Beomyoung Kim, Joonhyun Jeong, Dongyoon Han, and Sung Ju Hwang. The devil is in the points: Weakly semi- supervised instance segmentation via point-guided mask rep- resentation. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition , pages 11360– 11370, 2023. 3

work page 2023
[36]

Segment any- thing

Alexander Kirillov, Eric Mintun, Nikhila Ravi, Hanzi Mao, Chloe Rolland, Laura Gustafson, Tete Xiao, Spencer White- head, Alexander C Berg, Wan-Yen Lo, et al. Segment any- thing. In Proceedings of the IEEE/CVF International Con- ference on Computer Vision, pages 4015–4026, 2023. 3, 4, 5

work page 2023
[37]

Yolov13: Real-time object detection with hypergraph-enhanced adaptive visual perception

Mengqi Lei, Siqi Li, Yihong Wu, Han Hu, You Zhou, Xinhu Zheng, Guiguang Ding, Shaoyi Du, Zongze Wu, and Yue Gao. Yolov13: Real-time object detection with hypergraph-enhanced adaptive visual perception. arXiv preprint arXiv:2506.17733, 2025. 1, 2, 5, 6, 7

work page arXiv 2025
[38]

Monte carlo linear clustering with single-point supervision is enough for infrared small target detection

Boyang Li, Yingqian Wang, Longguang Wang, Fei Zhang, Ting Liu, Zaiping Lin, Wei An, and Yulan Guo. Monte carlo linear clustering with single-point supervision is enough for infrared small target detection. In Proceedings of the IEEE/CVF international conference on computer vision , pages 1009–1019, 2023. 3

work page 2023
[39]

A level set annotation framework with single-point supervision for infrared small target detection

Haoqing Li, Jinfu Yang, Yifei Xu, and Runshi Wang. A level set annotation framework with single-point supervision for infrared small target detection. IEEE Signal Processing Let- ters, 31:451–455, 2024. 3

work page 2024
[40]

Multi-target detection and tracking from a single camera in unmanned aerial ve- hicles (uavs)

Jing Li, Dong Hye Ye, Timothy Chung, Mathias Kolsch, Juan Wachs, and Charles Bouman. Multi-target detection and tracking from a single camera in unmanned aerial ve- hicles (uavs). In 2016 IEEE/RSJ international conference on intelligent robots and systems (IROS), pages 4992–4997. IEEE, 2016. 2

work page 2016
[41]

Reconstruction of 3d flight trajectories from ad-hoc camera networks

Jingtong Li, Jesse Murray, Dorina Ismaili, Konrad Schindler, and Cenek Albl. Reconstruction of 3d flight trajectories from ad-hoc camera networks. In 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 1621–1628. IEEE, 2020. 1, 2, 3, 4, 7

work page 2020
[42]

Visual object tracking for un- manned aerial vehicles: A benchmark and new motion mod- els

Siyi Li and Dit-Yan Yeung. Visual object tracking for un- manned aerial vehicles: A benchmark and new motion mod- els. In Proceedings of the AAAI conference on artificial in- telligence, 2017. 1

work page 2017
[43]

Weakly semi- supervised object detection with point annotations in reti- nal oct images

Xiaoming Liu, Xin Zhu, and Jinshan Tang. Weakly semi- supervised object detection with point annotations in reti- nal oct images. In 2023 IEEE International Conference on Systems, Man, and Cybernetics (SMC) , pages 3991–3995. IEEE, 2023. 3

work page 2023
[44]

Pointobb: Learning oriented object de- tection via single point supervision

Junwei Luo, Xue Yang, Yi Yu, Qingyun Li, Junchi Yan, and Yansheng Li. Pointobb: Learning oriented object de- tection via single point supervision. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 16730–16740, 2024. 3

work page 2024
[45]

Mor-uav: A benchmark dataset and baselines for moving object recognition in uav videos

Murari Mandal, Lav Kush Kumar, and Santosh Kumar Vip- parthi. Mor-uav: A benchmark dataset and baselines for moving object recognition in uav videos. In Proceedings of the 28th ACM international conference on multimedia, pages 2626–2635, 2020. 1

work page 2020
[46]

Polo–point-based, multi-class animal detec- tion

Giacomo May, Emanuele Dalsasso, Benjamin Kellenberger, and Devis Tuia. Polo–point-based, multi-class animal detec- tion. In European Conference on Computer Vision , pages 169–177. Springer, 2024. 3

work page 2024
[47]

Spartan hpc-cloud hybrid: delivering performance and flexibility

Bernard Meade, Lev Lafayette, Greg Sauter, and Daniel Tosello. Spartan hpc-cloud hybrid: delivering performance and flexibility. University of Melbourne, 10:49, 2017. 5

work page 2017
[48]

The common objects underwater (cou) dataset for robust underwater object detection

Rishi Mukherjee, Sakshi Singh, Jack McWilliams, and Ju- naed Sattar. The common objects underwater (cou) dataset for robust underwater object detection. arXiv preprint arXiv:2502.20651, 2025. 5

work page arXiv 2025
[49]

A large contextual dataset for classifica- tion, detection and counting of cars with deep learning

T Nathan Mundhenk, Goran Konjevod, Wesam A Sakla, and Kofi Boakye. A large contextual dataset for classifica- tion, detection and counting of cars with deep learning. In European conference on computer vision , pages 785–800. Springer, 2016. 1

work page 2016
[50]

Real world object detection dataset for quadcopter unmanned aerial vehicle de- tection

Maciej Pawełczyk and Marek Wojtyra. Real world object detection dataset for quadcopter unmanned aerial vehicle de- tection. IEEE Access, 8:174394–174409, 2020. 2

work page 2020
[51]

SAM 2: Segment Anything in Images and Videos

Nikhila Ravi, Valentin Gabeur, Yuan-Ting Hu, Ronghang Hu, Chaitanya Ryali, Tengyu Ma, Haitham Khedr, Roman R¨adle, Chloe Rolland, Laura Gustafson, et al. Sam 2: Segment anything in images and videos. arXiv preprint arXiv:2408.00714, 2024. 2, 3, 4, 5

work page internal anchor Pith review Pith/arXiv arXiv 2024
[52]

Vehicle detec- tion in aerial imagery: A small target detection benchmark

Sebastien Razakarivony and Frederic Jurie. Vehicle detec- tion in aerial imagery: A small target detection benchmark. Journal of Visual Communication and Image Representation, 34:187–203, 2016. 1

work page 2016
[53]

Airborne object tracking dataset,

AWS Open Data Registry. Airborne object tracking dataset,

work page
[54]

19, 2025

Accessed: Feb. 19, 2025. 2

work page 2025
[55]

Learning social etiquette: Human tra- jectory understanding in crowded scenes

Alexandre Robicquet, Amir Sadeghian, Alexandre Alahi, and Silvio Savarese. Learning social etiquette: Human tra- jectory understanding in crowded scenes. In European con- ference on computer vision, pages 549–565. Springer, 2016. 1

work page 2016
[56]

Rf- detr

Isaac Robinson, Peter Robicheaux, and Matvei Popov. Rf- detr. https://github.com/roboflow/rf-detr ,

work page
[57]

SOTA Real-Time Object Detection Model. 1

work page
[58]

” grabcut” interactive foreground extraction using iterated graph cuts

Carsten Rother, Vladimir Kolmogorov, and Andrew Blake. ” grabcut” interactive foreground extraction using iterated graph cuts. ACM transactions on graphics (TOG) , 23(3): 309–314, 2004. 3, 4, 5

work page 2004
[59]

Flying objects detection from a single moving camera

Artem Rozantsev, Vincent Lepetit, and Pascal Fua. Flying objects detection from a single moving camera. In Proceed- ings of the IEEE conference on computer vision and pattern recognition, pages 4128–4136, 2015. 2 9

work page 2015
[60]

Segmentationdrones dataset

SegmentDrones. Segmentationdrones dataset. https: / / universe . roboflow . com / segmentdrones / segmentationdrones, 2023. visited on 2025-07-16. 2

work page 2023
[61]

The aircraft context dataset: Understanding and optimizing data variability in aerial domains

Daniel Steininger, Verena Widhalm, Julia Simon, Andreas Kriegler, and Christoph Sulzbachner. The aircraft context dataset: Understanding and optimizing data variability in aerial domains. In Proceedings of the IEEE/CVF Interna- tional Conference on Computer Vision , pages 3823–3832,

work page
[62]

Efficientdet: Scalable and efficient object detection

Mingxing Tan, Ruoming Pang, and Quoc V Le. Efficientdet: Scalable and efficient object detection. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 10781–10790, 2020. 1

work page 2020
[63]

Point-based weakly semi- supervised oriented vehicle detection in optical remote sens- ing images

Ziqian Tan and Chen Wu. Point-based weakly semi- supervised oriented vehicle detection in optical remote sens- ing images. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 2024. 3

work page 2024
[64]

Weakly semi-supervised ori- ented with points for remote sensing vehicle detection

Ziqian Tan and Chen Wu. Weakly semi-supervised ori- ented with points for remote sensing vehicle detection. In IGARSS 2024-2024 IEEE International Geoscience and Re- mote Sensing Symposium, pages 9294–9297. IEEE, 2024. 3

work page 2024
[65]

Yolov12: Attention-centric real-time object detectors, 2025

Yunjie Tian, Qixiang Ye, and David Doermann. Yolov12: Attention-centric real-time object detectors, 2025. 1, 2, 5, 6, 7, 8

work page 2025
[66]

Utilizing class-agnostic point-to-box regressors as object proposal generators

Gulin Tufekci Dogan, Ramazan Gokberk Cinbis, and Ilkay Ulusoy. Utilizing class-agnostic point-to-box regressors as object proposal generators. In European Conference on Computer Vision, pages 253–269. Springer, 2024. 3

work page 2024
[67]

Yolov10: Real-time end-to-end object detection,

Ao Wang, Hui Chen, Lihao Liu, Kai Chen, Zijia Lin, Jun- gong Han, and Guiguang Ding. Yolov10: Real-time end- to-end object detection. arXiv preprint arXiv:2405.14458 ,

work page arXiv
[68]

Yolov9: Learning what you want to learn using programmable gradient information,

Chien-Yao Wang, I-Hau Yeh, and Hong-Yuan Mark Liao. Yolov9: Learning what you want to learn us- ing programmable gradient information. arXiv preprint arXiv:2402.13616, 2024. 1, 2, 5, 6, 7

work page arXiv 2024
[69]

Tiny object detection in aerial images

Jinwang Wang, Wen Yang, Haowen Guo, Ruixiang Zhang, and Gui-Song Xia. Tiny object detection in aerial images. In 2020 25th international conference on pattern recognition (ICPR), pages 3791–3798. IEEE, 2021. 1

work page 2020
[70]

Point-to-rbox net- work for oriented object detection via single point supervi- sion

Yucheng Wang, Chu He, and Xi Chen. Point-to-rbox net- work for oriented object detection via single point supervi- sion. In BMVC, pages 323–325, 2023. 3

work page 2023
[71]

Bcr-net: Boundary-category refinement network for weakly semi-supervised x-ray prohibited item detection with points

Sanjoeng Wong. Bcr-net: Boundary-category refinement network for weakly semi-supervised x-ray prohibited item detection with points. arXiv preprint arXiv:2412.18918 ,

work page arXiv
[72]

Air-detect dataset

WorkspaceTest1. Air-detect dataset. https : //universe.roboflow.com/workspacetest1- t9dog/air-detect, 2025. visited on 2025-07-16. 2

work page 2025
[73]

Uavd4l: A large-scale dataset for uav 6-dof localization

Rouwan Wu, Xiaoya Cheng, Juelin Zhu, Xuxiang Liu, Mao- jun Zhang, and Shen Yan. Uavd4l: A large-scale dataset for uav 6-dof localization. arXiv preprint arXiv:2401.05971,

work page arXiv
[74]

Dota: A large-scale dataset for object detection in aerial images

Gui-Song Xia, Xiang Bai, Jian Ding, Zhen Zhu, Serge Be- longie, Jiebo Luo, Mihai Datcu, Marcello Pelillo, and Liang- pei Zhang. Dota: A large-scale dataset for object detection in aerial images. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 3974–3983, 2018

work page 2018
[75]

Detecting tiny objects in aerial images: A normalized wasserstein distance and a new benchmark

Chang Xu, Jinwang Wang, Wen Yang, Huai Yu, Lei Yu, and Gui-Song Xia. Detecting tiny objects in aerial images: A normalized wasserstein distance and a new benchmark. IS- PRS Journal of Photogrammetry and Remote Sensing , 190: 79–93, 2022. 1

work page 2022
[76]

Deep GrabCut for Object Selection

Ning Xu, Brian Price, Scott Cohen, Jimei Yang, and Thomas Huang. Deep grabcut for object selection. arXiv preprint arXiv:1707.00243, 2017. 3

work page internal anchor Pith review Pith/arXiv arXiv 2017
[77]

Position-based anchor opti- mization for point supervised dense nuclei detection

Jieru Yao, Longfei Han, Guangyu Guo, Zhaohui Zheng, Runmin Cong, Xiankai Huang, Jin Ding, Kaihui Yang, Ding- wen Zhang, and Junwei Han. Position-based anchor opti- mization for point supervised dense nuclei detection. Neural Networks, 171:159–170, 2024. 3

[78]

Mapping degeneration meets label evolution: Learning infrared small target detection with single point supervision

Xinyi Ying, Li Liu, Yingqian Wang, Ruojing Li, Nuo Chen, Zaiping Lin, Weidong Sheng, and Shilin Zhou. Mapping degeneration meets label evolution: Learning infrared small target detection with single point supervision. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 15528–15538, 2023. 3

[79]

Point2rbox: Combine knowledge from synthetic visual patterns for end-to-end oriented object detection with single point supervision

Yi Yu, Xue Yang, Qingyun Li, Feipeng Da, Jifeng Dai, Yu Qiao, and Junchi Yan. Point2rbox: Combine knowledge from synthetic visual patterns for end-to-end oriented object detection with single point supervision. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 16783–16793, 2024

[80]

Group r-cnn for weakly semi-supervised object detection with points

Shilong Zhang, Zhuoran Yu, Liyang Liu, Xinjiang Wang, Aojun Zhou, and Kai Chen. Group r-cnn for weakly semi-supervised object detection with points. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 9417–9426, 2022

[81]

Weakly semi-supervised oriented object detection with points

Ziming Zhang, Yucheng Wang, Chu He, Qingyi Zhang, and Xi Chen. Weakly semi-supervised oriented object detection with points. In 2023 IEEE International Conference on Image Processing (ICIP), pages 3080–3084. IEEE, 2023. 3


