Recognition: no theorem link
Generalized Small Object Detection:A Point-Prompted Paradigm and Benchmark
Pith reviewed 2026-05-13 20:07 UTC · model grok-4.3
The pith
A single point prompt at inference time lets a detector raise small-object localization accuracy by 31 percent over fully supervised baselines.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Sparse point prompts supplied at inference time act as an efficient information bridge that augments the weak semantic representations of small objects; the DEAL framework trained on TinySet-9M learns prompt-conditioned features that deliver a 31.4 percent relative improvement in AP75 over fully supervised baselines on that dataset and generalize to unseen categories and unseen datasets.
What carries the argument
The Point-Prompt Small Object Detection (P2SOD) paradigm, in which a single user-supplied point at inference time supplies category-level semantic information to condition the detector.
If this is right
- A single inference-time click produces a 31.4 percent relative AP75 gain over fully supervised training on TinySet-9M.
- The same model transfers to categories absent from its training set.
- The same model transfers to entirely new datasets without retraining.
- Prompt-conditioned representations learned from large-scale data overcome the performance drop that label-efficient methods suffer on small objects.
Where Pith is reading between the lines
- Interactive single-point guidance may reduce the annotation burden for large-scale small-object tasks in domains such as aerial imagery or medical scans.
- Existing label-efficient detectors may require explicit inference-time conditioning rather than training-time feature enhancement when objects occupy very few pixels.
- The benchmark on TinySet-9M provides a concrete test bed for measuring how much semantic information a point must carry to compensate for missing visual detail.
Load-bearing premise
A single point prompt placed by a user will reliably convey enough category identity to overcome the weak visual cues of small objects without introducing placement-dependent bias.
What would settle it
Run DEAL on TinySet-9M with the same point locations replaced by random coordinates inside each image; if the AP75 gain disappears or reverses, the claim that the prompt supplies useful semantic guidance is falsified.
Figures
read the original abstract
Small object detection (SOD) remains challenging due to extremely limited pixels and ambiguous object boundaries. These characteristics lead to challenging annotation, limited availability of large-scale high-quality datasets, and inherently weak semantic representations for small objects. In this work, we first address the data limitation by introducing TinySet-9M, the first large-scale, multi-domain dataset for small object detection. Beyond filling the gap in large-scale datasets, we establish a benchmark to evaluate the effectiveness of existing label-efficient detection methods for small objects. Our evaluation reveals that weak visual cues further exacerbate the performance degradation of label-efficient methods in small object detection, highlighting a critical challenge in label-efficient SOD. Secondly, to tackle the limitation of insufficient semantic representation, we move beyond training-time feature enhancement and propose a new paradigm termed Point-Prompt Small Object Detection (P2SOD). This paradigm introduces sparse point prompts at inference time as an efficient information bridge for category-level localization, enabling semantic augmentation. Building upon the P2SOD paradigm and the large-scale TinySet-9M dataset, we further develop DEAL (DEtect Any smalL object), a scalable and transferable point-prompted detection framework that learns robust, prompt-conditioned representations from large-scale data. With only a single click at inference time, DEAL achieves a 31.4% relative improvement over fully supervised baselines under strict localization metrics (e.g., AP75) on TinySet-9M, while generalizing effectively to unseen categories and unseen datasets. Our project is available at https://zhuhaoraneis.github.io/TinySet-9M/.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces TinySet-9M, the first large-scale multi-domain dataset for small object detection, along with a benchmark evaluating label-efficient methods on small objects. It identifies weak visual cues as a core limitation and proposes the Point-Prompt Small Object Detection (P2SOD) paradigm, which supplies sparse point prompts at inference time to enable category-level semantic augmentation. Building on this, the authors present DEAL, a scalable point-prompted framework trained on the new dataset that achieves a 31.4% relative AP75 improvement over fully supervised baselines on TinySet-9M while generalizing to unseen categories and datasets.
Significance. If the quantitative claims hold under realistic conditions, the work is significant for computer vision research on small object detection. The 9M-scale dataset and benchmark directly address data scarcity and provide a standardized testbed for label-efficient approaches. The P2SOD paradigm offers a practical inference-time mechanism that could lower annotation costs in domains with tiny objects. The reported cross-dataset generalization and project release (code and data) are positive contributions that support reproducibility and further research.
major comments (2)
- [§4] §4 (Experiments on TinySet-9M) and abstract: The 31.4% relative AP75 gain is obtained with point prompts supplied at inference. No ablation or sensitivity analysis is reported for placement offsets of even 2-3 pixels, which for objects spanning only a few pixels would place the prompt outside the bounding box and break the claimed semantic bridge, reducing performance to the weak visual cues identified in the introduction as the central difficulty.
- [§3.2] §3.2 (P2SOD Paradigm): The construction treats the single point as reliably supplying category-level semantic information sufficient to overcome ambiguous boundaries. The manuscript provides no evidence or controlled test that the learned prompt-conditioned representations remain effective when the point misses the object, which is a load-bearing assumption for the single-click usability claim.
minor comments (2)
- [Abstract] Abstract: The claim of 'effective generalization to unseen categories and unseen datasets' is stated without naming the target datasets, metrics, or quantitative deltas; these details should be added for clarity.
- [§4] Figure captions and §4: Several result tables and plots lack error bars, standard deviations across runs, or statistical significance tests, making it difficult to assess the stability of the reported relative gains.
Simulated Author's Rebuttal
We appreciate the referee's insightful comments on our work introducing TinySet-9M and the DEAL framework for point-prompted small object detection. We provide point-by-point responses to the major comments below.
read point-by-point responses
-
Referee: [§4] §4 (Experiments on TinySet-9M) and abstract: The 31.4% relative AP75 gain is obtained with point prompts supplied at inference. No ablation or sensitivity analysis is reported for placement offsets of even 2-3 pixels, which for objects spanning only a few pixels would place the prompt outside the bounding box and break the claimed semantic bridge, reducing performance to the weak visual cues identified in the introduction as the central difficulty.
Authors: We agree that evaluating the sensitivity to point placement offsets is crucial for validating the practical applicability of the P2SOD paradigm, especially given the small size of the objects. The reported results assume points are placed accurately within the object boundaries, consistent with the one-click inference setting. To strengthen the manuscript, we will conduct and report an ablation study simulating random offsets of 1 to 5 pixels around the object center, measuring the impact on AP75. This analysis will be added to Section 4. revision: yes
-
Referee: [§3.2] §3.2 (P2SOD Paradigm): The construction treats the single point as reliably supplying category-level semantic information sufficient to overcome ambiguous boundaries. The manuscript provides no evidence or controlled test that the learned prompt-conditioned representations remain effective when the point misses the object, which is a load-bearing assumption for the single-click usability claim.
Authors: The P2SOD paradigm is designed with the assumption that the user-provided point lies on the target object to supply the necessary semantic cue. We acknowledge the lack of explicit testing for off-object points in the current version. In the revision, we will add experiments where points are deliberately placed outside the bounding boxes (e.g., in background regions) to assess the model's behavior and performance drop. This will help clarify the robustness and limitations of the single-click approach, and we will discuss mitigation strategies if needed. revision: yes
Circularity Check
No circularity: empirical claims rest on new benchmark and external generalization tests
full rationale
The paper introduces TinySet-9M and the P2SOD/DEAL framework as an empirical solution to small-object detection. Reported gains (e.g., 31.4% relative AP75 improvement) are measured on the newly collected dataset after training, with explicit additional evaluation on unseen categories and unseen external datasets. No equations, self-definitional constructions, fitted-input-as-prediction steps, or load-bearing self-citations appear in the provided text that would reduce the central performance claims to the inputs by construction. The derivation chain is therefore self-contained experimental validation rather than circular.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption Sparse point prompts at inference time provide reliable category-level localization cues for small objects
Forward citations
Cited by 1 Pith paper
-
UHR-DETR: Efficient End-to-End Small Object Detection for Ultra-High-Resolution Remote Sensing Imagery
UHR-DETR delivers 2.8% higher mAP and 10x faster inference than sliding-window baselines for small object detection in UHR remote sensing imagery on a single 24GB GPU.
Reference graph
Works this paper leans on
-
[1]
Oriented tiny object detection: A dataset, benchmark, and dynamic unbiased learning,
C. Xu, R. Zhang, W. Yang, H. Zhu, F. Xu, J. Ding, and G.-S. Xia, “Oriented tiny object detection: A dataset, benchmark, and dynamic unbiased learning,”IEEE Transactions on Pattern Analysis and Machine Intelligence, pp. 1–18, 2025
work page 2025
-
[2]
To- wards large-scale small object detection: Survey and benchmarks,
G. Cheng, X. Yuan, X. Yao, K. Yan, Q. Zeng, X. Xie, and J. Han, “To- wards large-scale small object detection: Survey and benchmarks,”IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 45, no. 11, pp. 13 467–13 488, 2023
work page 2023
-
[3]
Microsoft coco: Common objects in context,
T.-Y . Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Doll ´ar, and C. L. Zitnick, “Microsoft coco: Common objects in context,” inEuropean Conference on Computer Vision. Springer, 2014, pp. 740–755
work page 2014
-
[4]
Detecting tiny objects in aerial images: A normalized wasserstein distance and a new benchmark,
C. Xu, J. Wang, W. Yang, H. Yu, L. Yu, and G.-S. Xia, “Detecting tiny objects in aerial images: A normalized wasserstein distance and a new benchmark,”ISPRS Journal of Photogrammetry and Remote Sensing, vol. 190, pp. 79–93, 2022
work page 2022
-
[5]
Robust tiny object detection in aerial images amidst label noise,
H. Zhu, C. Xu, W. Yang, R. Zhang, Y . Zhang, and G.-S. Xia, “Robust tiny object detection in aerial images amidst label noise,”arXiv preprint arXiv:2401.08056, 2024
-
[6]
Scale match for tiny person detection,
X. Yu, Y . Gong, N. Jiang, Q. Ye, and Z. Han, “Scale match for tiny person detection,” inIEEE Workshops on Applications of Computer Vision, 2020, pp. 1257–1265
work page 2020
-
[7]
Visible-thermal tiny object detection: A benchmark dataset and baselines,
X. Ying, C. Xiao, W. An, R. Li, X. He, B. Li, X. Cao, Z. Li, Y . Wang, M. Huet al., “Visible-thermal tiny object detection: A benchmark dataset and baselines,”IEEE Transactions on Pattern Analysis and Machine Intelligence, 2025
work page 2025
-
[8]
Drone- based rgbt tiny person detection,
Y . Zhang, C. Xu, W. Yang, G. He, H. Yu, L. Yu, and G.-S. Xia, “Drone- based rgbt tiny person detection,”ISPRS Journal of Photogrammetry and Remote Sensing, vol. 204, pp. 61–76, 2023
work page 2023
-
[9]
Detection and tracking meet drones challenge,
P. Zhu, L. Wen, D. Du, X. Bian, H. Fan, Q. Hu, and H. Ling, “Detection and tracking meet drones challenge,”IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 44, no. 11, pp. 7380–7399, 2021
work page 2021
-
[10]
Robust object detection with inaccurate bounding boxes,
C. Liu, K. Wang, H. Lu, Z. Cao, and Z. Zhang, “Robust object detection with inaccurate bounding boxes,” inEuropean Conference on Computer Vision. Springer, 2022, pp. 53–69
work page 2022
-
[11]
Pseco: Pseudo labeling and consistency training for semi-supervised object detection,
G. Li, X. Li, Y . Wang, Y . Wu, D. Liang, and S. Zhang, “Pseco: Pseudo labeling and consistency training for semi-supervised object detection,” inEuropean Conference on Computer Vision. Springer, 2022, pp. 457– 472
work page 2022
-
[12]
Hs-fpn: High frequency and spatial perception fpn for tiny object detection,
Z. Shi, J. Hu, J. Ren, H. Ye, X. Yuan, Y . Ouyang, J. He, B. Ji, and J. Guo, “Hs-fpn: High frequency and spatial perception fpn for tiny object detection,” inProceedings of the AAAI Conference on Artificial Intelligence, vol. 39, no. 7, 2025, pp. 6896–6904
work page 2025
-
[13]
Set: Spectral enhancement for tiny object detection,
H. Sun, R. Wang, Y . Li, L. Yang, S. Lin, X. Cao, and B. Zhang, “Set: Spectral enhancement for tiny object detection,” inProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 2025, pp. 4713–4723
work page 2025
-
[14]
Label assignment matters: A gaussian assignment strategy for tiny object detection,
F. Zhang, S. Zhou, Y . Wang, X. Wang, and Y . Hou, “Label assignment matters: A gaussian assignment strategy for tiny object detection,”IEEE Transactions on Geoscience and Remote Sensing, 2024
work page 2024
-
[15]
Similarity distance-based label assignment for tiny object detection,
S. Shi, Q. Fang, X. Xu, and T. Zhao, “Similarity distance-based label assignment for tiny object detection,” in2024 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2024, pp. 13 711– 13 718
work page 2024
-
[16]
Interactive object detection for tiny objects in large remotely sensed images,
M. Burges, S. Zambanini, and R. Sablatnig, “Interactive object detection for tiny objects in large remotely sensed images,” in2025 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV). IEEE, 2025, pp. 4704–4713
work page 2025
-
[17]
The pascal visual object classes challenge: A retrospective,
M. Everingham, S. A. Eslami, L. Van Gool, C. K. Williams, J. Winn, and A. Zisserman, “The pascal visual object classes challenge: A retrospective,”International Journal of Computer Vision, vol. 111, no. 1, pp. 98–136, 2015
work page 2015
-
[18]
Livecell—a large-scale dataset for label-free live cell segmentation,
C. Edlund, T. R. Jackson, N. Khalid, N. Bevan, T. Dale, A. Dengel, S. Ahmed, J. Trygg, and R. Sj ¨ogren, “Livecell—a large-scale dataset for label-free live cell segmentation,”Nature methods, vol. 18, no. 9, pp. 1038–1045, 2021
work page 2021
-
[19]
Similarity distance-based label assignment for tiny object detection,
S. Shi, Q. Fang, X. Xu, and T. Zhao, “Similarity distance-based label assignment for tiny object detection,” in2024 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 2024, pp. 13 711–13 718
work page 2024
-
[20]
Rfla: Gaussian receptive field based label assignment for tiny object detection,
C. Xu, J. Wang, W. Yang, H. Yu, L. Yu, and G.-S. Xia, “Rfla: Gaussian receptive field based label assignment for tiny object detection,” in European Conference on Computer Vision. Springer, 2022, pp. 526– 543
work page 2022
-
[21]
P. Chen, J. Wang, Z. Zhang, and C. He, “Frli-net: Feature reconstruction and learning interaction network for tiny object detection in remote sensing images,”IEEE Signal Processing Letters, vol. 32, pp. 2159– 2163, 2025
work page 2025
-
[22]
Feature information driven position gaussian distribution estimation for tiny object detection,
J. Bian, M. Feng, W. Dong, F. Wu, J. Luo, Y . Wang, and G. Shi, “Feature information driven position gaussian distribution estimation for tiny object detection,” inProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 2025, pp. 30 376–30 386
work page 2025
-
[23]
Spatial self- distillation for object detection with inaccurate bounding boxes,
D. Wu, P. Chen, X. Yu, G. Li, Z. Han, and J. Jiao, “Spatial self- distillation for object detection with inaccurate bounding boxes,” in JOURNAL OF LATEX CLASS FILES, VOL. 14, NO. 8, AUGUST 2021 17 Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023, pp. 6855–6865
work page 2021
-
[24]
End-to-end semi-supervised object detection with soft teacher,
M. Xu, Z. Zhang, H. Hu, J. Wang, L. Wang, F. Wei, X. Bai, and Z. Liu, “End-to-end semi-supervised object detection with soft teacher,” inProceedings of the IEEE/CVF international conference on computer vision, 2021, pp. 3060–3069
work page 2021
-
[25]
X. Wang, X. Yang, S. Zhang, Y . Li, L. Feng, S. Fang, C. Lyu, K. Chen, and W. Zhang, “Consistent-teacher: Towards reducing inconsis- tent pseudo-targets in semi-supervised object detection,” inProceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2023, pp. 3240–3249
work page 2023
-
[26]
Point-to-box network for accurate object detection via single point supervision,
P. Chen, X. Yu, X. Han, N. Hassan, K. Wang, J. Li, J. Zhao, H. Shi, Z. Han, and Q. Ye, “Point-to-box network for accurate object detection via single point supervision,” inEuropean Conference on Computer Vision. Springer, 2022, pp. 51–67
work page 2022
-
[27]
Tiny object detection with single point supervision,
H. Zhu, C. Xu, R. Zhang, F. Xu, W. Yang, H. Zhang, and G.-S. Xia, “Tiny object detection with single point supervision,”ISPRS Journal of Photogrammetry and Remote Sensing, vol. 227, pp. 219–233, 2025
work page 2025
-
[28]
T. Zhang, Z. Fan, M. Liu, X. Zhang, X. Lu, W. Li, Y . Zhou, Y . Yu, X. Li, J. Yanet al., “Point2rbox-v3: Self-bootstrapping from point annotations via integrated pseudo-label refinement and utilization,”arXiv preprint arXiv:2509.26281, 2025
-
[29]
Pointobb-v3: Expanding performance boundaries of single point-supervised oriented object detection,
P. Zhang, J. Luo, X. Yang, Y . Yu, Q. Li, Y . Zhou, X. Jia, X. Lu, J. Chen, X. Liet al., “Pointobb-v3: Expanding performance boundaries of single point-supervised oriented object detection,”International Journal of Computer Vision, pp. 1–21, 2025
work page 2025
-
[30]
Calibrated teacher for sparsely annotated object detection,
H. Wang, L. Liu, B. Zhang, J. Zhang, W. Zhang, Z. Gan, Y . Wang, C. Wang, and H. Wang, “Calibrated teacher for sparsely annotated object detection,” inProceedings of the AAAI Conference on Artificial Intelligence, vol. 37, no. 2, 2023, pp. 2519–2527
work page 2023
-
[31]
Minimizing sample redundancy for label-efficient object detection in aerial images,
R. Zhang, C. Xu, H. Zhu, F. Xu, W. Yang, H. Zhang, and G.-S. Xia, “Minimizing sample redundancy for label-efficient object detection in aerial images,”IEEE Transactions on Geoscience and Remote Sensing, 2025
work page 2025
-
[32]
Co-mining: Self-supervised learning for sparsely annotated object detection,
T. Wang, T. Yang, J. Cao, and X. Zhang, “Co-mining: Self-supervised learning for sparsely annotated object detection,” inProceedings of the AAAI conference on artificial intelligence, vol. 35, no. 4, 2021, pp. 2800– 2808
work page 2021
-
[33]
Sparsedet: Improving sparsely annotated object detection with pseudo-positive mining,
S. Suri, S. Rambhatla, R. Chellappa, and A. Shrivastava, “Sparsedet: Improving sparsely annotated object detection with pseudo-positive mining,” inProceedings of the IEEE/CVF International Conference on Computer Vision, 2023, pp. 6770–6781
work page 2023
-
[34]
Grounding dino: Marrying dino with grounded pre-training for open-set object detection,
S. Liu, Z. Zeng, T. Ren, F. Li, H. Zhang, J. Yang, Q. Jiang, C. Li, J. Yang, H. Suet al., “Grounding dino: Marrying dino with grounded pre-training for open-set object detection,” inEuropean conference on computer vision. Springer, 2024, pp. 38–55
work page 2024
-
[35]
Grounded SAM: Assembling Open-World Models for Diverse Visual Tasks
T. Ren, S. Liu, A. Zeng, J. Lin, K. Li, H. Cao, J. Chen, X. Huang, Y . Chen, F. Yanet al., “Grounded sam: Assembling open-world models for diverse visual tasks,”arXiv preprint arXiv:2401.14159, 2024
work page internal anchor Pith review Pith/arXiv arXiv 2024
-
[36]
Learning transferable visual models from natural language supervision,
A. Radford, J. W. Kim, C. Hallacy, A. Ramesh, G. Goh, S. Agarwal, G. Sastry, A. Askell, P. Mishkin, J. Clarket al., “Learning transferable visual models from natural language supervision,” inInternational conference on machine learning. PmLR, 2021, pp. 8748–8763
work page 2021
-
[37]
SAM 3: Segment Anything with Concepts
N. Carion, L. Gustafson, Y .-T. Hu, S. Debnath, R. Hu, D. Suris, C. Ryali, K. V . Alwala, H. Khedr, A. Huanget al., “Sam 3: Segment anything with concepts,”arXiv preprint arXiv:2511.16719, 2025
work page internal anchor Pith review Pith/arXiv arXiv 2025
-
[38]
Detect anything via next point prediction,
Q. Jiang, J. Huo, X. Chen, Y . Xiong, Z. Zeng, Y . Chen, T. Ren, J. Yu, and L. Zhang, “Detect anything via next point prediction,”arXiv preprint arXiv:2510.12798, 2025
-
[39]
Hrsid: A high-resolution sar images dataset for ship detection and instance segmentation,
S. Wei, X. Zeng, Q. Qu, M. Wang, H. Su, and J. Shi, “Hrsid: A high-resolution sar images dataset for ship detection and instance segmentation,”Ieee Access, vol. 8, pp. 120 234–120 254, 2020
work page 2020
-
[40]
Isnet: Shape matters for infrared small target detection,
M. Zhang, R. Zhang, Y . Yang, H. Bai, J. Zhang, and J. Guo, “Isnet: Shape matters for infrared small target detection,” inProceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2022, pp. 877–886
work page 2022
-
[41]
T. Zhang, X. Zhang, X. Ke, X. Zhan, J. Shi, S. Wei, D. Pan, J. Li, H. Su, Y . Zhouet al., “Ls-ssdd-v1. 0: A deep learning dataset dedicated to small ship detection from large-scale sentinel-1 sar images,”Remote Sensing, vol. 12, no. 18, p. 2997, 2020
work page 2020
-
[42]
Sardet- 100k: Towards open-source benchmark and toolkit for large-scale sar object detection,
Y . Li, X. Li, W. Li, Q. Hou, L. Liu, M.-M. Cheng, and J. Yang, “Sardet- 100k: Towards open-source benchmark and toolkit for large-scale sar object detection,”Advances in Neural Information Processing Systems, vol. 37, pp. 128 430–128 461, 2024
work page 2024
-
[43]
Y . Lu, Y . Lin, H. Wu, X. Xian, Y . Shi, and L. Lin, “Sirst-5k: Exploring massive negatives synthesis with self-supervised learning for robust infrared small target detection,”IEEE Transactions on Geoscience and Remote Sensing, vol. 62, pp. 1–11, 2024
work page 2024
-
[44]
Crowdhuman: A benchmark for detecting human in a crowd,
S. Shao, Z. Zhao, B. Li, T. Xiao, G. Yu, X. Zhang, and J. Sun, “Crowdhuman: A benchmark for detecting human in a crowd,”arXiv preprint arXiv:1805.00123, 2018
-
[45]
20k bounding boxes, dflbundesliga data shootout,
enddl22, “20k bounding boxes, dflbundesliga data shootout,” https://www.kaggle.com/datasets/enddl22/ bounding-boxes-dflbundesliga-data-shootout, 2021, kaggle
work page 2021
-
[46]
Jhu-crowd++: Large-scale crowd counting dataset and a benchmark method,
V . A. Sindagi, R. Yasarla, and V . M. Patel, “Jhu-crowd++: Large-scale crowd counting dataset and a benchmark method,”IEEE transactions on pattern analysis and machine intelligence, vol. 44, no. 5, pp. 2594–2609, 2020
work page 2020
-
[47]
Detecting heads using feature refine net and cascaded multi-scale architecture,
D. Peng, Z. Sun, Z. Chen, Z. Cai, L. Xie, and L. Jin, “Detecting heads using feature refine net and cascaded multi-scale architecture,” in2018 24th International Conference on Pattern Recognition (ICPR), 2018, pp. 2528–2533
work page 2018
-
[48]
H. M. Ahmad and A. Rahimi, “Sh17: A dataset for human safety and personal protective equipment detection in manufacturing industry,” Journal of Safety Science and Resilience, vol. 6, no. 2, pp. 175– 185, 2025. [Online]. Available: https://www.sciencedirect.com/science/ article/pii/S266644962400077X
work page 2025
-
[49]
Precise detection in densely packed scenes,
E. Goldman, R. Herzig, A. Eisenschtat, J. Goldberger, and T. Hassner, “Precise detection in densely packed scenes,” inProceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2019, pp. 5227–5236
work page 2019
-
[50]
License plate detection dataset,
F. Elmenshawii, “License plate detection dataset,” https://www.kaggle. com/datasets/fareselmenshawii/license-plate-dataset, 2024, kaggle
work page 2024
-
[51]
C. Reich, T. Prangemeier, and H. Koeppl, “The tyc dataset for under- standing instance-level semantics and motions of cells in microstruc- tures,” inProceedings of the IEEE/CVF International Conference on Computer Vision, 2023, pp. 3940–3951
work page 2023
-
[52]
Visalgae 2023: A dataset and challenge for al- gae detection in microscopy images,
M. Sun, J. Jiang, Z. Yang, S. Kong, J. Qi, J. Shang, S. Luo, W. Sun, T. Wang, Y . Wanget al., “Visalgae 2023: A dataset and challenge for al- gae detection in microscopy images,”arXiv preprint arXiv:2505.20687, 2025
-
[53]
Infrared small object detection for wildlife con- servation,
wangsheng0352, “Infrared small object detection for wildlife con- servation,” https://www.kaggle.com/datasets/wangsheng0352/ir-wildlife, 2024, kaggle
work page 2024
-
[54]
Pest24: A large-scale very small object data set of agricultural pests for multi-target detection,
Q.-J. Wang, S.-Y . Zhang, S.-F. Dong, G.-C. Zhang, J. Yang, R. Li, and H.-Q. Wang, “Pest24: A large-scale very small object data set of agricultural pests for multi-target detection,”Computers and electronics in agriculture, vol. 175, p. 105585, 2020
work page 2020
-
[55]
The detection and counting of olive tree fruits using deep learning models in tacna, per ´u,
E. Osco-Mamani, O. Santana-Carbajal, I. Chaparro-Cruz, D. Ochoa- Donoso, and S. Alcazar-Alay, “The detection and counting of olive tree fruits using deep learning models in tacna, per ´u,”AI, vol. 6, no. 2, p. 25, 2025
work page 2025
-
[56]
Focal loss for dense object detection,
T.-Y . Lin, P. Goyal, R. Girshick, K. He, and P. Dollar, “Focal loss for dense object detection,” inIEEE International Conference on Computer Vision, 2017, pp. 2980–2988
work page 2017
-
[57]
Faster R-CNN: Towards real- time object detection with region proposal networks,
S. Ren, K. He, R. Girshick, and J. Sun, “Faster R-CNN: Towards real- time object detection with region proposal networks,” inAdvances in Neural Information Processing Systems, 2015, pp. 91–99
work page 2015
-
[58]
Fcos: A simple and strong anchor-free object detector,
Z. Tian, C. Shen, H. Chen, and T. He, “Fcos: A simple and strong anchor-free object detector,”IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 44, no. 4, pp. 1922–1933, 2022
work page 1922
-
[59]
Detrs beat yolos on real-time object detection,
Y . Zhao, W. Lv, S. Xu, J. Wei, G. Wang, Q. Dang, Y . Liu, and J. Chen, “Detrs beat yolos on real-time object detection,” inProceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2024, pp. 16 965–16 974
work page 2024
-
[60]
B. Ren, X. Yang, Y . Yu, J. Luo, and Z. Deng, “Pointobb-v2: Towards simpler, faster, and stronger single point supervised oriented object detection,”arXiv preprint arXiv:2410.08210, 2024
-
[61]
Object detection in optical remote sensing images: A survey and a new benchmark,
K. Li, G. Wan, G. Cheng, L. Meng, and J. Han, “Object detection in optical remote sensing images: A survey and a new benchmark,”ISPRS Journal of Photogrammetry and Remote Sensing, vol. 159, pp. 296–307, 2020
work page 2020
-
[62]
Object detection in aerial images: A large-scale benchmark and challenges,
J. Ding, N. Xue, G.-S. Xia, X. Bai, W. Yang, M. Y . Yang, S. Belongie, J. Luo, M. Datcu, M. Pelilloet al., “Object detection in aerial images: A large-scale benchmark and challenges,”IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 44, no. 11, pp. 7778–7796, 2021
work page 2021
-
[63]
xview: Objects in context in overhead imagery,
D. Lam, R. Kuzma, K. McGee, S. Dooley, M. Laielli, M. Klaric, Y . Bulatov, and B. McCord, “xview: Objects in context in overhead imagery,”arXiv preprint arXiv:1802.07856, 2018
-
[64]
FCOS: Fully convolutional one- stage object detection,
Z. Tian, C. Shen, H. Chen, and T. He, “FCOS: Fully convolutional one- stage object detection,” inIEEE International Conference on Computer Vision, 2019, pp. 9627–9636. JOURNAL OF LATEX CLASS FILES, VOL. 14, NO. 8, AUGUST 2021 18
work page 2019
-
[65]
Geography-aware self-supervised learning,
K. Ayush, B. Uzkent, C. Meng, K. Tanmay, M. Burke, D. Lobell, and S. Ermon, “Geography-aware self-supervised learning,” inProceedings of the IEEE/CVF International Conference on Computer Vision, 2021, pp. 10 181–10 190
work page 2021
-
[66]
Change-aware sampling and con- trastive learning for satellite images,
U. Mall, B. Hariharan, and K. Bala, “Change-aware sampling and con- trastive learning for satellite images,” inProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 5261–5270
work page 2023
-
[67]
C. Tao, J. Qi, G. Zhang, Q. Zhu, W. Lu, and H. Li, “Tov: The original vision model for optical remote sensing image understanding via self- supervised learning,”IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, vol. 16, pp. 4916–4930, 2023
work page 2023
-
[68]
Scale-mae: A scale-aware masked autoencoder for multiscale geospatial representation learning,
C. J. Reed, R. Gupta, S. Li, S. Brockman, C. Funk, B. Clipp, K. Keutzer, S. Candido, M. Uyttendaele, and T. Darrell, “Scale-mae: A scale-aware masked autoencoder for multiscale geospatial representation learning,” inProceedings of the IEEE/CVF International Conference on Computer Vision, 2023, pp. 4088–4099
work page 2023
-
[69]
Satlaspretrain: A large-scale dataset for remote sensing image under- standing,
F. Bastani, P. Wolters, R. Gupta, J. Ferdinando, and A. Kembhavi, “Satlaspretrain: A large-scale dataset for remote sensing image under- standing,” inProceedings of the IEEE/CVF International Conference on Computer Vision, 2023, pp. 16 772–16 782
work page 2023
-
[70]
Ringmo: A remote sensing foundation model with masked image modeling,
X. Sun, P. Wang, W. Lu, Z. Zhu, X. Lu, Q. He, J. Li, X. Rong, Z. Yang, H. Changet al., “Ringmo: A remote sensing foundation model with masked image modeling,”IEEE Transactions on Geoscience and Remote Sensing, vol. 61, pp. 1–22, 2022
work page 2022
-
[71]
X. Guo, J. Lao, B. Dang, Y . Zhang, L. Yu, L. Ru, L. Zhong, Z. Huang, K. Wu, D. Huet al., “Skysense: A multi-modal remote sensing foundation model towards universal interpretation for earth observation imagery,” inProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024, pp. 27 672–27 683
work page 2024
-
[72]
Mtp: Advancing remote sensing foundation model via multitask pretraining,
D. Wang, J. Zhang, M. Xu, L. Liu, D. Wang, E. Gao, C. Han, H. Guo, B. Du, D. Taoet al., “Mtp: Advancing remote sensing foundation model via multitask pretraining,”IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, vol. 17, pp. 11 632–11 654, 2024
work page 2024
-
[73]
S. Bai, K. Chen, X. Liu, J. Wang, W. Ge, S. Song, K. Dang, P. Wang, S. Wang, J. Tanget al., “Qwen2. 5-vl technical report,”arXiv preprint arXiv:2502.13923, 2025
work page internal anchor Pith review Pith/arXiv arXiv 2025
-
[74]
Deep residual learning for image recognition,
K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” inProceedings of the IEEE conference on computer vision and pattern recognition, 2016, pp. 770–778
work page 2016
-
[75]
Pvt v2: Improved baselines with pyramid vision transformer,
W. Wang, E. Xie, X. Li, D.-P. Fan, K. Song, D. Liang, T. Lu, P. Luo, and L. Shao, “Pvt v2: Improved baselines with pyramid vision transformer,” Computational visual media, vol. 8, no. 3, pp. 415–424, 2022
work page 2022
-
[76]
O. Sim ´eoni, H. V . V o, M. Seitzer, F. Baldassarre, M. Oquab, C. Jose, V . Khalidov, M. Szafraniec, S. Yi, M. Ramamonjisoaet al., “Dinov3,” arXiv preprint arXiv:2508.10104, 2025
work page internal anchor Pith review Pith/arXiv arXiv 2025
-
[77]
Swin transformer: Hierarchical vision transformer using shifted windows,
Z. Liu, Y . Lin, Y . Cao, H. Hu, Y . Wei, Z. Zhang, S. Lin, and B. Guo, “Swin transformer: Hierarchical vision transformer using shifted windows,” inProceedings of the IEEE/CVF international conference on computer vision, 2021, pp. 10 012–10 022
work page 2021
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.