Half a Percent of Labels is Enough: Efficient Animal Detection in UAV Imagery using Deep CNNs and Active Learning
Pith reviewed 2026-05-24 20:50 UTC · model grok-4.3
The pith
Less than half a percent of new labels recovers nearly 80 percent of animals in fresh UAV images.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Transfer Sampling uses optimal transport to locate corresponding regions between the source and target datasets in the space of CNN activations. The source dataset's animal-likelihood scores then rank the target samples. Combined with the window cropping extension, this lets an active learning loop retrieve almost 80 percent of the animals after an oracle labels fewer than 0.5 percent of the candidates, outperforming conventional uncertainty-based selection criteria.
What carries the argument
Transfer Sampling (TS), which applies optimal transport to match CNN activation regions and transfers source confidence rankings to the target dataset.
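To make the mechanism concrete, here is a minimal Python sketch of the idea, not the authors' implementation. It assumes squared Euclidean costs between activation vectors, uniform sample weights, a plain Sinkhorn iteration for entropy-regularized optimal transport, and a plan-weighted average as the way source scores are carried over. The function names, the `reg` value and the 2-D toy "activations" are all hypothetical.

```python
import numpy as np


def sinkhorn_plan(cost, reg=0.1, n_iters=200):
    """Entropy-regularized OT plan between two uniform distributions."""
    n_src, n_tgt = cost.shape
    a = np.full(n_src, 1.0 / n_src)   # uniform mass on source samples
    b = np.full(n_tgt, 1.0 / n_tgt)   # uniform mass on target samples
    kernel = np.exp(-cost / reg)
    u = np.ones(n_src)
    for _ in range(n_iters):
        v = b / (kernel.T @ u)        # rescale to match target marginals
        u = a / (kernel @ v)          # rescale to match source marginals
    return u[:, None] * kernel * v[None, :]


def transfer_scores(src_feats, src_scores, tgt_feats, reg=0.1):
    """Carry source detector scores over to target samples via the OT plan."""
    diff = src_feats[:, None, :] - tgt_feats[None, :, :]
    cost = (diff ** 2).sum(axis=-1)   # squared Euclidean cost (assumption)
    cost = cost / cost.max()          # normalise so `reg` is scale-free
    plan = sinkhorn_plan(cost, reg=reg)
    # Each target sample receives the plan-weighted mean of the source scores.
    return (plan.T @ src_scores) / plan.sum(axis=0)


# Toy data (hypothetical): the target domain is shifted by +1 on both axes.
rng = np.random.default_rng(0)
src_feats = np.vstack([rng.normal(0.0, 0.3, (40, 2)),   # source background
                       rng.normal(3.0, 0.3, (5, 2))])   # source animals
src_scores = np.concatenate([np.full(40, 0.05), np.full(5, 0.95)])
tgt_feats = np.vstack([rng.normal(1.0, 0.3, (40, 2)),   # target background
                       rng.normal(4.0, 0.3, (5, 2))])   # target animals

tgt_scores = transfer_scores(src_feats, src_scores, tgt_feats)
ranking = np.argsort(-tgt_scores)     # most animal-like target samples first
print(ranking[:5])
```

The toy clusters are separated widely enough that the five shifted animal samples (indices 40 to 44) should head the ranking even though no target label is used. Real CNN activations are high-dimensional and far less cleanly separated, which is exactly where the load-bearing premise below comes in.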
If this is right
- A detector trained on one year's UAV survey can be reused on later surveys of the same area with only minimal new labeling.
- When positive examples are extremely rare, selecting the most confident rather than the most uncertain samples yields faster retrieval of true positives.
- The window cropping extension reduces the number of full images an oracle must inspect before most animals are found.
- The same pipeline beats uncertainty sampling and random selection on the reported UAV animal datasets.
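The second point above, confidence-based versus uncertainty-based selection, can be illustrated with a small Python sketch. The scores and labels are a hand-built toy pool, not data from the paper, and the example assumes that high scores are trustworthy on the target set, which is what the transferred ranking is meant to provide.

```python
import numpy as np


def most_confident(scores, k):
    """Indices of the k samples with the highest positive-class score."""
    return np.argsort(-scores, kind="stable")[:k]


def most_uncertain(scores, k):
    """Indices of the k samples whose score lies closest to 0.5."""
    return np.argsort(np.abs(scores - 0.5), kind="stable")[:k]


# Hypothetical pool: 3 animals among 12 candidates (real pools are far sparser).
scores = np.array([0.97, 0.52, 0.48, 0.91, 0.55, 0.10,
                   0.45, 0.88, 0.05, 0.51, 0.49, 0.02])
labels = np.array([1, 0, 0, 1, 0, 0,
                   0, 1, 0, 0, 0, 0])

# Number of true animals the oracle sees within a budget of 3 queries.
found_confident = int(labels[most_confident(scores, 3)].sum())
found_uncertain = int(labels[most_uncertain(scores, 3)].sum())
print(found_confident, found_uncertain)
```

In this constructed pool the confident queries hit all three animals, while the uncertain queries land on ambiguous background. Uncertainty sampling is designed to refine the decision boundary, not to collect positives quickly, so the comparison only speaks to the retrieval goal discussed here.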
Where Pith is reading between the lines
- Similar transfer-ranking ideas could reduce labeling needs in other remote-sensing tasks where objects of interest are sparse.
- If the optimal-transport alignment proves stable across more domain shifts, the method might extend to non-UAV imagery with comparable rarity of positives.
- Testing the approach on datasets with different sensor resolutions or animal species would reveal how far the ranking transfer generalizes without retraining.
Load-bearing premise
Optimal transport can reliably locate corresponding regions between source and target datasets in CNN activation space so that the source ranking transfers usefully.
What would settle it
The central performance claim would be falsified by a new UAV collection on which the method retrieves substantially fewer than 80 percent of the animals after 0.5 percent of the data has been labeled, while a standard uncertainty baseline performs better.
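The quantity at stake in such a test is recall after a fixed labeling budget. A minimal Python sketch of that measurement follows. The function name, the toy labels and both rankings are hypothetical and only show how the comparison could be scored.

```python
import numpy as np


def recall_at_budget(ranking, labels, budget_fraction):
    """Fraction of all positives found after querying the top of a ranking."""
    n_queries = max(1, round(budget_fraction * len(ranking)))
    total_positives = labels.sum()
    if total_positives == 0:
        raise ValueError("recall is undefined without any positives")
    return labels[ranking[:n_queries]].sum() / total_positives


# Hypothetical pool: 1000 candidates, 5 of which are animals.
labels = np.zeros(1000, dtype=int)
positives = [3, 250, 400, 777, 901]
labels[positives] = 1

# Ranking A places four animals in its first five slots; ranking B is index order.
head = [3, 250, 400, 777, 0]
tail = [i for i in range(1000) if i not in head]
ranking_a = np.array(head + tail)
ranking_b = np.arange(1000)

recall_a = recall_at_budget(ranking_a, labels, 0.005)
recall_b = recall_at_budget(ranking_b, labels, 0.005)
print(recall_a, recall_b)
```

With a 0.5 percent budget the oracle sees five candidates, so ranking A reaches a recall of 0.8 and ranking B a recall of 0.2. Applying the same function to the rankings produced by each selection criterion on a new collection would give the numbers needed for the test described above.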
Original abstract
We present an Active Learning (AL) strategy for re-using a deep Convolutional Neural Network (CNN)-based object detector on a new dataset. This is of particular interest for wildlife conservation: given a set of images acquired with an Unmanned Aerial Vehicle (UAV) and manually labeled gound truth, our goal is to train an animal detector that can be re-used for repeated acquisitions, e.g. in follow-up years. Domain shifts between datasets typically prevent such a direct model application. We thus propose to bridge this gap using AL and introduce a new criterion called Transfer Sampling (TS). TS uses Optimal Transport to find corresponding regions between the source and the target datasets in the space of CNN activations. The CNN scores in the source dataset are used to rank the samples according to their likelihood of being animals, and this ranking is transferred to the target dataset. Unlike conventional AL criteria that exploit model uncertainty, TS focuses on very confident samples, thus allowing a quick retrieval of true positives in the target dataset, where positives are typically extremely rare and difficult to find by visual inspection. We extend TS with a new window cropping strategy that further accelerates sample retrieval. Our experiments show that with both strategies combined, less than half a percent of oracle-provided labels are enough to find almost 80% of the animals in challenging sets of UAV images, beating all baselines by a margin.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces Transfer Sampling (TS), an active learning criterion that applies Optimal Transport in CNN activation space to transfer animal-likelihood rankings from a labeled source dataset to an unlabeled target UAV dataset. Combined with a window-cropping strategy, the method claims that <0.5% oracle labels suffice to retrieve ~80% of animals on challenging target sets while outperforming uncertainty-based and other baselines.
Significance. If the reported performance holds under scrutiny, the result would be significant for wildlife-monitoring applications: it shows that ranking transfer via OT can locate rare positives far more efficiently than standard AL in extreme class-imbalance settings, potentially reducing annotation effort by two orders of magnitude for repeated UAV surveys.
major comments (2)
- [§3] §3 (Transfer Sampling): the central claim that OT reliably transfers source rankings rests on the untested assumption that CNN activations preserve semantic correspondence across domain shift; no ablation on layer choice, OT regularization, or failure cases is provided, making it impossible to assess when the transfer succeeds or breaks.
- [§4] §4 (Experiments): the headline result (80% recall at <0.5% labels) is presented without dataset statistics (number of images, positive rate, animal species), exact baseline re-implementations, or variance across runs, so the margin over baselines cannot be verified as robust.
minor comments (2)
- [Abstract] Abstract: 'gound truth' is a typo for 'ground truth'.
- [§3] Notation for the OT cost matrix and the transferred ranking function should be defined explicitly with an equation rather than described only in prose.
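For orientation, one common way to write the quantities named in the last comment is sketched below in LaTeX. The notation is assumed and is not taken from the paper: $\phi$ denotes the CNN activation map, $x_i^{s}$ and $x_j^{t}$ are source and target samples, $s_i$ is the source detector score, $a$ and $b$ are the sample weights, and $\varepsilon$ is the entropic regularization strength.

```latex
% Assumed notation for illustration; the paper may define these differently.
\begin{align}
  C_{ij} &= \bigl\lVert \phi(x_i^{s}) - \phi(x_j^{t}) \bigr\rVert_2^2 \\
  \gamma^\star &= \operatorname*{arg\,min}_{\gamma \in \Pi(a, b)}
      \sum_{i,j} \gamma_{ij} C_{ij}
      + \varepsilon \sum_{i,j} \gamma_{ij} \log \gamma_{ij},
  \qquad
  \Pi(a, b) = \bigl\{ \gamma \ge 0 : \gamma \mathbf{1} = a,\;
      \gamma^{\top} \mathbf{1} = b \bigr\} \\
  \hat{s}_j &= \frac{\sum_i \gamma^\star_{ij}\, s_i}{\sum_i \gamma^\star_{ij}}
\end{align}
```

Under these assumptions the transferred ranking orders target samples by decreasing $\hat{s}_j$. Whether the paper uses a plan-weighted average or another transfer rule is exactly what an explicit equation in §3 would settle.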
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on our manuscript. The comments highlight important areas for improving the presentation and analysis of Transfer Sampling. We address each point below and will incorporate revisions accordingly.
Point-by-point responses
- Referee: [§3] §3 (Transfer Sampling): the central claim that OT reliably transfers source rankings rests on the untested assumption that CNN activations preserve semantic correspondence across domain shift; no ablation on layer choice, OT regularization, or failure cases is provided, making it impossible to assess when the transfer succeeds or breaks.
  Authors: We agree that additional analysis would strengthen the claims. While the empirical success of TS across multiple UAV datasets with varying domain shifts provides supporting evidence for semantic correspondence in CNN activations, we will add ablations on layer selection (e.g., comparing conv4 vs. conv5 features), OT regularization parameters (entropy regularization strength), and a discussion of failure modes (e.g., when source-target domain shift exceeds a certain threshold in activation space). These will be included in a revised §3.
  Revision: yes
- Referee: [§4] §4 (Experiments): the headline result (80% recall at <0.5% labels) is presented without dataset statistics (number of images, positive rate, animal species), exact baseline re-implementations, or variance across runs, so the margin over baselines cannot be verified as robust.
  Authors: We will revise §4 to include a dedicated table with full dataset statistics (image counts, positive rates per dataset, species composition), precise descriptions of baseline re-implementations (including hyperparameter choices and code references where possible), and error bars or standard deviations from multiple runs (e.g., 5 seeds) to demonstrate statistical robustness of the performance margins.
  Revision: yes
Circularity Check
No significant circularity detected
The paper's central contribution is an empirical Active Learning method (Transfer Sampling via Optimal Transport) whose performance claims are tied to experimental results on UAV datasets rather than any derivation that reduces to self-definition, fitted inputs renamed as predictions, or load-bearing self-citations. The approach relies on external components (pre-trained CNNs, Optimal Transport) and reports measured retrieval rates, making the argument self-contained against external benchmarks with no quoted steps that collapse by construction.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: Optimal Transport can map regions between source and target CNN activations effectively for ranking transfer.