arxiv: 1907.07319 · v1 · pith:332YBGYXnew · submitted 2019-07-17 · 💻 cs.CV

Half a Percent of Labels is Enough: Efficient Animal Detection in UAV Imagery using Deep CNNs and Active Learning

Pith reviewed 2026-05-24 20:50 UTC · model grok-4.3

classification 💻 cs.CV
keywords active learning · object detection · UAV imagery · animal detection · optimal transport · transfer sampling · wildlife monitoring · deep CNN

The pith

Less than half a percent of new labels recovers nearly 80 percent of animals in fresh UAV images.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops an active learning method to adapt a pretrained CNN animal detector to new UAV image sets that differ in appearance or location. It introduces Transfer Sampling, which aligns regions across source and target datasets via optimal transport on CNN features and carries over the source ranking of high-confidence animal detections to prioritize labeling in the target. A window-cropping step further speeds up the process by focusing on likely positive patches. This combination lets the system locate most animals while querying an oracle for labels on under 0.5 percent of the data, far fewer than uncertainty-driven baselines require. Readers would care because repeated UAV wildlife surveys become feasible without repeating the full labeling burden each season.

Core claim

Transfer Sampling uses optimal transport to locate corresponding regions between source and target datasets inside the space of CNN activations; the source dataset's animal-likelihood scores then rank target samples so that an active learning loop can retrieve almost 80 percent of the animals after an oracle supplies labels for fewer than 0.5 percent of the candidates, outperforming conventional uncertainty-based selection criteria.
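
The ranking transfer described above can be illustrated with a small NumPy sketch. Everything here is an assumption made for illustration: the toy 2-D features stand in for CNN activations, the Sinkhorn solver (in the spirit of [22]) stands in for whatever OT solver the authors use, and the rule for carrying scores across the coupling (a mass-weighted average of source scores, in the hypothetical function `transfer_ranking`) is one plausible choice, not necessarily the paper's.

```python
import numpy as np

def sinkhorn(a, b, cost, reg=0.1, n_iter=500):
    """Entropy-regularized OT coupling between histograms a and b."""
    K = np.exp(-cost / reg)                  # Gibbs kernel
    u = np.ones_like(a)
    for _ in range(n_iter):                  # alternate the two marginal scalings
        v = b / (K.T @ u)
        u = a / (K @ v)
    return u[:, None] * K * v[None, :]

def transfer_ranking(src_feats, src_scores, tgt_feats, reg=0.1):
    """Rank target candidates by source confidence carried through the coupling."""
    diff = src_feats[:, None, :] - tgt_feats[None, :, :]
    cost = (diff ** 2).sum(axis=-1)          # squared Euclidean cost
    cost = cost / cost.max()                 # scale to [0, 1] so reg is comparable
    a = np.full(len(src_feats), 1.0 / len(src_feats))
    b = np.full(len(tgt_feats), 1.0 / len(tgt_feats))
    gamma = sinkhorn(a, b, cost, reg)
    # Assumed transfer rule: mass-weighted mean of source scores per target sample.
    tgt_scores = (gamma * src_scores[:, None]).sum(axis=0) / gamma.sum(axis=0)
    return np.argsort(-tgt_scores), tgt_scores

# Toy data (hypothetical): two clusters per domain, target shifted by (1, -0.5).
# Class proportions match across domains; with uniform marginals all source mass
# must be transported, so mismatched proportions can leak mass across classes.
rng = np.random.default_rng(0)
src_is_animal = np.arange(200) < 20
tgt_is_animal = np.arange(300) < 30
src_feats = rng.normal(scale=0.5, size=(200, 2)) + np.where(src_is_animal[:, None], 3.0, 0.0)
tgt_feats = (rng.normal(scale=0.5, size=(300, 2))
             + np.where(tgt_is_animal[:, None], 3.0, 0.0) + np.array([1.0, -0.5]))
src_scores = np.where(src_is_animal, rng.uniform(0.7, 1.0, 200), rng.uniform(0.0, 0.2, 200))

order, tgt_scores = transfer_ranking(src_feats, src_scores, tgt_feats)
print("animals among the first 10 queries:", int(tgt_is_animal[order[:10]].sum()))
```

The first entries of `order` are the target candidates an oracle would be asked about first. The target labels are used only to print the check at the end, never to build the ranking.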

What carries the argument

Transfer Sampling (TS), which applies optimal transport to match CNN activation regions and transfers source confidence rankings to the target dataset.

If this is right

  • A detector trained on one year's UAV survey can be reused on later surveys of the same area with only minimal new labeling.
  • When positive examples are extremely rare, selecting the most confident rather than the most uncertain samples yields faster retrieval of true positives.
  • The window cropping extension reduces the number of full images an oracle must inspect before most animals are found.
  • The same pipeline beats uncertainty sampling and random selection on the reported UAV animal datasets.
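
The second point in the list above, on confident versus uncertain selection, can be made concrete with a toy simulation. The score distributions, the 0.5 percent positive rate, and the budget of 100 queries are assumptions chosen for illustration; they are not taken from the paper's data.

```python
import numpy as np

rng = np.random.default_rng(0)
n_neg, n_pos, budget = 9950, 50, 100   # 0.5% positives, 100 oracle queries

# Hypothetical detector scores: background mostly low, animals mostly high.
scores = np.concatenate([rng.beta(1, 8, n_neg), rng.beta(4, 2, n_pos)])
is_animal = np.concatenate([np.zeros(n_neg, dtype=bool), np.ones(n_pos, dtype=bool)])

# Max-confidence criterion: query the highest-scoring candidates.
by_confidence = np.argsort(-scores)[:budget]
# Uncertainty criterion: query the candidates closest to the 0.5 decision boundary.
by_uncertainty = np.argsort(np.abs(scores - 0.5))[:budget]

found_conf = int(is_animal[by_confidence].sum())
found_unc = int(is_animal[by_uncertainty].sum())
print(f"animals found, max confidence: {found_conf} / {n_pos}")
print(f"animals found, uncertainty:    {found_unc} / {n_pos}")
```

With so many negatives, even a thin tail of background scores near 0.5 outnumbers the animals there, so the uncertainty criterion spends most of its budget on background in this toy setting.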

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Similar transfer-ranking ideas could reduce labeling needs in other remote-sensing tasks where objects of interest are sparse.
  • If the optimal-transport alignment proves stable across more domain shifts, the method might extend to non-UAV imagery with comparable rarity of positives.
  • Testing the approach on datasets with different sensor resolutions or animal species would reveal how far the ranking transfer generalizes without retraining.

Load-bearing premise

Optimal transport can reliably locate corresponding regions between source and target datasets in CNN activation space so that the source ranking transfers usefully.

What would settle it

The central performance claim would be falsified if, on a new UAV collection, the method retrieved substantially fewer than 80 percent of the animals after labeling 0.5 percent of the data while a standard uncertainty baseline performed better.
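
A check of this kind reduces to measuring recall at a fixed labeling budget. The helper below is a minimal sketch; the function name `recall_at_budget`, the candidate count, and the animal positions are hypothetical and only show how the number would be computed.

```python
import numpy as np

def recall_at_budget(query_order, is_animal, budget_fraction):
    """Fraction of all animals found after labeling the first budget_fraction of candidates."""
    is_animal = np.asarray(is_animal, dtype=bool)
    n_queries = int(round(budget_fraction * len(is_animal)))
    queried = np.asarray(query_order)[:n_queries]
    return is_animal[queried].sum() / is_animal.sum()

# Hypothetical example: 1000 candidates, 5 animals, 0.5% budget (5 queries).
is_animal = np.zeros(1000, dtype=bool)
is_animal[[3, 17, 256, 400, 999]] = True

first = [3, 17, 256, 400, 1]                       # four animals, one miss
rest = [i for i in range(1000) if i not in first]
query_order = first + rest

recall = recall_at_budget(query_order, is_animal, 0.005)
print(f"recall at 0.5% labels: {recall:.2f}")
```

Running the same function on the query orders produced by two criteria gives the pair of numbers the test above would compare.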

Figures

Figures reproduced from arXiv: 1907.07319 by Benjamin Kellenberger, Devis Tuia, Diego Marcos, Sylvain Lobry.

Figure 1: Examples from the Kuzikus dataset (see Section III-A) from 2014 (left) and 2015 (right). It is trivial for humans to […]
Figure 2: Overview of the proposed workflow. We first predict candidates in the source dataset using the original, unadapted CNN […]
Figure 3: Source dataset candidates with animal confidence of 0.1 or more, projected using t-SNE [23]. Blue samples were […]
Figure 4: A subset of predicted locations in the source (left) and target (right) domain training sets. Blue samples were predicted […]
Figure 5: Patch of a target image (left) and all candidates predicted by the source CNN in it (right). By cropping a larger patch […]
Figure 6: For window cropping, we address different scenarios to maximize the query gain: in the first situation (left), we place […]
Figure 7: Basic architecture for our animal detector, following [8]. We employ the main blocks of a ResNet-18 pretrained on […]
Figure 8: Cumulative number of animals found over the AL iterations. Solid lines denote the criteria performances with model […]
Figure 9: Prediction examples for the TS (top) and max confidence (bottom) strategies after five AL iterations (250 queries), […]
Figure 10: Example image from the target training set with annotations after five AL iterations, showing all selected patches […]
Original abstract

We present an Active Learning (AL) strategy for re-using a deep Convolutional Neural Network (CNN)-based object detector on a new dataset. This is of particular interest for wildlife conservation: given a set of images acquired with an Unmanned Aerial Vehicle (UAV) and manually labeled gound truth, our goal is to train an animal detector that can be re-used for repeated acquisitions, e.g. in follow-up years. Domain shifts between datasets typically prevent such a direct model application. We thus propose to bridge this gap using AL and introduce a new criterion called Transfer Sampling (TS). TS uses Optimal Transport to find corresponding regions between the source and the target datasets in the space of CNN activations. The CNN scores in the source dataset are used to rank the samples according to their likelihood of being animals, and this ranking is transferred to the target dataset. Unlike conventional AL criteria that exploit model uncertainty, TS focuses on very confident samples, thus allowing a quick retrieval of true positives in the target dataset, where positives are typically extremely rare and difficult to find by visual inspection. We extend TS with a new window cropping strategy that further accelerates sample retrieval. Our experiments show that with both strategies combined, less than half a percent of oracle-provided labels are enough to find almost 80% of the animals in challenging sets of UAV images, beating all baselines by a margin.
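
The abstract does not spell out how the cropping window is placed. As a minimal sketch under stated assumptions, one simple rule is to take the top-ranked candidate as an anchor and position a fixed-size window so that it contains the anchor and as many other candidates as possible, which lets the oracle check several candidates in one view. The function `best_window`, the window size, and the coordinates below are hypothetical.

```python
import numpy as np

def best_window(points, anchor_idx, size):
    """Find the size-by-size window that contains the anchor and covers the most points.

    Returns (x0, y0, covered_indices), where (x0, y0) is the lower-left corner.
    """
    pts = np.asarray(points, dtype=float)
    ax, ay = pts[anchor_idx]
    # An optimal window can always be slid until its left and bottom edges touch
    # covered points, so only those coordinates need to be tried as corners.
    xs = pts[(pts[:, 0] >= ax - size) & (pts[:, 0] <= ax), 0]
    ys = pts[(pts[:, 1] >= ay - size) & (pts[:, 1] <= ay), 1]
    best = (ax, ay, np.array([anchor_idx]))
    for x0 in xs:
        in_x = (pts[:, 0] >= x0) & (pts[:, 0] <= x0 + size)
        for y0 in ys:
            inside = in_x & (pts[:, 1] >= y0) & (pts[:, 1] <= y0 + size)
            idx = np.flatnonzero(inside)
            if len(idx) > len(best[2]):
                best = (x0, y0, idx)
    return best

# Hypothetical candidate locations in pixel coordinates; index 2 is the anchor.
candidates = [(10, 10), (40, 12), (55, 50), (58, 60), (200, 200)]
x0, y0, covered = best_window(candidates, anchor_idx=2, size=64)
print("window corner:", (x0, y0), "covers candidates:", covered.tolist())
```

The brute-force search is quadratic in the number of nearby candidates, which is acceptable for a sketch; the paper's Figure 6 indicates that the authors distinguish several placement scenarios that this example does not model.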

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper introduces Transfer Sampling (TS), an active learning criterion that applies Optimal Transport in CNN activation space to transfer animal-likelihood rankings from a labeled source dataset to an unlabeled target UAV dataset. Combined with a window-cropping strategy, the method claims that <0.5% oracle labels suffice to retrieve ~80% of animals on challenging target sets while outperforming uncertainty-based and other baselines.

Significance. If the reported performance holds under scrutiny, the result would be significant for wildlife-monitoring applications: it shows that ranking transfer via OT can locate rare positives far more efficiently than standard AL in extreme class-imbalance settings, potentially reducing annotation effort by two orders of magnitude for repeated UAV surveys.
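
As a rough check of the "two orders of magnitude" figure, assume the comparison point is exhaustive labeling of every candidate (an assumption; the summary does not state the baseline). Writing $f$ for the labeled fraction and $N$ for the number of candidates, both symbols introduced here for illustration:

```latex
f \le 0.005
\quad\Longrightarrow\quad
\frac{N}{fN} = \frac{1}{f} \ge 200,
\qquad
\log_{10} 200 \approx 2.3 .
```

Under that assumption the reduction factor is at least 200, which is consistent with the stated order of magnitude.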

major comments (2)
  1. [§3] §3 (Transfer Sampling): the central claim that OT reliably transfers source rankings rests on the untested assumption that CNN activations preserve semantic correspondence across domain shift; no ablation on layer choice, OT regularization, or failure cases is provided, making it impossible to assess when the transfer succeeds or breaks.
  2. [§4] §4 (Experiments): the headline result (80% recall at <0.5% labels) is presented without dataset statistics (number of images, positive rate, animal species), exact baseline re-implementations, or variance across runs, so the margin over baselines cannot be verified as robust.
minor comments (2)
  1. [Abstract] Abstract: 'gound truth' is a typo for 'ground truth'.
  2. [§3] Notation for the OT cost matrix and the transferred ranking function should be defined explicitly with an equation rather than described only in prose.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. The comments highlight important areas for improving the presentation and analysis of Transfer Sampling. We address each point below and will incorporate revisions accordingly.

  1. Referee: [§3] §3 (Transfer Sampling): the central claim that OT reliably transfers source rankings rests on the untested assumption that CNN activations preserve semantic correspondence across domain shift; no ablation on layer choice, OT regularization, or failure cases is provided, making it impossible to assess when the transfer succeeds or breaks.

    Authors: We agree that additional analysis would strengthen the claims. While the empirical success of TS across multiple UAV datasets with varying domain shifts provides supporting evidence for semantic correspondence in CNN activations, we will add ablations on layer selection (e.g., comparing conv4 vs. conv5 features), OT regularization parameters (entropy regularization strength), and a discussion of failure modes (e.g., when source-target domain shift exceeds a certain threshold in activation space). These will be included in a revised §3. revision: yes

  2. Referee: [§4] §4 (Experiments): the headline result (80% recall at <0.5% labels) is presented without dataset statistics (number of images, positive rate, animal species), exact baseline re-implementations, or variance across runs, so the margin over baselines cannot be verified as robust.

    Authors: We will revise §4 to include a dedicated table with full dataset statistics (image counts, positive rates per dataset, species composition), precise descriptions of baseline re-implementations (including hyperparameter choices and code references where possible), and error bars or standard deviations from multiple runs (e.g., 5 seeds) to demonstrate statistical robustness of the performance margins. revision: yes
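
The two promised additions, a sweep over the entropic regularization strength and statistics over several seeds, could be organized along the following lines. This is a sketch on synthetic Gaussian features, not the authors' protocol: the function `ablate`, the regularization grid, the feature dimensions, and the use of coupling entropy as a proxy for how sharp the source-target correspondence is are all assumptions.

```python
import numpy as np

def sinkhorn(a, b, cost, reg, n_iter=300):
    """Entropy-regularized OT coupling via Sinkhorn iterations."""
    K = np.exp(-cost / reg)
    u = np.ones_like(a)
    for _ in range(n_iter):
        v = b / (K.T @ u)
        u = a / (K @ v)
    return u[:, None] * K * v[None, :]

def coupling_entropy(gamma):
    """Shannon entropy of the coupling; lower means sharper correspondences."""
    p = gamma[gamma > 0]
    return float(-(p * np.log(p)).sum())

def ablate(regs=(0.05, 0.2, 1.0), seeds=range(5), n=40, dim=8):
    """Return {reg: (mean, std)} of coupling entropy across seeds."""
    results = {}
    for reg in regs:
        values = []
        for seed in seeds:
            rng = np.random.default_rng(seed)
            src = rng.normal(size=(n, dim))            # stand-in source features
            tgt = rng.normal(loc=0.5, size=(n, dim))   # stand-in shifted target features
            cost = ((src[:, None, :] - tgt[None, :, :]) ** 2).sum(axis=-1)
            cost = cost / cost.max()
            a = np.full(n, 1.0 / n)
            b = np.full(n, 1.0 / n)
            values.append(coupling_entropy(sinkhorn(a, b, cost, reg)))
        results[reg] = (float(np.mean(values)), float(np.std(values)))
    return results

results = ablate()
for reg, (mean, std) in results.items():
    print(f"reg={reg:<5} coupling entropy = {mean:.3f} +/- {std:.3f} (5 seeds)")
```

Stronger regularization spreads each source sample's mass over more target samples, so the entropy rises with `reg`. In a real ablation the quantity of interest would be retrieval performance rather than entropy, reported with the same mean and standard deviation layout.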

Circularity Check

0 steps flagged

No significant circularity detected

The paper's central contribution is an empirical Active Learning method (Transfer Sampling via Optimal Transport) whose performance claims are tied to experimental results on UAV datasets rather than any derivation that reduces to self-definition, fitted inputs renamed as predictions, or load-bearing self-citations. The approach relies on external components (pre-trained CNNs, Optimal Transport) and reports measured retrieval rates, making the argument self-contained against external benchmarks with no quoted steps that collapse by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the domain assumption that OT mappings in CNN activation space preserve useful ranking information from source to target, and on target positives being rare enough for a focus on confident samples to work.

axioms (1)
  • domain assumption: Optimal Transport can map regions between source and target CNN activations effectively for ranking transfer.
    Invoked to bridge domain shift and transfer source scores to target samples.

pith-pipeline@v0.9.0 · 5789 in / 1122 out tokens · 21385 ms · 2026-05-24T20:50:53.387165+00:00 · methodology

Reference graph

Works this paper leans on

31 extracted references · 31 canonical work pages · 2 internal anchors

  [1] A. Hodgson, N. Kelly, and D. Peel, “Unmanned aerial vehicles (UAVs) for surveying Marine Fauna: A dugong case study,” PLoS One, vol. 8, no. 11, pp. 1–15, 2013.
  [2] Z. Yang, T. Wang, A. K. Skidmore, J. De Leeuw, M. Y. Said, and J. Freer, “Spotting East African mammals in open savannah from space,” PLoS One, vol. 9, no. 12, pp. 1–16, 2014.
  [3] P. Bayliss and K. Yeomans, “Distribution and abundance of feral livestock in the ‘top end’ of the northern territory (1985-86), and their relation to population control,” Wildlife Research, vol. 16, no. 6, pp. 651–676, 1989.
  [4] M. Norton-Griffiths, Counting animals. Serengeti Ecological Monitoring Programme, African Wildlife Leadership Foundation, 1978, no. 1.
  [5] J. Linchant, J. Lisein, J. Semeki, P. Lejeune, and C. Vermeulen, “Are unmanned aircraft systems (UASs) the future of wildlife monitoring? A review of accomplishments and challenges,” Mammal Review, vol. 45, no. 4, pp. 239–252, 2015.
  [6] J. C. Hodgson, R. Mott, S. M. Baylis, T. T. Pham, S. Wotherspoon, A. D. Kilpatrick, R. Raja Segaran, I. Reid, A. Terauds, and L. P. Koh, “Drones count wildlife more accurately and precisely than humans,” Methods in Ecology and Evolution, vol. 2018, no. December 2017, pp. 1–8, 2018.
  [7] N. Rey, M. Volpi, S. Joost, and D. Tuia, “Detecting animals in African Savanna with UAVs and the crowds,” Remote Sensing of Environment, vol. 200, pp. 341–351, 2017.
  [8] B. Kellenberger, D. Marcos, and D. Tuia, “Detecting mammals in UAV images: Best practices to address a substantially imbalanced dataset with deep learning,” Remote Sensing of Environment, vol. 216, pp. 139–153, 2018.
  [9] S. Ren, K. He, R. Girshick, and J. Sun, “Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks,” Advances in Neural Information Processing Systems (NIPS), vol. 28, pp. 1–10, 2015.
  [10] J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, “You Only Look Once: Unified, Real-Time Object Detection,” IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
  [11] J. Redmon and A. Farhadi, “Yolo9000: better, faster, stronger,” IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.
  [12] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “ImageNet Classification with Deep Convolutional Neural Networks,” Advances in Neural Information Processing Systems (NIPS), pp. 1–9, 2012.
  [13] Y. LeCun, Y. Bengio, and G. Hinton, “Deep learning,” Nature, vol. 521, no. 7553, p. 436, 2015.
  [14] D. Tuia, C. Persello, and L. Bruzzone, “Domain adaptation for the classification of remote sensing data: An overview of recent advances,” IEEE Geoscience and Remote Sensing Magazine, vol. 4, no. 2, pp. 41–57, 2016.
  [15] D. Tuia, M. Volpi, L. Copa, M. Kanevski, and J. Munoz-Mari, “A Survey of Active Learning Algorithms for Supervised Remote Sensing Image Classification,” IEEE Journal of Selected Topics in Signal Processing, vol. 5, no. 3, pp. 606–617, 2011.
  [16] B. Settles, “Active learning,” Synthesis Lectures on Artificial Intelligence and Machine Learning, vol. 6, no. 1, pp. 1–114, 2012.
  [17] T. Luo, K. Kramer, D. B. Goldgof, L. O. Hall, S. Samson, A. Remsen, and T. Hopkins, “Active learning to recognize multiple types of plankton,” Journal of Machine Learning Research (JMLR), vol. 6, pp. 589–613, 2005.
  [18] G. Schohn and D. Cohn, “Less is more: Active learning with support vector machines,” in International Conference on Machine Learning (ICML), 2000, pp. 839–846.
  [19] W. Cai, Y. Zhang, and J. Zhou, “Maximizing expected model change for active learning in regression,” in IEEE International Conference on Data Mining (ICDM), 2013, pp. 51–60.
  [20] Y. Gal, R. Islam, and Z. Ghahramani, “Deep bayesian active learning with image data,” Advances in Neural Information Processing Systems (NIPS) workshop, 2017.
  [21] A. Santoro, S. Bartunov, M. Botvinick, D. Wierstra, and T. Lillicrap, “One-shot learning with memory-augmented neural networks,” arXiv preprint arXiv:1605.06065, 2016.
  [22] M. Cuturi, “Sinkhorn distances: Lightspeed computation of optimal transport,” in Advances in Neural Information Processing Systems (NIPS), 2013, pp. 2292–2300.
  [23] L. J. P. Van Der Maaten and G. E. Hinton, “Visualizing high-dimensional data using t-sne,” Journal of Machine Learning Research (JMLR), vol. 9, pp. 2579–2605, 2008.
  [24] C. Cortes and V. Vapnik, “Support vector machine,” Machine Learning (ML), vol. 20, no. 3, pp. 273–297, 1995.
  [25] N. Courty, R. Flamary, D. Tuia, and A. Rakotomamonjy, “Optimal transport for domain adaptation,” IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), vol. 39, no. 9, pp. 1853–1865, 2017.
  [26] F. Ofli, P. Meier, M. Imran, C. Castillo, D. Tuia, N. Rey, J. Briant, P. Millet, F. Reinhard, M. Parkan et al., “Combining human computing and machine learning to make sense of big (aerial) data for disaster response,” Big Data, vol. 4, no. 1, pp. 47–59, 2016.
  [27] K. He, X. Zhang, S. Ren, and J. Sun, “Deep Residual Learning for Image Recognition,” IEEE Conference on Computer Vision and Pattern Recognition (CVPR), vol. 7, no. 3, pp. 171–180, 2015.
  [28] O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein, A. C. Berg, and L. Fei-Fei, “ImageNet Large Scale Visual Recognition Challenge,” International Journal of Computer Vision (IJCV), vol. 115, no. 3, pp. 211–252, 2015.
  [29] D. Ulyanov and A. Vedaldi, “Instance Normalization: The Missing Ingredient for Fast Stylization,” arXiv preprint arXiv:1607.08022v3, 2016.
  [30] C.-C. Kao, T.-Y. Lee, P. Sen, and M.-Y. Liu, “Localization-Aware Active Learning for Object Detection,” Asian Conference on Computer Vision (ACCV), 2018.
  [31] D. Tuia and J. Munoz-Mari, “Learning user’s confidence for active learning,” IEEE Transactions on Geoscience and Remote Sensing (TGRS), vol. 51, no. 2, pp. 872–880, 2013.