pith. sign in

arxiv: 2607.00804 · v1 · pith:UVRPSH3Dnew · submitted 2026-07-01 · 💻 cs.CV

Spotted: Location-informed Reidentification of Hyenas and Leopards in Camera Trap Surveys

Pith reviewed 2026-07-02 14:01 UTC · model grok-4.3

classification 💻 cs.CV
keywords animal re-identificationcamera trapsspatio-temporal constraintshuman-in-the-loophyenaleopardcomputer vision
0
0 comments X

The pith

Integrating camera-trap locations with visual similarity improves re-identification of spotted hyenas and leopards while reducing expert comparisons.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that re-identification of individual animals from camera-trap photos stays difficult because of poor image quality, varying light, and uneven numbers of photos per animal. Most prior work uses only the pictures themselves. This work adds the known positions and times of the cameras to create a feasibility score based on the slowest speed an animal would need to travel between two sites. That score supplies pseudo-labels to adapt a visual model, then combines with the visual scores for final matches. An active sampling step further limits how many pairs a human expert must inspect. On three new datasets the approach raises top-5 accuracy and cuts the review load by up to 69 percentage points.

Core claim

Spotted computes a feasibility score from the minimum travel speed required for two detections to belong to the same individual, uses these scores as pseudo-supervision to train a lightweight head on a frozen visual foundation model, fuses the adapted visual similarity with the feasibility score into a pairwise matching score, and applies an active pair sampling strategy that prioritizes uncertain predictions for human review.

What carries the argument

The minimum-travel-speed feasibility score derived from camera locations, used both as pseudo-supervision for model adaptation and as a direct term in the fused matching score.

Load-bearing premise

That the minimum travel speed between camera sites gives a reliable signal for whether two sightings can belong to one animal.

What would settle it

On a new camera-trap dataset, adding the feasibility score produces no gain in top-5 accuracy or the active sampler returns fewer positive matches than random sampling.

Figures

Figures reproduced from arXiv: 2607.00804 by Abhinav Valada, Andrea Sibanda, Andrew Loveridge, Bob Mandinyenya, Daniele De Martini, Halil Sina Kelebek, Julia Hindel, Justin Seymour-Smith, Kobus Hoffman, Kudakwashe Ncube, Lauren Hoffman, Matthew Wijers.

Figure 1
Figure 1. Figure 1: Overview of the proposed Spotted framework. Camera-trap images are encoded using a frozen ReID backbone and refined via a network head trained on location-derived pseudo-labels. The adapted visual similarities are fused with spatio-temporal feasibility scores and used within an active sampling loop for efficient human-in-the-loop labeling. same individual. As such, an excessively high minimum speed indicat… view at source ↗
Figure 2
Figure 2. Figure 2: Quantitative and qualitative overview of our three animal ReID datasets. (a–c) Sighting frequency distributions for LeopardID102, SpottedHyenaID109, and SpottedHyenaID415 datasets, highlighting the long-tailed imbalance in observations per individual. (d) Sample images illustrating our preprocessing pipeline for each dataset. of this backbone, we train a lightweight MLP fθ consisting of a down-projection t… view at source ↗
Figure 3
Figure 3. Figure 3: Spatio-temporal feasibility score as a function of normalised distance ∆x˜. shown for multiple values of ∆t˜. Larger ∆t˜ results in slower decay with distance, reflecting increased tolerance to spatial displacement over longer time intervals. feasibility pairs provide a stronger constraint than high feasibility pairs which do not guarantee identity correspondence. LWMSE = λp |P+| ∑ (i, j)∈P+ (Sim(i, j)−1) … view at source ↗
Figure 4
Figure 4. Figure 4: Top-k accuracy curves for our three datasets. The top-k accuracy is computed as the proportion of query images for which at-least one true match is retrieved within the top-k highest scoring image pairs. (a-c) MegaDescriptor is used as the frozen image backbone. (d-f) MiewID is used as the frozen image backbone. We evaluate our approach against three baselines, HotSpotter5 , MegaDescriptor11, and MiewID10.… view at source ↗
Figure 5
Figure 5. Figure 5: Similarity matrices between all pairs of images within the SpottedHyenaID109 dataset. The data has been ordered by animal id [PITH_FULL_IMAGE:figures/full_fig_p008_5.png] view at source ↗
Figure 6
Figure 6. Figure 6: Cumulative number of confirmed positive matches discovered as a function of annotation queries for our three datasets. We compare greedy sampling against our SPAL-inspired active sampling with and without the confidence sampling step. MegaDescriptor, MiewID and our Spotted similarities are compared. (a-c) MegaDescriptor (MD) is used as the frozen image backbone. (d-f) MiewID (MI) is used as the frozen imag… view at source ↗
Figure 7
Figure 7. Figure 7: Number of remaining unlabelled image pairs as a function of annotation queries for our three datasets. (a-c) MegaDescriptor (MD) is used as the frozen image backbone. (d-f) MiewID (MI) is used as the frozen image backbone. Faster reduction indicates improved annotation efficiency. highlighting that small improvements in ranking quality can lead to considerable gains in annotation efficiency. The impact of … view at source ↗
read the original abstract

Animal re-identification (ReID) in camera-trap surveys remains challenging due to low image quality, strong variation in illumination and viewpoint, and highly imbalanced numbers of observations per individual. As a result, current ReID performance is often insufficient for fully automated use, and practical workflows typically depend on expert review of algorithmically proposed candidate matches. Moreover, most existing approaches focus almost exclusively on visual cues and overlook auxiliary information routinely available in field studies, such as image timestamps and camera-trap locations. We introduce Spotted, a location-informed, human-in-the-loop animal ReID framework that integrates visual similarity with spatio-temporal feasibility priors derived from camera locations, thereby reducing the amount of required expert review. Our method (i) computes an image-model-agnostic feasibility score based on the minimum travel speed required for two detections to correspond to the same individual, (ii) uses these feasibility cues as pseudo-supervision to train a lightweight head on top of a frozen visual foundation model, and (iii) fuses adapted visual similarity with spatio-temporal feasibility to obtain a robust pairwise matching score. We additionally integrate an active pair sampling strategy to accelerate annotation by initially prioritizing uncertain predictions. We evaluate Spotted on three challenging camera-trap ReID datasets comprised of spotted hyenas and leopards, which we release as part of this work. Our model improves average top-5 identification accuracy by 9pp, 2pp and 9pp over the best baseline on our LeopardID102, SpottedHyenaID109 and SpottedHyenaID415 datasets, respectively. Further, we show that our human-in-the-loop strategy reduces the number of queried comparisons by up to 69pp while achieving equivalent positive matches.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper introduces Spotted, a location-informed human-in-the-loop framework for animal re-identification (ReID) in camera-trap surveys. It computes a spatio-temporal feasibility score from camera locations and minimum travel speed, uses this as pseudo-supervision to train a lightweight adaptation head on a frozen visual foundation model, fuses the adapted visual similarity with the feasibility prior for pairwise matching, and applies active pair sampling to prioritize uncertain predictions for expert review. The method is evaluated on three new released datasets (LeopardID102, SpottedHyenaID109, SpottedHyenaID415), claiming average top-5 accuracy gains of 9pp, 2pp, and 9pp over the best baseline, plus up to 69pp reduction in queried comparisons while maintaining equivalent positive matches.

Significance. If the reported gains are robust, the work has practical significance for scaling camera-trap surveys by reducing expert annotation burden through integration of routinely available location data. Dataset release is a clear strength. The human-in-the-loop active sampling component is a positive contribution if it preserves recall. However, the central adaptation mechanism rests on an unverified assumption about pseudo-supervision quality, which limits the strength of the significance claim until addressed.

major comments (2)
  1. [§3.2] §3.2 (Pseudo-supervision and adaptation): The manuscript does not report the precision of the feasibility-based pseudo-positive pairs used to train the lightweight head. Because feasible pairs (based on minimum travel speed) are expected to be overwhelmingly negatives in multi-individual data, low precision would mean the adaptation step primarily injects noise; the reported 2–9 pp gains and the human-in-the-loop savings would then depend almost entirely on the subsequent fusion step rather than the claimed adaptation. An ablation or precision estimate on held-out labeled pairs is required to substantiate the load-bearing claim.
  2. [§4] §4 (Experiments): No details are provided on how the three datasets were split for training the adaptation head versus evaluation, nor on whether the feasibility pseudo-labels were generated only on training data or leaked into test pairs. This directly affects whether the top-5 improvements can be attributed to the method rather than data leakage or optimistic evaluation.
minor comments (2)
  1. [Abstract] Abstract and §1: The claimed improvements are stated without any mention of the number of individuals, images per individual, or baseline methods used; this makes the quantitative claims hard to contextualize without reading the full experimental section.
  2. [§3.3] §3.3 (Fusion): The exact form of the fused score (e.g., weighted sum, product, learned combination) is not specified with an equation; readers cannot reproduce the matching score without this detail.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback highlighting the need for additional validation of the pseudo-supervision mechanism and clearer experimental protocols. We address each major comment below and will revise the manuscript to incorporate the requested analyses and details.

read point-by-point responses
  1. Referee: [§3.2] §3.2 (Pseudo-supervision and adaptation): The manuscript does not report the precision of the feasibility-based pseudo-positive pairs used to train the lightweight head. Because feasible pairs (based on minimum travel speed) are expected to be overwhelmingly negatives in multi-individual data, low precision would mean the adaptation step primarily injects noise; the reported 2–9 pp gains and the human-in-the-loop savings would then depend almost entirely on the subsequent fusion step rather than the claimed adaptation. An ablation or precision estimate on held-out labeled pairs is required to substantiate the load-bearing claim.

    Authors: We agree that quantifying the precision of feasibility-based pseudo-positive pairs is necessary to substantiate the adaptation step. The current manuscript does not report this metric or include the requested ablation. In the revision we will compute precision on held-out labeled pairs from each dataset and add an ablation comparing performance with and without the adaptation head (i.e., fusion alone). This will clarify the relative contributions of adaptation versus fusion to the observed gains. revision: yes

  2. Referee: [§4] §4 (Experiments): No details are provided on how the three datasets were split for training the adaptation head versus evaluation, nor on whether the feasibility pseudo-labels were generated only on training data or leaked into test pairs. This directly affects whether the top-5 improvements can be attributed to the method rather than data leakage or optimistic evaluation.

    Authors: We acknowledge the omission of split and leakage details. The manuscript currently provides no explicit description of how training versus evaluation partitions were formed or whether pseudo-label generation was restricted to training data. We will expand §4 to specify the splitting strategy (by individual identity with temporal separation where applicable) and explicitly state that feasibility scores used for pseudo-supervision were computed solely on the training partition, with no access to test pairs. This will confirm that the reported improvements are not due to leakage. revision: yes

Circularity Check

0 steps flagged

No significant circularity in derivation chain

full rationale

The paper derives its ReID improvements from an independent spatio-temporal feasibility score computed directly from camera locations and timestamps (minimum travel speed between sites), which serves as external pseudo-supervision for a lightweight adaptation head on a frozen visual model. This input is model-agnostic and not derived from visual features or prior model outputs. Reported gains are empirical comparisons on released datasets against baselines, with no self-citations, fitted parameters renamed as predictions, or uniqueness theorems invoked. The human-in-the-loop sampling and fusion steps operate on these external priors without reducing the central claims to self-definition or construction from the target outputs.

Axiom & Free-Parameter Ledger

1 free parameters · 1 axioms · 0 invented entities

Review based on abstract only; detailed parameters and assumptions not extractable. The feasibility score relies on a domain assumption about animal movement speeds.

free parameters (1)
  • minimum travel speed threshold
    Used to compute feasibility score from camera locations and timestamps; specific value or fitting process not specified in abstract.
axioms (1)
  • domain assumption The minimum travel speed between two camera locations can be used as a proxy for whether two detections are the same individual
    Central to the feasibility score computation mentioned in the abstract.

pith-pipeline@v0.9.1-grok · 5897 in / 1316 out tokens · 50265 ms · 2026-07-02T14:01:36.879979+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

34 extracted references · 16 canonical work pages · 1 internal anchor

  1. [1]

    J., Flaherty, E

    Delisle, Z. J., Flaherty, E. A., Nobbe, M. R., Wzientek, C. M. & Swihart, R. K. Next-generation camera trapping: Systematic review of historic trends suggests keys to expanded research applications in ecology and conservation.Front. Ecol. Evol.V olume 9 - 2021, DOI: 10.3389/fevo.2021.617996 (2021)

  2. [2]

    & Picek, L

    Adam, L., ˇCerm´ak, V ., Papafitsoros, K. & Picek, L. Seaturtleid2022: A long-span dataset for reliable sea turtle re- identification. InProceedings of the IEEE/CVF winter conference on applications of computer vision, 7146–7156 (2024)

  3. [3]

    & Picek, L

    Adam, L., ˇCerm´ak, V ., Papafitsoros, K. & Picek, L. Wildlifereid-10k: Wildlife re-identification dataset with 10k individual animals. In2025 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 2090–2100 (IEEE, 2025)

  4. [4]

    Powell, R. A. & Mitchell, M. S. What is a home range?J. Mammal.93, 948–958, DOI: 10.1644/11-MAMM-S-177.1 (2012). https://academic.oup.com/jmammal/article-pdf/93/4/948/2899018/93-4-948.pdf

  5. [5]

    Distinctive image features from scale-invariant keypoints,

    Crall, J. P., Stewart, C. V ., Berger-Wolf, T. Y ., Rubenstein, D. I. & Sundaresan, S. R. Hotspotter—patterned species instance recognition. In2013 IEEE workshop on applications of computer vision (WACV), 230–237 (IEEE, 2013). 6.Lowe, D. G. Distinctive image features from scale-invariant keypoints.Int. J. Comput. Vis.60, 91–110, DOI: 10.1023/B: VISI.00000...

  6. [6]

    Guo, S.et al.Automatic identification of individual primates with deep learning techniques.iScience23, 101412, DOI: https://doi.org/10.1016/j.isci.2020.101412 (2020)

  7. [7]

    & Fua, P

    Yu, Y ., Vidit, V ., Davydov, A., Engilberge, M. & Fua, P. Addressing the elephant in the room: Robust animal re-identification with unsupervised part-based feature alignment.arXiv preprint arXiv:2405.13781(2024)

  8. [8]

    InProceedings of the IEEE/CVF International Conference on Computer Vision, 14369–14379 (2025)

    Hou, S.et al.Openanimals: Revisiting person re-identification for animals towards better generalization. InProceedings of the IEEE/CVF International Conference on Computer Vision, 14369–14379 (2025)

  9. [9]

    & Stewart, C

    Otarashvili, L., Subramanian, T., Holmberg, J., Levenson, J. & Stewart, C. V . Multispecies animal re-id using a large community-curated dataset.arXiv preprint arXiv:2412.05602(2024)

  10. [10]

    & Papafitsoros, K

    ˇCerm´ak, V ., Picek, L., Adam, L. & Papafitsoros, K. WildlifeDatasets: An Open-Source Toolkit for Animal Re-Identification. InProceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 5953–5963 (2024)

  11. [11]

    InEuropean conference on computer vision, 198–214 (Springer, 2022)

    Zhu, K.et al.Pass: Part-aware self-supervised pre-training for person re-identification. InEuropean conference on computer vision, 198–214 (Springer, 2022)

  12. [12]

    & Baktashmotlagh, M

    Moskvyak, O., Maire, F., Dayoub, F. & Baktashmotlagh, M. Keypoint-aligned embeddings for image retrieval and re-identification. InProceedings of the IEEE/CVF winter conference on applications of computer vision, 676–685 (2021)

  13. [13]

    & Baktashmotlagh, M

    Moskvyak, O., Maire, F., Dayoub, F. & Baktashmotlagh, M. Learning landmark guided embeddings for animal re- identification. InProceedings of the IEEE/CVF winter conference on applications of computer vision workshops, 12–19 (2020)

  14. [14]

    InProceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops(2019)

    Shukla, A.et al.A hybrid approach to tiger re-identification. InProceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops(2019)

  15. [15]

    & Weinberger, K

    Huang, G., Liu, Z., Van Der Maaten, L. & Weinberger, K. Q. Densely connected convolutional networks. InProceedings of the IEEE conference on computer vision and pattern recognition, 4700–4708 (2017)

  16. [16]

    InProceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 10012–10022 (2021)

    Liu, Z.et al.Swin transformer: Hierarchical vision transformer using shifted windows. InProceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 10012–10022 (2021). 11/13

  17. [17]

    & Zafeiriou, S

    Deng, J., Guo, J., Xue, N. & Zafeiriou, S. Arcface: Additive angular margin loss for deep face recognition. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)(2019)

  18. [18]

    Tan, M. & Le, Q. Efficientnetv2: Smaller models and faster training. In Meila, M. & Zhang, T. (eds.)Proceedings of the 38th International Conference on Machine Learning, vol. 139 ofProceedings of Machine Learning Research, 10096–10106 (PMLR, 2021)

  19. [19]

    Li, Y .et al.Metawild: A multimodal dataset for animal re-identification with environmental metadata. InProceedings of the 33rd ACM International Conference on Multimedia, MM ’25, 13009–13015, DOI: 10.1145/3746027.3758249 (Association for Computing Machinery, New York, NY , USA, 2025)

  20. [20]

    Kays, R., Hody, A., Jachowski, D. S. & Parsons, A. W. Empirical evaluation of the spatial scale and detection process of camera trap surveys.Mov. Ecol.9, 41, DOI: 10.1186/s40462-021-00277-3 (2021)

  21. [21]

    & Perona, P

    Mac Aodha, O., Cole, E. & Perona, P. Presence-only geographical priors for fine-grained image classification. In Proceedings of the IEEE/cvf international conference on computer vision, 9596–9606 (2019)

  22. [22]

    & Matas, J

    Picek, L., Neumann, L. & Matas, J. Animal identification with independent foreground and background modeling. In Cremers, D.et al.(eds.)Pattern Recognition, 241–257 (Springer Nature Switzerland, Cham, 2025)

  23. [23]

    J., Abu Baker, M., Tang, C

    Rosa, M. J., Abu Baker, M., Tang, C. M. & Littlemore, J. Gcn-id: A benchmark dataset for great crested newt re- identification using ai foundation models. InProceedings of the BMVC(2025)

  24. [24]

    & Lin, W

    Li, S., Li, J., Tang, H., Qian, R. & Lin, W. Atrw: A benchmark for amur tiger re-identification in the wild. InProceedings of the 28th ACM International Conference on Multimedia, MM ’20, 2590–2598, DOI: 10.1145/3394171.3413569 (Association for Computing Machinery, New York, NY , USA, 2020)

  25. [25]

    Panthera pardus csv custom export (2022)

    Botswana Predator Conservation Trust. Panthera pardus csv custom export (2022). Retrieved from African Carnivore Wildbook on 2022-04-28

  26. [26]

    W., Linquist, S

    Schneider, S., Taylor, G. W., Linquist, S. & Kremer, S. C. Past, present and future approaches using computer vision for animal re-identification from camera trap data.Methods Ecol. Evol.10, 461–470 (2019). 28.Osner, N. TrapTagger

  27. [27]

    & Anand, S

    Sani, D., Khurana, M. & Anand, S. Active learning for animal re-identification with ambiguity-aware sampling (2025). 2511.06658

  28. [28]

    S.et al.A deep active learning system for species identification and counting in camera trap images

    Norouzzadeh, M. S.et al.A deep active learning system for species identification and counting in camera trap images. Methods Ecol. Evol.12, 150–161, DOI: https://doi.org/10.1111/2041-210X.13504 (2021). https://besjournals.onlinelibrary. wiley.com/doi/pdf/10.1111/2041-210X.13504. 31.Jin, D. & Li, M. Towards fewer labels: Support pair active learning for pe...

  29. [29]

    & Tao, D

    Liu, Z., Wang, J., Gong, S., Lu, H. & Tao, D. Deep reinforcement active learning for human-in-the-loop person re-identification. InProceedings of the IEEE/CVF international conference on computer vision, 6122–6131 (2019)

  30. [30]

    K., Jin, R

    Mallapragada, P. K., Jin, R. & Jain, A. K. Active query selection for semi-supervised clustering. In2008 19th International Conference on Pattern Recognition, 1–4, DOI: 10.1109/ICPR.2008.4761792 (2008)

  31. [31]

    & Shi, Z

    Zhao, W., He, Q., Ma, H. & Shi, Z. Effective semi-supervised document clustering via active learning with instance-level constraints.Knowl. Inf. Syst.30, 569–587, DOI: 10.1007/s10115-011-0389-1 (2012)

  32. [32]

    Miao, Z.et al.Iterative human and automated identification of wildlife images.Nat. Mach. Intell.3, 885–895, DOI: 10.1038/s42256-021-00393-0 (2021)

  33. [33]

    Bodesheim, P.et al.Pre-trained models are not enough: active and lifelong learning is important for long-term visual monitoring of mammals in biodiversity research—individual identification and attribute prediction with image features from deep neural networks and decoupled decision models applied to elephants and great apes.Mammalian Biol.102, 875–897, D...

  34. [34]

    Grounded SAM: Assembling Open-World Models for Diverse Visual Tasks

    Perez, G., Sheldon, D., Van Horn, G. & Maji, S. Human-in-the-loop visual re-id for population size estimation. In European Conference on Computer Vision (ECCV)(2024). 38.Ren, T.et al.Grounded sam: Assembling open-world models for diverse visual tasks (2024). 2401.14159. Acknowledgements All figures were prepared by H.S.K. with assistance from J.H. 12/13 A...