arxiv: 1907.02642 · v1 · pith:SLKQI3JMnew · submitted 2019-07-03 · 💻 cs.CV · eess.IV

Primate Face Identification in the Wild

Pith reviewed 2026-05-25 10:03 UTC · model grok-4.3

classification 💻 cs.CV eess.IV
keywords primate face identification · pairwise loss · facial recognition · wildlife conservation · rhesus macaques · chimpanzees · open-set recognition · closed-set identification

The pith

Primate face identification improves by augmenting cross-entropy loss with a pairwise loss on image pairs.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes the PFID approach to identify individual wild primates from facial images taken in uncontrolled settings. It trains a network to classify identities while also distinguishing positive image pairs from negative ones. This combined loss is meant to produce features that remain effective despite pose changes, lighting shifts, occlusions, and small training sets. The method is evaluated on rhesus macaques and chimpanzees and reports better accuracy than earlier techniques across classification, verification, closed-set, and open-set protocols. A reader would care because the work targets a practical need for efficient, non-invasive population monitoring in conservation.

Core claim

The PFID loss augments the standard cross-entropy loss with a pairwise loss to learn more discriminative and generalizable features, making it suitable for related identification tasks such as open-set recognition, closed-set identification, and verification. State-of-the-art accuracy is reported on facial recognition of rhesus macaques and chimpanzees under the four protocols of classification, verification, closed-set identification, and open-set recognition.

What carries the argument

The PFID loss, which adds a pairwise term on positive and negative image pairs to the usual cross-entropy objective.
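
A minimal sketch of such a combined objective, in Python with NumPy. The abstract does not state the form of the pairwise term or its weight, so the contrastive-style penalty on embedding distances, the `margin`, and the weight `lam` below are assumptions for illustration, not the authors' formulation.

```python
import numpy as np

def cross_entropy(logits, labels):
    """Mean softmax cross-entropy over a batch of integer labels."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

def pairwise_term(emb_a, emb_b, same, margin=1.0):
    """Contrastive-style penalty (assumed form): pull positive pairs together,
    push negative pairs at least `margin` apart."""
    dist = np.linalg.norm(emb_a - emb_b, axis=1)
    pos = same * dist ** 2
    neg = (1 - same) * np.maximum(0.0, margin - dist) ** 2
    return (pos + neg).mean()

def combined_loss(logits, labels, emb_a, emb_b, same, lam=0.5):
    """Cross-entropy plus a weighted pairwise term; `lam` is a hypothetical weight."""
    return cross_entropy(logits, labels) + lam * pairwise_term(emb_a, emb_b, same)
```

Setting `lam=0` recovers plain cross-entropy, which is the baseline an ablation of the pairwise term would compare against.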

If this is right

  • State-of-the-art accuracy is achieved on rhesus macaques and chimpanzees for four identification protocols.
  • The learned features support open-set recognition, closed-set identification, and verification tasks.
  • The method directly targets the challenges of limited data and nuisance factors in wild images.
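
To make the closed-set and open-set protocols concrete, the sketch below matches probe embeddings against a gallery of known individuals. It assumes cosine similarity on learned features and a fixed rejection threshold; the function names, the `-1` label for unknown individuals, and the threshold value are illustrative choices, not details taken from the paper.

```python
import numpy as np

def cosine_similarity(probes, gallery):
    """Pairwise cosine similarity between probe rows and gallery rows."""
    p = probes / np.linalg.norm(probes, axis=1, keepdims=True)
    g = gallery / np.linalg.norm(gallery, axis=1, keepdims=True)
    return p @ g.T

def identify(probes, gallery, gallery_ids, threshold=None):
    """Closed-set: return the best-matching gallery identity for each probe.
    Open-set: if `threshold` is given, return -1 (unknown) whenever the best
    similarity falls below it."""
    sims = cosine_similarity(probes, gallery)
    best = sims.argmax(axis=1)
    ids = np.asarray(gallery_ids)[best]
    if threshold is not None:
        ids = np.where(sims.max(axis=1) >= threshold, ids, -1)
    return ids
```

The only difference between the two protocols here is the rejection step: closed-set identification always returns a gallery identity, while open-set recognition may decline to match.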

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same loss construction could be tested on other animal species that require individual tracking from camera-trap images.
  • Integration with existing camera networks might allow continuous, automated population estimates without repeated field visits.
  • The pairwise term's effectiveness likely depends on how representative the chosen image pairs are of real environmental variation.

Load-bearing premise

Training on positive and negative image pairs will produce features robust to pose, lighting, and occlusions even when training data is limited and environments are uncontrolled.

What would settle it

If a model trained only with standard cross-entropy loss matches or exceeds the PFID model's accuracy on the same primate test sets that contain large pose and lighting variation, the claimed benefit of the pairwise term would be refuted.

Figures

Figures reproduced from arXiv: 1907.02642 by Ankita Shukla, Gullal Singh Cheema, Qamar Qureshi, Saket Anand, Yadvendradev Jhala.

Figure 1: Example images showing primates in human shared space and crop raiding [source: google …
Figure 2: Illustration of proposed PFID loss function vs. the standard cross entropy loss on the …
Figure 3: Pose variations for one of the Rhesus Macaque (Top) and Chimpanzee (Below) from the …
Figure 4: CMC (Top) and TAR vs FAR (Bottom) plots for (Left) C-Zoo+CTai and (Right) Rhesus …
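
The Figure 4 caption refers to CMC and TAR vs FAR curves. As a rough illustration of what those curves measure, the sketch below computes a CMC curve from a probe-by-gallery similarity matrix and a true accept rate at a chosen false accept rate from genuine and impostor scores. The function names and the quantile-based threshold are assumptions for illustration; the paper's evaluation code may differ.

```python
import numpy as np

def cmc(sims, probe_ids, gallery_ids, max_rank):
    """Cumulative match characteristic: fraction of probes whose true identity
    appears within the top-k gallery matches, for k = 1..max_rank."""
    order = np.argsort(-sims, axis=1)                  # best match first
    ranked_ids = np.asarray(gallery_ids)[order]
    hits = ranked_ids == np.asarray(probe_ids)[:, None]
    first_hit = hits.argmax(axis=1)                    # rank index of the true match
    found = hits.any(axis=1)                           # probes with a mate in the gallery
    return np.array([(found & (first_hit < k)).mean() for k in range(1, max_rank + 1)])

def tar_at_far(genuine, impostor, far):
    """True accept rate at the threshold that admits about `far` of impostor pairs."""
    threshold = np.quantile(impostor, 1.0 - far)
    return (np.asarray(genuine) > threshold).mean()
```

Sweeping `far` over a range of values and plotting the resulting rates gives a TAR vs FAR curve; the CMC curve is the returned array plotted against rank.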

Abstract

Ecological imbalance owing to rapid urbanization and deforestation has adversely affected the population of several wild animals. This loss of habitat has skewed the population of several non-human primate species like chimpanzees and macaques and has constrained them to co-exist in close proximity to human settlements, often leading to human-wildlife conflicts while competing for resources. For effective wildlife conservation and conflict management, regular monitoring of population and of conflicted regions is necessary. However, existing approaches like field visits for data collection and manual analysis by experts are resource-intensive, tedious, and time-consuming, thus necessitating an automated, non-invasive, more efficient alternative like image-based facial recognition. The challenge in individual identification arises from unrelated factors like pose, lighting variations, and occlusions in uncontrolled environments, and is further exacerbated by limited training data. Inspired by human perception, we propose to learn representations that are robust to such nuisance factors and capture the notion of similarity over the individual identity sub-manifolds. The proposed approach, Primate Face Identification (PFID), achieves this by training the network to distinguish between positive and negative pairs of images. The PFID loss augments the standard cross-entropy loss with a pairwise loss to learn more discriminative and generalizable features, thus making it appropriate for other related identification tasks like open-set recognition, closed-set identification, and verification. We report state-of-the-art accuracy on facial recognition of two primate species, rhesus macaques and chimpanzees, under the four protocols of classification, verification, closed-set identification, and open-set recognition.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript proposes the Primate Face Identification (PFID) method, which augments standard cross-entropy loss with a pairwise loss term to learn discriminative and generalizable features for individual primate face recognition in uncontrolled environments. It targets challenges of pose, lighting, and occlusions with limited data and claims state-of-the-art results on rhesus macaques and chimpanzees across four protocols: classification, verification, closed-set identification, and open-set recognition.

Significance. If the empirical results and robustness claims hold after proper validation, the work could support automated, non-invasive tools for primate population monitoring and wildlife conservation. The loss combination itself follows established supervised metric-learning patterns, so novelty would rest on the primate-specific application and any demonstrated gains on the four protocols.

major comments (2)
  1. [Abstract] The assertion of 'state-of-the-art accuracy' on four protocols supplies no numerical results, baseline comparisons, dataset statistics (images per identity, total identities), or ablation isolating the pairwise term, rendering the central empirical claim unverifiable.
  2. [Abstract] The claim that the pairwise loss produces representations robust to pose, lighting, and occlusions requires that positive pairs explicitly span those intra-identity variations; the text provides neither pair-sampling details nor per-identity image counts to establish this condition.
minor comments (1)
  1. [Abstract] The phrase 'inspired by human perception' is stated without elaboration on the concrete mapping to the loss or architecture.

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed feedback. The comments highlight opportunities to strengthen the abstract's clarity and verifiability. We address each point below and will incorporate revisions to include additional quantitative details and methodological clarifications where the manuscript body already contains supporting information.

  1. Referee: [Abstract] The assertion of 'state-of-the-art accuracy' on four protocols supplies no numerical results, baseline comparisons, dataset statistics (images per identity, total identities), or ablation isolating the pairwise term, rendering the central empirical claim unverifiable.

    Authors: We agree that the abstract would be strengthened by including key numerical results, baseline comparisons, and dataset statistics to allow immediate verification of the central claims. The full manuscript reports these in the Experiments section (including per-protocol accuracies, comparisons to standard cross-entropy baselines, total identities, average images per identity, and ablations isolating the pairwise term). In revision we will condense the most salient figures and statistics into the abstract while respecting its length constraints.

    Revision: yes

  2. Referee: [Abstract] The claim that the pairwise loss produces representations robust to pose, lighting, and occlusions requires that positive pairs explicitly span those intra-identity variations; the text provides neither pair-sampling details nor per-identity image counts to establish this condition.

    Authors: The manuscript states that the datasets were collected in uncontrolled environments and that positive pairs are formed from images of the same individual. To make the robustness argument explicit in the abstract, we will add concise statements on pair construction (random sampling of same-identity images that naturally include pose/lighting/occlusion variation) and report the per-identity image counts already tabulated in the dataset description section. This does not require new experiments, only clearer exposition.

    Revision: yes
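
The pair construction described in the second response, random sampling of same-identity images for positives, can be sketched as follows. The alternation between positive and negative pairs, the fixed seed, and the function name `sample_pairs` are assumptions for illustration; the simulated rebuttal does not specify how negatives are drawn or how the two kinds of pair are balanced.

```python
import random
from collections import defaultdict

def sample_pairs(labels, n_pairs, seed=0):
    """Return (index_a, index_b, same) triples, alternating positive and negative pairs.
    Requires at least two identities, one of which has two or more images."""
    rng = random.Random(seed)
    by_id = defaultdict(list)
    for idx, label in enumerate(labels):
        by_id[label].append(idx)
    ids = list(by_id)
    multi = [l for l in ids if len(by_id[l]) >= 2]  # identities usable for positives
    pairs = []
    for k in range(n_pairs):
        if k % 2 == 0:
            # Positive pair: two distinct images of one randomly chosen individual.
            a, b = rng.sample(by_id[rng.choice(multi)], 2)
            pairs.append((a, b, 1))
        else:
            # Negative pair: one image each from two different individuals.
            id_a, id_b = rng.sample(ids, 2)
            pairs.append((rng.choice(by_id[id_a]), rng.choice(by_id[id_b]), 0))
    return pairs
```

Identities with a single image can never contribute a positive pair under this scheme, which is one reason per-identity image counts matter for the referee's second point.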

Circularity Check

0 steps flagged

No circularity; standard supervised loss on held-out evaluation

The paper presents PFID as an augmentation of cross-entropy by a pairwise term to encourage discriminative features, then reports empirical accuracies on four protocols using held-out splits. No equation reduces a claimed prediction to a fitted parameter by construction, no uniqueness theorem is imported from self-citation, and no ansatz is smuggled via prior work. The central claim that the loss yields robustness is an empirical assertion, not a definitional identity, so the derivation chain remains independent of its inputs.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The central claim rests on standard deep-learning assumptions plus the untested premise that the pairwise term yields robustness under the stated nuisance factors. No new physical entities or ad-hoc constants beyond typical loss weighting are introduced.

free parameters (1)
  • pairwise loss weight
    Balance between cross-entropy and pairwise terms must be chosen; not specified in abstract.
axioms (1)
  • domain assumption: Deep networks trained with pairwise similarity objectives learn features invariant to common imaging nuisances when data are limited.
    Invoked to justify why the PFID loss should generalize to open-set and verification tasks.

pith-pipeline@v0.9.0 · 5812 in / 1265 out tokens · 49139 ms · 2026-05-25T10:03:25.272093+00:00 · methodology

Reference graph

Works this paper leans on

26 extracted references · 26 canonical work pages · 3 internal anchors

  1. [1]

    Anand, S., Radhakrishna, S.: Investigating trends in human-wildlife conflict: is conflict escalation real or imagined? Journal of Asia-Pacific Biodiversity 10(2), 154–161 (2017)

  2. [2]

    Anderson, C.J., Johnson, S.A., Hostetler, M.E., Summers, M.G.: History and status of introduced rhesus macaques (macaca mulatta) in silver springs state park, florida (2016), http://edis.ifas.ufl.edu/uw412

  3. [3]

    Brahma, P.P., Wu, D., She, Y.: Why deep learning works: A manifold disentanglement perspective. IEEE Transactions on Neural Networks and Learning Systems 27, 1997–2008 (2016)

  4. [4]

    Brust, C.A., Burghardt, T., Groenenberg, M., Käding, C., Kühl, H.S., Manguette, M.L., Denzler, J.: Towards automated visual monitoring of individual gorillas in the wild. In: CVPR. pp. 2820–2830 (2017)

  5. [5]

    Cabral, S.J., Prasad, T., Deeyagoda, T.P., Weerakkody, S.N., Nadarajah, A., Rudran, R.: Investigating sri lanka’s human-monkey conflict and developing a strategy to mitigate the problem. Journal of Threatened Taxa 10(3), 11391–11398 (2018)

  6. [6]

    Crouse, D., Jacobs, R.L., Richardson, Z., Klum, S., Jain, A., Baden, A.L., Tecot, S.R.: Lemurfaceid: a face recognition system to facilitate individual identification of lemurs. BMC Zoology 2(1), 2 (2017)

  7. [7]

    Deb, D., Wiper, S., Russo, A., Gong, S., Shi, Y., Tymoszek, C., Jain, A.: Face recognition: Primates in the wild. arXiv preprint arXiv:1804.08790 (2018)

  8. [8]

    Deng, J., Guo, J., Xue, N., Zafeiriou, S.: Arcface: Additive angular margin loss for deep face recognition. arXiv preprint arXiv:1801.07698 (2018)

  9. [9]

    Freytag, A., Rodner, E., Simon, M., Loos, A., Kühl, H.S., Denzler, J.: Chimpanzee faces in the wild: Log-euclidean cnns for predicting identities and attributes of primates. In: German Conference on Pattern Recognition. pp. 51–63. Springer (2016)

  10. [10]

    He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition. pp. 770–778 (2016)

  11. [11]

    Iandola, F., Moskewicz, M., Karayev, S., Girshick, R., Darrell, T., Keutzer, K.: Densenet: Implementing efficient convnet descriptor pyramids. arXiv preprint arXiv:1404.1869 (2014)

  12. [12]

    Liu, W., Wen, Y., Yu, Z., Li, M., Raj, B., Song, L.: Sphereface: Deep hypersphere embedding for face recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition. pp. 212–220 (2017)

  13. [13]

    Loos, A., Ernst, A.: Detection and identification of chimpanzee faces in the wild. In: Multimedia (ISM), 2012 IEEE International Symposium on. pp. 116–119. IEEE (2012)

  14. [14]

    Lu, H.M., Fainman, Y., Hecht-Nielsen, R.: Image manifolds. In: Applications of Artificial Neural Networks in Image Processing III. vol. 3307, pp. 52–64. International Society for Optics and Photonics (1998)

  15. [15]

    Murphy-Chutorian, E., Trivedi, M.M.: Head pose estimation in computer vision: A survey. IEEE transactions on pattern analysis and machine intelligence 31(4), 607–626 (2008)

  16. [16]

    Nyhus, P.J.: Human–wildlife conflict and coexistence. Annual Review of Environment and Resources 41(1), 143–171 (2016)

  17. [17]

    Redmon, J., Divvala, S., Girshick, R., Farhadi, A.: You only look once: Unified, real-time object detection. In: CVPR. pp. 779–788 (2016)

  18. [18]

    Ren, S., He, K., Girshick, R., Sun, J.: Faster r-cnn: Towards real-time object detection with region proposal networks. In: NIPS. pp. 91–99 (2015)

  19. [19]

    Saraswat, R., Sinha, A., Radhakrishna, S.: A god becomes a pest? human-rhesus macaque interactions in himachal pradesh, northern india. European Journal of Wildlife Research 61(3), 435–443 (Jun 2015)

  20. [20]

    Schroff, F., Kalenichenko, D., Philbin, J.: Facenet: A unified embedding for face recognition and clustering. In: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (June 2015)

  21. [21]

    Sun, Y., Chen, Y., Wang, X., Tang, X.: Deep learning face representation by joint identification-verification. In: Advances in neural information processing systems. pp. 1988–1996 (2014)

  22. [22]

    Wang, F., Cheng, J., Liu, W., Liu, H.: Additive margin softmax for face verification. IEEE Signal Processing Letters 25(7), 926–930 (July 2018)

  23. [23]

    Wen, Y., Zhang, K., Li, Z., Qiao, Y.: A discriminative feature learning approach for deep face recognition. In: European Conference on Computer Vision. pp. 499–515. Springer (2016)

  24. [24]

    Witham, C.L.: Automated face recognition of rhesus macaques. Journal of neuroscience methods (2017)

  25. [25]

    Wright, J., Yang, A.Y., Ganesh, A., Sastry, S.S., Ma, Y.: Robust face recognition via sparse representation. IEEE transactions on pattern analysis and machine intelligence 31(2), 210–227 (2009)

  26. [26]

    Yi, D., Lei, Z., Liao, S., Li, S.Z.: Learning face representation from scratch. arXiv preprint arXiv:1411.7923 (2014)