The iWildCam 2019 Challenge Dataset

Dan Morris; Pietro Perona; Sara Beery

arxiv: 1907.07617 · v1 · pith:H227ZEAZnew · submitted 2019-07-15 · 💻 cs.CV

Pith reviewed 2026-05-24 21:12 UTC · model grok-4.3

classification 💻 cs.CV

keywords camera traps · species classification · domain generalization · wildlife monitoring · transfer learning · benchmark dataset · iWildCam

The pith

The iWildCam 2019 dataset trains species classifiers on Southwest camera traps and tests them on Northwest data with partial species overlap.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a benchmark for wildlife species classification that requires models to handle geographic domain shift. Training images come from the American Southwest while test images are from the Northwest, with some shared species and others unique to each region. Auxiliary data from iNaturalist and simulated images are provided to support transfer learning. This setup lets researchers measure how well current methods generalize to new environments rather than just new images from familiar places. Biologists could use improved models to monitor biodiversity across wider areas without collecting new labeled data everywhere.

Core claim

The iWildCam 2019 Challenge provides Caltech Camera Traps data from the Southwest as training, a new IDFG dataset from the Northwest as test with partial class overlap, and allows use of iNaturalist and TrapCam-AirSim for filling species gaps through transfer learning.

What carries the argument

The cross-region split between Southwest training and Northwest test sets, which creates a domain shift test for species classification while permitting auxiliary transfer learning sources.

Load-bearing premise

The specific differences between American Southwest and Northwest camera trap locations represent a meaningful and general test of domain shift for species classification.

What would settle it

A model achieving similar accuracy on the Northwest test set as on a held-out Southwest test set would indicate that the geographic split does not create a substantial domain shift.
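The comparison described above can be sketched in a few lines. This is a minimal illustration, not the challenge's evaluation protocol: the species labels and predictions are made-up toy values, and the helper names `accuracy` and `domain_gap` are hypothetical.

```python
from typing import Sequence


def accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Fraction of predictions that match the ground-truth labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("labels and predictions must have the same length")
    if not y_true:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def domain_gap(in_domain_acc: float, out_domain_acc: float) -> float:
    """Accuracy lost when moving from the training region to the new region."""
    return in_domain_acc - out_domain_acc


# Toy, invented data standing in for a held-out Southwest set
# and the Northwest test set (assumption: not real challenge data).
sw_true = ["coyote", "deer", "rabbit", "deer"]
sw_pred = ["coyote", "deer", "rabbit", "coyote"]
nw_true = ["deer", "elk", "coyote", "elk"]
nw_pred = ["deer", "deer", "coyote", "deer"]

sw_acc = accuracy(sw_true, sw_pred)
nw_acc = accuracy(nw_true, nw_pred)
gap = domain_gap(sw_acc, nw_acc)
print(f"held-out Southwest: {sw_acc:.2f}, Northwest: {nw_acc:.2f}, gap: {gap:.2f}")
```

A gap near zero on real data would suggest the geographic split adds little domain shift; a large positive gap would support the premise. Plain accuracy is used here only for simplicity.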

Figures

Figures reproduced from arXiv: 1907.07617 by Dan Morris, Pietro Perona, Sara Beery.

**Figure 2.** Alternate domain examples. (Left) iNaturalist, (Right) TrapCam-AirSim (1) Illumination (2) Blur (3) ROI Size (4) Occlusion (5) Camouflage (6) Perspective

**Figure 4.** Number of annotations for each location. (Top) CCT locations, containing 14 classes. (Bottom) IDFG locations, containing images of 8 classes. The distribution of images per location is long-tailed, and each location has a different and peculiar class distribution.

[…] an initial learning rate of 0.0045, rmsprop with a momentum of 0.9, and a square input resolution of 299. We employed random cropping (conta…

Original abstract

Camera Traps (or Wild Cams) enable the automatic collection of large quantities of image data. Biologists all over the world use camera traps to monitor biodiversity and population density of animal species. The computer vision community has been making strides towards automating the species classification challenge in camera traps, but as we try to expand the scope of these models from specific regions where we have collected training data to different areas we are faced with an interesting problem: how do you classify a species in a new region that you may not have seen in previous training data? In order to tackle this problem, we have prepared a dataset and challenge where the training data and test data are from different regions, namely the American Southwest and the American Northwest. We use the Caltech Camera Traps dataset, collected from the American Southwest, as training data. We add a new dataset from the American Northwest, curated from data provided by the Idaho Department of Fish and Game (IDFG), as our test dataset. The test data has some class overlap with the training data: some species are found in both datasets, but there are both species seen during training that are not seen during test and vice versa. To help fill the gaps in the training species, we allow competitors to utilize transfer learning from two alternate domains: human-curated images from iNaturalist and synthetic images from Microsoft's TrapCam-AirSim simulation environment.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 2 minor

Summary. The manuscript introduces the iWildCam 2019 Challenge Dataset to support research on species classification in camera trap images under geographic domain shift. Training data is drawn from the Caltech Camera Traps collection in the American Southwest; the test set is a new collection from the Idaho Department of Fish and Game (IDFG) in the American Northwest that exhibits partial class overlap with the training set. Competitors are permitted to use auxiliary data from iNaturalist (human-curated images) and TrapCam-AirSim (synthetic images) to address gaps in species coverage.

Significance. If the described sources, splits, and auxiliary-data policy are implemented as stated, the release supplies a concrete, reproducible benchmark for domain generalization in wildlife camera-trap classification. The explicit construction of a train-test geographic shift together with controlled class overlap and permitted transfer-learning sources directly targets a practical limitation of existing camera-trap models and should facilitate comparable evaluation across methods.

minor comments (2)

[Abstract] The statement that there are 'species seen during training that are not seen during test and vice versa' would be strengthened by reporting the exact numbers of shared, training-only, and test-only species (or a reference to a table that supplies these counts).
The manuscript would benefit from an explicit statement of the total number of images and classes in each split and the precise protocol used to curate the IDFG test set (e.g., filtering criteria, annotation process).
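The counts requested in the first comment follow directly from the label lists of the two splits. The sketch below shows one way to derive them with set operations; the species names are invented placeholders rather than the actual CCT or IDFG classes, and `class_overlap` is a hypothetical helper.

```python
from typing import Dict, Iterable, List


def class_overlap(train_labels: Iterable[str],
                  test_labels: Iterable[str]) -> Dict[str, List[str]]:
    """Partition class names into shared, training-only, and test-only."""
    train, test = set(train_labels), set(test_labels)
    return {
        "shared": sorted(train & test),      # seen in both regions
        "train_only": sorted(train - test),  # never appear at test time
        "test_only": sorted(test - train),   # unseen during training
    }


# Toy, invented per-image labels (assumption: not real dataset contents).
train = ["coyote", "deer", "rabbit", "deer"]
test = ["deer", "elk", "coyote"]

overlap = class_overlap(train, test)
for group, names in overlap.items():
    print(f"{group}: {len(names)} classes {names}")
```

The `test_only` group is the part that auxiliary sources such as iNaturalist or TrapCam-AirSim would need to cover, since the training region supplies no examples for those classes.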

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for their positive summary, significance assessment, and recommendation to accept the manuscript. The report accurately captures the dataset construction, geographic shift, partial class overlap, and auxiliary data policy.

Circularity Check

0 steps flagged

No significant circularity

full rationale

This is a dataset release paper containing no derivations, equations, predictions, fitted parameters, or model claims. The central contribution is the explicit construction of training/test splits (Caltech Camera Traps Southwest as train, IDFG Northwest as test with partial overlap, plus iNaturalist and AirSim auxiliaries) to address domain shift; this holds by definition of the data sources and splits described in the abstract and full text, with no internal reduction or self-citation chain required to support any asserted result.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

This is a dataset description paper with no mathematical model, fitted parameters, or theoretical derivations.

pith-pipeline@v0.9.0 · 5775 in / 1117 out tokens · 24078 ms · 2026-05-24T21:12:58.685389+00:00 · methodology

Reference graph

Works this paper leans on

17 extracted references · 17 canonical work pages · 3 internal anchors

[1] S. Beery, Y. Liu, D. Morris, J. Piavis, A. Kapoor, M. Meister, and P. Perona. Synthetic examples improve generalization for rare classes. arXiv preprint arXiv:1904.05916, 2019.

[2] S. Beery, G. Van Horn, O. Mac Aodha, and P. Perona. The iWildCam 2018 challenge dataset. arXiv preprint arXiv:1904.05986, 2019.

[3] S. Beery, G. Van Horn, and P. Perona. Recognition in terra incognita. In Proceedings of the European Conference on Computer Vision (ECCV), pages 456–473, 2018.

[4] G. Chen, T. X. Han, Z. He, R. Kays, and T. Forrester. Deep convolutional neural network based species recognition for wild animal monitoring. In Image Processing (ICIP), 2014 IEEE International Conference on, pages 858–862. IEEE, 2014.

[5] J.-H. Giraldo-Zuluaga, A. Salazar, A. Gomez, and A. Diaz-Pulido. Camera-trap images segmentation using multi-layer robust principal component analysis. The Visual Computer, pages 1–13, 2017.

[6] K.-H. Lin, P. Khorrami, J. Wang, M. Hasegawa-Johnson, and T. S. Huang. Foreground object detection in highly dynamic scenes using saliency. In Image Processing (ICIP), 2014 IEEE International Conference on, pages 1125–1129. IEEE, 2014.

[7] A. Miguel, S. Beery, E. Flores, L. Klemesrud, and R. Bayrakcismith. Finding areas of motion in camera trap images. In Image Processing (ICIP), 2016 IEEE International Conference on, pages 1334–1338. IEEE, 2016.

[8] M. S. Norouzzadeh, A. Nguyen, M. Kosmala, A. Swanson, C. Packer, and J. Clune. Automatically identifying wild animals in camera trap images with deep learning. arXiv preprint arXiv:1703.05830, 2017.

[9] X. Ren, T. X. Han, and Z. He. Ensemble video object cut in highly dynamic scenes. In Computer Vision and Pattern Recognition (CVPR), 2013 IEEE Conference on, pages 1947–1954. IEEE, 2013.

[10] A. Swanson, M. Kosmala, C. Lintott, R. Simpson, A. Smith, and C. Packer. Snapshot Serengeti, high-frequency annotated camera trap images of 40 mammalian species in an African savanna. Scientific Data, 2:150026, 2015.

[11] G. Van Horn, O. Mac Aodha, Y. Song, Y. Cui, C. Sun, A. Shepard, H. Adam, P. Perona, and S. Belongie. The iNaturalist species classification and detection dataset. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 8769–8778, 2018.

[12] A. G. Villa, A. Salazar, and F. Vargas. Towards automatic wild animal monitoring: Identification of animal species in camera-trap images using very deep convolutional neural networks. Ecological Informatics, 41:24–32, 2017.

[13] M. J. Wilber, W. J. Scheirer, P. Leitner, B. Heflin, J. Zott, D. Reinke, D. K. Delaney, and T. E. Boult. Animal recognition in the Mojave Desert: Vision tools for field biologists. In Applications of Computer Vision (WACV), 2013 IEEE Workshop on, pages 206–213. IEEE, 2013.

[14] H. Yousif, J. Yuan, R. Kays, and Z. He. Fast human-animal detection from highly cluttered camera-trap images using joint background modeling and deep learning classification. In Circuits and Systems (ISCAS), 2017 IEEE International Symposium on, pages 1–4. IEEE, 2017.

[15] X. Yu, J. Wang, R. Kays, P. A. Jansen, T. Wang, and T. Huang. Automated identification of animal species in camera trap images. EURASIP Journal on Image and Video Processing, 2013(1):52, 2013.

[16] Z. Zhang, T. X. Han, and Z. He. Coupled ensemble graph cuts and object verification for animal segmentation from highly cluttered videos. In Image Processing (ICIP), 2015 IEEE International Conference on, pages 2830–2834. IEEE, 2015.

[17] Z. Zhang, Z. He, G. Cao, and W. Cao. Animal detection from highly cluttered natural scenes using spatiotemporal object region proposals and patch verification. IEEE Transactions on Multimedia, 18(10):2079–2092, 2016.
