
arxiv: 1406.2952 · v1 · submitted 2014-06-11 · 💻 cs.CV

Recognition: unknown

Bird Species Categorization Using Pose Normalized Deep Convolutional Nets

keywords pose · classification · feature · bird · convolutional · deep · features · image

We propose an architecture for fine-grained visual categorization that approaches expert human performance in the classification of bird species. Our architecture first computes an estimate of the object's pose; this is used to compute local image features which are, in turn, used for classification. The features are computed by applying deep convolutional nets to image patches that are located and normalized by the pose. We perform an empirical study of a number of pose normalization schemes, including an investigation of higher order geometric warping functions. We propose a novel graph-based clustering algorithm for learning a compact pose normalization space. We perform a detailed investigation of state-of-the-art deep convolutional feature implementations and fine-tuning feature learning for fine-grained classification. We observe that a model that integrates lower-level feature layers with pose-normalized extraction routines and higher-level feature layers with unaligned image features works best. Our experiments advance state-of-the-art performance on bird species recognition, with a large improvement of correct classification rates over previous methods (75% vs. 55-65%).
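The pose normalization step described in the abstract can be illustrated with a minimal sketch. It assumes that pose is given as a set of 2D keypoints and that alignment uses a plain similarity warp (scale, rotation, translation) fitted by least squares to a reference layout. The abstract says only that several normalization schemes, including higher-order warps, were studied, so the choice of warp, the function names `fit_similarity` and `apply_transform`, and the keypoint values are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def fit_similarity(src, dst):
    """Least-squares similarity transform mapping src keypoints onto dst.

    Returns a 2x3 matrix [[a, -b, tx], [b, a, ty]], so that
    u = a*x - b*y + tx and v = b*x + a*y + ty.
    """
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    n = src.shape[0]
    # Two linear equations per keypoint in the unknowns (a, b, tx, ty).
    A = np.zeros((2 * n, 4))
    A[0::2] = np.column_stack([src[:, 0], -src[:, 1], np.ones(n), np.zeros(n)])
    A[1::2] = np.column_stack([src[:, 1], src[:, 0], np.zeros(n), np.ones(n)])
    rhs = dst.reshape(-1)  # interleaved as u0, v0, u1, v1, ...
    a, b, tx, ty = np.linalg.lstsq(A, rhs, rcond=None)[0]
    return np.array([[a, -b, tx], [b, a, ty]])


def apply_transform(M, pts):
    """Apply a 2x3 affine matrix to an (N, 2) array of points."""
    pts = np.asarray(pts, dtype=float)
    return pts @ M[:, :2].T + M[:, 2]


# Hypothetical detected keypoints (e.g. beak, eye, tail) in image coordinates.
detected = np.array([[10.0, 20.0], [14.0, 22.0], [40.0, 35.0]])
# Hypothetical reference layout that every aligned patch should share.
M_true = np.array([[0.0, -2.0, 1.0], [2.0, 0.0, 2.0]])
reference = apply_transform(M_true, detected)

M_est = fit_similarity(detected, reference)
aligned = apply_transform(M_est, detected)
```

In a full pipeline, the fitted matrix would be used to resample the image patch into the reference frame before passing it to a convolutional net. That resampling step and the feature extraction are omitted here.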

This paper has not been read by Pith yet.


Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Learning to Align Generative Appearance Priors for Fine-grained Image Retrieval

    cs.CV 2026-05 unverdicted novelty 7.0

    GAPan uses invertible normalizing flows to learn generative appearance priors from seen categories and aligns retrieval embeddings to these priors, improving performance on unseen categories in fine-grained image retrieval.