pith. machine review for the scientific record.

arxiv: 1411.7766 · v3 · submitted 2014-11-28 · 💻 cs.CV

Recognition: unknown

Deep Learning Face Attributes in the Wild

classification 💻 cs.CV
keywords: face, attribute, lnet, anet, concepts, learning, localization, massive

Predicting face attributes in the wild is challenging due to complex face variations. We propose a novel deep learning framework for attribute prediction in the wild. It cascades two CNNs, LNet and ANet, which are fine-tuned jointly with attribute tags, but pre-trained differently. LNet is pre-trained by massive general object categories for face localization, while ANet is pre-trained by massive face identities for attribute prediction. This framework not only outperforms the state-of-the-art with a large margin, but also reveals valuable facts on learning face representation. (1) It shows how the performances of face localization (LNet) and attribute prediction (ANet) can be improved by different pre-training strategies. (2) It reveals that although the filters of LNet are fine-tuned only with image-level attribute tags, their response maps over entire images have strong indication of face locations. This fact enables training LNet for face localization with only image-level annotations, but without face bounding boxes or landmarks, which are required by all attribute recognition works. (3) It also demonstrates that the high-level hidden neurons of ANet automatically discover semantic concepts after pre-training with massive face identities, and such concepts are significantly enriched after fine-tuning with attribute tags. Each attribute can be well explained with a sparse linear combination of these concepts.
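
Point (2) of the abstract says that LNet's response maps strongly indicate face locations. The abstract does not say how a location is read off such a map, so the following is only a minimal sketch under assumptions: the function name `box_from_response_map`, the relative-threshold rule, and the toy map are all hypothetical and are not taken from the paper. It shows one simple way a 2D response map could be turned into a bounding box.

```python
def box_from_response_map(response, rel_threshold=0.5):
    """Bounding box (top, left, bottom, right) of the high-response region.

    `response` is a non-empty 2D list of floats. Thresholding relative to
    the map's value range is an assumption made for illustration only.
    The bottom and right bounds are exclusive.
    """
    lo = min(min(row) for row in response)
    hi = max(max(row) for row in response)
    if hi == lo:
        # Flat map: no peak to localize, so fall back to the whole image.
        return 0, 0, len(response), len(response[0])
    cut = lo + rel_threshold * (hi - lo)
    # Rows and columns that contain at least one value at or above the cut.
    rows = [r for r, row in enumerate(response) if any(v >= cut for v in row)]
    cols = [c for c in range(len(response[0]))
            if any(row[c] >= cut for row in response)]
    return rows[0], cols[0], rows[-1] + 1, cols[-1] + 1


# Toy response map: a 3x4 block of high responses on a zero background.
toy = [[0.0] * 10 for _ in range(8)]
for r in range(2, 5):
    for c in range(3, 7):
        toy[r][c] = 1.0

box = box_from_response_map(toy)
print(box)
```

A real response map would be noisier than this toy example, so a practical version would likely need smoothing or connected-component selection before taking the box.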

This paper has not been read by Pith yet.

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Birds of a Feather Flock Together: Background-Invariant Representations via Linear Structure in VLMs

    cs.CV 2026-05 unverdicted novelty 6.0

    Exploiting linear structure in VLM embeddings, a synthetic-data pre-training method yields background-invariant representations that exceed 90% worst-group accuracy on Waterbirds even under 100% spurious correlation w...

  2. UNBOX: Unveiling Black-box visual models with Natural-language

    cs.CV 2026-03 unverdicted novelty 6.0

    UNBOX recovers interpretable text concepts that maximally activate classes in black-box vision models by recasting activation maximization as semantic search with LLMs and diffusion models.