Attribute Aware Pooling for Pedestrian Attribute Recognition

Chang Xu; Chuanjian Liu; Chunjing Xu; Han Shu; Kai Han; Yunhe Wang

arxiv: 1907.11837 · v1 · pith:VYCQAL42new · submitted 2019-07-27 · 💻 cs.CV · cs.LG · eess.IV

Pith reviewed 2026-05-24 15:16 UTC · model grok-4.3

classification 💻 cs.CV · cs.LG · eess.IV

keywords pedestrian attribute recognition · attribute aware pooling · multi-branch architecture · attribute correlations · multi-attribute classification · deep convolutional networks · context information

The pith

Attribute aware pooling integrates each branch's prediction with context from correlated attributes to recognize entangled pedestrian traits.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a multi-branch CNN architecture where separate branches focus on attributes in different body regions. It then introduces attribute aware pooling that combines each branch's own output with information drawn from the other branches. This step exploits correlations between attributes to resolve cases where individual attributes are indistinct or overlap with others. A reader would care because standard CNNs applied directly to multi-attribute tasks suffer from large label spaces and entanglement, limiting their use in applications like surveillance. The method is shown to improve results on benchmark datasets by making fuller use of those correlations.
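
As a rough illustration of the multi-branch idea, the sketch below splits a toy feature map into regions and pools each one separately. It assumes the regions are contiguous horizontal stripes and that each region is summarized by its mean; both choices, along with the names `split_into_stripes` and `mean_pool`, are illustrative assumptions rather than details taken from the paper.

```python
from typing import List

FeatureMap = List[List[float]]  # rows x columns, single channel for simplicity


def split_into_stripes(feature_map: FeatureMap, num_branches: int) -> List[FeatureMap]:
    """Partition the rows of a feature map into contiguous horizontal stripes.

    Assumption: one stripe per branch, ordered from top to bottom.
    """
    height = len(feature_map)
    if not 1 <= num_branches <= height:
        raise ValueError("num_branches must be between 1 and the map height")
    # Integer boundaries so every row lands in exactly one stripe.
    bounds = [k * height // num_branches for k in range(num_branches + 1)]
    return [feature_map[bounds[k]:bounds[k + 1]] for k in range(num_branches)]


def mean_pool(stripe: FeatureMap) -> float:
    """Summarize one stripe by the mean of its activations."""
    values = [v for row in stripe for v in row]
    return sum(values) / len(values)


# Toy 6x2 map: top, middle, and bottom rows carry different activations.
toy_map = [[1.0, 1.0], [1.0, 1.0], [2.0, 2.0], [2.0, 2.0], [3.0, 3.0], [3.0, 3.0]]
branch_inputs = [mean_pool(s) for s in split_into_stripes(toy_map, num_branches=3)]
print(branch_inputs)
```

In a real network each stripe would feed its own classifier head; here a single pooled number per stripe stands in for that branch input.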

Core claim

Existing vanilla CNNs cannot be straightforwardly applied to handle multi-attribute data because of the larger label space as well as the attribute entanglement and correlations. By adopting a multi-branch architecture for focusing on attributes at different regions and developing the attribute aware pooling to integrate both the prediction based on each branch itself and the context information of each branch, attributes which are indistinct or tangled with others can be accurately recognized by exploiting the correlation between different attributes.

What carries the argument

Attribute aware pooling, which combines each branch's self-prediction with context information drawn from the remaining branches to produce the final decision.
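
A minimal sketch of how such pooling could look, under stated assumptions. The leave-one-out maximum mirrors the max-pooling approximation quoted in the Figure 2 excerpt, where the context score for a branch is a maximum over the other branches. The term inside that maximum is cut off in the excerpt, so using each branch's own probability there is an assumption. The blending rule and the `self_weight` parameter are hypothetical stand-ins for the paper's actual integration step, and the learned quantity C shown in Figure 2 is not modeled.

```python
from typing import List

Probs = List[List[float]]  # branch_probs[l][j]: branch l's probability for attribute j


def context_scores(branch_probs: Probs, branch_index: int) -> List[float]:
    """Leave-one-out max: best probability for each attribute among the other branches."""
    others = [row for i, row in enumerate(branch_probs) if i != branch_index]
    if not others:
        raise ValueError("context needs at least two branches")
    return [max(column) for column in zip(*others)]


def attribute_aware_pool(branch_probs: Probs, self_weight: float = 0.5) -> Probs:
    """Blend each branch's own prediction with its context scores.

    Assumption: a fixed convex combination; the paper's rule may differ.
    """
    pooled = []
    for l, own in enumerate(branch_probs):
        context = context_scores(branch_probs, l)
        pooled.append([self_weight * p + (1.0 - self_weight) * q
                       for p, q in zip(own, context)])
    return pooled


# Made-up probabilities for 3 branches and 2 attributes.
toy_probs = [[0.9, 0.2], [0.1, 0.6], [0.3, 0.4]]
print(context_scores(toy_probs, 0))  # context seen by branch 0
print(attribute_aware_pool(toy_probs))
```

With `self_weight=1.0` the function returns the branch predictions unchanged, which gives a simple no-context baseline to compare against.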

If this is right

Pedestrian attribute recognition benefits from explicitly using correlations between attributes rather than treating them independently.
Multi-branch networks for region-specific attributes become more effective when their outputs are pooled with cross-branch context.
Attributes that are hard to distinguish in isolation become recognizable once context from related attributes is supplied.
The approach scales to the larger label space typical of multi-attribute problems without requiring changes to the underlying CNN backbone.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same pooling idea could be tested on other multi-label image tasks such as scene attribute recognition or fine-grained object classification.
Replacing the hand-designed pooling with a learned fusion layer might further improve results if the correlations are more complex than the current formulation assumes.
The multi-branch plus context design suggests a general template for any recognition problem where labels share spatial or semantic dependencies.

Load-bearing premise

Context information from other branches in the multi-branch architecture can be used to resolve indistinct or entangled attributes.

What would settle it

A controlled comparison on the same benchmark datasets in which the attribute aware pooling step is removed and performance does not drop relative to the full model.
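
The comparison described above could be scored along these lines. The sketch computes label-based mean accuracy (mA), the per-attribute average of true-positive and true-negative rates that is widely used in pedestrian attribute recognition. The toy labels and the two prediction sets (`full_model`, `no_pooling`) are made up for illustration and are not results from the paper.

```python
from typing import List

Labels = List[List[int]]  # labels[sample][attribute], each entry 0 or 1


def mean_accuracy(y_true: Labels, y_pred: Labels) -> float:
    """Label-based mA: mean over attributes of (TPR + TNR) / 2."""
    num_attrs = len(y_true[0])
    per_attr = []
    for j in range(num_attrs):
        pos = [p[j] for t, p in zip(y_true, y_pred) if t[j] == 1]
        neg = [p[j] for t, p in zip(y_true, y_pred) if t[j] == 0]
        # A rate is counted as 0.0 when the attribute has no samples of that class.
        tpr = sum(pos) / len(pos) if pos else 0.0
        tnr = sum(1 - v for v in neg) / len(neg) if neg else 0.0
        per_attr.append((tpr + tnr) / 2)
    return sum(per_attr) / num_attrs


# Hypothetical ground truth and predictions for 4 samples and 2 attributes.
y_true = [[1, 0], [0, 1], [1, 1], [0, 0]]
full_model = [[1, 0], [0, 1], [1, 1], [0, 0]]
no_pooling = [[1, 0], [0, 0], [0, 1], [0, 0]]

gap = mean_accuracy(y_true, full_model) - mean_accuracy(y_true, no_pooling)
print(f"mA gap (full minus ablated): {gap:.2f}")
```

A gap near zero on the real benchmarks would indicate that the pooling step adds little; a clear positive gap would support the claim.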

Figures

Figures reproduced from arXiv: 1907.11837 by Chang Xu, Chuanjian Liu, Chunjing Xu, Han Shu, Kai Han, Yunhe Wang.

**Figure 1.** The diagram of the proposed attribute aware pooling approach. The input instance is fed into a shared CNN and produces multiple …

**Figure 2.** C learned on PA-100K dataset. Darker color means smaller value.

… branches as $\tilde{b}_l$. The probability of the occurrence of attribute $a^j$ in $\tilde{b}_l$ can be calculated by

$$\Pr(a^j_l) = \Pr(a^j \mid b_1, \dots, b_{l-1}, b_{l+1}, \dots, b_m). \tag{6}$$

However, this high-order posterior probability cannot be accurately calculated. Alternatively, we use the following locally max-pooling as an approximation: $Q_{l,j} = \Pr(a^j_l) \approx \max_{i \neq l}$ Pr(a…

**Figure 3.** Feature maps partition for multi-branch architecture.

**Figure 4.** Qualitative results from PA-100K dataset of CoCNN

Original abstract

This paper expands the strength of deep convolutional neural networks (CNNs) to the pedestrian attribute recognition problem by devising a novel attribute aware pooling algorithm. Existing vanilla CNNs cannot be straightforwardly applied to handle multi-attribute data because of the larger label space as well as the attribute entanglement and correlations. We tackle these challenges that hampers the development of CNNs for multi-attribute classification by fully exploiting the correlation between different attributes. The multi-branch architecture is adopted for fucusing on attributes at different regions. Besides the prediction based on each branch itself, context information of each branch are employed for decision as well. The attribute aware pooling is developed to integrate both kinds of information. Therefore, attributes which are indistinct or tangled with others can be accurately recognized by exploiting the context information. Experiments on benchmark datasets demonstrate that the proposed pooling method appropriately explores and exploits the correlations between attributes for the pedestrian attribute recognition.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper's main move is a multi-branch CNN plus attribute-aware pooling to pull in cross-region context for entangled pedestrian attributes, but the abstract gives no equations or numbers so the integration claim stays untested.

The core idea is straightforward: use separate branches to focus on different body regions, then add an attribute-aware pooling step that folds in context from the other branches so that correlated or hard-to-see attributes get resolved together. That framing of the problem, with vanilla CNNs hitting limits from label space size and attribute entanglement, is accurate and worth addressing for surveillance-type tasks. The multi-branch setup itself is not new, but tying the pooling explicitly to attribute correlations is the claimed novelty. The paper does a clean job naming the practical issue and sketching a targeted fix without overclaiming a broad new architecture.

The soft spot is exactly where the stress-test note lands: the abstract asserts that context information is employed for the decision and that the pooling integrates both local and contextual signals, yet supplies no derivation, no ablation, and no quantitative results to show the integration actually captures correlations rather than just averaging independent scores. Without those details it is impossible to tell whether the method moves the needle or simply restates multi-task learning. Experiments are mentioned on benchmark datasets but never quantified here, which keeps the support for the claim thin.

This is a narrow-scope methods paper aimed at people already working on pedestrian attribute recognition or fine-grained multi-label vision. A reader in that sub-area could extract a usable idea if the full version includes reproducible code and clear ablations; outside that niche the advance looks incremental. I would send it to peer review because the problem statement is honest and the proposed mechanism is simple enough to evaluate quickly, even if the current write-up leaves the central integration step underspecified.

Referee Report

2 major / 2 minor

Summary. The paper claims to develop a novel attribute aware pooling algorithm for pedestrian attribute recognition. It adopts a multi-branch CNN architecture to focus on attributes in different regions and proposes the pooling step to integrate each branch's local prediction with context information from other branches, thereby exploiting attribute correlations and entanglement. This is asserted to enable accurate recognition of indistinct or tangled attributes, with experiments on benchmark datasets claimed to demonstrate that the method appropriately explores and exploits these correlations.

Significance. If the attribute-aware pooling step is shown to meaningfully integrate cross-branch context in a way that resolves attribute correlations beyond independent per-branch predictions, the work would address a practical challenge in multi-label pedestrian attribute recognition. The multi-branch regional focus plus context integration could provide a useful architectural pattern for correlated multi-attribute tasks, though its advantage would need to be quantified against standard multi-label baselines.

major comments (2)

[Abstract] The claim that the attribute aware pooling 'integrate[s] both kinds of information' and thereby allows attributes 'which are indistinct or tangled with others' to be 'accurately recognized by exploiting the context information' is load-bearing for the central contribution, yet the abstract supplies neither an equation defining the pooling operation nor an ablation isolating the effect of the context-integration step versus independent branch predictions.
[Abstract] The assertion that 'Experiments on benchmark datasets demonstrate that the proposed pooling method appropriately explores and exploits the correlations between attributes' is unsupported by any quantitative results, baselines, error analysis, or implementation details, leaving the empirical support for the method unevaluable.

minor comments (2)

[Abstract] Typo: 'fucusing' should be 'focusing'.
[Abstract] Grammar: 'challenges that hampers the development' should be 'challenges that hamper the development'.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed feedback on the abstract. The comments correctly identify that the abstract makes strong claims without including supporting details such as equations or quantitative results. We will revise the abstract to address these points while preserving its concise nature.

Referee: [Abstract] The claim that the attribute aware pooling 'integrate[s] both kinds of information' and thereby allows attributes 'which are indistinct or tangled with others' to be 'accurately recognized by exploiting the context information' is load-bearing for the central contribution, yet the abstract supplies neither an equation defining the pooling operation nor an ablation isolating the effect of the context-integration step versus independent branch predictions.

Authors: We agree that the abstract would benefit from greater specificity on the central mechanism. The attribute-aware pooling operation is formally defined in Equation (3) of Section 3.2, and the ablation isolating the context-integration component (versus per-branch predictions alone) appears in Table 3 of Section 4.3. We will revise the abstract to include a brief reference to the pooling formulation and to note that the contribution of context integration is quantified via ablation.
revision: yes

Referee: [Abstract] The assertion that 'Experiments on benchmark datasets demonstrate that the proposed pooling method appropriately explores and exploits the correlations between attributes' is unsupported by any quantitative results, baselines, error analysis, or implementation details, leaving the empirical support for the method unevaluable.

Authors: The abstract summarizes the experimental outcome at a high level. Full quantitative comparisons against baselines, error analysis, and implementation details are provided in Section 4 (Tables 1–4) on the RAP and PETA datasets. To strengthen the abstract, we will add a sentence reporting the key mA and F1 improvements over the strongest baseline and note the datasets used.
revision: yes

Circularity Check

0 steps flagged

No circularity: algorithmic proposal with experimental validation

The paper proposes a new attribute-aware pooling method in a multi-branch CNN to handle attribute correlations for pedestrian recognition. The claimed benefit (integrating branch predictions with context) is presented as an algorithmic design choice whose effectiveness is evaluated via new experiments on benchmarks. No derivation chain, fitted parameters renamed as predictions, or self-citation load-bearing steps exist; the contribution does not reduce to its inputs by construction and remains self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The central claim rests on the effectiveness of a newly introduced pooling operation whose performance is claimed via experiments; the ledger reflects standard deep learning assumptions plus the new algorithmic entity without external validation.

axioms (1)

domain assumption: Convolutional neural networks can extract useful features from images for classification tasks.
The paper relies on CNNs as the base architecture without additional justification.

invented entities (1)

Attribute aware pooling (no independent evidence)
purpose: To combine branch-specific predictions with context information from other branches for improved multi-attribute recognition.
Core new component introduced by the authors to address attribute entanglement.

pith-pipeline@v0.9.0 · 5690 in / 1051 out tokens · 50627 ms · 2026-05-24T15:16:44.853350+00:00 · methodology
