PH-GCN: Person Re-identification with Part-based Hierarchical Graph Convolutional Network
Pith reviewed 2026-05-24 18:51 UTC · model grok-4.3
The pith
PH-GCN builds a hierarchical graph of body parts and uses message passing to combine local, global, and structural features for person re-identification.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Given a person image, the framework first builds a hierarchical graph to represent pairwise relationships among different parts. Message passing on this graph performs both local and global feature learning by letting each node incorporate information from related nodes. A final perceptron layer produces part label predictions used for re-identification. The central claim is that this construction supplies a unified end-to-end solution that simultaneously learns local, global, and structural features, overcoming the independence assumption of prior part-based models.
What carries the argument
The part-based hierarchical graph whose nodes are image parts and whose edges carry relationship information; graph convolutional layers propagate messages across the graph to produce context-aware part features.
If this is right
- Re-identification accuracy improves when part features are allowed to exchange information rather than being computed in isolation.
- A single network can jointly optimize local detail, global appearance, and part structure without separate stages.
- The same graph-based message passing can be applied at multiple levels of part granularity inside one model.
- End-to-end training becomes feasible for models that previously required independent part detectors followed by separate fusion.
Where Pith is reading between the lines
- The same message-passing idea could be tested on other vision tasks that decompose objects into parts, such as fine-grained recognition or pose-guided retrieval.
- If the graph edges can be learned directly from data instead of predefined, the method might reduce reliance on explicit part detectors.
- Different hierarchy depths or edge-weighting schemes could be compared to identify the minimal graph structure that still captures useful relationships.
Load-bearing premise
The hierarchical graph accurately encodes meaningful pairwise relationships among parts and message passing on it produces better features than processing parts independently.
What would settle it
An ablation that replaces the learned hierarchical graph with random edges or removes the graph entirely and measures whether re-identification accuracy on the same benchmarks stays the same or drops.
Figures
read the original abstract
The person re-identification (Re-ID) task requires to robustly extract feature representations for person images. Recently, part-based representation models have been widely studied for extracting the more compact and robust feature representations for person images to improve person Re-ID results. However, existing part-based representation models mostly extract the features of different parts independently which ignore the relationship information between different parts. To overcome this limitation, in this paper we propose a novel deep learning framework, named Part-based Hierarchical Graph Convolutional Network (PH-GCN) for person Re-ID problem. Given a person image, PH-GCN first constructs a hierarchical graph to represent the pairwise relationships among different parts. Then, both local and global feature learning are performed by the messages passing in PH-GCN, which takes other nodes information into account for part feature representation. Finally, a perceptron layer is adopted for the final person part label prediction and re-identification. The proposed framework provides a general solution that integrates local, global and structural feature learning simultaneously in a unified end-to-end network. Extensive experiments on several benchmark datasets demonstrate the effectiveness of the proposed PH-GCN based Re-ID approach.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes PH-GCN, a part-based hierarchical graph convolutional network for person re-identification. It constructs a hierarchical graph to model pairwise relationships among image parts, performs message passing to integrate local and global features while accounting for structural relations, and applies a perceptron for final prediction. The central claim is that this provides a general end-to-end solution integrating local, global, and structural feature learning, with effectiveness shown via experiments on benchmark Re-ID datasets.
Significance. If the hierarchical graph construction and message passing are shown to be jointly optimized with the CNN backbone and to yield measurable gains over independent part processing, the work could advance part-based Re-ID by providing a principled way to incorporate relational structure. The unified network formulation is a potential strength if the structural component is not reducible to a fixed topology.
major comments (3)
- [§3] §3 (Graph Construction): The description of the hierarchical graph does not specify whether the adjacency matrix or edge weights are computed from fixed priors (e.g., spatial proximity or part-type rules) or are learned parameters updated during back-propagation. If the former, message passing operates over a static topology and the claim of simultaneous end-to-end structural feature learning is not supported.
- [§4] §4 (Loss and Training): No derivation or ablation isolates the contribution of the GCN message-passing step versus standard part-feature concatenation or pooling. Without controls that disable the graph while keeping the rest of the pipeline identical, it is unclear whether the reported gains on Market-1501 and DukeMTMC-reID are attributable to structural learning.
- [Table 2] Table 2 (Ablation results): The row comparing PH-GCN to its non-graph variant reports only aggregate mAP/Rank-1; the per-part feature similarity or message-passing ablation is missing, preventing verification that the hierarchical relations improve part representations beyond independent processing.
minor comments (2)
- [§3.1] Notation for the hierarchical levels (e.g., part nodes vs. super-nodes) is introduced without an explicit diagram or equation defining the node feature update rule.
- [Abstract] The abstract states the framework is 'parameter-free' in its integration claim, yet the perceptron layer and GCN weights are learned; this phrasing should be clarified.
Simulated Author's Rebuttal
Thank you for the constructive feedback on our manuscript. We address each major comment point by point below and indicate planned revisions to improve clarity and experimental rigor.
read point-by-point responses
-
Referee: [§3] §3 (Graph Construction): The description of the hierarchical graph does not specify whether the adjacency matrix or edge weights are computed from fixed priors (e.g., spatial proximity or part-type rules) or are learned parameters updated during back-propagation. If the former, message passing operates over a static topology and the claim of simultaneous end-to-end structural feature learning is not supported.
Authors: The hierarchical graph is constructed using fixed priors based on spatial proximity and part-type rules from the part division. The adjacency matrix and edge weights are not learned parameters. End-to-end optimization applies to the feature representations via message passing integrated with the CNN backbone. We will revise §3 to explicitly state the static topology and adjust the claim from 'structural feature learning' to 'incorporating structural relations via message passing' to prevent overstatement. revision: yes
-
Referee: [§4] §4 (Loss and Training): No derivation or ablation isolates the contribution of the GCN message-passing step versus standard part-feature concatenation or pooling. Without controls that disable the graph while keeping the rest of the pipeline identical, it is unclear whether the reported gains on Market-1501 and DukeMTMC-reID are attributable to structural learning.
Authors: We agree that isolating the message-passing contribution would strengthen the claims. Table 2 already includes a non-graph variant, but we will add a dedicated ablation that disables only the GCN message passing (while retaining identical part extraction, concatenation, and training) to quantify its specific effect on the reported gains. revision: yes
-
Referee: [Table 2] Table 2 (Ablation results): The row comparing PH-GCN to its non-graph variant reports only aggregate mAP/Rank-1; the per-part feature similarity or message-passing ablation is missing, preventing verification that the hierarchical relations improve part representations beyond independent processing.
Authors: The existing ablation uses standard aggregate Re-ID metrics. We will expand the experimental section with per-part similarity metrics and an explicit message-passing ablation to demonstrate improvements from hierarchical relations over independent part processing. revision: yes
Circularity Check
No circularity: architecture proposal does not reduce to input definitions or self-citations
full rationale
The paper introduces PH-GCN as a new end-to-end network that constructs a hierarchical graph over parts and performs message passing. No equations, fitted parameters renamed as predictions, or load-bearing self-citations appear in the abstract or described claims. The central integration of local/global/structural features is presented as an architectural contribution rather than a derived equivalence to its own inputs. The derivation chain remains self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
Reference graph
Works this paper leans on
-
[1]
Person re- identification by multi-channel parts-based cnn with improved triplet loss function,
D. Cheng, Y . Gong, S. Zhou, J. Wang, and N. Zheng, “Person re- identification by multi-channel parts-based cnn with improved triplet loss function,” in Computer Vision and Pattern Recognition , 2016
work page 2016
-
[2]
Dual attention matching network for context-aware feature sequence based person re-identification,
J. Si, H. Zhang, C.-G. Li, J. Kuen, X. Kong, A. C. Kot, and G. Wang, “Dual attention matching network for context-aware feature sequence based person re-identification,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition , 2018, pp. 5363–5372
work page 2018
-
[3]
Svdnet for pedestrian retrieval,
Y . Sun, L. Zheng, W. Deng, and S. Wang, “Svdnet for pedestrian retrieval,” in IEEE International Conference on Computer Vision , 2017
work page 2017
-
[4]
Deep group- shuffling random walk for person re-identification,
Y . Shen, H. Li, T. Xiao, S. Yi, D. Chen, and X. Wang, “Deep group- shuffling random walk for person re-identification,” inProceedings of the IEEE Conference on Computer Vision and Pattern Recognition , 2018, pp. 2265–2274
work page 2018
-
[5]
Group consistent simi- larity learning via deep crf for person re-identification,
D. Chen, D. Xu, H. Li, N. Sebe, and X. Wang, “Group consistent simi- larity learning via deep crf for person re-identification,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition , 2018, pp. 8649–8658
work page 2018
-
[6]
Harmonious attention network for person re-identification,
W. Li, X. Zhu, and S. Gong, “Harmonious attention network for person re-identification,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition , 2018, pp. 2285–2294
work page 2018
-
[7]
D. Chen, H. Li, T. Xiao, S. Yi, and X. Wang, “Video person re- identification with competitive snippet-similarity aggregation and co- attentive snippet embedding,” in The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) , June 2018
work page 2018
-
[8]
Multi-region bilinear con- volutional neural networks for person re-identification,
E. Ustinova, Y . Ganin, and V . Lempitsky, “Multi-region bilinear con- volutional neural networks for person re-identification,” in 2017 14th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS). IEEE, 2017, pp. 1–6
work page 2017
-
[9]
Pose-driven deep convolutional model for person re-identification,
C. Su, J. Li, S. Zhang, J. Xing, W. Gao, and Q. Tian, “Pose-driven deep convolutional model for person re-identification,” in Proceedings of the IEEE International Conference on Computer Vision , 2017, pp. 3960–3969
work page 2017
-
[10]
Person re-identification by deep learning attribute-complementary information,
A. Schumann and R. Stiefelhagen, “Person re-identification by deep learning attribute-complementary information,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Work- shops, 2017, pp. 20–28
work page 2017
-
[11]
Beyond part models: Person retrieval with refined part pooling (and a strong convolutional baseline),
Y . Sun, L. Zheng, Y . Yang, Q. Tian, and S. Wang, “Beyond part models: Person retrieval with refined part pooling (and a strong convolutional baseline),” in Proceedings of the European Conference on Computer Vision (ECCV), 2018, pp. 480–496
work page 2018
-
[12]
Deeply-learned part-aligned representations for person re-identification,
L. Zhao, X. Li, Y . Zhuang, and J. Wang, “Deeply-learned part-aligned representations for person re-identification,” in The IEEE International Conference on Computer Vision (ICCV) , Oct 2017
work page 2017
-
[13]
Part-aligned bilinear representations for person re-identification,
Y . Suh, J. Wang, S. Tang, T. Mei, and K. M. Lee, “Part-aligned bilinear representations for person re-identification,” in Computer Vision–ECCV
-
[14]
Springer, 2018, pp. 418–437
work page 2018
-
[15]
Glad: Global-local- alignment descriptor for pedestrian retrieval,
L. Wei, S. Zhang, H. Yao, W. Gao, and Q. Tian, “Glad: Global-local- alignment descriptor for pedestrian retrieval,” in Proceedings of the 25th ACM international conference on Multimedia . ACM, 2017, pp. 420– 428
work page 2017
-
[16]
Pyramidal Person Re-IDentification via Multi-Loss Dynamic Training
F. Zheng, X. Sun, X. Jiang, X. Guo, Z. Yu, and F. Huang, “A coarse-to- fine pyramidal model for person re-identification via multi-loss dynamic training,” arXiv preprint arXiv:1810.12193 , 2018
work page internal anchor Pith review Pith/arXiv arXiv 2018
-
[17]
Learning Discriminative Features with Multiple Granularities for Person Re-Identification
G. Wang, Y . Yuan, X. Chen, J. Li, and X. Zhou, “Learning discriminative features with multiple granularities for person re-identification,” arXiv preprint arXiv:1804.01438, 2018
work page internal anchor Pith review Pith/arXiv arXiv 2018
-
[18]
Spectral Networks and Locally Connected Networks on Graphs
J. Bruna, W. Zaremba, A. Szlam, and Y . LeCun, “Spectral networks and locally connected networks on graphs,” arXiv preprint arXiv:1312.6203, 2013
work page internal anchor Pith review Pith/arXiv arXiv 2013
-
[19]
Semi-Supervised Classification with Graph Convolutional Networks
T. N. Kipf and M. Welling, “Semi-supervised classification with graph convolutional networks,” arXiv preprint arXiv:1609.02907 , 2016
work page internal anchor Pith review Pith/arXiv arXiv 2016
-
[20]
Learning convolutional neural networks for graphs,
M. Niepert, M. Ahmed, and K. Kutzkov, “Learning convolutional neural networks for graphs,” in International conference on machine learning , 2016, pp. 2014–2023
work page 2016
-
[21]
Convolutional neural networks on graphs with fast localized spectral filtering,
M. Defferrard, X. Bresson, and P. Vandergheynst, “Convolutional neural networks on graphs with fast localized spectral filtering,” in Advances in neural information processing systems , 2016, pp. 3844–3852
work page 2016
-
[22]
Spatial temporal graph convolutional networks for skeleton-based action recognition,
S. Yan, Y . Xiong, and D. Lin, “Spatial temporal graph convolutional networks for skeleton-based action recognition,” in Thirty-Second AAAI Conference on Artificial Intelligence , 2018
work page 2018
-
[23]
Spectral networks and locally connected networks on graphs,
J. Bruna, W. Zaremba, A. Szlam, and Y . LeCun, “Spectral networks and locally connected networks on graphs,” in International Conference on Learning Representations, 2014
work page 2014
-
[24]
Diffusion-convolutional neural networks,
J. Atwood and D. Towsley, “Diffusion-convolutional neural networks,” in Advances in Neural Information Processing Systems, 2016, pp. 1993– 2001
work page 2016
-
[25]
Geometric deep learning on graphs and manifolds using mixture model cnns,
F. Monti, D. Boscaini, J. Masci, E. Rodola, J. Svoboda, and M. M. Bronstein, “Geometric deep learning on graphs and manifolds using mixture model cnns,” in IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 5423–5434
work page 2017
-
[26]
P. Veli ˇckovi´c, G. Cucurull, A. Casanova, A. Romero, P. Lio, and Y . Ben- gio, “Graph attention networks,” arXiv preprint arXiv:1710.10903, 2017
work page internal anchor Pith review Pith/arXiv arXiv 2017
-
[27]
Neural graph matching networks for fewshot 3d action recognition,
M. Guo, E. Chou, D.-A. Huang, S. Song, S. Yeung, and L. Fei-Fei, “Neural graph matching networks for fewshot 3d action recognition,” in Proceedings of the European Conference on Computer Vision (ECCV) , 2018, pp. 653–669
work page 2018
-
[28]
3d graph neural networks for rgbd semantic segmentation,
X. Qi, R. Liao, J. Jia, S. Fidler, and R. Urtasun, “3d graph neural networks for rgbd semantic segmentation,” in Proceedings of the IEEE International Conference on Computer Vision , 2017, pp. 5199–5208
work page 2017
-
[29]
Person re-identification with deep similarity-guided graph neural network,
Y . Shen, H. Li, S. Yi, D. Chen, and X. Wang, “Person re-identification with deep similarity-guided graph neural network,” in Proceedings of the European Conference on Computer Vision (ECCV) , 2018, pp. 486–504
work page 2018
-
[30]
Deep residual learning for image recognition,
K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2016
work page 2016
-
[31]
ImageNet Large Scale Visual Recognition Challenge
O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein et al., “Imagenet large scale visual recognition challenge,” arXiv preprint arXiv:1409.0575 , 2014
work page internal anchor Pith review Pith/arXiv arXiv 2014
-
[32]
A tutorial on the cross-entropy method,
P.-T. De Boer, D. P. Kroese, S. Mannor, and R. Y . Rubinstein, “A tutorial on the cross-entropy method,” Annals of operations research , vol. 134, no. 1, pp. 19–67, 2005
work page 2005
-
[33]
Hydraplus-net: Attentive deep features for pedestrian analysis,
X. Liu, H. Zhao, M. Tian, L. Sheng, J. Shao, S. Yi, J. Yan, and X. Wang, “Hydraplus-net: Attentive deep features for pedestrian analysis,” in The IEEE International Conference on Computer Vision (ICCV) , Oct 2017
work page 2017
-
[34]
Re-ranking person re- identification with k-reciprocal encoding,
Z. Zhong, L. Zheng, D. Cao, and S. Li, “Re-ranking person re- identification with k-reciprocal encoding,” in The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) , July 2017
work page 2017
-
[35]
Person Re-Identification by Deep Joint Learning of Multi-Loss Classification
W. Li, X. Zhu, and S. Gong, “Person re-identification by deep joint learning of multi-loss classification,” arXiv preprint arXiv:1705.04724 , 2017
work page internal anchor Pith review Pith/arXiv arXiv 2017
-
[36]
Scalable person re-identification on supervised smoothed manifold,
S. Bai, X. Bai, and Q. Tian, “Scalable person re-identification on supervised smoothed manifold,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition , 2017, pp. 2530–2539
work page 2017
-
[37]
Person re-identification by deep learning multi-scale representations,
Y . Chen, X. Zhu, and S. Gong, “Person re-identification by deep learning multi-scale representations,” in Proceedings of the IEEE International Conference on Computer Vision , 2017, pp. 2590–2600
work page 2017
-
[38]
Camera style adaptation for person re-identification,
Z. Zhong, L. Zheng, Z. Zheng, S. Li, and Y . Yang, “Camera style adaptation for person re-identification,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition , 2018, pp. 5157–5166
work page 2018
-
[39]
A pose-sensitive embedding for person re-identification with expanded cross neighborhood re-ranking,
M. Saquib Sarfraz, A. Schumann, A. Eberle, and R. Stiefelhagen, “A pose-sensitive embedding for person re-identification with expanded cross neighborhood re-ranking,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition , 2018, pp. 420–429
work page 2018
-
[40]
Pose transferrable person re-identification,
J. Liu, B. Ni, Y . Yan, P. Zhou, S. Cheng, and J. Hu, “Pose transferrable person re-identification,” in The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) , June 2018
work page 2018
-
[41]
Person re-identification with cascaded pairwise convolutions,
Y . Wang, Z. Chen, F. Wu, and G. Wang, “Person re-identification with cascaded pairwise convolutions,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition , 2018, pp. 1470–1478
work page 2018
-
[42]
Mask-guided contrastive attention model for person re-identification,
C. Song, Y . Huang, W. Ouyang, and L. Wang, “Mask-guided contrastive attention model for person re-identification,” inProceedings of the IEEE Conference on Computer Vision and Pattern Recognition , 2018, pp. 1179–1188
work page 2018
-
[43]
K. Yuan, Q. Zhang, C. Huang, S. Xiang, C. Pan, and H. Robotics, “Safenet: Scale-normalization and anchor-based feature extraction net- work for person re-identification.” in IJCAI, 2018, pp. 1121–1127
work page 2018
-
[44]
Scalable person re-identification: A benchmark,
L. Zheng, L. Shen, L. Tian, S. Wang, J. Wang, and Q. Tian, “Scalable person re-identification: A benchmark,” in The IEEE International Conference on Computer Vision (ICCV) , December 2015. 7
work page 2015
-
[45]
Person re-identification by local maximal occurrence representation and metric learning,
S. Liao, Y . Hu, X. Zhu, and S. Z. Li, “Person re-identification by local maximal occurrence representation and metric learning,” in Proceedings of the IEEE conference on computer vision and pattern recognition , 2015, pp. 2197–2206
work page 2015
-
[46]
Unlabeled samples generated by gan improve the person re-identification baseline in vitro,
Z. Zheng, L. Zheng, and Y . Yang, “Unlabeled samples generated by gan improve the person re-identification baseline in vitro,” in Proceedings of the IEEE International Conference on Computer Vision , 2017, pp. 3754–3762
work page 2017
-
[47]
Re-ranking person re- identification with k-reciprocal encoding,
Z. Zhong, L. Zheng, D. Cao, and S. Li, “Re-ranking person re- identification with k-reciprocal encoding,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition , 2017, pp. 1318–1327
work page 2017
-
[48]
Deepreid: Deep filter pairing neural network for person re-identification,
W. Li, R. Zhao, T. Xiao, and X. Wang, “Deepreid: Deep filter pairing neural network for person re-identification,” in CVPR, 2014
work page 2014
-
[49]
Object detection with discriminatively trained part-based models,
P. F. Felzenszwalb, R. B. Girshick, D. McAllester, and D. Ramanan, “Object detection with discriminatively trained part-based models,” IEEE transactions on pattern analysis and machine intelligence, vol. 32, no. 9, pp. 1627–1645, 2010
work page 2010
-
[50]
Performance Measures and a Data Set for Multi-Target, Multi-Camera Tracking
E. Ristani, F. Solera, R. S. Zou, R. Cucchiara, and C. Tomasi, “Perfor- mance measures and a data set for multi-target, multi-camera tracking,” arXiv preprint arXiv:1609.01775 , 2016
work page internal anchor Pith review Pith/arXiv arXiv 2016
-
[51]
Large-scale machine learning with stochastic gradient de- scent,
L. Bottou, “Large-scale machine learning with stochastic gradient de- scent,” in Proceedings of COMPSTAT’2010, 2010, pp. 177–186
work page 2010
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.