pith. sign in

arxiv: 1907.08822 · v1 · pith:ZXQJ24CJnew · submitted 2019-07-20 · 💻 cs.CV

PH-GCN: Person Re-identification with Part-based Hierarchical Graph Convolutional Network

Pith reviewed 2026-05-24 18:51 UTC · model grok-4.3

classification 💻 cs.CV
keywords person re-identificationgraph convolutional networkpart-based representationhierarchical graphmessage passingfeature learningend-to-end network
0
0 comments X

The pith

PH-GCN builds a hierarchical graph of body parts and uses message passing to combine local, global, and structural features for person re-identification.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper addresses the limitation that existing part-based re-identification models extract features from body parts independently and therefore miss relationship information between parts. It constructs a hierarchical graph whose nodes represent parts and whose edges encode pairwise relationships, then applies graph convolutional message passing so each part's representation incorporates context from others. Local feature extraction, global context, and structural dependencies are learned together inside one end-to-end network followed by a perceptron for final matching. Experiments on standard benchmarks are presented to show that the integrated approach yields more robust person representations than independent part processing.

Core claim

Given a person image, the framework first builds a hierarchical graph to represent pairwise relationships among different parts. Message passing on this graph performs both local and global feature learning by letting each node incorporate information from related nodes. A final perceptron layer produces part label predictions used for re-identification. The central claim is that this construction supplies a unified end-to-end solution that simultaneously learns local, global, and structural features, overcoming the independence assumption of prior part-based models.

What carries the argument

The part-based hierarchical graph whose nodes are image parts and whose edges carry relationship information; graph convolutional layers propagate messages across the graph to produce context-aware part features.

If this is right

  • Re-identification accuracy improves when part features are allowed to exchange information rather than being computed in isolation.
  • A single network can jointly optimize local detail, global appearance, and part structure without separate stages.
  • The same graph-based message passing can be applied at multiple levels of part granularity inside one model.
  • End-to-end training becomes feasible for models that previously required independent part detectors followed by separate fusion.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same message-passing idea could be tested on other vision tasks that decompose objects into parts, such as fine-grained recognition or pose-guided retrieval.
  • If the graph edges can be learned directly from data instead of predefined, the method might reduce reliance on explicit part detectors.
  • Different hierarchy depths or edge-weighting schemes could be compared to identify the minimal graph structure that still captures useful relationships.

Load-bearing premise

The hierarchical graph accurately encodes meaningful pairwise relationships among parts and message passing on it produces better features than processing parts independently.

What would settle it

An ablation that replaces the learned hierarchical graph with random edges or removes the graph entirely and measures whether re-identification accuracy on the same benchmarks stays the same or drops.

Figures

Figures reproduced from arXiv: 1907.08822 by Bin Luo, Bo Jiang, Xixi Wang.

Figure 1
Figure 1. Figure 1: Architecture of the proposed PH-GCN network for person Re-ID, which contains CNN based part feature extraction [PITH_FULL_IMAGE:figures/full_fig_p003_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: Results of PH-GCN with different settings of parame [PITH_FULL_IMAGE:figures/full_fig_p005_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: Performance of two variants (P-GCN, Ours-NoGCN) [PITH_FULL_IMAGE:figures/full_fig_p005_3.png] view at source ↗
read the original abstract

The person re-identification (Re-ID) task requires to robustly extract feature representations for person images. Recently, part-based representation models have been widely studied for extracting the more compact and robust feature representations for person images to improve person Re-ID results. However, existing part-based representation models mostly extract the features of different parts independently which ignore the relationship information between different parts. To overcome this limitation, in this paper we propose a novel deep learning framework, named Part-based Hierarchical Graph Convolutional Network (PH-GCN) for person Re-ID problem. Given a person image, PH-GCN first constructs a hierarchical graph to represent the pairwise relationships among different parts. Then, both local and global feature learning are performed by the messages passing in PH-GCN, which takes other nodes information into account for part feature representation. Finally, a perceptron layer is adopted for the final person part label prediction and re-identification. The proposed framework provides a general solution that integrates local, global and structural feature learning simultaneously in a unified end-to-end network. Extensive experiments on several benchmark datasets demonstrate the effectiveness of the proposed PH-GCN based Re-ID approach.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper proposes PH-GCN, a part-based hierarchical graph convolutional network for person re-identification. It constructs a hierarchical graph to model pairwise relationships among image parts, performs message passing to integrate local and global features while accounting for structural relations, and applies a perceptron for final prediction. The central claim is that this provides a general end-to-end solution integrating local, global, and structural feature learning, with effectiveness shown via experiments on benchmark Re-ID datasets.

Significance. If the hierarchical graph construction and message passing are shown to be jointly optimized with the CNN backbone and to yield measurable gains over independent part processing, the work could advance part-based Re-ID by providing a principled way to incorporate relational structure. The unified network formulation is a potential strength if the structural component is not reducible to a fixed topology.

major comments (3)
  1. [§3] §3 (Graph Construction): The description of the hierarchical graph does not specify whether the adjacency matrix or edge weights are computed from fixed priors (e.g., spatial proximity or part-type rules) or are learned parameters updated during back-propagation. If the former, message passing operates over a static topology and the claim of simultaneous end-to-end structural feature learning is not supported.
  2. [§4] §4 (Loss and Training): No derivation or ablation isolates the contribution of the GCN message-passing step versus standard part-feature concatenation or pooling. Without controls that disable the graph while keeping the rest of the pipeline identical, it is unclear whether the reported gains on Market-1501 and DukeMTMC-reID are attributable to structural learning.
  3. [Table 2] Table 2 (Ablation results): The row comparing PH-GCN to its non-graph variant reports only aggregate mAP/Rank-1; the per-part feature similarity or message-passing ablation is missing, preventing verification that the hierarchical relations improve part representations beyond independent processing.
minor comments (2)
  1. [§3.1] Notation for the hierarchical levels (e.g., part nodes vs. super-nodes) is introduced without an explicit diagram or equation defining the node feature update rule.
  2. [Abstract] The abstract states the framework is 'parameter-free' in its integration claim, yet the perceptron layer and GCN weights are learned; this phrasing should be clarified.

Simulated Author's Rebuttal

3 responses · 0 unresolved

Thank you for the constructive feedback on our manuscript. We address each major comment point by point below and indicate planned revisions to improve clarity and experimental rigor.

read point-by-point responses
  1. Referee: [§3] §3 (Graph Construction): The description of the hierarchical graph does not specify whether the adjacency matrix or edge weights are computed from fixed priors (e.g., spatial proximity or part-type rules) or are learned parameters updated during back-propagation. If the former, message passing operates over a static topology and the claim of simultaneous end-to-end structural feature learning is not supported.

    Authors: The hierarchical graph is constructed using fixed priors based on spatial proximity and part-type rules from the part division. The adjacency matrix and edge weights are not learned parameters. End-to-end optimization applies to the feature representations via message passing integrated with the CNN backbone. We will revise §3 to explicitly state the static topology and adjust the claim from 'structural feature learning' to 'incorporating structural relations via message passing' to prevent overstatement. revision: yes

  2. Referee: [§4] §4 (Loss and Training): No derivation or ablation isolates the contribution of the GCN message-passing step versus standard part-feature concatenation or pooling. Without controls that disable the graph while keeping the rest of the pipeline identical, it is unclear whether the reported gains on Market-1501 and DukeMTMC-reID are attributable to structural learning.

    Authors: We agree that isolating the message-passing contribution would strengthen the claims. Table 2 already includes a non-graph variant, but we will add a dedicated ablation that disables only the GCN message passing (while retaining identical part extraction, concatenation, and training) to quantify its specific effect on the reported gains. revision: yes

  3. Referee: [Table 2] Table 2 (Ablation results): The row comparing PH-GCN to its non-graph variant reports only aggregate mAP/Rank-1; the per-part feature similarity or message-passing ablation is missing, preventing verification that the hierarchical relations improve part representations beyond independent processing.

    Authors: The existing ablation uses standard aggregate Re-ID metrics. We will expand the experimental section with per-part similarity metrics and an explicit message-passing ablation to demonstrate improvements from hierarchical relations over independent part processing. revision: yes

Circularity Check

0 steps flagged

No circularity: architecture proposal does not reduce to input definitions or self-citations

full rationale

The paper introduces PH-GCN as a new end-to-end network that constructs a hierarchical graph over parts and performs message passing. No equations, fitted parameters renamed as predictions, or load-bearing self-citations appear in the abstract or described claims. The central integration of local/global/structural features is presented as an architectural contribution rather than a derived equivalence to its own inputs. The derivation chain remains self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract provides no explicit free parameters, axioms, or invented entities beyond the standard assumption that a graph can be meaningfully constructed from image parts.

pith-pipeline@v0.9.0 · 5730 in / 1031 out tokens · 19483 ms · 2026-05-24T18:51:16.053525+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

51 extracted references · 51 canonical work pages · 8 internal anchors

  1. [1]

    Person re- identification by multi-channel parts-based cnn with improved triplet loss function,

    D. Cheng, Y . Gong, S. Zhou, J. Wang, and N. Zheng, “Person re- identification by multi-channel parts-based cnn with improved triplet loss function,” in Computer Vision and Pattern Recognition , 2016

  2. [2]

    Dual attention matching network for context-aware feature sequence based person re-identification,

    J. Si, H. Zhang, C.-G. Li, J. Kuen, X. Kong, A. C. Kot, and G. Wang, “Dual attention matching network for context-aware feature sequence based person re-identification,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition , 2018, pp. 5363–5372

  3. [3]

    Svdnet for pedestrian retrieval,

    Y . Sun, L. Zheng, W. Deng, and S. Wang, “Svdnet for pedestrian retrieval,” in IEEE International Conference on Computer Vision , 2017

  4. [4]

    Deep group- shuffling random walk for person re-identification,

    Y . Shen, H. Li, T. Xiao, S. Yi, D. Chen, and X. Wang, “Deep group- shuffling random walk for person re-identification,” inProceedings of the IEEE Conference on Computer Vision and Pattern Recognition , 2018, pp. 2265–2274

  5. [5]

    Group consistent simi- larity learning via deep crf for person re-identification,

    D. Chen, D. Xu, H. Li, N. Sebe, and X. Wang, “Group consistent simi- larity learning via deep crf for person re-identification,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition , 2018, pp. 8649–8658

  6. [6]

    Harmonious attention network for person re-identification,

    W. Li, X. Zhu, and S. Gong, “Harmonious attention network for person re-identification,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition , 2018, pp. 2285–2294

  7. [7]

    Video person re- identification with competitive snippet-similarity aggregation and co- attentive snippet embedding,

    D. Chen, H. Li, T. Xiao, S. Yi, and X. Wang, “Video person re- identification with competitive snippet-similarity aggregation and co- attentive snippet embedding,” in The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) , June 2018

  8. [8]

    Multi-region bilinear con- volutional neural networks for person re-identification,

    E. Ustinova, Y . Ganin, and V . Lempitsky, “Multi-region bilinear con- volutional neural networks for person re-identification,” in 2017 14th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS). IEEE, 2017, pp. 1–6

  9. [9]

    Pose-driven deep convolutional model for person re-identification,

    C. Su, J. Li, S. Zhang, J. Xing, W. Gao, and Q. Tian, “Pose-driven deep convolutional model for person re-identification,” in Proceedings of the IEEE International Conference on Computer Vision , 2017, pp. 3960–3969

  10. [10]

    Person re-identification by deep learning attribute-complementary information,

    A. Schumann and R. Stiefelhagen, “Person re-identification by deep learning attribute-complementary information,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Work- shops, 2017, pp. 20–28

  11. [11]

    Beyond part models: Person retrieval with refined part pooling (and a strong convolutional baseline),

    Y . Sun, L. Zheng, Y . Yang, Q. Tian, and S. Wang, “Beyond part models: Person retrieval with refined part pooling (and a strong convolutional baseline),” in Proceedings of the European Conference on Computer Vision (ECCV), 2018, pp. 480–496

  12. [12]

    Deeply-learned part-aligned representations for person re-identification,

    L. Zhao, X. Li, Y . Zhuang, and J. Wang, “Deeply-learned part-aligned representations for person re-identification,” in The IEEE International Conference on Computer Vision (ICCV) , Oct 2017

  13. [13]

    Part-aligned bilinear representations for person re-identification,

    Y . Suh, J. Wang, S. Tang, T. Mei, and K. M. Lee, “Part-aligned bilinear representations for person re-identification,” in Computer Vision–ECCV

  14. [14]

    Springer, 2018, pp. 418–437

  15. [15]

    Glad: Global-local- alignment descriptor for pedestrian retrieval,

    L. Wei, S. Zhang, H. Yao, W. Gao, and Q. Tian, “Glad: Global-local- alignment descriptor for pedestrian retrieval,” in Proceedings of the 25th ACM international conference on Multimedia . ACM, 2017, pp. 420– 428

  16. [16]

    Pyramidal Person Re-IDentification via Multi-Loss Dynamic Training

    F. Zheng, X. Sun, X. Jiang, X. Guo, Z. Yu, and F. Huang, “A coarse-to- fine pyramidal model for person re-identification via multi-loss dynamic training,” arXiv preprint arXiv:1810.12193 , 2018

  17. [17]

    Learning Discriminative Features with Multiple Granularities for Person Re-Identification

    G. Wang, Y . Yuan, X. Chen, J. Li, and X. Zhou, “Learning discriminative features with multiple granularities for person re-identification,” arXiv preprint arXiv:1804.01438, 2018

  18. [18]

    Spectral Networks and Locally Connected Networks on Graphs

    J. Bruna, W. Zaremba, A. Szlam, and Y . LeCun, “Spectral networks and locally connected networks on graphs,” arXiv preprint arXiv:1312.6203, 2013

  19. [19]

    Semi-Supervised Classification with Graph Convolutional Networks

    T. N. Kipf and M. Welling, “Semi-supervised classification with graph convolutional networks,” arXiv preprint arXiv:1609.02907 , 2016

  20. [20]

    Learning convolutional neural networks for graphs,

    M. Niepert, M. Ahmed, and K. Kutzkov, “Learning convolutional neural networks for graphs,” in International conference on machine learning , 2016, pp. 2014–2023

  21. [21]

    Convolutional neural networks on graphs with fast localized spectral filtering,

    M. Defferrard, X. Bresson, and P. Vandergheynst, “Convolutional neural networks on graphs with fast localized spectral filtering,” in Advances in neural information processing systems , 2016, pp. 3844–3852

  22. [22]

    Spatial temporal graph convolutional networks for skeleton-based action recognition,

    S. Yan, Y . Xiong, and D. Lin, “Spatial temporal graph convolutional networks for skeleton-based action recognition,” in Thirty-Second AAAI Conference on Artificial Intelligence , 2018

  23. [23]

    Spectral networks and locally connected networks on graphs,

    J. Bruna, W. Zaremba, A. Szlam, and Y . LeCun, “Spectral networks and locally connected networks on graphs,” in International Conference on Learning Representations, 2014

  24. [24]

    Diffusion-convolutional neural networks,

    J. Atwood and D. Towsley, “Diffusion-convolutional neural networks,” in Advances in Neural Information Processing Systems, 2016, pp. 1993– 2001

  25. [25]

    Geometric deep learning on graphs and manifolds using mixture model cnns,

    F. Monti, D. Boscaini, J. Masci, E. Rodola, J. Svoboda, and M. M. Bronstein, “Geometric deep learning on graphs and manifolds using mixture model cnns,” in IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 5423–5434

  26. [26]

    Graph Attention Networks

    P. Veli ˇckovi´c, G. Cucurull, A. Casanova, A. Romero, P. Lio, and Y . Ben- gio, “Graph attention networks,” arXiv preprint arXiv:1710.10903, 2017

  27. [27]

    Neural graph matching networks for fewshot 3d action recognition,

    M. Guo, E. Chou, D.-A. Huang, S. Song, S. Yeung, and L. Fei-Fei, “Neural graph matching networks for fewshot 3d action recognition,” in Proceedings of the European Conference on Computer Vision (ECCV) , 2018, pp. 653–669

  28. [28]

    3d graph neural networks for rgbd semantic segmentation,

    X. Qi, R. Liao, J. Jia, S. Fidler, and R. Urtasun, “3d graph neural networks for rgbd semantic segmentation,” in Proceedings of the IEEE International Conference on Computer Vision , 2017, pp. 5199–5208

  29. [29]

    Person re-identification with deep similarity-guided graph neural network,

    Y . Shen, H. Li, S. Yi, D. Chen, and X. Wang, “Person re-identification with deep similarity-guided graph neural network,” in Proceedings of the European Conference on Computer Vision (ECCV) , 2018, pp. 486–504

  30. [30]

    Deep residual learning for image recognition,

    K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2016

  31. [31]

    ImageNet Large Scale Visual Recognition Challenge

    O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein et al., “Imagenet large scale visual recognition challenge,” arXiv preprint arXiv:1409.0575 , 2014

  32. [32]

    A tutorial on the cross-entropy method,

    P.-T. De Boer, D. P. Kroese, S. Mannor, and R. Y . Rubinstein, “A tutorial on the cross-entropy method,” Annals of operations research , vol. 134, no. 1, pp. 19–67, 2005

  33. [33]

    Hydraplus-net: Attentive deep features for pedestrian analysis,

    X. Liu, H. Zhao, M. Tian, L. Sheng, J. Shao, S. Yi, J. Yan, and X. Wang, “Hydraplus-net: Attentive deep features for pedestrian analysis,” in The IEEE International Conference on Computer Vision (ICCV) , Oct 2017

  34. [34]

    Re-ranking person re- identification with k-reciprocal encoding,

    Z. Zhong, L. Zheng, D. Cao, and S. Li, “Re-ranking person re- identification with k-reciprocal encoding,” in The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) , July 2017

  35. [35]

    Person Re-Identification by Deep Joint Learning of Multi-Loss Classification

    W. Li, X. Zhu, and S. Gong, “Person re-identification by deep joint learning of multi-loss classification,” arXiv preprint arXiv:1705.04724 , 2017

  36. [36]

    Scalable person re-identification on supervised smoothed manifold,

    S. Bai, X. Bai, and Q. Tian, “Scalable person re-identification on supervised smoothed manifold,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition , 2017, pp. 2530–2539

  37. [37]

    Person re-identification by deep learning multi-scale representations,

    Y . Chen, X. Zhu, and S. Gong, “Person re-identification by deep learning multi-scale representations,” in Proceedings of the IEEE International Conference on Computer Vision , 2017, pp. 2590–2600

  38. [38]

    Camera style adaptation for person re-identification,

    Z. Zhong, L. Zheng, Z. Zheng, S. Li, and Y . Yang, “Camera style adaptation for person re-identification,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition , 2018, pp. 5157–5166

  39. [39]

    A pose-sensitive embedding for person re-identification with expanded cross neighborhood re-ranking,

    M. Saquib Sarfraz, A. Schumann, A. Eberle, and R. Stiefelhagen, “A pose-sensitive embedding for person re-identification with expanded cross neighborhood re-ranking,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition , 2018, pp. 420–429

  40. [40]

    Pose transferrable person re-identification,

    J. Liu, B. Ni, Y . Yan, P. Zhou, S. Cheng, and J. Hu, “Pose transferrable person re-identification,” in The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) , June 2018

  41. [41]

    Person re-identification with cascaded pairwise convolutions,

    Y . Wang, Z. Chen, F. Wu, and G. Wang, “Person re-identification with cascaded pairwise convolutions,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition , 2018, pp. 1470–1478

  42. [42]

    Mask-guided contrastive attention model for person re-identification,

    C. Song, Y . Huang, W. Ouyang, and L. Wang, “Mask-guided contrastive attention model for person re-identification,” inProceedings of the IEEE Conference on Computer Vision and Pattern Recognition , 2018, pp. 1179–1188

  43. [43]

    Safenet: Scale-normalization and anchor-based feature extraction net- work for person re-identification

    K. Yuan, Q. Zhang, C. Huang, S. Xiang, C. Pan, and H. Robotics, “Safenet: Scale-normalization and anchor-based feature extraction net- work for person re-identification.” in IJCAI, 2018, pp. 1121–1127

  44. [44]

    Scalable person re-identification: A benchmark,

    L. Zheng, L. Shen, L. Tian, S. Wang, J. Wang, and Q. Tian, “Scalable person re-identification: A benchmark,” in The IEEE International Conference on Computer Vision (ICCV) , December 2015. 7

  45. [45]

    Person re-identification by local maximal occurrence representation and metric learning,

    S. Liao, Y . Hu, X. Zhu, and S. Z. Li, “Person re-identification by local maximal occurrence representation and metric learning,” in Proceedings of the IEEE conference on computer vision and pattern recognition , 2015, pp. 2197–2206

  46. [46]

    Unlabeled samples generated by gan improve the person re-identification baseline in vitro,

    Z. Zheng, L. Zheng, and Y . Yang, “Unlabeled samples generated by gan improve the person re-identification baseline in vitro,” in Proceedings of the IEEE International Conference on Computer Vision , 2017, pp. 3754–3762

  47. [47]

    Re-ranking person re- identification with k-reciprocal encoding,

    Z. Zhong, L. Zheng, D. Cao, and S. Li, “Re-ranking person re- identification with k-reciprocal encoding,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition , 2017, pp. 1318–1327

  48. [48]

    Deepreid: Deep filter pairing neural network for person re-identification,

    W. Li, R. Zhao, T. Xiao, and X. Wang, “Deepreid: Deep filter pairing neural network for person re-identification,” in CVPR, 2014

  49. [49]

    Object detection with discriminatively trained part-based models,

    P. F. Felzenszwalb, R. B. Girshick, D. McAllester, and D. Ramanan, “Object detection with discriminatively trained part-based models,” IEEE transactions on pattern analysis and machine intelligence, vol. 32, no. 9, pp. 1627–1645, 2010

  50. [50]

    Performance Measures and a Data Set for Multi-Target, Multi-Camera Tracking

    E. Ristani, F. Solera, R. S. Zou, R. Cucchiara, and C. Tomasi, “Perfor- mance measures and a data set for multi-target, multi-camera tracking,” arXiv preprint arXiv:1609.01775 , 2016

  51. [51]

    Large-scale machine learning with stochastic gradient de- scent,

    L. Bottou, “Large-scale machine learning with stochastic gradient de- scent,” in Proceedings of COMPSTAT’2010, 2010, pp. 177–186