arxiv: 1907.09000 · v1 · pith:NL253WI5new · submitted 2019-07-21 · 💻 cs.CV · cs.LG

Image Classification with Hierarchical Multigraph Networks

Pith reviewed 2026-05-24 18:32 UTC · model grok-4.3

classification 💻 cs.CV cs.LG
keywords graph convolutional networks · image classification · superpixels · multigraph networks · hierarchical graphs · MNIST · CIFAR-10 · PASCAL

The pith

Hierarchical multigraph networks built from superpixels let GCNs match, and in some cases exceed, CNN accuracy on image classification tasks.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that graph convolutional networks can be adapted for image classification by representing images as hierarchical multigraphs whose nodes are superpixels and whose edges encode multiple relations at different scales. This design exploits GCNs' natural handling of irregular inputs and multirelational structure, properties that standard CNNs do not directly encode. The authors identify concrete design choices that allow these networks to reach classification performance comparable to or better than CNNs on the MNIST, CIFAR-10, and PASCAL datasets. A reader would care because the approach replaces the rigid pixel grid with a more flexible graph representation that could lower computational cost while preserving accuracy. The central demonstration is therefore that domain knowledge for vision can be injected into GCNs through careful graph construction rather than through hardcoded convolutional filters.

Core claim

Graph convolutional networks applied to hierarchical multigraphs, built from image superpixels with edges that represent multiple relations, can classify images at accuracies comparable to, and in some cases exceeding, those of convolutional neural networks on MNIST, CIFAR-10, and PASCAL.

What carries the argument

Hierarchical multigraph networks, in which image superpixels become nodes connected by multiple edge types across resolution levels, carry the spatial and relational information needed for classification.
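To make the construction concrete, here is a minimal NumPy sketch of how such a multigraph might be assembled. It is an illustration under stated assumptions, not the authors' implementation: square blocks stand in for a real superpixel algorithm such as SLIC, spatial edges are taken to be a Gaussian of centroid distance, and hierarchical edges are taken to be the intersection over union of superpixel masks. The function names, the `sigma` parameter, and the block sizes are all hypothetical.

```python
import numpy as np

def block_labels(h, w, size):
    """Stand-in for a superpixel algorithm (assumption): square blocks."""
    ys, xs = np.mgrid[0:h, 0:w]
    ncols = -(-w // size)  # ceiling division so labels stay unique
    return (ys // size) * ncols + (xs // size)

def node_features(image, labels):
    """Mean colour and normalised centre of mass for each region."""
    h, w = labels.shape
    ys, xs = np.mgrid[0:h, 0:w]
    feats = []
    for k in np.unique(labels):
        mask = labels == k
        colour = image[mask].mean(axis=0)
        centre = np.array([ys[mask].mean() / h, xs[mask].mean() / w])
        feats.append(np.concatenate([colour, centre]))
    return np.stack(feats)

def spatial_edges(centres, sigma=0.5):
    """Assumed spatial relation: Gaussian of squared centroid distance."""
    d2 = ((centres[:, None, :] - centres[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / sigma ** 2)

def hierarchical_edges(masks):
    """Assumed hierarchical relation: IoU between boolean region masks."""
    flat = masks.reshape(len(masks), -1).astype(float)
    inter = flat @ flat.T
    area = flat.sum(axis=1)
    union = area[:, None] + area[None, :] - inter
    return inter / union

def build_multigraph(image, sizes=(4, 2), sigma=0.5):
    """Pool regions from several scales into one node set with two relations."""
    h, w = image.shape[:2]
    feats, masks = [], []
    for size in sizes:
        labels = block_labels(h, w, size)
        feats.append(node_features(image, labels))
        masks.extend(labels == k for k in np.unique(labels))
    x = np.concatenate(feats)
    # The last two feature columns hold the normalised centres.
    return x, spatial_edges(x[:, -2:], sigma), hierarchical_edges(np.stack(masks))

if __name__ == "__main__":
    image = np.random.default_rng(0).random((8, 8, 3))
    x, a_spatial, a_hier = build_multigraph(image)
    print(x.shape, a_spatial.shape, a_hier.shape)
```

Both adjacency matrices share the same node ordering, so each can be passed as a separate relation to a multirelational graph layer. Swapping `block_labels` for a real superpixel method would change only the label maps, not the rest of the sketch.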

If this is right

  • GCNs can operate directly on irregular image representations without requiring a fixed rectangular grid.
  • Multiple edge relations let the model capture distinct kinds of pixel or region interactions in one forward pass.
  • Superpixel-based construction uses far fewer nodes than raw pixels, which can lower memory and compute demands.
  • Best-practice design choices for superpixel graphs and edge types transfer across the three evaluated datasets.
  • The same graph-construction recipe can be applied to other vision tasks that benefit from irregular or multirelational inputs.
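The idea of handling several edge relations in one forward pass can be sketched as a single multirelational graph layer. The code below is a hedged NumPy illustration, not the paper's exact formulation: it assumes row-normalised adjacency matrices, one projection matrix per relation, and fusion by summation or concatenation followed by a ReLU. The names `normalise` and `multirelational_layer` are hypothetical.

```python
import numpy as np

def normalise(adj):
    """Row-normalise so each node averages over its weighted neighbours."""
    deg = adj.sum(axis=1, keepdims=True)
    return adj / np.maximum(deg, 1e-12)

def multirelational_layer(x, adjs, weights, fusion="sum"):
    """Propagate features along each relation, project, then fuse.

    x       : (N, F) node features
    adjs    : list of R adjacency matrices, each (N, N)
    weights : list of R projection matrices, each (F, F_out)
    """
    projected = [normalise(a) @ x @ w for a, w in zip(adjs, weights)]
    if fusion == "sum":
        fused = np.sum(projected, axis=0)          # (N, F_out)
    elif fusion == "concat":
        fused = np.concatenate(projected, axis=1)  # (N, R * F_out)
    else:
        raise ValueError(f"unknown fusion: {fusion}")
    return np.maximum(fused, 0.0)  # ReLU

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.random((6, 5))
    adjs = [rng.random((6, 6)), np.eye(6)]
    weights = [rng.standard_normal((5, 8)) for _ in adjs]
    h = multirelational_layer(x, adjs, weights, fusion="sum")
    graph_embedding = h.max(axis=0)  # global max pooling over nodes
    print(h.shape, graph_embedding.shape)
```

Summation keeps the output width fixed as relations are added, while concatenation grows it with the number of relations. Which one works better is an empirical question that this sketch does not answer.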

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same superpixel multigraph approach might be tested on video or 3-D data where the underlying structure is already non-grid.
  • If superpixel quality varies with image content, an adaptive segmentation step could become necessary for consistent performance.
  • Because the model never sees the raw pixel lattice, it offers a natural testbed for measuring how much translation invariance is truly required for a given dataset.
  • Scaling the hierarchy to deeper levels or larger images would reveal whether the accuracy gains persist or saturate.

Load-bearing premise

Superpixel graphs together with the chosen multirelational edges preserve enough spatial layout and semantic content from the original pixel image to support high-accuracy classification.

What would settle it

Running the proposed multigraph model on MNIST, CIFAR-10, and PASCAL and finding that its accuracy falls below a standard CNN baseline on every dataset would disprove the claim of comparability or superiority.

Figures

Figures reproduced from arXiv: 1907.09000 by Boris Knyazev, Graham W. Taylor, Mohamed R. Amer, Xiao Lin.

Figure 1: Examples of the original images (a-c), defined on a regular grid, and their superpixel representations (d-f) for MNIST (a,d), CIFAR-10 (b,e) and PASCAL (c,f); N is the number of superpixels (nodes in our graphs). GCNs can learn both from images and superpixels due to their flexibility, whereas standard CNNs can learn only from images defined on a regular grid (a-c). The challenge of generalizing convoluti…

Figure 2: An example of the “PC” relation type fusion based on a trainable projection (Eq. 3). We first project features onto a common multirelational space, where we fuse them using a fusion operator, such as summation or concatenation. In this work, relation types 1 and 2 can denote spatial and hierarchical (or learned) edges. We also allow for three or more relation types. Convolution is an essential computationa…

Figure 3: (top) We compute superpixels at several scales and combine all of them into a single set. (bottom) We then build a graph, where each node corresponds to a superpixel from this set and has features, such as mean RGB color and coordinates of the centres of masses. Using Eq. 4 and 7, we compute spatial (a) and hierarchical (c) edges. Nodes 0 to 300 correspond to the first level of the hierarchy (first scale …

Figure 4: Image classification pipeline using our model. Each m-th graph convolutional layer in our model takes the graph G_m = (V_m, E^(r)) and returns a graph with the same nodes and edges. Node features become increasingly global after each subsequent layer as the receptive field increases, while edges are propagated without changes. As a result, after several graph convolutional layers, each node in the graph cont…

Figure 6: (a) Number of trainable parameters, # params, in a graph convolutional layer as a function of the number of relations, R. Fusion methods based on trainable projections, including those proposed in our work, have “# params” comparable to the baseline concatenation method while being more powerful in terms of classification (see Tables 1 and 2). (b) Comparison of single edge types, where learned and hierarc…
Original abstract

Graph Convolutional Networks (GCNs) are a class of general models that can learn from graph structured data. Despite being general, GCNs are admittedly inferior to convolutional neural networks (CNNs) when applied to vision tasks, mainly due to the lack of domain knowledge that is hardcoded into CNNs, such as spatially oriented translation invariant filters. However, a great advantage of GCNs is the ability to work on irregular inputs, such as superpixels of images. This could significantly reduce the computational cost of image reasoning tasks. Another key advantage inherent to GCNs is the natural ability to model multirelational data. Building upon these two promising properties, in this work, we show best practices for designing GCNs for image classification; in some cases even outperforming CNNs on the MNIST, CIFAR-10 and PASCAL image datasets.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper proposes hierarchical multigraph networks (HMGNs) for image classification. Images are segmented into superpixels forming graphs with multiple relation types; GCN layers operate on these irregular structures to perform classification. The central claim is that this approach incorporates domain knowledge via multirelational edges and can, in some cases, outperform standard CNNs on MNIST, CIFAR-10, and PASCAL while reducing computational cost.

Significance. If the empirical results are rigorously validated with proper controls and baselines, the work would indicate that GCNs on superpixel multigraphs can encode sufficient spatial and semantic information to compete with translation-equivariant CNN filters on standard vision benchmarks. This could support more efficient irregular-domain models for image tasks.

major comments (2)
  1. [Abstract] The headline claim of outperforming CNNs on MNIST, CIFAR-10, and PASCAL supplies no numerical results, baselines, error bars, or experimental controls. This absence makes the central empirical assertion unevaluable and directly undermines assessment of whether the multigraph construction preserves the information CNNs exploit.
  2. [Abstract] The assumption that superpixel graphs plus multirelational edges preserve sufficient spatial and semantic information (stated in the abstract as addressing the lack of hardcoded CNN filters) is load-bearing for the outperformance claim. No explicit positional encoding or coordinate features are described to recover absolute layout lost when superpixel boundaries merge or split regions, and the irregular topology lacks the fixed-grid equivariance of CNN kernels.
minor comments (1)
  1. Notation for the multirelational edge types and hierarchical pooling steps should be defined more clearly with explicit equations.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful reading and constructive comments on our manuscript. We address each major comment below and indicate where revisions will be made to the abstract and related sections.

Point-by-point responses
  1. Referee: [Abstract] The headline claim of outperforming CNNs on MNIST, CIFAR-10, and PASCAL supplies no numerical results, baselines, error bars, or experimental controls. This absence makes the central empirical assertion unevaluable and directly undermines assessment of whether the multigraph construction preserves the information CNNs exploit.

    Authors: We agree that the abstract should supply concrete numbers to make the claim evaluable. The revised abstract will include our reported accuracies (with standard deviations across runs) on MNIST, CIFAR-10 and PASCAL, together with the CNN baselines used in the experimental section. This will allow readers to assess the empirical support directly from the abstract.
    Revision: yes

  2. Referee: [Abstract] The assumption that superpixel graphs plus multirelational edges preserve sufficient spatial and semantic information (stated in the abstract as addressing the lack of hardcoded CNN filters) is load-bearing for the outperformance claim. No explicit positional encoding or coordinate features are described to recover absolute layout lost when superpixel boundaries merge or split regions, and the irregular topology lacks the fixed-grid equivariance of CNN kernels.

    Authors: The manuscript constructs multiple edge relation types from superpixel adjacency and appearance features; these relations are intended to encode relative spatial layout without requiring a fixed grid. However, the current text does not describe explicit coordinate features for superpixel centroids. We will add a short clarification in the methods section on how the chosen relations capture positional information and will note the absence of explicit absolute coordinates as a point for potential future augmentation.
    Revision: partial

Circularity Check

0 steps flagged

No circularity; empirical design with no derivation chain or self-referential predictions

The provided abstract and text describe an empirical method for applying GCNs to superpixel graphs for image classification, with performance evaluated on MNIST, CIFAR-10 and PASCAL. No equations, first-principles derivations, fitted parameters renamed as predictions, or load-bearing self-citations appear. Claims rest on experimental results rather than any mathematical reduction to inputs by construction. The derivation chain is therefore self-contained and non-circular.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review supplies no explicit free parameters, axioms, or invented entities; any such elements would require the full manuscript.

pith-pipeline@v0.9.0 · 5677 in / 956 out tokens · 19831 ms · 2026-05-24T18:32:07.545208+00:00 · methodology
