Semi-Supervised Graph Embedding for Multi-Label Graph Node Classification

Cangqi Zhou; Jing Zhang; Kaisheng Gao

arxiv: 1907.05743 · v1 · pith:Y2XYXH2Tnew · submitted 2019-07-12 · 💻 cs.LG · stat.ML

Pith reviewed 2026-05-24 22:15 UTC · model grok-4.3

keywords: graph convolution network · multi-label classification · semi-supervised learning · skip-gram model · node embedding · label correlation · graph node classification

The pith

ML-GCN embeds labels and nodes in one vector space so a skip-gram model can learn their correlations for multi-label node classification.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces ML-GCN to close a gap in standard graph convolution networks for multi-label tasks. Standard GCN training only matches the final-layer outputs to the ground-truth labels and therefore misses relationships among the labels themselves. ML-GCN first runs a GCN on node features and graph structure, then randomly generates a label matrix whose rows have the same dimension as the node vectors before the last convolution layer. The node vectors and label vectors are concatenated and fed to a relaxed skip-gram model that captures both node-label and label-label patterns during training. Experiments on several graph datasets show that the resulting model outperforms four state-of-the-art methods.
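The first stage is an ordinary GCN. Below is a minimal pure-Python sketch of one propagation step, assuming the renormalized rule of Kipf and Welling [7], H' = ReLU(D^-1/2 (A + I) D^-1/2 H W). The helper names are hypothetical and nothing here is taken from the ML-GCN code. In a two-layer model, the first layer's output plays the role of the node vectors "before the last convolution operation", and its width d is the dimension the abstract assigns to each label vector.

```python
import math


def normalized_adjacency(adj):
    """Return D^-1/2 (A + I) D^-1/2 for a dense 0/1 adjacency matrix (list of lists)."""
    n = len(adj)
    # Add self-loops: A + I.
    a_tilde = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    # Node degrees in A + I (always >= 1 thanks to the self-loop).
    inv_sqrt_deg = [1.0 / math.sqrt(sum(row)) for row in a_tilde]
    return [[inv_sqrt_deg[i] * a_tilde[i][j] * inv_sqrt_deg[j] for j in range(n)]
            for i in range(n)]


def matmul(x, y):
    """Dense product of two list-of-lists matrices."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
            for i in range(len(x))]


def gcn_layer(a_norm, h, w):
    """One graph convolution: ReLU(A_norm @ H @ W)."""
    z = matmul(matmul(a_norm, h), w)
    return [[max(0.0, v) for v in row] for row in z]


# Usage on a 3-node path graph 0-1-2 with identity features and weights.
adj = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
eye = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
a_norm = normalized_adjacency(adj)
h1 = gcn_layer(a_norm, eye, eye)  # equals a_norm here, since every entry is >= 0
```

Dense lists keep the sketch dependency-free; a real implementation would use sparse matrix products.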

Core claim

The central claim is that randomly generating label vectors in the same space as pre-final-layer node embeddings, then training a relaxed skip-gram model on their concatenations, extracts usable node-label and label-label correlations that improve semi-supervised multi-label node classification beyond what cross-entropy alone achieves.

What carries the argument

Concatenation of GCN node vectors with randomly generated label vectors of matching dimension, followed by relaxed skip-gram training to detect correlations.
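A minimal sketch of this step under stated assumptions: the label matrix is drawn i.i.d. Gaussian (the abstract only says "randomly generates"), "concatenated" is read as stacking node rows and label rows into one shared embedding table, and "relaxed skip-gram" is approximated by the standard skip-gram negative-sampling loss of Mikolov et al. [20]. All function names are hypothetical.

```python
import math
import random


def random_label_matrix(num_labels, dim, seed=0):
    """Assumed init: i.i.d. N(0, 1/dim) entries, one row per label."""
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(dim)
    return [[rng.gauss(0.0, scale) for _ in range(dim)] for _ in range(num_labels)]


def joint_table(node_vecs, label_vecs):
    """Stack nodes (rows 0..n-1) and labels (rows n..n+q-1) into one d-dim table."""
    dim = len(node_vecs[0])
    if any(len(v) != dim for v in label_vecs):
        raise ValueError("label vectors must match the node-vector dimension")
    return [list(v) for v in node_vecs] + [list(v) for v in label_vecs]


def log_sigmoid(x):
    """Numerically stable log(sigmoid(x))."""
    return -math.log1p(math.exp(-x)) if x >= 0 else x - math.log1p(math.exp(x))


def sgns_loss(table, center, context, negatives):
    """Skip-gram negative-sampling loss for one (center, context) pair of row indices."""
    def dot(i, j):
        return sum(a * b for a, b in zip(table[i], table[j]))
    loss = -log_sigmoid(dot(center, context))
    for k in negatives:
        loss -= log_sigmoid(-dot(center, k))
    return loss


# Usage: 4 nodes and 3 labels in an 8-dimensional space (sizes are illustrative).
node_vecs = random_label_matrix(4, 8, seed=1)  # stand-in for GCN hidden outputs
label_vecs = random_label_matrix(3, 8, seed=2)
table = joint_table(node_vecs, label_vecs)
n = len(node_vecs)
# Node 0 carries label 1 (row n + 1); label 2 (row n + 2) serves as a negative.
loss = sgns_loss(table, center=0, context=n + 1, negatives=[n + 2])
```

A single shared table is what lets one objective score node-label and label-label pairs alike; whether the paper uses separate input and output tables is not stated in the abstract.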

If this is right

Standard GCN cross-entropy loss can be augmented with explicit correlation modeling without changing the underlying graph convolution layers.
Because labels and nodes share a common embedding space, the same skip-gram objective can handle node-label and label-label relationships at once.
The approach applies to any multi-label graph dataset where label correlations exist beyond what independent label prediction captures.
Training remains semi-supervised because only the observed node labels are used while the generated label vectors serve as additional context.
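The augmented objective can be sketched as a masked supervised term plus a weighted skip-gram term. This sketch assumes the supervised part is cross-entropy against each labeled node's normalized label distribution (following the abstract's wording) and that the two terms are summed with a hypothetical weight; unlabeled nodes contribute only through graph propagation, not through this loss.

```python
import math


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def masked_label_cross_entropy(logits, label_sets, labeled_mask):
    """Cross-entropy between softmax(logits) and the uniform distribution over each
    node's true labels, averaged over labeled nodes only."""
    total, count = 0.0, 0
    for scores, labels, is_labeled in zip(logits, label_sets, labeled_mask):
        if not is_labeled:
            continue  # unlabeled nodes carry no supervised signal
        probs = softmax(scores)
        mass = 1.0 / len(labels)  # each labeled node is assumed to have >= 1 label
        total -= sum(mass * math.log(probs[j]) for j in labels)
        count += 1
    return total / count


def combined_loss(ce_term, skipgram_term, weight=1.0):
    """Hypothetical combination; the abstract does not specify the weighting."""
    return ce_term + weight * skipgram_term


# Usage: node 0 is labeled with {0, 1}; node 1 is unlabeled and ignored.
ce = masked_label_cross_entropy([[0.0, 0.0, 0.0], [5.0, 0.0, 0.0]], [[0, 1], [2]], [True, False])
total = combined_loss(ce, skipgram_term=0.7, weight=0.5)
```

With uniform logits over three labels, the supervised term equals log 3 regardless of which two labels the node carries, which is a quick sanity check on the masking and normalization.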

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same concatenation-plus-skip-gram step could be inserted after other node embedding methods besides GCN, such as random-walk or attention-based encoders.
If label correlations change over time, the random label matrix would need to be regenerated or learned jointly rather than fixed at the start.
Performance on graphs with very sparse or very dense label co-occurrences could reveal whether the random initialization step remains stable.

Load-bearing premise

Randomly chosen label vectors, once concatenated with node vectors and trained with skip-gram, will capture meaningful correlations rather than fit noise.

What would settle it

On a controlled synthetic graph whose label co-occurrence matrix is known exactly, measure whether accuracy gains scale directly with the strength of the planted label correlations.
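One way to build the labels for such a controlled graph is sketched below. It assumes each label is drawn independently at a base rate and that each planted pair (a, b) adds label b with probability `strength` whenever a is present. The generator and its parameters are hypothetical, and the graph edges, the training runs, and the accuracy measurement are left out. Sweeping `strength` from 0 to 1 gives the dial against which ML-GCN's gain over GCN would be plotted.

```python
import random


def planted_labels(num_nodes, num_labels, pairs, strength, base_rate=0.1, seed=0):
    """Hypothetical generator: each label is drawn independently at `base_rate`,
    then for every planted pair (a, b), label b is added with probability
    `strength` whenever label a is present."""
    rng = random.Random(seed)
    label_sets = []
    for _ in range(num_nodes):
        labels = {j for j in range(num_labels) if rng.random() < base_rate}
        for a, b in pairs:
            if a in labels and rng.random() < strength:
                labels.add(b)
        label_sets.append(labels)
    return label_sets


def cooccurrence(label_sets, num_labels):
    """Symmetric label-label co-occurrence counts (diagonal left at zero)."""
    counts = [[0] * num_labels for _ in range(num_labels)]
    for labels in label_sets:
        for i in labels:
            for j in labels:
                if i != j:
                    counts[i][j] += 1
    return counts


# Usage: the same seed gives the same base labels, so only the planted pair differs.
weak_sets = planted_labels(500, 6, [(0, 1)], strength=0.0)
strong_sets = planted_labels(500, 6, [(0, 1)], strength=1.0)
weak = cooccurrence(weak_sets, 6)
strong = cooccurrence(strong_sets, 6)
```

Because the co-occurrence matrix is known exactly, it can also serve as a non-random label initialization for the ablation the referee asks about.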

The graph convolution network (GCN) is a widely-used facility to realize graph-based semi-supervised learning, which usually integrates node features and graph topologic information to build learning models. However, as for multi-label learning tasks, the supervision part of GCN simply minimizes the cross-entropy loss between the last layer outputs and the ground-truth label distribution, which tends to lose some useful information such as label correlations, so that prevents from obtaining high performance. In this paper, we propose a novel GCN-based semi-supervised learning approach for multi-label classification, namely ML-GCN. ML-GCN first uses a GCN to embed the node features and graph topologic information. Then, it randomly generates a label matrix, where each row (i.e., label vector) represents a kind of labels. The dimension of the label vector is the same as that of the node vector before the last convolution operation of GCN. That is, all labels and nodes are embedded in a uniform vector space. Finally, during the ML-GCN model training, label vectors and node vectors are concatenated to serve as the inputs of the relaxed skip-gram model to detect the node-label correlation as well as the label-label correlation. Experimental results on several graph classification datasets show that the proposed ML-GCN outperforms four state-of-the-art methods.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note · private letter to a colleague

ML-GCN adds random label vectors and skip-gram to GCN for multi-label node classification, but the random construction makes it unclear whether gains come from real correlations or training artifacts.

The paper's main move is to take a standard GCN node embedding, randomly generate a label matrix whose rows match the embedding dimension, concatenate the two, and run a relaxed skip-gram objective on the pairs. This is meant to recover node-label and label-label correlations that plain cross-entropy on the final layer would miss. That specific combination does not appear in the cited prior work, so the pipeline counts as new on its face. It also directly targets a known shortcoming of GCNs in multi-label settings, where label dependencies matter for social or biological graphs. The authors deserve credit for spelling out the construction clearly in the abstract and for testing against four existing methods on multiple datasets. The experiments are presented as showing consistent gains, which is the kind of concrete result that can be checked later.

The soft spot is exactly the one the stress test flags. Nothing in the label-matrix step ties the random vectors to observed label co-occurrence or node features, so the skip-gram step has no obvious reason to surface genuine correlations rather than whatever patterns the random draw and negative sampling happen to produce. If the gains survive ablation on the random matrix or on different initializations, that would strengthen the claim; without that evidence, the performance edge could be fragile.

The paper is aimed at people already running GCN variants on multi-label node tasks who want a lightweight way to add correlation modeling. A reader who needs a drop-in method to try on their own graphs could get value from the description even if the theory is thin. It is coherent enough on its own terms to deserve a serious referee rather than a desk reject, though any review would need to press on the justification for the random matrix and on the experimental controls. I would send it out for review but would not cite it myself until the random-matrix issue is addressed with more evidence.

Referee Report

2 major / 1 minor

Summary. The manuscript proposes ML-GCN for semi-supervised multi-label node classification on graphs. A GCN first embeds node features and topology. A label matrix is then randomly generated such that each row (label vector) has the same dimension as the node vectors immediately before the final GCN convolution. Node and label vectors are concatenated and fed to a relaxed skip-gram model whose objective is intended to capture node-label and label-label correlations. The abstract states that experimental results on several graph classification datasets show ML-GCN outperforming four state-of-the-art methods.

Significance. If the reported gains are reproducible and arise from genuine correlation modeling rather than artifacts of the random initialization, the construction would provide a distinctive mechanism for injecting label dependencies into GCN pipelines. The use of skip-gram on concatenated embeddings is an original modeling choice, but the absence of any link between the randomly drawn label vectors and observed label statistics leaves the source of any performance improvement unclear.

major comments (2)

[Abstract] The central empirical claim (outperformance over four state-of-the-art methods) is presented without any reference to the datasets, baseline implementations, evaluation protocol, number of runs, or statistical tests, which makes the support for the primary result impossible to assess.
[Method description] (paragraph beginning 'ML-GCN first uses a GCN...'): The label matrix is generated randomly with no dependence on label co-occurrence counts or node features; the relaxed skip-gram objective on the concatenated vectors therefore has no explicit pathway to recover semantically meaningful node-label or label-label correlations and may instead fit spurious patterns induced by the random draw and the choice of skip-gram hyperparameters.
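On the statistical-test point, one option is an exact paired sign-flip permutation test over per-run scores of ML-GCN and a baseline evaluated on the same splits and seeds. A minimal sketch follows; the function name is hypothetical and the inputs in the usage line are placeholders, not results from the paper.

```python
import itertools


def paired_sign_flip_test(scores_a, scores_b):
    """Exact two-sided paired permutation test on per-run differences.
    Enumerates all 2**n sign patterns, so keep n (number of runs) small, e.g. <= 20."""
    if len(scores_a) != len(scores_b) or not scores_a:
        raise ValueError("need two non-empty score lists of equal length")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    observed = abs(sum(diffs))
    extreme = 0
    for signs in itertools.product((1, -1), repeat=len(diffs)):
        # Under H0 each per-run difference is equally likely to have either sign.
        if abs(sum(s * d for s, d in zip(signs, diffs))) >= observed - 1e-12:
            extreme += 1
    return extreme / 2 ** len(diffs)


# Placeholder per-seed scores for two methods (illustrative numbers only).
p = paired_sign_flip_test([2.0, 2.0, 2.0, 2.0, 2.0], [1.0, 1.0, 1.0, 1.0, 1.0])
```

With five runs the smallest attainable two-sided p-value is 2/32 = 0.0625, so at least six paired runs are needed before any comparison can reach p < 0.05. This is one concrete reason the number of runs belongs in the report.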

minor comments (1)

[Abstract] The abstract contains the typographical error 'pro-pose'.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their detailed review and constructive comments on our manuscript. We address each of the major comments below and outline the revisions we plan to make.

Referee: [Abstract] The central empirical claim (outperformance over four state-of-the-art methods) is presented without any reference to the datasets, baseline implementations, evaluation protocol, number of runs, or statistical tests, which makes the support for the primary result impossible to assess.

Authors: We agree that the abstract lacks sufficient details on the experimental setup. In the revised version, we will update the abstract to include references to the specific datasets used, the four state-of-the-art baseline methods, the evaluation protocol, the number of runs, and any statistical tests performed. This will make the empirical claims more assessable. revision: yes
Referee: [Method description] (paragraph beginning 'ML-GCN first uses a GCN...'): The label matrix is generated randomly with no dependence on label co-occurrence counts or node features; the relaxed skip-gram objective on the concatenated vectors therefore has no explicit pathway to recover semantically meaningful node-label or label-label correlations and may instead fit spurious patterns induced by the random draw and the choice of skip-gram hyperparameters.

Authors: The label vectors are randomly generated to embed all labels into the same vector space as the node embeddings produced by the GCN. The relaxed skip-gram objective is then applied to the concatenated node-label vectors to model both node-label and label-label correlations based on the observed data during training. While the initialization is random and does not explicitly incorporate co-occurrence counts, the training process optimizes the embeddings to reflect the correlations present in the multi-label annotations. We will revise the method description to better explain how the skip-gram objective captures these correlations and discuss the role of the random initialization. revision: partial
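The rebuttal's claim that training can move randomly initialized label vectors toward the structure of the annotations can be illustrated in isolation. This toy sketch assumes label-label positive pairs come from co-occurrence in the training labels, one uniformly drawn negative per pair, and plain SGD on a skip-gram negative-sampling loss over a single shared table. It shows the mechanism is possible; it does not show that the mechanism explains the paper's reported gains. The function names are hypothetical.

```python
import math
import random


def train_label_vectors(pairs, num_labels, dim=8, epochs=200, lr=0.1, seed=0):
    """SGD on a skip-gram negative-sampling loss over label-label pairs.
    Requires num_labels >= 3 so that a negative outside each pair exists."""
    rng = random.Random(seed)
    # Random initialization, mirroring the abstract's randomly generated label matrix.
    vecs = [[rng.gauss(0.0, 1.0 / math.sqrt(dim)) for _ in range(dim)]
            for _ in range(num_labels)]
    for _ in range(epochs):
        for c, o in pairs:
            # One negative drawn uniformly from labels outside the pair (a simplification).
            k = rng.choice([j for j in range(num_labels) if j not in (c, o)])
            for target, is_positive in ((o, 1.0), (k, 0.0)):
                score = sum(a * b for a, b in zip(vecs[c], vecs[target]))
                g = 1.0 / (1.0 + math.exp(-score)) - is_positive  # dLoss/dScore
                # Update both rows from their old values (simultaneous update).
                new_c = [x - lr * g * y for x, y in zip(vecs[c], vecs[target])]
                vecs[target] = [y - lr * g * x for x, y in zip(vecs[c], vecs[target])]
                vecs[c] = new_c
    return vecs


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


# Usage: labels 0/1 co-occur, labels 2/3 co-occur, and the two groups never do.
pairs = [(0, 1), (1, 0), (2, 3), (3, 2)]
vecs = train_label_vectors(pairs, num_labels=4)
same_group = cosine(vecs[0], vecs[1])
cross_group = cosine(vecs[0], vecs[2])
```

Rerunning with several `seed` values is the cheapest version of the initialization ablation the desk note asks for.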

Circularity Check

0 steps flagged

No circularity; modeling choice is independent of claimed outputs

The paper describes a construction (GCN node embeddings concatenated with randomly generated label vectors of matching dimension, then trained via relaxed skip-gram) and reports empirical outperformance on datasets. No equations, self-citations, or fitted parameters are shown that reduce the extracted node-label or label-label correlations to the inputs by definition. The random label matrix and skip-gram objective are presented as an explicit modeling decision whose validity is tested externally via experiments rather than derived tautologically. This matches the default expectation of a non-circular proposal.

Axiom & Free-Parameter Ledger

2 free parameters · 2 axioms · 1 invented entity

The central claim rests on the effectiveness of a newly introduced random label matrix and the skip-gram training step on concatenated vectors; these elements are postulated rather than derived from prior results.

free parameters (2)

label vector dimension
Chosen to equal the node vector dimension before the final convolution layer
skip-gram model hyperparameters
Used to detect correlations but not specified in the abstract

axioms (2)

domain assumption: A GCN produces useful embeddings of node features and graph topology
Invoked as the first stage of the pipeline
ad hoc to paper: Concatenating node and label vectors and feeding them to skip-gram extracts meaningful correlations
This is the core training mechanism introduced by the paper

invented entities (1)

randomly generated label matrix (no independent evidence)
purpose: To place labels in the same vector space as nodes so that skip-gram can operate on both
Created as part of the ML-GCN procedure

pith-pipeline@v0.9.0 · 5768 in / 1329 out tokens · 32043 ms · 2026-05-24T22:15:57.324113+00:00 · methodology

Reference graph

Works this paper leans on

25 extracted references

[1] Ahmed, Amr, et al. "Distributed large-scale natural graph factorization." Proceedings of the 22nd International Conference on World Wide Web. ACM, 2013.

[2] Tang, Jian, et al. "LINE: Large-scale information network embedding." Proceedings of the 24th International Conference on World Wide Web. International World Wide Web Conferences Steering Committee, 2015.

[3] Perozzi, Bryan, Rami Al-Rfou, and Steven Skiena. "DeepWalk: Online learning of social representations." Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2014.

[4] Mikolov, Tomas, et al. "Efficient estimation of word representations in vector space." arXiv preprint arXiv:1301.3781 (2013).

[5] Hamilton, Will, Zhitao Ying, and Jure Leskovec. "Inductive representation learning on large graphs." Advances in Neural Information Processing Systems. 2017.

[7] Kipf, Thomas N., and Max Welling. "Semi-supervised classification with graph convolutional networks." arXiv preprint arXiv:1609.02907 (2016).

[8] Marcheggiani, Diego, and Ivan Titov. "Encoding sentences with graph convolutional networks for semantic role labeling." arXiv preprint arXiv:1703.04826 (2017).

Results table fragment from the paper (the Movie row is truncated):

Dataset   Method   10%     20%     30%     40%
Facebook  GCN [7]  57.25   58.45   59.95   60.05
Facebook  ML-GCN   58.13   59.63   60.14   60.98
Yeast     GCN [7]  61.23   62.45   62.73   63.68
Yeast     ML-GCN   63.03   64.54   64.04   65.77
Movie     GCN [7]  36.82   37....

[9] Nguyen, Thien Huu, and Ralph Grishman. "Graph convolutional networks with argument-aware pooling for event detection." Thirty-Second AAAI Conference on Artificial Intelligence. 2018.

[10] Ying, Rex, et al. "Graph convolutional neural networks for web-scale recommender systems." Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. ACM, 2018.

[11] Wu, Zonghan, et al. "A comprehensive survey on graph neural networks." arXiv preprint arXiv:1901.00596 (2019).

[12] Veličković, Petar, et al. "Graph attention networks." arXiv preprint arXiv:1710.10903 (2017).

[13] Kipf, Thomas N., and Max Welling. "Variational graph auto-encoders." arXiv preprint arXiv:1611.07308 (2016).

[14] You, Jiaxuan, et al. "GraphRNN: A deep generative model for graphs." arXiv preprint arXiv:1802.08773 (2018).

[15] Yan, Sijie, Yuanjun Xiong, and Dahua Lin. "Spatial temporal graph convolutional networks for skeleton-based action recognition." Thirty-Second AAAI Conference on Artificial Intelligence. 2018.

[16] Gilmer, Justin, et al. "Neural message passing for quantum chemistry." Proceedings of the 34th International Conference on Machine Learning, Volume 70. JMLR.org, 2017.

[17] Shuman, David I., et al. "The emerging field of signal processing on graphs: Extending high-dimensional data analysis to networks and other irregular domains." arXiv preprint arXiv:1211.0053 (2012).

[18] Li, Qimai, Zhichao Han, and Xiao-Ming Wu. "Deeper insights into graph convolutional networks for semi-supervised learning." Thirty-Second AAAI Conference on Artificial Intelligence. 2018.

[19] Taubin, Gabriel. "A signal processing approach to fair surface design." Proceedings of the 22nd Annual Conference on Computer Graphics and Interactive Techniques. ACM, 1995.

[20] Mikolov, Tomas, et al. "Distributed representations of words and phrases and their compositionality." Advances in Neural Information Processing Systems. 2013.

[21] McAuley, J., and J. Leskovec. "Learning to discover social circles in ego networks." NIPS, 2012.

[22] Kingma, Diederik P., and Jimmy Ba. "Adam: A method for stochastic optimization." arXiv preprint arXiv:1412.6980 (2014).

[23] Gong, Yunchao, et al. "Deep convolutional ranking for multilabel image annotation." arXiv preprint arXiv:1312.4894 (2013).

[24] Wang, Jiang, et al. "CNN-RNN: A unified framework for multi-label image classification." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016.

[25] Wang, Zhouxia, et al. "Multi-label image recognition by recurrently discovering attentional regions." Proceedings of the IEEE International Conference on Computer Vision. 2017.

[26] Sen, Prithviraj, et al. "Collective classification in network data." AI Magazine 29.3 (2008): 93-106.
