Multi-Label Node Classification with Label Influence Propagation

Bingsheng He; Bryan Hooi; Jia Chen; Rizal Fathony; Yang Yang; Yifei Sun; Zemin Liu

arxiv: 2607.00671 · v1 · pith:L3ATCFDOnew · submitted 2026-07-01 · 💻 cs.LG · cs.AI

Yifei Sun, Zemin Liu, Bryan Hooi, Yang Yang, Rizal Fathony, Jia Chen, Bingsheng He

Pith reviewed 2026-07-02 15:48 UTC · model grok-4.3

classification 💻 cs.LG cs.AI

keywords multi-label node classification · graph neural networks · label influence graph · message passing decomposition · label correlations · propagation · transformation · dynamic adjustment

The pith

Decomposing GNN message passing into propagation and transformation yields a label influence graph that dynamically adjusts multi-label node classification.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper seeks to establish that standard GNN approaches to multi-label node classification miss the detailed ways labels influence one another on graphs. By splitting message passing into separate propagation and transformation steps, the authors measure these label-to-label effects in each step and combine them into a single label influence graph. High-order effects are then passed along this graph so that positive label contributions are strengthened and negative ones are reduced while the model learns. Readers would care because multi-label nodes appear in protein networks, social platforms, and e-commerce graphs where better handling of label dependencies could raise prediction quality.

Core claim

We decompose the message passing process in GNNs into two operations: propagation and transformation. We then conduct a comprehensive analysis and quantification of the influence correlations between labels in each operation. Building on these insights, we propose a novel model, Label Influence Propagation (LIP). Specifically, we construct a label influence graph based on the integrated label correlations. Then, we propagate high-order influences through this graph, dynamically adjusting the learning process by amplifying labels with positive contributions and mitigating those with negative influence.
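
The split described above can be illustrated on a toy graph. The following is a minimal sketch, not the authors' implementation: it assumes a row-normalized adjacency with self-loops for the propagation step (P) and a single linear map followed by ReLU for the transformation step (T). All function names are hypothetical and chosen for illustration only.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain dense matrix product, sufficient for a toy graph."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def row_normalize_with_self_loops(adj: Matrix) -> Matrix:
    """Add self-loops, then divide each row by its sum (assumed normalization)."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    return [[v / sum(row) for v in row] for row in a_hat]


def propagate(adj_norm: Matrix, h: Matrix) -> Matrix:
    """P step: mix each node's features with its neighbours'. No learned weights."""
    return matmul(adj_norm, h)


def transform(h: Matrix, w: Matrix) -> Matrix:
    """T step: per-node linear map followed by ReLU. No graph structure involved."""
    return [[max(0.0, v) for v in row] for row in matmul(h, w)]


def message_passing_layer(adj_norm: Matrix, h: Matrix, w: Matrix) -> Matrix:
    """One layer written as T applied after P, so each step can be inspected alone."""
    return transform(propagate(adj_norm, h), w)


if __name__ == "__main__":
    # Path graph 0 - 1 - 2 with two input features per node.
    adj = [[0.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 0.0]]
    features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    weights = [[1.0, -1.0], [0.0, 1.0]]
    print(message_passing_layer(row_normalize_with_self_loops(adj), features, weights))
```

Keeping P and T as separate functions is what makes it possible to measure label effects in each step independently, which is the premise the core claim relies on.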

What carries the argument

Label influence graph, built from quantified correlations extracted from GNN propagation and transformation steps, that propagates high-order influences to adjust learning.

If this is right

The constructed label influence graph enables consistent outperformance of prior methods across multiple benchmark datasets and settings.
Dynamic amplification of positive label effects and mitigation of negative ones improves handling of intricate label influences on non-Euclidean graphs.
High-order influence propagation through the label graph supplies an integrated view of label correlations that standard co-occurrence or embedding methods miss.
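
The idea of propagating high-order influence through a label graph and then reweighting labels can be sketched as follows. This is an illustrative toy under stated assumptions, not the paper's formulation: the signed label influence matrix is taken as given, high-order influence is approximated by a truncated, decayed sum of matrix powers, and per-label training weights come from a softmax over each label's net outgoing influence. The names `high_order_influence`, `label_weights`, `steps`, and `decay` are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain dense matrix product for small label graphs."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def high_order_influence(s: Matrix, steps: int = 3, decay: float = 0.5) -> Matrix:
    """Return S + decay*S^2 + decay^2*S^3 + ... truncated after `steps` terms.

    s[a][b] is the assumed signed influence of label a on label b.
    """
    k = len(s)
    total = [[0.0] * k for _ in range(k)]
    power = [row[:] for row in s]
    coeff = 1.0
    for _ in range(steps):
        for i in range(k):
            for j in range(k):
                total[i][j] += coeff * power[i][j]
        power = matmul(power, s)
        coeff *= decay
    return total


def label_weights(influence: Matrix) -> List[float]:
    """Map each label's net influence on the other labels to a loss weight.

    Labels that help others get weights above 1, labels that hurt others
    get weights below 1; the weights sum to the number of labels.
    """
    k = len(influence)
    scores = [sum(influence[a][b] for b in range(k) if b != a) for a in range(k)]
    top = max(scores)
    exps = [math.exp(x - top) for x in scores]
    norm = sum(exps)
    return [k * e / norm for e in exps]


if __name__ == "__main__":
    # Toy signed influence: label 0 helps 1, label 1 helps 2, label 2 hurts 0.
    s = [[0.0, 0.5, 0.0], [0.0, 0.0, 0.5], [-0.5, 0.0, 0.0]]
    print(label_weights(high_order_influence(s, steps=2)))
```

In a training loop, such weights would multiply the per-label loss terms. How the paper actually performs the adjustment is not specified in the text above, so this weighting rule should be read as one plausible choice.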

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same decomposition could be tested on other GNN architectures to see whether the resulting influence graphs transfer across message-passing variants.
If the influence graph proves stable under label noise, the method might be extended to settings with incomplete or partially observed labels.
Integrating the quantified correlations with existing label-embedding techniques could produce hybrid models that combine proximity and influence signals.

Load-bearing premise

Decomposing GNN message passing into propagation and transformation produces accurate label influence correlations that can be assembled into a useful graph for dynamic adjustment.

What would settle it

On the same benchmark datasets, a version of the model that omits the label influence graph and its dynamic adjustment step shows no accuracy gain over prior GNN baselines.

Figures

Figures reproduced from arXiv: 2607.00671 by Bingsheng He, Bryan Hooi, Jia Chen, Rizal Fathony, Yang Yang, Yifei Sun, Zemin Liu.

**Figure 2.** The overall framework of LIP: (a) quantify the influence correlations in the P operation (Sec. 4.2), where the colored arrows represent the mutual influence between labels; (b) quantify the influence correlations in the T operation (Sec. 4.3), where Lbi separately calculates the loss of each label; (c) combine these two influence correlations to get the high-order label influence, and propagate through the con…

**Figure 3.** Illustration of the direction of positive and negative influence during the P step. Nodes shown in both colors carry both labels. Influence between label sets. Having computed the influence correlation magnitude INF_P(v_i, v_j) between any pair of nodes during the P phase, represented as an n×n node influence correlation matrix, the second part involves integrating the influence from all n…

**Figure 4.** Model analysis showing the effectiveness of …

**Figure 5.** More model analysis on different datasets.

**Figure 6.** Changing the hyper-parameters α and β on the DBLP and PCG datasets. E.6 Evaluation under inductive setting. Settings. In this section, we perform the inductive train/val/test split (6:2:2) and conduct experiments to validate the effectiveness in this setting. We use GraphSAGE as the backbone because it is naturally suited to the inductive setting. We also exclude certain baselines that are not suitable for…

**Figure 7.** Case study on node 197 from the DBLP dataset. The local structure of node 197 shows its connections to nodes 0, 18566, and 22, along with their node indices and ground-truth labels. The bar chart shows the probabilities that GCN predicts for the labels of node 197. Based on the observations from …
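
The Figure 3 caption refers to an n×n node influence matrix that is then integrated into influence between label sets. A minimal sketch of that two-stage idea follows. It rests on assumptions that are not taken from the captions: node-to-node influence is approximated by k-step random-walk probability, and label-level influence is the mean node influence over all distinct source and target node pairs. The sign of the influence (positive or negative) is not modeled here. Function names are hypothetical.

```python
from typing import List, Set

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain dense matrix product for a toy graph."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def random_walk_matrix(adj: Matrix) -> Matrix:
    """Row-normalize the adjacency; isolated nodes keep an all-zero row."""
    out = []
    for row in adj:
        degree = sum(row)
        out.append([v / degree if degree > 0 else 0.0 for v in row])
    return out


def k_step_influence(adj: Matrix, k: int) -> Matrix:
    """Assumed proxy for node influence: k-step random-walk probabilities."""
    n = len(adj)
    p = random_walk_matrix(adj)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(k):
        result = matmul(result, p)
    return result


def label_level_influence(node_inf: Matrix, labels: List[Set[int]], num_labels: int) -> Matrix:
    """Average node influence from nodes holding label a to nodes holding label b."""
    out = [[0.0] * num_labels for _ in range(num_labels)]
    for a in range(num_labels):
        sources = [i for i, ls in enumerate(labels) if a in ls]
        for b in range(num_labels):
            targets = [j for j, ls in enumerate(labels) if b in ls]
            pairs = [(i, j) for i in sources for j in targets if i != j]
            if pairs:
                out[a][b] = sum(node_inf[i][j] for i, j in pairs) / len(pairs)
    return out


if __name__ == "__main__":
    # Path graph 0 - 1 - 2; node 1 carries both labels.
    adj = [[0.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 0.0]]
    labels = [{0}, {0, 1}, {1}]
    print(label_level_influence(k_step_influence(adj, 2), labels, 2))
```

The dense n×n matrix is only practical for small graphs. On larger graphs a sparse or approximate computation would be needed.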

Original abstract

Graphs are a complex and versatile data structure used across various domains, with possibly multi-label nodes playing a particularly crucial role. Examples include proteins in PPI networks with multiple functions and users in social or e-commerce networks exhibiting diverse interests. Tackling multi-label node classification (MLNC) on graphs has led to the development of various approaches. Some methods leverage graph neural networks (GNNs) to exploit label co-occurrence correlations, while others incorporate label embeddings to capture label proximity. However, these approaches fail to account for the intricate influences between labels in non-Euclidean graph data. To address this issue, we decompose the message passing process in GNNs into two operations: propagation and transformation. We then conduct a comprehensive analysis and quantification of the influence correlations between labels in each operation. Building on these insights, we propose a novel model, Label Influence Propagation (LIP). Specifically, we construct a label influence graph based on the integrated label correlations. Then, we propagate high-order influences through this graph, dynamically adjusting the learning process by amplifying labels with positive contributions and mitigating those with negative influence. Finally, our framework is evaluated on comprehensive benchmark datasets, consistently outperforming SOTA methods across various settings, demonstrating its effectiveness on MLNC tasks.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note

private letter to a colleague

LIP builds a label influence graph from decomposed GNN operations for dynamic adjustment in multi-label node classification, but the abstract supplies no equations or results to check if the quantification step adds anything real.

The paper's main move is to split GNN message passing into propagation and transformation, quantify positive and negative label influences in each, then build a separate label influence graph that propagates high-order effects and amplifies or mitigates labels on the fly. That construction is the concrete technical step beyond the co-occurrence and embedding baselines cited in the abstract.

It does a reasonable job of naming a limitation in prior multi-label node classification work: standard approaches do not explicitly track how labels push or pull each other through the graph structure. The dynamic adjustment idea follows directly from that observation and could be useful in settings like protein networks or user-interest graphs where labels interact in non-obvious ways.

The soft spots are straightforward. The abstract asserts consistent outperformance on benchmarks yet shows no equations for the influence quantification, no ablation tables, no significance tests, and no error analysis. Without those, it is impossible to tell whether the derived correlations isolate effects beyond what label co-occurrence methods already capture, or whether the adjustment mechanism is doing the claimed work. Whether the decomposition step amounts to more than a heuristic therefore remains open on the evidence given.

This is for people already working on graph ML for multi-label problems who want to try a correlation-aware propagation variant. A reader who needs reproducible details or strong experimental controls will not get much from the abstract alone. The paper deserves a serious referee because the proposed construction has enough specificity to be checked against the data and the math, even if the current write-up is thin.

Referee Report

2 major / 0 minor

Summary. The manuscript proposes the Label Influence Propagation (LIP) framework for multi-label node classification on graphs. It decomposes GNN message passing into propagation and transformation operations to quantify label influence correlations (positive/negative), constructs a label influence graph for high-order propagation, and dynamically adjusts the learning process by amplifying positive contributions and mitigating negative influences. The framework is evaluated on benchmark datasets and claimed to consistently outperform SOTA methods.

Significance. If the central claims hold, the work could advance MLNC by providing an explicit mechanism to capture and propagate intricate label influences in non-Euclidean data beyond standard co-occurrence or embedding approaches, with relevance to applications such as protein function prediction in PPI networks. The decomposition-based quantification and dynamic adjustment are presented as novel, but their added value requires demonstration.

major comments (2)

[Abstract] The load-bearing claim that 'decomposing the message passing process in GNNs into two operations: propagation and transformation' permits 'accurate quantification of the influence correlations between labels', which can then be integrated into an effective label influence graph for 'dynamic adjustment', is not supported by any equations, derivation details, or validation showing that these correlations provide an advantage over label co-occurrence baselines; without this, the subsequent graph construction and adjustment steps offer no guaranteed improvement on MLNC tasks.
[Abstract] The claim of 'consistently outperforming SOTA methods across various settings' is presented without any quantitative results, ablation studies, statistical significance tests, or error analysis, preventing verification of whether the performance claim is supported by the data or undermined by post-hoc choices in the quantification step.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on the abstract. We address each comment below, clarifying the support provided in the full manuscript while agreeing to targeted revisions for improved clarity and transparency.

Referee: [Abstract] The load-bearing claim that 'decomposing the message passing process in GNNs into two operations: propagation and transformation' permits 'accurate quantification of the influence correlations between labels', which can then be integrated into an effective label influence graph for 'dynamic adjustment', is not supported by any equations, derivation details, or validation showing that these correlations provide an advantage over label co-occurrence baselines; without this, the subsequent graph construction and adjustment steps offer no guaranteed improvement on MLNC tasks.

Authors: The full manuscript details the decomposition of GNN message passing into propagation and transformation operations, with explicit equations for quantifying positive and negative label influence correlations in each operation (Sec. 4.2 for propagation and Sec. 4.3 for transformation). The experiments include direct comparisons to label co-occurrence baselines, demonstrating the added value of the influence graph construction and dynamic adjustment. The abstract summarizes these contributions at a high level for brevity. We agree that the abstract could better signal the presence of these derivations and validations; we will revise it to reference the quantification approach and its distinction from co-occurrence methods. revision: yes

Referee: [Abstract] The claim of 'consistently outperforming SOTA methods across various settings' is presented without any quantitative results, ablation studies, statistical significance tests, or error analysis, preventing verification of whether the performance claim is supported by the data or undermined by post-hoc choices in the quantification step.

Authors: The abstract provides a high-level summary of the empirical findings, while the full manuscript presents quantitative results, ablation studies, statistical significance tests, and error analysis in Section 5 across multiple benchmarks and settings. We acknowledge that the abstract's performance claim would be stronger with a brief indication of the evaluation rigor. We will revise the abstract to note the comprehensive experimental validation supporting the outperformance. revision: yes

Circularity Check

0 steps flagged

No circularity: derivation proceeds from explicit decomposition analysis to graph construction without reduction to inputs by construction.

The provided abstract and description outline a chain that begins with a standard GNN message-passing decomposition into propagation and transformation, followed by an analysis to quantify label influence correlations (positive/negative), construction of a label influence graph, and subsequent high-order propagation with dynamic adjustment. No quoted equations, self-citations, or steps demonstrate a self-definitional loop (e.g., defining influences in terms of the final graph), a fitted parameter renamed as a prediction, or a load-bearing uniqueness result imported from the authors' prior work. The central claim rests on the empirical effectiveness of the constructed graph rather than tautological equivalence to the inputs, and performance evaluation is described as external benchmarking. The derivation is therefore self-contained against the stated assumptions.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The central claim rests on the unstated premise that label influence correlations extracted from the decomposed GNN operations are stable and transferable across datasets; no free parameters, axioms, or invented entities are explicitly listed in the abstract.


[1] [1]

Yuanchen Bei, Weizhi Chen, Hao Chen, Sheng Zhou, Carl Yang, Jiapei Fan, Longtao Huang, and Jiajun Bu

doi: 10.1109/ICDM.2019.00010. Yuanchen Bei, Weizhi Chen, Hao Chen, Sheng Zhou, Carl Yang, Jiapei Fan, Longtao Huang, and Jiajun Bu. Correlation-aware graph convolutional networks for multi-label node classification. arXiv preprint arXiv:2411.17350,

work page doi:10.1109/icdm.2019.00010 2019

[2] [2]

Adaptive universal generalized pagerank graph neural network.arXiv preprint arXiv:2006.07988,

Eli Chien, Jianhao Peng, Pan Li, and Olgica Milenkovic. Adaptive universal generalized pagerank graph neural network.arXiv preprint arXiv:2006.07988,

work page arXiv 2006

[3] [3]

On the equivalence of decoupled graph convolution network and label propagation

Hande Dong, Jiawei Chen, Fuli Feng, Xiangnan He, Shuxian Bi, Zhaolin Ding, and Peng Cui. On the equivalence of decoupled graph convolution network and label propagation. InProceedings of the Web Conference 2021, pp. 3651–3662,

2021

[4] [4]

Lightgcn: Simplifying and powering graph convolution network for recommendation

11 Published as a conference paper at ICLR 2025 Xiangnan He, Kuan Deng, Xiang Wang, Yan Li, Yongdong Zhang, and Meng Wang. Lightgcn: Simplifying and powering graph convolution network for recommendation. InProceedings of the 43rd International ACM SIGIR conference on research and development in Information Retrieval, pp. 639–648,

2025

[5] [5]

Semi-supervised learning with graph learning-convolutional networks.2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp

Bo Jiang, Ziyan Zhang, Doudou Lin, Jin Tang, and Bin Luo. Semi-supervised learning with graph learning-convolutional networks.2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11305–11312,

2019

[6] [6]

Variational Graph Auto-Encoders

Thomas N Kipf and Max Welling. Variational graph auto-encoders. InArXiv, volume abs/1611.07308,

work page internal anchor Pith review Pith/arXiv arXiv

[7]

Zemin Liu, Yuan Li, Nan Chen, Qian Wang, Bryan Hooi, and Bingsheng He. A survey of imbalanced learning on graphs: Problems, techniques, and future directions. arXiv preprint arXiv:2308.13821, 2023.

[8]

doi: 10.1145/2623330.2623732. URL http://dx.doi.org/10.1145/2623330.2623732.

Jiezhong Qiu, Qibin Chen, Yuxiao Dong, Jing Zhang, Hongxia Yang, Ming Ding, Kuansan Wang, and Jie Tang. Gcc: Graph contrastive coding for graph neural network pre-training. arXiv preprint arXiv:2006.09963, 2020.

[9]

Springer Berlin Heidelberg. ISBN 978-3-642-04174-7.

Min Shi, Yufei Tang, and Xingquan Zhu. Mlne: Multi-label network embedding. IEEE Transactions on Neural Networks and Learning Systems, 31(9):3682–3695, 2020a. doi: 10.1109/TNNLS.2019.2945869.

Yunsheng Shi, Zhengjie Huang, Shikun Feng, Hui Zhong, Wenjin Wang...

[10]

Association for Computing Machinery. ISBN 9781450384469. doi: 10.1145/3459637.3482391. URL https://doi.org/10.1145/3459637.3482391.

Yifei Sun, Haoran Deng, Yang Yang, Chunping Wang, Jiarong Xu, Renhong Huang, Linfeng Cao, Yang Wang, and Lei Chen. Beyond homophily: Structure-aware path aggregation graph neural network. In Luc De Raedt (ed.), Proceedings o...

[11]

doi: 10.24963/ijcai.2022/310. URL https://doi.org/10.24963/ijcai.2022/310. Main Track.

Yifei Sun, Qi Zhu, Yang Yang, Chunping Wang, Tianyu Fan, Jiajun Zhu, and Lei Chen. Fine-tuning graph neural networks by preserving graph generative patterns. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 38, pp. 9053–9061, 2024.

[12]

URL https://proceedings.neurips.cc/paper_files/paper/2023/file/5eaafd67434a4cfb1cf829722c65f184-Paper-Datasets_and_Benchmarks.pdf.

Grigorios Tsoumakas, Ioannis Manousos Katakis, and Ioannis P. Vlahavas. Mining multi-label data. In Data Mining and Knowledge Discovery Handbook,

[13]

URL https://api.semanticscholar.org/CorpusID:22998.

Petar Veličković, Guillem Cucurull, Arantxa Casanova, Adriana Romero, Pietro Lio, and Yoshua Bengio. Graph attention networks. arXiv preprint arXiv:1710.10903, 2017.

[14]

Hongwei Wang and Jure Leskovec. Unifying graph convolutional neural networks and label propagation. arXiv preprint arXiv:2002.06755, 2020.

[15]

Zonghan Wu, Shirui Pan, Fengwen Chen, Guodong Long, Chengqi Zhang, and Philip S. Yu. A comprehensive survey on graph neural networks. IEEE transactions on neural networks and learning systems, 32(1):4–24,

[16]

ISSN 0020-0255. doi: https://doi.org/10.1016/j.ins.2021.12.130. URL https://www.sciencedirect.com/science/article/pii/S0020025522000111.

Keyulu Xu, Chengtao Li, Yonglong Tian, Tomohiro Sonobe, Ken-ichi Kawarabayashi, and Stefanie Jegelka. Representation learning on graphs with jumping knowledge networks. In International conference on machine learning, pp....

[17]

Min-Ling Zhang and Zhi-Hua Zhou. Ml-knn: A lazy learning approach to multi-label learning. Pattern Recogn., 40(7):2038–2048, jul 2007.

[18]

ISSN 0031-3203. doi: 10.1016/j.patcog.2006.12.019.

[19]

URL https://doi.org/10.1016/j.patcog.2006.12.019.

Min-Ling Zhang and Zhi-Hua Zhou. A review on multi-label learning algorithms. IEEE transactions on knowledge and data engineering, 26(8):1819–1837, 2014.

[20]

Wentao Zhang, Mingyu Yang, Zeang Sheng, Yang Li, Wen Ouyang, Yangyu Tao, Zhi Yang, and Bin Cui. Node dependent local smoothing for scalable graph learning. Advances in Neural Information Processing Systems, 34:20321–20332, 2021a.

Wentao Zhang, Zeang Sheng, Ziqi Yin, Yuezihan Jiang, Yikuan Xia, Jun Gao, Zhi Yang, and Bin Cui. Model degradation hinders deep ...

[21]

ISSN 2835-8856. URL https://openreview.net/forum?id=EZhkV2BjDP.

Cangqi Zhou, Hui Chen, Jing Zhang, Qianmu Li, Dianming Hu, and Victor S. Sheng. Multi-label graph node classification with label attentive neighborhood convolution. Expert Systems with Applications, 180:115063, 2021.

[22]

ISSN 0957-4174. doi: https://doi.org/10.1016/j.eswa.2021.115063. URL https://www.sciencedirect.com/science/article/pii/S0957417421005042.

Jiong Zhu, Yujun Yan, Lingxiao Zhao, Mark Heimann, Leman Akoglu, and Danai Koutra. Beyond homophily in graph neural networks: Current limitations and effective designs. Advances in neural information processing systems,...

[23]

APPENDIX

A DISCUSSIONS ON INFLUENCE FROM PROPAGATION

This section aims to supplement Sec 4.2 regarding the pairwise influence value from propagation (P operation). Specifically, inspired by previous work (Xu et al., 2018a; Wang & Leskovec, 2020), we prove the correctness of the calculation method on influence ...

[24]

$= \rho$, we have

$$\mathbb{E}\left[\left[\frac{\partial z_j^{(l)}}{\partial z_i^{(0)}}\right]_p\right] = \rho \cdot \prod_{m=l}^{1} \frac{1}{\widehat{\deg}\left(v_p^{m}\right)} \cdot w_q^{(m)}. \tag{18}$$

Thus, we know that

$$\mathbb{E}\left[\frac{\partial z_j^{(l)}}{\partial z_i^{(0)}}\right] = \rho \cdot \prod_{m=l}^{1} W_m \cdot \left(\sum_{p=1}^{\lambda} \prod_{m=l}^{1} \frac{1}{\widehat{\deg}\left(v_p^{m}\right)}\right). \tag{19}$$

On the other hand, the $k$-step random walk probability at $v_i$ can be calculated by summing up the probability of all paths of length $k$ from $v_i$ to $v_j$, which is exactly $\sum_{p=1}^{\lambda} \prod_{m=l}^{1} 1/\widehat{\deg}$...
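The path-sum view of the k-step random walk probability described here can be checked numerically. The following is a minimal sketch, not the authors' code: the 4-node toy graph, the added self-loops, and the row-normalized transition matrix (each step leaves a node with probability one over its degree) are all assumptions made for illustration. It enumerates every length-k walk between two nodes, sums the product of inverse degrees along each walk, and compares the result with the matching entry of the k-th power of the transition matrix.

```python
from itertools import product

# Hypothetical toy graph (undirected); not taken from the paper.
EDGES = [(0, 1), (1, 2), (2, 3), (0, 2)]
N = 4


def build_adjacency(n, edges):
    """Dense 0/1 adjacency matrix with self-loops (an assumed convention)."""
    adj = [[0] * n for _ in range(n)]
    for u, v in edges:
        adj[u][v] = adj[v][u] = 1
    for u in range(n):
        adj[u][u] = 1
    return adj


def transition_matrix(adj):
    """Row-normalize: entry (u, v) is the probability of stepping u -> v."""
    return [[a / sum(row) for a in row] for row in adj]


def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def k_step_by_power(trans, k):
    """k-step probabilities as the k-th power of the transition matrix."""
    n = len(trans)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(k):
        result = matmul(result, trans)
    return result


def k_step_by_paths(adj, i, j, k):
    """Sum, over all length-k walks i -> j, of the product of 1/deg
    for every node the walk steps away from."""
    n = len(adj)
    deg = [sum(row) for row in adj]
    total = 0.0
    for middle in product(range(n), repeat=k - 1):
        walk = (i, *middle, j)
        if all(adj[a][b] for a, b in zip(walk, walk[1:])):
            prob = 1.0
            for a in walk[:-1]:
                prob /= deg[a]
            total += prob
    return total


adjacency = build_adjacency(N, EDGES)
transition = transition_matrix(adjacency)
print(k_step_by_power(transition, 3)[0][3], k_step_by_paths(adjacency, 0, 3, 3))
```

The two printed numbers should agree up to floating-point error. Path enumeration is exponential in k, so it serves only as a sanity check on small graphs.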

[25]

B TIME COMPLEXITY COMPARISON

B.1 THEORETICAL COMPARISON

We now analyze the time complexity of LIP based on vanilla MLNC (Sec. 3.2). The calculation of influence in Sec. 4.2 (P step) is part of data pre-processing, where we can use fast personalized PageRank algorithms designed for large graphs. Thus, the additional time includes two parts: (i) the influence calc...
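To make the pre-processing step more concrete, the sketch below computes personalized PageRank scores for one source node by plain power iteration. It is an illustration under stated assumptions, not the paper's procedure and not one of the fast algorithms the text refers to: the function name `personalized_pagerank`, the teleport probability `alpha=0.15`, the dense adjacency input, and the handling of isolated nodes (their mass returns to the source) are all hypothetical choices.

```python
def personalized_pagerank(adj, source, alpha=0.15, iters=100):
    """Power iteration for s = alpha * e_source + (1 - alpha) * P^T s,
    where P is the row-normalized adjacency matrix.

    adj    : dense 0/1 adjacency matrix as a list of lists
    source : index of the node the walk restarts from
    alpha  : restart (teleport) probability, an assumed default
    """
    n = len(adj)
    deg = [sum(row) for row in adj]
    restart = [1.0 if v == source else 0.0 for v in range(n)]
    scores = restart[:]
    for _ in range(iters):
        nxt = [alpha * restart[v] for v in range(n)]
        for u in range(n):
            if deg[u] == 0:
                # Assumed convention: a node without edges sends its mass
                # back to the source so the scores keep summing to one.
                nxt[source] += (1 - alpha) * scores[u]
                continue
            share = (1 - alpha) * scores[u] / deg[u]
            for v in range(n):
                if adj[u][v]:
                    nxt[v] += share
        scores = nxt
    return scores


# Hypothetical path graph 0 - 1 - 2 - 3.
path_graph = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
print(personalized_pagerank(path_graph, source=0))
```

Each iteration costs time proportional to the number of nonzero entries when a sparse representation is used; the dense loops here are kept only for readability.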

[26]

argue that the label matrix is approximately full-rank and use the label correlation to regularize the prediction which is similar to PLAIN (Wang et al., 2023). However, these methods do not analyze the complex non-Euclidean nature of graphs. For image and text data, where data points are independent, label correlations exist within label semantics. For g...

[27]

Moreover, we find that another frequently used feature initialization method on BlogCat is to use structure-based embeddings (Qiu et al., 2020). Hence, we use their structure-based initialization method in Tab. 6 and Tab

[28]

Table 4: The statistics of all the MLNC graph datasets.

            DBLP              BlogCat         OGB-p                    PCG
# nodes     28,702            10,312          132,534                  3,233
# edges     68,335            667,966         39,561,252               37,352
# labels    4                 39              112                      15
r_homo      0.76              0.10            0.15                     0.17
Domain      Citation Network  Social Network  PPI                      PPI
Node        Author            Blogger         Protein                  Protein
Edge        Co-authorship     Friendship      Biological associations  Biological associ...

[29]

for highly imbalanced scenarios. Moreover, as noted in (Zhao et al., 2023; Yang et al., 2015; Sun et al., 2025; 2024), AUC can be misleading for highly imbalanced datasets. (Zhao et al.,

[30]

also reveals that OGB-p has nearly 90% unlabeled nodes, indicating extreme imbalance for most labels. Thus, we also report Macro F1 and AP. Macro F1 computes the F1 score for each class independently and then takes the average. AP (Average Precision) is a performance metric that summarizes the precision...
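The two metrics can be sketched in a few lines of plain Python. This is an illustrative implementation under assumptions, not the evaluation code used in the paper: the helper names are hypothetical, an F1 of 0 is assigned to a label with no true or predicted positives, AP is computed as the mean precision at the rank of each positive node with ties broken by input order, and how AP is averaged across labels is left open because the excerpt does not state it.

```python
def f1_for_label(y_true, y_pred):
    """F1 score of one binary label from 0/1 ground truth and predictions."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0  # assumed convention when undefined


def macro_f1(label_true, label_pred):
    """Compute F1 per label independently, then take the unweighted mean.

    Both arguments are lists of rows (one row per node, one 0/1 entry per label).
    """
    n_labels = len(label_true[0])
    per_label = [
        f1_for_label([row[c] for row in label_true],
                     [row[c] for row in label_pred])
        for c in range(n_labels)
    ]
    return sum(per_label) / n_labels


def average_precision(y_true, y_score):
    """AP of one label: mean of precision@rank over the ranks of positives."""
    order = sorted(range(len(y_true)), key=lambda i: -y_score[i])
    hits = 0
    precision_sum = 0.0
    for rank, i in enumerate(order, start=1):
        if y_true[i] == 1:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


# Hypothetical toy data: 4 nodes, 2 labels.
truth = [[1, 0], [1, 1], [0, 1], [0, 0]]
preds = [[1, 0], [0, 1], [0, 1], [0, 0]]
print(macro_f1(truth, preds))
print(average_precision([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.1]))
```

Because Macro F1 weights every label equally, a rare label that is predicted poorly lowers the score as much as a frequent one, which is why it is informative on imbalanced label sets.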

[31]

When comparing with other baselines, we set the same number of layers for the backbone if the same backbone is used.

Backbones. As shown in Tab. 4, our datasets include both homophilous and heterophilous graphs, so we use four different backbones to validate the effectiveness of our method. Among them, GCN (Kipf & Welling, 2017), GAT (Veličković et al....

[32]

Under the same split ratio, we repeat the random splitting process with different random seeds five times.

Due to its abundance of labels, better reveals whether the model truly uncovers the rich relationships among labels. From Tab. 6, we can draw several conclusions. First and foremost, our method is t...

[33]

We simulated a scenario with sparse training samples, as often seen in real-world applications. In this setting, the limited training labels make it more challenging to capture label relationships. Methods that introduce labels as new nodes and construct a new graph with label-node edges are more affected, as fewer training labels result in fewer connecti...

[34]

“None” stands for simply using the backbone model without any quantification of influence correlations between labels. It can be observed that in all cases, utilizing the influence correlations from both the propagation (P) and transformation (T) steps (noted as All in the table) achieves better performance than using the influence from either phase alone. ...

[35]

It shows that our method also achieves satisfactory performance under the inductive setting. Although the model cannot observe the complete graph in the inductive setting, the subgraph containing the nodes whose labels need to be predicted is visible. Therefore, our model’s quantification of influence correlation during the P step remains meaningful and e...

[36]

Localized Negative Influence: Among the neighboring nodes of node 197, only node 18566 has label 1, which is different from its surroundings. Consequently, its negative influence is received by node 197 during the P process, resulting in label 1’s probability exceeding 0.5.

Impact on label 3: The high predic...