Graph Transductive Sharpening: Leveraging Unlabeled Predictions in Node Classification

Brown Zaz; Ferran Hernandez Caralt; Mar Gonzàlez I Català; Moshe Eliasof; Pietro Liò

arxiv: 2605.20248 · v1 · pith:V4CMUYV3new · submitted 2026-05-18 · 💻 cs.LG


Pith reviewed 2026-05-21 08:28 UTC · model grok-4.3


keywords: transductive learning, node classification, semi-supervised learning, graph neural networks, entropy minimization, loss modification, prediction sharpening


The pith

Transductive Sharpening improves node classification by minimizing entropy on unlabeled predictions while applying a counterbalancing term on labeled nodes.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper claims that in transductive graph settings, where the full structure is known but labels are partial, standard losses ignore predictions on unlabeled nodes even though those predictions may carry useful signal. It draws on the decomposition of cross-entropy into a label-dependent alignment part and a label-independent entropy part to treat low-entropy predictions as a proxy for training signal where labels are absent. Transductive Sharpening therefore adds an entropy-minimization term on unlabeled nodes and introduces a counterbalancing adjustment on labeled nodes. The authors report higher accuracy on node-classification tasks without any change to the underlying model architecture. A sympathetic reader would care because the change is orthogonal to the architectural innovations that have dominated recent progress.

Core claim

Transductive Sharpening is a loss-level modification that minimizes the entropy of model predictions on unlabeled nodes while applying a counterbalancing term on labeled nodes. This extracts usable training signal from the predictions that transductive models already produce for every node, including those without ground-truth labels. The method is motivated by the observation that cross-entropy can be separated into a label-dependent alignment component and a label-independent entropy component, allowing the entropy term to serve as a surrogate objective when labels are absent.
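This page does not reproduce the paper's equations, so the following is only one standard identity that matches the description above, not necessarily the authors' exact formulation. Assume a one-hot label vector y and a predicted class distribution p for a single node (both symbols are introduced here for illustration). Adding and subtracting the prediction entropy splits the cross-entropy into a term that needs no label and a term that does:

```latex
\begin{aligned}
\mathrm{CE}(y, p) &= -\sum_{c} y_c \log p_c \\
                  &= \underbrace{-\sum_{c} p_c \log p_c}_{H(p):\ \text{label-independent entropy}}
                   \;+\; \underbrace{\sum_{c} (p_c - y_c) \log p_c}_{\text{label-dependent alignment}}
\end{aligned}
```

Under this reading, only the first term can still be evaluated on a node with no label, which is why entropy is the natural candidate for a surrogate objective there.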

What carries the argument

Transductive Sharpening (TS), a loss modification that minimizes prediction entropy on unlabeled nodes while counterbalancing this effect on labeled nodes to extract training signal from unlabeled predictions.
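As a rough illustration of the unlabeled-node part of this idea, the sketch below adds a weighted mean-entropy penalty on unlabeled nodes to the usual supervised cross-entropy. It is a minimal pure-Python sketch under stated assumptions: the function names, the weight `lam`, and the use of `None` to mark unlabeled nodes are all hypothetical, and the counterbalancing term on labeled nodes is deliberately left out because its exact form is not given on this page.

```python
import math

def softmax(logits):
    """Numerically stable softmax over one node's class logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(p, eps=1e-12):
    """Shannon entropy H(p) of a predicted class distribution."""
    return -sum(pc * math.log(pc + eps) for pc in p)

def cross_entropy(p, label, eps=1e-12):
    """Cross-entropy against a one-hot label given as a class index."""
    return -math.log(p[label] + eps)

def sharpened_loss(logits, labels, lam=0.1):
    """Supervised CE on labeled nodes plus lam * mean entropy on unlabeled nodes.

    labels[i] is a class index, or None when node i is unlabeled.
    Assumption: the paper's counterbalancing term on labeled nodes is
    NOT implemented here; this only shows the entropy-minimization part.
    """
    probs = [softmax(z) for z in logits]
    sup = [cross_entropy(p, y) for p, y in zip(probs, labels) if y is not None]
    ent = [entropy(p) for p, y in zip(probs, labels) if y is None]
    sup_term = sum(sup) / len(sup) if sup else 0.0
    ent_term = sum(ent) / len(ent) if ent else 0.0
    return sup_term + lam * ent_term

# Toy example: one labeled node, one uncertain and one confident unlabeled node.
toy_logits = [[2.0, 0.0], [0.0, 0.0], [5.0, 0.0]]
toy_labels = [0, None, None]
print(sharpened_loss(toy_logits, toy_labels, lam=0.0))  # supervised term only
print(sharpened_loss(toy_logits, toy_labels, lam=0.5))  # adds the entropy penalty
```

Setting `lam=0.0` recovers the plain supervised objective, which makes it easy to compare against a baseline run with the same backbone.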

If this is right

Accuracy rises consistently across a range of node-classification benchmarks in the transductive setting.
The gains occur without any modification to the backbone graph neural network architecture.
The entropy term from the cross-entropy decomposition can be used independently of label availability.
The same modification applies to any existing transductive model that produces predictions for all nodes during training.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same sharpening principle could be tested in other semi-supervised structured-prediction tasks where the model sees the entire input during training.
Combining the entropy term with existing graph-specific regularizers might produce additive or synergistic effects.
The benefit may be largest in regimes with very few labels, where the unlabeled predictions become the dominant source of training signal.
An adaptive version that adjusts the strength of the counterbalancing term per dataset could further stabilize results.

Load-bearing premise

Low-entropy predictions on unlabeled nodes supply reliable training signal in the absence of ground-truth labels.

What would settle it

Applying Transductive Sharpening to standard citation-network benchmarks and observing no accuracy improvement or a performance drop would falsify the claim of consistent gains.

Figures

Figures reproduced from arXiv: 2605.20248 by Brown Zaz, Ferran Hernandez Caralt, Mar Gonzàlez I Català, Moshe Eliasof, Pietro Liò.

**Figure 1.** Aggregates the Glass-normalized gains over the 13 datasets for each GNN backbone. The median curves remain close to or above zero for small positive values of λ, with the most stable region lying roughly between λ = 0 and λ = 0.5. Beyond this range, the curves gradually deteriorate, and large values become harmful more often. [Plot: Glass's Δ (vertical axis) against λ (horizontal axis) for GCN, SAGE, and GAT.]

**Figure 2.** Entropy dynamics during training for the supervised baseline (grey) and TS (blue). TS …

**Figure 3.** Distribution of improvements and regressions as a function of …

**Figure 4.** Test accuracy as a function of λ for each dataset and backbone. The crosshair marks the λ=0 supervised baseline. Across many dataset–backbone pairs, performance improves over a finite interval of positive λ values before degrading when λ becomes too large, while negative values of λ are often harmful. This supports the use of moderate positive sharpening and helps explain why a conservative universal value…
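Figure 1 reports gains in units of Glass's Δ. The page does not say exactly how the paper computes this, so the sketch below assumes the textbook definition: the difference in mean accuracy between the treated runs and the control runs, divided by the sample standard deviation of the control runs only. Treating the λ = 0 baseline as the control group is also an assumption, and the accuracy values are made up for illustration.

```python
from statistics import mean, stdev

def glass_delta(treated, control):
    """Glass's delta: mean difference scaled by the control group's sample std."""
    sd = stdev(control)  # needs at least two control runs
    if sd == 0:
        raise ValueError("control runs have zero variance; Glass's delta is undefined")
    return (mean(treated) - mean(control)) / sd

# Hypothetical test accuracies (percent) over three seeds each.
baseline_acc = [80.0, 81.0, 82.0]  # assumed lambda = 0 runs
ts_acc = [82.0, 83.0, 84.0]        # assumed runs with sharpening enabled
delta = glass_delta(ts_acc, baseline_acc)
print(delta)
```

Normalizing by the baseline's spread puts datasets with very different accuracy ranges on a common scale, which is presumably why it suits an aggregate over 13 datasets.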

Original abstract

In the transductive setting, where the full graph is observed but node labels are only partially available, progress in semi-supervised node classification has largely focused on architectural innovation. In this paper, we revisit an orthogonal axis: the training objective. We start from a simple observation: transductive models produce predictions for every node during training, including nodes without labels. These unlabeled-node predictions may contain useful training signal, but standard supervised objectives discard them because no ground-truth labels are available. Inspired by the decomposition of cross-entropy into a label-dependent alignment term and a label-independent entropy term, we propose prediction confidence as a natural way to extract this signal in the absence of labels. This motivates Transductive Sharpening (TS): a loss-level modification that minimizes prediction entropy on unlabeled nodes while counterbalancing this effect on labeled nodes. We evaluate Transductive Sharpening across a wide range of node-classification benchmarks and observe consistent performance improvements without requiring any changes to the backbone architecture. Code is available at https://github.com/transductive-sharpening/tunedGNN.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes Transductive Sharpening (TS), a loss-level modification for transductive semi-supervised node classification on graphs. Starting from the decomposition of cross-entropy into a label-dependent alignment term and a label-independent entropy term, TS minimizes prediction entropy on unlabeled nodes while applying a counterbalancing adjustment on labeled nodes. This is intended to extract useful training signal from the model's own predictions on unlabeled data. The authors report consistent performance improvements across node-classification benchmarks without requiring changes to the backbone GNN architecture.

Significance. If the reported gains prove robust, this would represent a simple, orthogonal contribution to GNN training objectives that could be applied broadly without architectural redesign. The emphasis on leveraging unlabeled predictions via entropy minimization is a natural extension of existing ideas in semi-supervised learning. The public release of code supports reproducibility and is a strength.

major comments (2)

[Abstract and motivation] The central premise (abstract and motivation paragraph on cross-entropy decomposition) that low-entropy predictions on unlabeled nodes supply reliable signal assumes sufficient alignment with true labels. This assumption is load-bearing for the claim of consistent gains but is vulnerable early in training, under sparse labels, or on heterophilic graphs where initial predictions may be near-uniform or biased; the counterbalancing term on labeled nodes does not provably prevent error amplification in the joint objective.
[Method description] The balancing weight for the entropy term on unlabeled nodes is a free hyperparameter. Its selection procedure, sensitivity analysis, and impact on the claimed consistency of improvements across label rates and graph types need explicit treatment, as this directly affects whether the method delivers parameter-light gains or requires additional tuning.

minor comments (2)

[Experiments] The experimental evaluation would benefit from explicit reporting of statistical significance (e.g., standard deviations over multiple runs) and ablation studies isolating the contribution of the unlabeled entropy term versus the counterbalancing term.
[Method] Clarify the exact form of the counterbalancing term on labeled nodes and how it is derived from the cross-entropy decomposition to avoid ambiguity in implementation.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. We address each major comment below, proposing revisions to clarify assumptions and strengthen the presentation of the method.


Referee: [Abstract and motivation] The central premise (abstract and motivation paragraph on cross-entropy decomposition) that low-entropy predictions on unlabeled nodes supply reliable signal assumes sufficient alignment with true labels. This assumption is load-bearing for the claim of consistent gains but is vulnerable early in training, under sparse labels, or on heterophilic graphs where initial predictions may be near-uniform or biased; the counterbalancing term on labeled nodes does not provably prevent error amplification in the joint objective.

Authors: We agree that the reliability of low-entropy predictions is an important consideration, particularly early in training or on challenging graphs. Our empirical results across benchmarks, including heterophilic graphs and varying label rates, show consistent gains, suggesting the joint objective with the counterbalancing term on labeled nodes provides practical robustness. However, we will revise the motivation section to explicitly discuss this assumption's limitations and add experiments tracking prediction entropy and accuracy over training epochs to illustrate the dynamics.
Revision: partial
Referee: [Method description] The balancing weight for the entropy term on unlabeled nodes is a free hyperparameter. Its selection procedure, sensitivity analysis, and impact on the claimed consistency of improvements across label rates and graph types need explicit treatment, as this directly affects whether the method delivers parameter-light gains or requires additional tuning.

Authors: We acknowledge that the balancing weight requires more detailed treatment to support the claim of consistent, low-tuning gains. In the revised manuscript, we will add a dedicated subsection on hyperparameter selection (using a validation set), include sensitivity analysis plots for the weight across label rates and graph types, and discuss how it affects performance consistency.
Revision: yes
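The validation-based selection mentioned in the response could look roughly like the sketch below. Everything here is hypothetical: `train_and_validate` stands in for a full training run that returns validation accuracy for a given weight, the grid is arbitrary, and the tie-breaking rule (prefer the smaller absolute weight) is a design choice made for this example, not something the authors describe.

```python
def sweep(train_and_validate, grid):
    """Run one training per candidate weight.

    train_and_validate is a hypothetical callable: lam -> validation accuracy.
    """
    return {lam: train_and_validate(lam) for lam in grid}

def select_lambda(val_acc_by_lambda):
    """Pick the weight with the best validation accuracy.

    Ties go to the smaller |lam|, i.e. the more conservative setting.
    """
    if not val_acc_by_lambda:
        raise ValueError("no candidates to choose from")
    return min(val_acc_by_lambda, key=lambda lam: (-val_acc_by_lambda[lam], abs(lam)))

# Toy stand-in for real training runs (made-up numbers, for illustration only).
toy_scores = {0.0: 0.80, 0.1: 0.82, 0.5: 0.82, 1.0: 0.79}
results = sweep(lambda lam: toy_scores[lam], [0.0, 0.1, 0.5, 1.0])
best = select_lambda(results)
print(best)
```

Selecting on validation accuracy only keeps the test split untouched, so a comparison against the λ = 0 baseline stays fair.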

Circularity Check

0 steps flagged

No significant circularity; TS loss is an independent objective addition


The paper introduces Transductive Sharpening as a direct modification to the training objective, motivated by the standard cross-entropy decomposition into alignment and entropy terms. This decomposition is a known property of the loss function and is not derived from or fitted to the paper's own results. The proposed TS term minimizes entropy on unlabeled nodes with a counterbalance on labeled nodes; it does not reduce by the paper's equations to any quantity previously fitted on the target data or to a self-citation chain. Performance claims rest on empirical benchmarks rather than a closed derivation loop. No load-bearing step matches the enumerated circularity patterns.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The method rests on one domain assumption and at least one tunable hyperparameter whose value is not fixed by prior literature.

free parameters (1)

balancing weight for entropy term on unlabeled nodes
A scalar coefficient must be chosen to trade off the sharpening loss against the supervised loss; its value is not derived from first principles.

axioms (1)

Domain assumption: unlabeled-node predictions contain useful training signal that can be extracted via entropy minimization.
This premise is invoked to justify keeping the entropy term for unlabeled nodes while the standard supervised loss is retained for labeled nodes.



[1] [1]

Bundle Neural Networks for message diffusion on graphs

Jacob Bamberger et al. “Bundle Neural Networks for message diffusion on graphs”. In: The Thirteenth International Conference on Learning Representations (ICLR). 2025. arXiv: 2405.15540 [cs.LG]

work page arXiv 2025

[2] [2]

Regularizing Graph Neural Networks via Consistency-Diversity Graph Aug- mentations

Deyu Bo et al. “Regularizing Graph Neural Networks via Consistency-Diversity Graph Aug- mentations”. In:Proceedings of the AAAI Conference on Artificial Intelligence36.4 (June 2022), pp. 3913–3921.DOI:10.1609/aaai.v36i4.20307

work page doi:10.1609/aaai.v36i4.20307 2022

[3] [3]

Geometric Deep Learning: Grids, Groups, Graphs, Geodesics, and Gauges

Michael M Bronstein et al. “Geometric deep learning: Grids, groups, graphs, geodesics, and gauges”. In:arXiv preprint arXiv:2104.13478(2021)

work page internal anchor Pith review Pith/arXiv arXiv 2021

[4] [4]

Joan Bruna et al.Spectral Networks and Locally Connected Networks on Graphs. 2014. arXiv: 1312.6203 [cs.LG]

work page internal anchor Pith review Pith/arXiv arXiv 2014

[5] [5]

NAGphormer: A Tokenized Graph Transformer for Node Classification in Large Graphs

Jinsong Chen et al. “NAGphormer: A Tokenized Graph Transformer for Node Classification in Large Graphs”. In:International Conference on Learning Representations. 2023. arXiv: 2206.04910 [cs.LG]

work page arXiv 2023

[6] [6]

Adaptive universal generalized pagerank graph neural network

Eli Chien et al. “Adaptive Universal Generalized PageRank Graph Neural Network”. In: International Conference on Learning Representations. 2021. arXiv:2006.07988 [cs.LG]

work page arXiv 2021

[7] [7]

Michaël Defferrard, Xavier Bresson, and Pierre Vandergheynst.Convolutional Neural Net- works on Graphs with Fast Localized Spectral Filtering. 2017. arXiv:1606.09375 [cs.LG]

work page internal anchor Pith review Pith/arXiv arXiv 2017

[8] [8]

Polynormer: Polynomial-Expressive Graph Transformer in Linear Time

Chenhui Deng, Zichao Yue, and Zhiru Zhang. “Polynormer: Polynomial-Expressive Graph Transformer in Linear Time”. In:The Twelfth International Conference on Learning Repre- sentations (ICLR). 2024. arXiv: 2403.01232 [cs.LG].URL: https://openreview.net/ forum?id=hmv1LpNfXa

work page arXiv 2024

[9] [9]

Graph convolutional neural networks with node transition probability- based message passing and DropNode regularization

Tien Huu Do et al. “Graph convolutional neural networks with node transition probability- based message passing and DropNode regularization”. In:Expert Systems with Applications 174 (2021), p. 114711.ISSN: 0957-4174.DOI:10.1016/j.eswa.2021.114711

work page doi:10.1016/j.eswa.2021.114711 2021

[10] [10]

Moshe Eliasof, Eldad Haber, and Eran Treister.Every Node Counts: Improving the Training of Graph Neural Networks on Node Classification. 2022. arXiv:2211.16631 [cs.LG]

work page arXiv 2022

[11] [11]

Matthias Fey and Jan Eric Lenssen.Fast Graph Representation Learning with PyTorch Geo- metric. 2019. arXiv:1903.02428 [cs.LG]

work page internal anchor Pith review Pith/arXiv arXiv 2019

[12] Justin Gilmer et al. Neural Message Passing for Quantum Chemistry. 2017. arXiv:1704.01212 [cs.LG]

[13] C. Goller and A. Kuchler. “Learning task-dependent distributed representations by backpropagation through structure”. In: Proceedings of International Conference on Neural Networks (ICNN’96). Vol. 1. 1996, 347–352 vol.1. DOI: 10.1109/ICNN.1996.548916

[14] M. Gori, G. Monfardini, and F. Scarselli. “A new model for learning in graph domains”. In: Proceedings. 2005 IEEE International Joint Conference on Neural Networks, 2005. Vol. 2. 2005, 729–734 vol. 2. DOI: 10.1109/IJCNN.2005.1555942

[15] Yves Grandvalet and Yoshua Bengio. “Semi-supervised Learning by Entropy Minimization”. In: Advances in Neural Information Processing Systems. Vol. 17. 2004. URL: https://proceedings.neurips.cc/paper/2004/hash/96f2b50b5d3613adf9c27049b2a888c7-Abstract.html

[16] Arman Gupta et al. Flow Matters: Directional and Expressive GNNs for Heterophilic Graphs. arXiv:2509.00772 [cs.LG]

[18] William L. Hamilton, Rex Ying, and Jure Leskovec. Inductive Representation Learning on Large Graphs. 2018. arXiv:1706.02216 [cs.SI]

[19] Xiaotian Han et al. G-Mixup: Graph Data Augmentation for Graph Classification. 2022. arXiv:2202.07179 [cs.LG]

[20] Zhaolin Hu et al. GraphTARIF: Linear Graph Transformer with Augmented Rank and Improved Focus. Accepted to The Web Conference (WWW) 2026. 2026. arXiv:2510.10631 [cs.CV]

[21] Jincheng Huang et al. “Enhancing the Influence of Labels on Unlabeled Nodes in Graph Convolutional Networks”. In: Proceedings of the 42nd International Conference on Machine Learning (ICML). 2025. arXiv:2411.02279 [cs.LG]

[22] Thomas N. Kipf and Max Welling. Semi-Supervised Classification with Graph Convolutional Networks. 2017. arXiv:1609.02907 [cs.LG]

[23] Kezhi Kong et al. “GOAT: A Global Transformer on Large-scale Graphs”. In: International Conference on Machine Learning. 2023. URL: https://proceedings.mlr.press/v202/kong23a.html

[24] Xiang Li et al. “Finding Global Homophily in Graph Neural Networks When Meeting Heterophily”. In: International Conference on Machine Learning. 2022. arXiv:2205.07308 [cs.LG]

[25] Sitao Luan et al. “When do graph neural networks help with node classification? investigating the homophily principle on node distinguishability”. In: Advances in Neural Information Processing Systems 36 (2023), pp. 28748–28760

[26] Yuankai Luo, Lei Shi, and Xiao-Ming Wu. “Classic GNNs are Strong Baselines: Reassessing GNNs for Node Classification”. In: The Thirty-eighth Conference on Neural Information Processing Systems Datasets and Benchmarks Track. 2024. DOI: 10.52202/079017-3098. URL: https://openreview.net/forum?id=xkljKdGe4E

[27] Yuankai Luo, Lei Shi, and Xiao-Ming Wu. “Classic gnns are strong baselines: Reassessing gnns for node classification”. In: Advances in Neural Information Processing Systems 37 (2024), pp. 97650–97669

[28] Sunil Kumar Maurya, Xin Liu, and Tsuyoshi Murata. “Simplifying approach to node classification in graph neural networks”. In: Journal of Computational Science 62 (2022), p. 101695. DOI: 10.1016/j.jocs.2022.101695. arXiv:2111.06748 [cs.LG]

[29] Péter Mernyei and Cătălina Cangea. “Wiki-CS: A Wikipedia-based Benchmark for Graph Neural Networks”. In: (2020). arXiv:2007.02901 [cs.LG]

[30] Rafael Müller, Simon Kornblith, and Geoffrey E Hinton. “When does label smoothing help?” In: Advances in neural information processing systems 32 (2019)

[31] Seong Ho Pahng and Sahand Hormoz. “Improving Graph Neural Networks by Learning Continuous Edge Directions”. In: The Thirteenth International Conference on Learning Representations (ICLR). 2025. arXiv:2410.14109 [cs.LG]

[32] Moonjeong Park, Jaeseung Heo, and Dongwoo Kim. “Mitigating Oversmoothing Through Reverse Process of GNNs for Heterophilic Graphs”. In: Proceedings of the 41st International Conference on Machine Learning. Ed. by Ruslan Salakhutdinov et al. Vol. 235. Proceedings of Machine Learning Research. PMLR, 21–27 Jul 2024, pp. 39667–39681. URL: https://proceedings....

[33] Hongbin Pei et al. “Geom-GCN: Geometric Graph Convolutional Networks”. In: International Conference on Learning Representations. 2020. arXiv:2002.05287 [cs.LG]

[34] Oleg Platonov et al. “A Critical Look at the Evaluation of GNNs under Heterophily: Are We Really Making Progress?” In: arXiv preprint arXiv:2302.11640 (2023). DOI: 10.48550/arXiv.2302.11640

[35] Ladislav Rampášek et al. “Recipe for a General, Powerful, Scalable Graph Transformer”. In: Advances in Neural Information Processing Systems. 2022. arXiv:2205.12454 [cs.LG]

[36] Yu Rong et al. DropEdge: Towards Deep Graph Convolutional Networks on Node Classification. 2020. arXiv:1907.10903 [cs.LG]

[37] C. E. Shannon. “A Mathematical Theory of Communication”. In: Bell System Technical Journal 27.3 (1948), pp. 379–423. DOI: 10.1002/j.1538-7305.1948.tb01338.x

[38] Oleksandr Shchur et al. “Pitfalls of graph neural network evaluation”. In: (2018). arXiv:1811.05868 [cs.LG]

[39] Hamed Shirzad et al. “Exphormer: Sparse Transformers for Graphs”. In: International Conference on Machine Learning. 2023. arXiv:2303.06147 [cs.LG]

[40] Nitish Srivastava et al. “Dropout: A Simple Way to Prevent Neural Networks from Overfitting”. In: Journal of Machine Learning Research 15.56 (2014), pp. 1929–1958. URL: http://jmlr.org/papers/v15/srivastava14a.html

[41] Fan-Yun Sun et al. InfoGraph: Unsupervised and Semi-supervised Graph-Level Representation Learning via Mutual Information Maximization. 2020. arXiv:1908.01000 [cs.LG]

[42] Constantino Tsallis. “Possible Generalization of Boltzmann-Gibbs Statistics”. In: J. Statist. Phys. 52 (1988), pp. 479–487. DOI: 10.1007/BF01016429

[43] Petar Veličković et al. Graph Attention Networks. 2018. arXiv:1710.10903 [stat.ML]

[44] Vikas Verma et al. GraphMix: Regularized Training of Graph Neural Networks for Semi-Supervised Learning. Sept. 2019. DOI: 10.48550/arXiv.1909.11715

[45] Vikas Verma et al. Manifold Mixup: Better Representations by Interpolating Hidden States. arXiv:1806.05236 [stat.ML]

[47] Minjie Wang et al. Deep Graph Library: A Graph-Centric, Highly-Performant Package for Graph Neural Networks. 2020. arXiv:1909.01315 [cs.LG]

[48] Yiwei Wang et al. “Mixup for Node and Graph Classification”. In: Proceedings of the Web Conference 2021. WWW ’21. Ljubljana, Slovenia: Association for Computing Machinery, 2021, pp. 3663–3674. ISBN: 9781450383127. DOI: 10.1145/3442381.3449796

[49] Qitian Wu et al. “NodeFormer: A Scalable Graph Structure Learning Transformer for Node Classification”. In: Advances in Neural Information Processing Systems. 2022. URL: https://openreview.net/forum?id=sMezXGG5So

[50] Qitian Wu et al. “SGFormer: Simplifying and Empowering Transformers for Large-Graph Representations”. In: Advances in Neural Information Processing Systems. 2023. arXiv:2306.10759 [cs.LG]

[51] Yujie Xing et al. “Unifying and Enhancing Graph Transformers via a Hierarchical Mask Framework”. In: Advances in Neural Information Processing Systems (NeurIPS). 2025. arXiv:2510.18825 [cs.LG]

[52] Yuning You et al. Graph Contrastive Learning with Augmentations. 2021. arXiv:2010.13902 [cs.LG]

[53] Baoming Zhang et al. “Normalize Then Propagate: Efficient Homophilous Regularization for Few-shot Semi-Supervised Node Classification”. In: Proceedings of the AAAI Conference on Artificial Intelligence. 2025. arXiv:2501.08581 [cs.LG]

[54] Hongyi Zhang et al. mixup: Beyond Empirical Risk Minimization. 2018. arXiv:1710.09412 [cs.LG]

[55] Yu Zhang et al. “A graph transformer with optimized attention scores for node classification”. In: Scientific Reports 15.1 (2025), p. 30015. DOI: 10.1038/s41598-025-15551-2. URL: https://www.nature.com/articles/s41598-025-15551-2

[56] Lingxiao Zhao and Leman Akoglu. PairNorm: Tackling Oversmoothing in GNNs. 2020. arXiv:1909.12223 [cs.LG]

[57] Jiong Zhu et al. “Beyond Homophily in Graph Neural Networks: Current Limitations and Effective Designs”. In: Advances in Neural Information Processing Systems. 2020. arXiv:2006.11468 [cs.LG]

[58] Jiong Zhu et al. “Graph Neural Networks with Heterophily”. In: Proceedings of the AAAI Conference on Artificial Intelligence. 2021. arXiv:2009.13566 [cs.LG]

[59] Jiaming Zhuo et al. “DUALFormer: Dual Graph Transformer”. In: The Thirteenth International Conference on Learning Representations (ICLR). 2025. URL: https://openreview.net/forum?id=4v4RcAODj9

A Datasets and Experimental Details

A.1 Computing Environment

Our implementation is built upon tunedGNN [25], which is based on PyG [11] and DGL [45]. The experim...