
arxiv: 2605.20248 · v1 · pith:V4CMUYV3new · submitted 2026-05-18 · 💻 cs.LG

Graph Transductive Sharpening: Leveraging Unlabeled Predictions in Node Classification

Pith reviewed 2026-05-21 08:28 UTC · model grok-4.3

classification 💻 cs.LG
keywords transductive learning · node classification · semi-supervised learning · graph neural networks · entropy minimization · loss modification · prediction sharpening

The pith

Transductive Sharpening improves node classification by minimizing entropy on unlabeled predictions while balancing labeled ones.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper claims that in transductive graph settings, where the full structure is known but labels are partial, standard losses ignore predictions on unlabeled nodes even though those predictions may carry useful signal. It draws on the decomposition of cross-entropy into a label-dependent alignment part and a label-independent entropy part to treat prediction confidence (low entropy) as a way to extract that signal where labels are absent. Transductive Sharpening therefore adds an entropy-minimization term on unlabeled nodes and introduces a counterbalancing adjustment on labeled nodes. The reported result is higher accuracy on node-classification tasks without any change to the underlying model architecture. A sympathetic reader would care because the change is orthogonal to the architectural innovations that have dominated recent progress.

Core claim

Transductive Sharpening is a loss-level modification that minimizes the entropy of model predictions on unlabeled nodes while applying a counterbalancing term on labeled nodes. This extracts usable training signal from the predictions that transductive models already produce for every node, including those without ground-truth labels. The method is motivated by the observation that cross-entropy can be separated into a label-dependent alignment component and a label-independent entropy component, allowing the entropy term to serve as a surrogate objective when labels are absent.
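The decomposition mentioned above can be written out explicitly. The notation here is an assumption made for illustration ($p_v$ is the predicted class distribution at node $v$, $y_v$ the one-hot label); the paper's own symbols and grouping of terms may differ.

```latex
\begin{aligned}
\mathrm{CE}(y_v, p_v)
  &= -\sum_{c} y_{v,c} \log p_{v,c} \\
  &= \underbrace{\sum_{c} \bigl(p_{v,c} - y_{v,c}\bigr) \log p_{v,c}}_{\text{alignment term (requires } y_v\text{)}}
   \;+\;
   \underbrace{\Bigl(-\sum_{c} p_{v,c} \log p_{v,c}\Bigr)}_{\text{entropy } H(p_v)\text{ (label-free)}}
\end{aligned}
```

The two lines agree because the $p_{v,c} \log p_{v,c}$ contributions cancel. Only the second term can be evaluated on a node without a label, which is why it is the natural candidate for a surrogate objective on unlabeled nodes.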

What carries the argument

Transductive Sharpening (TS), a loss modification that minimizes prediction entropy on unlabeled nodes while counterbalancing this effect on labeled nodes to extract training signal from unlabeled predictions.
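To make the shape of such a loss concrete, here is a minimal pure-Python sketch. The exact form of the counterbalancing term is not given on this page, so the version below is a hypothetical illustration and not the authors' formula: it assumes the counterbalance subtracts the mean entropy on labeled nodes, weighted by the same `lam`. The function names and toy logits are likewise made up.

```python
import math

def softmax(logits):
    """Numerically stable softmax over one node's logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(p):
    """Shannon entropy H(p) in nats."""
    return -sum(q * math.log(q) for q in p if q > 0.0)

def cross_entropy(p, label):
    """Cross-entropy of prediction p against an integer class label."""
    return -math.log(p[label])

def mean(values):
    values = list(values)
    return sum(values) / len(values) if values else 0.0

def ts_style_loss(all_logits, labels, lam):
    """Supervised CE plus an entropy term weighted by lam.

    labels[i] is an int for labeled nodes and None for unlabeled ones.
    ASSUMPTION: the counterbalance is modeled as subtracting the mean
    entropy on labeled nodes; the paper's actual term may differ.
    """
    probs = [softmax(z) for z in all_logits]
    labeled = [(p, y) for p, y in zip(probs, labels) if y is not None]
    unlabeled = [p for p, y in zip(probs, labels) if y is None]
    supervised = mean(cross_entropy(p, y) for p, y in labeled)
    h_unlabeled = mean(entropy(p) for p in unlabeled)
    h_labeled = mean(entropy(p) for p, _ in labeled)
    return supervised + lam * (h_unlabeled - h_labeled)

# Toy example: four nodes, three classes, two nodes labeled.
logits = [[2.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.5, 0.5, 0.0], [0.0, 0.0, 3.0]]
labels = [0, 1, None, None]
baseline = ts_style_loss(logits, labels, lam=0.0)    # plain supervised loss
sharpened = ts_style_loss(logits, labels, lam=0.25)  # with entropy terms
print(baseline, sharpened)
```

With `lam=0.0` the sketch reduces to the ordinary supervised cross-entropy, which matches the page's description of λ = 0 as the supervised baseline.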

If this is right

  • Accuracy rises consistently across a range of node-classification benchmarks in the transductive setting.
  • The gains occur without any modification to the backbone graph neural network architecture.
  • The entropy term from the cross-entropy decomposition can be used independently of label availability.
  • The same modification applies to any existing transductive model that produces predictions for all nodes during training.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same sharpening principle could be tested in other semi-supervised structured-prediction tasks where the model sees the entire input during training.
  • Combining the entropy term with existing graph-specific regularizers might produce additive or synergistic effects.
  • The benefit may be largest in regimes with very few labels, where the unlabeled predictions become the dominant source of training signal.
  • An adaptive version that adjusts the strength of the counterbalancing term per dataset could further stabilize results.

Load-bearing premise

Low-entropy predictions on unlabeled nodes supply reliable training signal in the absence of ground-truth labels.

What would settle it

Applying Transductive Sharpening to standard citation-network benchmarks and observing no accuracy improvement or a performance drop would falsify the claim of consistent gains.

Figures

Figures reproduced from arXiv: 2605.20248 by Brown Zaz, Ferran Hernandez Caralt, Mar Gonzàlez I Català, Moshe Eliasof, Pietro Liò.

Figure 1
Figure 1 aggregates the Glass-normalized gains over the 13 datasets for each GNN backbone. The median curves remain close to or above zero for small positive values of λ, with the most stable region lying roughly between λ = 0 and λ = 0.5. Beyond this range, the curves gradually deteriorate, and large values become harmful more often. [Plot: Glass's Δ (y-axis, −2 to 2) against λ (x-axis, −1.0 to 2.0) for the GCN, SAGE, and GAT backbones.]
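Glass's Δ, the effect size on the y-axis of Figure 1, divides the difference in group means by the standard deviation of the control group alone. The sketch below assumes the control group is the λ = 0 supervised baseline over repeated runs and that the sample standard deviation is used. The accuracy values are made up for illustration, and the paper's exact aggregation may differ.

```python
import statistics

def glass_delta(treatment, control):
    """Glass's delta: (mean(treatment) - mean(control)) / sd(control)."""
    sd_control = statistics.stdev(control)  # sample standard deviation
    if sd_control == 0.0:
        raise ValueError("control group has zero variance")
    return (statistics.mean(treatment) - statistics.mean(control)) / sd_control

# Hypothetical test accuracies (%) over three seeds.
baseline_acc = [80.0, 81.0, 82.0]  # lambda = 0 (supervised baseline)
ts_acc = [82.0, 82.5, 83.0]        # some positive lambda
delta = glass_delta(ts_acc, baseline_acc)
print(round(delta, 3))
```

Normalizing by the baseline's spread puts gains from datasets with very different accuracy ranges on a common scale, which is what makes a single aggregate curve per backbone meaningful.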
Figure 2
Figure 2: Entropy dynamics during training for the supervised baseline (grey) and TS (blue). TS …
Figure 3
Figure 3: Distribution of improvements and regressions as a function of …
Figure 4
Figure 4: Test accuracy as a function of λ for each dataset and backbone. The crosshair marks the λ=0 supervised baseline. Across many dataset–backbone pairs, performance improves over a finite interval of positive λ values before degrading when λ becomes too large, while negative values of λ are often harmful. This supports the use of moderate positive sharpening and helps explain why a conservative universal value…
Original abstract

In the transductive setting, where the full graph is observed but node labels are only partially available, progress in semi-supervised node classification has largely focused on architectural innovation. In this paper, we revisit an orthogonal axis: the training objective. We start from a simple observation: transductive models produce predictions for every node during training, including nodes without labels. These unlabeled-node predictions may contain useful training signal, but standard supervised objectives discard them because no ground-truth labels are available. Inspired by the decomposition of cross-entropy into a label-dependent alignment term and a label-independent entropy term, we propose prediction confidence as a natural way to extract this signal in the absence of labels. This motivates Transductive Sharpening (TS): a loss-level modification that minimizes prediction entropy on unlabeled nodes while counterbalancing this effect on labeled nodes. We evaluate Transductive Sharpening across a wide range of node-classification benchmarks and observe consistent performance improvements without requiring any changes to the backbone architecture. Code is available at https://github.com/transductive-sharpening/tunedGNN.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes Transductive Sharpening (TS), a loss-level modification for transductive semi-supervised node classification on graphs. Starting from the decomposition of cross-entropy into a label-dependent alignment term and a label-independent entropy term, TS minimizes prediction entropy on unlabeled nodes while applying a counterbalancing adjustment on labeled nodes. This is intended to extract useful training signal from the model's own predictions on unlabeled data. The authors report consistent performance improvements across node-classification benchmarks without requiring changes to the backbone GNN architecture.

Significance. If the reported gains prove robust, this would represent a simple, orthogonal contribution to GNN training objectives that could be applied broadly without architectural redesign. The emphasis on leveraging unlabeled predictions via entropy minimization is a natural extension of existing ideas in semi-supervised learning. The public release of code supports reproducibility and is a strength.

major comments (2)
  1. [Abstract and motivation] The central premise (abstract and motivation paragraph on cross-entropy decomposition) that low-entropy predictions on unlabeled nodes supply reliable signal assumes sufficient alignment with true labels. This assumption is load-bearing for the claim of consistent gains but is vulnerable early in training, under sparse labels, or on heterophilic graphs where initial predictions may be near-uniform or biased; the counterbalancing term on labeled nodes does not provably prevent error amplification in the joint objective.
  2. [Method description] The balancing weight for the entropy term on unlabeled nodes is a free hyperparameter. Its selection procedure, sensitivity analysis, and impact on the claimed consistency of improvements across label rates and graph types need explicit treatment, as this directly affects whether the method delivers parameter-light gains or requires additional tuning.
minor comments (2)
  1. [Experiments] The experimental evaluation would benefit from explicit reporting of statistical significance (e.g., standard deviations over multiple runs) and ablation studies isolating the contribution of the unlabeled entropy term versus the counterbalancing term.
  2. [Method] Clarify the exact form of the counterbalancing term on labeled nodes and how it is derived from the cross-entropy decomposition to avoid ambiguity in implementation.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. We address each major comment below, proposing revisions to clarify assumptions and strengthen the presentation of the method.

  1. Referee: [Abstract and motivation] The central premise (abstract and motivation paragraph on cross-entropy decomposition) that low-entropy predictions on unlabeled nodes supply reliable signal assumes sufficient alignment with true labels. This assumption is load-bearing for the claim of consistent gains but is vulnerable early in training, under sparse labels, or on heterophilic graphs where initial predictions may be near-uniform or biased; the counterbalancing term on labeled nodes does not provably prevent error amplification in the joint objective.

    Authors: We agree that the reliability of low-entropy predictions is an important consideration, particularly early in training or on challenging graphs. Our empirical results across benchmarks, including heterophilic graphs and varying label rates, show consistent gains, suggesting the joint objective with the counterbalancing term on labeled nodes provides practical robustness. However, we will revise the motivation section to explicitly discuss this assumption's limitations and add experiments tracking prediction entropy and accuracy over training epochs to illustrate the dynamics. revision: partial

  2. Referee: [Method description] The balancing weight for the entropy term on unlabeled nodes is a free hyperparameter. Its selection procedure, sensitivity analysis, and impact on the claimed consistency of improvements across label rates and graph types need explicit treatment, as this directly affects whether the method delivers parameter-light gains or requires additional tuning.

    Authors: We acknowledge that the balancing weight requires more detailed treatment to support the claim of consistent, low-tuning gains. In the revised manuscript, we will add a dedicated subsection on hyperparameter selection (using a validation set), include sensitivity analysis plots for the weight across label rates and graph types, and discuss how it affects performance consistency. revision: yes
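The validation-based selection the authors describe can be sketched as follows. This is an illustration under stated assumptions and not the authors' procedure: the results dictionary, the λ grid, and the tie-breaking rule (prefer the value closest to zero) are all hypothetical, and the accuracies are made up.

```python
def select_lambda(results):
    """Pick the lambda with the best validation accuracy.

    results maps lambda -> {"val": float, "test": float}.
    Ties on validation accuracy are broken toward the smallest |lambda|
    (an assumed, conservative rule). Test accuracy is only reported for
    the chosen value and never used for selection.
    """
    best = min(results, key=lambda lam: (-results[lam]["val"], abs(lam)))
    return best, results[best]["test"]

# Hypothetical sweep for one dataset and backbone (accuracies in %).
sweep = {
    -0.5: {"val": 78.0, "test": 77.5},
    0.0: {"val": 80.0, "test": 79.0},   # supervised baseline
    0.25: {"val": 81.0, "test": 80.2},
    0.5: {"val": 81.0, "test": 80.0},
    1.0: {"val": 79.5, "test": 78.8},
}
chosen, test_acc = select_lambda(sweep)
print(chosen, test_acc)
```

Repeating the same sweep across label rates and graph types, and reporting how often the chosen value moves, is one way to address the referee's question of whether the method needs per-dataset tuning.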

Circularity Check

0 steps flagged

No significant circularity; TS loss is an independent objective addition


The paper introduces Transductive Sharpening as a direct modification to the training objective, motivated by the standard cross-entropy decomposition into alignment and entropy terms. This decomposition is a known property of the loss function and is not derived from or fitted to the paper's own results. The proposed TS term minimizes entropy on unlabeled nodes with a counterbalance on labeled nodes; it does not reduce by the paper's equations to any quantity previously fitted on the target data or to a self-citation chain. Performance claims rest on empirical benchmarks rather than a closed derivation loop. No load-bearing step matches the enumerated circularity patterns.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The method rests on one domain assumption and at least one tunable hyperparameter whose value is not fixed by prior literature.

free parameters (1)
  • balancing weight for entropy term on unlabeled nodes
    A scalar coefficient must be chosen to trade off the sharpening loss against the supervised loss; its value is not derived from first principles.
axioms (1)
  • domain assumption: Unlabeled-node predictions contain useful training signal that can be extracted via entropy minimization
    This premise is invoked to justify keeping the entropy term for unlabeled nodes while the standard supervised loss is retained for labeled nodes.


