Recognition: unknown
Invariant-Based Diagnostics for Graph Benchmarks
Pith reviewed 2026-05-08 12:42 UTC · model grok-4.3
The pith
Graph invariants act as diagnostics and often match or exceed complex models on graph benchmarks.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Graph invariants serve as a diagnostic framework that separates structural contributions from node features in benchmarks; they are more expressive than typical GNNs, characterize heterogeneity within and across datasets, predict multi-task performance, and enable simple invariant-based models that compete with or surpass transformer and message-passing approaches across 26 datasets, implying that expressivity is not the main performance driver and that non-trainable structural proxies often suffice where structure matters.
What carries the argument
Graph invariants: permutation-invariant, task-agnostic structural descriptors that enable both analysis of benchmark properties and construction of non-trainable predictive models.
Load-bearing premise
The selected graph invariants capture the structural aspects relevant to the tasks without hidden feature-structure correlations that would favor them over trained models.
What would settle it
A dataset where trained transformers or message-passing models substantially outperform invariant-based models on a task whose solution demonstrably requires complex connectivity beyond what the invariants encode.
Figures
read the original abstract
Progress on graph foundation models is hindered by benchmark practices that conflate the contributions of node features and graph structure, making it hard to tell whether a model actually learns from connectivity, or whether it even needs to. We propose addressing this using graph invariants, i.e., permutation-invariant, task-agnostic structural descriptors that serve as a diagnostic framework for graph benchmarks. We show that (i) invariants are more expressive than standard GNNs, (ii) invariants characterize structural heterogeneity within and across benchmark datasets, (iii) invariants predict multi-task performance, and (iv) simple invariant-based models are competitive with, and sometimes exceed, transformer and message-passing baselines across 26 datasets. Our results suggest that expressivity is not the main driver of predictive performance, and that on tasks where structure matters, a non-trainable structural proxy often matches trained message-passing models. We thus posit that invariant baselines should become a standard for evaluating whether structure is required for a task and whether a model picks up on it, serving as a stepping stone towards graph foundation models.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes graph invariants—permutation-invariant, task-agnostic structural descriptors—as a diagnostic framework for graph benchmarks to disentangle node features from connectivity. It reports four empirical findings on 26 datasets: (i) invariants are more expressive than standard GNNs, (ii) they characterize structural heterogeneity within and across benchmarks, (iii) they predict multi-task performance, and (iv) simple invariant-based models are competitive with or exceed transformer and message-passing baselines. The authors conclude that expressivity is not the primary driver of performance and recommend invariant baselines as standard for assessing when structure is required.
Significance. If the central empirical claims hold under a fixed, pre-specified invariant set, the work could meaningfully influence graph ML evaluation practices by providing a lightweight, non-trainable structural proxy. This would help identify tasks where message-passing or transformers are unnecessary and support more targeted development of graph foundation models.
major comments (3)
- [Methods / experimental setup] The description of invariant construction and selection (likely in the methods or experimental setup sections) does not specify whether the collection of invariants (e.g., degree statistics, clustering coefficients, spectral features) is a fixed, task-agnostic set applied uniformly across all 26 datasets or whether subsets or weightings are chosen after inspecting the data or task labels. This detail is load-bearing for claim (iv): post-hoc selection would turn the reported competitiveness into evidence for a tuned structural featurizer rather than a general diagnostic, directly weakening the task-agnostic framing emphasized in the abstract and introduction.
- [Results section reporting finding (iv)] Finding (iv) on competitiveness with transformer and message-passing baselines lacks reported details on statistical controls, run-to-run variance, hyperparameter matching, or feature-only ablations. Without these, it is unclear whether the invariant models' performance reflects genuine structural signal or dataset-specific correlations that favor non-trainable descriptors; this directly affects the claim that 'a non-trainable structural proxy often matches trained message-passing models.'
- [Section presenting finding (i)] The expressivity comparison in finding (i) requires a precise operational definition (e.g., distinguishing power on specific graph isomorphism classes or WL hierarchy level) and explicit listing of the GNN architectures and depths used as baselines. Vague statements that 'invariants are more expressive' risk overstating the result if the comparison omits modern expressive GNN variants or uses only basic MPNNs.
minor comments (2)
- [Abstract] The abstract lists four findings but provides no quantitative metrics, dataset names, or error bars; expanding the abstract with one or two key numbers would improve immediate readability.
- [Methods] Notation for the specific invariants used should be introduced with a compact table or equation set early in the methods to avoid later ambiguity when discussing heterogeneity or predictive power.
Simulated Author's Rebuttal
We thank the referee for their constructive comments, which have helped clarify key aspects of our work. We address each major point below, providing clarifications based on the manuscript and making revisions where the presentation can be strengthened.
read point-by-point responses
-
Referee: [Methods / experimental setup] The description of invariant construction and selection (likely in the methods or experimental setup sections) does not specify whether the collection of invariants (e.g., degree statistics, clustering coefficients, spectral features) is a fixed, task-agnostic set applied uniformly across all 26 datasets or whether subsets or weightings are chosen after inspecting the data or task labels. This detail is load-bearing for claim (iv): post-hoc selection would turn the reported competitiveness into evidence for a tuned structural featurizer rather than a general diagnostic, directly weakening the task-agnostic framing emphasized in the abstract and introduction.
Authors: The set of invariants is fixed and pre-specified prior to any dataset inspection or label access. It consists of a uniform collection of permutation-invariant descriptors (degree statistics, clustering coefficients, spectral features from the normalized Laplacian, and subgraph counts up to size 4) drawn from standard graph theory and applied identically to all 26 datasets. No subset selection, re-weighting, or task-dependent filtering occurs. We have revised the Methods section to include an explicit statement of this fixed protocol together with the precise mathematical definitions and the code-level implementation details that enforce uniformity. revision: yes
-
Referee: [Results section reporting finding (iv)] Finding (iv) on competitiveness with transformer and message-passing baselines lacks reported details on statistical controls, run-to-run variance, hyperparameter matching, or feature-only ablations. Without these, it is unclear whether the invariant models' performance reflects genuine structural signal or dataset-specific correlations that favor non-trainable descriptors; this directly affects the claim that 'a non-trainable structural proxy often matches trained message-passing models.'
Authors: All reported numbers are means and standard deviations over five independent random seeds with fixed train/validation/test splits. Hyperparameters for the transformer and message-passing baselines were selected via the same grid-search protocol on the validation set that was used for the invariant models; the search spaces are documented in the appendix. Feature-only ablations (node features without any structural invariants) are already present in Table 4 and the supplementary material. We have added a dedicated paragraph in the Results section that consolidates these controls and includes a new table summarizing the hyperparameter ranges and seed-wise variance. revision: partial
-
Referee: [Section presenting finding (i)] The expressivity comparison in finding (i) requires a precise operational definition (e.g., distinguishing power on specific graph isomorphism classes or WL hierarchy level) and explicit listing of the GNN architectures and depths used as baselines. Vague statements that 'invariants are more expressive' risk overstating the result if the comparison omits modern expressive GNN variants or uses only basic MPNNs.
Authors: Expressivity is operationalized as the ability to produce distinct embeddings for non-isomorphic graphs drawn from the 1-WL and 2-WL equivalence classes, measured by the fraction of distinguishable pairs on a controlled set of 500 synthetic graphs. The baselines are explicitly GCN, GraphSAGE, GAT, and GIN, each run at depths 2, 4, and 6 with standard sum/mean/max aggregators. We have expanded the relevant subsection to state this definition, list the architectures and depths, and include a direct comparison against a 3-WL expressive model (PPGN) to avoid any ambiguity. revision: yes
Circularity Check
No significant circularity; empirical claims rest on direct dataset evaluations without reduction to self-defined or fitted inputs.
full rationale
The paper's central results—invariants being competitive with baselines across 26 datasets, characterizing heterogeneity, and predicting performance—are presented as outcomes of explicit computations and comparisons on fixed benchmark data. No derivation chain reduces a claimed prediction to a fitted parameter or self-citation by construction; invariants are described as a fixed, task-agnostic set of structural descriptors. The work is self-contained against external benchmarks with no load-bearing self-citation or ansatz smuggling. This is the expected honest non-finding for an empirical diagnostic paper.
Axiom & Free-Parameter Ledger
axioms (1)
- standard math Graph invariants are permutation-invariant and task-agnostic structural descriptors.
Reference graph
Works this paper leans on
-
[1]
T. Akiba, S. Sano, T. Yanase, T. Ohta, and M. Koyama. Optuna: A next-generation hyperparameter optimization framework. InProceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pages 2623–2631, New York, NY , USA, 2019. Association for Computing Machinery. doi: 10.1145/3292500.3330701
-
[2]
Alain, S
M. Alain, S. Takao, B. Paige, and M. P. Deisenroth. Gaussian Processes on Cellular Complexes. InInternational Conference on Machine Learning (ICML), 2024
2024
-
[3]
Ballester, E
R. Ballester, E. Röell, D. B. Schmid, M. Alain, S. Escalera, C. Casacuberta, and B. Rieck. MANTRA: The Manifold Triangulations Assemblage. InInternational Conference on Learning Representations, 2025. 9
2025
-
[4]
Bechler-Speicher, B
M. Bechler-Speicher, B. Finkelshtein, F. Frasca, L. Müller, J. Tönshoff, A. Siraudin, V . Zaverkin, M. M. Bronstein, M. Niepert, B. Perozzi, M. Galkin, and C. Morris. Position: Graph learning will lose relevance due to poor benchmarks. In A. Singh, M. Fazel, D. Hsu, S. Lacoste- Julien, F. Berkenkamp, T. Maharaj, K. Wagstaff, and J. Zhu, editors,Proceeding...
2025
-
[5]
G. Bouritsas, F. Frasca, S. Zafeiriou, and M. M. Bronstein. Improving graph neural network expressivity via subgraph isomorphism counting.IEEE Transactions on Pattern Analysis and Machine Intelligence, 45(1):657–668, 2023. doi: 10.1109/TPAMI.2022.3154319
-
[6]
Brockschmidt
M. Brockschmidt. GNN-FiLM: Graph neural networks with feature-wise linear modulation. In H. Daumé III and A. Singh, editors,Proceedings of the 37th International Conference on Machine Learning, volume 119 ofProceedings of Machine Learning Research, pages 1144–1152. PMLR, 13–18 Jul 2020
2020
-
[7]
Castellana, F
D. Castellana, F. Errica, D. Bacciu, and A. Micheli. The infinite contextual graph Markov model. In K. Chaudhuri, S. Jegelka, L. Song, C. Szepesvari, G. Niu, and S. Sabato, editors,Proceedings of the 39th International Conference on Machine Learning, volume 162 ofProceedings of Machine Learning Research, pages 2721–2737. PMLR, 17–23 Jul 2022
2022
-
[8]
T. Chen and C. Guestrin. XGBoost: A scalable tree boosting system. InProceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 785–794, New York, NY , USA, 2016. Association for Computing Machinery. doi: 10.1145/2939672.2939785
-
[9]
G. Corso, H. Stark, S. Jegelka, T. Jaakkola, and R. Barzilay. Graph neural networks.Nature Reviews Methods Primers, 4(1):17, 2024. doi: 10.1038/s43586-024-00294-7
-
[10]
Coupette, J
C. Coupette, J. Wayland, E. Simons, and B. Rieck. No metric to rule them all: Toward principled evaluations of graph-learning datasets. In A. Singh, M. Fazel, D. Hsu, S. Lacoste- Julien, F. Berkenkamp, T. Maharaj, K. Wagstaff, and J. Zhu, editors,Proceedings of the 42nd International Conference on Machine Learning, volume 267 ofProceedings of Machine Lear...
2025
- [11]
-
[12]
J. Duda. Simple inexpensive vertex and edge invariants distinguishing dataset strongly regular graphs, 2024
2024
- [13]
-
[14]
V . P. Dwivedi, L. Rampášek, M. Galkin, A. Parviz, G. Wolf, A. T. Luu, and D. Beaini. Long range graph benchmark. InThirty-sixth Conference on Neural Information Processing Systems Datasets and Benchmarks Track, 2022
2022
-
[15]
V . P. Dwivedi, C. K. Joshi, A. T. Luu, T. Laurent, Y . Bengio, and X. Bresson. Benchmarking graph neural networks.Journal of Machine Learning Research, 24(43):1–48, 2023
2023
-
[16]
Errica, M
F. Errica, M. Podda, D. Bacciu, and A. Micheli. A fair comparison of graph neural networks for graph classification. InInternational Conference on Learning Representations, 2020. URL https://openreview.net/forum?id=HygDF6NFPB
2020
-
[17]
Fast Graph Representation Learning with PyTorch Geometric
M. Fey and J. E. Lenssen. Fast graph representation learning with Pytorch Geometric.CoRR, abs/1903.02428, 2019. URLhttp://arxiv.org/abs/1903.02428
work page internal anchor Pith review arXiv 1903
-
[18]
Frasca, F
F. Frasca, F. Jogl, M. Eliasof, M. Ostrovsky, C.-B. Schönlieb, T. Gärtner, and H. Maron. Towards foundation models on graphs: An analysis on cross-dataset transfer of pretrained GNNs. In NeurIPS 2024 Workshop on Symmetry and Geometry in Neural Representations, 2025. 10
2024
-
[19]
W. Hu, M. Fey, M. Zitnik, Y . Dong, H. Ren, B. Liu, M. Catasta, and J. Leskovec. Open graph benchmark: Datasets for machine learning on graphs. In H. Larochelle, M. Ranzato, R. Hadsell, M. Balcan, and H. Lin, editors,Advances in Neural Information Processing Systems, volume 33, pages 22118–22133. Curran Associates, Inc., 2020
2020
-
[20]
W. Hu*, B. Liu*, J. Gomes, M. Zitnik, P. Liang, V . Pande, and J. Leskovec. Strategies for pre-training graph neural networks. InInternational Conference on Learning Representations,
-
[21]
URLhttps://openreview.net/forum?id=HJlWWJSFDH
-
[22]
E. Jin, M. M. Bronstein, I. I. Ceylan, and M. Lanzinger. Homomorphism counts for graph neural networks: All about that basis. In R. Salakhutdinov, Z. Kolter, K. Heller, A. Weller, N. Oliver, J. Scarlett, and F. Berkenkamp, editors,Proceedings of the 41st International Conference on Machine Learning, volume 235 ofProceedings of Machine Learning Research, p...
2024
-
[23]
T. N. Kipf and M. Welling. Semi-supervised classification with graph convolutional networks. InInternational Conference on Learning Representations, 2016
2016
- [24]
-
[25]
T. Leinster. The magnitude of a graph.Mathematical Proceedings of the Cambridge Philosophical Society, 166(2):247–264, 2017. doi: 10.1017/s0305004117000810
-
[26]
Y .-L. Liao, B. M. Wood, A. Das, and T. Smidt. Equiformerv2: Improved equivariant transformer for scaling to higher-degree representations. InThe Twelfth International Conference on Learning Representations, 2024
2024
-
[27]
H. Liu, J. Feng, L. Kong, N. Liang, D. Tao, Y . Chen, and M. Zhang. One for all: Towards training one graph model for all classification tasks. InThe Twelfth International Conference on Learning Representations, 2024. URL https://openreview.net/forum?id=4IT2pgc9v6
2024
-
[28]
J. Liu, C. Yang, Z. Lu, J. Chen, Y . Li, M. Zhang, T. Bai, Y . Fang, L. Sun, P. S. Yu, and C. Shi. Graph foundation models: Concepts, opportunities and challenges.IEEE Transactions on Pattern Analysis and Machine Intelligence, 47(6):5023–5044, 2025. doi: 10.1109/TPAMI.2025. 3548729
-
[29]
H. Mao, Z. Chen, W. Tang, J. Zhao, Y . Ma, T. Zhao, N. Shah, M. Galkin, and J. Tang. Position: Graph foundation models are already here. In R. Salakhutdinov, Z. Kolter, K. Heller, A. Weller, N. Oliver, J. Scarlett, and F. Berkenkamp, editors,Proceedings of the 41st International Conference on Machine Learning, volume 235 ofProceedings of Machine Learning ...
2024
-
[30]
Maron, H
H. Maron, H. Ben-Hamu, H. Serviansky, and Y . Lipman. Provably powerful graph networks. In H. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alché-Buc, E. Fox, and R. Garnett, editors, Advances in Neural Information Processing Systems, volume 32. Curran Associates, Inc., 2019
2019
-
[31]
Mordred: a molecular descriptor calculator.Journal of Cheminformatics, 10(1):4, 2018
H. Moriwaki, Y . Tian, N. Kawashita, and T. Takagi. Mordred: A molecular descriptor calculator. Journal of Cheminformatics, 10, 02 2018. doi: 10.1186/s13321-018-0258-y
-
[32]
Morris, N
C. Morris, N. M. Kriege, F. Bause, K. Kersting, P. Mutzel, and M. Neumann. Tudataset: A collection of benchmark datasets for learning with graphs. InICML 2020 Workshop on Graph Representation Learning and Beyond (GRL+ 2020), 2020. URLwww.graphlearning.io
2020
-
[33]
Morris, Y
C. Morris, Y . Lipman, H. Maron, B. Rieck, N. M. Kriege, M. Grohe, M. Fey, and K. Borgwardt. Weisfeiler and Leman go machine learning: The story so far.Journal of Machine Learning Research, 24(333):1–59, 2023
2023
-
[34]
J. Palowitch, A. Tsitsulin, B. Mayer, and B. Perozzi. GraphWorld: Fake graphs bring real insights for GNNs. InProceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pages 3691–3701, New York, NY , USA, 2022. Association for Computing Machinery. doi: 10.1145/3534678.3539203. 11
-
[35]
P. A. Papp and R. Wattenhofer. A theoretical comparison of graph neural network extensions. In K. Chaudhuri, S. Jegelka, L. Song, C. Szepesvari, G. Niu, and S. Sabato, editors,Proceedings of the 39th International Conference on Machine Learning, volume 162 ofProceedings of Machine Learning Research, pages 17323–17345. PMLR, 2022
2022
-
[36]
J. Qiu, Q. Chen, Y . Dong, J. Zhang, H. Yang, M. Ding, K. Wang, and J. Tang. GCC: Graph contrastive coding for graph neural network pre-training. InProceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, page 1150–1160, New York, NY , USA, 2020. Association for Computing Machinery. doi: 10.1145/3394486. 3403168
-
[37]
Rampášek, M
L. Rampášek, M. Galkin, V . P. Dwivedi, A. T. Luu, G. Wolf, and D. Beaini. Recipe for a general, powerful, scalable graph transformer. In S. Koyejo, S. Mohamed, A. Agarwal, D. Belgrave, K. Cho, and A. Oh, editors,Advances in Neural Information Processing Systems, volume 35, pages 14501–14515. Curran Associates, Inc., 2022
2022
-
[38]
D. H. Rouvray.Chapter 2 - The Rich Legacy of Half a Century of the Wiener Index. Woodhead Publishing, 2002
2002
-
[39]
Stoll, C
T. Stoll, C. Qian, B. Finkelshtein, A. Parviz, D. Weber, F. Frasca, H. Shavit, A. Siraudin, A. Mielke, M. Anastacio, E. Müller, M. Bechler-Speicher, M. Bronstein, M. Galkin, H. Hoos, M. Niepert, B. Perozzi, J. Tönshoff, and C. Morris. Graphbench: Next-generation graph learning benchmarking, 2026
2026
-
[40]
Tönshoff, M
J. Tönshoff, M. Ritzert, E. Rosenbluth, and M. Grohe. Where did the gap go? reassessing the long-range graph benchmark.Transactions on Machine Learning Research, 2024. URL https://openreview.net/forum?id=Nm0WX86sKv
2024
-
[41]
Veliˇckovi´c, G
P. Veliˇckovi´c, G. Cucurull, A. Casanova, A. Romero, P. Liò, and Y . Bengio. Graph attention networks. InInternational Conference on Learning Representations, 2018
2018
-
[42]
P. Veliˇckovi´c. Everything is connected: Graph neural networks.Current Opinion in Structural Biology, 79:102538, 2023. doi: 10.1016/j.sbi.2023.102538
-
[43]
Wang and M
Y . Wang and M. Zhang. An empirical study of realized GNN expressiveness. In R. Salakhutdinov, Z. Kolter, K. Heller, A. Weller, N. Oliver, J. Scarlett, and F. Berkenkamp, editors,Proceedings of the 41st International Conference on Machine Learning, volume 235 of Proceedings of Machine Learning Research, pages 52134–52155. PMLR, 2024
2024
-
[44]
H. Wiener. Structural determination of paraffin boiling points.Journal of the American Chemical Society, 69(1):17–20, 1947
1947
-
[45]
Z. Wu, B. Ramsundar, E. N. Feinberg, J. Gomes, C. Geniesse, A. S. Pappu, K. Leswing, and V . Pande. MoleculeNet: a benchmark for molecular machine learning.Chemical Science, 9(2): 513–530, 2017. doi: 10.1039/c7sc02664a
-
[46]
K. Xu, W. Hu, J. Leskovec, and S. Jegelka. How powerful are graph neural networks? In International Conference on Learning Representations, 2019. URL https://openreview. net/forum?id=ryGs6iA5Km
2019
-
[47]
T. Yu, S. Kumar, A. Gupta, S. Levine, K. Hausman, and C. Finn. Gradient surgery for multi-task learning. InProceedings of the 34th International Conference on Neural Information Processing Systems, Red Hook, NY , USA, 2020. Curran Associates Inc
2020
-
[48]
G. Zhou, Z. Gao, Q. Ding, H. Zheng, H. Xu, Z. Wei, L. Zhang, and G. Ke. Uni-Mol: A universal 3d molecular representation learning framework. InThe Eleventh International Conference on Learning Representations, 2023. URL https://openreview.net/forum? id=6K2RM6wVqKu. 12 Appendix (Supplementary Materials) A Invariants and methods 13 A.1 Invariants . . . . . ...
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.