TUDataset: A collection of benchmark datasets for learning with graphs
Pith reviewed 2026-05-25 07:49 UTC · model grok-4.3
The pith
The TUDataset supplies over 120 benchmark datasets for graph classification and regression together with Python data loaders, kernel and graph neural network baselines, and evaluation tools.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We introduce the TUDataset for graph classification and regression. The collection consists of over 120 datasets of varying sizes from a wide range of applications. We provide Python-based data loaders, kernel and graph neural network baseline implementations, and evaluation tools. Here, we give an overview of the datasets, standardized evaluation procedures, and provide baseline experiments.
What carries the argument
The TUDataset collection, which aggregates more than 120 benchmark datasets for graph tasks and supplies loaders plus baseline code for kernels and graph neural networks.
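To make the role of the data loaders concrete, here is a minimal sketch of how a dataset in the collection's plain-text layout might be read. It assumes the commonly distributed file layout (`DS_A.txt` with one comma-separated edge per line, `DS_graph_indicator.txt` with one graph id per node, `DS_graph_labels.txt` with one label per graph) and integer class labels. The function name `parse_tu_format` and the toy data are hypothetical; this is not the loader shipped with the collection.

```python
def parse_tu_format(adjacency_text, graph_indicator_text, graph_labels_text):
    """Group nodes and edges by graph id from TU-style text contents.

    Assumed layout (node and graph ids are 1-indexed):
      adjacency_text       : one "u, v" pair per line (as in DS_A.txt)
      graph_indicator_text : line i holds the graph id of node i
      graph_labels_text    : line g holds the class label of graph g
    """
    # Map each node id to the graph it belongs to.
    node_to_graph = {
        node_id: int(value)
        for node_id, value in enumerate(graph_indicator_text.split(), start=1)
    }
    # One record per graph, keyed by graph id.
    graphs = {
        graph_id: {"nodes": [], "edges": [], "label": int(value)}
        for graph_id, value in enumerate(graph_labels_text.split(), start=1)
    }
    for node_id, graph_id in node_to_graph.items():
        graphs[graph_id]["nodes"].append(node_id)
    # Assign every edge to the graph of its source node.
    for line in adjacency_text.strip().splitlines():
        u, v = (int(part) for part in line.split(","))
        graphs[node_to_graph[u]]["edges"].append((u, v))
    return graphs


# Toy data (hypothetical): a 2-node graph and a 3-node path graph.
adjacency = "1, 2\n2, 1\n3, 4\n4, 3\n4, 5\n5, 4\n"
indicator = "1\n1\n2\n2\n2\n"
graph_labels = "0\n1\n"

graphs = parse_tu_format(adjacency, indicator, graph_labels)
for graph_id, graph in graphs.items():
    print(graph_id, graph["label"], len(graph["nodes"]), len(graph["edges"]))
```

Reading from strings rather than file paths keeps the sketch runnable without downloading anything; in practice the three arguments would be the contents of the corresponding files. Regression datasets would need `float` rather than `int` labels.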
If this is right
- Standardized evaluation procedures enable direct comparisons of methods on the same graph classification tasks.
- Baseline kernel and graph neural network results serve as reference points for new approaches.
- Access to datasets from many application areas supports testing across domains.
- Reproducible code allows verification of reported performance numbers.
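As an illustration of what a standardized evaluation procedure can look like, the sketch below runs a generic stratified k-fold cross-validation and reports mean and standard deviation of accuracy. The choice of k = 10, the helper names `stratified_folds` and `cross_validate`, and the majority-class baseline are assumptions made for this example; the page does not spell out the paper's exact protocol.

```python
import random
from collections import Counter, defaultdict


def stratified_folds(labels, k=10, seed=0):
    """Split sample indices into k folds with roughly equal class ratios."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    folds = [[] for _ in range(k)]
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Deal the shuffled indices of each class round-robin over the folds.
        for position, index in enumerate(indices):
            folds[position % k].append(index)
    return folds


def cross_validate(labels, fit_predict, k=10, seed=0):
    """Return (mean, std) of test accuracy over k folds.

    fit_predict(train_idx, test_idx) must return one prediction per test index.
    Assumes every fold is non-empty, i.e. at least k samples.
    """
    folds = stratified_folds(labels, k, seed)
    accuracies = []
    for i, test_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        predictions = fit_predict(train_idx, test_idx)
        correct = sum(p == labels[t] for p, t in zip(predictions, test_idx))
        accuracies.append(correct / len(test_idx))
    mean = sum(accuracies) / k
    std = (sum((a - mean) ** 2 for a in accuracies) / k) ** 0.5
    return mean, std


# Hypothetical toy labels: 60 graphs of class 0 and 40 of class 1.
toy_labels = [0] * 60 + [1] * 40


def majority_baseline(train_idx, test_idx):
    """Predict the most frequent training label for every test sample."""
    majority = Counter(toy_labels[i] for i in train_idx).most_common(1)[0][0]
    return [majority] * len(test_idx)


mean_acc, std_acc = cross_validate(toy_labels, majority_baseline, k=10, seed=0)
print(f"accuracy: {mean_acc:.3f} +/- {std_acc:.3f}")
```

Fixing the fold assignment through a seed is what lets two methods be compared on identical splits; any kernel or graph neural network can be plugged in through the `fit_predict` callback.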
Where Pith is reading between the lines
- Widespread adoption might reduce the spread of incomparable results across different studies.
- The collection could become a starting point for creating additional standardized test suites in related graph tasks.
Load-bearing premise
The main obstacle to progress in graph learning is the lack of meaningful benchmark datasets and standardized evaluation procedures, so releasing this collection will reduce that obstacle.
What would settle it
Papers in the area continue to rely on non-overlapping datasets and differing evaluation protocols without adopting the TUDataset resources.
Original abstract
Recently, there has been an increasing interest in (supervised) learning with graph data, especially using graph neural networks. However, the development of meaningful benchmark datasets and standardized evaluation procedures is lagging, consequently hindering advancements in this area. To address this, we introduce the TUDataset for graph classification and regression. The collection consists of over 120 datasets of varying sizes from a wide range of applications. We provide Python-based data loaders, kernel and graph neural network baseline implementations, and evaluation tools. Here, we give an overview of the datasets, standardized evaluation procedures, and provide baseline experiments. All datasets are available at www.graphlearning.io. The experiments are fully reproducible from the code available at www.github.com/chrsmrrs/tudataset.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims to introduce the TUDataset collection for graph classification and regression, consisting of over 120 datasets from various applications. It provides Python-based data loaders, kernel and GNN baseline implementations, and evaluation tools. The abstract states that an overview of the datasets, standardized evaluation procedures, and baseline experiments are given, with all datasets available at www.graphlearning.io and experiments reproducible from code at the provided GitHub repository.
Significance. If the collection is comprehensive and the tools effective, this resource could help standardize benchmarks in graph learning, facilitating advancements by addressing the lack of meaningful benchmarks. The explicit commitment to reproducibility through available code is a strength that enhances the potential impact.
major comments (1)
- [Abstract] The abstract asserts the existence and availability of the collection and tools but supplies no details on dataset selection criteria, validation, or baseline performance numbers; the soundness of the central claim cannot be verified beyond the statement of availability.
Simulated Author's Rebuttal
We thank the referee for their review and address the major comment below.
Point-by-point responses
- Referee: [Abstract] The abstract asserts the existence and availability of the collection and tools but supplies no details on dataset selection criteria, validation, or baseline performance numbers; the soundness of the central claim cannot be verified beyond the statement of availability.
  Authors: Abstracts are intentionally concise and serve to summarize the paper's contributions at a high level. The manuscript body provides the overview of the datasets (including selection criteria and characteristics from various applications), standardized evaluation procedures, and baseline experiments with performance numbers. The central claim of introducing a reproducible collection is substantiated by the public availability of all datasets at www.graphlearning.io and the code at the GitHub repository, enabling direct verification and use by the community. We maintain that the abstract appropriately highlights these elements without requiring the level of detail suggested.
  Revision: no
Circularity Check
No significant circularity; resource announcement only
Full rationale
The paper is a dataset collection announcement containing no derivations, equations, predictions, fitted parameters, or load-bearing technical claims. The abstract describes introducing TUDataset with loaders and baselines but presents no analytic chain that could reduce to self-definition, fitted inputs, or self-citations. This is a standard resource paper whose claims are self-contained and externally verifiable by dataset availability.
Forward citations
Cited by 24 Pith papers
- GraphIP-Bench: How Hard Is It to Steal a Graph Neural Network, and Can We Stop It?
  GraphIP-Bench shows stealing GNNs is easy at moderate query budgets, most defenses fail to block or reliably trace extraction, and watermarks lose verification power on surrogates while heterophilic graphs are harder ...
- HSG-12M: A Large-Scale Benchmark of Spatial Multigraphs from the Energy Spectra of Non-Hermitian Crystals
  Authors release HSG-12M, a dataset of 16.7 million spatial multigraphs generated from non-Hermitian crystal energy spectra via the Poly2Graph pipeline, along with initial GNN benchmarks.
- Beyond Oversquashing: Understanding Signal Propagation in GNNs Via Observables
  Quantum-inspired observables reveal poor signal routing in standard spectral GNNs and motivate Schrödinger GNNs with superior propagation capacity.
- Higher-order Persistence Diagrams
  Higher-order persistence diagrams are defined recursively via interval containments, and their aggregations can be evaluated in nearly linear time using zeta transforms instead of explicit pair enumeration.
- CTQWformer: A CTQW-based Transformer for Graph Classification
  CTQWformer fuses continuous-time quantum walks into a graph transformer and recurrent module to outperform standard GNNs and graph kernels on classification benchmarks.
- Concept Graph Convolutions: Message Passing in the Concept Space
  Concept Graph Convolutions perform message passing on node concepts to increase interpretability of graph neural networks without losing task performance.
- R2G: A Multi-View Circuit Graph Benchmark Suite from RTL to GDSII
  R2G is a multi-view circuit graph benchmark showing that representation choice affects GNN accuracy more than model architecture, with node-centric views and deeper decoders performing best.
- Efficient and Accurate Graph Classification with Hyperdimensional Computing on FPGA
  HyperX is the first end-to-end FPGA accelerator for Nyström-based HDC graph classification, delivering 6.85× speedup and 169× energy efficiency over CPU baselines plus 3.4% average accuracy gain on TUDataset benchmarks.
- Graph Learning via Logic-Based Weisfeiler-Leman Variants and Tabularization
  Logic-based Weisfeiler-Leman variants enable graph-to-table conversion for classification that matches GNN and graph transformer accuracy while running 5-20x faster without GPUs.
- HSG-12M: A Large-Scale Benchmark of Spatial Multigraphs from the Energy Spectra of Non-Hermitian Crystals
  HSG-12M is a large dataset of spatial multigraphs derived from non-Hermitian crystal energy spectra via the Poly2Graph pipeline, positioned as the first large-scale benchmark of this graph type.
- A Benchmark Dataset for Graph Regression with Homogeneous and Multi-Relational Variants
  RelSC is a new graph regression benchmark from program graphs with execution time labels, released in homogeneous (RelSC-H) and multi-relational (RelSC-M) variants to study representation effects.
- Estimating Subgraph Importance with Structural Prior Domain Knowledge
  A label-free Group Lasso method estimates important subgraphs in pretrained GNNs by incorporating domain structural knowledge.
- Quantum Injection Pathways for Implicit Graph Neural Networks
  Independent quantum signal injection into graph DEQs yields higher test accuracy and fewer solver iterations than state-dependent or backbone-dependent injection and classical equilibrium models on NCI1, PROTEINS, and...
- GraphNetz: Statistical Benchmarking of Graph Neural Networks with Paired Tests and Rank Aggregation
  GraphNetz supplies an automated statistical pipeline for GNN benchmarking that includes per-cell confidence intervals, paired tests with multiple-comparison correction, and critical-difference diagrams across tasks an...
- Subgraph Concept Networks: Concept Levels in Graph Classification
  Subgraph Concept Network is a new GNN architecture that distills meaningful concepts at node, subgraph, and graph levels via soft clustering to improve explainability while maintaining competitive accuracy.
- Learning from Historical Activations in Graph Neural Networks
  HISTOGRAPH applies unified layer-wise attention followed by node-wise attention over historical GNN activations to improve graph classification, especially in deep models.
- Adaptive Canonicalization with Application to Invariant Anisotropic Geometric Networks
  Adaptive canonicalization selects input canonical forms by maximizing network predictive confidence to yield continuous symmetry-preserving models with universal approximation for equivariant geometric networks.
- How Embeddings Shape Graph Neural Networks: Classical vs Quantum-Oriented Node Representations
  Quantum-oriented embeddings deliver consistent gains on structure-driven graph datasets while classical baselines perform adequately on attribute-limited social graphs, under identical training pipelines across five T...
- GP2F: Cross-Domain Graph Prompting with Adaptive Fusion of Pre-trained Graph Neural Networks
  GP2F is a dual-branch graph prompting framework that fuses frozen pre-trained knowledge with task-specific adaptation to reduce estimation error and outperform baselines in cross-domain few-shot node and graph classification.
- OpenGLT: A Comprehensive Benchmark of Graph Neural Networks for Graph-Level Tasks
  OpenGLT benchmark finds no single GNN architecture dominates graph-level tasks, with subgraph-based models strongest in expressiveness, graph learning and SSL models in robustness, node and pooling models in efficienc...
- Position: Graph Condensation Needs a Reset -- Move Beyond Full-dataset Training and Model-Dependence
  The paper claims current graph condensation approaches are flawed due to full-dataset training requirements, high overhead, poor generalization, and misleading evaluation metrics, calling for a reset toward lightweigh...
- Fine-Grained Graph Generation through Latent Mixture Scheduling
  A novel CVAE with mixture scheduling achieves fine-grained structural control in graph generation, showing high quality and controllability on five datasets.
- Position: Graph Condensation Needs a Reset -- Move Beyond Full-dataset Training and Model-Dependence
  Graph condensation methods must move beyond full-dataset training and model dependence toward lightweight, architecture-agnostic designs to achieve practical efficiency in GNNs.
- Graph Rewiring in GNNs to Mitigate Over-Squashing and Over-Smoothing: A Survey
  A survey compiling graph rewiring techniques for mitigating over-squashing and over-smoothing in GNNs.