Hypergraphx-data: a repository for higher-order network data
Pith reviewed 2026-05-20 00:03 UTC · model grok-4.3
The pith
A new repository supplies real-world hypergraph datasets to support analysis of group interactions beyond pairs.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The authors assembled hypergraphx-data, a repository containing real-world hypergraph datasets from multiple domains that support weighted, directed, temporal, and multiplex configurations. Each dataset is provided in an open JSON format and a binarized format compatible with the Hypergraphx library, accompanied by relational information and metadata. The site offers a user-friendly interface for browsing and filtering plus hash-based verification and versioning to maintain integrity.
What carries the argument
The hypergraphx-data repository, which organizes and distributes hypergraph datasets with standardized formats, metadata, and an interface for access and verification.
If this is right
- Social scientists can examine group-level patterns in collaboration or communication data that pairwise networks obscure.
- Biologists gain ready access to datasets describing multi-species or multi-molecule interactions.
- Finance researchers can test models of higher-order dependencies in transaction or portfolio data.
- Algorithm developers obtain standardized benchmarks for hypergraph-specific methods.
- Reproducibility improves because each dataset carries versioning and integrity checks.
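The integrity check mentioned in the last point can be sketched as follows. The source says only that verification is hash-based, so the use of SHA-256 here, the function names, and the idea that an expected digest is published alongside each file are all assumptions made for illustration.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large dataset files never sit fully in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_dataset(path, expected_hex):
    """Compare a local file against a published digest (assumed SHA-256).

    Whitespace and letter case in the expected digest are ignored.
    """
    return sha256_of_file(path) == expected_hex.strip().lower()
```

A downloaded file would pass only if `verify_dataset(local_path, published_digest)` returns `True`; a mismatch suggests a corrupted download or a different dataset version.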
Where Pith is reading between the lines
- Widespread use of the repository could prompt the creation of new machine-learning pipelines that operate directly on hypergraphs rather than their pairwise reductions.
- The collection might serve as a testbed for comparing how well different higher-order models recover known structures in empirical systems.
- Researchers in adjacent fields such as epidemiology or ecology could adapt the datasets to study group contagion or multi-species dynamics.
Load-bearing premise
That real-world systems frequently involve simultaneous interactions among more than two entities and that making such data widely available will enable new empirical analyses.
What would settle it
Whether independent researchers can download the datasets, project them to pairwise graphs, and demonstrate that higher-order metrics or models produce measurably different results on at least one dataset from the collection.
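A minimal sketch of that test, assuming a dataset has already been loaded as a plain list of node tuples. It does not use the Hypergraphx API; the helper names and the toy data are hypothetical.

```python
from itertools import combinations


def clique_projection(hyperedges):
    """Project a hypergraph onto a pairwise graph: every pair of nodes
    that shares at least one hyperedge becomes an undirected edge."""
    edges = set()
    for hyperedge in hyperedges:
        for u, v in combinations(sorted(set(hyperedge)), 2):
            edges.add((u, v))
    return edges


def hyperdegree(hyperedges):
    """Number of hyperedges each node belongs to."""
    counts = {}
    for hyperedge in hyperedges:
        for node in set(hyperedge):
            counts[node] = counts.get(node, 0) + 1
    return counts


def graph_degree(edges):
    """Number of distinct neighbours of each node in the projected graph."""
    counts = {}
    for u, v in edges:
        counts[u] = counts.get(u, 0) + 1
        counts[v] = counts.get(v, 0) + 1
    return counts


# Toy data: two different hypergraphs that share one projection.
one_triple = [("a", "b", "c")]
three_pairs = [("a", "b"), ("b", "c"), ("a", "c")]
```

Both toy hypergraphs project to the same triangle, so `graph_degree` cannot tell them apart, while `hyperdegree` gives 1 per node for the single triple and 2 per node for the three pairs. A measurable gap of this kind on a real dataset is what the test above asks for.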
Original abstract
The availability of network datasets advances research in network science, machine learning and related fields by enabling empirical analyses and their reproducibility, algorithm development, model validation and benchmarking. Existing repositories, such as SNAP and Netzschleuder, have made traditional network datasets widely accessible with metadata, metrics, and basic visualizations. However, they primarily focus on pairwise interactions, limiting data access to systems with many-body interactions. To address this gap, we created hypergraphx-data, a repository of real-world hypergraph datasets for higher-order network analysis, spanning different domains from social networks to biology and finance, and supporting configurations such as weighted, directed, temporal, and multiplex hypergraphs. Each dataset includes relational information and metadata, provided in an open JSON format and a binarized format for Hypergraphx. We provide a user-friendly interface to facilitate browsing, filtering, and accessing the datasets, while also ensuring integrity and reproducibility through hash-based verification and data versioning. The repository is available at https://hgx-team.github.io/hypergraphx-data
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript describes the creation of the hypergraphx-data repository to address the lack of accessible real-world hypergraph datasets for higher-order network analysis. It notes that existing resources such as SNAP and Netzschleuder focus primarily on pairwise interactions and presents hypergraphx-data as filling this gap with datasets spanning social networks, biology, and finance. The repository supports weighted, directed, temporal, and multiplex hypergraphs, provides data in open JSON format along with a binarized format compatible with Hypergraphx, includes metadata, and incorporates hash-based verification, versioning, and a user-friendly interface for browsing and access. The repository is hosted at https://hgx-team.github.io/hypergraphx-data.
Significance. A well-curated and maintained repository of this type would meaningfully advance empirical work in higher-order networks by supplying reproducible data for algorithm development, model validation, and cross-domain benchmarking. The emphasis on multiple formats, verification mechanisms, and accessibility features directly supports the stated goals of reproducibility and usability; these practical elements constitute a clear strength of the contribution.
major comments (1)
- [Datasets section] The manuscript would be strengthened by the inclusion of a summary table (likely in the Datasets or Repository Description section) listing each dataset together with basic statistics (number of nodes, hyperedges, and supported attributes such as weighted/temporal) and source references; without this, readers cannot readily assess the breadth and balance of the collection.
minor comments (2)
- [Abstract] The abstract would benefit from a brief quantitative statement (e.g., total number of datasets or illustrative examples) to convey the repository's scale more concretely.
- [Repository features] Clarify in the text whether every listed dataset supports all mentioned configurations or whether support varies by dataset; this would avoid potential ambiguity for users.
Simulated Author's Rebuttal
We thank the referee for their constructive review and positive recommendation for minor revision. We have carefully considered the single major comment and will incorporate the suggested improvement to strengthen the manuscript.
Point-by-point responses
- Referee: [Datasets section] The manuscript would be strengthened by the inclusion of a summary table (likely in the Datasets or Repository Description section) listing each dataset together with basic statistics (number of nodes, hyperedges, and supported attributes such as weighted/temporal) and source references; without this, readers cannot readily assess the breadth and balance of the collection.
  Authors: We agree that a summary table would improve the manuscript by enabling readers to quickly gauge the scale, diversity, and attributes of the included hypergraphs. In the revised version, we will add a table in the Datasets section (or Repository Description) that reports, for each dataset, the number of nodes, number of hyperedges, supported attributes (weighted, directed, temporal, multiplex), and the original source references. This addition directly addresses the concern about assessing breadth and balance.
  Revision: yes
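One row of such a summary table could be computed roughly as follows. The record layout (a dict with a `nodes` list and optional `weight` and `time` keys) is a hypothetical stand-in, not the repository's actual JSON schema, and `summarize` is an illustrative name.

```python
def summarize(name, records):
    """Build one summary-table row from a list of hyperedge records.

    Each record is assumed (hypothetically) to be a dict with a "nodes"
    list and optional "weight" and "time" keys.
    """
    nodes = set()
    sizes = []
    for record in records:
        members = set(record["nodes"])
        nodes.update(members)
        sizes.append(len(members))
    return {
        "dataset": name,
        "nodes": len(nodes),
        "hyperedges": len(records),
        "max_size": max(sizes, default=0),
        # Attribute flags are inferred per dataset from the keys present.
        "weighted": any("weight" in r for r in records),
        "temporal": any("time" in r for r in records),
    }


toy = [
    {"nodes": [1, 2, 3], "weight": 2.0},
    {"nodes": [2, 4], "weight": 1.0},
    {"nodes": [1, 2, 3, 4, 5], "weight": 0.5},
]
row = summarize("toy", toy)
```

Reporting the attribute flags per dataset would also answer the minor comment about whether support for each configuration varies across the collection.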
Circularity Check
No significant circularity
full rationale
The paper describes the curation and public release of a hypergraph dataset repository as a factual contribution to address the gap in existing network data sources. No derivations, equations, predictions, fitted parameters, or first-principles results are claimed or present. The central assertion is verifiable externally by repository access and does not reduce to any self-definition, self-citation chain, or input renaming. The work is self-contained as a data curation effort with standard integrity checks.
Reference graph
Works this paper leans on
- [1]
- [2] J. Leskovec, A. Krevl, SNAP Datasets: Stanford large network dataset collection, http://snap.stanford.edu/data (2014)
- [3] R. A. Rossi, N. K. Ahmed, An interactive data repository with visual analytics. SIGKDD Explor. 17, 37–41 (2016)
- [4] J. Kunegis, Proc. Int. Conf. on World Wide Web Companion (2013), pp. 1343–1350
- [5] T. P. Peixoto, The Netzschleuder network catalogue and repository (2023)
- [6] F. Battiston, G. Cencetti, I. Iacopini, V. Latora, M. Lucas, A. Patania, J.-G. Young, G. Petri, Networks beyond pairwise interactions: structure and dynamics. Physics Reports 874, 1–92 (2020)
- [7] C. Berge, Graphs and hypergraphs (North-Holland Pub. Co., 1973)
- [8] F. Battiston, E. Amico, A. Barrat, G. Bianconi, G. Ferraz de Arruda, B. Franceschiello, I. Iacopini, S. Kéfi, V. Latora, Y. Moreno, et al., The physics of higher-order interactions in complex systems. Nature Physics 17, 1093–1098 (2021)
- [9] Q. F. Lotito, M. Contisciani, C. De Bacco, L. Di Gaetano, L. Gallo, A. Montresor, F. Musciotto, N. Ruggeri, F. Battiston, Hypergraphx: a library for higher-order network analysis. Journal of Complex Networks 11, cnad019 (2023)
- [10] N. W. Landry, M. Lucas, I. Iacopini, G. Petri, A. Schwarze, A. Patania, L. Torres, Xgi: A python package for higher-order interaction networks. Journal of Open Source Software 8, 5162 (2023)
- [11] A. Antelmi, D. De Vinco, C. Spagnuolo, International Workshop on Algorithms and Models for the Web-Graph (Springer, 2024), pp. 159–173
- [12] J. Stehlé, N. Voirin, A. Barrat, C. Cattuto, L. Isella, J.-F. Pinton, M. Quaggiotto, W. Van den Broeck, C. Régis, B. Lina, et al., High-resolution measurements of face-to-face contact patterns in a primary school. PloS one 6, e23176 (2011)
- [13] P. Vanhems, A. Barrat, C. Cattuto, J.-F. Pinton, N. Khanafer, C. Régis, B.-a. Kim, B. Comte, N. Voirin, Estimating potential infection transmission routes in hospital wards using wearable proximity sensors. PloS one 8, e73970 (2013)
- [14] R. Mastrandrea, J. Fournet, A. Barrat, Contact patterns in a high school: a comparison between data collected using wearable sensors, contact diaries and friendship surveys. PloS one 10, e0136497 (2015)
- [15] M. Génois, C. L. Vestergaard, J. Fournet, A. Panisson, I. Bonmarin, A. Barrat, Data on face-to-face contacts in an office building suggest a low-cost vaccination strategy based on community linkers. Network Science 3, 326–347 (2015)
- [16] M. Génois, A. Barrat, Can co-location be used as a proxy for face-to-face contacts? EPJ Data Science 7, 11 (2018)
- [17]
- [18] A. R. Benson, R. Abebe, M. T. Schaub, A. Jadbabaie, J. Kleinberg, Simplicial closure and higher-order link prediction. Proceedings of the National Academy of Sciences 115, E11221–E11230 (2018)
- [19] https://journals.aps.org/datasets (2021)
- [20] L. Epstein, T. G. Walker, N. S. S. Hendrickson, J. Roberts, The U.S. Supreme Court Justices Database (2019)
- [21] J. Leskovec, J. Kleinberg, C. Faloutsos, Graph evolution: Densification and shrinking diameters. ACM Transactions on Knowledge Discovery from Data 1 (2007)
- [22] H. Yin, A. R. Benson, J. Leskovec, D. F. Gleich, Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (ACM Press, 2017)
- [23] V. Gelardi, J. Godard, D. Paleressompoulle, N. Claidiere, A. Barrat, Measuring social networks in primates: Wearable sensors versus direct observations. Proceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences 476, 20190737 (2020)
- [24] A. Bauer-Mehren, M. Bundschus, M. Rautschka, M. A. Mayer, F. Sanz, L. I. Furlong, Gene-disease network analysis reveals functional modules in mendelian, complex and environmental diseases. PLOS ONE 6, 1–13 (2011)
- [25] Experimental setup: Ubuntu 24.04.3 LTS, x86_64, 8 CPU cores, ∼94 GiB RAM, Python 3.10; experiments were run single-threaded. We tested the entire dataset catalog, averaging loading times for ten runs. Storage sizes refer to uncompressed files
- [26]
- [27] B. Praggastis, S. Aksoy, D. Arendt, M. Bonicillo, C. Joslyn, E. Purvine, M. Shapiro, J. Y. Yun, HyperNetX: A Python package for modeling complex network data as hypergraphs. Journal of Open Source Software 9, 6016 (2024)
- [28] C. Spagnuolo, G. Cordasco, P. Szufel, P. Prałat, V. Scarano, B. Kamiński, A. Antelmi, Analyzing, Exploring, and Visualizing Complex Networks via Hypergraphs using SimpleHypergraphs.jl. Internet Mathematics 1 (2020)
- [29] Sociopatterns: Measuring and analyzing social interaction patterns
- [30] G. Cencetti, F. Battiston, B. Lepri, M. Karsai, Temporal properties of higher-order interactions in social networks. Scientific Reports 11, 1–10 (2021)
- [31] Q. F. Lotito, A. Vendramini, A. Montresor, F. Battiston, The microscale organization of directed hypergraphs. Communications Physics (2026)
- [32] Q. F. Lotito, A. Montresor, F. Battiston, Multiplex measures for higher-order networks. Applied Network Science 9, 55 (2024)
- [33] Q. F. Lotito, F. Musciotto, F. Battiston, A. Montresor, Exact and sampling methods for mining higher-order motifs in large hypergraphs. Computing 106, 475–494 (2024)
- [34] M. Contisciani, F. Battiston, C. De Bacco, Inference of hyperedges and overlapping communities in hypergraphs. Nature Communications 13, 7229 (2022)
- [35] N. Ruggeri, M. Contisciani, F. Battiston, C. De Bacco, Community detection in large hypergraphs. Science Advances 9, eadg9159 (2023)
- [36] A. Kirkley, H. Felippe, F. Battiston, Structural reducibility of hypergraphs. Physical Review Letters 135, 247401 (2025)
- [37] F. Musciotto, F. Battiston, R. N. Mantegna, Detecting informative higher-order interactions in statistically validated hypergraphs. Communications Physics 4, 1–9 (2021)
- [38] S. Genetti, E. Ribaga, E. Cunegatti, Q. F. Lotito, G. Iacca, International Conference on Parallel Problem Solving from Nature (Springer, 2024), pp. 217–235
- [39] B. L. Nortier, S. Dobson, F. Battiston, Higher-order shortest paths in hypergraphs. Physical Review E 112, 054302 (2025)