
arxiv: 2605.18166 · v1 · pith:GAYRS6YGnew · submitted 2026-05-18 · ⚛️ physics.soc-ph · cs.SI

Hypergraphx-data: a repository for higher-order network data

Pith reviewed 2026-05-20 00:03 UTC · model grok-4.3

classification ⚛️ physics.soc-ph cs.SI
keywords hypergraphs · higher-order networks · network datasets · data repository · social networks · biological networks · financial networks · multiplex hypergraphs

The pith

A new repository supplies real-world hypergraph datasets to support analysis of group interactions beyond pairs.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents hypergraphx-data as a dedicated collection of hypergraph datasets drawn from social, biological, and financial domains. Existing network repositories have centered on pairwise links, which leaves researchers without ready access to data on simultaneous multi-entity interactions. By supplying hypergraphs in open formats along with metadata, browsing tools, and verification features, the repository aims to make higher-order network studies more practical and reproducible. A sympathetic reader would see this as a practical step that could let empirical work catch up with theoretical interest in many-body systems.

Core claim

The authors assembled hypergraphx-data, a repository containing real-world hypergraph datasets from multiple domains that support weighted, directed, temporal, and multiplex configurations. Each dataset is provided in an open JSON format and a binarized format compatible with the Hypergraphx library, accompanied by relational information and metadata. The site offers a user-friendly interface for browsing and filtering plus hash-based verification and versioning to maintain integrity.
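
As a rough illustration of what hash-based verification involves, the sketch below computes a digest for a downloaded file's bytes and compares it with a published value. The source does not say which hash function the repository uses or how digests are published; SHA-256, the toy payload, and the function names `sha256_of_bytes` and `verify` are assumptions made only for this example.

```python
import hashlib
import json


def sha256_of_bytes(payload: bytes) -> str:
    """Return the hex SHA-256 digest of a byte string (SHA-256 is an assumption)."""
    return hashlib.sha256(payload).hexdigest()


def verify(payload: bytes, expected_hex: str) -> bool:
    """Check that the payload matches the digest published alongside it."""
    return sha256_of_bytes(payload) == expected_hex


# Hypothetical toy dataset standing in for a downloaded JSON file.
toy = {"hyperedges": [[1, 2, 3], [2, 4]]}
payload = json.dumps(toy, sort_keys=True).encode("utf-8")

# In practice the digest would come from the repository, not be computed locally.
published = sha256_of_bytes(payload)

ok = verify(payload, published)                  # untouched bytes pass
tampered_ok = verify(payload + b" ", published)  # any change to the bytes fails
```

A check of this kind lets a reader confirm that the file they analyse is byte-for-byte the version a study reports using.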

What carries the argument

The hypergraphx-data repository, which organizes and distributes hypergraph datasets with standardized formats, metadata, and an interface for access and verification.

If this is right

  • Social scientists can examine group-level patterns in collaboration or communication data that pairwise networks obscure.
  • Biologists gain ready access to datasets describing multi-species or multi-molecule interactions.
  • Finance researchers can test models of higher-order dependencies in transaction or portfolio data.
  • Algorithm developers obtain standardized benchmarks for hypergraph-specific methods.
  • Reproducibility improves because each dataset carries versioning and integrity checks.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Widespread use of the repository could prompt the creation of new machine-learning pipelines that operate directly on hypergraphs rather than their pairwise reductions.
  • The collection might serve as a testbed for comparing how well different higher-order models recover known structures in empirical systems.
  • Researchers in adjacent fields such as epidemiology or ecology could adapt the datasets to study group contagion or multi-species dynamics.

Load-bearing premise

That real-world systems frequently involve simultaneous interactions among more than two entities and that making such data widely available will enable new empirical analyses.

What would settle it

Whether independent researchers can download the datasets, project them to pairwise graphs, and demonstrate that higher-order metrics or models produce measurably different results on at least one dataset from the collection.
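
The projection step in this test can be sketched on toy data. The example below builds a clique projection (every hyperedge is replaced by all pairs of its members) and shows that two different hypergraphs can collapse to the same pairwise graph, which is the information loss the test is meant to expose. The function name `clique_projection` and the toy hyperedges are hypothetical and are not taken from the repository or the Hypergraphx library.

```python
from itertools import combinations


def clique_projection(hyperedges):
    """Return the set of unordered node pairs that co-occur in any hyperedge."""
    pairs = set()
    for edge in hyperedges:
        # Sort so each pair is stored once, as (smaller, larger).
        for u, v in combinations(sorted(set(edge)), 2):
            pairs.add((u, v))
    return pairs


# Two hypothetical systems on nodes 1, 2, 3:
triangle_of_pairs = [(1, 2), (2, 3), (1, 3)]  # three separate pairwise interactions
single_triple = [(1, 2, 3)]                   # one three-way group interaction

# Both collapse to the same pairwise graph...
same_projection = (
    clique_projection(triangle_of_pairs) == clique_projection(single_triple)
)

# ...while a higher-order quantity, the hyperedge size sequence, tells them apart.
sizes_a = sorted(len(e) for e in triangle_of_pairs)
sizes_b = sorted(len(e) for e in single_triple)
```

On real datasets the comparison would use richer metrics than hyperedge sizes, but the logic is the same: compute a quantity on the hypergraph and on its projection, then check whether the two disagree.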

Figures

Figures reproduced from arXiv: 2605.18166 by Alberto Montresor, Berné Nortier, Federico Battiston, Lorenzo Betti, Quintino Francesco Lotito.

Figure 1. Distribution of datasets by application domain.
Figure 2. Distribution of datasets by hypergraph type in [caption truncated in source]
Figure 3. Home page of the repository website, showcasing its interface and key features for navigating and accessing [caption truncated in source]

Original abstract

The availability of network datasets advances research in network science, machine learning and related fields by enabling empirical analyses and their reproducibility, algorithm development, model validation and benchmarking. Existing repositories, such as SNAP and Netzschleuder, have made traditional network datasets widely accessible with metadata, metrics, and basic visualizations. However, they primarily focus on pairwise interactions, limiting data access to systems with many-body interactions. To address this gap, we created hypergraphx-data, a repository of real-world hypergraph datasets for higher-order network analysis, spanning different domains from social networks to biology and finance, and supporting configurations such as weighted, directed, temporal, and multiplex hypergraphs. Each dataset includes relational information and metadata, provided in an open JSON format and a binarized format for Hypergraphx. We provide a user-friendly interface to facilitate browsing, filtering, and accessing the datasets, while also ensuring integrity and reproducibility through hash-based verification and data versioning. The repository is available at https://hgx-team.github.io/hypergraphx-data

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 2 minor

Summary. The manuscript describes the creation of the hypergraphx-data repository to address the lack of accessible real-world hypergraph datasets for higher-order network analysis. It notes that existing resources such as SNAP and Netzschleuder focus primarily on pairwise interactions and presents hypergraphx-data as filling this gap with datasets spanning social networks, biology, and finance. The repository supports weighted, directed, temporal, and multiplex hypergraphs, provides data in open JSON format along with a binarized format compatible with Hypergraphx, includes metadata, and incorporates hash-based verification, versioning, and a user-friendly interface for browsing and access. The repository is hosted at https://hgx-team.github.io/hypergraphx-data.

Significance. A well-curated and maintained repository of this type would meaningfully advance empirical work in higher-order networks by supplying reproducible data for algorithm development, model validation, and cross-domain benchmarking. The emphasis on multiple formats, verification mechanisms, and accessibility features directly supports the stated goals of reproducibility and usability; these practical elements constitute a clear strength of the contribution.

major comments (1)
  1. [Datasets section] The manuscript would be strengthened by the inclusion of a summary table (likely in the Datasets or Repository Description section) listing each dataset together with basic statistics (number of nodes, hyperedges, and supported attributes such as weighted/temporal) and source references; without this, readers cannot readily assess the breadth and balance of the collection.
minor comments (2)
  1. [Abstract] The abstract would benefit from a brief quantitative statement (e.g., total number of datasets or illustrative examples) to convey the repository's scale more concretely.
  2. [Repository features] Clarify in the text whether every listed dataset supports all mentioned configurations or whether support varies by dataset; this would avoid potential ambiguity for users.
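
The per-dataset statistics requested in the major comment are simple to derive from a list of hyperedges. The sketch below is one possible way to produce a table row; the function name `summarize`, the field names, and the toy hyperedges are assumptions for illustration and do not reflect the repository's actual schema or the Hypergraphx API.

```python
from collections import Counter


def summarize(hyperedges):
    """Compute basic size statistics for a hypergraph given as a list of node tuples."""
    nodes = set()
    for edge in hyperedges:
        nodes.update(edge)
    # Count hyperedges by number of distinct members.
    size_counts = Counter(len(set(edge)) for edge in hyperedges)
    return {
        "num_nodes": len(nodes),
        "num_hyperedges": len(hyperedges),
        "max_size": max(size_counts) if size_counts else 0,
        "size_distribution": dict(sorted(size_counts.items())),
    }


# Hypothetical toy hypergraph with four hyperedges over six nodes.
toy = [("a", "b", "c"), ("b", "d"), ("a", "d", "e", "f"), ("c", "e")]
row = summarize(toy)
```

Attributes such as weighted, directed, temporal, or multiplex would come from each dataset's metadata rather than from the hyperedge list itself.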

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their constructive review and positive recommendation for minor revision. We have carefully considered the single major comment and will incorporate the suggested improvement to strengthen the manuscript.

  1. Referee: [Datasets section] The manuscript would be strengthened by the inclusion of a summary table (likely in the Datasets or Repository Description section) listing each dataset together with basic statistics (number of nodes, hyperedges, and supported attributes such as weighted/temporal) and source references; without this, readers cannot readily assess the breadth and balance of the collection.

    Authors: We agree that a summary table would improve the manuscript by enabling readers to quickly gauge the scale, diversity, and attributes of the included hypergraphs. In the revised version, we will add a table in the Datasets section (or Repository Description) that reports, for each dataset, the number of nodes, number of hyperedges, supported attributes (weighted, directed, temporal, multiplex), and the original source references. This addition directly addresses the concern about assessing breadth and balance.

    Revision: yes

Circularity Check

0 steps flagged

No significant circularity


The paper describes the curation and public release of a hypergraph dataset repository as a factual contribution to address the gap in existing network data sources. No derivations, equations, predictions, fitted parameters, or first-principles results are claimed or present. The central assertion is verifiable externally by repository access and does not reduce to any self-definition, self-citation chain, or input renaming. The work is self-contained as a data curation effort with standard integrity checks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

No free parameters, axioms, or invented entities are introduced; the work consists of data collection and standard repository practices.

pith-pipeline@v0.9.0 · 5722 in / 1029 out tokens · 30331 ms · 2026-05-20T00:03:08.157781+00:00 · methodology

