arXiv: 2604.25597 · v1 · submitted 2026-04-28 · 💻 cs.SI · cs.DL · physics.soc-ph · stat.AP

Recognition: unknown

Generating Synthetic Citation Networks with Communities

Łukasz Brzozowski, Marek Gagolewski, Grzegorz Siudem


Pith reviewed 2026-05-07 14:13 UTC · model grok-4.3

Classification: 💻 cs.SI · cs.DL · physics.soc-ph · stat.AP

Keywords: citation networks · synthetic graph generation · community detection benchmarks · Price-Pareto model · directed acyclic graphs · network growth models · stochastic block models · mesoscopic structure


The pith

The Citation Seeder generates realistic synthetic citation networks using up to four orders of magnitude fewer parameters than leading alternatives.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper systematically compares twelve generators of directed, nearly acyclic graphs with planted community structure, testing them against seven real citation networks on twenty-six metrics. It shows that reversing edge directions in static models breaks cycles and improves their fit to citation flow, and that high-parameter models tend to overfit by memorizing specific community statistics. The authors introduce the Citation Seeder, an iterative process based on the Price-Pareto preferential attachment rule, which performs competitively with the strongest baselines while remaining interpretable and running in time linear in the size of the output graph.

Core claim

The Citation Seeder algorithm iteratively constructs directed graphs by adding nodes and directing edges according to a Price-Pareto citation process. When evaluated on seven real citation networks using twenty-six metrics, it produces results competitive with the best-performing baselines while using up to four orders of magnitude fewer parameters. The paper further establishes that reversing edges in static generators breaks cycles and induces realistic flow, and that exogenous mesoscopic similarities between generated and real networks matter more for realism than endogenous ones; high-parameter models fail because they memorize planted community statistics rather than capturing general …

What carries the argument

The Citation Seeder algorithm, an iterative generator that adds nodes and edges according to the Price-Pareto preferential attachment model with a small number of interpretable parameters and linear runtime.
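The growth rule can be illustrated with a minimal sketch. This is not the authors' Citation Seeder, whose exact Price-Pareto rule is not given on this page; it is a generic Price-style process with planted communities under stated assumptions. Each new node joins one of `n_comm` communities uniformly at random and makes up to `m` citations to earlier nodes. For each citation it targets its own community with probability `p_in` (otherwise a uniformly random non-empty community), then picks a target in proportion to in-degree with probability `rho` and uniformly otherwise. All names and parameters (`grow_citation_dag`, `rho`, `p_in`, `n_comm`) are hypothetical. Keeping one "urn" entry per received citation makes every draw O(1), so a run costs time roughly linear in the number of nodes and edges, the kind of O(N+E) behaviour the paper claims for CS.

```python
import random
from collections import Counter


def grow_citation_dag(n_nodes, m, rho, p_in, n_comm, seed=None):
    """Hypothetical Price-style growth with planted communities (not the authors' CS).

    Returns (labels, edges): labels[v] is node v's community and every edge
    (v, u) points from a newer citing node v to an older cited node u.
    """
    rng = random.Random(seed)
    members = [[] for _ in range(n_comm)]  # earlier nodes of each community (uniform draws)
    urn = [[] for _ in range(n_comm)]      # one entry per citation received (preferential draws)
    active = []                            # communities that already have a member
    labels, edges = [], []
    for v in range(n_nodes):
        c = rng.randrange(n_comm)          # planted community of the new node
        cited = set()
        for _ in range(m if active else 0):
            # Own community with probability p_in, else any non-empty community.
            tc = c if (members[c] and rng.random() < p_in) else rng.choice(active)
            if urn[tc] and rng.random() < rho:
                u = rng.choice(urn[tc])     # preferential: proportional to in-degree
            else:
                u = rng.choice(members[tc]) # uniform over the community's members
            if u not in cited:              # no duplicate citations from v
                cited.add(u)
                edges.append((v, u))
                urn[tc].append(u)
        if not members[c]:
            active.append(c)
        members[c].append(v)
        labels.append(c)
    return labels, edges


if __name__ == "__main__":
    labels, edges = grow_citation_dag(10_000, m=5, rho=0.7, p_in=0.8, n_comm=5, seed=42)
    indeg = Counter(u for _, u in edges)
    print(len(edges), indeg.most_common(3))
```

Mixing uniform and in-degree-proportional choice acts like the additive offset in Price's original model, so uncited nodes can still be cited. Because every edge points from a newer node to an older one, the output is acyclic by construction.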

If this is right

Reversing edge directions in static community generators produces more citation-like acyclic structures and improves the performance of the degree-corrected stochastic block model.
Exogenous mesoscopic similarities are more important than endogenous ones when judging how realistic a generated network is.
Models with many free parameters overfit by memorizing the statistics of planted communities and therefore fail to produce realistic overall network structure.
The Citation Seeder supplies an interpretable framework that can both explain observed citation growth and forecast future network evolution.
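The first consequence listed above depends on how reversal breaks cycles. A minimal sketch, assuming the cycle-breaking step orders the nodes with the Eades-Lin-Smyth greedy feedback-arc-set heuristic and then reverses every edge that points backwards in that order; this page does not say which ordering the authors actually use, and the function names are hypothetical. Once all edges agree with a single linear order, the graph is acyclic.

```python
from collections import defaultdict


def els_ordering(nodes, edges):
    """Eades-Lin-Smyth greedy ordering: few edges point backwards in it."""
    out_nb, in_nb = defaultdict(set), defaultdict(set)
    for u, v in edges:
        if u != v:
            out_nb[u].add(v)
            in_nb[v].add(u)
    remaining = set(nodes)
    head, tail = [], []  # head grows left to right; tail collects sinks from the right

    def drop(x):
        remaining.discard(x)
        for y in out_nb[x]:
            in_nb[y].discard(x)
        for y in in_nb[x]:
            out_nb[y].discard(x)

    while remaining:
        progress = True
        while progress:
            sinks = [x for x in remaining if not out_nb[x]]
            for x in sinks:
                tail.append(x)
                drop(x)
            sources = [x for x in remaining if not in_nb[x]]
            for x in sources:
                head.append(x)
                drop(x)
            progress = bool(sinks or sources)
        if remaining:
            # No sink or source left: place the node with the largest out - in degree first.
            x = max(remaining, key=lambda z: len(out_nb[z]) - len(in_nb[z]))
            head.append(x)
            drop(x)
    return head + tail[::-1]


def break_cycles_by_reversal(nodes, edges):
    """Reverse every edge pointing backwards in the ordering; the result is a DAG."""
    pos = {x: i for i, x in enumerate(els_ordering(nodes, edges))}
    oriented = ((u, v) if pos[u] < pos[v] else (v, u) for u, v in edges if u != v)
    return list(dict.fromkeys(oriented))  # drop parallel edges created by reversal


if __name__ == "__main__":
    toy = [(0, 1), (1, 2), (2, 0), (2, 3), (3, 4), (4, 2)]
    print(break_cycles_by_reversal(range(5), toy))
```

Mutual pairs (u, v) and (v, u) collapse into a single edge after reversal; whether the authors keep or drop such duplicates is not stated here.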

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The small number of interpretable parameters could be estimated from early citation data to forecast how a new paper or patent will accumulate citations over time.
The same iterative construction might be adapted to generate synthetic patent or software-dependency networks that also exhibit near-acyclic flow.
Improved low-parameter benchmarks could make community-detection methods more robust when tested on networks whose community structure evolves rather than remaining fixed.
The distinction between endogenous and exogenous similarities suggests that future generators should prioritize matching global growth statistics over exact reproduction of any single planted partition.

Load-bearing premise

That the chosen twenty-six metrics on seven real networks are enough to judge whether a synthetic citation network is realistic, and that the ground-truth communities are fixed external features rather than products of the network's own growth rules.

What would settle it

Fit the Citation Seeder parameters to the first half of a real citation network's history, generate forward trajectories, and check whether the predicted citation counts and community evolution match the actual second half of the same network.
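A crude version of this test can be sketched without fitting the Citation Seeder at all: treat node ids as arrival order, split at `t_split`, and ask whether the nodes most cited before the split are also the ones later nodes cite most, scored with the precision@K metric mentioned in the simulated rebuttal. The sketch uses early in-degree as a parameter-free preferential-attachment forecast in place of a fitted CS model; the function name and the (citing, cited) edge convention are assumptions.

```python
from collections import Counter


def precision_at_k_future_citations(edges, t_split, k):
    """Hypothetical hold-out check on a time-ordered citation list.

    edges: (citing, cited) pairs with integer node ids in arrival order.
    Early nodes are ranked by citations received from nodes with id < t_split
    and compared with the early nodes most cited by nodes with id >= t_split.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    early = Counter(u for v, u in edges if v < t_split)
    later = Counter(u for v, u in edges if v >= t_split and u < t_split)
    predicted = {u for u, _ in early.most_common(k)}
    actual = {u for u, _ in later.most_common(k)}
    return len(predicted & actual) / k


if __name__ == "__main__":
    toy = [(2, 0), (3, 0), (3, 1), (4, 0), (5, 0), (5, 1), (6, 2)]
    print(precision_at_k_future_citations(toy, t_split=4, k=1))  # 1.0
```

A fitted model would replace the in-degree ranking with forecasts averaged over simulated forward trajectories; the scoring step stays the same.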

Figures

Figures reproduced from arXiv: 2604.25597 by Grzegorz Siudem, Łukasz Brzozowski, Marek Gagolewski.

**Figure 1.** An example network generated by our algorithm with parameters estimated from the Cora citation …

**Figure 2.** CS-generated vs theoretical in-degree distributions (Eq. …

**Figure 3.** Global mean ranks with 95% bootstrap confidence intervals, obtained by block-resampling …

**Figure 4.** Pairwise comparison: number of CS wins / ties / losses against its six nearest competitors, …

**Figure 5.** Category-level performance scores for selected methods. Score …

**Figure 6.** Heatmap presenting impact of cycle breaking per category. Each cell represents a percent of …

**Figure 7.** Ablation: effect of back-edge injection on category scores. Positive differences (annotated) indicate …

**Figure 8.** Per-dataset performance scores for CS, CS (DAG), and DC-SBM-nD. Left: all 26 metrics; right: …

**Figure 9.** Performance-parsimony trade-off. Horizontal axis: geometric mean of parameter counts across …
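Figure 3 summarises the comparison as global mean ranks with 95% bootstrap intervals from block resampling; the caption is truncated, so the block unit is not stated. A minimal sketch, assuming each method has one error value per (dataset, metric) cell with lower being better, ranks are averaged over ties, and whole datasets are the resampled blocks. The function names and the error array are hypothetical.

```python
import numpy as np


def average_ranks(errors):
    """Rank methods within each (dataset, metric) cell.

    errors: array of shape (n_methods, n_datasets, n_metrics), lower is better.
    Returns ranks of the same shape: 1 = best, ties share the average rank.
    """
    n = errors.shape[0]
    flat = errors.reshape(n, -1)
    ranks = np.empty_like(flat, dtype=float)
    for j in range(flat.shape[1]):
        col = flat[:, j]
        for i in range(n):
            less = np.sum(col < col[i])
            equal = np.sum(col == col[i])  # includes method i itself
            ranks[i, j] = less + (equal + 1) / 2
    return ranks.reshape(errors.shape)


def mean_rank_ci(errors, n_boot=2000, alpha=0.05, seed=0):
    """Global mean rank per method with a percentile CI from resampling datasets."""
    rng = np.random.default_rng(seed)
    ranks = average_ranks(errors)
    point = ranks.mean(axis=(1, 2))
    n_data = errors.shape[1]
    boots = np.empty((n_boot, errors.shape[0]))
    for b in range(n_boot):
        idx = rng.integers(0, n_data, size=n_data)  # draw whole datasets (blocks) with replacement
        boots[b] = ranks[:, idx, :].mean(axis=(1, 2))
    lo, hi = np.quantile(boots, [alpha / 2, 1 - alpha / 2], axis=0)
    return point, lo, hi


if __name__ == "__main__":
    # Placeholder data only: 12 methods x 7 datasets x 26 metrics of random errors.
    demo = np.random.default_rng(1).random((12, 7, 26))
    point, lo, hi = mean_rank_ci(demo)
    for m, (p, a, b) in enumerate(zip(point, lo, hi)):
        print(f"method {m}: mean rank {p:.2f} [{a:.2f}, {b:.2f}]")
```

Ranks within one cell do not depend on other cells, so resampling the precomputed ranks gives the same result as re-ranking each bootstrap sample, at a fraction of the cost.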

Original abstract

Generating realistic synthetic citation, patent, or component dependency networks is essential for benchmarking community detection, graph visualisation, and network data mining algorithms. We present the first systematic comparison of generators of directed graphs that are nearly acyclic and have a ground-truth community structure. We evaluate 12 methods across 7 real citation networks and 26 metrics. We propose the practice of reversing directions of edges in static generators to break cycles and induce a citation-like flow, which significantly improves the performance of a degree-corrected Stochastic Block Model. Our novel methodological approach to evaluating community detection benchmarks distinguishes between endogenous and exogenous mesoscopic similarities, with the latter proving more important. This distinction reveals that high-parameter models overfit by memorising planted community statistics and consequently fail to produce realistic networks. Finally, we introduce the Citation Seeder (CS) algorithm, an iterative generator grounded in the Price-Pareto model of citation networks, with interpretable parameters and O(N+E) runtime. CS achieves competitive results against the best-performing baselines while using up to four orders of magnitude fewer parameters and providing a clean framework for explaining and predicting a network's future growth.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript presents the first systematic comparison of 12 generators for directed, nearly acyclic graphs with ground-truth community structure. It evaluates them across 7 real citation networks using 26 metrics, proposes reversing edge directions in static generators (improving degree-corrected SBM performance), distinguishes endogenous from exogenous mesoscopic similarities (finding the latter more important), critiques high-parameter models for overfitting planted communities, and introduces the Citation Seeder (CS) algorithm based on the Price-Pareto model. CS is claimed to achieve competitive results with up to four orders of magnitude fewer parameters, O(N+E) runtime, and a framework for explaining and predicting network growth.

Significance. If the results hold, the work supplies a parsimonious, interpretable generative model for citation networks that avoids the overfitting issues identified in high-parameter baselines. The endogenous/exogenous distinction offers a useful methodological lens for designing community-detection benchmarks, and the low-parameter count plus linear runtime would make CS practical for large-scale synthetic data generation in network science.

major comments (2)

[Abstract / Evaluation] Abstract and evaluation description: The central claim that CS supplies a framework for predicting a network's future growth is not adequately supported by the described evaluation. The 26 metrics are applied to static snapshots of 7 networks; temporal validation (e.g., citation-age distributions, attachment-rate evolution across time slices, or hold-out prediction of future edges) is required to test whether the iterative Price-Pareto process reproduces observed growth dynamics rather than merely matching static statistics.
[Abstract] Abstract: The assertion that exogenous mesoscopic similarities are more important than endogenous ones, and that this explains why high-parameter models fail, needs explicit quantitative linkage to the 26 metrics. Which specific metrics demonstrate the overfitting of planted community statistics, and how does CS avoid this while still producing realistic networks?

minor comments (1)

The abstract refers to '26 metrics' without enumerating or categorizing them (structural, community, temporal, etc.). A concise table or appendix listing the metrics and their groupings would improve reproducibility and allow readers to assess coverage of growth-related properties.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed report. The comments highlight important aspects of our evaluation and claims that we address point by point below. We propose targeted revisions to strengthen the manuscript without altering its core contributions.

Point-by-point responses

Referee: [Abstract / Evaluation] Abstract and evaluation description: The central claim that CS supplies a framework for predicting a network's future growth is not adequately supported by the described evaluation. The 26 metrics are applied to static snapshots of 7 networks; temporal validation (e.g., citation-age distributions, attachment-rate evolution across time slices, or hold-out prediction of future edges) is required to test whether the iterative Price-Pareto process reproduces observed growth dynamics rather than merely matching static statistics.

Authors: We agree that the predictive capability of the CS framework would benefit from explicit temporal validation to fully substantiate the claim in the abstract. The Price-Pareto model is iterative by design, with parameters that directly control attachment rates and growth, enabling forward simulation in principle. However, the current experiments evaluate static match to real networks. We will add a new subsection with hold-out experiments: training CS parameters on early time slices of the citation networks and evaluating prediction of later edges via metrics such as precision@K for future citations and evolution of in-degree distributions. This will be reported alongside the existing 26 metrics. Revision: yes.
Referee: [Abstract] Abstract: The assertion that exogenous mesoscopic similarities are more important than endogenous ones, and that this explains why high-parameter models fail, needs explicit quantitative linkage to the 26 metrics. Which specific metrics demonstrate the overfitting of planted community statistics, and how does CS avoid this while still producing realistic networks?

Authors: The distinction between endogenous and exogenous mesoscopic structure is operationalized by comparing models that directly optimize or plant community statistics (e.g., degree-corrected SBM variants) against those that generate communities as an emergent outcome of the growth process (CS and certain baselines). Overfitting is evidenced by high-parameter models achieving strong scores on community-aware metrics such as modularity, conductance, and normalized mutual information with the planted labels, yet performing poorly on global structural metrics including degree distribution, clustering coefficient, and acyclicity measures. CS avoids this by using only a handful of interpretable parameters that govern preferential attachment and community seeding without directly fitting mesoscopic statistics. We will insert a new paragraph and reference table in the evaluation section that explicitly maps subsets of the 26 metrics to the endogenous/exogenous distinction and the overfitting diagnosis. Revision: partial.

Circularity Check

0 steps flagged

No significant circularity: CS derivation grounded in external Price-Pareto model with independent benchmarks

Full rationale

The paper grounds the Citation Seeder (CS) in the established Price-Pareto citation model and evaluates generated networks against 7 independent real citation networks using 26 metrics. This provides external validation rather than reducing claims to self-definition, fitted inputs renamed as predictions, or load-bearing self-citations. The distinction between endogenous/exogenous structure is a methodological contribution evaluated on held-out real data, and the predictive growth framework follows directly from the iterative generative process without tautological reduction. No equations or steps in the provided text exhibit the enumerated circular patterns.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

Based solely on the abstract, the paper relies on the Price-Pareto model as a domain assumption and introduces CS with its parameters.

free parameters (1)

parameters of CS
Interpretable, but their number is not specified in the abstract.

axioms (1)

Domain assumption: citation networks follow the Price-Pareto model.
The CS is grounded in this model.


