pith. sign in

arxiv: 1906.09537 · v1 · pith:R3K7JIQWnew · submitted 2019-06-23 · 🧮 math.ST · math.AC· stat.TH

Algebraic Statistics in Practice: Applications to Networks

Pith reviewed 2026-05-25 18:06 UTC · model grok-4.3

classification 🧮 math.ST math.ACstat.TH
keywords algebraic statisticsnetwork modelscausal discoveryphylogeneticsrelational datacommutative algebrastatistical geometry
0
0 comments X

The pith

Algebraic statistics applies algebra, geometry and combinatorics to three network problems in statistics.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This survey shows how algebraic statistics uses tools from multilinear algebra, commutative algebra, computational algebra, geometry and combinatorics to address problems in mathematical statistics. It focuses on three network areas: models for relational data, causal structure discovery, and phylogenetics. The authors review recent results and stress the statistical achievements and practical relevance these tools bring to applications in other fields. A reader would care if the algebraic descriptions deliver concrete improvements in identifiability, computation, or interpretation that standard methods miss.

Core claim

Algebraic statistics uses tools from algebra, geometry and combinatorics to provide insight into knotty problems in mathematical statistics, illustrated on network models for relational data, causal structure discovery and phylogenetics, with emphasis on the statistical achievements made possible by these tools and their practical relevance for applications to other scientific disciplines.

What carries the argument

Algebraic, geometric and combinatorial descriptions of statistical models for networks, which turn model properties into exact algebraic statements that enable new computations and tests.

If this is right

  • Relational data models gain exact algebraic parametrizations that clarify identifiability and allow direct computation of likelihoods.
  • Causal structure discovery obtains geometric criteria that distinguish identifiable models from non-identifiable ones.
  • Phylogenetic inference benefits from combinatorial invariants that reduce the search space over tree topologies.
  • The same toolkit extends to concrete applications in biology, social sciences and other domains that rely on network data.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The surveyed methods could be tested on large-scale modern network datasets to measure gains in scalability over existing software.
  • Algebraic descriptions might reveal new connections between relational data models and causal graphs that were not previously noticed.
  • Similar algebraic techniques could be applied to other statistical domains such as time-series or spatial data where networks appear.

Load-bearing premise

These algebraic, geometric and combinatorial tools actually produce new statistical insights with practical relevance that standard methods do not already deliver.

What would settle it

A direct comparison on the same network datasets showing that conventional statistical procedures match or exceed the accuracy, speed or interpretability obtained from the algebraic approach would undermine the central claim.

Figures

Figures reproduced from arXiv: 1906.09537 by Caroline Uhler, Marta Casanellas, Sonja Petrovi\'c.

Figure 1
Figure 1. Figure 1: Author networks constructed from the data collected by Ji and Jin ( [PITH_FULL_IMAGE:figures/full_fig_p003_1.png] view at source ↗
Figure 3
Figure 3. Figure 3: Testing model fit for the ERGMs with node degrees a sufficient statistics. Left: histogram [PITH_FULL_IMAGE:figures/full_fig_p007_3.png] view at source ↗
Figure 5
Figure 5. Figure 5: 3-dimensional permutohedron consisting of all permutations of length 4 and the DAG [PITH_FULL_IMAGE:figures/full_fig_p013_5.png] view at source ↗
Figure 6
Figure 6. Figure 6: The three different unrooted tree topologies on four leaves are denoted as 12 [PITH_FULL_IMAGE:figures/full_fig_p015_6.png] view at source ↗
read the original abstract

Algebraic statistics uses tools from algebra (especially from multilinear algebra, commutative algebra and computational algebra), geometry and combinatorics to provide insight into knotty problems in mathematical statistics. In this survey we illustrate this on three problems related to networks, namely network models for relational data, causal structure discovery and phylogenetics. For each problem we give an overview of recent results in algebraic statistics with emphasis on the statistical achievements made possible by these tools and their practical relevance for applications to other scientific disciplines.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 3 minor

Summary. The manuscript is a survey of algebraic statistics, using tools from multilinear algebra, commutative algebra, computational algebra, geometry, and combinatorics to address problems in mathematical statistics. It illustrates these on three network-related domains—network models for relational data, causal structure discovery, and phylogenetics—by overviewing recent results and emphasizing the resulting statistical achievements and their practical relevance to other scientific disciplines.

Significance. As a survey without new theorems or empirical results, the paper's value lies in synthesizing existing literature to make algebraic statistics more accessible for network problems. If the overviews are accurate and the cited works indeed demonstrate concrete statistical insights beyond standard methods, the manuscript could facilitate cross-disciplinary applications in statistics, computer science, and biology by highlighting practical relevance.

minor comments (3)
  1. [Abstract] Abstract: the phrase 'recent results' is vague without a time frame or indication of the most influential cited papers; adding one or two key references would improve reader orientation.
  2. The survey structure would benefit from a short concluding section that explicitly compares algebraic approaches to conventional statistical methods across the three domains, to better substantiate the claim of new insights.
  3. Ensure consistent notation for algebraic objects (e.g., ideals, varieties) across sections and verify that all external results are cited with page or theorem numbers where possible.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive summary and recommendation of minor revision. The assessment accurately captures the manuscript as a survey synthesizing algebraic statistics tools for network-related problems in relational data, causal discovery, and phylogenetics. No specific major comments were provided in the report.

Circularity Check

0 steps flagged

Survey paper: no derivations or predictions present

full rationale

This is a survey paper that overviews existing applications of algebraic statistics to network problems. It introduces no new theorems, models, equations, predictions, or fitted quantities. All referenced results are drawn from external cited literature. The central claim is descriptive rather than deductive, so no load-bearing step reduces to its own inputs by construction. This is the most common honest finding for overview articles.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

As a survey paper the work introduces no new free parameters, axioms, or invented entities; it reviews applications from prior algebraic statistics literature.

pith-pipeline@v0.9.0 · 5602 in / 963 out tokens · 25740 ms · 2026-05-25T18:06:06.962941+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

136 extracted references · 136 canonical work pages · 3 internal anchors

  1. [1]

    M., Blei, D

    Airoldi, E. M., Blei, D. M., Fienberg, S. E., and Xing, E. P. (2009). Mixed membership stochastic blockmodels. In Advances in Neural Information Processing Systems\/ , pages 33--40

  2. [2]

    Allman, E. S. and Rhodes, J. A. (2007). Phylogenetic invariants. In O. Gascuel and M. A. Steel, editors, Reconstructing Evolution\/ . Oxford University Press

  3. [3]

    Allman, E. S. and Rhodes, J. A. (2008). Phylogenetic ideals and varieties for the general M arkov model. Advances in Applied Mathematics\/ , 40 , 127--148

  4. [4]

    S., Ane , C., and Rhodes , J

    Allman , E. S., Ane , C., and Rhodes , J. A. (2008). Identifiability of a Markovian model of molecular evolution with gamma-distributed rates . Advances in Applied Probability\/ , 40 (1), 229--249

  5. [5]

    S., Matias, C., and Rhodes, J

    Allman, E. S., Matias, C., and Rhodes, J. (2009). Identifiability of parameters in latent structure models with many observed variables. The Annals of Statistics\/ , 37 (6A), 3099--3132

  6. [6]

    S., Petrovi\'c, S., Rhodes, J

    Allman, E. S., Petrovi\'c, S., Rhodes, J. A., and Sullivant, S. (2011). Identifiability of two-tree mixtures for group-based models. IEEE/ACM Transactions on Computational Biology and Bioinformatics\/ , 8 (3), 710--722

  7. [7]

    S., Rhodes , J

    Allman , E. S., Rhodes , J. A., and Taylor , A. (2014). A semialgebraic description of the general M arkov model on phylogenetic trees. SIAM Journal on Discrete Mathematics\/ , 28 , 736–--755

  8. [8]

    S., Rhodes, J

    Allman, E. S., Rhodes, J. A., Sturmfels, B., and Zwiernik, P. (2015). Tensors of nonnegative rank two. Linear Algebra and its Applications\/ , 473 , 37--53

  9. [9]

    S., Degnan, J

    Allman, E. S., Degnan, J. H., and Rhodes, J. A. (2018). Species tree inference from gene splits by unrooted star methods. IEEE/ACM Transactions on Computational Biology and Bioinformatics\/ , 15 (1), 337--342

  10. [10]

    S., Long, C., and Rhodes, J

    Allman, E. S., Long, C., and Rhodes, J. A. (2019). Species tree inference from genomic sequences using the log-det distance. SIAM Journal on Applied Algebra and Geometry\/ , 3 (1), 107--127

  11. [11]

    Andersson, S. A. (1975). Invariant normal models. The Annals of Statistics\/ , 3 , 132--154

  12. [12]

    Aoki, S., Hara, H., and Takemura, A. (2012). Markov Bases in Algebraic Statistics\/ . Springer

  13. [13]

    Bailey, R. A. (1981). A unified approach to design of experiments. Journal of the Royal Statistical Society, Series A\/ , 144 , 214--223

  14. [14]

    Optimal hypothesis testing for stochastic block models with growing degrees

    Banerjee, D. and Ma, Z. (2017). Optimal hypothesis testing for stochastic block models with growing degrees. arXiv preprint arXiv:1705.05305\/

  15. [15]

    Ba \ n os, H. (2019). Identifying species network features from gene tree quartets under the coalescent model. Bulletin of Mathematical Biology\/ , 81 (2), 494--534

  16. [16]

    Barndorff-Nielsen, O. E. (1978). Information and Exponential Families in Statistical Theory\/ . Wiley, New York

  17. [17]

    B., Krivitsky, P

    Carnegie, N. B., Krivitsky, P. N., Hunter, D. R., and Goodreau, S. M. (2015). An approximation method for improving dynamic network model fitting. Journal of Computational and Graphical Statistics\/ , 24 (2), 502--519

  18. [18]

    Carr, M. P. and Devadoss, S. L. (2006). Coxeter complexes and graph-associahedra. Topology and its Applications\/ , 153 , 2155--2216

  19. [19]

    and Fern\'andez-S\'anchez, J

    Casanellas, M. and Fern\'andez-S\'anchez, J. (2010). Relevant phylogenetic invariants of evolutionary models. Journalc de Math\'ematiques Pures et Appliquées\/ , 96 , 207--229

  20. [20]

    Casanellas, M., Fern\'andez-S\'anchez, J., and Kedzierska, A. (2012). The space of phylogenetic mixtures for equivariant models. Algorithms for Molecular Biology\/ , 7 (1), 33

  21. [21]

    Catanese, F., Ho s ten, S., Khetan, A., and Sturmfels, B. (2006). The maximum likelihood degree. Amer. J. Math. , 128 (3), 671--697

  22. [22]

    Cavender, J. A. and Felsenstein, J. (1987). Invariants of phylogenies in a simple case with discrete states. Journal of Classification\/ , 4 , 57--71

  23. [23]

    Chang, J. T. (1996). Full reconstruction of Markov models on evolutionary trees: identifiability and consistency. Mathematical Biosciences\/ , 137 (1), 51--73

  24. [24]

    Chatterjee, S., Diaconis, P., and Sly, A. (2011). Random graphs with a given degree sequence. Annals of Applied Probability\/ , 21 (4), 1400--1435

  25. [25]

    Chickering, D. M. (2002). Optimal structure identification with greedy search. Journal of Machine Learning Research\/ , 3 , 507--554

  26. [26]

    and Kubatko, L

    Chifman, J. and Kubatko, L. (2014). Quartet Inference from SNP Data Under the Coalescent Model . Bioinformatics\/ , 30 (23), 3317--3324

  27. [27]

    and Kubatko, L

    Chifman, J. and Kubatko, L. (2015). Identifiability of the unrooted species tree topology under the coalescent model with time-reversible substitution processes, site-specific rate variation, and invariable sites. Journal of Theoretical Biology\/ , 374 , 35--47

  28. [28]

    and Kubatko, L

    Chifman, J. and Kubatko, L. (2019). An invariants-based method for efficient identification of hybrid speciation from large-scale genomic data. https://www.biorxiv.org/content/10.1101/034348v1

  29. [29]

    D., and Snir, S

    Chor, B., Hendy, M. D., and Snir, S. (2006). Maximum likelihood jukes-cantor triplets: Analytic solutions. Molecular Biology and Evolution\/ , 23 (3), 626--632

  30. [30]

    Cong, L., Ran, F., Cox, D., Lin, S., Barretto, R., Habib, N., Hsu, P., Wu, X., Jiang, W., Marraffini, L., and Zhang, F. (2013). Multiplex genome engineering using CRISPR/Cas systems . Science\/ , 339 (6121), 819--823

  31. [31]

    Cussens, J., Haws, D., and Studen\'y, M. (2017). Polyhedral aspects of score equivalence in B ayesian network structure learning. Mathematical Programming, Series A\/ , 164 , 285--324

  32. [32]

    Davidson, R., Lawhorn, M., Rusinko, J., and Weber, N. (2018). Efficient quartet representations of trees and applications to supertree and summary methods. IEEE/ACM Transactions on Computational Biology and Bioinformatics\/ , 15 (3), 1010--1015

  33. [33]

    J., Wright, A

    Devitt, T. J., Wright, A. M., Cannatella, D. C., and Hillis, D. M. (2019). Species delimitation in endangered groundwater salamanders: Implications for aquifer management and biodiversity conservation. Proceedings of the National Academy of Sciences\/ , 116 (7), 2624--2633

  34. [34]

    and Sturmfels, B

    Diaconis, P. and Sturmfels, B. (1998). Algebraic algorithms for sampling from conditional distributions. The Annals of Statistics\/ , 26 (1), 363--397

  35. [35]

    Dillon, M. (2016). Runtime for performing exact tests on the p_1 statistical model for random graphs\/ . Ph.D. thesis, Illinois Institute of Technology

  36. [36]

    and Matsen IV, F

    Dinh, V. and Matsen IV, F. A. (2017). The shape of the one-dimensional phylogenetic likelihood function. The Annals of Applied Probability\/ , 27 (3), 1646--1677

  37. [37]

    Drton, M., Sturmfels, B., and Sullivant, S. (2009). Lectures on Algebraic Statistics\/ . Oberwolfach Seminars. Birkh \" a user

  38. [38]

    Drton, M., Lin, S., Weihs, L., and Zwiernik, P. (2017). Marginal likelihood and model selection for G aussian latent tree and forest models. Bernoulli\/ , 23 , 1202--1232

  39. [39]

    Drton, M., Robeva, E., and Weihs, L. (2018). Nested covariance determinants and restricted trek separation in Gaussian graphical models. Preprint available at http://arxiv.org/abs/1807.07561

  40. [40]

    and R \'e nyi, A

    Erd\" o s, P. and R \'e nyi, A. (1961). On the evolution of random graphs. Bulletin de L'Institut International de Statistique\/ , 38 (4), 343--347

  41. [41]

    Evans, R. J. (2018). Model selection and local geometry. Preprint available at http://arxiv.org/abs/1801.08364

  42. [42]

    Evans, S. N. and Speed, T. P. (1993). Invariants of some probability models used in phylogenetic inference. The Annals of Statistics\/ , 21 (1), 355--377

  43. [43]

    Felsenstein, J. (1978). Cases in which parsimony or compatibility methods will be positively misleading. Systematic Zoology\/ , 27 , 401--410

  44. [44]

    and Casanellas, M

    Fern\'andez-S\'anchez, J. and Casanellas, M. (2016). Invariant versus classical quartet inference when evolution is heterogeneous across sites and lineages. Systematic Biology\/ , 65 (2), 280--291

  45. [45]

    Fienberg, S. E. and Slavkovic, A. B. (2004). Making the release of confidential data from multi-way tables count. Chance\/ , 17 (3), 5--10

  46. [46]

    Fienberg, S. E. and Wasserman, S. S. (1981a). Categorical data analysis of single sociometric relations. Sociological Methodology\/ , 12 , 156--192

  47. [47]

    Fienberg, S. E. and Wasserman, S. S. (1981b). Discussion of H olland, P. W. and L einhardt, S. `` A n exponential family of probability distributions for directed graphs". Journal of the American Statistical Association\/ , 76 , 54--57

  48. [48]

    E., Meyer, M

    Fienberg, S. E., Meyer, M. M., and Wasserman, S. S. (1985). Statistical analysis of multiple sociometric relations. Journal of the American Statistical Association\/ , 80 (389), 51--67

  49. [49]

    and Strauss, D

    Frank, O. and Strauss, D. (1986). Markov graphs. Journal of the A merican Statistical Association\/ , 81 (395), 832--842

  50. [50]

    and Kubatko, L

    Gaither, J. and Kubatko, L. (2016). Hypothesis tests for phylogenetic quartets, with applications to coalescent-based species tree inference. Journal of Theoretical Biology\/ , 408 , 179--186

  51. [51]

    Testing Network Structure Using Relations Between Small Subgraph Probabilities

    Gao, C. and Lafferty, J. (2017). Testing network structure using relations between small subgraph probabilities. arXiv preprint arXiv:1704.06742\/

  52. [52]

    Gillispie, S. B. and Perlman, M. D. (2001). Enumerating M arkov equivalence classes of acyclic digraph models. In UAI'01 Proceedings of the Seventeenth conference on Uncertainty in artificial intelligence\/

  53. [53]

    X., Fienberg, S

    Goldenberg, A., Zheng, A. X., Fienberg, S. E., and Airoldi, E. M. (2010). A survey of statistical network models. Foundations and Trends in Machine Learning\/ , 2 (2), 129--233

  54. [54]

    and Long, C

    Gross, E. and Long, C. (2018). Distinguishing phylogenetic networks. SIAM Journal on Applied Algebra and Geometry\/ , 2 (1), 72--93

  55. [55]

    Gross, E., Petrovi\'c, S., and Stasi, D. (2016). G oodness-of-fit for log-linear network models: D ynamic M arkov bases using hypergraphs. Annals of the Institute of Statistical Mathematics\/ , pages 673--704. DOI: 10.1007/s10463-016-0560-2

  56. [56]

    Gross, E., Petrovi\'c, S., and Stasi, D. (2019). Estimating an exact conditional p-value for log-linear ERGM s. Preprint, forthcoming

  57. [57]

    Handcock, M. S. (2003). Assessing degeneracy in statistical models for social networks. Working paper 39., Center for Statistics and the Social Sciences, University of Washington, Seattle

  58. [58]

    Hendy, M. D. and Penny, D. (1989). A framework for the quantitative study of evolutionary trees. Systematic Zoology\/ , 38 , 297--309

  59. [59]

    D., Penny, D., and Steel, M

    Hendy, M. D., Penny, D., and Steel, M. (1994). A discrete F ourier analysis for evolutionary trees. Proceedings National Academy Sciences\/ , 91 , 3339--3343

  60. [60]

    D., Raftery, A

    Hoff, P. D., Raftery, A. E., and Handcock, M. S. (2002). Latent space approaches to social network analysis. Journal of the A merican Statistical association\/ , 97 (460), 1090--1098

  61. [61]

    Holland, P. W. and Leinhardt, S. (1981). An exponential family of probability distributions for directed graphs (with discussion). Journal of the American Statistical Association\/ , 76 (373), 33--65

  62. [62]

    W., Laskey, K

    Holland, P. W., Laskey, K. B., and Leinhardt, S. (1983). Stochastic blockmodels: First steps. Social networks\/ , 5 (2), 109--137

  63. [63]

    Huelsenbeck, J. P. (1995). Performance of phylogenetic methods in simulation. Systematic Biology\/ , 44 , 17--48

  64. [64]

    R., Handcock, M

    Hunter, D. R., Handcock, M. S., Butts, C. T., Goodreau, S. M., and Morris, M. (2008a). ergm: A package to fit, simulate and diagnose exponential-family models for networks. Journal of Statistical Software\/ , 24 (3), 1--29

  65. [65]

    R., Goodreau, S

    Hunter, D. R., Goodreau, S. M., and Handcock, M. S. (2008b). Goodness of fit of social network models. Journal of the American Statistical Association\/ , 103 (481), 248--258

  66. [66]

    Jaakkola, T., Sontag, D., Globerson, A., and Meila, M. (2010). Learning B ayesian network structure using LP relaxations. In Proceedings of the 13th International Conference on Artificial Intelligence and Statistics\/ , pages 358--365

  67. [67]

    James, A. T. (1954). Normal multivariate analysis and the orthogonal group. The Annals of Mathematical Statistics\/ , 25 , 40--75

  68. [68]

    Jensen, S. T. (1988). Covariance hypotheses which are linear in both the covariance and the inverse covariance. The Annals of Statistics\/ , 116 , 302--322

  69. [69]

    and Jin, J

    Ji, P. and Jin, J. (2016). Coauthorship and citation networks for statisticians. Annals of Applied Statistics\/ , 10 (4), 1779--1812

  70. [70]

    and B\"uhlmann, P

    Kalisch, M. and B\"uhlmann, P. (2007). Estimating high-dimensional directed acyclic graphs with the PC -algorithm. Journal of Machine Learning Research\/ , 8 , 613--636

  71. [71]

    and Newman, M

    Karrer, B. and Newman, M. E. (2011). Stochastic blockmodels and community structure in networks. Physical Review E\/ , 83 (1), 016107

  72. [72]

    and Petrovi\'c, S

    Karwa, V. and Petrovi\'c, S. (2016). Coauthorship and citation networks for statisticians: Comment. Annals of Applied Statistics\/ , 10 (4), 1827--1834

  73. [73]

    Karwa, V., Pati, D., Petrovi \'c , S., Solus, L., Alexeev, N., Rai c , M., Wilburne, D., Williams, R., and Yan, B. (2016). Exact tests for stochastic block models. arXiv preprint arXiv:1612.06040\/

  74. [74]

    M., Drton, M., Guigo, R., and Casanellas, M

    Kedzierska, A. M., Drton, M., Guigo, R., and Casanellas, M. ( 2012 ). SPIn: Model Selection for Phylogenetic Mixtures via Linear Invariants . Molecular Biology and Evolution\/ , 29 ( 3 ), 929--937

  75. [75]

    Khale, D. (2014). algstat: An R package for algebraic statistics. https://github.com/dkahle/algstat

  76. [76]

    and Kubjas, K

    Kosta, D. and Kubjas, K. (2019). Maximum Likelihood Estimation of Symmetric Group-Based Models via Numerical Algebraic Geometry . Bulletin of Mathematical Biology\/ , 81

  77. [77]

    Krivitsky, P. N. and Kolaczyk, E. D. (2015). On the question of effective sample size in network modeling: An asymptotic inquiry. Statistical Science\/ , 30 (2), 198--198

  78. [78]

    Kubjas, K., Robeva, E., and Sturmfels, B. (2015). Fixed points of the EM algorithm and nonnegative rank boundaries. Ann. Statist. , 43 (1), 422--461

  79. [79]

    Lake, J. A. (1987). A rate-independent technique for analysis of nucleic acid sequences: evolutionary parsimony. Molecular Biology and Evolution\/ , 4 , 167--191

  80. [80]

    Lauritzen, S. L. (1996). Graphical Models\/ . Oxford University Press

Showing first 80 references.