Algebraic Statistics in Practice: Applications to Networks
Pith reviewed 2026-05-25 18:06 UTC · model grok-4.3
The pith
Algebraic statistics applies algebra, geometry and combinatorics to three network problems in statistics.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Algebraic statistics uses tools from algebra, geometry and combinatorics to provide insight into knotty problems in mathematical statistics. The survey illustrates this on network models for relational data, causal structure discovery and phylogenetics, emphasizing the statistical achievements these tools make possible and their practical relevance to other scientific disciplines.
What carries the argument
Algebraic, geometric and combinatorial descriptions of statistical models for networks, which turn model properties into exact algebraic statements that enable new computations and tests.
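To make "exact algebraic statement" concrete, here is a minimal sketch using a textbook example that is not taken from the survey itself: independence of two binary variables holds exactly when the 2x2 determinant of the joint probability table vanishes. The function names and the two tables are illustrative assumptions.

```python
from fractions import Fraction

def independence_table(row_marginal, col_marginal):
    """Joint table p[i][j] = row_marginal[i] * col_marginal[j] (hypothetical helper)."""
    return [[r * c for c in col_marginal] for r in row_marginal]

def two_by_two_minor(p):
    """Evaluate the polynomial p00*p11 - p01*p10 on a 2x2 table.

    For a 2x2 probability table this polynomial is zero exactly when
    the two binary variables are independent.
    """
    return p[0][0] * p[1][1] - p[0][1] * p[1][0]

# A table built as a product of marginals (independent by construction).
independent = independence_table(
    [Fraction(1, 3), Fraction(2, 3)],
    [Fraction(1, 4), Fraction(3, 4)],
)

# A table with all mass on the diagonal (perfectly dependent).
dependent = [
    [Fraction(1, 2), Fraction(0)],
    [Fraction(0), Fraction(1, 2)],
]

print(two_by_two_minor(independent))  # 0
print(two_by_two_minor(dependent))    # 1/4
```

Exact rational arithmetic is used so that the zero is a genuine algebraic identity rather than a floating-point approximation.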
If this is right
- Relational data models gain exact algebraic parametrizations that clarify identifiability and allow direct computation of likelihoods.
- Causal structure discovery obtains geometric criteria that distinguish identifiable models from non-identifiable ones.
- Phylogenetic inference benefits from combinatorial invariants that reduce the search space over tree topologies.
- The same toolkit extends to concrete applications in biology, social sciences and other domains that rely on network data.
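As a hedged illustration of the first point, the sketch below uses the beta model for undirected graphs, a standard random graph model chosen here as an assumption rather than as the survey's own example. Each edge {i, j} appears independently with log-odds beta[i] + beta[j], so the log-likelihood depends on the graph only through its degree sequence. All names and parameter values are hypothetical.

```python
import math
from itertools import combinations

def degrees(n, edges):
    """Degree sequence of an undirected graph on nodes 0..n-1."""
    deg = [0] * n
    for i, j in edges:
        deg[i] += 1
        deg[j] += 1
    return deg

def beta_log_likelihood(beta, edges):
    """Log-likelihood computed edge by edge over all node pairs."""
    edge_set = {frozenset(e) for e in edges}
    total = 0.0
    for i, j in combinations(range(len(beta)), 2):
        logit = beta[i] + beta[j]
        if frozenset((i, j)) in edge_set:
            total += logit                      # log-odds term for a present edge
        total -= math.log1p(math.exp(logit))    # normalising term for every pair
    return total

def beta_log_likelihood_from_degrees(beta, deg):
    """Same quantity written in terms of the degree sequence only."""
    normaliser = sum(
        math.log1p(math.exp(beta[i] + beta[j]))
        for i, j in combinations(range(len(beta)), 2)
    )
    return sum(b * d for b, d in zip(beta, deg)) - normaliser

# Two different 4-cycles on the same labeled nodes; every node has degree 2.
cycle_a = [(0, 1), (1, 2), (2, 3), (3, 0)]
cycle_b = [(0, 2), (2, 1), (1, 3), (3, 0)]
beta = [0.3, -1.2, 0.8, 0.1]  # arbitrary illustrative parameters

print(degrees(4, cycle_a) == degrees(4, cycle_b))
print(math.isclose(beta_log_likelihood(beta, cycle_a),
                   beta_log_likelihood(beta, cycle_b)))
```

Both graphs share a degree sequence, so they receive the same likelihood for any choice of beta; this is the kind of structure that an exact parametrization makes visible.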
Where Pith is reading between the lines
- The surveyed methods could be tested on large-scale modern network datasets to measure gains in scalability over existing software.
- Algebraic descriptions might reveal new connections between relational data models and causal graphs that were not previously noticed.
- Similar algebraic techniques could be applied to other statistical domains such as time-series or spatial data where networks appear.
Load-bearing premise
These algebraic, geometric and combinatorial tools actually produce new statistical insights with practical relevance that standard methods do not already deliver.
What would settle it
A direct comparison on the same network datasets showing that conventional statistical procedures match or exceed the accuracy, speed or interpretability obtained from the algebraic approach would undermine the central claim.
Original abstract
Algebraic statistics uses tools from algebra (especially from multilinear algebra, commutative algebra and computational algebra), geometry and combinatorics to provide insight into knotty problems in mathematical statistics. In this survey we illustrate this on three problems related to networks, namely network models for relational data, causal structure discovery and phylogenetics. For each problem we give an overview of recent results in algebraic statistics with emphasis on the statistical achievements made possible by these tools and their practical relevance for applications to other scientific disciplines.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript is a survey of algebraic statistics, which uses tools from multilinear algebra, commutative algebra, computational algebra, geometry, and combinatorics to address problems in mathematical statistics. It illustrates these tools on three network-related domains (network models for relational data, causal structure discovery, and phylogenetics) by giving an overview of recent results and emphasizing the resulting statistical achievements and their practical relevance to other scientific disciplines.
Significance. As a survey without new theorems or empirical results, the paper's value lies in synthesizing existing literature to make algebraic statistics more accessible for network problems. If the overviews are accurate and the cited works indeed demonstrate concrete statistical insights beyond standard methods, the manuscript could facilitate cross-disciplinary applications in statistics, computer science, and biology by highlighting practical relevance.
Minor comments (3)
- [Abstract] The phrase 'recent results' is vague without a time frame or an indication of the most influential cited papers; adding one or two key references would improve reader orientation.
- The survey structure would benefit from a short concluding section that explicitly compares algebraic approaches to conventional statistical methods across the three domains, to better substantiate the claim of new insights.
- Ensure consistent notation for algebraic objects (e.g., ideals, varieties) across sections and verify that all external results are cited with page or theorem numbers where possible.
Simulated Author's Rebuttal
We thank the referee for the positive summary and recommendation of minor revision. The assessment accurately captures the manuscript as a survey synthesizing algebraic statistics tools for network-related problems in relational data, causal discovery, and phylogenetics. No specific major comments were provided in the report.
Circularity Check
Survey paper: no derivations or predictions present
This is a survey paper that overviews existing applications of algebraic statistics to network problems. It introduces no new theorems, models, equations, predictions, or fitted quantities. All referenced results are drawn from external cited literature. The central claim is descriptive rather than deductive, so no load-bearing step reduces to its own inputs by construction. This is the most common honest finding for overview articles.