An Algebraic Approach to Evolutionary Accumulation Models
Pith reviewed 2026-05-18 00:08 UTC · model grok-4.3
The pith
An algebraic method uses the polynomial structure of evolutionary processes to define a semi-algebraic set of consistent parameters before likelihood maximization.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central claim is that the evolutionary process possesses a natural underlying polynomial structure. This structure permits construction of a semi-algebraic set of candidate parameters consistent with a given data set. Likelihood maximization can then be carried out within the set, and the resulting solutions align with those obtained from various statistical evolutionary accumulation models while supplying additional information about the model.
What carries the argument
The semi-algebraic set of candidate parameters defined from the polynomial structure of the evolutionary process, which restricts the space before likelihood maximization.
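The two-stage idea (restrict first, then maximize) can be sketched on a toy model. Everything below is an assumption made for illustration and is not the paper's construction: features are acquired one at a time, `theta[(current, feature)]` is a hypothetical transition probability, the probability of a state is the sum over acquisition orders of products of these parameters, and the helper names `state_probability`, `in_candidate_set`, and `two_feature_theta` are invented here. In this two-feature toy the polynomials are linear; with more features the path sums have higher degree.

```python
from itertools import permutations
from math import log


def state_probability(state, theta):
    """Sum over acquisition orders of products of transition parameters.

    `theta[(current, feature)]` is the (hypothetical) probability of gaining
    `feature` next from the feature set `current`, so the result is a
    polynomial in the entries of `theta`.
    """
    total = 0.0
    for order in permutations(sorted(state)):
        prob, current = 1.0, frozenset()
        for feature in order:
            prob *= theta.get((current, feature), 0.0)
            current = current | {feature}
        total += prob
    return total


def in_candidate_set(theta, observed, tol=1e-9):
    """Check the polynomial equalities and inequalities of the toy set."""
    # Inequalities: every parameter lies in [0, 1].
    if any(p < -tol or p > 1 + tol for p in theta.values()):
        return False
    # Equalities: outgoing probabilities from each state sum to one.
    outgoing = {}
    for (current, _), p in theta.items():
        outgoing[current] = outgoing.get(current, 0.0) + p
    if any(abs(total - 1.0) > tol for total in outgoing.values()):
        return False
    # Strict inequalities: every observed state has positive probability.
    return all(state_probability(s, theta) > tol for s in observed)


def log_likelihood(theta, counts):
    """Log-likelihood of observed state counts under the toy model."""
    return sum(n * log(state_probability(s, theta)) for s, n in counts.items())


def two_feature_theta(a):
    """One-parameter family: `a` is the chance that feature 0 comes first."""
    empty = frozenset()
    return {(empty, 0): a, (empty, 1): 1.0 - a,
            (frozenset({0}), 1): 1.0, (frozenset({1}), 0): 1.0}


# Made-up data: feature 0 alone seen three times, feature 1 alone once.
counts = {frozenset({0}): 3, frozenset({1}): 1}
grid = [i / 100 for i in range(101)]
feasible = [a for a in grid if in_candidate_set(two_feature_theta(a), counts)]
best_a = max(feasible, key=lambda a: log_likelihood(two_feature_theta(a), counts))
print(len(feasible), best_a)
```

The membership test removes the boundary values of `a` that would assign zero probability to an observed state, so the likelihood is only ever evaluated where it is finite. A grid search stands in for whatever optimizer one would actually use.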
If this is right
- The algebraic construction yields parameters compatible with solutions from existing statistical evolutionary accumulation models.
- The method supplies additional information about the feasible parameter region relative to purely optimization-based approaches.
- Explicit examples confirm that the semi-algebraic sets align with known model outputs in concrete cases.
Where Pith is reading between the lines
- The pre-filtering step could lower computational effort when searching high-dimensional parameter spaces for large evolutionary datasets.
- Characterizing the full consistent set rather than a single optimum might improve uncertainty estimates in evolutionary inference.
- The algebraic framing could be tested for extension to other sequential accumulation processes outside biology.
Load-bearing premise
The evolutionary process possesses a natural underlying polynomial structure that permits construction of a semi-algebraic set of parameters consistent with observed data.
What would settle it
A dataset for which the semi-algebraic set is empty or excludes the maximum-likelihood point found by standard optimization would falsify the compatibility claim.
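The check described above can be sketched as a small report function. This is a minimal illustration under stated assumptions: the one-parameter likelihood, the candidate grid, and the three membership predicates are all hypothetical stand-ins, and `falsification_report` is a name invented here, not part of any EvAM package.

```python
from math import log


def falsification_report(candidates, in_set, log_likelihood):
    """Report whether the candidate set is empty on the grid and whether it
    excludes the point that maximizes the likelihood without constraints."""
    feasible = [c for c in candidates if in_set(c)]
    unconstrained = max(candidates, key=log_likelihood)
    return {
        "set_is_empty": not feasible,
        "mle_excluded": not in_set(unconstrained),
        "unconstrained_mle": unconstrained,
    }


def toy_log_likelihood(a):
    """Hypothetical data: one outcome seen three times, the other once."""
    return 3 * log(a) + 1 * log(1.0 - a)


grid = [i / 100 for i in range(1, 100)]  # open interval, so log() is defined

# A set that contains the optimum, one that excludes it, and an empty one.
compatible = falsification_report(
    grid, lambda a: a * (1.0 - a) > 0, toy_log_likelihood)
conflicting = falsification_report(
    grid, lambda a: a * (1.0 - a) > 0 and 0.5 - a >= 0, toy_log_likelihood)
empty = falsification_report(
    grid, lambda a: a * a + 1.0 < 0, toy_log_likelihood)

for name, report in [("compatible", compatible),
                     ("conflicting", conflicting),
                     ("empty", empty)]:
    print(name, report)
```

Only the first outcome is consistent with the compatibility claim; either of the other two, observed on real data with the paper's actual set, would count against it. A grid can only approximate emptiness, so a real test would need an exact or certified method.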
Original abstract
We present an algebraic approach to evolutionary accumulation modelling (EvAM). EvAM is concerned with learning and predicting the order in which evolutionary features accumulate over time. Our approach is complementary to the more common optimisation-based inference methods used in this field. Namely, we first use the natural underlying polynomial structure of the evolutionary process to define a semi-algebraic set of candidate parameters consistent with a given data set before maximising the likelihood function. We consider explicit examples and show that this approach is compatible with the solutions given by various statistical evolutionary accumulation models. Furthermore, we discuss the additional information of our algebraic model relative to these models.
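To make "polynomial structure" and "semi-algebraic set" concrete, here is a worked toy formulation. It is an assumption of this summary, not the paper's notation: features from $[n] = \{1,\dots,n\}$ are acquired one at a time, $\theta_{S,i}$ is a hypothetical probability of acquiring feature $i$ next from feature set $S$, $p_S$ is the probability of passing through $S$, and $D$ is the set of observed states.

```latex
\begin{align*}
p_S(\theta) &= \sum_{(i_1,\dots,i_k)\ \text{orderings of } S}\;
               \prod_{j=1}^{k} \theta_{\{i_1,\dots,i_{j-1}\},\, i_j}, \\[4pt]
\Theta_D &= \Bigl\{\theta \;:\; \theta_{S,i} \ge 0,\quad
            \sum_{i \notin S} \theta_{S,i} = 1 \ \text{for all } S \ne [n],\quad
            p_S(\theta) > 0 \ \text{for all } S \in D \Bigr\}, \\[4pt]
p_{\{1,2\}}(\theta) &= \theta_{\emptyset,1}\,\theta_{\{1\},2}
                     + \theta_{\emptyset,2}\,\theta_{\{2\},1}
                     \qquad (n = 2).
\end{align*}
```

Each $p_S$ is a polynomial in the transition parameters, and $\Theta_D$ is cut out by finitely many polynomial equalities and inequalities, which is what makes it semi-algebraic. Likelihood maximization would then be restricted to $\Theta_D$.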
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript presents an algebraic approach to evolutionary accumulation models (EvAM) that first exploits a claimed natural underlying polynomial structure of the evolutionary process to define a semi-algebraic set of candidate parameters consistent with observed data, then performs likelihood maximization within that set. Compatibility with existing statistical EvAM models is illustrated through explicit examples, and the algebraic perspective is said to supply additional information relative to optimization-based methods.
Significance. If the polynomial-to-semi-algebraic construction is shown to be general and faithful to the underlying dynamics, the method could usefully constrain the feasible parameter region before numerical optimization, offering a complementary tool for EvAM inference with potential gains in interpretability. The examples suggest compatibility is achievable in selected cases, but the significance remains provisional pending a general derivation and quantitative validation.
major comments (2)
- [Abstract] Abstract: the central step of defining a semi-algebraic set of candidate parameters via the 'natural underlying polynomial structure' is asserted without a general derivation or explicit construction from the evolutionary dynamics; this mapping is load-bearing for the claim that the set encodes consistency constraints implied by the data before likelihood maximization.
- [Examples] Examples and results: compatibility with statistical EvAM models is demonstrated only on selected explicit examples, with no accompanying derivation details, error analysis, or quantitative validation metrics (e.g., parameter recovery error or comparison of maximized likelihood values); this leaves open whether the algebraic set excludes valid parameters or is redundant for arbitrary data.
minor comments (2)
- Notation for the polynomials and the resulting semi-algebraic sets should be introduced with explicit equations or definitions to improve readability.
- The manuscript would benefit from a brief discussion of computational considerations when constructing the semi-algebraic set for models with more features.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed comments on our manuscript. We address each major comment below and outline the revisions we will make to strengthen the presentation of the algebraic approach.
- Referee: [Abstract] Abstract: the central step of defining a semi-algebraic set of candidate parameters via the 'natural underlying polynomial structure' is asserted without a general derivation or explicit construction from the evolutionary dynamics; this mapping is load-bearing for the claim that the set encodes consistency constraints implied by the data before likelihood maximization.
  Authors: We acknowledge that the current manuscript introduces the semi-algebraic set via the polynomial structure without a fully general derivation. The approach is motivated by the structure of the evolutionary process, but we agree that an explicit general construction is needed to support the central claim. In the revised version, we will add a dedicated section deriving the semi-algebraic parameter set directly from the underlying polynomial relations in the evolutionary dynamics, showing how consistency constraints with observed data arise before likelihood maximization.
  Revision: yes
- Referee: [Examples] Examples and results: compatibility with statistical EvAM models is demonstrated only on selected explicit examples, with no accompanying derivation details, error analysis, or quantitative validation metrics (e.g., parameter recovery error or comparison of maximized likelihood values); this leaves open whether the algebraic set excludes valid parameters or is redundant for arbitrary data.
  Authors: The examples were selected to illustrate compatibility in concrete cases, but we agree that this leaves the generality and fidelity open to question. We will expand the results section with additional examples, explicit derivation details for each case, and quantitative validation including parameter recovery errors and direct comparisons of maximized likelihood values between the algebraic and statistical approaches. This will clarify that the semi-algebraic set neither excludes valid parameters nor is redundant.
  Revision: yes
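The two validation metrics named in this exchange, parameter recovery error and a comparison of maximized likelihood values, can be sketched as follows. This is a hedged illustration only: the one-parameter model, the simulated data, the grid search, and the names `simulate_counts` and `validation_metrics` are assumptions made here and say nothing about how the authors will run their validation.

```python
import random
from math import log


def simulate_counts(a_true, n, seed=0):
    """Draw n binary outcomes with success probability a_true (seeded)."""
    rng = random.Random(seed)
    first = sum(rng.random() < a_true for _ in range(n))
    return first, n - first


def log_likelihood(a, counts):
    """Log-likelihood of the two outcome counts for parameter a in (0, 1)."""
    k, m = counts
    return k * log(a) + m * log(1.0 - a)


def validation_metrics(a_true, n, in_set, seed=0):
    """Compare a grid optimum restricted to `in_set` with the free optimum."""
    counts = simulate_counts(a_true, n, seed)
    grid = [i / 1000 for i in range(1, 1000)]
    feasible = [a for a in grid if in_set(a)]  # assumed non-empty here
    a_free = max(grid, key=lambda a: log_likelihood(a, counts))
    a_alg = max(feasible, key=lambda a: log_likelihood(a, counts))
    return {
        "recovery_error_algebraic": abs(a_alg - a_true),
        "recovery_error_free": abs(a_free - a_true),
        "loglik_gap": log_likelihood(a_free, counts) - log_likelihood(a_alg, counts),
    }


# A set that keeps the whole open interval versus one that cuts off a > 0.5.
harmless = validation_metrics(0.7, 2000, lambda a: a * (1.0 - a) > 0)
restrictive = validation_metrics(0.7, 2000, lambda a: a <= 0.5)
print(harmless)
print(restrictive)
```

A likelihood gap of zero means the restriction did not remove the unconstrained optimum; a positive gap together with a larger recovery error would indicate that the set excludes valid parameters, which is the concern the referee raises.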
Circularity Check
The semi-algebraic set construction is presented as an independent preprocessing step before likelihood maximization, with no reduction to the input data by construction.
The paper asserts the existence of a 'natural underlying polynomial structure' of the evolutionary process to define a semi-algebraic set of candidate parameters consistent with observed data, then maximizes the likelihood within that set. It reports compatibility with solutions from statistical EvAM models on explicit examples but provides no equations or derivations showing that the semi-algebraic set is tautologically equivalent to the data constraints already implicit in the likelihood function or that the subsequent maximization is forced by the algebraic step. No self-citations, fitted inputs renamed as predictions, or definitional loops are indicated in the abstract or claims. The derivation chain therefore remains self-contained against external benchmarks, with the algebraic step serving as a complementary filter rather than a re-expression of the inputs.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: The evolutionary process possesses a natural underlying polynomial structure.
Lean theorems connected to this paper
- IndisputableMonolith/Foundation/RealityFromDistinction.lean: reality_from_one_distinction (tag: unclear)
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Paper passage: "we first use the natural underlying polynomial structure of the evolutionary process to define a semi-algebraic set of candidate parameters consistent with a given data set before maximising the likelihood function"
- IndisputableMonolith/Cost/FunctionalEquation.lean: washburn_uniqueness_aczel (tag: unclear)
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Paper passage: "The algebraic nature that underlies these evolutionary accumulation models leads to a set of non-linear polynomial equations in the transition parameters. These polynomials are the generators of an ideal I"
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.