Regularizing and Normalizing DAGs and Phylogenetic Networks
Pith reviewed 2026-05-22 07:31 UTC · model grok-4.3
The pith
i-regularization retains exactly the unique LCAs of at most i leaves in a DAG, removes the rest via the ⊖ operator, deletes shortcuts, and yields a network isomorphic to the Hasse diagram of the induced lca-clusters.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
For any DAG G and integer i ≥ 1, the regularized DAG reg_i(G) retains precisely the vertices that occur as unique LCAs of leaf subsets of size at most i, removes all other non-leaf vertices by the ⊖ operation, and deletes shortcuts; the resulting network preserves every such LCA, is i-lca-relevant, and is isomorphic to the Hasse diagram of the lca-clusters.
What carries the argument
The i-regularization operator that selects unique LCAs of small leaf sets, applies the ⊖ removal, and deletes shortcuts to produce a regular DAG.
If this is right
- The simplified network admits a purely cluster-based description without reference to internal vertices.
- Regularization and normalization coincide precisely when the visible vertices removed by normalization are exactly the non-unique LCAs.
- A single editing framework unifies cluster-based, LCA-based, and visibility-based simplifications of DAGs and phylogenetic networks.
Where Pith is reading between the lines
- The same regularization could be applied iteratively with increasing i to produce a sequence of successively coarser networks.
- Efficient computation of reg_i(G) would require only the enumeration of small LCAs rather than full transitive closure.
- The construction suggests a way to compare simplification methods across different data types that admit LCA or cluster interpretations.
Load-bearing premise
The ⊖ editing step removes non-LCA vertices without changing which LCAs of small leaf sets remain observable from the leaves.
What would settle it
A concrete DAG in which applying reg_i changes the set of LCAs for some leaf subset of size at most i, or produces a network whose transitive closure of ancestor relations differs from that of the Hasse diagram of its lca-clusters.
Figures
read the original abstract
Phylogenetic networks and, more generally, directed acyclic graphs (DAGs) represent hierarchical structure beyond trees, for instance in the presence of reticulate evolutionary events such as hybridization or horizontal gene transfer. A central question is which parts of such graphs are essential with respect to leaf-observable information, and which parts can be removed without changing this information. Resolving this question can lead to principled simplification methods for phylogenetic networks, such as the recent normalization approach of Francis et al. In this paper, we study this question from three related perspectives: clusters displayed by a DAG $G$, least common ancestors (LCAs) of subsets of its leaf set, and visibility, a path-based property of vertices. We first introduce an LCA-based simplification procedure called $i$-regularization. For a DAG $G$ and $i\geq 1$, the DAG $\reg_i(G)$ retains precisely those vertices that occur as unique LCAs of leaf subsets of size at most $i$, removes the remaining non-leaf vertices by a graph-editing operation $\ominus$, and then deletes shortcuts. We show that $\reg_i(G)$ preserves all such LCAs, is $i$-lca-relevant, and admits a cluster-level description: it is regular, i.e., isomorphic to the Hasse diagram of the corresponding lca-clusters. We then compare LCA-based regularization with normalization. Using the same $\ominus$-operator, we describe the cover construction underlying normalization, identify visible vertices that are nevertheless removed, and characterize when regularization and normalization coincide. Together, these results provide a unified framework for cluster-based, LCA-based, and visibility-based simplifications of DAGs and phylogenetic networks.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces an i-regularization operator reg_i for DAGs and phylogenetic networks. For given i ≥ 1, reg_i(G) retains exactly the vertices that serve as unique LCAs of leaf subsets of size at most i, removes all other non-leaf vertices via the editing operation ⊖, and then deletes shortcuts. The central theorems assert that reg_i(G) preserves all such LCAs, is i-lca-relevant, and is regular (isomorphic to the Hasse diagram of the corresponding lca-clusters). The paper further compares this construction to the normalization procedure of Francis et al., using the same ⊖ operator to characterize visible vertices removed by normalization and to identify conditions under which the two simplifications coincide.
Significance. If the preservation and regularity claims hold, the work supplies a unified, cluster/LCA/visibility-based framework for simplifying phylogenetic networks while retaining leaf-observable information. This extends existing normalization methods and could support more principled reduction of reticulate networks in evolutionary biology.
major comments (1)
- [regularization procedure / definition of reg_i and ⊖] Definition of the ⊖ editing operation and the subsequent shortcut-deletion step (abstract and the regularization procedure section): the load-bearing claim that removing non-unique-LCA vertices via ⊖ followed by shortcut deletion leaves the set of LCAs for all leaf subsets of size ≤ i unchanged is not accompanied by an explicit argument or counter-example check for DAGs containing reticulations or multiple paths between leaves. This step is invoked both in the definition of reg_i and in the stated preservation theorem; without a self-contained verification, the central simplification result remains at risk.
minor comments (2)
- [Abstract and introduction] The term 'i-lca-relevant' is used in the abstract and theorems without a one-sentence gloss; adding a brief parenthetical definition on first use would improve readability for readers outside the immediate subfield.
- [Throughout the manuscript] Notation for the editing operation alternates between ⊖ and ominus in the text; adopt a single symbol throughout and ensure it is defined before its first use in the regularization construction.
Simulated Author's Rebuttal
We thank the referee for the careful reading and constructive feedback on our manuscript. We address the major comment below and will revise the paper to strengthen the presentation of the regularization procedure and its proofs.
read point-by-point responses
-
Referee: Definition of the ⊖ editing operation and the subsequent shortcut-deletion step (abstract and the regularization procedure section): the load-bearing claim that removing non-unique-LCA vertices via ⊖ followed by shortcut deletion leaves the set of LCAs for all leaf subsets of size ≤ i unchanged is not accompanied by an explicit argument or counter-example check for DAGs containing reticulations or multiple paths between leaves. This step is invoked both in the definition of reg_i and in the stated preservation theorem; without a self-contained verification, the central simplification result remains at risk.
Authors: We appreciate the referee highlighting the need for greater explicitness here. Theorem 3.2 establishes LCA preservation by showing that reg_i(G) retains precisely the unique LCAs of leaf sets of size at most i, that ⊖ removes only non-unique-LCA vertices without altering the LCAs of retained leaf sets, and that shortcut deletion neither creates nor destroys these LCAs. To make this more self-contained for reticulate DAGs and cases with multiple paths, we will add a new lemma (with proof) that walks through the effect of each step on paths and LCAs, together with two explicit examples: one reticulate network with multiple paths between leaves and one where a non-unique LCA is removed. This revision will directly address the concern while preserving the existing theorems. revision: yes
Circularity Check
No circularity: claims follow from explicit definitions and graph operations
full rationale
The paper introduces the reg_i operator and ⊖ editing step as new definitions that retain vertices which are unique LCAs of leaf subsets of size ≤i, then deletes shortcuts. It states theorems that the resulting DAG preserves those LCAs, is i-lca-relevant, and is regular (isomorphic to the Hasse diagram of lca-clusters). These are derived consequences of the definitions using standard DAG operations rather than any quantity defined in terms of the output itself. No fitted parameters, self-citations, or imported uniqueness theorems appear in the load-bearing steps. The derivation is therefore self-contained against the paper's own stated assumptions and does not reduce any result to its inputs by construction.
Axiom & Free-Parameter Ledger
axioms (2)
- standard math A DAG admits a well-defined least common ancestor for every non-empty subset of its leaves.
- domain assumption The ⊖ operator removes a vertex while preserving all paths between remaining vertices.
invented entities (2)
-
i-regularization operator reg_i
no independent evidence
-
⊖ editing operation
no independent evidence
Reference graph
Works this paper leans on
-
[2]
IEEE Transactions on Knowledge and Data Engineering 16(9):1128–1142, DOI 10.1109/TKDE.2004
van der Aalst WMP (2004) Workflow mining: Discovering process models from event logs. IEEE Transactions on Knowledge and Data Engineering 16(9):1128–1142, DOI 10.1109/TKDE.2004. 47
-
[3]
SIAM Journal on Com- puting 10(3):405–421, DOI 10.1137/0210030
Aho A V , Sagiv Y , Szymanski TG, Ullman JD (1981) Inferring a tree from lowest common an- cestors with an application to the optimization of relational expressions. SIAM Journal on Com- puting 10(3):405–421, DOI 10.1137/0210030
-
[4]
Annals of Combinatorics 10(1):19–30, DOI 10.1007/s00026-006-0271-0
Baroni M, Steel M (2006) Accumulation phylogenies. Annals of Combinatorics 10(1):19–30, DOI 10.1007/s00026-006-0271-0
-
[5]
Ann Comb 8:391–408, DOI 10.1007/s00026-004-0228-0
Baroni M, Semple C, Steel M (2005) A framework for representing reticulate evolution. Ann Comb 8:391–408, DOI 10.1007/s00026-004-0228-0
-
[6]
Systematic Biology 55(1):46–56, DOI 10.1080/10635150500431197
Baroni M, Semple C, Steel M (2006) Hybrids in real time. Systematic Biology 55(1):46–56, DOI 10.1080/10635150500431197
-
[7]
IEEE/ACM Trans Comp Biol Bioinf 6:552–569, DOI 10.1109/TCBB.2007.70270
Cardona G, Rossell ´o F, Valiente G (2009) Comparison of tree-child phylogenetic networks. IEEE/ACM Trans Comp Biol Bioinf 6:552–569, DOI 10.1109/TCBB.2007.70270
-
[8]
Theoret- ical Computer Science 796:129–146, DOI 10.1016/j.tcs.2019.09.003
D ¨ocker J, Linz S, Semple C (2019) Displaying trees across two phylogenetic networks. Theoret- ical Computer Science 796:129–146, DOI 10.1016/j.tcs.2019.09.003
-
[9]
Journal of Theoretical Biology 265(4):535–542, DOI 10.1016/j.jtbi.2010
Dress A, Moulton V , Steel M, Wu T (2010) Species, clusters and the ’tree of life’: A graph- theoretic perspective. Journal of Theoretical Biology 265(4):535–542, DOI 10.1016/j.jtbi.2010. 05.031
-
[10]
Journal of Theoretical Biology 614:112236, DOI 10.1016/j.jtbi.2025.112236
Francis A (2025) ”Normal” phylogenetic networks may be emerging as the leading class. Journal of Theoretical Biology 614:112236, DOI 10.1016/j.jtbi.2025.112236
-
[11]
Molecular Phyloge- netics and Evolution 163:107215, DOI 10.1016/j.ympev.2021.107215
Francis A, Huson DH, Steel M (2021) Normalising phylogenetic networks. Molecular Phyloge- netics and Evolution 163:107215, DOI 10.1016/j.ympev.2021.107215
-
[12]
Journal of Mathematical Biology 80(3):865–953, DOI 10.1007/s00285-019-01444-2
Geiß M, Stadler PF, Hellmuth M (2020) Reciprocal best match graphs. Journal of Mathematical Biology 80(3):865–953, DOI 10.1007/s00285-019-01444-2
-
[13]
The MIT Press, DOI 10.7551/mitpress/9432.001.0001
Gusfield D (2014) ReCombinatorics: the algorithmics of ancestral recombination graphs and explicit phylogenetic networks. The MIT Press, DOI 10.7551/mitpress/9432.001.0001
-
[14]
Bulletin of Mathematical Biology 87(2):20, DOI 10.1007/s11538-024-01398-7
Heiss J, Huson DH, Steel M (2025) Transformations to simplify phylogenetic networks. Bulletin of Mathematical Biology 87(2):20, DOI 10.1007/s11538-024-01398-7
-
[15]
Discrete Applied Mathematics 378:584–593, DOI 10.1016/j.dam.2025.08.037
Hellmuth M, Lindeberg A (2026) Characterizing and transforming DAGs within the I-lca frame- work. Discrete Applied Mathematics 378:584–593, DOI 10.1016/j.dam.2025.08.037
-
[16]
Theory in Biosciences 142(4):301–358, DOI 10.1007/s12064-023-00398-w
Hellmuth M, Schaller D, Stadler PF (2023) Clustering systems of phylogenetic networks. Theory in Biosciences 142(4):301–358, DOI 10.1007/s12064-023-00398-w
-
[17]
Journal of Classification 36(2):200–231, DOI 10.1007/s00357-018-9279-5
Huber KT, Moulton V , Wu T (2019) Hierarchies from lowest stable ancestors in nonbinary phy- logenetic networks. Journal of Classification 36(2):200–231, DOI 10.1007/s00357-018-9279-5
-
[18]
Genome Biol Evol 3:23–35, DOI 10.1093/gbe/evq077
Huson DH, Scornavacca C (2011) A survey of combinatorial methods for phylogenetic networks. Genome Biol Evol 3:23–35, DOI 10.1093/gbe/evq077
- [19]
-
[20]
Bull Math Biol 87(3):44, DOI 10.1007/ s11538-025-01419-z 23
Lindeberg A, Hellmuth M (2025) Simplifying and characterizing DAGs and phylogenetic networks via least common ancestor constraints. Bull Math Biol 87(3):44, DOI 10.1007/ s11538-025-01419-z 23
work page 2025
-
[21]
Journal of Mathematical Biology 81(4):961–980, DOI 10.1007/ s00285-020-01533-7
Linz S, Semple C (2020) Caterpillars on three and four leaves are sufficient to reconstruct binary normal networks. Journal of Mathematical Biology 81(4):961–980, DOI 10.1007/ s00285-020-01533-7
work page 2020
-
[22]
Moret B, Nakhleh L, Warnow T, Linder C, Tholse A, Padolina A, Sun J, Timme R (2004) Phylo- genetic networks: modeling, reconstructibility, and accuracy. IEEE/ACM Transactions on Com- putational Biology and Bioinformatics 1(1):13–23, DOI 10.1109/TCBB.2004.10
-
[23]
Cambridge University Press, DOI 10.1142/S0218126698000043
Pearl J (2009) Causality: Models, Reasoning, and Inference, 2nd edn. Cambridge University Press, DOI 10.1142/S0218126698000043
-
[24]
Oxford University Press, Oxford, UK
Semple C, Steel M (2003) Phylogenetics, Oxford Lecture Series in Mathematics and its Appli- cations, vol 24. Oxford University Press, Oxford, UK
work page 2003
-
[25]
Shanavas A V , Changat M, Hellmuth M, Stadler PF (2024) Unique least common ancestors and clusters in directed acyclic graphs. In: Kalyanasundaram S, Maheshwari A (eds) Algorithms and Discrete Applied Mathematics, Springer Nature Switzerland, Cham, pp 148–161, DOI 10.1007/ 978-3-031-52213-0 11
work page 2024
-
[26]
Algorithms for Molecular Biology 15(1):5, DOI 10.1186/s13015-020-00165-2
Stadler PF, Geiß M, Schaller D, L ´opez S´anchez A, Gonz´alez Laffitte M, Valdivia DI, Hellmuth M, Hern ´andez Rosales M (2020) From pairs of most similar sequences to phylogenetic best matches. Algorithms for Molecular Biology 15(1):5, DOI 10.1186/s13015-020-00165-2
-
[27]
Willson S (2012) CSD Homomorphisms between Phylogenetic Networks. IEEE/ACM Transac- tions on Computational Biology and Bioinformatics 9(4):1128–1138, DOI 10.1109/TCBB.2012. 52
-
[28]
Bulletin of Mathematical Biology 72(2):340–358, DOI 10.1007/s11538-009-9449-z
Willson SJ (2010) Properties of normal phylogenetic networks. Bulletin of Mathematical Biology 72(2):340–358, DOI 10.1007/s11538-009-9449-z
-
[29]
Bulletin of Mathe- matical Biology 73(10):2322–2338, DOI 10.1007/s11538-010-9624-2
Willson SJ (2011) Restricted trees: Simplifying networks with bottlenecks. Bulletin of Mathe- matical Biology 73(10):2322–2338, DOI 10.1007/s11538-010-9624-2
-
[30]
Annals of Combinatorics 20(4):917–938, DOI 10.1007/s00026-016-0324-y
Willson SJ (2016) Comparing and simplifying distinct-cluster phylogenetic networks. Annals of Combinatorics 20(4):917–938, DOI 10.1007/s00026-016-0324-y
-
[31]
Bulletin of Mathematical Biology 84(2):26, DOI 10.1007/s11538-021-00986-1 24
Willson SJ (2022) Merging arcs to produce acyclic phylogenetic networks and normal networks. Bulletin of Mathematical Biology 84(2):26, DOI 10.1007/s11538-021-00986-1 24
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.