Bounds on the sequence length sufficient to reconstruct binary level-1 phylogenetic networks under the CFN model
Pith reviewed 2026-05-17 04:34 UTC · model grok-4.3
The pith
Binary level-1 phylogenetic networks can be reconstructed from finite genetic sequences whose length scales logarithmically, polynomially, or polylogarithmically with the number of taxa under the CFN model.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We present an inference algorithm that takes as input genetic sequence data and demonstrate that the sequence length sufficient to reconstruct the correct network with high probability, under the CFN model of evolution, scales logarithmically, polynomially, or polylogarithmically with the number of taxa, depending on the parameter regime. Novel inference rules for quartet data in the semi-directed phylogenetic network setting are also provided.
What carries the argument
An inference algorithm that combines novel quartet inference rules for semi-directed networks with statistically consistent reconstruction procedures for binary level-1 networks.
If this is right
- Correct reconstruction occurs with high probability whenever sequence length satisfies the derived scaling bounds.
- The method applies directly to any binary level-1 semi-directed network with vertex-disjoint cycles.
- Reconstruction remains possible even as the number of taxa grows, provided sequence length follows the appropriate regime-dependent bound.
Where Pith is reading between the lines
- The same scaling analysis could be extended to other sequence evolution models or to networks that allow intersecting cycles.
- Experimental designs in genomics could use these bounds to decide how much data is needed before attempting network inference on reticulate histories.
- The novel quartet rules might serve as building blocks for consistency proofs in broader classes of phylogenetic networks.
Load-bearing premise
The underlying inference methods are statistically consistent and recover the true network from infinite data, and the input networks are binary level-1 semi-directed networks with vertex-disjoint cycles.
What would settle it
A specific binary level-1 network and choice of parameters where the probability of correct reconstruction falls below the high-probability threshold even when the input sequence length meets the claimed polylogarithmic bound.
Figures
read the original abstract
Phylogenetic trees and networks are graphs used to model evolutionary relationships, with trees representing strictly branching histories and networks allowing for events in which lineages merge, called reticulation events. While the question of data sufficiency has been studied extensively in the context of trees, it remains largely unexplored for networks. In this work we take a first step in this direction by establishing bounds on the amount of genomic data required to reconstruct binary level-$1$ semi-directed phylogenetic networks, which are binary networks in which reticulation events are indicated by directed edges, all other edges are undirected, and cycles are vertex-disjoint. For this class, methods have been developed recently that are statistically consistent. Roughly speaking, such methods are guaranteed to reconstruct the correct network assuming infinitely long genomic sequences. Here we consider the question whether networks from this class can be uniquely and correctly reconstructed from finite sequences. Specifically, we present an inference algorithm that takes as input genetic sequence data, and demonstrate that the sequence length sufficient to reconstruct the correct network with high probability, under the CFN model of evolution, scales logarithmically, polynomially, or polylogarithmically with the number of taxa, depending on the parameter regime. As part of our contribution, we also present novel inference rules for quartet data in the semi-directed phylogenetic network setting.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper presents an inference algorithm for binary level-1 semi-directed phylogenetic networks that takes genetic sequence data as input under the CFN model. It establishes explicit bounds showing that the sequence length sufficient for high-probability correct reconstruction scales logarithmically, polynomially, or polylogarithmically with the number of taxa, depending on the parameter regime (specifically, how far the relevant quartet probabilities are from 1/2). Novel inference rules for quartet data in the semi-directed setting are introduced to support the algorithm.
Significance. If the bounds hold, this is a valuable first step toward finite-data guarantees for network reconstruction, extending well-studied tree results to a class of networks with existing consistent methods. The explicit regime-dependent scalings and the combination of quartet consistency with concentration inequalities provide concrete, falsifiable predictions about data requirements that could guide experimental design.
major comments (2)
- [Main theorems] Main theorems (around the derivation of the three scaling regimes): the union bound over O(n^4) quartets is applied to obtain the overall error probability, but the per-quartet success probability in the polylogarithmic regime is stated to be 1/polylog(n) away from 1/2; it is not immediately clear from the text whether this separation holds uniformly for quartets that include reticulation edges or only for tree-like quartets.
- [Quartet inference rules] Section on novel quartet inference rules: while the rules are defined and claimed to be statistically consistent, the extension of infinite-data consistency to the semi-directed case (accounting for the direction of reticulation edges) is used as a black box in the finite-sample analysis; a short self-contained argument or reference to the precise consistency theorem would make the load-bearing step more transparent.
minor comments (3)
- [Abstract] The abstract states the three scaling regimes but does not briefly indicate the corresponding conditions on edge lengths or reticulation parameters; adding one sentence would improve accessibility.
- [Preliminaries] Notation for the CFN parameters (e.g., edge lengths and reticulation probabilities) is introduced but could be collected in a single table or preliminary section for easier reference when reading the bounds.
- [Discussion] A few sentences in the discussion could explicitly contrast the obtained bounds with known sequence-length requirements for tree reconstruction under the same model.
Simulated Author's Rebuttal
We thank the referee for their thorough review and valuable suggestions. We address the major comments point by point below. The revisions we have made aim to enhance the clarity and transparency of our proofs and arguments without altering the main results.
read point-by-point responses
-
Referee: [Main theorems] Main theorems (around the derivation of the three scaling regimes): the union bound over O(n^4) quartets is applied to obtain the overall error probability, but the per-quartet success probability in the polylogarithmic regime is stated to be 1/polylog(n) away from 1/2; it is not immediately clear from the text whether this separation holds uniformly for quartets that include reticulation edges or only for tree-like quartets.
Authors: We appreciate the referee's careful attention to the details of our main theorems. The separation from 1/2 in the polylogarithmic regime is indeed uniform and holds for all quartets, including those that incorporate reticulation edges. This uniformity arises because our bounds on the quartet probabilities under the CFN model on binary level-1 networks account for the possible presence of reticulation events through the minimum edge length parameter and the reticulation probability, ensuring a positive separation that scales appropriately with n in the relevant regime. To address the lack of immediate clarity, we have revised the text in the proof of the main theorem to explicitly note that the per-quartet error probability bound applies uniformly across all O(n^4) quartets, with a parenthetical remark confirming its validity for reticulation-involving quartets. This is a partial revision as the underlying mathematics already supported this, but the exposition has been improved. revision: partial
-
Referee: [Quartet inference rules] Section on novel quartet inference rules: while the rules are defined and claimed to be statistically consistent, the extension of infinite-data consistency to the semi-directed case (accounting for the direction of reticulation edges) is used as a black box in the finite-sample analysis; a short self-contained argument or reference to the precise consistency theorem would make the load-bearing step more transparent.
Authors: We agree with the referee that greater transparency in the consistency argument would be beneficial. While the rules are statistically consistent by construction in the infinite-data limit, we acknowledge that the extension to the semi-directed setting was referenced rather than fully derived in the finite-sample section. In the revised manuscript, we have added a concise self-contained argument in Section 4 that demonstrates how the novel inference rules recover both the topology and the directions of reticulation edges from the limiting quartet probabilities under the CFN model. This argument builds directly on the probability calculations presented earlier in the paper. Additionally, we have included a reference to the relevant consistency result for semi-directed network reconstruction. This constitutes a full revision to the section in question. revision: yes
Circularity Check
No significant circularity; derivation is self-contained
full rationale
The paper explicitly constructs novel quartet inference rules for binary level-1 semi-directed networks and derives sequence-length bounds via union bounds over O(n^4) quartets combined with per-quartet concentration probabilities under the CFN model. These probabilities are bounded away from 1/2 in different regimes (constant, 1/poly(n), or 1/polylog(n)) depending on edge lengths and reticulation parameters, yielding the claimed logarithmic, polynomial, or polylogarithmic scalings directly from the probabilistic analysis. The infinite-data consistency of the underlying methods is invoked as a black-box starting point but is not required to be re-derived here; the finite-sample error analysis stands independently without reducing any claimed bound to a fitted parameter, self-definition, or load-bearing self-citation chain. No step equates a prediction to its own input by construction.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption The CFN model accurately describes sequence evolution for the purpose of the bounds.
- domain assumption Input networks are binary level-1 semi-directed with vertex-disjoint cycles.
Reference graph
Works this paper leans on
-
[1]
The tree of blobs of a species network: identifiability under the coalescent
E. S. Allman, H. Baños, J. D. Mitchell, and J. A. Rhodes. “The tree of blobs of a species network: identifiability under the coalescent”. In:Journal of Mathematical Biology86.1 (2023), p. 10
work page 2023
-
[2]
TINNiK: inference of the tree of blobs of a species network un- der the coalescent model
E. S. Allman, H. Baños, J. D. Mitchell, and J. A. Rhodes. “TINNiK: inference of the tree of blobs of a species network un- der the coalescent model”. In:Algorithms for Molecular Biology19.1 (2024), p. 23. https://doi.org/10.1186/s13015- 024-00266-2
-
[3]
NANUQ: a method for inferring species networks from gene trees under the coalescent model
E. S. Allman, H. Baños, and J. A. Rhodes. “NANUQ: a method for inferring species networks from gene trees under the coalescent model”. In:Algorithms for Molecular Biology14.1 (2019), p. 24
work page 2019
-
[4]
NANUQ +: A divide-and-conquer approach to network estimation
E. S. Allman, H. Baños, J. A. Rhodes, and K. Wicke. “NANUQ +: A divide-and-conquer approach to network estimation”. In:Algorithms for Molecular Biology20.1 (2025), p. 14. [5]N. Alon and J. H. Spencer.The Probabilistic Method. John Wiley & Sons, New York, 1992
work page 2025
-
[5]
Reconstructing the shape of a tree from observed dissimilarity data
H.-J. Bandelt and A. Dress. “Reconstructing the shape of a tree from observed dissimilarity data”. In:Advances in Applied Mathematics7.3 (1986), pp. 309–343
work page 1986
-
[6]
Identifying species network features from gene tree quartets under the coalescent model
H. Baños. “Identifying species network features from gene tree quartets under the coalescent model”. In:Bulletin of Mathematical Biology81 (2019), pp. 494–534. 20
work page 2019
-
[7]
Networks: expanding evolutionary thinking
E. Bapteste, L. van Iersel, A. Janke, S. Kelchner, S. Kelk, J. O. McInerney, D. A. Morrison, L. Nakhleh, M. Steel, L. Stougie, et al. “Networks: expanding evolutionary thinking”. In:Trends in Genetics29.8 (2013), pp. 439–441. [9]P . Buneman. “A note on the metric properties of trees”. In:J. Combin. Theory Ser. B17.1 (1974), pp. 48–50
work page 2013
-
[8]
Tree structures for proximity data
H. Colonius and H.-H. Schulze. “Tree structures for proximity data”. In:British Journal of Mathematical and Statistical Psychology34.2 (1981), pp. 167–180
work page 1981
-
[9]
Evolutionary trees can be learned in polynomial time in the two-state general Markov model
M. Cryan, L. A. Goldberg, and P . W . Goldberg. “Evolutionary trees can be learned in polynomial time in the two-state general Markov model”. In:SIAM Journal on Computing31.2 (2001), pp. 375–397. [12]M. Cs ˝urös and M.-Y. Kao. “Recovering evolutionary trees through harmonic greedy triplets”. In:Proceedings of the Tenth Annual ACM-SIAM Symposium on Discret...
work page 2001
-
[10]
Identifiability of Phylogenetic Level-2 Networks under the Jukes-Cantor Model
A. K. Englander, M. Frohn, E. Gross, N. Holtgrefe, L. van Iersel, M. Jones, and S. Sullivant. “Identifiability of Phylogenetic Level-2 Networks under the Jukes-Cantor Model”. In:bioRxiv(2025), pp. 2025–04
work page 2025
-
[11]
A few logs suffice to build (almost) all trees (I)
P . L. Erd˝os, M. A. Steel, L. A. Székely, and T . J. Warnow. “A few logs suffice to build (almost) all trees (I)”. In: Random Structures & Algorithms14.2 (1999), pp. 153–184
work page 1999
-
[12]
A few logs suffice to build (almost) all trees: Part II
P . L. Erd˝os, M. A. Steel, L. A. Székely, and T . J. Warnow. “A few logs suffice to build (almost) all trees: Part II”. In: Theoretical Computer Science221.1-2 (1999), pp. 77–118
work page 1999
-
[13]
Reconstructing semi-directed level-1 networks using few quarnets
M. Frohn, N. Holtgrefe, L. van Iersel, M. Jones, and S. Kelk. “Reconstructing semi-directed level-1 networks using few quarnets”. In:Journal of Computer and System Sciences152 (2025), p. 103655
work page 2025
-
[14]
Distinguishing phylogenetic networks
E. Gross and C. Long. “Distinguishing phylogenetic networks”. In:SIAM Journal on Applied Algebra and Geometry 2.1 (2018), pp. 72–93
work page 2018
-
[15]
M. Hill and S. Roch. “Lower Bounds on the Sample Complexity of Species Tree Estimation when Substitution Rates Vary Across Loci”. In:Bulletin of Mathematical Biology87.152 (2025)
work page 2025
-
[16]
N. Holtgrefe, K. T . Huber, L. van Iersel, M. Jones, S. Martin, and V . Moulton. “Squirrel: Reconstructing semi-directed phylogenetic level-1 networks from four-leaved networks or sequence alignments”. In:Molecular Biology and Evolution42.4 (2025), msaf067. https://doi.org/10.1093/molbev/msaf067
-
[17]
Quarnet inference rules for level-1 networks
K. T . Huber, V . Moulton, C. Semple, and T . Wu. “Quarnet inference rules for level-1 networks”. In:Bulletin of Mathematical Biology80 (2018), pp. 2137–2153
work page 2018
-
[18]
Disk-covering, a fast-converging method for phylogenetic tree reconstruction
D. H. Huson, S. M. Nettles, and T . J. Warnow. “Disk-covering, a fast-converging method for phylogenetic tree reconstruction”. In:Journal of Computational Biology6.3-4 (1999), pp. 369–386
work page 1999
-
[19]
Reconstructing a phylogenetic level-1 network from quartets
J. Keijsper and R. Pendavingh. “Reconstructing a phylogenetic level-1 network from quartets”. In:Bulletin of Mathematical Biology76.10 (2014), pp. 2517–2541
work page 2014
-
[20]
On the complexity of distance-based evolutionary tree
V . King, L. Zhang, and Y. Zhou. “On the complexity of distance-based evolutionary tree”. In:Proceedings of the fourteenth annual ACM-SIAM symposium on Discrete algorithms. 2003, p. 444
work page 2003
-
[21]
Classes of explicit phylogenetic networks and their biological and mathematical significance
S. Kong, J. C. Pons, L. Kubatko, and K. Wicke. “Classes of explicit phylogenetic networks and their biological and mathematical significance”. In:Journal of Mathematical Biology84.6 (2022), p. 47
work page 2022
-
[22]
Inference of phylogenetic networks from sequence data using composite likelihood
S. Kong, D. L. Swofford, and L. S. Kubatko. “Inference of phylogenetic networks from sequence data using composite likelihood”. In:Systematic Biology74.1 (2025), pp. 53–69
work page 2025
-
[23]
Reconstructing evolutionary trees from DNA and protein sequences: paralinear distances
J. A. Lake. “Reconstructing evolutionary trees from DNA and protein sequences: paralinear distances.” In:Proceedings of the National Academy of Sciences91.4 (1994), pp. 1455–1459
work page 1994
-
[24]
Algebraic invariants for inferring 4-leaf semi-directed phylogenetic networks
S. Martin, N. Holtgrefe, V . Moulton, and R. M. Leggett. “Algebraic invariants for inferring 4-leaf semi-directed phylogenetic networks”. In:Systematic Biology(2025), syaf071
work page 2025
-
[25]
On the impossibility of reconstructing ancestral data and phylogenies
E. Mossel. “On the impossibility of reconstructing ancestral data and phylogenies”. In:Journal of Computational Biology10.5 (2003), pp. 669–676
work page 2003
-
[26]
Phase transitions in phylogeny
E. Mossel. “Phase transitions in phylogeny”. In:Transactions of the American Mathematical Society356.6 (2004), pp. 2379–2404
work page 2004
-
[27]
Molecular studies of evolution: a source of novel statistical problems
J. Neyman. “Molecular studies of evolution: a source of novel statistical problems”. In:Statistical decision theory and related topics. Elsevier, 1971, pp. 1–27
work page 1971
-
[28]
A note on the tree realizability of a distance matrix
J. S. Pereira. “A note on the tree realizability of a distance matrix”. In:Journal of Combinatorial Theory6.3 (1969), pp. 303–310
work page 1969
-
[29]
Identifying circular orders for blobs in phylogenetic networks
J. A. Rhodes, H. Baños, J. Xu, and C. Ané. “Identifying circular orders for blobs in phylogenetic networks”. In: Advances in Applied Mathematics163 (2025), p. 102804. [33]C. Semple and M. Steel.Phylogenetics. Vol. 24. Oxford University Press on Demand, 2003
work page 2025
-
[30]
Inferring phylogenetic networks with maximum pseudolikelihood under incomplete lineage sorting
C. Solís-Lemus and C. Ané. “Inferring phylogenetic networks with maximum pseudolikelihood under incomplete lineage sorting”. In:PLoS genetics12.3 (2016), e1005896. https://doi.org/10.1371/journal.pgen.1005896
-
[31]
Recovering a tree from the leaf colourations it generates under a Markov model
M. Steel. “Recovering a tree from the leaf colourations it generates under a Markov model”. In:Applied Mathematics Letters7.2 (1994), pp. 19–23
work page 1994
-
[32]
Absolute convergence: true trees from short sequences
T . Warnow, B. M. Moret, and K. St. John. “Absolute convergence: true trees from short sequences”. In:Symposium on Discrete Algorithms: Proceedings of the twelfth annual ACM-SIAM symposium on Discrete algorithms. Vol. 7. 09. 2001, pp. 186–195
work page 2001
-
[33]
Advances in estimating level-1 phylogenetic networks from unrooted SNPs
T . Warnow, Y. Tabatabaee, and S. N. Evans. “Advances in estimating level-1 phylogenetic networks from unrooted SNPs”. In:Journal of Computational Biology32.1 (2025), pp. 3–27
work page 2025
-
[34]
Identifiability of local and global features of phylogenetic networks from average distances
J. Xu and C. Ané. “Identifiability of local and global features of phylogenetic networks from average distances”. In: Journal of Mathematical Biology86.1 (2023), p. 12. 21
work page 2023
-
[35]
New absolute fast converging phylogeny estimation methods with improved scalability and accuracy
Q. R. Zhang, S. Rao, and T . Warnow. “New absolute fast converging phylogeny estimation methods with improved scalability and accuracy”. In:18th International Workshop on Algorithms in Bioinformatics (WABI 2018). Schloss Dagstuhl–Leibniz-Zentrum für Informatik. 2018, pp. 8–1. 22
work page 2018
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.