pith. sign in

arxiv: 2606.26919 · v1 · pith:TIZ35CIVnew · submitted 2026-06-25 · 🧮 math.OC · q-bio.PE

The parental parsimony problem on binary, tree-child phylogenetic networks

Pith reviewed 2026-06-26 03:07 UTC · model grok-4.3

classification 🧮 math.OC q-bio.PE
keywords parental parsimony scorephylogenetic networkstree-child networksapproximation algorithmsNP-hard problemsexact algorithmscomputational biology
0
0 comments X

The pith

The parental parsimony score problem admits a constant-factor approximation on binary semi-simplex tree-child phylogenetic networks.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that the parental parsimony score problem, which seeks the most parsimonious tree inside a phylogenetic network under models that include hybridization, recombination, allopolyploidy and incomplete lineage sorting, has a constant-factor approximation when leaf labels are fixed and the network belongs to the binary semi-simplex tree-child class. The same class is proven to keep the problem NP-hard. The authors also supply a new exact algorithm for the broader class of binary tree-child networks and test its running time and solution quality on simulated instances. These results supply the first guaranteed-quality algorithmic handle on an otherwise intractable subtask that arises inside phylogenetic network reconstruction.

Core claim

We provide the first constant-factor approximation for the PPS on arbitrary but fixed leaf labels and a class of networks on which the PPS remains NP-hard, namely binary, semi-simplex, tree-child phylogenetic networks. Furthermore, we introduce a novel exact solution algorithm for the PPS on binary, tree-child phylogenetic networks and analyze its performance on simulated data.

What carries the argument

The parental parsimony score (PPS) that extends a fixed leaf labelling to a most-parsimonious tree inside the network while accounting for reticulation events.

If this is right

  • PPS on binary semi-simplex tree-child networks can be solved to within a constant factor of optimality in polynomial time.
  • An exact algorithm computes optimal solutions for the PPS restricted to binary tree-child networks.
  • The exact algorithm's practical performance is measurable on simulated phylogenetic network data.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The semi-simplex restriction may be the precise boundary that separates constant-factor approximability from inapproximability inside the tree-child class.
  • Hybrid exact-plus-approximation schemes become feasible once the exact solver is limited to the tree-child case and the approximation handles the semi-simplex case.
  • The same algorithmic pattern could be tested on other parsimony variants that incorporate allopolyploidy or incomplete lineage sorting.

Load-bearing premise

The input networks belong to the restricted class of binary, semi-simplex, tree-child phylogenetic networks.

What would settle it

A family of binary semi-simplex tree-child networks on which every polynomial-time algorithm for PPS returns solutions whose cost exceeds any fixed multiple of the optimum would falsify the constant-factor claim.

Figures

Figures reproduced from arXiv: 2606.26919 by Martin Frohn.

Figure 1
Figure 1. Figure 1: On the left a phylogenetic tree 𝑇 with suppressed edge weights. In red, a character 𝐶 of Γ and one of its extensions 𝐶 ′ . score(𝑇 , 𝐶′ ) = 3 is provably minimum among all extensions of 𝐶. In the middle a phylogenetic network 𝑁 such that 𝑇 and 𝐶 ′ constitute a feasible solution to the SPS for 𝑁 and 𝐶. On the right a phylogenetic tree 𝑇̂ parentally displayed by 𝑁 and an extension 𝐶 ′ of 𝐶 with score(𝑇 , 𝐶 ̂… view at source ↗
Figure 2
Figure 2. Figure 2: Consider the set of taxa Γ = {𝑥1 , 𝑥2 , …, 𝑥6 }. Then, the directed graph on the left shows a phylogenetic network 𝑁 of Γ. The graph in the middle (right) shows a phylogenetic tree 𝑇𝑆 (𝑇𝑃 ) of Γ which is (parentally) displayed by 𝑁. path containing edges 𝑒1 , 𝑒4 and 𝑒5 is only preserved by the tree 𝑇𝑆 and 𝑇𝑃 in the same figure among all feasible solution to the SPS and PPS posed by 𝑁 and any character 𝐶 of… view at source ↗
Figure 3
Figure 3. Figure 3: Two possible subgraphs 𝑁1 and 𝑁2 depicting parents and children of a reticulation vertex 𝑟 and their incidence relations in a rooted, binary, tree-child network 𝑁. Bicolored vertices can be internal vertices or leaves. If the vertices 𝑣𝑟 are roots of subtrees 𝑇 of 𝑁 and 𝑤3 , 𝑤4 ∈ 𝑉 (𝑇 [𝑣𝑟 ]) such that 𝑉ext(𝑇 [𝑤3 ]) ∩ 𝑉ext(𝑇 [𝑤4 ]) ≠ ∅ and 𝑉ext(𝑇 [𝑤3 ]) ∪ 𝑉ext(𝑇 [𝑤4 ]) = 𝑉ext(𝑇 [𝑣𝑟 ]), then 𝑇1 and 𝑇2 can be… view at source ↗
Figure 4
Figure 4. Figure 4: Two intermediate solutions 𝑇 and 𝐶 ′ of Algorithm 1 for network 𝑁 in [PITH_FULL_IMAGE:figures/full_fig_p006_4.png] view at source ↗
Figure 5
Figure 5. Figure 5: The graph on the left is a semi-simplex phylogenetic network 𝑁 ∈ . The graph on the right is a reticulation graph extension 𝑁+ of 𝑁. time. The graph on the right shows the solution to the SPS returned by Algorithm 1. Clearly, the output of Algorithm 1 depends on the insertion order of vertices added to 𝑄 in line 33. Since Proposition 2 applies for any insertion order, we keep the ordering arbitrary but fi… view at source ↗
Figure 6
Figure 6. Figure 6: The two extensions 𝑁+ 1 and 𝑁+ 2 of reticulation graphs 𝑁1 and 𝑁2 in [PITH_FULL_IMAGE:figures/full_fig_p006_6.png] view at source ↗
Figure 7
Figure 7. Figure 7: An induced subgraph 𝑁̂ of the reticulation graph extension 𝑁+ 2 from [PITH_FULL_IMAGE:figures/full_fig_p011_7.png] view at source ↗
Figure 8
Figure 8. Figure 8: A boxplot for the runtime of Algorithm 4 on 16 data sets each containing 25 simulated binary, tree-child networks. The tuples (𝑠, 𝑡) on the x-axis parameterize the networks to have reticulation depth at most 𝑠 and number of character states 𝑡. Furthermore, the figure on the top-left / top-right / bottom-left / bottom-right requires 50/50/100/100 taxa and 5/20/10/40 reticulations, respectively. In addition,… view at source ↗
read the original abstract

Phylogenetic reconstruction is one of the major challenges in computational biology. Among existing reconstruction methods for phylogenetic networks, an important subtask emerges in extending a leaf-labelling on a phylogenetic network to determine a most parsimonious tree inside the network. There exist different variants of this subtask depending on the biological model assumptions for which distinct evolutionary phenomena are captured by the network. In this article we assume that next to hybridization or recombination events, also allopolyploidy or incomplete lineage sorting are present. Then, finding the most parsimonious tree inside the network is called the parental parsimony score problem (PPS), a NP-hard combinatorial optimization problem. We provide the first constant-factor approximation for the PPS on arbitrary but fixed leaf labels and a class of networks on which the PPS remains NP-hard, namely binary, semi-simplex, tree-child phylogenetic networks. Furthermore, we introduce a novel exact solution algorithm for the PPS on binary, tree-child phylogenetic networks and analyze its performance on simulated data.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 2 minor

Summary. The manuscript studies the parental parsimony score (PPS) problem on phylogenetic networks under models that include hybridization, recombination, allopolyploidy and incomplete lineage sorting. It proves NP-hardness of PPS on the class of binary, semi-simplex, tree-child networks, gives the first constant-factor approximation algorithm for the problem with arbitrary but fixed leaf labels on this class, presents a novel exact algorithm for the PPS on binary tree-child networks, and reports an experimental evaluation of the exact algorithm on simulated data.

Significance. If the stated approximation ratio is constant and the exact algorithm is correct, the results supply the first non-trivial algorithmic guarantees for an NP-hard variant of the PPS on a natural, restricted network class where hardness is also established. The empirical analysis of the exact method on simulated instances further supports practical utility in phylogenetic reconstruction.

minor comments (2)
  1. [Abstract] The abstract states that a constant-factor approximation is provided but does not name the factor; the introduction or the statement of the main theorem should give the explicit ratio.
  2. The precise definition of the semi-simplex property and its role in both the hardness proof and the approximation should be recalled or cross-referenced at the beginning of the algorithmic sections.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for their positive summary of our work on the parental parsimony score problem, for recognizing its significance, and for recommending minor revision. No major comments were listed in the report.

Circularity Check

0 steps flagged

No significant circularity; algorithmic result is self-contained

full rationale

The paper states an algorithmic result: a constant-factor approximation for PPS on binary, semi-simplex, tree-child networks (where NP-hardness also holds) plus an exact algorithm for binary tree-child networks. The abstract and claim formulation contain no equations, fitted parameters, predictions, or self-citations that reduce the central claim to its own inputs by construction. The network class restriction is explicitly part of the stated result rather than an unacknowledged circular premise. No load-bearing step matches any enumerated circularity pattern.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review supplies no information on free parameters, axioms, or invented entities; the ledger is therefore empty.

pith-pipeline@v0.9.1-grok · 5693 in / 1062 out tokens · 54511 ms · 2026-06-26T03:07:55.243336+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

23 extracted references

  1. [1]

    Branching rules revisited

    Achterberg, T., Koch, T., Martin, A., 2005. Branching rules revisited. Operations Research Letters 33, 42–54

  2. [2]

    Finding cuts in the TSP (A preliminary report)

    Applegate, D., Bixby, R., Chvátal, V., Cook, W., 1995. Finding cuts in the TSP (A preliminary report). Technical Report 95-05. DIMACS

  3. [3]

    Inferring phylogenies

    Felsenstein, J., 2004. Inferring phylogenies. Sinauer associates Sunderland, MA

  4. [4]

    On computing the maximum parsimon score of a phylogenetic network

    Fischer, M., van Iersel, L., Kelk, S., Scornavacca, C., 2015. On computing the maximum parsimon score of a phylogenetic network. SIAM Journal on Discrete Mathematics 29, 559–585

  5. [5]

    Towarddefiningthecourseofevolution:Minimumchangeforaspecifiedtreetopology

    Fitch,W.M.,1971. Towarddefiningthecourseofevolution:Minimumchangeforaspecifiedtreetopology. SystematicZoology20,406–416

  6. [6]

    A 2-approximation algorithm for the softwired parsimony problem on binary, tree-child phylogenetic networks

    Frohn, M., Kelk, S., 2025. A 2-approximation algorithm for the softwired parsimony problem on binary, tree-child phylogenetic networks. Annals of Operations Research 345, 125–145

  7. [7]

    A dichotomy law for certain classes of phylogenetic networks

    Fuchs, M., Steel, M., 2025. A dichotomy law for certain classes of phylogenetic networks. Bulletin of Mathematical Biology 87, 153

  8. [8]

    Phylogenetic networks from multi-labelled trees

    Huber, K., Moulton, V., 2006. Phylogenetic networks from multi-labelled trees. Journal of Mathematical Biology 52, 613–632

  9. [9]

    Folding and unfolding phylogenetic trees and networks

    Huber, K., Moulton, V., Steel, M., Wu, T., 2016. Folding and unfolding phylogenetic trees and networks. Journal of Mathematical Biology 73, 1761–1780

  10. [10]

    Apracticalfixed-parameteralgorithmforconstructingtree-childnetworks from multiple binary trees

    vanIersel,L.,Janssen,R.,Jones,M.,Murakami,Y.,Zeh,N.,2022. Apracticalfixed-parameteralgorithmforconstructingtree-childnetworks from multiple binary trees. Algorithmica 84, 917–960

  11. [11]

    Maximum likelihood of phylogenetic networks

    Jin, G., Nakhleh, L., Snir, S., Tuller, T., 2006. Maximum likelihood of phylogenetic networks. Bioinformatics 22, 2604–2611

  12. [12]

    Approximationalgorithmsforclassificationproblemswithpairwiserelationships:MetriclabelingandMarkov random fields

    Kleinberg,J.,Tardos,E.,2002. Approximationalgorithmsforclassificationproblemswithpairwiserelationships:MetriclabelingandMarkov random fields. Journal of the ACM 49, 616–639. Frohnet al.:Preprint submitted to ElsevierPage 18 of 19 The PPS on binary, tree-child networks

  13. [13]

    Classes of explicit phylogenetic networks and their biological and mathematical significance

    Kong, S., Pons, J.C., Kubatko, L., Wicke, K., 2022. Classes of explicit phylogenetic networks and their biological and mathematical significance. Journal of Mathematical Biology 84

  14. [14]

    NetRAX:accurateandfastmaximumlikelihoodphylogenetic network inference

    Lutteropp,S.,Scornavacca,C.,Kozlov,A.M.,Morel,B.,Stamatakis,A.,2022. NetRAX:accurateandfastmaximumlikelihoodphylogenetic network inference. Bioinformatics 38, 3725–3733

  15. [15]

    Reconstructing phylogenetic networks using maximum parsimony, in: 2005 IEEE Computational Systems Bioinformatics Conference (CSB’05), IEEE

    Nakhleh, L., Jin, G., Zhao, F., Mellor-Crummey, J., 2005. Reconstructing phylogenetic networks using maximum parsimony, in: 2005 IEEE Computational Systems Bioinformatics Conference (CSB’05), IEEE. pp. 93–102

  16. [16]

    Reconstructing recombination network from sequence data: the small parsimony problem

    Nguyen, C.T., Nguyen, N.B., Sung, W.K., Zhang, L., 2007. Reconstructing recombination network from sequence data: the small parsimony problem. IEEE/ACM Transactions on Computational Biology and Bioinformatics 4, 394–402

  17. [17]

    Machine learning augmented branch and bound for mixed integer linear programming

    Scavuzzo, L., Aardal, K., Lodi, A., Yorke-Smith, N., 2024. Machine learning augmented branch and bound for mixed integer linear programming. Mathematical Programming , 1–44

  18. [18]

    Schmidt, H., Raphael, B.J., 2025. The tree labeling polytope: a unified approach to ancestral reconstruction problems, in: International Conference on Research in Computational Molecular Biology, Stringer Nature Switzerland. pp. 273–276

  19. [19]

    Ngesh: a Python library for synthetic phylogenetic data

    Tresoldi, T., 2021. Ngesh: a Python library for synthetic phylogenetic data. Journal of Open Source Software 6, 3173

  20. [20]

    Improved maximum parsimony models for phylogenetic networks

    Van Iersel, L., Jones, M., Scornavacca, C., 2018. Improved maximum parsimony models for phylogenetic networks. Systematic biology 67, 518–542

  21. [21]

    Locating a tree in a phylogenetic network

    Van Iersel, L., Semple, C., Steel, M., 2010. Locating a tree in a phylogenetic network. Information Processing Letters 110, 1037–1043

  22. [22]

    Polyploidyinfungi:evolutionafterwhole-genomeduplication

    Warren,A.,Marullo,P.,2012. Polyploidyinfungi:evolutionafterwhole-genomeduplication. ProceedingsoftheRoyalSocietyB:Biological Sciences 279, 2497–2509

  23. [23]

    Parsimoniousinferenceofhybridizationinthepresenceofincompletelineagesorting

    Y.,Y.,M.,B.R.,Nakhleh,L.,2013. Parsimoniousinferenceofhybridizationinthepresenceofincompletelineagesorting. SystematicBiology 62, 738–751. Frohnet al.:Preprint submitted to ElsevierPage 19 of 19