pith. sign in

arxiv: 2501.17622 · v2 · pith:7BDMYBOGnew · submitted 2025-01-29 · 🧮 math.ST · math.PR· q-bio.PE· stat.TH

Likelihood landscape of binary latent model on a tree

Pith reviewed 2026-05-23 05:13 UTC · model grok-4.3

classification 🧮 math.ST math.PRq-bio.PEstat.TH
keywords CFN modellikelihood landscapereconstruction regimestrong concavityHessian decaylatent tree modelscoordinate maximizationphylogenetics
0
0 comments X

The pith

Deep inside the reconstruction regime, the population log-likelihood of the CFN model is strongly concave in a box around the true parameter whose size does not depend on tree topology or number of leaves.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that for the Cavender-Farris-Neyman model on trees, when parameters lie deep inside the regime allowing tree reconstruction from leaf observations, the log-likelihood becomes strongly concave and smooth in a neighborhood of the true parameter. The neighborhood radius stays fixed regardless of how large or shaped the tree is. This local regularity justifies why coordinate-wise maximization algorithms reliably find good estimates in practice, even though the function is not concave everywhere. It also implies fast convergence rates and consistency for the maximum likelihood estimator under polynomial sample sizes.

Core claim

Sufficiently deep inside the reconstruction regime, the population log-likelihood is strongly concave and smooth within a box around the true parameter, whose size is independent of tree topology and number of leaves. The analysis relies on a decay property of the population Hessian in which diagonal entries remain large while off-diagonal entries decay exponentially with graph distance.

What carries the argument

The decay property of the population Hessian, where diagonal entries stay large and off-diagonal entries decay exponentially with graph distance, which produces strong concavity inside a fixed-size box.

If this is right

  • The empirical log-likelihood shares the strong concavity and smoothness with high probability under polynomial sample complexity.
  • Coordinate maximization converges exponentially fast to an O(1/sqrt(m))-consistent maximum likelihood estimator.
  • These properties supply rigorous justification for the practical effectiveness of likelihood-based methods in tree inference.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The Hessian decay may apply to other latent variable models on graphs.
  • Optimization guarantees may scale to arbitrarily large trees due to size-independent box.
  • Similar analysis could apply to non-tree graphical models or different state spaces.

Load-bearing premise

The parameters must lie sufficiently deep inside the reconstruction regime, where the tree topology remains identifiable from leaf data with high probability.

What would settle it

Finding a sequence of parameters deep in the reconstruction regime for which the population Hessian fails to have off-diagonal entries decaying exponentially with distance, or for which the concavity region shrinks with the number of leaves.

read the original abstract

We investigate the optimization landscape of maximum likelihood estimation (MLE) for the Cavender-Farris-Neyman (CFN) model, a two-state latent tree model fundamental to statistical phylogenetics and the ferromagnetic Ising model. Although the log-likelihood function is non-concave and may admit many critical points, simple coordinate maximization algorithms are remarkably effective in practice. We provide the first theoretical justification for this success. We prove that sufficiently deep inside the reconstruction regime, the population log-likelihood is strongly concave and smooth within a box around the true parameter, whose size is independent of tree topology and number of leaves. This fundamental result implies that the empirical landscape shares these regularity properties with high probability given polynomial sample complexity and also that coordinate maximization converges exponentially fast to an $O(1/\sqrt{m})$-consistent MLE. Our analysis centers on a novel decay property of the population Hessian: diagonal entries remain large while off-diagonal entries decay exponentially with graph distance. These results provide rigorous theoretical evidence for the efficacy of likelihood-based tree inference and suggest broader principles for latent variable models.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper claims that for the Cavender-Farris-Neyman (CFN) binary latent tree model, sufficiently deep inside the reconstruction regime the population log-likelihood is strongly concave and smooth inside a box around the true parameter whose radius is independent of tree topology and number of leaves. The proof relies on a decay property of the population Hessian: large diagonal entries and off-diagonal entries that decay exponentially in graph distance. This is used to conclude that coordinate maximization converges exponentially fast to an O(1/sqrt(m))-consistent MLE with polynomial sample complexity.

Significance. If the central claim holds, the result supplies the first rigorous explanation for the observed effectiveness of simple coordinate-ascent heuristics in phylogenetic MLE under the CFN model and identifies a Hessian decay mechanism that may apply more broadly to latent-variable models on trees. The explicit identification of the decay property as the key technical ingredient is a clear strength.

major comments (2)
  1. [Abstract / main theorem statement] Abstract and main theorem: the asserted uniformity of the strong-concavity constant and box radius with respect to the number of leaves is incompatible with the branching structure of binary trees. Off-diagonal Hessian entries decay at rate r^d where r is bounded below by the minimal edge correlation alpha; inside the reconstruction regime one must have alpha > 1/sqrt(2) ≈ 0.707, so the row-sum bound involves terms of order (2r)^d. The geometric series sum_d (2r)^d diverges for any such alpha, implying that the minimal eigenvalue can approach zero as depth (hence number of leaves) grows. This directly undermines the claimed independence from m.
  2. [Abstract / reconstruction-regime hypothesis] The quantitative threshold defining 'sufficiently deep inside the reconstruction regime' is left unspecified. Without an explicit lower bound on alpha that forces the effective decay factor below 1/2, the claimed uniformity cannot be verified and the result remains conditional on an unstated parameter regime whose existence is not demonstrated.
minor comments (1)
  1. Notation for the CFN parameters (edge correlations theta_e) and the precise definition of the population Hessian should be introduced earlier and used consistently.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the thoughtful and detailed report. The comments highlight important points about uniformity and explicit constants. We address each major comment below and will revise the manuscript accordingly where appropriate.

read point-by-point responses
  1. Referee: [Abstract / main theorem statement] Abstract and main theorem: the asserted uniformity of the strong-concavity constant and box radius with respect to the number of leaves is incompatible with the branching structure of binary trees. Off-diagonal Hessian entries decay at rate r^d where r is bounded below by the minimal edge correlation alpha; inside the reconstruction regime one must have alpha > 1/sqrt(2) ≈ 0.707, so the row-sum bound involves terms of order (2r)^d. The geometric series sum_d (2r)^d diverges for any such alpha, implying that the minimal eigenvalue can approach zero as depth (hence number of leaves) grows. This directly undermines the claimed independence from m.

    Authors: The referee correctly notes that a direct row-sum or Gershgorin argument on the Hessian would fail to deliver a uniform eigenvalue bound because of branching. However, our proof of strong concavity does not proceed via diagonal dominance or row-sum estimates. It instead uses the explicit recursive structure of the CFN likelihood and the tree metric to control the quadratic form directly, showing that the minimal eigenvalue remains bounded below by a positive constant (depending only on the minimal edge correlation alpha when alpha is sufficiently close to 1) independently of tree size. The off-diagonal decay is with respect to graph distance on the edge set, and the tree recursion cancels the branching factor in the quadratic-form analysis. We will add a clarifying remark in the revised manuscript explaining why the row-sum divergence does not contradict the claimed uniformity. revision: no

  2. Referee: [Abstract / reconstruction-regime hypothesis] The quantitative threshold defining 'sufficiently deep inside the reconstruction regime' is left unspecified. Without an explicit lower bound on alpha that forces the effective decay factor below 1/2, the claimed uniformity cannot be verified and the result remains conditional on an unstated parameter regime whose existence is not demonstrated.

    Authors: We agree that an explicit numerical threshold would improve clarity and verifiability. In the revision we will state a concrete lower bound (for example alpha ≥ 0.85) on the minimal edge correlation under which the strong-concavity constant and box radius are uniform, and we will verify that this regime lies strictly inside the reconstruction regime and is non-empty. revision: yes

Circularity Check

0 steps flagged

No circularity; strong concavity derived from direct population Hessian analysis

full rationale

The central claim is established by proving a decay property of the population Hessian (diagonal entries large, off-diagonals decay exponentially in graph distance) directly from the CFN model definition and the reconstruction regime assumption. This yields strong concavity inside a topology-independent box without any reduction to fitted inputs, self-definitional parameters, or load-bearing self-citations. The derivation chain is self-contained against the model equations and does not invoke prior author results as uniqueness theorems or ansatzes. No steps satisfy the enumerated circularity patterns.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The result rests on the standard definition of the CFN model and the reconstruction regime from the phylogenetics literature; no new free parameters or invented entities are introduced.

axioms (2)
  • domain assumption The CFN model is a two-state Markov process on a tree with edge-specific mutation probabilities.
    Standard setup for the binary latent tree model referenced throughout the abstract.
  • domain assumption The reconstruction regime is the parameter region in which the tree topology is identifiable from leaf observations with high probability.
    The concavity statement is explicitly conditioned on being sufficiently deep inside this regime.

pith-pipeline@v0.9.0 · 5721 in / 1418 out tokens · 59532 ms · 2026-05-23T05:13:03.602884+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

23 extracted references · 23 canonical work pages

  1. [1]

    Balakrishnan, M

    S. Balakrishnan, M. J. Wainwright, and B. Yu. Statistical guarantees for the EM algorithm: From population to sample-based analysis. The Annals of Statistics, 45(1):77–120, 2017

  2. [2]

    Borgs, J

    C. Borgs, J. Chayes, E. Mossel, and S. Roch. The Kesten-Stigum reconstruction bound is tight for roughly sym- metric binary channels. In 2006 47th Annual IEEE Symposium on Foundations of Computer Science (FOCS’06), pages 518–530. IEEE, 2006

  3. [3]

    J. A. Cavender. Taxonomy with confidence. Mathematical biosciences, 40(3-4):271–280, 1978

  4. [4]

    Chen and Y

    X. Chen and Y . Chen. Inference for high-dimensional sparse econometric models.Annual Review of Economics, 11:291–317, 2019

  5. [5]

    Y . Chi, Y . M. Lu, and Y . Chen. Nonconvex optimization meets low-rank matrix factorization: An overview.Trans. Sig. Proc., 67(20):5239–5269, Oct. 2019

  6. [6]

    B. Chor, M. D. Hendy, B. R. Holland, and D. Penny. Multiple maxima of likelihood in phylogenetic trees: an analytic approach. Molecular Biology and Evolution, 17(10):1529–1541, 2000

  7. [7]

    Clancy, Jr., H

    D. Clancy, Jr., H. Lyu, S. Roch, and A. Sly. Likelihood-based root state reconstruction on a tree: Robustness to parameters. arXiv:2501.13208, 2025

  8. [8]

    Dagan, V

    Y . Dagan, V . Kandiros, and C. Daskalakis. Em’s convergence in gaussian latent tree models. In Conference on Learning Theory, pages 2597–2667. PMLR, 2022

  9. [9]

    Dinh and F

    V . Dinh and F. A. M. IV . The shape of the one-dimensional phylogenetic likelihood function. The Annals of Applied Probability, 27(3):1646 – 1677, 2017

  10. [10]

    J. S. Farris. A probability model for inferring evolutionary trees. Systematic Biology, 22(3):250–256, 1973. 40 DA VID CLANCY , JR., HANBAEK LYU, AND SEBASTIEN ROCH

  11. [11]

    Felsenstein

    J. Felsenstein. Evolutionary trees from dna sequences: A maximum likelihood approach. Journal of Molecular Evolution, 17(6):368–376, 1981

  12. [12]

    Felsenstein

    J. Felsenstein. Inferring Phylogenies. Sinauer Associates, Sunderland, Massachusetts, 2004

  13. [13]

    Fukami and Y

    K. Fukami and Y . Tateno. On the maximum likelihood method for estimating molecular trees: uniqueness of the likelihood point. Journal of molecular evolution, 28:460–464, 1989

  14. [14]

    L. D. García Puente, M. Garrote-López, and E. Shehu. Computing algebraic degrees of phylogenetic varieties. Algebraic Statistics, 14(2):215–231, 2024

  15. [15]

    Guindon, J.-F

    S. Guindon, J.-F. Dufayard, V . Lefort, M. Anisimova, W. Hordijk, and O. Gascuel. New algorithms and methods to estimate maximum-likelihood phylogenies: assessing the performance of PhyML 3.0. Systematic Biology, 59(3):307–321, 2010

  16. [16]

    Guindon and O

    S. Guindon and O. Gascuel. A simple, fast, and accurate algorithm to estimate large phylogenies by maximum likelihood. Systematic biology, 52(5):696–704, 2003

  17. [17]

    C. Ma, K. Wang, Y . Chi, and Y . Chen. Implicit regularization in nonconvex statistical estimation: Gradient descent converges linearly for phase retrieval, matrix completion, and blind deconvolution.Foundations of Computational Mathematics, 20:451–632, 2020

  18. [18]

    J. Neyman. Molecular studies of evolution: a source of novel statistical problems. In Statistical decision theory and related topics, pages 1–27. Elsevier, 1971

  19. [19]

    Nguyen, H

    L.-T. Nguyen, H. A. Schmidt, A. von Haeseler, and B. Q. Minh. IQ-TREE: a fast and e ffective stochastic algo- rithm for estimating maximum-likelihood phylogenies. Molecular Biology and Evolution, 32(1):268–274, 2015

  20. [20]

    J. S. Rogers and D. L. Swofford. Multiple local maxima for likelihoods of phylogenetic trees: a simulation study. Molecular Biology and Evolution, 16(8):1079–1085, 1999

  21. [21]

    Stamatakis

    A. Stamatakis. RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies. Bioin- formatics, 30(9):1312–1313, 2014

  22. [22]

    M. Steel. The maximum likelihood point for a phylogenetic tree is not unique. Systematic Biology, 43(4):560– 564, 1994

  23. [23]

    Z. Yang. Molecular Evolution: A Statistical Approach. Oxford University Press, Oxford, 2014. Da vidClancy, Jr., Department of Mathematics, University of Wisconsin - Madison, WI, 53717, USA Email address: dclancy@math.wisc.edu Hanbaek Lyu, Department of Mathematics, University of Wisconsin - Madison, WI, 53717, USA Email address: hlyu@math.wisc.edu Sebasti...