Likelihood landscape of binary latent model on a tree

David Clancy Jr.; Hanbaek Lyu; Sebastien Roch

arxiv: 2501.17622 · v2 · pith:7BDMYBOGnew · submitted 2025-01-29 · 🧮 math.ST · math.PR· q-bio.PE· stat.TH

Likelihood landscape of binary latent model on a tree

David Clancy Jr. , Hanbaek Lyu , Sebastien Roch This is my paper

Pith reviewed 2026-05-23 05:13 UTC · model grok-4.3

classification 🧮 math.ST math.PRq-bio.PEstat.TH

keywords CFN modellikelihood landscapereconstruction regimestrong concavityHessian decaylatent tree modelscoordinate maximizationphylogenetics

0 comments

The pith

Deep inside the reconstruction regime, the population log-likelihood of the CFN model is strongly concave in a box around the true parameter whose size does not depend on tree topology or number of leaves.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that for the Cavender-Farris-Neyman model on trees, when parameters lie deep inside the regime allowing tree reconstruction from leaf observations, the log-likelihood becomes strongly concave and smooth in a neighborhood of the true parameter. The neighborhood radius stays fixed regardless of how large or shaped the tree is. This local regularity justifies why coordinate-wise maximization algorithms reliably find good estimates in practice, even though the function is not concave everywhere. It also implies fast convergence rates and consistency for the maximum likelihood estimator under polynomial sample sizes.

Core claim

Sufficiently deep inside the reconstruction regime, the population log-likelihood is strongly concave and smooth within a box around the true parameter, whose size is independent of tree topology and number of leaves. The analysis relies on a decay property of the population Hessian in which diagonal entries remain large while off-diagonal entries decay exponentially with graph distance.

What carries the argument

The decay property of the population Hessian, where diagonal entries stay large and off-diagonal entries decay exponentially with graph distance, which produces strong concavity inside a fixed-size box.

If this is right

The empirical log-likelihood shares the strong concavity and smoothness with high probability under polynomial sample complexity.
Coordinate maximization converges exponentially fast to an O(1/sqrt(m))-consistent maximum likelihood estimator.
These properties supply rigorous justification for the practical effectiveness of likelihood-based methods in tree inference.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The Hessian decay may apply to other latent variable models on graphs.
Optimization guarantees may scale to arbitrarily large trees due to size-independent box.
Similar analysis could apply to non-tree graphical models or different state spaces.

Load-bearing premise

The parameters must lie sufficiently deep inside the reconstruction regime, where the tree topology remains identifiable from leaf data with high probability.

What would settle it

Finding a sequence of parameters deep in the reconstruction regime for which the population Hessian fails to have off-diagonal entries decaying exponentially with distance, or for which the concavity region shrinks with the number of leaves.

read the original abstract

We investigate the optimization landscape of maximum likelihood estimation (MLE) for the Cavender-Farris-Neyman (CFN) model, a two-state latent tree model fundamental to statistical phylogenetics and the ferromagnetic Ising model. Although the log-likelihood function is non-concave and may admit many critical points, simple coordinate maximization algorithms are remarkably effective in practice. We provide the first theoretical justification for this success. We prove that sufficiently deep inside the reconstruction regime, the population log-likelihood is strongly concave and smooth within a box around the true parameter, whose size is independent of tree topology and number of leaves. This fundamental result implies that the empirical landscape shares these regularity properties with high probability given polynomial sample complexity and also that coordinate maximization converges exponentially fast to an $O(1/\sqrt{m})$-consistent MLE. Our analysis centers on a novel decay property of the population Hessian: diagonal entries remain large while off-diagonal entries decay exponentially with graph distance. These results provide rigorous theoretical evidence for the efficacy of likelihood-based tree inference and suggest broader principles for latent variable models.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper's claim of tree-size-independent strong concavity for the CFN population likelihood runs into trouble from the branching factor in large trees.

read the letter

The central result here is a proof that, deep enough in the reconstruction regime, the population log-likelihood for the CFN model on a tree is strongly concave inside a box around the true parameters, with the box radius independent of topology and number of leaves. The argument rests on showing that the population Hessian has large diagonals and off-diagonals that decay exponentially in graph distance. This is presented as the first theoretical account for why coordinate maximization works well in practice for these models. If the decay property holds with the stated uniformity, it would also give exponential convergence rates and polynomial sample complexity for the empirical version to inherit the same regularity. That part of the contribution is new relative to prior empirical observations in the phylogenetics literature. The analysis is direct on the population Hessian rather than circular. The stress-test note, however, looks like it lands. In a binary tree the number of edges at distance d grows as 2^d. When the correlation parameter sits above the Kesten-Stigum threshold but not extremely close to 1, the decay rate r satisfies 2r > 1, so the row sums of absolute off-diagonals diverge with depth. This makes it hard to keep the minimal eigenvalue bounded away from zero uniformly in the number of leaves. The claimed independence of the box size therefore appears to require either a stronger regime condition that itself depends on tree size or a different control on the Hessian that the abstract does not indicate. The result is still worth checking in full because the decay idea itself is clean and the practical motivation is real. Readers working on latent tree models or coordinate methods for Ising-type problems would find the Hessian analysis useful even if the uniformity claim needs adjustment. It deserves a serious referee to verify the quantitative thresholds and see whether the branching issue is handled in the proof or simply assumed away by the depth condition.

Referee Report

2 major / 1 minor

Summary. The paper claims that for the Cavender-Farris-Neyman (CFN) binary latent tree model, sufficiently deep inside the reconstruction regime the population log-likelihood is strongly concave and smooth inside a box around the true parameter whose radius is independent of tree topology and number of leaves. The proof relies on a decay property of the population Hessian: large diagonal entries and off-diagonal entries that decay exponentially in graph distance. This is used to conclude that coordinate maximization converges exponentially fast to an O(1/sqrt(m))-consistent MLE with polynomial sample complexity.

Significance. If the central claim holds, the result supplies the first rigorous explanation for the observed effectiveness of simple coordinate-ascent heuristics in phylogenetic MLE under the CFN model and identifies a Hessian decay mechanism that may apply more broadly to latent-variable models on trees. The explicit identification of the decay property as the key technical ingredient is a clear strength.

major comments (2)

[Abstract / main theorem statement] Abstract and main theorem: the asserted uniformity of the strong-concavity constant and box radius with respect to the number of leaves is incompatible with the branching structure of binary trees. Off-diagonal Hessian entries decay at rate r^d where r is bounded below by the minimal edge correlation alpha; inside the reconstruction regime one must have alpha > 1/sqrt(2) ≈ 0.707, so the row-sum bound involves terms of order (2r)^d. The geometric series sum_d (2r)^d diverges for any such alpha, implying that the minimal eigenvalue can approach zero as depth (hence number of leaves) grows. This directly undermines the claimed independence from m.
[Abstract / reconstruction-regime hypothesis] The quantitative threshold defining 'sufficiently deep inside the reconstruction regime' is left unspecified. Without an explicit lower bound on alpha that forces the effective decay factor below 1/2, the claimed uniformity cannot be verified and the result remains conditional on an unstated parameter regime whose existence is not demonstrated.

minor comments (1)

Notation for the CFN parameters (edge correlations theta_e) and the precise definition of the population Hessian should be introduced earlier and used consistently.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the thoughtful and detailed report. The comments highlight important points about uniformity and explicit constants. We address each major comment below and will revise the manuscript accordingly where appropriate.

read point-by-point responses

Referee: [Abstract / main theorem statement] Abstract and main theorem: the asserted uniformity of the strong-concavity constant and box radius with respect to the number of leaves is incompatible with the branching structure of binary trees. Off-diagonal Hessian entries decay at rate r^d where r is bounded below by the minimal edge correlation alpha; inside the reconstruction regime one must have alpha > 1/sqrt(2) ≈ 0.707, so the row-sum bound involves terms of order (2r)^d. The geometric series sum_d (2r)^d diverges for any such alpha, implying that the minimal eigenvalue can approach zero as depth (hence number of leaves) grows. This directly undermines the claimed independence from m.

Authors: The referee correctly notes that a direct row-sum or Gershgorin argument on the Hessian would fail to deliver a uniform eigenvalue bound because of branching. However, our proof of strong concavity does not proceed via diagonal dominance or row-sum estimates. It instead uses the explicit recursive structure of the CFN likelihood and the tree metric to control the quadratic form directly, showing that the minimal eigenvalue remains bounded below by a positive constant (depending only on the minimal edge correlation alpha when alpha is sufficiently close to 1) independently of tree size. The off-diagonal decay is with respect to graph distance on the edge set, and the tree recursion cancels the branching factor in the quadratic-form analysis. We will add a clarifying remark in the revised manuscript explaining why the row-sum divergence does not contradict the claimed uniformity. revision: no
Referee: [Abstract / reconstruction-regime hypothesis] The quantitative threshold defining 'sufficiently deep inside the reconstruction regime' is left unspecified. Without an explicit lower bound on alpha that forces the effective decay factor below 1/2, the claimed uniformity cannot be verified and the result remains conditional on an unstated parameter regime whose existence is not demonstrated.

Authors: We agree that an explicit numerical threshold would improve clarity and verifiability. In the revision we will state a concrete lower bound (for example alpha ≥ 0.85) on the minimal edge correlation under which the strong-concavity constant and box radius are uniform, and we will verify that this regime lies strictly inside the reconstruction regime and is non-empty. revision: yes

Circularity Check

0 steps flagged

No circularity; strong concavity derived from direct population Hessian analysis

full rationale

The central claim is established by proving a decay property of the population Hessian (diagonal entries large, off-diagonals decay exponentially in graph distance) directly from the CFN model definition and the reconstruction regime assumption. This yields strong concavity inside a topology-independent box without any reduction to fitted inputs, self-definitional parameters, or load-bearing self-citations. The derivation chain is self-contained against the model equations and does not invoke prior author results as uniqueness theorems or ansatzes. No steps satisfy the enumerated circularity patterns.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The result rests on the standard definition of the CFN model and the reconstruction regime from the phylogenetics literature; no new free parameters or invented entities are introduced.

axioms (2)

domain assumption The CFN model is a two-state Markov process on a tree with edge-specific mutation probabilities.
Standard setup for the binary latent tree model referenced throughout the abstract.
domain assumption The reconstruction regime is the parameter region in which the tree topology is identifiable from leaf observations with high probability.
The concavity statement is explicitly conditioned on being sufficiently deep inside this regime.

pith-pipeline@v0.9.0 · 5721 in / 1418 out tokens · 59532 ms · 2026-05-23T05:13:03.602884+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/BranchSelection.lean RCLCombiner_isCoupling_iff; interactionDefect_RCLCombiner echoes

?

echoes
ECHOES: this paper passage has the same mathematical shape or conceptual pattern as the Recognition theorem, but is not a direct formal dependency.

Z_u = q(θ̂_v Z_v, θ̂_w Z_w) with q(x,y)=(x+y)/(1+xy); Hessian off-diagonals decay as (C9 δ)^⌊(dist(e,f)−1)/4⌋ (Prop. 3.2, Thm. 2.2)
IndisputableMonolith/Foundation/ArithmeticFromLogic.lean embed_strictMono_of_one_lt; LogicNat recovery refines

?

refines
Relation between the paper passage and the cited Recognition theorem.

Strong concavity λ_min(H) ≥ C8/δ − 26 independent of tree size (Cor. 2.3) via Gershgorin + exponential off-diagonal decay

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

23 extracted references · 23 canonical work pages

[1]

Balakrishnan, M

S. Balakrishnan, M. J. Wainwright, and B. Yu. Statistical guarantees for the EM algorithm: From population to sample-based analysis. The Annals of Statistics, 45(1):77–120, 2017

work page 2017
[2]

Borgs, J

C. Borgs, J. Chayes, E. Mossel, and S. Roch. The Kesten-Stigum reconstruction bound is tight for roughly sym- metric binary channels. In 2006 47th Annual IEEE Symposium on Foundations of Computer Science (FOCS’06), pages 518–530. IEEE, 2006

work page 2006
[3]

J. A. Cavender. Taxonomy with confidence. Mathematical biosciences, 40(3-4):271–280, 1978

work page 1978
[4]

Chen and Y

X. Chen and Y . Chen. Inference for high-dimensional sparse econometric models.Annual Review of Economics, 11:291–317, 2019

work page 2019
[5]

Y . Chi, Y . M. Lu, and Y . Chen. Nonconvex optimization meets low-rank matrix factorization: An overview.Trans. Sig. Proc., 67(20):5239–5269, Oct. 2019

work page 2019
[6]

B. Chor, M. D. Hendy, B. R. Holland, and D. Penny. Multiple maxima of likelihood in phylogenetic trees: an analytic approach. Molecular Biology and Evolution, 17(10):1529–1541, 2000

work page 2000
[7]

Clancy, Jr., H

D. Clancy, Jr., H. Lyu, S. Roch, and A. Sly. Likelihood-based root state reconstruction on a tree: Robustness to parameters. arXiv:2501.13208, 2025

work page arXiv 2025
[8]

Dagan, V

Y . Dagan, V . Kandiros, and C. Daskalakis. Em’s convergence in gaussian latent tree models. In Conference on Learning Theory, pages 2597–2667. PMLR, 2022

work page 2022
[9]

Dinh and F

V . Dinh and F. A. M. IV . The shape of the one-dimensional phylogenetic likelihood function. The Annals of Applied Probability, 27(3):1646 – 1677, 2017

work page 2017
[10]

J. S. Farris. A probability model for inferring evolutionary trees. Systematic Biology, 22(3):250–256, 1973. 40 DA VID CLANCY , JR., HANBAEK LYU, AND SEBASTIEN ROCH

work page 1973
[11]

Felsenstein

J. Felsenstein. Evolutionary trees from dna sequences: A maximum likelihood approach. Journal of Molecular Evolution, 17(6):368–376, 1981

work page 1981
[12]

Felsenstein

J. Felsenstein. Inferring Phylogenies. Sinauer Associates, Sunderland, Massachusetts, 2004

work page 2004
[13]

Fukami and Y

K. Fukami and Y . Tateno. On the maximum likelihood method for estimating molecular trees: uniqueness of the likelihood point. Journal of molecular evolution, 28:460–464, 1989

work page 1989
[14]

L. D. García Puente, M. Garrote-López, and E. Shehu. Computing algebraic degrees of phylogenetic varieties. Algebraic Statistics, 14(2):215–231, 2024

work page 2024
[15]

Guindon, J.-F

S. Guindon, J.-F. Dufayard, V . Lefort, M. Anisimova, W. Hordijk, and O. Gascuel. New algorithms and methods to estimate maximum-likelihood phylogenies: assessing the performance of PhyML 3.0. Systematic Biology, 59(3):307–321, 2010

work page 2010
[16]

Guindon and O

S. Guindon and O. Gascuel. A simple, fast, and accurate algorithm to estimate large phylogenies by maximum likelihood. Systematic biology, 52(5):696–704, 2003

work page 2003
[17]

C. Ma, K. Wang, Y . Chi, and Y . Chen. Implicit regularization in nonconvex statistical estimation: Gradient descent converges linearly for phase retrieval, matrix completion, and blind deconvolution.Foundations of Computational Mathematics, 20:451–632, 2020

work page 2020
[18]

J. Neyman. Molecular studies of evolution: a source of novel statistical problems. In Statistical decision theory and related topics, pages 1–27. Elsevier, 1971

work page 1971
[19]

Nguyen, H

L.-T. Nguyen, H. A. Schmidt, A. von Haeseler, and B. Q. Minh. IQ-TREE: a fast and e ffective stochastic algo- rithm for estimating maximum-likelihood phylogenies. Molecular Biology and Evolution, 32(1):268–274, 2015

work page 2015
[20]

J. S. Rogers and D. L. Swofford. Multiple local maxima for likelihoods of phylogenetic trees: a simulation study. Molecular Biology and Evolution, 16(8):1079–1085, 1999

work page 1999
[21]

Stamatakis

A. Stamatakis. RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies. Bioin- formatics, 30(9):1312–1313, 2014

work page 2014
[22]

M. Steel. The maximum likelihood point for a phylogenetic tree is not unique. Systematic Biology, 43(4):560– 564, 1994

work page 1994
[23]

Z. Yang. Molecular Evolution: A Statistical Approach. Oxford University Press, Oxford, 2014. Da vidClancy, Jr., Department of Mathematics, University of Wisconsin - Madison, WI, 53717, USA Email address: dclancy@math.wisc.edu Hanbaek Lyu, Department of Mathematics, University of Wisconsin - Madison, WI, 53717, USA Email address: hlyu@math.wisc.edu Sebasti...

work page 2014

[1] [1]

Balakrishnan, M

S. Balakrishnan, M. J. Wainwright, and B. Yu. Statistical guarantees for the EM algorithm: From population to sample-based analysis. The Annals of Statistics, 45(1):77–120, 2017

work page 2017

[2] [2]

Borgs, J

C. Borgs, J. Chayes, E. Mossel, and S. Roch. The Kesten-Stigum reconstruction bound is tight for roughly sym- metric binary channels. In 2006 47th Annual IEEE Symposium on Foundations of Computer Science (FOCS’06), pages 518–530. IEEE, 2006

work page 2006

[3] [3]

J. A. Cavender. Taxonomy with confidence. Mathematical biosciences, 40(3-4):271–280, 1978

work page 1978

[4] [4]

Chen and Y

X. Chen and Y . Chen. Inference for high-dimensional sparse econometric models.Annual Review of Economics, 11:291–317, 2019

work page 2019

[5] [5]

Y . Chi, Y . M. Lu, and Y . Chen. Nonconvex optimization meets low-rank matrix factorization: An overview.Trans. Sig. Proc., 67(20):5239–5269, Oct. 2019

work page 2019

[6] [6]

B. Chor, M. D. Hendy, B. R. Holland, and D. Penny. Multiple maxima of likelihood in phylogenetic trees: an analytic approach. Molecular Biology and Evolution, 17(10):1529–1541, 2000

work page 2000

[7] [7]

Clancy, Jr., H

D. Clancy, Jr., H. Lyu, S. Roch, and A. Sly. Likelihood-based root state reconstruction on a tree: Robustness to parameters. arXiv:2501.13208, 2025

work page arXiv 2025

[8] [8]

Dagan, V

Y . Dagan, V . Kandiros, and C. Daskalakis. Em’s convergence in gaussian latent tree models. In Conference on Learning Theory, pages 2597–2667. PMLR, 2022

work page 2022

[9] [9]

Dinh and F

V . Dinh and F. A. M. IV . The shape of the one-dimensional phylogenetic likelihood function. The Annals of Applied Probability, 27(3):1646 – 1677, 2017

work page 2017

[10] [10]

J. S. Farris. A probability model for inferring evolutionary trees. Systematic Biology, 22(3):250–256, 1973. 40 DA VID CLANCY , JR., HANBAEK LYU, AND SEBASTIEN ROCH

work page 1973

[11] [11]

Felsenstein

J. Felsenstein. Evolutionary trees from dna sequences: A maximum likelihood approach. Journal of Molecular Evolution, 17(6):368–376, 1981

work page 1981

[12] [12]

Felsenstein

J. Felsenstein. Inferring Phylogenies. Sinauer Associates, Sunderland, Massachusetts, 2004

work page 2004

[13] [13]

Fukami and Y

K. Fukami and Y . Tateno. On the maximum likelihood method for estimating molecular trees: uniqueness of the likelihood point. Journal of molecular evolution, 28:460–464, 1989

work page 1989

[14] [14]

L. D. García Puente, M. Garrote-López, and E. Shehu. Computing algebraic degrees of phylogenetic varieties. Algebraic Statistics, 14(2):215–231, 2024

work page 2024

[15] [15]

Guindon, J.-F

S. Guindon, J.-F. Dufayard, V . Lefort, M. Anisimova, W. Hordijk, and O. Gascuel. New algorithms and methods to estimate maximum-likelihood phylogenies: assessing the performance of PhyML 3.0. Systematic Biology, 59(3):307–321, 2010

work page 2010

[16] [16]

Guindon and O

S. Guindon and O. Gascuel. A simple, fast, and accurate algorithm to estimate large phylogenies by maximum likelihood. Systematic biology, 52(5):696–704, 2003

work page 2003

[17] [17]

C. Ma, K. Wang, Y . Chi, and Y . Chen. Implicit regularization in nonconvex statistical estimation: Gradient descent converges linearly for phase retrieval, matrix completion, and blind deconvolution.Foundations of Computational Mathematics, 20:451–632, 2020

work page 2020

[18] [18]

J. Neyman. Molecular studies of evolution: a source of novel statistical problems. In Statistical decision theory and related topics, pages 1–27. Elsevier, 1971

work page 1971

[19] [19]

Nguyen, H

L.-T. Nguyen, H. A. Schmidt, A. von Haeseler, and B. Q. Minh. IQ-TREE: a fast and e ffective stochastic algo- rithm for estimating maximum-likelihood phylogenies. Molecular Biology and Evolution, 32(1):268–274, 2015

work page 2015

[20] [20]

J. S. Rogers and D. L. Swofford. Multiple local maxima for likelihoods of phylogenetic trees: a simulation study. Molecular Biology and Evolution, 16(8):1079–1085, 1999

work page 1999

[21] [21]

Stamatakis

A. Stamatakis. RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies. Bioin- formatics, 30(9):1312–1313, 2014

work page 2014

[22] [22]

M. Steel. The maximum likelihood point for a phylogenetic tree is not unique. Systematic Biology, 43(4):560– 564, 1994

work page 1994

[23] [23]

Z. Yang. Molecular Evolution: A Statistical Approach. Oxford University Press, Oxford, 2014. Da vidClancy, Jr., Department of Mathematics, University of Wisconsin - Madison, WI, 53717, USA Email address: dclancy@math.wisc.edu Hanbaek Lyu, Department of Mathematics, University of Wisconsin - Madison, WI, 53717, USA Email address: hlyu@math.wisc.edu Sebasti...

work page 2014