pith. sign in

arxiv: 2508.07906 · v2 · pith:5JOQFXLInew · submitted 2025-08-11 · 🧮 math.PR

Site Frequency Spectrum in stationary branching populations

Pith reviewed 2026-05-25 07:53 UTC · model grok-4.3

classification 🧮 math.PR
keywords site frequency spectrumcontinuous-state branching processreal treesinfinitely many sitesstationary populationsclonal subpopulationnon-extinction conditioning
0
0 comments X

The pith

The expected site frequency spectrum for samples from conditioned branching populations converges to an explicit form as sample size grows to infinity.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper derives the limiting expected site frequency spectrum for large samples drawn from a continuous-state branching process that is conditioned to survive forever. It models the genealogy via a real tree that includes a semi-infinite branch and applies the infinitely-many-sites mutation model to obtain concrete expectations. A continuum version of the spectrum is defined as a random point measure on the positive line, and the density of its intensity measure is computed in closed form. The same framework also supplies estimates for the size of the clonal group that carries the ancestral genotype at any fixed time.

Core claim

Under the quadratic branching mechanism and non-extinction conditioning, the expectation of the site frequency spectrum converges as the sample size tends to infinity, and the continuum SFS, viewed as a random point measure, has an explicitly computable intensity density on the positive reals.

What carries the argument

Real tree with semi-infinite branch that encodes the genealogy of the continuous-state branching process conditioned to non-extinction, together with the infinitely-many-sites assumption.

If this is right

  • The limiting expected SFS is given by an explicit integral expression derived from the branching mechanism.
  • The continuum SFS intensity admits a closed-form density that can be evaluated numerically for any frequency value.
  • The size of the clonal subpopulation descending from the most recent common ancestor admits explicit bounds or asymptotics at fixed times.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The explicit density could be compared directly with frequency spectra observed in microbial or viral populations that obey branching dynamics.
  • Relaxing the quadratic mechanism while keeping the real-tree representation might produce analogous but different density formulas.
  • The continuum point-measure construction may serve as a building block for studying joint distributions of multiple loci under the same genealogy.

Load-bearing premise

The genealogy must be captured by a real tree with a semi-infinite branch under quadratic branching and the non-extinction conditioning.

What would settle it

Generate many realizations of the conditioned branching process, draw large samples, tally the empirical site frequencies, and test whether the averages and the empirical point-measure density match the explicit formulas derived in the paper.

Figures

Figures reproduced from arXiv: 2508.07906 by Jean-Fran\c{c}ois Delmas (CERMICS), Patrick Hoscheit (MaIAGE), Romain Abraham (IDP).

Figure 1
Figure 1. Figure 1: An instance for n = 5 of the ancestral tree Tn with its root ϱn, which appears in Lemma 3.1. In this instance, the semi-infinite branch is attached to X(3) = X0 = 0 and cut at the MRCA ϱn of the uniformly sampled individuals {X1, . . . , X4} in the whole population [−Eg, Ed] and X0. The branch attached to X(k) has length ζk, with ζ3 = 0 by convention as X(3) = 0. The tree T ′ n which appears in Lemma 5.2 i… view at source ↗
Figure 2
Figure 2. Figure 2: Four possible configurations of X(j) , . . . , X(ℓ) with their TMRCA ζ MRCA j: ℓ , along with locations (in blue and of length Lj: ℓ) for k-admissible muta￾tions (with k = ℓ − j + 1) carried only by this set of leaves, whenever these exist. Notice that ζ MRCA j: ℓ is strictly less than ζ ⋆ j: ℓ only in the bottom left figure. branch of X(j) (that is, the branch attached to X(j) ), and the branching point i… view at source ↗
Figure 3
Figure 3. Figure 3: Plot of g1(z, u) for various values of z. Let γ be the Euler constant. Using that: 1 − γ = Z 1 0 [PITH_FULL_IMAGE:figures/full_fig_p013_3.png] view at source ↗
read the original abstract

This paper explores the Site Frequency Spectrum (SFS) in stationary branching populations. We derive estimates for the SFS associated with a sample from a continuous-state branching process conditioned to never go extinct, utilizing a quadratic branching mechanism. The genealogy of such processes is represented by a real tree with a semi-infinite branch, and we compute the expectation of the SFS under the infinitely-many-sites assumption as the sample size approaches infinity. Additionally, we present a continuum version of the SFS as a random point measure on the positive real line and compute the density of its expected measure explicitly. Finally, we derive estimates for the size of the clonal subpopulation carrying the same genotype as the most recent common ancestor of the whole population at a given time.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 3 minor

Summary. The paper computes the expected site frequency spectrum (SFS) for samples from a continuous-state branching process with quadratic branching mechanism conditioned to non-extinction, whose genealogy is coded by a real tree with a semi-infinite spine. Under the infinitely-many-sites model it derives the limiting expected SFS as sample size n tends to infinity, introduces a continuum SFS as a random point measure on (0,∞) whose expected intensity measure is given explicitly, and obtains estimates for the size of the clonal subpopulation sharing the MRCA genotype.

Significance. The explicit formulas for E[SFS] and the intensity of the continuum version rest on standard change-of-measure and excursion-theory arguments for CSBPs and Lévy trees; when the derivations are verified they supply concrete, usable expressions for population-genetic quantities in the stationary regime that were previously unavailable in closed form.

minor comments (3)
  1. [§2] §2 (or wherever the real-tree coding is introduced): the semi-infinite spine is invoked without an explicit reference to the underlying excursion measure or the precise normalization of the quadratic mechanism; a short paragraph recalling the change-of-measure formula used would improve readability.
  2. The statement that the continuum SFS is a 'random point measure' should be accompanied by a brief verification that the intensity is σ-finite, even if this follows immediately from the explicit density.
  3. Notation for the clonal subpopulation size (final section) re-uses symbols already employed for the SFS counts; a distinct symbol or a clarifying sentence would avoid confusion.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive assessment of the manuscript, the clear summary of its contributions, and the recommendation for minor revision. The report lists no specific major comments under the MAJOR COMMENTS section.

Circularity Check

0 steps flagged

No significant circularity; derivations rest on established excursion theory

full rationale

The paper computes explicit expectations for the SFS (infinitely-many-sites, n→∞) and the intensity of the continuum SFS point measure under a fixed quadratic CSBP conditioned to non-extinction, coded by a Lévy tree with semi-infinite spine. These are standard objects whose change-of-measure and excursion representations are taken from the existing literature on CSBPs and real trees; the derivations do not reduce any target quantity to a fitted parameter, a self-citation chain, or a definition that presupposes the result. No load-bearing step is shown to be equivalent to its inputs by the paper's own equations.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Based solely on the abstract; no free parameters, axioms, or invented entities are specified in the provided text.

pith-pipeline@v0.9.0 · 5661 in / 1029 out tokens · 34700 ms · 2026-05-25T07:53:39.724557+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

30 extracted references · 30 canonical work pages

  1. [1]

    Abraham and J.-F

    R. Abraham and J.-F. Delmas. Exact simulation of the genealogical tree for a stationary branching population and application to the asymptotics of its total length. Advances in Applied Probability, 53(2):537–574, 2021

  2. [2]

    Berestycki, N

    J. Berestycki, N. Berestycki, and V. Limic. Asymptotic sampling formulae for Λ-coalescents. Annales de l’Institut Henri Poincar´ e (B) Probability and Statistics, 50(3):715–731, 2014

  3. [3]

    Bhaskar and Y

    A. Bhaskar and Y. S. Song. Descartes’ rule of signs and the identifiability of population demographic models from genomic variation data. Annals of Statistics , 42(6):2469–2493, 2014

  4. [4]

    Bi and J.-F

    H. Bi and J.-F. Delmas. Total length of the genealogical tree for quadratic stationary continuous-state branching processes. Annales de l’Institut Henri Poincar´ e, Probabilit´ es et Statistiques, 52(3), 2016

  5. [5]

    Birkner, J

    M. Birkner, J. Blath, and B. Eldon. Statistical properties of the site-frequency spectrum associated with lambda-coalescents. Genetics, 195(3):1037–53, 2013

  6. [6]

    Blath, M

    J. Blath, M. C. Cronj¨ ager, B. Eldon, and M. Hammer. The site-frequency spectrum associated with Ξ- coalescents. Theoretical Population Biology, 110:36–50, 2016

  7. [7]

    Chen and J.-F

    Y.-T. Chen and J.-F. Delmas. Smaller population size at the MRCA time for stationary branching processes. The Annals of Probability , 40(5), 2012

  8. [8]

    J. J. Duchamps and A. Lambert. Mutations on a random binary tree with measured boundary. Annals of Applied Probability, 28(4):2141–2187, 2018

  9. [9]

    Duquesne and J.-F

    T. Duquesne and J.-F. Le Gall. Random Trees, L´ evy Processes and Spatial Branching Processes, volume 281. SMF, 2002

  10. [10]

    Duquesne and J.-F

    T. Duquesne and J.-F. Le Gall. Probabilistic and fractal aspects of L´ evy trees.Probability Theory and Related Fields, 131(4):553–603, 2005

  11. [11]

    Eldon, M

    B. Eldon, M. Birkner, J. Blath, and F. Freund. Can the Site-Frequency Spectrum Distinguish Exponential Population Growth from Multiple-Merger Coalescents? Genetics, 2015

  12. [12]

    S. N. Evans, J. Pitman, and A. Winter. Rayleigh processes, real trees, and root growth with re-grafting. Probability Theory and Related Fields , 134(1):81–126, 2005

  13. [13]

    Freund, E

    F. Freund, E. Kerdoncuff, S. Matuszewski, M. Lapierre, M. Hildebrandt, J. D. Jensen, L. Ferretti, A. Lambert, T. B. Sackton, and G. Achaz. Interpreting the pervasive observation of U-shaped Site Frequency Spectra. PLOS Genetics, 19(3):e1010677, 2023

  14. [14]

    Y. X. Fu. Statistical Properties of Segregating Sites. Theoretical Population Biology, 48(2):172–197, 1995

  15. [15]

    R. C. Griffiths and S. Tavar´ e. The age of a mutation in a general coalescent tree.Communications in Statistics. Stochastic Models, 14(1-2):273–295, 1998. 26 ROMAIN ABRAHAM, JEAN-FRANC ¸ OIS DELMAS, AND PATRICK HOSCHEIT

  16. [16]

    Kersting, A

    G. Kersting, A. Siri-J´ egousse, and A. H. Wences. Site Frequency Spectrum of the Bolthausen-Sznitman Coalescent. Latin American Journal of Probability and Mathematical Statistics , 18(1):1483, 2021

  17. [17]

    J. Kim, E. Mossel, M. Z. R´ acz, and N. Ross. Can one hear the shape of a population history? Theoretical Population Biology, 100:26–38, 2015

  18. [18]

    J. Koskela. Multi-locus data distinguishes between population growth and multiple merger coalescents. Sta- tistical Applications in Genetics and Molecular Biology , 17(3), 2018

  19. [19]

    Koskela, P

    J. Koskela, P. A. Jenkins, and D. Span` o. Computational inference beyond Kingman’s coalescent. Journal of Applied Probability, 52(2):519–537, 2015

  20. [20]

    Koskela, P

    J. Koskela, P. A. Jenkins, and D. Span` o. Bayesian non-parametric inference for Lambda-coalescents: Posterior consistency and a parametric method. Bernoulli, 24(3):2122–2153, 2018

  21. [21]

    A. Lambert. Quasi-Stationary Distributions and the Continuous-State Branching Process Conditioned to Be Never Extinct. Electronic Journal of Probability, 12, 2007

  22. [22]

    A. Lambert. The Allelic Partition for Coalescent Point Processes. Markov Processes and Related Fields , 15:359–386, 2009

  23. [23]

    A. Lambert. The coalescent of a sample from a binary branching process. Theoretical Population Biology , 122:30–35, 2018

  24. [24]

    Z. Li. Measure-Valued Branching Markov Processes. Springer, 2011

  25. [25]

    Matuszewski, M

    S. Matuszewski, M. E. Hildebrandt, G. Achaz, and J. D. Jensen. Coalescent Processes with Skewed Offspring Distributions and Nonequilibrium Demography. Genetics, 208(1):323–338, 2018

  26. [26]

    Myers, C

    S. Myers, C. Fefferman, and N. Patterson. Can one learn history from the allelic spectrum? Theoretical Population Biology, 73(3):342–348, 2008

  27. [27]

    L. Popovic. Asymptotic genealogy of a critical branching process. The Annals of Applied Probability , 14(4):2120–2148, 2004

  28. [28]

    Schweinsberg and Y

    J. Schweinsberg and Y. Shuai. Asymptotics for the site frequency spectrum associated with the genealogy of a birth and death process, 2023

  29. [29]

    J. P. Spence, J. A. Kamm, and Y. S. Song. The Site Frequency Spectrum for General Coalescents. Genetics, 202(4):1549–1561, 2016

  30. [30]

    Terhorst and Y

    J. Terhorst and Y. S. Song. Fundamental limits on the accuracy of demographic inference based on the sample frequency spectrum. Proceedings of the National Academy of Sciences , 112(25):7677–7682, 2015. Romain Abraham, Institut Denis Poisson, Universit ´e d’Orl ´eans, Universit ´e de Tours, CNRS, France Email address: romain.abraham@univ-orleans.fr Jean-F...