Site Frequency Spectrum in stationary branching populations
Pith reviewed 2026-05-25 07:53 UTC · model grok-4.3
The pith
The expected site frequency spectrum for samples from conditioned branching populations converges to an explicit form as sample size grows to infinity.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Under the quadratic branching mechanism and non-extinction conditioning, the expectation of the site frequency spectrum converges as the sample size tends to infinity, and the continuum SFS, viewed as a random point measure, has an explicitly computable intensity density on the positive reals.
What carries the argument
A real tree with a semi-infinite branch, which encodes the genealogy of the continuous-state branching process conditioned on non-extinction, combined with the infinitely-many-sites assumption.
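For orientation, the standard link between the SFS and branch lengths under the infinitely-many-sites assumption can be written as below. The symbols \xi^{(n)}_k, L^{(n)}_k and \theta are notation assumed here for illustration, not taken from the paper, and the paper's mutation-rate normalization may differ.

```latex
% Assumed notation: n sampled individuals, mutations falling on the sample
% genealogy as a Poisson process of rate \theta per unit branch length.
\xi^{(n)}_k = \#\{\text{mutations carried by exactly } k \text{ of the } n \text{ sampled individuals}\},
\qquad 1 \le k \le n-1,

% L^{(n)}_k: total length of the branches of the sample genealogy
% that subtend exactly k of the n leaves.
\mathbb{E}\bigl[\xi^{(n)}_k\bigr] = \theta \, \mathbb{E}\bigl[L^{(n)}_k\bigr].
```

Conditionally on the tree, the number of mutations on the branches counted in L^{(n)}_k is Poisson with mean \theta L^{(n)}_k, so computing the expected SFS reduces to computing expected branch lengths in the real tree.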
If this is right
- The limiting expected SFS is given by an explicit integral expression derived from the branching mechanism.
- The continuum SFS intensity admits a closed-form density that can be evaluated numerically for any frequency value.
- The size of the clonal subpopulation descending from the most recent common ancestor admits explicit bounds or asymptotics at fixed times.
Where Pith is reading between the lines
- The explicit density could be compared directly with frequency spectra observed in microbial or viral populations that obey branching dynamics.
- Relaxing the quadratic mechanism while keeping the real-tree representation might produce analogous but different density formulas.
- The continuum point-measure construction may serve as a building block for studying joint distributions of multiple loci under the same genealogy.
Load-bearing premise
The genealogy must be captured by a real tree with a semi-infinite branch under quadratic branching and the non-extinction conditioning.
What would settle it
Generate many realizations of the conditioned branching process, draw large samples, tally the empirical site frequencies, and test whether the averages and the empirical point-measure density match the explicit formulas derived in the paper.
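The tallying step of such a check can be sketched as follows. This is a minimal illustration, not the paper's procedure: it assumes each realization has already been reduced to a 0/1 matrix with one row per sampled individual and one column per segregating site (1 marks the derived allele), and the function names `site_frequency_spectrum` and `mean_sfs` are hypothetical.

```python
from collections import Counter


def site_frequency_spectrum(haplotypes):
    """Return [xi_1, ..., xi_{n-1}] for a 0/1 matrix with one row per individual.

    xi_k counts the sites whose derived allele is carried by exactly k rows.
    Sites carried by nobody or by everybody are ignored.
    """
    n = len(haplotypes)
    if n < 2:
        raise ValueError("need at least two sampled individuals")
    n_sites = len(haplotypes[0])
    if any(len(row) != n_sites for row in haplotypes):
        raise ValueError("all rows must have the same number of sites")
    # Column sums give the derived-allele count at each site.
    counts = Counter(sum(column) for column in zip(*haplotypes))
    return [counts.get(k, 0) for k in range(1, n)]


def mean_sfs(replicates):
    """Average the SFS entry by entry over realizations with equal sample size."""
    spectra = [site_frequency_spectrum(h) for h in replicates]
    n_classes = len(spectra[0])
    if any(len(s) != n_classes for s in spectra):
        raise ValueError("all replicates must use the same sample size")
    return [sum(s[k] for s in spectra) / len(spectra) for k in range(n_classes)]


# Toy sample of n = 4 individuals and 4 sites (made-up data).
sample = [
    [1, 0, 1, 0],
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 1],
]
print(site_frequency_spectrum(sample))  # [3, 1, 0]
```

The averaged vector returned by `mean_sfs` is the quantity one would compare against the limiting expected SFS. Producing the matrices from simulated genealogies is a separate step that this sketch does not cover.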
Original abstract
This paper explores the Site Frequency Spectrum (SFS) in stationary branching populations. We derive estimates for the SFS associated with a sample from a continuous-state branching process conditioned to never go extinct, utilizing a quadratic branching mechanism. The genealogy of such processes is represented by a real tree with a semi-infinite branch, and we compute the expectation of the SFS under the infinitely-many-sites assumption as the sample size approaches infinity. Additionally, we present a continuum version of the SFS as a random point measure on the positive real line and compute the density of its expected measure explicitly. Finally, we derive estimates for the size of the clonal subpopulation carrying the same genotype as the most recent common ancestor of the whole population at a given time.
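To make the population-size process concrete, here is a rough simulation sketch. It assumes the quadratic branching mechanism is parametrized as psi(lambda) = beta*lambda^2 + alpha*lambda with alpha > 0. The page does not state the paper's normalization, so this parametrization is an assumption. Under it, conditioning on non-extinction adds an immigration drift of 2*beta, giving the diffusion dZ = (2*beta - alpha*Z) dt + sqrt(2*beta*Z) dW. The function name `simulate_conditioned_feller` and the Euler scheme with truncation at zero are illustrative choices, not the authors' method.

```python
import math
import random


def simulate_conditioned_feller(beta, alpha, z0, dt, n_steps, seed=0):
    """Euler-Maruyama path of dZ = (2*beta - alpha*Z) dt + sqrt(2*beta*Z) dW.

    Assumed parametrization: psi(lambda) = beta*lambda**2 + alpha*lambda.
    Negative values produced by the discretization are truncated to zero.
    """
    rng = random.Random(seed)
    z = z0
    path = [z]
    for _ in range(n_steps):
        drift = 2.0 * beta - alpha * z
        noise = math.sqrt(2.0 * beta * z * dt) * rng.gauss(0.0, 1.0)
        # Truncate at zero so the square root stays defined at the next step.
        z = max(z + drift * dt + noise, 0.0)
        path.append(z)
    return path


# Illustrative parameters only: beta = alpha = 1, start at 2.
path = simulate_conditioned_feller(beta=1.0, alpha=1.0, z0=2.0,
                                   dt=0.01, n_steps=200_000, seed=1)
burned_in = path[20_000:]  # drop the initial transient
long_run_mean = sum(burned_in) / len(burned_in)
print(long_run_mean)
```

If the assumed parametrization is right, the stationary law of this diffusion is a Gamma distribution with shape 2 and rate alpha/beta, so the long-run average should sit near 2*beta/alpha (here 2). The sketch only tracks total population size and does not build the genealogy or the real tree.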
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper computes the expected site frequency spectrum (SFS) for samples from a continuous-state branching process with quadratic branching mechanism conditioned to non-extinction, whose genealogy is coded by a real tree with a semi-infinite spine. Under the infinitely-many-sites model it derives the limiting expected SFS as sample size n tends to infinity, introduces a continuum SFS as a random point measure on (0,∞) whose expected intensity measure is given explicitly, and obtains estimates for the size of the clonal subpopulation sharing the MRCA genotype.
Significance. The explicit formulas for E[SFS] and the intensity of the continuum version rest on standard change-of-measure and excursion-theory arguments for CSBPs and Lévy trees; when the derivations are verified they supply concrete, usable expressions for population-genetic quantities in the stationary regime that were previously unavailable in closed form.
minor comments (3)
- [§2] (or wherever the real-tree coding is introduced): the semi-infinite spine is invoked without an explicit reference to the underlying excursion measure or the precise normalization of the quadratic mechanism; a short paragraph recalling the change-of-measure formula used would improve readability.
- The statement that the continuum SFS is a 'random point measure' should be accompanied by a brief verification that the intensity is σ-finite, even if this follows immediately from the explicit density.
- Notation for the clonal subpopulation size (final section) re-uses symbols already employed for the SFS counts; a distinct symbol or a clarifying sentence would avoid confusion.
Simulated Author's Rebuttal
We thank the referee for the positive assessment of the manuscript, the clear summary of its contributions, and the recommendation for minor revision. The report lists no specific major comments under the MAJOR COMMENTS section.
Circularity Check
No significant circularity; derivations rest on established excursion theory
full rationale
The paper computes explicit expectations for the SFS (infinitely-many-sites, n→∞) and the intensity of the continuum SFS point measure under a fixed quadratic CSBP conditioned to non-extinction, coded by a Lévy tree with semi-infinite spine. These are standard objects whose change-of-measure and excursion representations are taken from the existing literature on CSBPs and real trees; the derivations do not reduce any target quantity to a fitted parameter, a self-citation chain, or a definition that presupposes the result. No load-bearing step is shown to be equivalent to its inputs by the paper's own equations.
Axiom & Free-Parameter Ledger
Reference graph
Works this paper leans on
- [1] R. Abraham and J.-F. Delmas. Exact simulation of the genealogical tree for a stationary branching population and application to the asymptotics of its total length. Advances in Applied Probability, 53(2):537–574, 2021.
- [2] J. Berestycki, N. Berestycki, and V. Limic. Asymptotic sampling formulae for Λ-coalescents. Annales de l’Institut Henri Poincaré (B) Probability and Statistics, 50(3):715–731, 2014.
- [3] A. Bhaskar and Y. S. Song. Descartes’ rule of signs and the identifiability of population demographic models from genomic variation data. Annals of Statistics, 42(6):2469–2493, 2014.
- [4] H. Bi and J.-F. Delmas. Total length of the genealogical tree for quadratic stationary continuous-state branching processes. Annales de l’Institut Henri Poincaré, Probabilités et Statistiques, 52(3), 2016.
- [5] M. Birkner, J. Blath, and B. Eldon. Statistical properties of the site-frequency spectrum associated with lambda-coalescents. Genetics, 195(3):1037–1053, 2013.
- [6]
- [7] Y.-T. Chen and J.-F. Delmas. Smaller population size at the MRCA time for stationary branching processes. The Annals of Probability, 40(5), 2012.
- [8] J. J. Duchamps and A. Lambert. Mutations on a random binary tree with measured boundary. Annals of Applied Probability, 28(4):2141–2187, 2018.
- [9] T. Duquesne and J.-F. Le Gall. Random Trees, Lévy Processes and Spatial Branching Processes, volume 281. SMF, 2002.
- [10] T. Duquesne and J.-F. Le Gall. Probabilistic and fractal aspects of Lévy trees. Probability Theory and Related Fields, 131(4):553–603, 2005.
- [11]
- [12] S. N. Evans, J. Pitman, and A. Winter. Rayleigh processes, real trees, and root growth with re-grafting. Probability Theory and Related Fields, 134(1):81–126, 2005.
- [13]
- [14] Y. X. Fu. Statistical Properties of Segregating Sites. Theoretical Population Biology, 48(2):172–197, 1995.
- [15] R. C. Griffiths and S. Tavaré. The age of a mutation in a general coalescent tree. Communications in Statistics. Stochastic Models, 14(1-2):273–295, 1998.
ROMAIN ABRAHAM, JEAN-FRANÇOIS DELMAS, AND PATRICK HOSCHEIT
- [16] G. Kersting, A. Siri-Jégousse, and A. H. Wences. Site Frequency Spectrum of the Bolthausen-Sznitman Coalescent. Latin American Journal of Probability and Mathematical Statistics, 18(1):1483, 2021.
- [17] J. Kim, E. Mossel, M. Z. Rácz, and N. Ross. Can one hear the shape of a population history? Theoretical Population Biology, 100:26–38, 2015.
- [18] J. Koskela. Multi-locus data distinguishes between population growth and multiple merger coalescents. Statistical Applications in Genetics and Molecular Biology, 17(3), 2018.
- [19] J. Koskela, P. A. Jenkins, and D. Spanò. Computational inference beyond Kingman’s coalescent. Journal of Applied Probability, 52(2):519–537, 2015.
- [20] J. Koskela, P. A. Jenkins, and D. Spanò. Bayesian non-parametric inference for Lambda-coalescents: Posterior consistency and a parametric method. Bernoulli, 24(3):2122–2153, 2018.
- [21] A. Lambert. Quasi-Stationary Distributions and the Continuous-State Branching Process Conditioned to Be Never Extinct. Electronic Journal of Probability, 12, 2007.
- [22] A. Lambert. The Allelic Partition for Coalescent Point Processes. Markov Processes and Related Fields, 15:359–386, 2009.
- [23] A. Lambert. The coalescent of a sample from a binary branching process. Theoretical Population Biology, 122:30–35, 2018.
- [24] Z. Li. Measure-Valued Branching Markov Processes. Springer, 2011.
- [25] S. Matuszewski, M. E. Hildebrandt, G. Achaz, and J. D. Jensen. Coalescent Processes with Skewed Offspring Distributions and Nonequilibrium Demography. Genetics, 208(1):323–338, 2018.
- [26]
- [27] L. Popovic. Asymptotic genealogy of a critical branching process. The Annals of Applied Probability, 14(4):2120–2148, 2004.
- [28] J. Schweinsberg and Y. Shuai. Asymptotics for the site frequency spectrum associated with the genealogy of a birth and death process, 2023.
- [29] J. P. Spence, J. A. Kamm, and Y. S. Song. The Site Frequency Spectrum for General Coalescents. Genetics, 202(4):1549–1561, 2016.
- [30] J. Terhorst and Y. S. Song. Fundamental limits on the accuracy of demographic inference based on the sample frequency spectrum. Proceedings of the National Academy of Sciences, 112(25):7677–7682, 2015.
Romain Abraham, Institut Denis Poisson, Université d’Orléans, Université de Tours, CNRS, France
Email address: romain.abraham@univ-orleans.fr
Jean-F...