pith. machine review for the scientific record.

arxiv: 2603.16431 · v2 · submitted 2026-03-17 · 🧮 math.PR

Recognition: 2 theorem links · Lean Theorem

On central limit theorems for Ewens-Pitman model

Pith reviewed 2026-05-15 10:22 UTC · model grok-4.3

classification 🧮 math.PR
keywords Ewens-Pitman model · Chinese restaurant process · quenched central limit theorem · component count · alpha-diversity · occupancy count · random partitions

The pith

In the limit, the fluctuations of the component count in Ewens-Pitman partitions split into two parts that are conditionally independent given the alpha-diversity.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes a quenched functional central limit theorem for the total number of components in random partitions generated by the Chinese restaurant process with parameters alpha in (0,1) and theta greater than -alpha. The component count is represented as the occupancy count in an infinite urn scheme whose sampling frequencies are the random asymptotic frequencies P_j of the tables. The limiting fluctuations decompose into a term arising from sampling given the fixed frequencies P_j and a separate term arising from the randomness in the frequencies P_j themselves. These two limiting processes are conditionally independent when conditioned on the alpha-diversity. The result strengthens an earlier central limit theorem by supplying the functional convergence and the explicit conditional independence structure.
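
To make the object of study concrete, here is a minimal simulation sketch of the (alpha, theta) Chinese restaurant process. It uses the standard seating rule: with m customers at k tables, the next customer opens a new table with probability (theta + k·alpha)/(theta + m), and otherwise joins table j with probability proportional to its size minus alpha. This is an illustration only, not code from the paper; the function name `crp_component_counts` and its parameters are hypothetical.

```python
import random


def crp_component_counts(n, alpha, theta, seed=0):
    """Simulate n customers of the (alpha, theta) Chinese restaurant process.

    Returns the list [K_1, ..., K_n] of component (table) counts.
    Hypothetical helper for illustration; not taken from the paper.
    """
    if not (0 < alpha < 1 and theta > -alpha):
        raise ValueError("need 0 < alpha < 1 and theta > -alpha")
    rng = random.Random(seed)
    table_sizes = []  # table_sizes[j] = number of customers at table j
    counts = []
    for m in range(n):  # m customers are already seated
        k = len(table_sizes)
        # The first customer always opens a table; afterwards a new table
        # is opened with probability (theta + k*alpha) / (theta + m).
        if m == 0 or rng.random() * (theta + m) < theta + k * alpha:
            table_sizes.append(1)
        else:
            # Join existing table j with probability proportional to
            # (size_j - alpha); these weights sum to m - k*alpha.
            u = rng.random() * (m - k * alpha)
            acc = 0.0
            for j, size in enumerate(table_sizes):
                acc += size - alpha
                if u < acc:
                    table_sizes[j] += 1
                    break
            else:
                table_sizes[-1] += 1  # guard against floating-point round-off
        counts.append(len(table_sizes))
    return counts
```

For example, `crp_component_counts(5000, 0.5, 1.0)[-1] / 5000 ** 0.5` gives a rough finite-sample proxy for the alpha-diversity of that simulated partition. The inner loop makes the sketch quadratic in the worst case, which is acceptable for small experiments only.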

Core claim

The central claim is that the component count, equivalently the occupancy count of an infinite urn model with frequencies (P_j), obeys a quenched functional central limit theorem in which the limiting centered and scaled process decomposes as the sum of a sampling fluctuation given (P_j) and a fluctuation coming from the random (P_j), and these two Gaussian processes are conditionally independent given the alpha-diversity.
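
For reference, the conditioning variable can be written out explicitly. The display below records the standard definition of the alpha-diversity as an almost-sure limit, as found in Pitman (2006). The symbols K_n (component count after n customers) and S_alpha are notation assumed here and may differ from the paper's.

```latex
\[
  S_\alpha \;:=\; \lim_{n\to\infty} \frac{K_n}{n^{\alpha}}
  \quad \text{almost surely}, \qquad \alpha\in(0,1),\ \theta>-\alpha .
\]
```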

What carries the argument

The representation of component count as occupancy count in an infinite urn scheme with frequencies (P_j), together with the quenched functional central limit theorem that separates sampling noise from frequency noise.
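
The urn representation can be sketched numerically. The example below draws frequencies by the standard stick-breaking construction for the two-parameter model (V_j ~ Beta(1 - alpha, theta + j·alpha)), truncated at a finite number of sticks, then throws n balls into the urns and counts the occupied ones. The truncation, the renormalization of the truncated weights, and all function names are assumptions made for illustration; none of this is taken from the paper.

```python
import random


def stick_breaking_frequencies(alpha, theta, num_sticks, rng):
    """Truncated stick-breaking frequencies for parameters (alpha, theta).

    Returns (freqs, remaining), where `remaining` is the probability mass
    lost to truncation. Hypothetical helper for illustration.
    """
    freqs, remaining = [], 1.0
    for j in range(1, num_sticks + 1):
        v = rng.betavariate(1.0 - alpha, theta + j * alpha)
        freqs.append(remaining * v)
        remaining *= 1.0 - v
    return freqs, remaining


def conditional_mean_occupancy(freqs, n):
    """Expected number of occupied urns after n draws, given the frequencies."""
    return sum(1.0 - (1.0 - p) ** n for p in freqs)


def sample_occupancy(freqs, n, rng):
    """Throw n balls into urns with the given weights; count occupied urns.

    Weights are renormalized by random.choices, so truncation error is
    silently absorbed (an approximation, labelled as such).
    """
    urns = rng.choices(range(len(freqs)), weights=freqs, k=n)
    return len(set(urns))
```

Holding `freqs` fixed and calling `sample_occupancy` repeatedly isolates the sampling noise around `conditional_mean_occupancy(freqs, n)`; redrawing `freqs` each time adds the frequency noise. For alpha near 1 the truncated mass decays slowly, so `num_sticks` would need to be large for the approximation to be meaningful.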

If this is right

  • Conditionally on the alpha-diversity, the limiting variance of the component count is the sum of the sampling variance and the frequency variance.
  • The functional convergence supplies tightness and pathwise limits in the Skorokhod space rather than only finite-dimensional distributions.
  • Joint limit theorems for several statistics become simpler because the two noise sources are independent given the alpha-diversity.
  • The decomposition applies throughout the parameter range alpha in (0,1) and theta greater than -alpha.
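
A finite-n analogue of the first bullet is the law of total variance, which holds exactly for every n and does not depend on the paper's limit theorem. In the display, K_n denotes the occupancy count after n draws (notation assumed here), and the conditioning is on the frequencies.

```latex
\begin{align*}
  \operatorname{Var}(K_n)
    &= \mathbb{E}\bigl[\operatorname{Var}\bigl(K_n \mid (P_j)_{j\in\mathbb N}\bigr)\bigr]
     + \operatorname{Var}\bigl(\mathbb{E}\bigl[K_n \mid (P_j)_{j\in\mathbb N}\bigr]\bigr), \\
  \mathbb{E}\bigl[K_n \mid (P_j)_{j\in\mathbb N}\bigr]
    &= \sum_{j\ge 1} \bigl(1-(1-P_j)^n\bigr).
\end{align*}
```

The first term on the right of the first line is the sampling contribution and the second is the frequency contribution. How each behaves after centering and scaling is what the paper's theorem addresses.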

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Estimators of the alpha-diversity could be constructed by subtracting an estimate of the sampling fluctuation from observed component-count variance.
  • The same two-part decomposition may hold for other exchangeable partition models whose frequencies satisfy similar almost-sure convergence.
  • Conditioning simulations on the realized alpha-diversity should produce residuals whose cross-covariance with the frequency estimator is near zero.

Load-bearing premise

The earlier limit theorems for occupancy counts with fixed frequencies apply directly once the random frequencies (P_j) from the Chinese restaurant process are inserted.

What would settle it

A numerical check or exact calculation showing that the conditional covariance between the sampling fluctuation term and the frequency fluctuation term fails to vanish when conditioned on the alpha-diversity would falsify the claimed conditional independence.

Original abstract

We establish a quenched functional central limit theorem for the total number of components of random partitions induced by Chinese restaurant process with parameters $(\alpha,\theta), \alpha\in(0,1), \theta>-\alpha$. With $P_j$ denoting the asymptotic frequency of $j$-th table, it is well-known that the component count has the same law as the occupancy count of an infinite urn scheme with sampling frequencies being $(P_j)_{j\in\mathbb N}$. Our analysis follows this approach and is based on earlier results of Karlin (1967) and Durieu and Wang (2016). In words, our result reveals that the fluctuations of component count consist of two parts, one due to the sampling effect given the asymptotic frequencies $(P_j)_{j\in\mathbb N}$, the other due to the fluctuations of the random asymptotic frequencies, and in the limit the fluctuations of two parts are conditionally independent given the $\alpha$-diversity. Our result strengthens a recent central limit theorem obtained by Bercu and Favaro (2024) via a different method.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 2 minor

Summary. The paper establishes a quenched functional central limit theorem for the total number of components in random partitions generated by the Chinese restaurant process (Ewens-Pitman model) with parameters (α, θ) where α ∈ (0,1) and θ > −α. Using the known equivalence to occupancy counts in an infinite urn scheme with frequencies (P_j), the fluctuations are decomposed into a sampling term conditional on the fixed (P_j) and a term arising from the randomness of the asymptotic frequencies (P_j); these two contributions are shown to be conditionally independent given the α-diversity. The argument invokes Karlin (1967) and Durieu-Wang (2016) and strengthens the non-functional CLT of Bercu-Favaro (2024).

Significance. If the quenched functional convergence holds, the decomposition supplies a precise separation of sampling variability from frequency variability together with conditional independence given the α-diversity. This refines existing limit theorems for component counts in Pitman-Yor partitions and supplies a template that can be reused for related functionals in exchangeable partition models. The reliance on established urn-scheme results from Karlin and Durieu-Wang is a strength, as it keeps the new contribution focused and verifiable once the application is checked.

major comments (1)
  1. [§3] §3, proof of the main quenched functional CLT: the verification that the random sequence (P_j) satisfies the moment and tail conditions of Durieu-Wang (2016) for almost-sure functional convergence is only sketched; an explicit check that the α-diversity is measurable with respect to the sigma-field generated by the limiting Gaussian process is needed to justify the claimed conditional independence.
minor comments (2)
  1. [Introduction] The definition of the α-diversity appears first in the abstract and introduction but is not restated with its explicit almost-sure limit expression before it is used as the conditioning variable in the main theorem; adding one sentence would improve readability.
  2. [Theorem 1.1] In the statement of the main theorem, the topology on the space of cadlag paths (Skorokhod or uniform) should be specified explicitly rather than left implicit.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the positive assessment and the constructive comment on the proof in §3. We address the point below and will revise the manuscript accordingly.

Point-by-point responses
  1. Referee: [§3] §3, proof of the main quenched functional CLT: the verification that the random sequence (P_j) satisfies the moment and tail conditions of Durieu-Wang (2016) for almost-sure functional convergence is only sketched; an explicit check that the α-diversity is measurable with respect to the sigma-field generated by the limiting Gaussian process is needed to justify the claimed conditional independence.

    Authors: We agree that the verification in the proof of the quenched functional CLT can be made more explicit. In the revised manuscript we will expand the argument to include a direct check that the random sequence (P_j) satisfies the moment and tail conditions of Durieu-Wang (2016) for almost-sure functional convergence. We will also add an explicit measurability argument showing that the α-diversity is measurable with respect to the sigma-field generated by the limiting Gaussian process, thereby rigorously justifying the claimed conditional independence of the two fluctuation sources given the α-diversity. revision: yes

Circularity Check

0 steps flagged

Minor self-citation to prior occupancy results; derivation applies independent theorems without reduction to fitted inputs

Full rationale

The paper invokes the occupancy-count representation of the component count (equivalent in law to an infinite-urn scheme with frequencies P_j) and applies limit theorems from Karlin (1967) and Durieu-Wang (2016) to decompose fluctuations into a conditional sampling term and a term from the random frequencies, with the two becoming conditionally independent given the alpha-diversity in the quenched limit. This decomposition follows directly from the cited external results once the representation is adopted; no equation in the present work defines the target CLT in terms of its own fitted parameters or self-referential ansatz, and the single self-citation is not load-bearing for the new quenched functional statement.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the standard Chinese restaurant process construction, the occupancy-count representation via asymptotic frequencies P_j, and convergence theorems from the cited 1967 and 2016 papers; no new free parameters or invented entities are introduced.

axioms (1)
  • [standard math] Convergence results for occupancy counts in infinite urn schemes from Karlin (1967) and Durieu and Wang (2016)
    The analysis is based on these earlier results as stated in the abstract.

pith-pipeline@v0.9.0 · 5476 in / 1382 out tokens · 68232 ms · 2026-05-15T10:22:44.796927+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

Tag meanings

  • matches: The paper's claim is directly supported by a theorem in the formal canon.
  • supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
  • extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
  • uses: The paper appears to rely on the theorem as machinery.
  • contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
  • unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

35 extracted references · 35 canonical work pages

  1. [1]

    Arratia, R., Barbour, A. D., and Tavaré, S. (2003). Logarithmic combinatorial structures: a probabilistic approach. EMS Monographs in Mathematics. European Mathematical Society (EMS), Zürich

  2. [2]

    Bahadur, R. R. (1960). On the number of distinct values in a large sample from an infinite discrete distribution. Proc. Nat. Inst. Sci. India Part A, 26(supplement II):67–75

  3. [3]

    Bahier, V. and Najnudel, J. (2022). On smooth mesoscopic linear statistics of the eigenvalues of random permutation matrices. J. Theoret. Probab., 35(3):1640–1661

  4. [4]

    Basrak, B. (2025). On generalized arcsine laws and residual allocation models. arXiv preprint arXiv:2510.22066

  5. [5]

    Ben Arous, G. and Dang, K. (2015). On fluctuations of eigenvalues of random permutation matrices. Ann. Inst. Henri Poincaré Probab. Stat., 51(2):620–647

  6. [6]

    Bercu, B. and Favaro, S. (2024). A martingale approach to Gaussian fluctuations and laws of iterated logarithm for Ewens-Pitman model. Stochastic Process. Appl., 178:Paper No. 104493, 19

  7. [7]

    Billingsley, P. (1999). Convergence of probability measures. Wiley Series in Probability and Statistics: Probability and Statistics. John Wiley & Sons Inc., New York, second edition. A Wiley-Interscience Publication

  8. [8]

    Broderick, T., Jordan, M. I., and Pitman, J. (2012). Beta processes, stick-breaking and power laws. Bayesian Anal., 7(2):439–475

  9. [9]

    Chebunin, M. and Kovalevskii, A. (2016). Functional central limit theorems for certain statistics in an infinite urn scheme. Statist. Probab. Lett., 119:344–348

  10. [10]

    Contardi, C., Dolera, E., and Favaro, S. (2025). Laws of large numbers and central limit theorem for Ewens-Pitman model. Electron. J. Probab., 30:Paper No. 193, 51

  11. [11]

    Crane, H. (2016). The ubiquitous Ewens sampling formula. Statist. Sci., 31(1):1–19

  12. [12]

    Darling, D. A. (1967). Some limit theorems associated with multinomial trials. In Proc. Fifth Berkeley Sympos. Math. Statist. and Probability (Berkeley, Calif., 1965/66), Vol. II: Contributions to Probability Theory, Part 1, pages 345–350. Univ. California Press, Berkeley, CA

  13. [13]

    Durieu, O. and Wang, Y. (2016). From infinite urn schemes to decompositions of self-similar Gaussian processes. Electron. J. Probab., 21:Paper No. 43, 23

  14. [14]

    Durrett, R. (2010). Probability: theory and examples. Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge University Press, Cambridge, fourth edition

  15. [15]

    Favaro, S., Feng, S., and Paguyo, J. (2025). Asymptotic behavior of clusters in hierarchical species sampling models. arXiv preprint arXiv:2501.09741

  16. [16]

    Feng, S. (2010). The Poisson-Dirichlet distribution and related topics. Probability and its Applications (New York). Springer, Heidelberg. Models and asymptotic behaviors

  17. [17]

    Ferguson, T. S. (1973). A Bayesian analysis of some nonparametric problems. Ann. Statist., 1:209–230

  18. [18]

    François, Q. (2025). Characteristic polynomial of generalized Ewens random permutations. Electron. Commun. Probab., 30:Paper No. 97, 12

  19. [19]

    Fu, Z. and Wang, Y. (2020). Stable processes with stationary increments parameterized by metric spaces. J. Theoret. Probab., 33(3):1737–1754

  20. [20]

    Garza, J. and Wang, Y. (2024). Limit theorems for random permutations induced by Chinese restaurant processes. arXiv preprint, https://arxiv.org/abs/2412.02162

  21. [21]

    Garza, J. and Wang, Y. (2025). A functional central limit theorem for weighted occupancy processes of the Karlin model. Stochastic Process. Appl., 188:Paper No. 104665

  22. [22]

    Gnedin, A., Hansen, B., and Pitman, J. (2007). Notes on the occupancy problem with infinitely many boxes: general asymptotics and power laws. Probab. Surv., 4:146–171

  23. [23]

    Gnedin, A. and Iksanov, A. (2012). Regenerative compositions in the case of slow variation: a renewal theory approach. Electron. J. Probab., 17:no. 77, 19

  24. [24]

    Gnedin, A., Iksanov, A., and Marynych, A. (2010). Limit theorems for the number of occupied boxes in the Bernoulli sieve. Theory Stoch. Process., 16(2):44–57

  25. [25]

    Grübel, R. and Kabluchko, Z. (2016). A functional central limit theorem for branching random walks, almost sure weak convergence and applications to random trees. Ann. Appl. Probab., 26(6):3659–3698

  26. [26]

    Heyde, C. C. (1977). On central limit and iterated logarithm supplements to the martingale convergence theorem. J. Appl. Probability, 14(4):758–775

  27. [27]

    Iksanov, A., Kabluchko, Z., and Kotelnikova, V. (2022). A functional limit theorem for nested Karlin’s occupancy scheme generated by discrete Weibull-like distributions. J. Math. Anal. Appl., 507(2):Paper No. 125798, 24

  28. [28]

    Iksanov, A., Marynych, A., and Meiners, M. (2017). Asymptotics of random processes with immigration I: Scaling limits. Bernoulli, 23(2):1233–1278

  29. [29]

    Karlin, S. (1967). Central limit theorems for certain infinite urn schemes. J. Math. Mech., 17:373–401

  30. [30]

    Kingman, J. F. C. (1978). The representation of partition structures. J. London Math. Soc. (2), 18(2):374–380

  31. [31]

    Perman, M., Pitman, J., and Yor, M. (1992). Size-biased sampling of Poisson point processes and excursions. Probab. Theory Related Fields, 92(1):21–39

  32. [32]

    Pitman, J. (2006). Combinatorial stochastic processes, volume 1875 of Lecture Notes in Mathematics. Springer-Verlag, Berlin. Lectures from the 32nd Summer School on Probability Theory held in Saint-Flour, July 7–24, 2002, With a foreword by Jean Picard

  33. [33]

    Pitman, J. and Yor, M. (1997). The two-parameter Poisson-Dirichlet distribution derived from a stable subordinator. Ann. Probab., 25(2):855–900

  34. [34]

    van der Vaart, A. W. and Wellner, J. A. (1996). Weak convergence and empirical processes: with applications to statistics. Springer Series in Statistics. Springer-Verlag, New York

  35. [35]

    Wieand, K. (2000). Eigenvalue distributions of random permutation matrices. Ann. Probab., 28(4):1563–1587

Author address: Yizao Wang, Department of Mathematical Sciences, University of Cincinnati, 2815 Commons Way, Cincinnati, OH, 45221-0025, USA. Email address: yizao.wang@uc.edu