Asymptotic regimes for maximum likelihood estimation in the Ewens–Pitman model: When the strength parameter matters
Pith reviewed 2026-06-27 04:55 UTC · model grok-4.3
The pith
The MLE for Ewens-Pitman parameters (α, θ) displays four distinct asymptotic regimes based on the frequency spectrum limit.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Four distinct regimes arise for the maximum likelihood estimator of (α, θ) depending on the limiting behaviour of the frequency spectrum; in contrast with previous work, θ may play a crucial role asymptotically. The restriction to two regimes in the literature stems from infinite exchangeability constraints, which can be overcome by the scaled Ewens-Pitman model in which θ grows with n.
What carries the argument
The frequency spectrum and its limiting regimes, which classify the asymptotic behavior of the MLE and are constrained under infinite exchangeability.
If this is right
- The MLE for θ can be asymptotically relevant in regimes not covered by standard infinite exchangeability.
- Only two of the four regimes are accessible under the classical Ewens-Pitman model, because infinite exchangeability imposes a rigid structural relation between the number of distinct blocks and the frequency spectrum.
- The scaled Ewens-Pitman model extends the framework by letting θ grow with sample size n.
- Real-world frequency spectra may fall outside the classical framework, as shown by empirical evidence.
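To make the contrast between a fixed and a growing θ concrete, here is a minimal simulation sketch. It uses the standard sequential (Chinese restaurant) construction of the Ewens-Pitman partition, restricted to 0 ≤ α < 1 and θ > -α. The function name `sample_ewens_pitman` and the linear scaling θ = n in the usage lines are assumptions made purely for illustration; the summary above only says that θ grows with n and does not specify the rate.

```python
import random


def sample_ewens_pitman(n, alpha, theta, seed=None):
    """Return the block sizes of a random partition of n items.

    Sequential construction: with i items already placed in k blocks,
    the next item opens a new block with weight theta + k * alpha and
    joins existing block j with weight sizes[j] - alpha.
    """
    if not (0 <= alpha < 1 and theta > -alpha):
        raise ValueError("need 0 <= alpha < 1 and theta > -alpha")
    rng = random.Random(seed)
    sizes = []
    for i in range(n):
        k = len(sizes)
        new_weight = theta + k * alpha
        u = rng.random() * (theta + i)  # total weight is theta + i
        if k == 0 or u < new_weight:
            sizes.append(1)  # the first item always opens a block
            continue
        u -= new_weight
        for j in range(k):
            u -= sizes[j] - alpha
            if u < 0:
                sizes[j] += 1
                break
        else:
            sizes[-1] += 1  # guard against floating-point round-off
    return sizes


if __name__ == "__main__":
    n = 2000
    classical = sample_ewens_pitman(n, alpha=0.5, theta=1.0, seed=0)
    # Hypothetical scaling theta = n, chosen only for illustration.
    scaled = sample_ewens_pitman(n, alpha=0.5, theta=float(n), seed=0)
    print("blocks with fixed theta:", len(classical))
    print("blocks with theta = n:  ", len(scaled))
```

With θ fixed, the number of blocks grows sublinearly in n, while a θ of the same order as n produces a number of blocks comparable to n itself. This is only a qualitative picture, not a reproduction of the paper's results.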
Where Pith is reading between the lines
- This implies that analyses assuming infinite exchangeability may miss important asymptotic behaviors in finite samples.
- Extensions like the scaled model could be applied to other partition models to increase flexibility.
- Data analysts might check the empirical frequency spectrum to select the appropriate regime for inference.
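As a sketch of what checking the empirical frequency spectrum could look like, the snippet below counts, for each r, the number M_r of distinct labels observed exactly r times. The function names and the choice of summary statistics are assumptions for illustration, not taken from the paper. The ratio M_1/K_n is included because, under the classical model with 0 < α < 1, it is known to converge to α; whether this is the exact relation the paper has in mind is not stated in the summary.

```python
from collections import Counter


def frequency_spectrum(labels):
    """Return {r: M_r}, where M_r counts distinct labels seen exactly r times."""
    block_sizes = Counter(labels)  # label -> number of occurrences
    return dict(Counter(block_sizes.values()))


def spectrum_summary(labels):
    """Sample size n, number of blocks K_n, singletons M_1, and M_1 / K_n."""
    spectrum = frequency_spectrum(labels)
    n = sum(r * m for r, m in spectrum.items())
    k = sum(spectrum.values())
    m1 = spectrum.get(1, 0)
    return {"n": n, "K_n": k, "M_1": m1, "M_1/K_n": m1 / k if k else float("nan")}


if __name__ == "__main__":
    sample = ["a", "b", "a", "c", "a", "b", "d"]
    print(frequency_spectrum(sample))
    print(spectrum_summary(sample))
```

For the toy sample, label `a` appears three times, `b` twice, and `c` and `d` once each, so the spectrum has M_1 = 2, M_2 = 1, and M_3 = 1.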
Load-bearing premise
The analysis assumes mild conditions on the data-generating mechanism that guarantee the frequency spectrum possesses a well-defined limiting behavior capable of distinguishing the four regimes.
What would settle it
A dataset whose frequency spectrum converges to a limit that places it outside the two regimes allowed by infinite exchangeability, yet whose MLE for θ follows the behavior of one of the two classical regimes, would falsify the claim of four regimes.
Original abstract
We study the large sample asymptotic behaviour of the Maximum Likelihood Estimator of the discount and strength parameters $(\alpha,\theta)$ in the Ewens--Pitman model for random partitions, under mild assumptions on the data-generating mechanism. We show that four distinct regimes arise, depending on the limiting behaviour of the frequency spectrum. In particular, in contrast with previous work, we find that $\theta$ may play a crucial role asymptotically. We further show that the existing literature implicitly focuses on only two of these regimes, and we relate this restriction to the constraints imposed by infinite exchangeability. Under the latter, indeed, the number of distinct blocks and the frequency spectrum are necessarily tied by a rigid structural relation. We prove that this lack of flexibility can be overcome through what we call the scaled Ewens--Pitman model, in which $\theta$ is allowed to grow with the sample size $n$. Finally, we provide empirical evidence from real-world data showing that such extensions are needed to capture frequency spectra that fall outside the classical Ewens--Pitman framework.
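For readers who want to see the object being maximised, the following is a minimal sketch of the Ewens-Pitman log-likelihood, based on the standard partition probability function: the product of (θ + iα) for i = 1, ..., k-1, divided by the rising factorial (θ+1)_{n-1}, times the product over blocks of (1-α)_{n_j - 1}. The restriction to 0 ≤ α < 1, the function names, and the crude grid search are assumptions for illustration; the paper's own estimation procedure is not described in this summary, and a finite maximiser need not exist for every data set.

```python
from math import inf, lgamma, log


def ewens_pitman_loglik(alpha, theta, block_sizes):
    """Log-probability of a partition with the given block sizes."""
    if not (0 <= alpha < 1 and theta > -alpha):
        return -inf
    n = sum(block_sizes)
    k = len(block_sizes)
    # Product of (theta + i * alpha) for i = 1, ..., k - 1.
    ll = sum(log(theta + i * alpha) for i in range(1, k))
    # Rising factorial (theta + 1)_{n-1} = Gamma(theta + n) / Gamma(theta + 1).
    ll -= lgamma(theta + n) - lgamma(theta + 1)
    # (1 - alpha)_{s-1} = Gamma(s - alpha) / Gamma(1 - alpha) for each block size s.
    ll += sum(lgamma(s - alpha) - lgamma(1 - alpha) for s in block_sizes)
    return ll


def grid_mle(block_sizes, alphas, thetas):
    """Crude grid search for the maximiser; illustration only."""
    best = max(
        (ewens_pitman_loglik(a, t, block_sizes), a, t)
        for a in alphas
        for t in thetas
    )
    return best[1], best[2]


if __name__ == "__main__":
    sizes = [5, 3, 2, 1, 1, 1]
    alphas = [i / 20 for i in range(20)]           # 0.00, 0.05, ..., 0.95
    thetas = [0.1 * j for j in range(1, 101)]      # 0.1, 0.2, ..., 10.0
    print(grid_mle(sizes, alphas, thetas))
```

A quick sanity check on the formula is that the probabilities of all partitions of a small set sum to one: for n = 3 there is one partition with a single block, three with block sizes (2, 1), and one with three singletons.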
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript analyzes the large-sample asymptotics of the MLE for the discount and strength parameters (α, θ) in the Ewens–Pitman model for random partitions. Under mild conditions on the data-generating process, it identifies four distinct regimes determined by the limiting behavior of the frequency spectrum. The authors show that θ can play a non-negligible role in some regimes (in contrast to prior work), relate the restriction to only two regimes in the existing literature to the structural constraints of infinite exchangeability, introduce a scaled Ewens–Pitman model in which θ is permitted to grow with n, and supply empirical illustrations from real data.
Significance. If the regime classification and associated limit theorems hold, the paper supplies a more complete asymptotic theory for inference in the Ewens–Pitman family, clarifying when each parameter is identifiable and when the classical model is misspecified. The explicit link between exchangeability and the admissible regimes, together with the scaled extension, offers a principled way to enlarge the model class while retaining the partition structure; the empirical examples indicate that the additional regimes are observable in practice.
major comments (2)
- [§3, Theorem 3.1 and the regime definitions that follow] The four regimes are stated to be distinguished by the limiting frequency spectrum, yet the proof that the MLE converges to different limits in each regime appears to rely on the spectrum limit being known a priori; it is not shown that the regimes remain distinguishable when the spectrum limit must itself be estimated from the same data.
- [§5.1, Proposition 5.2] The claim that infinite exchangeability forces the number of blocks K_n and the frequency spectrum to satisfy a rigid relation is central to the motivation for the scaled model, but the argument only treats the two-parameter Ewens–Pitman case; it is unclear whether the same rigidity persists under the mild conditions stated in Assumption 2.1 or whether additional regimes become admissible even without scaling.
minor comments (3)
- [§2] Notation for the frequency spectrum (e.g., the definition of the empirical measure μ_n) is introduced in §2 but used with varying normalizations in later sections; a single consolidated definition would improve readability.
- [§6] The empirical section (§6) reports point estimates and regime assignments but does not include standard errors or bootstrap intervals for the MLE; adding these would strengthen the claim that the observed spectra fall outside the classical regimes.
- [Introduction] Several references to earlier work on the Ewens–Pitman MLE (e.g., the papers cited for the two-regime case) are listed in the bibliography but not discussed in the introduction; a brief comparison paragraph would clarify the precise advance.
Simulated Author's Rebuttal
We thank the referee for the careful reading and constructive comments on our manuscript. We address each major comment below with clarifications and indicate the revisions that will be incorporated.
- Referee: [§3, Theorem 3.1 and the regime definitions that follow] The four regimes are stated to be distinguished by the limiting frequency spectrum, yet the proof that the MLE converges to different limits in each regime appears to rely on the spectrum limit being known a priori; it is not shown that the regimes remain distinguishable when the spectrum limit must itself be estimated from the same data.
  Authors: The regimes are defined as properties of the true data-generating process under Assumption 2.1, which fixes the limiting frequency spectrum. Theorem 3.1 establishes the asymptotic behavior of the MLE conditional on the true regime. For the purposes of the theorem, the spectrum limit is a feature of the underlying distribution, not an observed quantity. In practice, the regime can be diagnosed by separately estimating the frequency spectrum, but the theoretical convergence results hold conditionally on the regime. We will add a short clarifying remark in Section 3 distinguishing the theoretical regime classification from practical identification.
  Revision: partial
- Referee: [§5.1, Proposition 5.2] The claim that infinite exchangeability forces the number of blocks K_n and the frequency spectrum to satisfy a rigid relation is central to the motivation for the scaled model, but the argument only treats the two-parameter Ewens–Pitman case; it is unclear whether the same rigidity persists under the mild conditions stated in Assumption 2.1 or whether additional regimes become admissible even without scaling.
  Authors: Proposition 5.2 uses the two-parameter Ewens–Pitman model as a concrete illustration, but the rigidity between K_n and the frequency spectrum is a direct consequence of infinite exchangeability of the partition. Assumption 2.1 maintains this exchangeability structure while relaxing the parametric form; the same structural relation therefore persists, and additional regimes remain inadmissible without scaling θ with n. We will revise the text in Section 5.1 to state explicitly that the rigidity arises from exchangeability and applies under the general conditions of Assumption 2.1.
  Revision: yes
Circularity Check
No significant circularity identified
The derivation classifies four asymptotic regimes for the MLE of (α, θ) according to the limiting behavior of the frequency spectrum under mild external assumptions on the data-generating mechanism. These regimes are not defined in terms of the fitted parameters or by construction from the MLE itself; the scaled Ewens-Pitman extension is introduced to relax exchangeability constraints rather than to rename or refit existing quantities. No load-bearing self-citation, self-definitional step, or reduction of a claimed prediction to an input fit is present in the argument structure.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: mild assumptions on the data-generating mechanism guarantee a limiting frequency spectrum that distinguishes the regimes
invented entities (1)
- scaled Ewens-Pitman model: no independent evidence