pith. sign in

arxiv: 2402.14390 · v3 · pith:NF5IAFLCnew · submitted 2024-02-22 · 📊 stat.CO · stat.ME

Composite likelihood inference for the Poisson log-normal model

Pith reviewed 2026-05-24 04:04 UTC · model grok-4.3

classification 📊 stat.CO stat.ME
keywords Poisson log-normal modelcomposite likelihoodimportance samplingEM algorithmmultivariate count dataasymptotic propertiesparameter estimation
0
0 comments X

The pith

Composite likelihood inside the EM algorithm recovers the asymptotic efficiency of full maximum likelihood for the Poisson log-normal model.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The Poisson log-normal model serves as a latent-variable framework for multivariate count data, yet parameter inference is blocked by an intractable conditional distribution of the latent variables. A standard Monte Carlo EM procedure can recover true maximum-likelihood estimators but only when the latent dimension stays small. The paper replaces the missing conditional expectation with an estimator built from composite likelihood and importance sampling. This substitution keeps the large-sample properties of maximum likelihood while removing the high-dimensional integration barrier, so the method remains practical for moderately large data sets. The resulting procedure supplies point estimates, confidence intervals, and hypothesis tests, as demonstrated on a fish-count data set where environmental effects and species correlations can be assessed directly.

Core claim

The authors first present a Monte Carlo EM algorithm that targets maximum-likelihood estimators for the Poisson log-normal model but is computationally feasible only in low-dimensional latent spaces. They then introduce a novel procedure that embeds composite-likelihood and importance-sampling estimates inside the same EM framework. The resulting estimators retain the asymptotic properties of full maximum likelihood while bypassing the intractable high-dimensional integrals, thereby enabling reliable parameter estimation, confidence intervals, and hypothesis testing on data sets of moderate size.

What carries the argument

The composite-likelihood plus importance-sampling estimator that replaces the intractable conditional expectation inside the EM algorithm for the Poisson log-normal model.

If this is right

  • The estimators retain the asymptotic efficiency and normality of maximum-likelihood estimators.
  • Confidence intervals and hypothesis tests can be constructed directly from the composite-likelihood EM output.
  • Computation remains feasible for data sets whose latent dimension would render ordinary Monte Carlo EM prohibitive.
  • The method can be applied to identify significant covariates and residual correlations in multivariate count data.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same substitution strategy could be tested on other latent-variable count models whose full conditional is intractable.
  • Refinements to the importance-sampling proposal might further reduce Monte Carlo variance without altering the asymptotic guarantees.
  • Likelihood-ratio statistics derived from the composite estimator could support model-comparison procedures that inherit the same large-sample justification.

Load-bearing premise

Replacing the intractable conditional expectation in the EM algorithm with a composite-likelihood plus importance-sampling estimator does not destroy the asymptotic equivalence to full maximum likelihood.

What would settle it

A Monte Carlo experiment in which the proposed estimators fail to achieve the same asymptotic variance as the true maximum-likelihood estimator, or in which the nominal coverage rate of the associated confidence intervals deviates systematically from the claimed level as sample size grows.

Figures

Figures reproduced from arXiv: 2402.14390 by Julien Stoehr (CEREMADE), Stephane S. Robin (LPSM (UMR\_8001)).

Figure 1
Figure 1. Figure 1: Number of blocks C as a function of the number of species p (in log-log-scale) for blocks of size k = 2 (black squares ∎), k = 3 (blue circles ◯), k = 5 (red triangles up △) and k = 7 (green triangles down ▽). Solid line: number of blocks actually used, dashed line: upper bound ( p k ), dotted line: lower bound p(p − 1)/[k(k − 1)]. parameter, so C = p(p − 1)/2. Spreading the p species into k blocks is equi… view at source ↗
Figure 2
Figure 2. Figure 2: Distribution of the p-values from the Kolmogorov–Smirnov test applied to the distribution of the standardized estimates β̃ℓj over the M = 100 simulations, for each inference method: full likelihood (FL), composite likelihood with blocks of size k (CLk), variational EM (VEM), and jackknife-based variational EM (JK). Each boxplot summarizes the d×p = 3p normalized coefficients β̃ℓj . Dotted red lines: α = 5%… view at source ↗
Figure 3
Figure 3. Figure 3: Effective sample size (ESS) over the first 50 iterations for the different inference methods on the reduced Barents Sea dataset (p = 7 species). From top to bottom: full likelihood (FL) and composite likelihood (CLk) with block sizes k = 2, 3, and 5. One boxplot is shown per iteration, summarizing the ESS values across all blocks and all sites. Horizontal red dashed line: median ESS across the remaining it… view at source ↗
Figure 4
Figure 4. Figure 4: Comparison of the estimates obtained using different inference methods on the Barents Sea reduced dataset (p = 7 species). Left: estimated regression coefficients B̂ = (β̂hj); center: estimated covariance parameters Σ̂ = (̂σjk); right: estimated variances of the regression coefficients β̂hj . x-axis: estimates from full likelihood inference (FL), y-axis: estimates from the other methods: FL (gray asterisk … view at source ↗
Figure 5
Figure 5. Figure 5: Results on the Barents Sea full dataset (p = 30 species). Left: estimated regression coef￾ficients B̂ = (β̂ℓj); center: estimated covariance parameters Σ̂ = (̂σjk). x-axis: estimates obtained using the composite likelihood method CLk with blocks of size k = 5 (CL5, red triangle [△] = reference); y-axis: estimates obtained using variational EM (VEM, black circles [◯]), CL3 (green plus sign [+]), and CL7 (bl… view at source ↗
Figure 6
Figure 6. Figure 6: Model selection among the 16 possible models (x-axis) for the Barents Sea full dataset (p = 30 species), using the composite likelihood solution (CLk) with k = 5 and 7. Dashed vertical lines: limits between models involving d = 1, . . . 5 covariates (including the intercept). Left: composite likelihood at the maximum composite likelihood estimates (subtracting the composite likelihood for Model 1); Center:… view at source ↗
Figure 7
Figure 7. Figure 7: Colormap of the parameter estimates for the Barents Sea full dataset (p = 30 species). Left: regression coefficients B̂ = β̂ℓj ; center: covariance parameters Σ̂ = (̂σjk); right: corresponding correlations ρ̂jk = ̂σjk/ √ ̂σjĵσkk. Blank cells correspond to non-significant estimates, red cells (marked with ’+’) to positive estimates and blue cells (marked with ’−’) to negative estimates. Interpretation One … view at source ↗
Figure 8
Figure 8. Figure 8: Significance of the regression coefficients for the Barents Sea full dataset (p = 30 species). Left: comparison of the test statistics for all βℓj . x-axis = composite likelihood algorithm with blocks of size k = 5, y-axis = variational test statistics; blue triangles △: variational estimate of the variance (VEM), red circles ◯: jackknife estimate of the variance (JK). Dashed lines = least-square regressio… view at source ↗
Figure 9
Figure 9. Figure 9: Distribution of the p-values from the Kolmogorov–Smirnov test applied to the distribution of the standardized estimates β̃ℓj over the M = 100 simulations as a function of the num￾ber of species (p = 5, . . . , 10) for full-likelihood inference (FL). Dotted red lines: α = 5% significance threshold after Bonferroni correction (i.e., α/(dp)). 31 [PITH_FULL_IMAGE:figures/full_fig_p031_9.png] view at source ↗
Figure 10
Figure 10. Figure 10: qq-plots of the normalized regression coefficients β̃ℓj , as defined in Equation (15), for the second simulated species and each of the d = 3 covariates, using full likelihood inference (FL), with p = 5 to 10 species. x-axis: standard normal quantiles, y-axis: quantiles of β̃ℓj (black dots [●]), magenta dashed lines [- -]: 95% bounds for the standard normal qq-plot. 32 [PITH_FULL_IMAGE:figures/full_fig_p… view at source ↗
Figure 11
Figure 11. Figure 11: Boxplots of the relative variance of the regression coefficient estimates β̂ℓj obtained with the composite likelihood method (CLk) with block sizes k = 2, 3, 7, as compared to the CL5 algorithm (namely, V̂arCLk[β̂ℓj ]/V̂arCL5[β̂ℓj ]), for p = 30 species. Each boxplot is built across the d × p = 90 normalised coefficients β̃ℓj . Computational time ( [PITH_FULL_IMAGE:figures/full_fig_p033_11.png] view at source ↗
Figure 12
Figure 12. Figure 12: qq-plots of the normalized regression coefficients β̃ℓj , as defined in Equation (15), for each of the d = 3 covariates and for the second simulated species (out of p = 30), obtained using the composite likelihood algorithms (CLk) with block of size k = 2, 3, 5, and 7, and the variational EM (VEM) algorithm with jackknife variance estimates. Same legend as [PITH_FULL_IMAGE:figures/full_fig_p034_12.png] view at source ↗
Figure 13
Figure 13. Figure 13: qq-plots of the normalized regression coefficients β̃ℓj , as defined in Equation (15), for each of the d = 3 covariates and for the second simulated species (out of p = 30), obtained using the full likelihood method (FL) or the composite likelihood method (CLk) with block sizes k = 2, 3, 5, and 7. Same legend as [PITH_FULL_IMAGE:figures/full_fig_p035_13.png] view at source ↗
read the original abstract

The Poisson log-normal model is a latent variable model that provides a generic framework for the analysis of multivariate count data. Inferring its parameters can be a daunting task since the conditional distribution of the latent variables given the observed ones is intractable. For this model, variational approaches are the golden standard solution as they prove to be computationally efficient but lack theoretical guarantees on the estimates. Sampling-based solutions are quite the opposite. We first define a Monte Carlo EM algorithm that can achieve maximum likelihood estimators, but that is computationally efficient only for low-dimensional latent spaces. We then propose a novel inference procedure combining the EM framework with composite likelihood and importance sampling estimates. The algorithm preserves the desirable asymptotic properties of maximum likelihood estimators while circumventing the high-dimensional integration bottleneck, thus maintaining computational feasibility for moderately large datasets. This approach enables grounded parameter estimation, confidence intervals, and hypothesis testing. Application to the Barents Sea fish dataset demonstrates the algorithm capacity to identify significant environmental effects and residual interspecies correlations.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper introduces a Monte Carlo EM algorithm for maximum likelihood estimation in the Poisson log-normal model for multivariate count data, then proposes a novel variant that replaces the intractable E-step conditional expectation with a composite-likelihood estimator combined with importance sampling. The central claim is that this approximation preserves the asymptotic properties of the full MLE while remaining computationally feasible for moderately high-dimensional latent spaces, enabling confidence intervals and hypothesis tests; the method is illustrated on the Barents Sea fish dataset.

Significance. If the asymptotic equivalence holds, the approach would supply a theoretically grounded alternative to variational methods for latent-variable count models, allowing valid inference on environmental effects and residual correlations in moderately large datasets where full MCEM is intractable.

major comments (2)
  1. [Abstract] Abstract (paragraph on the novel inference procedure): the assertion that the composite-likelihood + importance-sampling replacement for the E-step 'preserves the desirable asymptotic properties of maximum likelihood estimators' is presented without any derivation, bias/variance analysis, or rate conditions showing that the Godambe information mismatch and Monte Carlo error vanish sufficiently fast for the overall estimator to remain asymptotically equivalent to exact MCEM or direct MLE. This is the load-bearing step for the central claim.
  2. No section supplies simulation evidence or theoretical bounds demonstrating that the fixed point and asymptotic distribution of the proposed EM sequence coincide with those of the exact MLE; the abstract's claim therefore cannot be assessed from the supplied text.
minor comments (1)
  1. [Abstract] The abstract refers to 'grounded parameter estimation, confidence intervals, and hypothesis testing' but does not indicate which information matrix (observed, expected, or sandwich) is used for standard errors.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their careful reading of the manuscript and for providing constructive feedback. We address the major comments point by point below.

read point-by-point responses
  1. Referee: [Abstract] Abstract (paragraph on the novel inference procedure): the assertion that the composite-likelihood + importance-sampling replacement for the E-step 'preserves the desirable asymptotic properties of maximum likelihood estimators' is presented without any derivation, bias/variance analysis, or rate conditions showing that the Godambe information mismatch and Monte Carlo error vanish sufficiently fast for the overall estimator to remain asymptotically equivalent to exact MCEM or direct MLE. This is the load-bearing step for the central claim.

    Authors: We acknowledge that the manuscript does not provide a detailed derivation or analysis of the asymptotic properties in the abstract or main text. The claim is motivated by the asymptotic theory of composite likelihood estimators, which are known to be consistent and asymptotically normal under regularity conditions, combined with the convergence of importance sampling. However, to address the referee's concern, we will revise the paper to include a new theoretical subsection that derives the conditions under which the proposed estimator is asymptotically equivalent to the MLE, including analysis of the Godambe information and Monte Carlo error rates. revision: yes

  2. Referee: [—] No section supplies simulation evidence or theoretical bounds demonstrating that the fixed point and asymptotic distribution of the proposed EM sequence coincide with those of the exact MLE; the abstract's claim therefore cannot be assessed from the supplied text.

    Authors: The referee is correct that the current manuscript lacks both simulation evidence and explicit theoretical bounds comparing the proposed method to exact MCEM. We agree this is necessary to support the central claim. In the revised version, we will add simulation studies on low-dimensional cases where exact MCEM is feasible to verify that the fixed points and asymptotic distributions coincide, as well as theoretical bounds on the approximation error. revision: yes

Circularity Check

0 steps flagged

No significant circularity; derivation is a new algorithmic construction

full rationale

The paper defines a Monte Carlo EM algorithm and then a composite-likelihood + importance-sampling variant. The central claim that the procedure 'preserves the desirable asymptotic properties of maximum likelihood estimators' is asserted as a consequence of the construction rather than shown to reduce to a fitted parameter or self-citation by any equation. No self-definitional loops, fitted-input-as-prediction, or load-bearing self-citations appear in the supplied text. The derivation chain is therefore self-contained against external benchmarks and receives the default non-circularity finding.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review supplies no information on free parameters, axioms, or invented entities.

pith-pipeline@v0.9.0 · 5704 in / 1001 out tokens · 23630 ms · 2026-05-24T04:04:00.646785+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

40 extracted references · 40 canonical work pages · 1 internal anchor

  1. [1]

    Agapiou, O

    S. Agapiou, O. Papaspiliopoulos, D. Sanz-Alonso, and A. M. Stuart. Importance Sampling: Intrinsic Dimension and Computational Cost . Statistical Science, 32 0 (3): 0 405--431, 2017

  2. [2]

    Aitchison and C

    J. Aitchison and C. Ho. The multivariate P oisson-log normal distribution. Biometrika, 76 0 (4): 0 643--653, 1989

  3. [3]

    Bartolucci, F

    F. Bartolucci, F. Chiaromonte, P. Kuruppumullage Don, and B. Lindsay. Composite likelihood inference in a discrete latent variable model for two-way ``clustering-by-segmentation'' problems. Journal of Computational and Graphical Statistics, 26 0 (2): 0 388--402, 2017

  4. [4]

    J. Besag. Spatial interaction and the statistical analysis of lattice systems. Journal of the Royal Statistical Society: Series B (Methodological), 36 0 (2): 0 192--225, 1974

  5. [5]

    Bevilacqua and C

    M. Bevilacqua and C. Gaetan. Comparing composite likelihood methods based on pairs for spatial gaussian random fields. Statistics and Computing, 25: 0 877--892, 2015

  6. [6]

    D. M. Blei , A. Kucukelbir , and J. D. McAuliffe . Variational inference: A review for statisticians . Journal of the American Statistical Association, 112 0 (518): 0 859--877, 2017

  7. [7]

    J. G. Booth and J. P. Hobert. Maximizing generalized linear mixed model likelihoods with an automated Monte Carlo EM algorithm . Journal of the Royal Statistical Society: Series B (Statistical Methodology), 61 0 (1): 0 265--285, 1999

  8. [8]

    R. A. Boyles . On the Convergence of the EM Algorithm . Journal of the Royal Statistical Society: Series B (Methodological), 45 0 (1): 0 47--50, 1983

  9. [9]

    Capp \'e , R

    O. Capp \'e , R. Douc, A. Guillin, J.-M. Marin, and C. P. Robert. Adaptive importance sampling in general mixture classes. Statistics and Computing, 18 0 (4): 0 447--459, 2008

  10. [10]

    Chatterjee and P

    S. Chatterjee and P. Diaconis. The sample size required in importance sampling . The Annals of Applied Probability, 28 0 (2): 0 1099--1135, 2018

  11. [11]

    Chiquet, M

    J. Chiquet, M. Mariadassou, and S. Robin. Variational inference for probabilistic Poisson PCA . The Annals of Applied Statistics, 12 0 (4): 0 2674--2698, 2018

  12. [12]

    Chiquet, M

    J. Chiquet, M. Mariadassou, and S. Robin. Variational inference for sparse network reconstruction from count data. In K. Chaudhuri and R. Salakhutdinov, editors, Proceedings of the 36th International Conference on Machine Learning, volume 97 of Proceedings of Machine Learning Research, pages 1162--1171, California, USA, 09--15 Jun 2019. PMLR

  13. [13]

    Chiquet, M

    J. Chiquet, M. Mariadassou, and S. Robin. The P oisson-Lognormal Model as a Versatile Framework for the Joint Analysis of Species Abundances . Frontiers in Ecology and Evolution, 9: 0 188, 2021

  14. [14]

    Cornuet, J.-M

    J.-M. Cornuet, J.-M. Marin, A. Mira, and C. P. Robert. Adaptive multiple importance sampling. Scandinavian Journal of Statistics, 39 0 (4): 0 798--812, 2012

  15. [15]

    Daudel, R

    K. Daudel, R. Douc, and F. Portier. Infinite-dimensional gradient-based descent for alpha-divergence minimisation. The Annals of Statistics, 49 0 (4): 0 2250--2270, 2021

  16. [16]

    Daudel, R

    K. Daudel, R. Douc, and F. Roueff. Monotonic Alpha-divergence Minimisation for Variational Inference . Journal of Machine Learning Research , 24 0 (62): 0 1--76, 2023

  17. [17]

    A. P. Dempster, N. M. Laird, and D. B. Rubin. Maximum Likelihood from Incomplete Data via the EM algorithm . Journal of the Royal Statistical Society: Series B, 39: 0 1--38, 1977

  18. [18]

    Efron and C

    B. Efron and C. Stein. The jackknife estimate of variance. The Annals of Statistics, pages 586--596, 1981

  19. [19]

    Eidsvik, B

    J. Eidsvik, B. A. Shaby, B. J. Reich, M. Wheeler, and J. Niemi. Estimation and prediction in spatial models with block composite likelihoods. Journal of Computational and Graphical Statistics, 23 0 (2): 0 295--315, 2014

  20. [20]

    Fort and E

    G. Fort and E. Moulines. Convergence of the Monte Carlo expectation maximization for curved exponential families . The Annals of Statistics, 31 0 (4): 0 1220 -- 1259, 2003

  21. [21]

    Fossheim, E

    M. Fossheim, E. M. Nilssen, and M. Aschan. Fish assemblages in the B arents S ea. Marine Biology Research, 2 0 (4): 0 260--269, 2006

  22. [22]

    Gao and P

    X. Gao and P. X.-K. Song. Composite likelihood bayesian information criteria for model selection in high-dimensional data. Journal of the American Statistical Association, 105 0 (492): 0 1531--1540, 2010

  23. [23]

    Jaakkola

    T. Jaakkola. Advanced mean field methods: theory and practice, chapter Tutorial on variational approximation methods, pages 129--160. MIT Press, Cambridge, Massachusetts, USA, 2001

  24. [24]

    T. S. Jaakkola and M. I. Jordan. B ayesian parameter estimation via variational methods. Statistics and Computing, 10 0 (1): 0 25--37, 2000

  25. [25]

    Korba and F

    A. Korba and F. Portier. Adaptive Importance Sampling meets Mirror Descent : a Bias-variance Tradeoff . In Proceedings of The 25th International Conference on Artificial Intelligence and Statistics, volume 151 of Proceedings of Machine Learning Research, pages 11503--11527, Virtual Conference, 2022. PMLR

  26. [26]

    R. A. Levine and G. Casella. Implementations of the Monte Carlo EM Algorithm . Journal of Computational and Graphical Statistics, 10 0 (3): 0 422--439, 2001

  27. [27]

    B. Lindsay. Composite likelihood methods. Contemporary mathematics, 80 0 (1): 0 221--239, 1988

  28. [28]

    T. A. Louis. Finding the observed information matrix when using the EM algorithm. Journal of the Royal Statistical Society. Series B, pages 226--233, 1982

  29. [29]

    C. E. McCulloch. Maximum Likelihood Algorithms for Generalized Linear Mixed Models . Journal of the American Statistical Association, 92 0 (437): 0 162--170, 1997

  30. [30]

    Middleton, G

    L. Middleton, G. Deligiannidis, A. Doucet, and P. E. Jacob. Unbiased Smoothing using Particle Independent Metropolis-Hastings . In K. Chaudhuri and M. Sugiyama, editors, Proceedings of the Twenty-Second International Conference on Artificial Intelligence and Statistics, volume 89 of Proceedings of Machine Learning Research, pages 2378--2387, Naha, Okinawa...

  31. [31]

    R: A Language and Environment for Statistical Computing

    R Core Team . R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria, 2015. URL https://www.R-project.org/

  32. [32]

    A. W. v. d. Vaart. Asymptotic statistics, volume 27 of Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge University Press, Cambridge, UK, 1998

  33. [33]

    Varin and P

    C. Varin and P. Vidoni. A note on composite likelihood inference and model selection. Biometrika, 92 0 (3): 0 519--528, 2005

  34. [34]

    Varin, G

    C. Varin, G. H st, and . Skare. Pairwise likelihood inference in spatial generalized linear mixed models. Computational statistics & data analysis, 49 0 (4): 0 1173--1191, 2005

  35. [35]

    Varin, N

    C. Varin, N. Reid, and D. Firth. An Overview of Composite Likelihood Methods . Statistica Sinica, 21: 0 5--42, 2011

  36. [36]

    M. J. Wainwright and M. I. Jordan. Graphical Models, Exponential Families, and Variational Inference . Found. Trends Mach. Learn., 1 0 (1--2): 0 1--305, 2008

  37. [37]

    G. C. G. Wei and M. A. Tanner. A M onte C arlo I mplementation of the EM A lgorithm and the P oor M an's D ata A ugmentation A lgorithms. Journal of the American Statistical Association, 85 0 (411): 0 699--704, 1990

  38. [38]

    C. Wu. On the convergence properties of the EM algorithm. The Annals of Statistics, 11 0 (1): 0 95--103, 1983

  39. [39]

    X. Xu, N. Reid, and L. Xu. Note on information bias and efficiency of composite likelihood. Technical Report 1612.06967, arXiv, 2016

  40. [40]

    Zhao and H

    Y. Zhao and H. Joe. Composite likelihood estimation in multivariate data analysis. Canadian Journal of Statistics, 33 0 (3): 0 335--356, 2005