pith.

arxiv: 2604.20978 · v1 · submitted 2026-04-22 · 📊 stat.ME

ML, PL, QL in Markov chain models

Pith reviewed 2026-05-09 23:24 UTC · model grok-4.3

classification 📊 stat.ME
keywords Markov chain models · maximum likelihood · pseudo-likelihood · quasi-likelihood · limiting normality · DNA sequence evolution · statistical inference

The pith

Quasi-likelihood matches full maximum likelihood closely while gaining robustness over pseudo-likelihood in Markov chain models.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Models with complex dependencies often make full maximum likelihood computation infeasible. This work derives limiting normal distributions for maximum likelihood, pseudo-likelihood, and quasi-likelihood estimators under general Markov chain assumptions, then compares their behavior across settings. The results indicate that quasi-likelihood typically outperforms pseudo-likelihood in efficiency and robustness while staying close to full maximum likelihood performance. The approach is illustrated on DNA sequence evolution models. A reader would care because the findings offer a workable compromise for dependent data where exact likelihood methods break down.

Core claim

The paper derives limiting normality results for the maximum likelihood, pseudo-likelihood, and quasi-likelihood estimators in general Markov chain models. It shows that the quasi-likelihood strategy is typically preferable to the pseudo-likelihood, losing very little to the maximum likelihood while gaining in model robustness, and has potential as a modelling tool.

What carries the argument

Limiting normality results for the three estimators, with pseudo-likelihood and quasi-likelihood treated as maximum penalised likelihood methods.
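The penalised-likelihood reading can be made concrete with a short Python sketch for a finite-state chain with transition matrix P(θ). It assumes standard definitions that this page does not spell out. ML uses the transition likelihood of the observed path, conditional on the first state. PL is of Besag type and multiplies the conditional probabilities of each interior state given both neighbours. QL is the pairwise composite likelihood of adjacent pairs, with each pair (a, b) contributing its stationary probability p_a p_{a,b}. Under these definitions QL equals ML plus the extra term Σ log p_{x_{i−1}}(θ), while PL replaces each transition by a ratio involving the two-step probabilities p^{(2)}. All function names are hypothetical.

```python
import math


def stationary(P, iters=10_000):
    """Stationary distribution of an ergodic, aperiodic chain by power iteration."""
    k = len(P)
    p = [1.0 / k] * k
    for _ in range(iters):
        p = [sum(p[a] * P[a][b] for a in range(k)) for b in range(k)]
    return p


def two_step(P):
    """Two-step transition probabilities p^(2)_{a,c} = sum_b p_{a,b} p_{b,c}."""
    k = len(P)
    return [[sum(P[a][b] * P[b][c] for b in range(k)) for c in range(k)] for a in range(k)]


def loglik_ml(P, x):
    """Full transition log-likelihood, conditional on the first state x[0]."""
    return sum(math.log(P[a][b]) for a, b in zip(x[:-1], x[1:]))


def loglik_ql(P, x):
    """Pairwise composite log-likelihood: each adjacent pair adds log(p_a * p_{a,b})."""
    p = stationary(P)
    return sum(math.log(p[a] * P[a][b]) for a, b in zip(x[:-1], x[1:]))


def loglik_pl(P, x):
    """Besag-type pseudo-log-likelihood: each interior state given both neighbours."""
    P2 = two_step(P)
    return sum(
        math.log(P[a][b] * P[b][c] / P2[a][c])
        for a, b, c in zip(x[:-2], x[1:-1], x[2:])
    )


def two_state(alpha, beta):
    """Two-state chain with p_{0,1} = alpha and p_{1,0} = beta."""
    return [[1 - alpha, alpha], [beta, 1 - beta]]


P = two_state(0.3, 0.4)
x = [0, 0, 1, 1, 1, 0, 1, 0, 0, 0]
penalty = sum(math.log(stationary(P)[a]) for a in x[:-1])
# QL minus ML minus the marginal term vanishes up to rounding error.
print(loglik_ql(P, x) - loglik_ml(P, x) - penalty)
print(loglik_ml(P, x), loglik_ql(P, x), loglik_pl(P, x))
```

The three functions share the transition terms and differ only in how marginal or two-step probabilities enter, which is one way to read the statement that PL and QL act as penalised versions of ML.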

If this is right

  • Quasi-likelihood becomes a practical substitute when full maximum likelihood is computationally prohibitive due to complex dependencies.
  • Quasi-likelihood retains most efficiency of full maximum likelihood across the examined Markov settings.
  • Pseudo-likelihood shows consistent efficiency losses relative to the other two methods.
  • The methods apply directly to spatial-temporal and DNA sequence models.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Quasi-likelihood could be implemented as a default option in software for dependent data analysis when full likelihood is intractable.
  • The robustness advantage might prove useful in spatial models where the exact dependence structure is uncertain.
  • Finite-sample checks on non-DNA Markov chains would test whether the asymptotic preference for quasi-likelihood holds in practice.

Load-bearing premise

The limiting normality results accurately capture finite-sample behavior and the performance comparisons extend beyond the DNA sequence examples considered.

What would settle it

A simulation study on finite-length Markov chains with known parameters, comparing the three estimators' bias, variance, and robustness under model misspecification, would confirm or refute the asymptotic rankings.
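A small version of such a study can be sketched for the two-state chain with p_{0,1} = α and p_{1,0} = β, the parametrisation behind Figure 5.1. The Python sketch below is illustrative rather than the authors' design. It simulates stationary paths, takes ML from the observed transition frequencies, and maximises QL (adjacent pairs, each contributing p_a p_{a,b}) and a Besag-type PL (each interior state given both neighbours) by brute-force grid search. The parameter values, sample size, number of replicates, seed and grid are arbitrary choices. Only the correctly specified case is covered; a robustness check would simulate from a different mechanism and fit the same model.

```python
import math
import random
import statistics


def simulate(alpha, beta, n, rng):
    """Stationary two-state chain of length n with p_{0,1} = alpha, p_{1,0} = beta."""
    x = [0 if rng.random() < beta / (alpha + beta) else 1]
    for _ in range(n - 1):
        leave = alpha if x[-1] == 0 else beta
        x.append(1 - x[-1] if rng.random() < leave else x[-1])
    return x


def model(a, b):
    """Transition matrix, two-step matrix and stationary law at (alpha, beta) = (a, b)."""
    P = [[1 - a, a], [b, 1 - b]]
    P2 = [[sum(P[i][k] * P[k][j] for k in range(2)) for j in range(2)] for i in range(2)]
    return P, P2, [b / (a + b), a / (a + b)]


def estimate(x, grid):
    """Return ML, QL and PL estimates of (alpha, beta) from one path."""
    n2 = [[0, 0], [0, 0]]  # pair counts N_{a,b}
    for a, b in zip(x[:-1], x[1:]):
        n2[a][b] += 1
    n3 = {}  # triple counts N_{a,b,c}
    for t in zip(x[:-2], x[1:-1], x[2:]):
        n3[t] = n3.get(t, 0) + 1
    # ML has a closed form: the observed transition frequencies.
    ml = (n2[0][1] / (n2[0][0] + n2[0][1]), n2[1][0] / (n2[1][0] + n2[1][1]))

    def ql(a, b):
        P, _, p = model(a, b)
        return sum(n2[i][j] * math.log(p[i] * P[i][j]) for i in range(2) for j in range(2))

    def pl(a, b):
        P, P2, _ = model(a, b)
        return sum(c * math.log(P[i][j] * P[j][k] / P2[i][k]) for (i, j, k), c in n3.items())

    # QL and PL are maximised by brute-force grid search (illustrative only).
    points = [(a, b) for a in grid for b in grid]
    return ml, max(points, key=lambda t: ql(*t)), max(points, key=lambda t: pl(*t))


rng = random.Random(2026)
alpha, beta, n, reps = 0.2, 0.3, 500, 40
grid = [i / 100 for i in range(5, 51)]
alpha_hats = {"ML": [], "QL": [], "PL": []}
for _ in range(reps):
    for name, est in zip(alpha_hats, estimate(simulate(alpha, beta, n, rng), grid)):
        alpha_hats[name].append(est[0])
for name, vals in alpha_hats.items():
    bias = statistics.mean(vals) - alpha
    print(f"{name}: bias {bias:+.4f}, variance {statistics.variance(vals):.2e}")
```

The grid resolution (0.01) limits the precision of the QL and PL estimates, so a proper numerical optimiser and more replicates would be needed before drawing conclusions.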

Figures

Figures reproduced from arXiv: 2604.20978 by Cristiano Varin, Nils Lid Hjort.

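Figure 5.1: Contour plots of the variance ratio for PL with respect to ML, for estimation of α (left panel) and of β (right panel), for the [0, 0.15] × [0.15] subset of the parameter space. The quantities of (2.4) and (3.3) are found to be $\begin{pmatrix}\gamma_{0,0} & \gamma_{0,1}\\ \gamma_{1,0} & \gamma_{1,1}\end{pmatrix} = \frac{1}{(\alpha+\beta)^2}\begin{pmatrix}\alpha & -\alpha\\ -\beta & \beta\end{pmatrix}$ and $\begin{pmatrix}\bar\gamma_{0,0} & \bar\gamma_{0,1}\\ \bar\gamma_{1,0} & \bar\gamma_{1,1}\end{pmatrix} = \frac{1-\alpha-\beta}{(\alpha+\beta)^2}\begin{pmatrix}\alpha & -\alpha\\ -\beta & \beta\end{pmatrix}$. Furthermore, $u_{0,0} = (-1/(1-\alpha), 0)^t$, $u_{0,1} = (1/\alpha, 0)^t$, u…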
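The two matrix identities in the Figure 5.1 caption can be checked numerically. Since the caption does not restate (2.4) and (3.3), the sketch assumes that γ_{a,b} = Σ_{n≥0} (p^{(n)}_{a,b} − p_b) and γ̄_{a,b} = Σ_{n≥1} (p^{(n)}_{a,b} − p_b) for the two-state chain with p_{0,1} = α and p_{1,0} = β. Under that reading the closed forms follow from the geometric series P^n − 1p^t = (1 − α − β)^n (α + β)^{−1} [[α, −α], [−β, β]]; the code simply compares truncated sums with the stated formulas.

```python
def matmul2(A, B):
    """Product of two 2 x 2 matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def gamma_partial_sums(alpha, beta, n_terms=2000):
    """Truncated sums of p^(n)_{a,b} - p_b over n >= 0 (gamma) and n >= 1 (gamma_bar)."""
    P = [[1 - alpha, alpha], [beta, 1 - beta]]
    p = [beta / (alpha + beta), alpha / (alpha + beta)]
    Pn = [[1.0, 0.0], [0.0, 1.0]]  # P^0
    g = [[0.0, 0.0], [0.0, 0.0]]
    g_bar = [[0.0, 0.0], [0.0, 0.0]]
    for n in range(n_terms):
        for a in range(2):
            for b in range(2):
                term = Pn[a][b] - p[b]
                g[a][b] += term
                if n >= 1:
                    g_bar[a][b] += term
        Pn = matmul2(Pn, P)
    return g, g_bar


def gamma_closed_form(alpha, beta):
    """Closed forms stated in the Figure 5.1 caption."""
    s = alpha + beta
    base = [[alpha, -alpha], [-beta, beta]]
    g = [[v / s**2 for v in row] for row in base]
    g_bar = [[(1 - s) * v / s**2 for v in row] for row in base]
    return g, g_bar


numeric = gamma_partial_sums(0.05, 0.10)
exact = gamma_closed_form(0.05, 0.10)
print(max(abs(u - v) for m1, m2 in zip(numeric, exact)
          for r1, r2 in zip(m1, m2) for u, v in zip(r1, r2)))
```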
Figure 5.2: Asymptotic relative efficiency for the PL method, with respect to the ML and QL method, for a three-stage equicorrelation chain, as a function of ρ. We start by assuming $p = (p_1, \ldots, p_k)^t$ known and consider estimation of the parameter ρ. We have $u_{a,b} = (\delta_{a,b} - p_b)/p_{a,b}(\rho)$ and $J = \sum_{a,b} p_a (\delta_{a,b} - p_b)^2 / p_{a,b}(\rho) = \sum_{a,b} p_a (\delta_{a,b} - p_b)^2 / \{(1-\rho)p_b + \rho\,\delta_{a,b}\}$. Since the stationary distribution p is known…
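The quantity J is easy to evaluate for any known stationary distribution p. The sketch below codes the equicorrelation transitions p_{a,b}(ρ) = (1 − ρ)p_b + ρδ_{a,b} and the caption's sum for J, and adds a finite-difference check that u_{a,b} = (δ_{a,b} − p_b)/p_{a,b}(ρ) is the derivative of log p_{a,b}(ρ). Reading J as the per-transition Fisher information, so that the ML estimator of ρ has approximate variance 1/(nJ) after n transitions, is the standard interpretation rather than something the caption states. For the uniform distribution on three states and ρ = 0.5 the sum works out by hand to J = 2.

```python
import math


def equicorrelation(p, rho):
    """p_{a,b}(rho) = (1 - rho) p_b + rho * delta_{a,b}; p is stationary for every rho."""
    k = len(p)
    return [[(1 - rho) * p[b] + (rho if a == b else 0.0) for b in range(k)] for a in range(k)]


def score(p, rho):
    """u_{a,b} = (delta_{a,b} - p_b) / p_{a,b}(rho), the derivative of log p_{a,b}."""
    P = equicorrelation(p, rho)
    k = len(p)
    return [[((1.0 if a == b else 0.0) - p[b]) / P[a][b] for b in range(k)] for a in range(k)]


def information(p, rho):
    """J = sum_{a,b} p_a (delta_{a,b} - p_b)^2 / p_{a,b}(rho), as in the caption."""
    P = equicorrelation(p, rho)
    k = len(p)
    return sum(
        p[a] * ((1.0 if a == b else 0.0) - p[b]) ** 2 / P[a][b]
        for a in range(k) for b in range(k)
    )


def max_fd_error(p, rho, h=1e-6):
    """Largest gap between u_{a,b} and a central finite difference of log p_{a,b}."""
    up, down = equicorrelation(p, rho + h), equicorrelation(p, rho - h)
    u = score(p, rho)
    k = len(p)
    return max(
        abs((math.log(up[a][b]) - math.log(down[a][b])) / (2 * h) - u[a][b])
        for a in range(k) for b in range(k)
    )


print(information([1 / 3] * 3, 0.5))
print(max_fd_error([0.2, 0.3, 0.5], 0.4))
```

Because J also equals Σ_{a,b} p_a p_{a,b} u_{a,b}², it is the expected squared score per transition, which is why it governs the ML variance.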
Figure 5.3: The random walk with two reflecting barriers: six-state example. The solid line corresponds to the ARE for PL, while the dashed one corresponds to QL.
Figure 5.4: The random walk with two reflecting barriers: the effects of increasing the number of states on the PL estimator of p (left panel) and on the QL estimator of p (right panel). The solid line corresponds to a 15-state chain, the dashed line to 10 states and the dotted line to 5 states. In order to compute the matrices involved in the QL and PL computations, note that $v_1 = -\pi_1 \{ \frac{1}{pq} \sum_{i=1}^{k-2} (p/q)^{i-1}$ …
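The powers of p/q in the caption are what one gets from the stationary distribution π_i ∝ (p/q)^{i−1}, i = 1, …, k. The sketch below assumes one common convention for the walk, which the caption does not spell out: from each state the walk steps up with probability p and down with probability q = 1 − p, and a step that would cross a barrier leaves the walk in place. It builds the k × k transition matrix and checks that stationary distribution numerically by power iteration; it also follows from detailed balance, π_i p = π_{i+1} q.

```python
def reflecting_walk(k, p):
    """k-state walk: up w.p. p, down w.p. q = 1 - p; blocked steps stay put (assumed)."""
    q = 1 - p
    P = [[0.0] * k for _ in range(k)]
    for i in range(k):
        P[i][min(i + 1, k - 1)] += p
        P[i][max(i - 1, 0)] += q
    return P


def stationary_power(P, iters=5000):
    """Stationary distribution by power iteration (aperiodic for 0 < p < 1)."""
    k = len(P)
    pi = [1.0 / k] * k
    for _ in range(iters):
        pi = [sum(pi[a] * P[a][b] for a in range(k)) for b in range(k)]
    return pi


def stationary_closed_form(k, p):
    """pi_i proportional to (p/q)^(i-1) for states i = 1, ..., k (0-based here)."""
    r = p / (1 - p)
    w = [r**i for i in range(k)]
    total = sum(w)
    return [v / total for v in w]


for k, p in [(6, 0.3), (10, 0.6)]:
    gap = max(abs(u - v) for u, v in zip(stationary_power(reflecting_walk(k, p)),
                                          stationary_closed_form(k, p)))
    print(k, p, gap)
```

Other reflecting conventions (for example, forcing a move away from the barrier) change the boundary rows and hence the stationary distribution, so the convention should be matched to the paper before reusing this.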
Figure 5.5: The random walk with two reflecting barriers: ten-state chain. The curves correspond to the ARE for QL of order 2, 3, 4, 10 and 100 (growing). Again, the matrices involved in describing the large-sample behaviour of the QL and PL methods may now be computed numerically. To illustrate different aspects involved in the comparison, we varied the p parameter as well as the number k of states, and examined the la…
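The caption does not define QL of order m. One natural construction, assumed here purely for illustration, multiplies the joint probabilities of all overlapping blocks of m consecutive states; for a stationary Markov chain each block probability factorises as p_{x_i} p_{x_i,x_{i+1}} ⋯ p_{x_{i+m−2},x_{i+m−1}}, and m = 2 recovers the pairwise QL. The sketch computes this composite log-likelihood for the two-state chain (helper names are hypothetical). Counting terms shows that each interior transition enters up to m − 1 times while each block contributes a single marginal term, so under this construction higher orders shift weight from the marginal terms toward the transition terms that make up the ML log-likelihood.

```python
import math


def two_state_chain(alpha, beta):
    """Transition matrix and stationary law of the two-state chain."""
    P = [[1 - alpha, alpha], [beta, 1 - beta]]
    return P, [beta / (alpha + beta), alpha / (alpha + beta)]


def block_logprob(P, p, block):
    """log f(x_i, ..., x_{i+m-1}) = log p_{x_i} + sum of log transition probabilities."""
    lp = math.log(p[block[0]])
    for a, b in zip(block[:-1], block[1:]):
        lp += math.log(P[a][b])
    return lp


def ql_order(P, p, x, m):
    """Composite log-likelihood over all overlapping blocks of m >= 2 consecutive states."""
    return sum(block_logprob(P, p, x[i:i + m]) for i in range(len(x) - m + 1))


P, p = two_state_chain(0.2, 0.3)
x = [0, 1, 1, 0, 0, 0, 1, 1, 1, 0]
for m in (2, 3, 4):
    print(m, ql_order(P, p, x, m))
```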
Figure 7.1: Least false parameter values for α and β when the four-parameter Kimura model (6.1) is assumed but the real mechanism is a six-parameter Kimura model, as a function of the model departure degree ε; here $\gamma_1 = \gamma + \varepsilon$, $\gamma_2 = \gamma - \varepsilon$, $\delta_1 = \delta + \varepsilon$, $\delta_2 = \delta - \varepsilon$, and the values (.03, .04, .12, .14) are used for (α, β, γ, δ). The least false values are shown for the ML (solid curve), the QL (dotted …
Figure 7.2: Least false parameter values for γ and δ when the four-parameter Kimura model (6.1) is assumed but the real mechanism is a six-parameter Kimura model, as a function of the model departure degree ε; here $\gamma_1 = \gamma + \varepsilon$, $\gamma_2 = \gamma - \varepsilon$, $\delta_1 = \delta + \varepsilon$, $\delta_2 = \delta - \varepsilon$, and the values (.03, .04, .12, .14) are used for (α, β, γ, δ). The least false values are shown for the ML (solid curve), the QL (dotted …
Original abstract

In many spatial and spatial-temporal models, and more generally in models with complex dependencies, it may be too difficult to carry out full maximum likelihood (ML) analysis. Remedies include the use of pseudo-likelihood (PL) and quasi-likelihood (QL) (also called the composite likelihood). The present article studies the ML, the PL and the QL methods for general Markov chain models, partly motivated by the desire to understand the precise behaviour of PL and QL methods in settings where this can be analysed. We present limiting normality results and compare performances in different settings. The PL and QL methods can be seen as maximum penalised likelihood methods. We find that the QL strategy is typically preferable to the PL, and that it loses very little to the ML, while gaining in model robustness. It also has appeal and potential as a modelling tool. Our methods are illustrated for analysis of DNA sequence evolution type models.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper derives limiting normality results for maximum likelihood (ML), pseudo-likelihood (PL), and quasi-likelihood (QL) estimators in general Markov chain models. It compares their asymptotic efficiencies and finite-sample performance on DNA sequence evolution examples, concluding that QL is typically preferable to PL (losing little to ML while gaining robustness) and has appeal as a modeling tool.

Significance. If the derivations and comparisons hold, the work provides useful theoretical grounding and practical guidance for choosing among ML, PL, and QL in dependent-data settings where full likelihood is intractable. The explicit limiting normality results and the framing of PL/QL as penalized likelihood are strengths that allow precise efficiency comparisons.

major comments (2)
  1. [Performance comparisons and illustrations] The headline claim that QL is 'typically preferable' to PL (while close to ML) rests on the limiting normality results plus DNA-sequence illustrations. These are asymptotic; without finite-sample simulations across a range of chain lengths, orders, or transition structures, the finite-sample preference and robustness advantage do not necessarily follow (see stress-test concern).
  2. [Robustness discussion] The robustness advantage of QL is asserted but not quantified beyond the DNA examples. A concrete measure (e.g., sensitivity to misspecification of the transition kernel or to higher-order dependence) would be needed to support the general claim that QL 'earns in model robustness.'
minor comments (2)
  1. [Methods] Notation for the composite likelihood and the penalization terms could be clarified with an explicit equation early in the methods section.
  2. [Abstract] The abstract states 'limiting normality results' but does not indicate whether the results cover both stationary and non-stationary chains; a brief statement would help readers.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive report. The comments highlight opportunities to strengthen the finite-sample evidence supporting our claims. We address each major point below and will incorporate revisions to provide additional simulation-based support for the performance and robustness conclusions.

Point-by-point responses
  1. Referee: [Performance comparisons and illustrations] The headline claim that QL is 'typically preferable' to PL (while close to ML) rests on the limiting normality results plus DNA-sequence illustrations. These are asymptotic; without finite-sample simulations across a range of chain lengths, orders, or transition structures, the finite-sample preference and robustness advantage do not necessarily follow (see stress-test concern).

    Authors: We agree that the current finite-sample support relies on the DNA sequence illustrations rather than a broad Monte Carlo study. The limiting normality results (Theorems 3.1, 4.1, and 4.2) and the efficiency comparisons derived from them establish the asymptotic preference for QL over PL with minimal loss relative to ML. The DNA examples demonstrate this in a practical setting with finite lengths. To address the concern directly, we will add a new simulation subsection varying chain lengths (n = 50 to 2000), Markov orders (1 and 2), and transition probability structures, reporting empirical bias, variance, and coverage to check whether the finite-sample behavior aligns with the asymptotics. revision: yes

  2. Referee: [Robustness discussion] The robustness advantage of QL is asserted but not quantified beyond the DNA examples. A concrete measure (e.g., sensitivity to misspecification of the transition kernel or to higher-order dependence) would be needed to support the general claim that QL 'earns in model robustness.'

    Authors: The robustness claim follows from the construction of QL as a composite likelihood that depends only on the specified transition kernel (unlike full ML) and avoids the pairwise over-weighting issues of PL, as framed in Section 2. The DNA examples provide empirical illustration under potential model departures common in sequence data. We will add a targeted simulation study in the revision that generates data from a misspecified higher-order chain and fits first-order models, comparing mean squared error and robustness metrics (e.g., relative efficiency loss under misspecification) across ML, PL, and QL to quantify the advantage. revision: yes

Circularity Check

0 steps flagged

No significant circularity; derivations rely on standard asymptotic theory

Full rationale

The paper derives limiting normality results for ML, PL and QL estimators in general Markov chain models from standard asymptotic theory for dependent processes. Efficiency comparisons and the conclusion that QL is typically preferable to PL follow directly from these limiting distributions and relative asymptotic variances, without any reduction to fitted parameters, self-definitions, or load-bearing self-citations. DNA sequence illustrations are presented as applications after the theory, not as the source of the claims. The derivation chain is self-contained and can be checked against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The analysis rests on standard regularity conditions for Markov chains to obtain asymptotic normality; no free parameters or invented entities are introduced in the abstract.

axioms (1)
  • domain assumption Markov chain models satisfy standard regularity conditions allowing central limit theorems for the estimators
    Invoked to derive limiting normality results for ML, PL, and QL.

pith-pipeline@v0.9.0 · 5444 in / 1117 out tokens · 18248 ms · 2026-05-09T23:24:58.629600+00:00 · methodology


Reference graph

Works this paper leans on

34 extracted references · 34 canonical work pages

  1. Anderson, T.W. and Goodman, L.A. (1957). Statistical inference about Markov chains. Annals of Mathematical Statistics 28, 89–110.
  2. Barry, D. and Hartigan, J.A. (1987). Asynchronous distance between homologous DNA sequences. Biometrics 43, 261–276.
     Basawa and Rao (1980). Statistical Inference for Stochastic Processes. Academic Press, London.
  3. Basharin, G.P., Langville, A.N. and Naumov, V.A. (2004). The life and work of A.A. Markov. Linear Algebra and its Applications 386, 3–26.
  4. Besag, J. (1974). Spatial interaction and the statistical analysis of lattice systems (with discussion contributions). Journal of the Royal Statistical Society B 36, 192–236.
  5. Besag, J. (1975). Statistical analysis of non-lattice data. The Statistician 24, 179–195.
  6. Besag, J. (1977). Some methods of statistical analysis for spatial data. Bulletin of the Institute of International Statistics 47, 77–92.
  7. Blaisdell, B.E. (1985). A method for estimating from two aligned present day DNA sequences their ancestral composition and subsequent rates of substitution, possibly different in the two lineages, corrected for multiple and parallel substitutions at the same site. Journal of Molecular Evolution 22, 69–81.
  8. Cox, D.R. and Reid, N. (2004). A note on pseudolikelihood constructed from marginal densities. Biometrika 91, 729–737.
  9. Davison, A.C. (2003). Statistical Models. Cambridge University Press, Cambridge.
  10. Durrett, R. (2002). Probability Models for DNA Sequence Evolution. Probability and Its …
  11. Fearnhead, P. and Donnelly, P. (2002). Approximate likelihood methods for estimating local recombination rates. Journal of the Royal Statistical Society B 64, 657–680.
  12. Fokianos, K. and Kedem, B. (2003). Regression theory for categorical time series. Statistical Science 18, 357–375.
  13. Glasbey, C.A. (2001). Non-linear autoregressive time series with multivariate Gaussian mixtures as marginal distributions. Applied Statistics 50, 143–154.
  14. Heagerty, P.J. and Lele, S.R. (1998). A composite likelihood approach to binary spatial data. Journal of the American Statistical Association 93, 1099–1111.
  15. Henderson, R. and Shimakura, S. (2003). A serially correlated gamma frailty model for longitudinal count data. Biometrika 90, 355–366.
  16. Hjort, N.L. and Mohn, E. (1987). Topics in the statistical analysis of remotely sensed data (with discussion). Bulletins of the International Statistical Institute 52 (Proceedings of the ISI Meeting, Tokyo), 23–44.
  17. Hjort, N.L. and Omre, H. (1994). Topics in spatial statistics (with discussion contributions). Scandinavian Journal of Statistics 21, 289–357.
  18. Hjort, N.L. and Mostad, P. (1998). A quasi-likelihood method for estimating parameters in spatial covariance functions. Manuscript.
  19. Hobolth, A. and Jensen, J.L. (2005). Statistical inference in evolutionary models of DNA sequences via the EM algorithm. Research report No. 455, Department of Theoretical …
  20. Homleid, M. (1995). Statistical Methods Applied in Meteorology. Cand. scient. thesis, Department of Mathematics, University of Oslo.
  21. Karlin, S. and Taylor, H.M. (1975). A First Course in Stochastic Processes. Academic …
  22. Kimura, M. (1980). A simple method for estimating evolutionary rates of base substitutions through comparative studies of nucleotide sequences. Journal of Molecular Evolution 16, 111–120.
  23. Kimura, M. (1981). Estimation of evolutionary distances between homologous nucleotide sequences. Proceedings of the National Academy of Sciences USA 78, 454–458.
      de Leon, A.R. (2004). Pairwise likelihood approach to grouped continuous model and its extension. Technical report, Department of Mathematics & Statistics, University of Calgary.
  24. Lindsay, B. (1988). Composite likelihood methods. In Statistical Inference for Stochastic Processes (ed. N.U. Prabhu), American Mathematical Society.
  25. Markov, A.A. (1906). Rasprostranenie zakona bol'shikh chisel na velichiny, zavisyashchie drug ot druga [Extension of the law of large numbers to quantities depending on each other]. Izvestiya Fiziko-matematicheskogo obshchestva pri Kazanskom universitete 15 (2-ya seriya), 124–156.
  26. Markov, A.A. (1913). Primer statisticheskogo issledovaniya nad tekstom "Evgeniya Onegina", illyustriruyushchii svyaz' ispytanii v tsep' [An example of statistical investigation of the text of "Eugene Onegin" illustrating the connection of trials in a chain]. Izvestiya Akademii Nauk, Sankt-Peterburg 7 (6-ya seriya), 153–162.
  27. Nott, D.J. and Rydén, T. (1999). Pairwise likelihood methods for inference in image models. Biometrika 86, 661–676.
  28. Parner, E.T. (2001). A composite likelihood approach to multivariate survival data. Scandinavian Journal of Statistics 28, 295–302.
  29. Pickard, D.K. (1987). Inference for discrete Markov fields: the simplest nontrivial case. Journal of the American Statistical Association 82, 90–96.
  30. Pushkin, A.S. (1977). Eugene Onegin [translated by C.H. Johnston]. Penguin Classics, London. There are various later reprints of essentially the same translation of Pushkin's 1833 epic.
  31. Renard, D., Molenberghs, G. and Geys, H. (2004). A pairwise likelihood approach to estimation in multilevel probit models. Computational Statistics & Data Analysis 44, 649–667.
  32. Strauss, D. and Ikeda, M. (1990). Pseudolikelihood estimation for social networks. Journal of the American Statistical Association 85, 204–212.
  33. Taylor, H.M. and Karlin, S. (1984). An Introduction to Stochastic Modeling. Academic …
  34. Varin, C., Høst, G. and Skare, Ø. (2005). Pairwise likelihood inference in spatial generalized linear mixed models. Computational Statistics & Data Analysis, to appear.