Stable and practical semi-Markov modelling of intermittently-observed data
Pith reviewed 2026-05-18 20:30 UTC · model grok-4.3
The pith
A phase-type distribution approximation allows semi-Markov models to handle intermittent observations for any state structure.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The paper claims that representing each state's sojourn time with a phase-type distribution expresses a semi-Markov model as a hidden Markov model, so the likelihood for intermittently observed multi-state data can be calculated easily for general state structures. Restricting the phase-type family to moment-matched approximations of Gamma or Weibull distributions then improves stability and identifiability, while still allowing transition rates to depend on the time spent in the current state.
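A Gamma target with shape a and rate b has mean a/b and squared coefficient of variation (variance over squared mean) 1/a. The paper matches three moments using a particular phase-type family, which this page does not name. The sketch below therefore substitutes a simpler, classical two-moment recipe. It mixes Erlang(k-1) and Erlang(k) with a common rate, which works when the squared coefficient of variation is at most 1. The function names are hypothetical, and this is not the msmbayes method.

```python
import math

import numpy as np


def erlang_mixture_match(mean, scv):
    """Stand-in two-moment match, NOT the paper's three-moment family.

    Mixture of Erlang(k-1, mu) and Erlang(k, mu), valid for 0 < scv <= 1,
    where scv = variance / mean**2. Returns (alpha, T): the initial vector and
    sub-generator of k phases in series.
    """
    if not 0 < scv <= 1:
        raise ValueError("this recipe needs 0 < scv <= 1")
    k = math.ceil(1.0 / scv - 1e-12)  # smallest k with 1/k <= scv
    root = math.sqrt(max(0.0, k * (1 + scv) - k**2 * scv))
    p = (k * scv - root) / (1 + scv)  # probability of skipping the first phase
    mu = (k - p) / mean  # common exit rate of every phase
    T = -mu * np.eye(k) + mu * np.eye(k, k=1)  # phases in series
    alpha = np.zeros(k)
    alpha[0] = 1 - p
    if k > 1:
        alpha[1] = p  # starting in phase 2 leaves k-1 phases: Erlang(k-1)
    return alpha, T


def ph_moment(alpha, T, n):
    """n-th raw moment of a phase-type distribution: n! * alpha (-T)^{-n} 1."""
    M = np.linalg.matrix_power(np.linalg.inv(-T), n)
    return math.factorial(n) * alpha @ M @ np.ones(len(alpha))


# Gamma(shape 2.5, rate 1): mean 2.5, scv 0.4, second raw moment 8.75.
alpha, T = erlang_mixture_match(2.5, 0.4)
print(len(alpha), ph_moment(alpha, T, 1), ph_moment(alpha, T, 2))
```

Matching only two moments leaves the third (and the tail) free. Tightening that is the purpose of the paper's three-moment family.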
What carries the argument
A moment-matched phase-type distribution for each state's sojourn time, which converts the semi-Markov model into a hidden Markov model for likelihood computation.
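The phase-type device can be checked numerically. A sojourn is the time a latent Markov chain takes to pass through its transient phases. For an initial vector α and sub-generator T, its survival function is S(t) = α exp(Tt) 1. With k phases in series at a common rate, this is exactly a Gamma distribution with integer shape k. That is why non-integer Gamma shapes and Weibull targets need an approximation. Below is a minimal sketch using NumPy and SciPy; the function names are illustrative.

```python
import numpy as np
from scipy.linalg import expm
from scipy.stats import gamma


def ph_survival(alpha, T, t):
    """P(sojourn > t) = alpha exp(T t) 1 for a phase-type distribution."""
    return float(alpha @ expm(T * t) @ np.ones(len(alpha)))


def erlang_ph(k, rate):
    """k exponential phases in series, each left at the given rate."""
    T = -rate * np.eye(k) + rate * np.eye(k, k=1)
    alpha = np.zeros(k)
    alpha[0] = 1.0  # always start in the first phase
    return alpha, T


alpha, T = erlang_ph(3, 0.8)
for t in [0.5, 2.0, 6.0]:
    # Both columns agree: Erlang(3, 0.8) is Gamma(shape 3, rate 0.8).
    print(t, ph_survival(alpha, T, t), gamma.sf(t, a=3, scale=1 / 0.8))
```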
If this is right
- General multi-state structures become feasible without custom restrictions.
- Both Bayesian and maximum likelihood estimation are supported in the new R package, msmbayes.
- Applications like modeling cognitive function decline can use transition rates that depend on the time spent in the current state.
- Simulation-based calibration checks that the software's Bayesian computations are correctly calibrated.
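Simulation-based calibration follows a fixed loop. Draw a parameter from the prior, simulate data, sample the posterior, and record the rank of the true value among the posterior draws. If inference is correct, these ranks are uniform. The toy below uses an exponential sojourn rate with a conjugate Gamma prior, so the posterior can be sampled exactly. It shows the principle only and is not the calibration setup used for msmbayes. The prior values and sample sizes are arbitrary assumptions.

```python
import numpy as np


def sbc_ranks(n_sims=2000, n_draws=99, n_obs=20, a=2.0, b=2.0, seed=1):
    """Simulation-based calibration for an exponential sojourn rate (toy).

    Prior: rate ~ Gamma(shape=a, rate=b). Data: n_obs exponential sojourns.
    The posterior is Gamma(a + n_obs, b + sum(x)), so it is sampled exactly.
    """
    rng = np.random.default_rng(seed)
    ranks = np.empty(n_sims, dtype=int)
    for i in range(n_sims):
        rate = rng.gamma(a, 1.0 / b)  # NumPy parametrises by shape and scale
        x = rng.exponential(1.0 / rate, size=n_obs)
        post = rng.gamma(a + n_obs, 1.0 / (b + x.sum()), size=n_draws)
        ranks[i] = np.sum(post < rate)  # rank of the true value, 0..n_draws
    return ranks


ranks = sbc_ranks()
counts = np.bincount(ranks // 10, minlength=10)  # 10 bins of 10 ranks each
print(counts)  # roughly flat if the inference is calibrated
```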
Where Pith is reading between the lines
- Similar moment-matching could apply to other distributions or observation types in survival analysis.
- Future work might compare approximation error to exact methods in small state spaces.
- The software could integrate with existing multi-state modeling tools for broader adoption.
Load-bearing premise
The moment-matching approximation to Gamma or Weibull is sufficiently accurate to maintain the semi-Markov behavior and model stability without significant loss of fidelity.
What would settle it
Generate data from an exact semi-Markov model with known Gamma sojourns under intermittent observation, fit the phase-type approximated model, and check whether the recovered parameters match the true values within the expected error; large discrepancies would falsify the practicality claim.
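The first step of that check is generating intermittently observed data from an exact semi-Markov model. It could look like the sketch below. The progressive three-state structure 1 → 2 → 3, the Gamma shapes and rates, and the observation schedule are all hypothetical choices. Fitting the phase-type model to the result (for example with msmbayes) is not shown.

```python
import numpy as np


def simulate_panel(n_people, obs_times, shapes, rates, seed=0):
    """Progressive model 1 -> 2 -> 3 with Gamma sojourns in states 1 and 2.

    Everyone starts in state 1 at time 0; state 3 is absorbing. Only the
    states occupied at obs_times are returned, not the transition times.
    """
    rng = np.random.default_rng(seed)
    obs_times = np.asarray(obs_times, dtype=float)
    states = np.empty((n_people, obs_times.size), dtype=int)
    for i in range(n_people):
        s1 = rng.gamma(shapes[0], 1.0 / rates[0])  # time spent in state 1
        s2 = rng.gamma(shapes[1], 1.0 / rates[1])  # time spent in state 2
        entry2, entry3 = s1, s1 + s2  # true, unobserved transition times
        states[i] = np.where(obs_times < entry2, 1,
                             np.where(obs_times < entry3, 2, 3))
    return states


panel = simulate_panel(5, obs_times=np.arange(0, 11, 2),
                       shapes=(2.5, 1.5), rates=(0.5, 0.3))
print(panel)
```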
Original abstract
Multi-state models are commonly used for intermittent observations of a state over time, but these are generally based on the Markov assumption, that transition rates are independent of the time spent in current and previous states. In a semi-Markov model, the rates can depend on the time spent in the current state, though available methods for this are either restricted to specific state structures or lack general software. This paper develops the approach of using a "phase-type" distribution for the sojourn time in a state, which expresses a semi-Markov model as a hidden Markov model, allowing the likelihood to be calculated easily for any state structure. While this approach involves a proliferation of latent parameters, identifiability can be improved by restricting the phase-type family to one which approximates a simpler distribution such as the Gamma or Weibull. This paper proposes a moment-matching method to obtain this approximation, making general semi-Markov models for intermittent data accessible in software for the first time. The method is implemented in a new R package, "msmbayes", which implements Bayesian or maximum likelihood estimation for multi-state models with general state structures and covariates. The software is tested using simulation-based calibration, and an application to cognitive function decline illustrates the use of the method in a typical modelling workflow.
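Once each state's sojourn is phase-type, the model becomes a Markov chain on phases, and the observed state is a known function of the phase. The likelihood of a panel of observations then follows from the standard forward recursion with transition matrices P(Δt) = exp(QΔt). The sketch below assumes states are observed without misclassification and conditions on the first observation. It is a generic illustration, not the msmbayes implementation, and the generator values are made up.

```python
import numpy as np
from scipy.linalg import expm


def forward_loglik(Q, phase_state, init, obs_states, obs_times):
    """Log-likelihood of one intermittently observed path (forward algorithm).

    Q           : generator on the expanded (phase-level) state space
    phase_state : observed-state label of each phase (the emission map)
    init        : initial probability vector over phases
    obs_states  : states seen at obs_times, assumed observed without error
    """
    phase_state = np.asarray(phase_state)
    f = np.asarray(init, dtype=float) * (phase_state == obs_states[0])
    c = f.sum()
    loglik = np.log(c)
    f = f / c  # rescale at every step to avoid underflow
    for j in range(1, len(obs_times)):
        P = expm(Q * (obs_times[j] - obs_times[j - 1]))  # probs over the gap
        f = (f @ P) * (phase_state == obs_states[j])  # keep consistent phases
        c = f.sum()
        loglik += np.log(c)
        f = f / c
    return loglik


# Hypothetical model 1 -> 2: state 1 has latent phases 1a and 1b, with exit to
# state 2 possible from either phase; state 2 is absorbing.
Q = np.array([
    [-0.7, 0.6, 0.1],  # phase 1a: to 1b at 0.6, to state 2 at 0.1
    [0.0, -0.6, 0.6],  # phase 1b: to state 2 at 0.6
    [0.0, 0.0, 0.0],   # state 2 (absorbing)
])
print(forward_loglik(Q, phase_state=[1, 1, 2], init=[1, 0, 0],
                     obs_states=[1, 1, 2, 2], obs_times=[0.0, 1.0, 2.5, 4.0]))
```

With one phase per state, the same function reduces to the usual Markov multi-state likelihood.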
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript presents a method for semi-Markov modeling of intermittently observed multi-state data. It uses phase-type distributions for state sojourn times, approximated by moment-matching to Gamma or Weibull distributions, to represent the semi-Markov process as a hidden Markov model. This facilitates likelihood computation for general state structures. The approach is implemented in the R package msmbayes for Bayesian or maximum likelihood estimation, tested via simulation-based calibration, and applied to data on cognitive function decline.
Significance. Should the moment-matching approximation prove robust for preserving essential semi-Markov dynamics under intermittent observation, the paper would provide a valuable practical tool for fitting flexible semi-Markov models where existing methods are restrictive or lack software support. The open-source implementation in msmbayes and the simulation-based calibration for validation are strengths that support reproducibility and usability in the field.
major comments (2)
- [Section 3] The moment-matching approximation to Gamma or Weibull distributions is central to improving identifiability, but the manuscript does not provide a quantitative bound on the approximation error for the tail of the sojourn time distribution. This is particularly relevant for intermittent observations where inter-observation times can be long, potentially leading to inaccurate integrated hazards in the likelihood.
- [Likelihood derivation] The assumption that matching the first three moments suffices to control transition probabilities over arbitrary intervals is not supported by error analysis. For state graphs with cycles or competing exits, the phase-type restriction may distort the duration-dependent behavior, affecting the stability and identifiability claims. Reporting the condition number of the observed-data information matrix or bias in simulated likelihoods would be necessary.
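The condition number the referee asks for can be computed for any fitted model from its negative log-likelihood. Take the Hessian at the maximum likelihood estimate (the observed information) and divide its largest singular value by its smallest. A large value signals that some parameter combinations are weakly identified. The sketch uses a central finite-difference Hessian and hypothetical function names. A real analysis would plug in the model's own negative log-likelihood.

```python
import numpy as np


def numerical_hessian(f, x, h=1e-4):
    """Central finite-difference Hessian of a scalar function f at x."""
    x = np.asarray(x, dtype=float)
    n = x.size
    steps = np.eye(n) * h
    H = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            ei, ej = steps[i], steps[j]
            H[i, j] = (f(x + ei + ej) - f(x + ei - ej)
                       - f(x - ei + ej) + f(x - ei - ej)) / (4 * h * h)
    return (H + H.T) / 2  # symmetrise away rounding noise


def information_condition_number(negloglik, mle):
    """Condition number of the observed information (Hessian of -loglik at the MLE)."""
    return np.linalg.cond(numerical_hessian(negloglik, mle))


def toy_negloglik(theta):
    """Quadratic stand-in with known Hessian diag(1, 100)."""
    return 0.5 * (theta[0] ** 2 + 100.0 * theta[1] ** 2)


print(information_condition_number(toy_negloglik, [0.0, 0.0]))  # about 100
```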
minor comments (1)
- [Abstract] The abstract could benefit from a brief mention of the typical number of phases used in the phase-type distributions or how the number is selected.
Simulated Author's Rebuttal
We thank the referee for their positive summary and constructive major comments, which highlight important aspects of the moment-matching approximation and its implications for the likelihood. We address each point below and have planned revisions to strengthen the manuscript's rigor while preserving its focus on practical implementation.
Point-by-point responses
- Referee: [Section 3] The moment-matching approximation to Gamma or Weibull distributions is central to improving identifiability, but the manuscript does not provide a quantitative bound on the approximation error for the tail of the sojourn time distribution. This is particularly relevant for intermittent observations where inter-observation times can be long, potentially leading to inaccurate integrated hazards in the likelihood.
Authors: We agree that a quantitative assessment of tail approximation error would enhance the manuscript, particularly for long inter-observation intervals. The current validation relies on simulation-based calibration showing overall good performance, but we acknowledge the absence of explicit bounds. In the revised manuscript, we will add numerical comparisons in Section 3 of the relative error in the survival function (and thus integrated hazards) between the phase-type approximation and target Gamma/Weibull distributions across a grid of time intervals up to 10 times the mean sojourn time, reporting maximum relative errors for representative parameter values. revision: yes
- Referee: [Likelihood derivation] The assumption that matching the first three moments suffices to control transition probabilities over arbitrary intervals is not supported by error analysis. For state graphs with cycles or competing exits, the phase-type restriction may distort the duration-dependent behavior, affecting the stability and identifiability claims. Reporting the condition number of the observed-data information matrix or bias in simulated likelihoods would be necessary.
Authors: We appreciate the call for explicit error analysis in the context of the full multi-state likelihood. Moment matching approximates the marginal sojourn distribution, after which the phase-type representation yields an exact likelihood for the approximating model. We agree this does not automatically guarantee control of transition probabilities in graphs with cycles or competing exits. In revision, we will expand the simulation studies to include bias and coverage for parameter estimates in cyclic and competing-risk structures, and report condition numbers of the observed information matrix for the simulated datasets where computation is feasible. A full theoretical error bound on the likelihood for arbitrary intervals lies beyond the paper's scope but will be noted as a limitation with the strengthened empirical results. revision: partial
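The numerical comparison promised in the first response could be organised as follows. Match a phase-type distribution to a target, then tabulate absolute and relative survival errors on a grid up to 10 times the mean sojourn. As a stand-in for the paper's three-moment family, the sketch uses a classical two-moment Erlang-mixture match. The Weibull target (shape 1.5, scale 1) and the grid size are illustrative assumptions. A phase-type tail decays exponentially, while a Weibull tail with shape above 1 decays faster. The relative error should therefore grow far out in the tail, which is the referee's concern.

```python
import math

import numpy as np
from scipy.linalg import expm
from scipy.special import gamma as gamma_fn
from scipy.stats import weibull_min


def erlang_mixture_match(mean, scv):
    """Stand-in two-moment match (Erlang(k-1)/Erlang(k) mixture), 0 < scv <= 1."""
    if not 0 < scv <= 1:
        raise ValueError("this recipe needs 0 < scv <= 1")
    k = math.ceil(1.0 / scv - 1e-12)
    p = (k * scv - math.sqrt(max(0.0, k * (1 + scv) - k**2 * scv))) / (1 + scv)
    mu = (k - p) / mean
    T = -mu * np.eye(k) + mu * np.eye(k, k=1)
    alpha = np.zeros(k)
    alpha[0] = 1 - p
    if k > 1:
        alpha[1] = p
    return alpha, T


def ph_survival(alpha, T, t):
    """Phase-type survival function alpha exp(T t) 1."""
    return float(alpha @ expm(T * t) @ np.ones(len(alpha)))


def survival_errors(alpha, T, target_sf, mean, upto=10.0, n_grid=50):
    """Absolute and relative survival errors on a grid from 0 to upto * mean.

    Assumes target_sf stays above zero on the grid (no underflow).
    """
    grid = np.linspace(0.0, upto * mean, n_grid)
    s_ph = np.array([ph_survival(alpha, T, t) for t in grid])
    s_target = target_sf(grid)
    abs_err = np.abs(s_ph - s_target)
    return grid, abs_err, abs_err / s_target


# Illustrative Weibull target: match its mean and squared coefficient of variation.
shape, scale = 1.5, 1.0
m1 = scale * gamma_fn(1 + 1 / shape)
m2 = scale**2 * gamma_fn(1 + 2 / shape)
alpha, T = erlang_mixture_match(m1, (m2 - m1**2) / m1**2)
grid, abs_err, rel_err = survival_errors(
    alpha, T, lambda t: weibull_min.sf(t, shape, scale=scale), m1)
print(f"max abs error {abs_err.max():.3g}, max rel error {rel_err.max():.3g}")
```

Reporting both columns matters. The absolute error stays small everywhere, while the relative error shows how badly the tail probabilities behind long inter-observation gaps are approximated.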
Circularity Check
No significant circularity; derivation is self-contained
Full rationale
The paper introduces a phase-type distribution representation to recast semi-Markov sojourn times as a hidden Markov model, enabling standard likelihood evaluation for arbitrary state structures under intermittent observation. The moment-matching restriction to approximate Gamma or Weibull distributions is presented as an explicit modeling choice to control parameter proliferation and improve identifiability, rather than a fitted quantity redefined as a prediction or a self-referential definition. No load-bearing equation or step reduces the claimed result to its own inputs by construction, and the abstract frames the contribution as a new computational device with software implementation and simulation-based calibration. This is the usual honest non-finding for a methods paper that builds on established HMM likelihood machinery and contains no circular self-citation chains or smuggled-in ansätze.
Axiom & Free-Parameter Ledger
free parameters (1)
- phase-type parameters
axioms (2)
- domain assumption: Phase-type distributions can express semi-Markov sojourn times as hidden Markov models whose likelihood is tractable for arbitrary state structures.
- domain assumption: Moment-matching produces a phase-type approximation close enough to Gamma or Weibull to improve identifiability without distorting the semi-Markov dynamics.
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean: washburn_uniqueness_aczel
Tag: unclear (relation between the paper passage and the cited Recognition theorem).
This paper proposes a moment-matching method to obtain this approximation... phase-type distribution of a particular family whose first three moments agree with those of the Gamma or Weibull.
- IndisputableMonolith/Foundation/ArithmeticFromLogic.lean: reality_from_one_distinction
Tag: unclear (relation between the paper passage and the cited Recognition theorem).
A multi-state model with a phase-type sojourn distribution is an example of a hidden Markov model... likelihood can therefore be evaluated easily using the forward algorithm.
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.