
arxiv: 2605.21283 · v1 · pith:3OF3OXPEnew · submitted 2026-05-20 · 📊 stat.ME · stat.AP

A continuous-time Markov chain framework for population size estimation from multi-list data: accounting for absorbing lists and asymmetric interactions

Pith reviewed 2026-05-21 04:09 UTC · model grok-4.3

classification 📊 stat.ME stat.AP
keywords population size estimation · multi-list data · continuous-time Markov chain · capture-recapture · absorbing lists · asymmetric interactions · log-linear models

The pith

A continuous-time Markov chain estimates population size from multi-list data while modeling absorbing lists and directional interactions.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a framework that uses continuous-time Markov chains to estimate the size of a population from data collected across multiple lists. The chain models how individuals move between list memberships over time and handles absorbing lists, such as death records, that cannot be exited. This matters because standard methods can give biased population size estimates when a list is absorbing, which the paper's simulation study demonstrates. The new model is equivalent to the log-linear approach under independence, gives similar results under dependence, and is designed to handle absorbing and ordered lists in applications such as stroke records and drug use data.

Core claim

The central claim is that representing list membership as states in a continuous-time Markov chain allows estimation of population size while accommodating directional interactions between lists and absorbing lists such as death records, with the model reducing to the log-linear model under independence.

What carries the argument

The continuous-time Markov chain whose states encode combinations of list memberships and whose transition rates capture interactions and absorption processes.
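
As a rough illustration of the state structure described above, the following sketch builds a generator matrix for a chain whose states are subsets of lists. It is not the authors' implementation: the function names `build_states` and `build_generator`, the numeric rates, and the simplification that list j is joined at a constant rate regardless of current memberships (no interaction terms) are all assumptions made for this example. An absorbing list is modeled here, again as an assumption, by giving every state that contains it no outgoing transitions.

```python
from itertools import combinations

def build_states(k):
    """All subsets of {0, ..., k-1} as frozensets, smallest subsets first."""
    return [frozenset(c) for r in range(k + 1) for c in combinations(range(k), r)]

def build_generator(rates, absorbing=()):
    """Generator matrix Q (list of rows) for a list-membership chain.

    Assumption for this sketch: list j is joined at constant rate rates[j],
    lists are never left, and any state containing a list named in
    `absorbing` has no outgoing transitions.
    """
    k = len(rates)
    states = build_states(k)
    index = {s: i for i, s in enumerate(states)}
    n = len(states)
    Q = [[0.0] * n for _ in range(n)]
    for s in states:
        if any(a in s for a in absorbing):
            continue  # absorbing state: row of zeros
        i = index[s]
        for j in range(k):
            if j not in s:
                Q[i][index[s | {j}]] = rates[j]
        # Diagonal entry makes the row sum to zero.
        Q[i][i] = -sum(Q[i])
    return states, Q

# Hypothetical rates for three lists; list 2 plays the role of a death record.
states, Q = build_generator([0.3, 0.5, 0.2], absorbing=(2,))
```

Directional interactions would enter by letting the rate of joining list j depend on the current state, which this sketch deliberately leaves out.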

If this is right

  • When lists are independent, the Markov chain model produces the same population size estimates as the standard log-linear model.
  • Failure to account for an absorbing list leads to biased population size estimates in simulations.
  • The framework can be applied to ordered lists, such as in drug use data where sequence matters.
  • The approach yields usable estimates for epidemiological populations containing a death record list.
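
The first bullet can be made concrete with a short derivation. This is a sketch under assumptions that are not spelled out on this page: each list $i$ is joined independently at a constant rate $\lambda_i$, lists are never left, every individual starts in the empty state, and the recording period has length $t$. The symbol $\theta_i$ is introduced here for illustration only.

```latex
\[
p_I(t) \;=\; \prod_{i \in I} \bigl(1 - e^{-\lambda_i t}\bigr) \prod_{i \notin I} e^{-\lambda_i t},
\qquad I \subseteq \{1, \dots, k\},
\]
\[
\log p_I(t) \;=\; -t \sum_{i=1}^{k} \lambda_i \;+\; \sum_{i \in I} \theta_i,
\qquad \theta_i = \log\bigl(e^{\lambda_i t} - 1\bigr).
\]
```

Under these assumptions the log-probability of each state is an intercept plus one main effect per list in the state, which is the form of a log-linear model with no interaction terms.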

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same state-transition structure could be used to incorporate time-varying list interactions in future data collection designs.
  • Covariates could be added to the transition rates to estimate subpopulation sizes without changing the core model.
  • The method might be tested against known population sizes in ecology or census undercount settings with sequential sampling.

Load-bearing premise

The process by which individuals appear on the various lists follows the transition structure of the chosen continuous-time Markov chain.

What would settle it

Applying the model to data generated from a non-Markov process and comparing the resulting population size estimate against a known true value would reveal systematic bias if the assumption does not hold.

Figures

Figures reproduced from arXiv: 2605.21283 by Andrew Titman, Ophélie Schaller, Rachel McCrea.

Figure 1. Graph of transition between states in L = {L1, L2, L3}.
Text extracted alongside Figure 1 (truncated at source): Fitting the continuous-time Markov model to multi-list data. As we are interested in data which does not contain any information on intermediate counts at a time $t \in\, ]t_0, t_f[$, we only want to model the probabilities $p_{t_f, I}$ for $I \in L$ of being in a state at the end of the recording period. Let $P = (p_{I,J} : I, J \in L)$ be the solution of the differential matrix equation $P'(t) = P(t)Q$, $P(0) = I_{k \times k}$ (1). … The solution of Equation 1 is given by the exponential matrix $e^{tQ}$ (see e.g. Norris 1997). Recall that if $V$ is the matrix of eigenvectors of $Q$ and $D$ the diagonal matrix of eigenvalues of $Q$ such that $Q = V D V^{-1}$, one has that $e^{tQ} = V e^{tD} V^{-1}$ (see e.g. Norris 1997). By properties of continuous-time Markov chains (see e.g. Norris 1997) and as we assume…

Figure 2. Transition graph (image not reproduced; caption not recovered). Node labels: ∅, L1, L2, L3, {L1, L2}, {L1, L3}, {L2, L3}, {L1, L2, L3}. Edge labels: $\lambda_1$, $\lambda_2$, $\lambda_3$, $\lambda_2\mu_{12}$, $\lambda_3\mu_{13}$, $\lambda_1\mu_{13}$, $\lambda_2\mu_{23}$, $\lambda_2\mu_{12}\mu_{23}\mu_{123}$.

Figure 3. Graph of transition between states in L = {L1, L2, L3} ordered lists L2 and L3.
Text extracted alongside Figure 3 (truncated at source): Similarly as for the log-linear case or as for the standard continuous-time Markov chain framework for MSE, the total number of parameters exceed by one the number of data points (i.e. the number of non-empty states). Indeed, the total number of states in $L_{i \to j}$ is equal to $2^k - 2^{k-2}$, which is equal to the total number of non-r…

Figure 4. Estimates of the log of the total population size.
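
The matrix equation quoted with Figure 1, $P'(t) = P(t)Q$ with $P(0)$ the identity, has solution $e^{tQ}$. The sketch below evaluates that exponential for a small two-list chain and reads off the end-of-period state probabilities from the row of the empty state. The excerpt describes an eigendecomposition route; this example instead uses a truncated Taylor series with scaling and squaring so that it runs with the standard library only. The helper names `mat_mul` and `mat_exp`, the rates `lam1` and `lam2`, and the horizon `t_f` are hypothetical values chosen for illustration.

```python
import math

def mat_mul(A, B):
    """Plain-Python product of two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][m] * B[m][j] for m in range(n)) for j in range(n)]
            for i in range(n)]

def mat_exp(Q, t, terms=20):
    """Approximate exp(t*Q) by scaling and squaring a truncated Taylor series."""
    n = len(Q)
    norm = abs(t) * max(sum(abs(x) for x in row) for row in Q)
    # Halve t*Q until its row-sum norm is at most 0.5, then square back up.
    s = max(0, math.ceil(math.log2(norm)) + 1) if norm > 0 else 0
    A = [[x * t / 2 ** s for x in row] for row in Q]
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for m in range(1, terms + 1):
        term = [[x / m for x in row] for row in mat_mul(term, A)]
        result = [[r + x for r, x in zip(rr, tr)] for rr, tr in zip(result, term)]
    for _ in range(s):
        result = mat_mul(result, result)
    return result

# Two independent lists; state order: empty, {1}, {2}, {1, 2}.
lam1, lam2, t_f = 0.3, 0.5, 2.0  # hypothetical rates and recording period
Q = [
    [-(lam1 + lam2), lam1, lam2, 0.0],
    [0.0, -lam2, 0.0, lam2],
    [0.0, 0.0, -lam1, lam1],
    [0.0, 0.0, 0.0, 0.0],
]
P = mat_exp(Q, t_f)
p_end = P[0]  # distribution at t_f for an individual starting in the empty state
```

In this independent case the first entry of `p_end` should agree with the closed form `exp(-(lam1 + lam2) * t_f)`, which gives a simple sanity check on the numerical exponential.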
Original abstract

We introduce a continuous-time Markov chain framework for estimating population size from multi-list data, which allows directional interactions to be modelled and can accommodate absorbing lists, such as death records, or more general data collection processes. The standard model of the continuous-time Markov chain framework and the log-linear model for multi-list data are equivalent when lists are independent and we show empirically that they give similar results in the presence of dependencies between lists. Through a simulation study, we highlight the need to account for an absorbing list by using the Markov model or the log-linear model with forced absorbing interactions, observing biased estimates of the population size otherwise. We motivate our approach with an epidemiological dataset concerning individuals suffering from a first ever stroke in North-West England, in which one of the lists is a death record. We illustrate a further use of our approach by considering a case of ordered lists on drug use data from the City of London.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 3 minor

Summary. The paper introduces a continuous-time Markov chain (CTMC) framework for population size estimation from multi-list data. It models each individual's list-inclusion history as a CTMC whose states encode observed list combinations, with transition rates that can be asymmetric and can include absorbing states (e.g., death records). The framework is shown to be equivalent to the standard log-linear model under list independence and to produce comparable numerical results under modest dependence. A simulation study demonstrates bias in population-size estimates when an absorbing list is ignored, and the method is applied to a stroke epidemiology dataset (with a death-record list) and to ordered-list drug-use data from London.

Significance. If the CTMC representation is appropriate for the data-generating process, the framework supplies a coherent way to incorporate directional interactions and absorbing lists into capture-recapture estimation. This is practically relevant for epidemiological applications where one list records a terminal event. The reported equivalence under independence and the empirical similarity under dependence provide a useful bridge to existing log-linear methods, while the simulation and two real-data illustrations demonstrate concrete utility.

major comments (1)
  1. [Simulation study] Simulation study: the reported experiments demonstrate bias only when an absorbing list is omitted while the CTMC model is still assumed correct. They do not examine robustness to violations of the Markov assumption itself (non-exponential waiting times, time-varying rates, or heterogeneity outside the finite state space). Because the population-size estimator is obtained by fitting the CTMC rates to overlap counts and then computing the probability of the unobserved state, such misspecification would directly bias the estimator; this robustness check is therefore load-bearing for the central claim.
minor comments (3)
  1. [Abstract] Abstract: the phrase 'the standard model of the continuous-time Markov chain framework' is ambiguous; clarify whether the equivalence result is a formal identity or an empirical observation.
  2. [Applications] Applications section: supply explicit transition-rate matrices or state diagrams for the ordered-list drug-use example so that the directional-interaction modeling can be verified.
  3. Notation: ensure that the symbols for transition intensities are defined once and used consistently when moving from the general CTMC construction to the likelihood and to the population-size formula.
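
The major comment's reasoning, that any error in the fitted probability of the unobserved state passes straight into the population-size estimate, can be illustrated with a few lines. This sketch assumes the common form in which the observed count is divided by the fitted probability of appearing on at least one list; whether the paper uses exactly this form is not confirmed on this page. The function name `estimate_population` and all numbers are hypothetical.

```python
import math

def estimate_population(n_observed, p_unobserved):
    """Scale the observed count by the probability of being observed at all."""
    if not 0.0 <= p_unobserved < 1.0:
        raise ValueError("p_unobserved must lie in [0, 1)")
    return n_observed / (1.0 - p_unobserved)

# Hypothetical numbers, for illustration only.
n_observed = 900
# Unobserved-state probability implied by a fitted two-list independent chain
# with rates 0.3 and 0.5 over a recording period of length 2.
p_fitted = math.exp(-(0.3 + 0.5) * 2.0)
# A different value that a non-Markov data-generating process might imply.
p_actual = 0.30

n_hat = estimate_population(n_observed, p_fitted)
n_ref = estimate_population(n_observed, p_actual)
relative_gap = (n_hat - n_ref) / n_ref
```

Because the estimate depends on the data only through the observed count and the fitted unobserved-state probability, a misspecified chain that gets that probability wrong shifts the estimate by a corresponding factor.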

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. We address the single major comment below and describe the changes we will make to strengthen the simulation study and discussion of assumptions.

Point-by-point responses
  1. Referee: [Simulation study] Simulation study: the reported experiments demonstrate bias only when an absorbing list is omitted while the CTMC model is still assumed correct. They do not examine robustness to violations of the Markov assumption itself (non-exponential waiting times, time-varying rates, or heterogeneity outside the finite state space). Because the population-size estimator is obtained by fitting the CTMC rates to overlap counts and then computing the probability of the unobserved state, such misspecification would directly bias the estimator; this robustness check is therefore load-bearing for the central claim.

    Authors: We agree that the current simulation study is limited to cases where the CTMC model is correctly specified and primarily demonstrates the bias that arises from failing to account for an absorbing list. We did not investigate robustness to direct violations of the Markov assumption (e.g., non-exponential waiting times, time-varying rates, or additional heterogeneity). In the revised manuscript we will add a new set of simulation scenarios that generate data from processes violating the Markov property and report the resulting bias and coverage of the CTMC-based estimator. We will also insert a dedicated paragraph in the discussion section that explicitly states the Markov assumption, contrasts it with the log-linear equivalence result under independence, and notes the conditions under which misspecification is likely to be consequential. These additions will make the robustness limitations transparent while preserving the paper’s focus on absorbing lists and directional interactions. revision: yes

Circularity Check

0 steps flagged

No significant circularity: framework derives population-size estimator from explicit CTMC likelihood on observed overlaps.

full rationale

The paper defines a CTMC state space whose transitions encode list-inclusion histories (including absorbing states), writes the likelihood directly from the observed multi-list counts, and recovers the unobserved-state probability as the population-size estimator. Equivalence to the log-linear model under independence is shown by matching the implied overlap probabilities; numerical similarity under dependence is demonstrated on simulated and real data. No equation reduces a fitted parameter to a renamed prediction, no uniqueness theorem is imported from self-citation, and the simulation study tests the model under its own assumptions rather than re-expressing fitted values. The derivation therefore remains self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Based on the abstract alone, no explicit free parameters, axioms, or invented entities are stated; the framework appears to rest on standard Markov chain assumptions and the usual capture-recapture identifiability conditions.

pith-pipeline@v0.9.0 · 5696 in / 1168 out tokens · 37884 ms · 2026-05-21T04:09:37.186078+00:00 · methodology



Reference graph

Works this paper leans on

30 extracted references · 30 canonical work pages

  1. [1] test. journal here.
  2. [2] Norris, J. R.
  3. [3] Sandland, R. L.; Cormack, R. M. Biometrika. doi:10.1093/biomet/71.1.27
  4. [4] Cormack, R. M.; Jupp, P. E. Inference for Poisson and Multinomial Models for Capture-Recapture Experiments.
  5. [5] Estimated prevalence of opioid use disorder in Massachusetts, 2011–2015: a capture–recapture analysis. American Journal of Public Health, 2018.
  6. [6] Jones, Hayley E.; Hickman, Matthew; Welton, Nicky J.; De Angelis, Daniela; Harris, Ross J.; Ades, A. E. American Journal of Epidemiology, 2014. doi:10.1093/aje/kwu056
  7. [7] Multiple systems estimation (or capture-recapture estimation) to inform public policy. Annual Review of Statistics and Its Application, 2018.
  8. [8] Djennad, Abdelmajid; Harris, Ross J.; Presanis, Anne M.; Jahr, Stefan; Charlett, Andre; De Angelis, Daniela. Journal of the Royal Statistical Society Series A: Statistics in Society, 2024. doi:10.1093/jrsssa/qnae114
  9. [9] King, Ruth; Bird, Sheila M.; Overstall, Antony M.; Hay, Gordon; Hutchinson, Sharon J. Journal of the Royal Statistical Society: Series A (Statistics in Society). doi:10.1111/rssa.12011
  10. [10] Hook, Ernest B.; Regal, Ronald R. Epidemiologic Reviews, 1995. doi:10.1093/oxfordjournals.epirev.a036192
  11. [11] Modern slavery in the UK: How many victims? Significance, 2015.
  12. [12] Statistics and Slobodan: using data analysis and statistics in the war crimes trial of former president Milosevic. Chance, 2002.
  13. [13] Estimating the scale of hospital admissions for people experiencing homelessness in England: a population-based multiple systems estimation study using national Hospital Episodes Statistics. 2025.
  14. [14] Brecht, Mary-Lynn; Wickens, Thomas D. Journal of Drug Issues, 1993.
  15. [15] Jolly, G. M. Biometrika, 1965.
  16. [16] Seber, G. A. F. A Note on the Multiple-Recapture Census.
  17. [17] Cormack, R. M. Log-Linear Models for Capture-Recapture.
  18. [18] Worthington, H.; McCrea, R.; King, R.; Griffiths, R. Estimating abundance from multiple sampling capture-recapture data via a multi-state multi-period stopover model.
  19. [19] Statistical inference from capture data on closed animal populations. Wildlife Monographs, 1978.
  20. [20] King, Ruth; McCrea, Rachel S. Handbook of Statistics, 2019.
  21. [21] Fienberg, Stephen E. Biometrika, 1972.
  22. [22] Worthington, Hannah; McCrea, Rachel; King, Ruth; Vincent, Kyle Shane. Crime & Delinquency, 2021.
  23. [23] Hook, Ernest B.; Regal, Ronald R. American Journal of Epidemiology, 2000. doi:10.1093/aje/152.8.771
  24. [24] Du, X.; Sourbutts, J.; Cruickshank, K.; Summers, A.; Roberts, N.; Walton, E.; Holmes, S. 1997.
  25. [25] How Many Trafficked People Are There in Greater New Orleans? Lessons in Measurement. Journal of Human Trafficking, 2024. doi:10.1080/23322705.2019.1634936
  26. [26] McCrea, Rachel S.; Morgan, Byron J. T.; Gimenez, Olivier. Journal of the Royal Statistical Society: Series C (Applied Statistics). doi:10.1111/rssc.12197
  27. [27] A test of positive association for detecting heterogeneity in capture for capture–recapture data. Journal of Agricultural, Biological and Environmental Statistics, 2018.
  28. [28] Pollock, Kenneth H.; Hines, James E.; Nichols, James D. Goodness-of-Fit Tests for Open Capture-Recapture Models.
  29. [29] Model selection and multimodel inference: a practical information-theoretic approach. 2002.
  30. [30] n.d.