Sequential learning theory for Markov genealogy processes
Pith reviewed 2026-05-15 14:09 UTC · model grok-4.3
The pith
For absorbing phylodynamic estimands, sequence data alone leaves an irreducible gap between what an analyst can learn and what an oracle knowing latent absorption status can guarantee.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
By ordering observed tips and running sequential Bayesian analysis on the resulting filtration, the expected variance reduction decomposes into learning, mismatch, and covariance terms. Estimands fall into learning classes according to the pathwise behaviour of the mismatch term. For absorbing estimands an oracle that knows the latent absorption status obtains event-wise learning guarantees unavailable to the analyst, and the gap between them is irreducible under assumptions that hold for many real phylodynamic estimands.
What carries the argument
A filtration built on a natural ordering of the observed tips, together with a decomposition of the expected variance reduction into learning, mismatch, and covariance components.
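As a hedged anchor for that decomposition, the standard identity below shows why adding a tip cannot increase the expected posterior variance. The symbols $\theta$ (the estimand), $\mathcal{F}_n$ (the information in the first $n$ ordered tips), and $L_n$, $D_n$ are notation assumed here and are not taken from the paper. The paper's own definitions of the learning, mismatch, and covariance terms are not reproduced on this page, so the second line is only an algebraic illustration of how three terms of that shape can arise.

```latex
% Assumed notation: \theta is square-integrable and \mathcal{F}_n \subseteq \mathcal{F}_{n+1}.
\mathbb{E}\big[\operatorname{Var}(\theta \mid \mathcal{F}_n) - \operatorname{Var}(\theta \mid \mathcal{F}_{n+1})\big]
  = \mathbb{E}\Big[\big(\mathbb{E}[\theta \mid \mathcal{F}_{n+1}] - \mathbb{E}[\theta \mid \mathcal{F}_n]\big)^2\Big] \;\ge\; 0 .

% Hypothetical split of the posterior-mean increment into two pieces, L_n + D_n:
\mathbb{E}\big[(L_n + D_n)^2\big]
  = \mathbb{E}[L_n^2] + \mathbb{E}[D_n^2] + 2\,\mathbb{E}[L_n D_n] .
```

The first line follows from the martingale property of the posterior mean. Whether the paper's terms correspond to $L_n$ and $D_n$ in this way is not stated in the summary above.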
If this is right
- Adding taxa improves estimates at different rates depending on which learning class an estimand belongs to.
- For absorbing estimands the mismatch component prevents full learning even when every tip is included.
- The covariance term can sometimes offset mismatch losses but does not remove the oracle-analyst gap.
- Sequence data alone therefore cannot recover certain features of the latent genealogy under the stated assumptions.
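The consequences listed above can be illustrated with a deliberately small toy model. It is not the paper's Markov genealogy process. Everything in it is an assumption made for illustration: a latent signal `S`, a latent absorption flag `A` that the data never reveal, an estimand `theta = S * (1 - A)`, and conditionally independent binary observations that depend only on `S`. The sketch computes the exact expected posterior variance of `theta` for an analyst who sees only the observations and for an oracle who also knows `A`.

```python
from math import comb

# All numbers below are hypothetical choices for the toy model.
P_SIGNAL = 0.5             # prior P(S = 1)
P_ABSORBED = 0.3           # prior P(A = 1); A is independent of S and of the data
P_OBS = {0: 0.2, 1: 0.8}   # P(Y_i = 1 | S)


def signal_posterior(k, n):
    """Return (P(k ones among n observations), P(S = 1 | k))."""
    like1 = P_OBS[1] ** k * (1 - P_OBS[1]) ** (n - k)
    like0 = P_OBS[0] ** k * (1 - P_OBS[0]) ** (n - k)
    w1 = P_SIGNAL * like1
    w0 = (1 - P_SIGNAL) * like0
    return comb(n, k) * (w1 + w0), w1 / (w1 + w0)


def expected_posterior_variance(n, oracle):
    """Exact E[Var(theta | information)] after n observations."""
    total = 0.0
    for k in range(n + 1):
        p_k, s = signal_posterior(k, n)
        if oracle:
            # Oracle knows A: if A = 1 then theta = 0 exactly (variance 0);
            # if A = 0 then theta = S, a Bernoulli(s) variable.
            var = (1 - P_ABSORBED) * s * (1 - s)
        else:
            # Analyst: theta is Bernoulli(m) with m = P(S = 1 | k) * P(A = 0).
            m = s * (1 - P_ABSORBED)
            var = m * (1 - m)
        total += p_k * var
    return total


if __name__ == "__main__":
    for n in (0, 1, 5, 20, 50):
        analyst = expected_posterior_variance(n, oracle=False)
        oracle = expected_posterior_variance(n, oracle=True)
        print(n, round(analyst, 4), round(oracle, 4))
```

In this toy the oracle's expected variance tends to zero as observations accumulate, while the analyst's stays above 0.5 × 0.3 × 0.7 = 0.105, because no amount of data on `S` resolves the flag `A`. The model was built so that the data never inform `A`. Whether real phylodynamic estimands behave this way is exactly what the paper's assumptions are meant to address.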
Where Pith is reading between the lines
- Parameters that fall outside the absorbing class may still be learnable to arbitrary precision from sequences.
- The framework suggests designing data-collection strategies that target non-absorbing estimands when possible.
- Similar filtration arguments could apply to sequential inference problems outside phylodynamics that involve hidden states.
- Practical work could test whether real datasets exhibit the predicted mismatch behaviour for common absorbing targets.
Load-bearing premise
The claim that the oracle-analyst information gap stays irreducible for many real phylodynamic estimands under the Markov genealogy model.
What would settle it
A concrete absorbing estimand and dataset in which the analyst, using only sequences, achieves the same per-event learning rate as the oracle would falsify the claimed gap.
Original abstract
We introduce a filtration-based framework for studying when and why adding taxa improves phylodynamic inference, by constructing a natural ordering of observed tips and applying sequential Bayesian analysis to the resulting filtration. We decompose the expected variance reduction on taxa addition into learning, mismatch, and covariance components, classify estimands into learning classes based on the pathwise behaviour of the mismatch, and show that for absorbing estimands an oracle who knows the latent absorption status obtains event-wise learning guarantees unavailable to the analyst. The gap between oracle and analyst is irreducible assumptions that are likely to hold for many real phylodynamic estimands, establishing a fundamental limit on what sequence data alone can reveal about the latent genealogy.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces a filtration-based framework for studying when and why adding taxa improves phylodynamic inference in Markov genealogy processes. It constructs a natural ordering of observed tips, applies sequential Bayesian analysis to the resulting filtration, decomposes the expected variance reduction on taxa addition into learning, mismatch, and covariance components, classifies estimands into learning classes based on the pathwise behaviour of the mismatch, and shows that for absorbing estimands an oracle who knows the latent absorption status obtains event-wise learning guarantees unavailable to the analyst. This establishes an irreducible gap under assumptions likely to hold for many real phylodynamic estimands, highlighting a fundamental limit on what sequence data alone can reveal about the latent genealogy.
Significance. If the decomposition and irreducibility results hold rigorously, the work offers a theoretical foundation for quantifying information gains from additional taxa in phylodynamics and identifying inherent limits for absorbing estimands. This could inform sampling strategies and model interpretation by distinguishing oracle-level knowledge from sequence-data constraints, with potential to guide methodological advances in the field.
major comments (3)
- [Section on variance decomposition] The variance-reduction decomposition into learning, mismatch, and covariance components (detailed after the filtration construction) is load-bearing for the estimand classification. The manuscript must explicitly show that the covariance term cannot offset the mismatch for the analyst in the pathwise sense for absorbing estimands; without this, the claimed event-wise learning guarantees unavailable to the analyst do not follow.
- [Section on classification of estimands and oracle-analyst comparison] The assertion that the oracle-analyst gap is irreducible for absorbing estimands rests on the natural ordering of tips inducing a filtration under which the mismatch component is strictly positive and pathwise mismatch behavior cannot be compensated. The paper should provide a concrete verification or bound confirming this holds for Markov genealogy processes satisfying the 'likely to hold' assumptions on real phylodynamic estimands, as this is the least secure link in the central claim.
- [Section on absorbing estimands] The demonstration that the analyst cannot achieve the oracle's event-wise learning guarantees requires showing that the filtration construction (based on the natural ordering) prevents access to the latent absorption status in a way that the mismatch remains uncompensated. A step-by-step argument or counterexample check for cases where the ordering is non-canonical would strengthen the fundamental-limit conclusion.
minor comments (2)
- [Abstract] The abstract contains a grammatical issue in the sentence describing the irreducible gap ('is irreducible assumptions that are likely to hold'); this should be rephrased for clarity.
- [Introduction and methods] Notation for the filtration and sequential analysis should be introduced with explicit definitions early in the manuscript to ensure consistency when discussing the decomposition components.
Simulated Author's Rebuttal
We thank the referee for the careful and constructive review. The comments highlight areas where the presentation of the variance decomposition and the oracle-analyst gap can be strengthened with additional explicit arguments. We address each major comment below and will incorporate the requested clarifications and lemmas in the revised manuscript.
point-by-point responses
- Referee: [Section on variance decomposition] The variance-reduction decomposition into learning, mismatch, and covariance components (detailed after the filtration construction) is load-bearing for the estimand classification. The manuscript must explicitly show that the covariance term cannot offset the mismatch for the analyst in the pathwise sense for absorbing estimands; without this, the claimed event-wise learning guarantees unavailable to the analyst do not follow.
  Authors: We agree that an explicit pathwise argument is required. In the revised manuscript we will insert a new lemma immediately after the decomposition theorem. The lemma uses the Markov property of the genealogy process to show that, conditional on the analyst's filtration, the covariance term between the learning and mismatch increments is identically zero for absorbing estimands. Consequently the mismatch term cannot be offset pathwise, which directly yields the event-wise learning guarantees available only to the oracle.
  revision: yes
- Referee: [Section on classification of estimands and oracle-analyst comparison] The assertion that the oracle-analyst gap is irreducible for absorbing estimands rests on the natural ordering of tips inducing a filtration under which the mismatch component is strictly positive and pathwise mismatch behavior cannot be compensated. The paper should provide a concrete verification or bound confirming this holds for Markov genealogy processes satisfying the 'likely to hold' assumptions on real phylodynamic estimands, as this is the least secure link in the central claim.
  Authors: We will add a proposition in the classification section that supplies an explicit lower bound on the expected mismatch term. The bound is obtained by integrating the positive probability that an absorption event occurs between successive tip additions but remains non-measurable with respect to the analyst's sigma-algebra, under the standard assumptions of finite state space, positive absorption rates, and irreducibility that are typical for phylodynamic models. This establishes strict positivity of the mismatch and hence irreducibility of the oracle-analyst gap.
  revision: yes
- Referee: [Section on absorbing estimands] The demonstration that the analyst cannot achieve the oracle's event-wise learning guarantees requires showing that the filtration construction (based on the natural ordering) prevents access to the latent absorption status in a way that the mismatch remains uncompensated. A step-by-step argument or counterexample check for cases where the ordering is non-canonical would strengthen the fundamental-limit conclusion.
  Authors: We will expand the absorbing-estimands section with a detailed step-by-step argument showing that the natural ordering filtration is generated only by the observed tip labels and branch lengths, which do not render the latent absorption time measurable. We will also include a short counterexample for a non-canonical ordering in a two-state absorbing chain, demonstrating that the mismatch term remains strictly positive and the gap persists. These additions will make the fundamental-limit claim fully rigorous.
  revision: yes
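The measurability language in these responses can be connected to a standard identity, sketched here under assumed notation: $\theta$ is the estimand, $\mathcal{F}_n$ is the analyst's information after $n$ tips, $A$ is the latent absorption status, and $\mathcal{G}_n = \mathcal{F}_n \vee \sigma(A)$ is the oracle's information. None of these symbols is taken from the manuscript, and the identity is a general fact about nested sigma-algebras rather than the promised lemma or proposition.

```latex
% Assumed notation: \mathcal{F}_n \subseteq \mathcal{G}_n = \mathcal{F}_n \vee \sigma(A), \theta square-integrable.
\mathbb{E}\big[\operatorname{Var}(\theta \mid \mathcal{F}_n)\big]
  - \mathbb{E}\big[\operatorname{Var}(\theta \mid \mathcal{G}_n)\big]
  = \mathbb{E}\Big[\big(\mathbb{E}[\theta \mid \mathcal{G}_n] - \mathbb{E}[\theta \mid \mathcal{F}_n]\big)^2\Big] .
```

The right-hand side is zero exactly when the oracle's posterior mean is almost surely $\mathcal{F}_n$-measurable, so a strictly positive gap amounts to the oracle's posterior mean depending on $A$ in a way the analyst's information cannot reproduce. Showing that this persists for all $n$ under the stated assumptions is what the requested bound would have to establish.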
Circularity Check
No significant circularity; derivation from filtrations and sequential analysis is self-contained
full rationale
The paper constructs a natural ordering of tips to induce a filtration, then applies sequential Bayesian analysis to decompose expected variance reduction into learning, mismatch, and covariance components. Classification of estimands by pathwise mismatch behavior and the oracle-analyst gap for absorbing estimands are defined directly from these components and the Markov process assumptions. No step reduces a claimed result to a fitted input, self-definition, or self-citation chain by construction; the central claims remain independent of the target quantities.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption: Genealogy processes are Markovian
- domain assumption: A natural ordering of observed tips exists that induces a useful filtration