arxiv: 2602.06262 · v1 · submitted 2026-02-05 · 📊 stat.ME · stat.AP

Latent variation in pathogen strain-specific effects under multiple-versions-of-treatment theory

Pith reviewed 2026-05-16 06:24 UTC · model grok-4.3

classification 📊 stat.ME stat.AP
keywords pathogen strain heterogeneity · multiple versions of treatment · causal inference · infectious disease epidemiology · transportability · potential outcomes · strain-specific effects

The pith

Epidemiologic estimates of infection effects on health depend on the frequencies of different pathogen strains in the studied population.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper examines how strain differences complicate the study of infection effects when subtype information is missing. It applies causal theory on multiple versions of treatment to show that standard reported quantities are actually weighted averages of strain-specific effects. These averages only have a clear causal reading conditional on the local mix of strains. Transporting the results to other settings therefore needs extra steps that account for differences in strain frequencies. The work concludes that collecting pathogen subtype data would improve both interpretation and policy use of such studies.

Core claim

When strain-specific effects on adverse outcomes are heterogeneous and strain composition is unknown, the quantities typically reported in epidemiologic studies of the effects of infections on health admit a causal interpretation that depends on the population frequencies of the infecting strains. As in other contexts where the treatment-variation-irrelevance assumption might be violated, transportability of these estimates requires additional considerations beyond those needed for non-compound exposures.

What carries the argument

The multiple-versions-of-treatment framework from causal inference, which treats distinct pathogen strains as different versions of the infection exposure whose effects may differ.

If this is right

  • Reported effect sizes represent weighted averages driven by the strain frequencies in the source population.
  • Moving results to a new population requires data on how its strain mix differs from the original one.
  • Studies that omit strain information have limited ability to support causal claims outside their setting.
  • Pathogen subtype data would allow decomposition of the overall effect into strain-specific components.
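
The first bullet can be checked with a small simulation. This is a minimal sketch, not the paper's analysis: the strain names, frequencies, and risks are hypothetical, and infection is assumed to be unconfounded. It discards the strain label, as a study without subtype data would, and compares the resulting risk among the infected with the frequency-weighted average of strain-specific risks.

```python
import random

def weighted_risk(strain_freq, risk_by_strain):
    """Frequency-weighted average of strain-specific risks."""
    return sum(strain_freq[s] * risk_by_strain[s] for s in strain_freq)

def simulated_risk_ignoring_strain(strain_freq, risk_by_strain, n, seed=0):
    """Risk among n infected people when the strain label is discarded."""
    rng = random.Random(seed)
    strains = list(strain_freq)
    weights = [strain_freq[s] for s in strains]
    events = 0
    for _ in range(n):
        # The infecting strain is drawn but never recorded (latent version).
        strain = rng.choices(strains, weights=weights)[0]
        events += rng.random() < risk_by_strain[strain]
    return events / n

# Hypothetical values, not taken from the paper.
risk_by_strain = {"strain_a": 0.20, "strain_b": 0.05}
source_freq = {"strain_a": 0.3, "strain_b": 0.7}

expected = weighted_risk(source_freq, risk_by_strain)  # 0.3*0.20 + 0.7*0.05
observed = simulated_risk_ignoring_strain(source_freq, risk_by_strain, n=100_000)
print(f"weighted average: {expected:.3f}, simulated: {observed:.3f}")
```

The two numbers agree up to sampling noise, so the strain-blind estimate carries the source population's strain mix inside it.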

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same logic applies to other compound exposures where versions are latent, such as different viral lineages in vaccine studies.
  • Routine strain surveillance could enable more reliable pooling of effect estimates across separate investigations.
  • Models that track how strain frequencies shift over time could forecast changes in apparent infection effects.
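
The last bullet can be sketched as follows. Everything here is an assumption for illustration: a two-strain setting, a logistic curve for the share of an emerging strain, and made-up strain-specific risk differences. None of it comes from the paper.

```python
import math

def emerging_strain_share(t, t_mid, growth_rate):
    """Hypothetical logistic share of the emerging strain at time t."""
    return 1.0 / (1.0 + math.exp(-growth_rate * (t - t_mid)))

def apparent_effect(t, effect_emerging, effect_resident,
                    t_mid=10.0, growth_rate=0.5):
    """Strain-blind effect at time t: a mixture of two strain-specific effects."""
    share = emerging_strain_share(t, t_mid, growth_rate)
    return share * effect_emerging + (1.0 - share) * effect_resident

# Hypothetical risk differences: the emerging strain is milder.
trajectory = [apparent_effect(t, effect_emerging=0.03, effect_resident=0.18)
              for t in range(0, 21, 5)]
print([round(x, 3) for x in trajectory])
```

Under these assumptions the apparent effect of "infection" falls over time even though no strain-specific effect changes; only the mixture weights move.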

Load-bearing premise

The effect of an infection on health varies across strains of the pathogen, so the treatment-variation-irrelevance assumption cannot be taken for granted.

What would settle it

Showing that effect estimates change systematically across populations in line with their documented differences in strain frequencies, or that adjusting for strain composition makes the estimates consistent enough to transport.

Original abstract

Evidence-informed policy on infections requires estimates of their effects on health. However, pathogenic variation, whereby occurrence of adverse outcomes depends on the infecting strain, might complicate the study of many infectious agents. Here, we consider the interpretation of epidemiologic studies on effects of infections on health when there is heterogeneity in strain-specific effects and information on strain composition is unavailable. We use potential outcomes and causal inference theory for analyses in the presence of multiple versions of treatment to argue that oft-reported quantities in these studies have a causal interpretation that depends on population frequencies of infecting strains. Moreover, as in other contexts where the treatment-variation-irrelevance assumption might be violated, transportability requires additional considerations, beyond those needed for non-compound exposures. This discussion, that considers potential heterogeneity in strain-specific effects, will facilitate interpretation of these studies, and for the reasons mentioned above, also highlights the value of pathogen subtype data.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript applies the multiple-versions-of-treatment framework from causal inference to interpret epidemiologic estimates of infection effects on health when pathogen strains differ in their effects but strain identity is unobserved. It claims that commonly reported quantities represent prevalence-weighted mixtures of strain-specific potential outcomes and that transportability requires additional considerations for the strain-frequency distribution beyond standard assumptions for non-compound exposures.

Significance. If the argument holds, the paper supplies a transparent causal reading of many existing infectious-disease studies and a clear rationale for collecting pathogen-subtype data. The reliance on standard potential-outcomes reasoning rather than new parametric assumptions is a strength and links the discussion directly to the broader multiple-versions literature.

major comments (2)
  1. [Abstract and §2] The central interpretive claim, that the observed infection effect equals a prevalence-weighted average of strain-specific potential outcomes, is asserted but not accompanied by an explicit derivation or consistency statement. Adding a short display equation (e.g., E[Y|A=1] = ∑_s π_s E[Y(1,s)]) together with the required consistency and positivity conditions would make the logical step load-bearing rather than conceptual.
  2. [§3, transportability paragraph] The statement that transportability “requires additional considerations” is correct in principle but left at the level of a caveat. A brief illustration showing how the transported quantity changes when the target population’s strain distribution π*_s differs from the source π_s would strengthen the practical implication.
minor comments (2)
  1. [Introduction] The phrase “multiple-versions-of-treatment theory” should be accompanied by a citation to the foundational references (e.g., VanderWeele & Hernán 2013 or related work) on first use.
  2. [§2] Notation for strain-specific potential outcomes is introduced informally; a one-sentence definition of Y(a,s) and the indexing of the version variable s would improve readability for readers outside causal inference.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the positive assessment and the constructive suggestions, which help clarify the central arguments. We have revised the manuscript to incorporate both major comments, as described below.

  1. Referee: [Abstract and §2] The central interpretive claim, that the observed infection effect equals a prevalence-weighted average of strain-specific potential outcomes, is asserted but not accompanied by an explicit derivation or consistency statement. Adding a short display equation (e.g., E[Y|A=1] = ∑_s π_s E[Y(1,s)]) together with the required consistency and positivity conditions would make the logical step load-bearing rather than conceptual.

    Authors: We agree that an explicit derivation strengthens the presentation. In the revised version we have inserted the suggested display equation in §2, together with a concise statement of the consistency assumption (Y = Y(A,S) when A=1 and S=s) and the positivity condition (P(A=1,S=s)>0 for strains s with positive prevalence). This makes the link between the observed data and the prevalence-weighted mixture of potential outcomes fully explicit.
    Revision: yes

  2. Referee: [§3, transportability paragraph] The statement that transportability “requires additional considerations” is correct in principle but left at the level of a caveat. A brief illustration showing how the transported quantity changes when the target population’s strain distribution π*_s differs from the source π_s would strengthen the practical implication.

    Authors: We appreciate this suggestion. We have added a short numerical illustration in §3 that compares the transported effect under two different target strain distributions π* (one matching the source and one differing in the relative frequency of a high- versus low-virulence strain). The example shows how the transported quantity deviates from the source estimate when π* ≠ π, thereby making the additional transportability requirement concrete.
    Revision: yes
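
The numerical illustration described in the second response is not reproduced on this page. A minimal sketch of what such a comparison could look like is given below; the strain labels, frequencies, and risk differences are hypothetical stand-ins, and strain-specific effects are assumed to be the same in source and target.

```python
def transported_effect(target_freq, effect_by_strain):
    """Effect of 'infection' in a target population, obtained by re-weighting
    strain-specific effects with the target strain distribution."""
    if abs(sum(target_freq.values()) - 1.0) > 1e-9:
        raise ValueError("strain frequencies must sum to 1")
    return sum(target_freq[s] * effect_by_strain[s] for s in target_freq)

# Hypothetical strain-specific risk differences (assumed stable across settings).
effect_by_strain = {"high_virulence": 0.18, "low_virulence": 0.03}

source_freq = {"high_virulence": 0.3, "low_virulence": 0.7}     # pi
target_matching = dict(source_freq)                              # pi* = pi
target_shifted = {"high_virulence": 0.7, "low_virulence": 0.3}  # pi* != pi

source_estimate = transported_effect(source_freq, effect_by_strain)
same_mix = transported_effect(target_matching, effect_by_strain)
shifted_mix = transported_effect(target_shifted, effect_by_strain)

print(f"source: {source_estimate:.3f}")
print(f"target, same mix: {same_mix:.3f}")
print(f"target, shifted mix: {shifted_mix:.3f}")
```

With matching strain mixes the source estimate carries over unchanged. With the shifted mix it understates the target effect, and the re-weighting that corrects this needs both the strain-specific effects and π*, neither of which is available without subtype data.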

Circularity Check

0 steps flagged

No significant circularity

The paper applies the standard multiple-versions-of-treatment framework to interpret observed infection effects as prevalence-weighted mixtures of strain-specific potential outcomes. This follows directly from the consistency and positivity conditions once strains are treated as versions of a compound exposure, without any reduction to fitted parameters, self-definitional equations, or load-bearing self-citations. The transportability remark is the usual caveat that mixture weights must also be transported and introduces no circular step.
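
The step from observed data to the mixture can be written out as below. This is a sketch, not the paper's derivation: it takes π_s to mean P(S = s | A = 1), and the last line uses an exchangeability condition, Y(1,s) independent of (A, S), that is assumed here for illustration and is not among the conditions named above. In observational data it would typically hold only conditional on covariates.

```latex
\[
\begin{aligned}
E[Y \mid A = 1]
  &= \sum_s P(S = s \mid A = 1)\, E[Y \mid A = 1, S = s]
     && \text{(total expectation; positivity)} \\
  &= \sum_s \pi_s\, E[Y(1,s) \mid A = 1, S = s]
     && \text{(consistency)} \\
  &= \sum_s \pi_s\, E[Y(1,s)]
     && \text{(exchangeability, assumed)}
\end{aligned}
\]
```

No step depends on a fitted quantity or on the conclusion itself, which is consistent with the finding of no circularity.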

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The claim rests on the standard potential outcomes framework and the multiple-versions-of-treatment extension; no new free parameters, invented entities, or ad-hoc axioms are introduced in the abstract.

axioms (2)
  • [standard math] Potential outcomes framework applies to compound treatments with latent versions.
    Invoked to assign causal meaning to observed infection effects when strain is unobserved.
  • [domain assumption] Treatment-variation-irrelevance assumption is violated by strain heterogeneity.
    Central premise used to argue that transportability needs extra considerations.

Reference graph

Works this paper leans on

13 extracted references · 13 canonical work pages

  1. Xie Y, Choi T, Al-Aly Z. Postacute Sequelae of SARS-CoV-2 Infection in the Pre-Delta, Delta, and Omicron Eras. N Engl J Med. 2024;391(6):515-25.
  2. Seedat F, Brown CS, Stinton C, Patterson J, Geppert J, Freeman K, et al. Bacterial Load and Molecular Markers Associated With Early-onset Group B Streptococcus: A Systematic Review and Meta-analysis. Pediatr Infect Dis J. 2018;37(12):e306-e14.
  3. Gupta S, Hill AV, Kwiatkowski D, Greenwood AM, Greenwood BM, Day KP. Parasite virulence and disease patterns in Plasmodium falciparum malaria. Proc Natl Acad Sci U S A. 1994;91(9):3715-9.
  4. VanderWeele TJ, Hernán MA. Causal Inference Under Multiple Versions of Treatment. J Causal Inference. 2013;1(1):1-20.
  5. VanderWeele TJ. Mediation analysis with multiple versions of the mediator. Epidemiology. 2012;23(3):454-63.
  6. Hernán MA, VanderWeele TJ. Compound treatments and transportability of causal inference. Epidemiology. 2011;22(3):368-77.
  7. Gilbert P, Self S, Rao M, Naficy A, Clemens J. Sieve analysis: methods for assessing from vaccine trial data how vaccine efficacy varies with genotypic and phenotypic pathogen variation. J Clin Epidemiol. 2001;54(1):68-85.
  8. Perenyi G, Stensrud M. Variant specific treatment effects with applications in vaccine studies. Biometrics. 2025;81(2).
  9. Gilbert PB, Self SG, Ashby MA. Statistical methods for assessing differential vaccine protection against human immunodeficiency virus types. Biometrics. 1998;54(3):799-814.
  10. Hernán MA, Robins JM. Causal Inference: What If. Version 07/10/2025 ed. Boca Raton: Chapman & Hall/CRC; 2020.
  11. Little RJ, Rubin DB. Causal effects in clinical and epidemiological studies via potential outcomes: concepts and analytical approaches. Annu Rev Public Health. 2000;21:121-45.
  12. VanderWeele TJ. Constructed Measures and Causal Inference: Towards a New Model of Measurement for Psychosocial Constructs. Epidemiology. 2022;33(1):141-51.
  13. Bager P, Wohlfahrt J, Bhatt S, Stegger M, Legarth R, Moller CH, et al. Risk of hospitalisation associated with infection with SARS-CoV-2 omicron variant versus delta variant in Denmark: an observational cohort study. Lancet Infect Dis. 2022;22(7):967-76.