
arxiv: 2502.20076 · v3 · submitted 2025-02-27 · 🧬 q-bio.QM

Purported quantitative support for multiple introductions of SARS-CoV-2 into humans is an artefact of an imbalanced hypothesis testing framework

Pith reviewed 2026-05-23 02:42 UTC · model grok-4.3

classification 🧬 q-bio.QM
keywords SARS-CoV-2 · multiple introductions · hypothesis testing · phylodynamic inference · epidemic models · model comparison · testing imbalance · quantitative support

The pith

The reported support for two introductions of SARS-CoV-2 into humans is an artifact of applying stricter test conditions to the single-introduction model than to the two-introduction model.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

A prior calculation that combined phylodynamic inferences with epidemic models produced apparent quantitative backing for two separate introductions of SARS-CoV-2 into humans. The paper identifies an imbalance in the hypothesis testing setup: the single-introduction model was evaluated under more demanding conditions than the two-introduction model. When the two-introduction model is re-tested under those same stricter conditions, the reported support disappears. This matters because it shows that the claimed quantitative preference for multiple introductions depends on an inconsistent comparison rather than on the data itself. A reader who accepts the finding would conclude that balanced testing is required before treating the earlier result as evidence of multiple zoonotic events.

Core claim

The paper establishes that the quantitative support for two introductions of SARS-CoV-2 is produced by an imbalanced hypothesis testing framework in which the single-introduction model was tested against more stringent conditions than the two-introduction model, and that equalizing those conditions causes the support to vanish.

What carries the argument

An imbalanced hypothesis testing framework that subjects the single-introduction model to stricter conditions than the two-introduction model when both are evaluated with the same phylodynamic and epidemic-model combination.
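
A minimal sketch of how such an imbalance can manufacture support, using invented numbers. The conditions A and B, the `SimCounts` tallies, and the helper names are all hypothetical and are not taken from the paper or from the report it critiques; the only point is that the ratio changes when the numerator and denominator are held to different standards.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SimCounts:
    """Toy tallies of simulated epidemics for one hypothesis (invented numbers)."""
    total: int          # simulations run
    pass_a: int         # simulations meeting hypothetical condition A
    pass_a_and_b: int   # simulations meeting both A and the extra condition B


def likelihood(counts: SimCounts, strict: bool) -> float:
    """Fraction of simulations compatible with the observations."""
    passed = counts.pass_a_and_b if strict else counts.pass_a
    return passed / counts.total


def bayes_factor(two: SimCounts, one: SimCounts,
                 strict_two: bool, strict_one: bool) -> float:
    """Likelihood ratio for two introductions over one."""
    return likelihood(two, strict_two) / likelihood(one, strict_one)


# Invented tallies: condition B is hard to meet under either hypothesis.
one_intro = SimCounts(total=1000, pass_a=300, pass_a_and_b=30)
two_intro = SimCounts(total=1000, pass_a=200, pass_a_and_b=20)

# Imbalanced: only the single-introduction model must also meet B.
imbalanced = bayes_factor(two_intro, one_intro, strict_two=False, strict_one=True)
# Balanced: both models must meet A and B.
balanced = bayes_factor(two_intro, one_intro, strict_two=True, strict_one=True)

print(f"imbalanced: {imbalanced:.2f}")  # 6.67
print(f"balanced:   {balanced:.2f}")    # 0.67
```

In this toy case the apparent preference for two introductions comes entirely from exempting that model from condition B; whether the real calculation behaves the same way depends on the actual conditions and simulations.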

If this is right

  • Equal application of test conditions removes the quantitative preference previously reported for the two-introduction model.
  • Any comparison of introduction hypotheses must apply identical stringency levels to avoid producing artifactual differences.
  • The integration of phylodynamic inferences with epidemic models requires explicit balancing of evaluation conditions to yield interpretable model support.
  • Re-examination of other multi-introduction claims that rely on the same combined modeling approach may be needed once conditions are equalized.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The result implies that apparent evidence for multiple early introductions could arise from methodological choices rather than from the underlying sequence data.
  • Studies that combine phylodynamic and epidemic models should pre-define and apply equivalent testing conditions to all hypotheses under comparison.
  • The finding opens the possibility that similar imbalances have affected other early-pandemic origin analyses that used unequal model scrutiny.
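
One way to make the "pre-define and apply equivalent conditions" point concrete is to score every hypothesis through a single shared condition list, so no model can be evaluated under a looser standard by accident. This is only an illustrative sketch: the summary statistics `x` and `y`, the thresholds, and the simulated values are hypothetical placeholders, not quantities from the paper.

```python
from typing import Callable, Dict, Sequence

Simulation = Dict[str, float]
Condition = Callable[[Simulation], bool]


def support(simulations: Sequence[Simulation],
            conditions: Sequence[Condition]) -> float:
    """Fraction of simulations that satisfy every condition."""
    if not simulations:
        raise ValueError("need at least one simulation")
    passed = sum(all(cond(sim) for cond in conditions) for sim in simulations)
    return passed / len(simulations)


def compare(hypotheses: Dict[str, Sequence[Simulation]],
            conditions: Sequence[Condition]) -> Dict[str, float]:
    """Score every hypothesis against the same pre-defined condition list."""
    return {name: support(sims, conditions) for name, sims in hypotheses.items()}


# Hypothetical conditions on two made-up summary statistics.
CONDITIONS = [
    lambda sim: sim["x"] >= 0.3,
    lambda sim: sim["y"] >= 100,
]

# Hypothetical simulation outputs for each model.
hypotheses = {
    "single_introduction": [
        {"x": 0.4, "y": 150}, {"x": 0.1, "y": 200},
        {"x": 0.5, "y": 50}, {"x": 0.35, "y": 120},
    ],
    "two_introductions": [
        {"x": 0.3, "y": 100}, {"x": 0.2, "y": 90},
        {"x": 0.6, "y": 80}, {"x": 0.25, "y": 300},
    ],
}

scores = compare(hypotheses, CONDITIONS)
print(scores)
```

Because `compare` takes one condition list for all hypotheses, any difference in the resulting scores reflects the simulations rather than the stringency of the test.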

Load-bearing premise

That the more stringent conditions used for the single-introduction model constitute the appropriate and comparable standard that should also be applied to the two-introduction model.

What would settle it

A direct recalculation of the two-introduction model likelihood or support value under the exact conditions previously applied only to the single-introduction model, confirming whether the support metric remains elevated or falls to neutral.
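
The recalculation can be written schematically as follows. The notation is an assumption made for illustration and does not come from the paper: $H_1$ and $H_2$ are the single- and two-introduction models, $C_{\text{strict}}$ is the set of conditions originally applied to $H_1$, and $C_{\text{lenient}}$ is the weaker set originally applied to $H_2$.

```latex
\[
\mathrm{BF}_{\text{imbalanced}}
  = \frac{P(C_{\text{lenient}} \mid H_2)}{P(C_{\text{strict}} \mid H_1)},
\qquad
\mathrm{BF}_{\text{balanced}}
  = \frac{P(C_{\text{strict}} \mid H_2)}{P(C_{\text{strict}} \mid H_1)}.
\]
% If every outcome satisfying C_strict also satisfies C_lenient, then
\[
P(C_{\text{strict}} \mid H_2) \le P(C_{\text{lenient}} \mid H_2)
\quad\Longrightarrow\quad
\mathrm{BF}_{\text{balanced}} \le \mathrm{BF}_{\text{imbalanced}}.
\]
```

Under that assumption, equalizing the conditions can only lower the Bayes factor in favour of two introductions; the test described above asks whether it falls far enough to remove the reported support.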

Figures

Figures reproduced from arXiv: 2502.20076 by Angus McCowan.

Figure 1: Two-introduction likelihoods for introduction timings.
Figure 2: Bayes factors for introduction timings (tx, ty).

Original abstract

A prominent report claimed substantial support for two introductions of SARS-CoV-2 into humans using a calculation that combined phylodynamic inferences and epidemic models. Inspection of the calculation identifies an imbalance in the hypothesis testing framework that confounds this result; the single-introduction model was tested against more stringent conditions than the two-introduction model. Here, I show that when the two-introduction model is tested against the same conditions, the support disappears.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 1 minor

Summary. The manuscript argues that a prominent report's claimed quantitative support for two introductions of SARS-CoV-2 into humans results from an imbalance in the hypothesis testing framework: the single-introduction model was evaluated under more stringent conditions than the two-introduction model. The author re-tests the two-introduction model under the same conditions used for the single-introduction case and reports that the support for multiple introductions disappears.

Significance. If the re-test is correctly implemented and the conditions are shown to be comparable, the result would illustrate how unequal hypothesis-testing stringency can produce artifactual support for one model over another in combined phylodynamic-epidemic analyses. This would be a useful cautionary demonstration for the field, provided the manuscript supplies the explicit calculations, data, and justification for equivalence that are absent from the abstract.

major comments (1)
  1. [Abstract] Abstract (and any methods section): the central claim that support 'disappears' when the two-introduction model is tested under the single-introduction conditions rests on an unexamined equivalence assumption. The manuscript does not demonstrate why the more stringent conditions (whatever their precise definition) are the appropriate benchmark for the two-introduction model rather than arising from model-specific requirements such as prior volume or parameter-space differences. Without this justification or an explicit side-by-side comparison of the original versus re-tested calculations, the disappearance of support cannot be evaluated as load-bearing evidence.
minor comments (1)
  1. [Abstract] The abstract states the finding but supplies no numerical values, data sources, or description of the re-test procedure, making independent verification impossible from the provided text.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their constructive comments. We address the major comment regarding the justification for the equivalence assumption below.

  1. Referee: [Abstract] Abstract (and any methods section): the central claim that support 'disappears' when the two-introduction model is tested under the single-introduction conditions rests on an unexamined equivalence assumption. The manuscript does not demonstrate why the more stringent conditions (whatever their precise definition) are the appropriate benchmark for the two-introduction model rather than arising from model-specific requirements such as prior volume or parameter-space differences. Without this justification or an explicit side-by-side comparison of the original versus re-tested calculations, the disappearance of support cannot be evaluated as load-bearing evidence.

    Authors: The equivalence is justified because the original prominent report applied a specific set of testing conditions to evaluate support for the single-introduction model, and our analysis applies exactly those same conditions to the two-introduction model. This is not an arbitrary choice but directly addresses the identified imbalance in the hypothesis testing framework. The conditions arise from the overall analysis setup rather than being model-specific, as evidenced by the fact that the two-introduction model was originally tested under less stringent versions of the same criteria. We agree that an explicit side-by-side comparison would strengthen the presentation and will include a table detailing the original calculations versus the re-tested ones in the revised manuscript. revision: yes

Circularity Check

0 steps flagged

No significant circularity detected

The paper critiques an external report by identifying an imbalance in hypothesis testing conditions and re-testing the two-introduction model under equalized conditions to show that support disappears. No load-bearing step in the derivation chain reduces by construction to a self-definition, a fitted input renamed as a prediction, or a self-citation whose content is unverified within this work. The analysis is self-contained as an external comparison without internal circular reductions.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review provides no information on free parameters, axioms, or invented entities used in the calculation or re-test.

pith-pipeline@v0.9.0 · 5598 in / 960 out tokens · 33994 ms · 2026-05-23T02:42:25.950579+00:00 · methodology

Reference graph

Works this paper leans on

14 extracted references · 14 canonical work pages

  1. [1] Jonathan E. Pekar, Andrew Magee, Edyth Parker, Niema Moshiri, Katherine Izhikevich, Jennifer L. Havens, Karthik Gangavarapu, Lorena Mariana Malpica Serrano, Alexander Crits-Christoph, Nathaniel L. Matteson, Mark Zeller, Joshua I. Levy, Jade C. Wang, Scott Hughes, Jungmin Lee, Heedo Park, Man-Seong Park, Katherine Ching Zi Yan, Raymond Tzer Pin Lin, Mo...

  2. [2] Erratum for the research article “The molecular epidemiology of multiple zoonotic origins of SARS-CoV-2” by J. E. Pekar et al. Science, 382(6667):eadl0585, 2023.

  3. [3] Albert-László Barabási and Réka Albert. Emergence of scaling in random networks. Science, 286(5439):509–512, 1999.

  4. [4] Joël Mossong, Niel Hens, Mark Jit, Philippe Beutels, Kari Auranen, Rafael Mikolajczyk, Marco Massari, Stefania Salmaso, Gianpaolo Scalia Tomba, Jacco Wallinga, et al. Social contacts and mixing patterns relevant to the spread of infectious diseases. PLoS Medicine, 5(3):e74, 2008.

  5. [5] Xi He, Eric HY Lau, Peng Wu, Xilong Deng, Jian Wang, Xinxin Hao, Yiu Chung Lau, Jessica Y Wong, Yujuan Guan, Xinghua Tan, et al. Temporal dynamics in viral shedding and transmissibility of COVID-19. Nature Medicine, 26(5):672–675, 2020.

  6. [6] Qun Li, Xuhua Guan, Peng Wu, Xiaoye Wang, Lei Zhou, Yeqing Tong, Ruiqi Ren, Kathy SM Leung, Eric HY Lau, Jessica Y Wong, et al. Early transmission dynamics in Wuhan, China, of novel coronavirus–infected pneumonia. New England Journal of Medicine, 382(13):1199–1207, 2020.

  7. [7] Xingjie Hao, Shanshan Cheng, Degang Wu, Tangchun Wu, Xihong Lin, and Chaolong Wang. Reconstruction of the full transmission dynamics of COVID-19 in Wuhan. Nature, 584(7821):420–424, 2020.

  8. [8] An Pan, Li Liu, Chaolong Wang, Huan Guo, Xingjie Hao, Qi Wang, Jiao Huang, Na He, Hongjie Yu, Xihong Lin, et al. Association of public health interventions with the epidemiology of the COVID-19 outbreak in Wuhan, China. JAMA, 323(19):1915–1923, 2020.

  9. [9] Ruiyun Li, Sen Pei, Bin Chen, Yimeng Song, Tao Zhang, Wan Yang, and Jeffrey Shaman. Substantial undocumented infection facilitates the rapid dissemination of novel coronavirus (SARS-CoV-2). Science, 368(6490):489–493, 2020.

  10. [10] Futing Fang. Improving GEMFsim: a stochastic simulator for the generalized epidemic modeling framework. Master of Science Thesis, Kansas State University, Manhattan, KS, USA, December 2019. Major Professor: Caterina M. Scoglio.

  11. [11] Faryad Darabi Sahneh, Caterina Scoglio, and Piet Van Mieghem. Generalized epidemic mean-field model for spreading processes over multilayer complex networks. IEEE/ACM Transactions on Networking, 21(5):1609–1620, 2013.

  12. [12] Faryad Darabi Sahneh, Aram Vajdi, Heman Shakeri, Futing Fan, and Caterina Scoglio. GEMFsim: A stochastic simulator for the generalized epidemic modeling framework. Journal of Computational Science, 22:36–44, 2017.

  13. [13] Niema Moshiri. CoaTran: Coalescent tree simulation along a transmission network. bioRxiv, 2020.

  14. [14] Niema Moshiri. TreeSwift: A massively scalable Python tree package. SoftwareX, 11:100436, 2020.