Bayesian Re-Analysis of the Phylogenetic Topology of Early SARS-CoV-2 Case Sequences

Michael B. Weissman

arxiv: 2510.01484 · v5 · submitted 2025-10-01 · 🧬 q-bio.PE · q-bio.QM

Pith reviewed 2026-05-18 11:14 UTC · model grok-4.3

keywords: SARS-CoV-2, phylogenetic analysis, Bayesian reasoning, virus introductions, molecular phylogeny, pandemic origins, single introduction, two introductions

The pith

Correcting a fundamental error in Bayesian reasoning reverses the conclusion on early SARS-CoV-2 introductions, favoring one over two.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper re-analyzes the Bayesian phylogenetic study of early SARS-CoV-2 case sequences by Pekar et al. (2022). The original analysis concluded that two successful introductions to humans were more probable than one. After identifying and fixing a basic mistake in the application of Bayesian reasoning, the author finds that the same published results yield a higher likelihood for a single introduction. The question matters because the number of introductions shapes understanding of the virus's initial jump from animals and its early human spread. The work is a direct re-analysis that applies the correction to the published results without new data.

Core claim

After correcting a fundamental error in Bayesian reasoning, the results in Pekar et al. (2022) give a larger likelihood for a single introduction than for two.

What carries the argument

The corrected Bayesian posterior comparison of phylogenetic topologies supporting one versus two successful introductions.
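
For orientation, the standard identity behind any such two-hypothesis comparison is shown here. The symbols are labels introduced for illustration only and are not the paper's notation: $I_1$ and $I_2$ stand for the one-introduction and two-introduction hypotheses, and $D$ for the observed phylogenetic topology. The posterior odds equal the likelihood ratio (Bayes factor) multiplied by the prior odds:

```latex
\[
\frac{P(I_1 \mid D)}{P(I_2 \mid D)}
  = \underbrace{\frac{P(D \mid I_1)}{P(D \mid I_2)}}_{\text{Bayes factor}}
    \times
    \underbrace{\frac{P(I_1)}{P(I_2)}}_{\text{prior odds}}
\]
```

Which of these terms the claimed error affects is not stated in the summary above, so the identity is given only as the general frame in which the comparison takes place.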

If this is right

The data now assign higher probability to a single successful human introduction than to two.
This reverses the direction of the original conclusion about the number of introductions.
Models of early pandemic spread should incorporate the updated relative likelihoods from the corrected analysis.
The phylogenetic evidence alone does not support two introductions as the more likely scenario.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Similar Bayesian setups in other pathogen origin studies could be checked for the same reasoning step.
Collecting additional early case sequences might allow a direct test of whether the corrected likelihoods hold with more data.
The result suggests that conclusions about introduction counts can shift with precise handling of probability updates even when the underlying tree topologies remain unchanged.

Load-bearing premise

The assumption that the identified error is the only material flaw in the original Bayesian setup and that the re-calculation applies the correction without introducing new modeling choices or data exclusions that themselves affect the single-versus-two comparison.

What would settle it

Re-running the original Bayesian calculation on the early SARS-CoV-2 sequence data with the corrected treatment of priors or likelihoods and directly comparing the resulting probabilities for one versus two introductions.
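
The final comparison step described here can be sketched as follows. This is a minimal illustration, not the calculation used in either paper: the function names and the input numbers are hypothetical placeholders, and the sketch assumes that one and two introductions are the only hypotheses under consideration.

```python
def posterior_odds(lik_one: float, lik_two: float,
                   prior_one: float = 0.5, prior_two: float = 0.5) -> float:
    """Posterior odds of one introduction versus two.

    Computed as (likelihood ratio) * (prior odds). All inputs must be positive.
    """
    if min(lik_one, lik_two, prior_one, prior_two) <= 0:
        raise ValueError("likelihoods and priors must be positive")
    bayes_factor = lik_one / lik_two   # P(D | one) / P(D | two)
    prior_odds = prior_one / prior_two  # P(one) / P(two)
    return bayes_factor * prior_odds


def posterior_probability(odds: float) -> float:
    """Convert odds for A over B into P(A), assuming A and B are exhaustive."""
    return odds / (1.0 + odds)


# Placeholder inputs only: NOT values from Pekar et al. or from Weissman.
example_odds = posterior_odds(lik_one=0.25, lik_two=0.125)
example_prob = posterior_probability(example_odds)
print(f"posterior odds (one vs two): {example_odds:.3f}")
print(f"P(one introduction | data):  {example_prob:.3f}")
```

Substituting the published likelihoods and priors, once under the original treatment and once under the corrected one, would give the two numbers that the comparison turns on.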

Original abstract

A much-cited 2022 paper by Pekar et al. claimed that Bayesian analysis of the molecular phylogeny of early SARS-CoV-2 cases indicated that it was more likely that two successful introductions to humans had occurred than that just one had. Here I show that after correcting a fundamental error in Bayesian reasoning the results in that paper give larger likelihood for a single introduction than for two.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

Weissman claims a basic Bayesian slip in Pekar et al. reverses their finding so that one early SARS-CoV-2 introduction now looks more likely than two.

The main point to take away is that this note argues Pekar et al. mishandled the comparison of single versus two introductions in their Bayesian setup, and that fixing the error flips the result toward a single introduction using the same early-case sequences. The author walks through the original likelihoods and shows how the prior odds or marginal likelihood ratio should have been combined differently, producing a higher posterior weight on the one-introduction hypothesis.

What the paper does well is stay narrowly focused on one concrete numerical claim from an influential 2022 study and lay out a clear alternative calculation without introducing new data or a whole new model. That kind of direct re-working can help readers see where small interpretive choices matter in phylogenetic hypothesis testing.

The soft spot is that the manuscript does not include a side-by-side table of the original and corrected Bayes factors or the exact formula used for the adjustment, so it is not immediately obvious whether every modeling choice (sequence inclusion, topology sampling, or introduction-time priors) was held exactly fixed. If any of those were altered even slightly, the reversal cannot be attributed solely to the identified Bayesian error. The stress-test concern therefore has some force until the arithmetic is shown in more detail.

This note is mainly useful for people who already know the Pekar paper and want to test the robustness of its introduction-count conclusion. Readers working on early-pandemic sequence data or zoonotic spillover models will find it relevant. It deserves a serious referee because the claim is specific, the data source is public, and a response from the original authors could settle the arithmetic quickly.

Referee Report

2 major / 0 minor

Summary. The manuscript re-analyzes the Bayesian phylogenetic comparison of single versus two successful introductions of early SARS-CoV-2 from Pekar et al. (2022). It identifies a fundamental error in the application of Bayesian reasoning and asserts that, once corrected while holding the original data and model fixed, the likelihood favors a single introduction over two.

Significance. If the claimed reversal is shown to arise solely from the identified Bayesian correction without new modeling choices, the result would be significant for the interpretation of SARS-CoV-2 origins and for the correct use of Bayesian model comparison in phylogenetic studies of viral emergence. The work draws attention to a potential systematic issue in how posterior probabilities are compared across introduction scenarios.

major comments (2)

The central claim requires an explicit side-by-side derivation or numerical comparison showing that the corrected likelihood ratio favors one introduction. The abstract states the reversal but the manuscript provides no equation, table, or step-by-step recalculation of the relevant posteriors from the Pekar et al. model.
To attribute the reversal cleanly to the Bayesian error alone, the manuscript must verify that the re-analysis uses identical sequence data, tree topologies, priors on introduction times, and model structure as the 2022 work. Any implicit data exclusions or altered sampling would undermine the claim that the result follows solely from the correction.
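
One possible layout for the side-by-side comparison requested in the first comment is sketched below. Everything in it is an assumption made for illustration: the function name, the row labels, and above all the numbers, which are placeholders and not results from Pekar et al. or from the manuscript. It also assumes that one and two introductions are the only hypotheses compared.

```python
import math


def compare_analyses(rows):
    """Format a side-by-side table from (label, log_lik_one, log_lik_two, prior_one).

    log_lik_* are natural-log likelihoods of the data under one or two
    introductions; prior_one is the prior probability of one introduction.
    """
    lines = [f"{'analysis':<12}{'log BF (1 vs 2)':>18}{'P(one intro)':>15}"]
    for label, log_l1, log_l2, prior_one in rows:
        log_bf = log_l1 - log_l2
        log_prior_odds = math.log(prior_one) - math.log(1.0 - prior_one)
        log_post_odds = log_bf + log_prior_odds
        # Logistic transform turns log posterior odds into a probability.
        p_one = 1.0 / (1.0 + math.exp(-log_post_odds))
        lines.append(f"{label:<12}{log_bf:>18.3f}{p_one:>15.3f}")
    return "\n".join(lines)


# Placeholder rows only: these numbers are invented to show the layout.
placeholder_rows = [
    ("version_a", -11.0, -10.0, 0.5),
    ("version_b", -10.0, -11.0, 0.5),
]
table = compare_analyses(placeholder_rows)
print(table)
```

Working in log space is a design choice rather than a requirement; it keeps the arithmetic stable when the likelihoods themselves are very small.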

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive comments on our re-analysis of Pekar et al. (2022). We address each major comment below, agreeing that additional explicit material will improve clarity, and indicate the corresponding revisions.

Referee: The central claim requires an explicit side-by-side derivation or numerical comparison showing that the corrected likelihood ratio favors one introduction. The abstract states the reversal but the manuscript provides no equation, table, or step-by-step recalculation of the relevant posteriors from the Pekar et al. model.

Authors: We agree that an explicit side-by-side derivation and numerical comparison would strengthen the presentation. The revised manuscript will include a new section with the step-by-step recalculation of the posterior probabilities under the original and corrected Bayesian reasoning, together with a table reporting the likelihood ratios and model probabilities for the single-introduction versus two-introduction scenarios.

Revision: yes

Referee: To attribute the reversal cleanly to the Bayesian error alone, the manuscript must verify that the re-analysis uses identical sequence data, tree topologies, priors on introduction times, and model structure as the 2022 work. Any implicit data exclusions or altered sampling would undermine the claim that the result follows solely from the correction.

Authors: The re-analysis uses precisely the sequence data, tree topologies, introduction-time priors, and model structure reported in Pekar et al. (2022), with the sole modification being the correction to the Bayesian model-comparison step. The revised methods section will add explicit cross-references to the original supplementary materials and tables to document this identity and rule out any data or sampling changes.

Revision: yes
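
The identity check discussed in this exchange could be documented mechanically, for example by recording each analysis's settings and listing every field that differs. The sketch below is hypothetical throughout: the setting names and values are invented for illustration and do not come from either paper's materials.

```python
ABSENT = "<absent>"  # marker for a setting present in only one analysis


def diff_settings(original: dict, reanalysis: dict) -> dict:
    """Return {setting: (original_value, reanalysis_value)} for every mismatch."""
    differences = {}
    for key in sorted(set(original) | set(reanalysis)):
        before = original.get(key, ABSENT)
        after = reanalysis.get(key, ABSENT)
        if before != after:
            differences[key] = (before, after)
    return differences


# Hypothetical settings records; names and values are placeholders.
original_settings = {
    "sequence_set": "as published",
    "topology_sample": "as published",
    "introduction_time_prior": "as published",
    "model_comparison_step": "as published",
}
reanalysis_settings = dict(original_settings, model_comparison_step="corrected")

differences = diff_settings(original_settings, reanalysis_settings)
print(differences)
```

If the reversal follows from the correction alone, a check of this kind should report the model-comparison step as the only differing field.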

Circularity Check

0 steps flagged

No circularity: re-analysis applies external correction to cited prior work

The manuscript re-uses phylogenetic data, tree topologies, and model structure from the externally cited Pekar et al. 2022 paper and applies an independent correction to a claimed Bayesian reasoning error. The central result (reversed likelihood favoring single introduction) is presented as following from that correction without introducing new fitted parameters, self-defined quantities, or load-bearing self-citations. No derivation step reduces by construction to inputs defined within the present paper; the work is self-contained against the external benchmark of the 2022 analysis.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The re-analysis rests on the phylogenetic topology and sequence data from Pekar et al. together with standard Bayesian updating rules; no new free parameters or invented entities are introduced in the abstract.

axioms (1)

standard math: Standard rules of Bayesian probability apply to the comparison of introduction scenarios.
The paper invokes correct application of Bayes' theorem to reverse the original conclusion.

pith-pipeline@v0.9.0 · 5584 in / 1161 out tokens · 36687 ms · 2026-05-18T11:14:46.110373+00:00 · methodology
