arxiv: 2606.22561 · v1 · pith:VRNS5NMGnew · submitted 2026-06-21 · 🧬 q-bio.PE · q-bio.QM

quaint: An R Package for detecting introgression across a phylogeny using discordant gene tree topologies

Pith reviewed 2026-06-26 09:36 UTC · model grok-4.3

classification 🧬 q-bio.PE q-bio.QM
keywords introgression · gene tree discordance · ABBA-BABA · phylogenomics · R package · hybridization · reticulate evolution · species tree

The pith

The quaint R package detects introgression across phylogenies by applying the ABBA-BABA test to gene tree topologies rather than nucleotide sites.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Hybrid speciation and introgressive hybridization occur across the tree of life, yet site-based methods for detecting them face limits when applied to large phylogenomic datasets. This paper presents an R package, quaint, that takes gene trees and a species tree as input and summarizes patterns of gene tree discordance to infer introgression. The approach extends the ABBA-BABA framework from individual sites to whole gene tree topologies. A sympathetic reader would care because this enables detection of reticulate evolution in broader contexts where gene trees can be estimated reliably but site patterns alone are insufficient.

Core claim

quaint is an R package that infers introgression given a set of gene trees and a species tree by summarizing patterns of gene tree discordance under an ABBA-BABA framework, thereby overcoming the limitations of site-based methods and enabling detection across broad phylogenomic contexts.

What carries the argument

The ABBA-BABA framework applied to gene tree topologies: for each set of four taxa, the two discordant quartet topologies are counted, and an excess of one over the other is read as a signal of introgression.
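
A minimal Python sketch of this counting step, not quaint's actual API. It assumes each gene tree has already been reduced to the pair of ingroup taxa it places as sisters in the quartet ((P1,P2),P3),O. Following Figure 1, a P2+P3 pairing is labeled ABBA and a P1+P3 pairing BABA. The function names and the toy counts are hypothetical.

```python
from collections import Counter

def classify_triplet(sister_pair):
    """Label a gene tree by which two ingroup taxa it places as sisters."""
    pair = frozenset(sister_pair)
    if pair == frozenset({"P1", "P2"}):
        return "concordant"  # matches the species tree ((P1,P2),P3),O
    if pair == frozenset({"P2", "P3"}):
        return "ABBA"        # discordant, P2 and P3 sister
    if pair == frozenset({"P1", "P3"}):
        return "BABA"        # discordant, P1 and P3 sister
    raise ValueError(f"unexpected sister pair: {sorted(pair)}")

def d_statistic(counts):
    """D-like statistic from topology counts: (ABBA - BABA) / (ABBA + BABA)."""
    abba, baba = counts["ABBA"], counts["BABA"]
    total = abba + baba
    if total == 0:
        return float("nan")  # no discordant gene trees, D is undefined
    return (abba - baba) / total

# Toy input (made up): 60 concordant, 30 ABBA, and 10 BABA gene trees.
gene_tree_sisters = (
    [("P1", "P2")] * 60 + [("P2", "P3")] * 30 + [("P1", "P3")] * 10
)
counts = Counter(classify_triplet(pair) for pair in gene_tree_sisters)
d = d_statistic(counts)
```

A value of D near zero is what incomplete lineage sorting alone predicts. A positive value in this labeling points toward P2 and P3 sharing more gene tree history than expected.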

If this is right

  • Introgression can be tested across phylogenies containing hundreds of taxa where site-based methods become computationally or statistically intractable.
  • Reticulate evolution can be mapped onto species trees in groups where hybridization is suspected but gene tree estimation is feasible.
  • Reproducible pipelines become available for combining gene tree inference with introgression tests in a single R workflow.
  • Detection extends to datasets where individual sites are too sparse or noisy to yield reliable ABBA-BABA counts.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same discordance summaries could be combined with network inference methods to jointly estimate both the species network and the locations of introgression edges.
  • If gene tree accuracy improves with longer loci or better models, quaint signals would become stronger without changes to the package itself.
  • Application to empirical clades with known hybrid origins would provide direct tests of whether the method recovers documented introgression events.

Load-bearing premise

Gene tree discordance patterns summarized by the ABBA-BABA test reliably signal introgression rather than being driven mainly by incomplete lineage sorting or other processes.

What would settle it

A dataset generated under a model with only incomplete lineage sorting and no introgression that nevertheless produces strong quaint signals of introgression would falsify the method's reliability.
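
A minimal Python sketch of such an ILS-only check. It is neither the paper's simulation design nor quaint's API. It assumes a rooted triple ((P1,P2),P3) with one internal branch of length T in coalescent units, for which the multispecies coalescent gives each discordant topology probability e^{-T}/3. Loci are treated as independent and gene trees as error-free. A two-sided exact binomial test on ABBA versus BABA counts stands in for whatever test the package applies. All function names and parameter values are hypothetical.

```python
import math
import random

def discordant_prob(t):
    """Probability of each discordant topology under ILS only."""
    return math.exp(-t) / 3.0

def simulate_counts(n_loci, t, rng):
    """Draw ABBA and BABA gene tree counts for n_loci independent loci."""
    p = discordant_prob(t)
    abba = baba = 0
    for _ in range(n_loci):
        u = rng.random()
        if u < p:
            abba += 1
        elif u < 2 * p:
            baba += 1
        # otherwise the locus is concordant and is not counted
    return abba, baba

def binomial_two_sided_p(abba, baba):
    """Exact two-sided binomial p-value for H0: ABBA and BABA equally likely."""
    n = abba + baba
    if n == 0:
        return 1.0
    k = min(abba, baba)
    tail = sum(math.comb(n, i) for i in range(k + 1))
    return min(1.0, 2 * tail / 2 ** n)

def false_positive_rate(n_reps=500, n_loci=500, t=1.0, alpha=0.05, seed=1):
    """Fraction of ILS-only replicates in which the test rejects H0."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_reps):
        abba, baba = simulate_counts(n_loci, t, rng)
        if binomial_two_sided_p(abba, baba) < alpha:
            hits += 1
    return hits / n_reps

fpr = false_positive_rate()
```

In this idealized setting the rejection rate should stay near or below alpha. The falsifying pattern described above would be a rate well above alpha once realistic complications, such as gene tree estimation error, are added to the simulation.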

Figures

Figures reproduced from arXiv: 2606.22561 by Ethan A. Baldwin, James H. Leebens-Mack.

Figure 1: Theoretical framework behind the ABBA-BABA test. The top three trees demonstrate different scenarios wherein ABBA and BABA site patterns or gene tree topologies are generated. Incomplete lineage sorting will produce ABBA and BABA topologies in equal proportions, while introgression between P2 and P3 produces only ABBA topologies. Below each scenario is the resulting unrooted gene tree topology […]
Original abstract

Premise: Hybrid speciation and introgressive hybridization are increasingly recognized as important evolutionary phenomena across the tree of life. One widely used class of methods to detect introgression includes D statistics and related methods which employ the ABBA-BABA test using nucleotide site patterns. Recent studies have applied this theoretical framework to phylogenomic datasets using gene tree topologies instead, but no software packages using this method have been developed. Methods and Results: An R package was developed to facilitate the inference of introgression given a set of gene trees and a species tree. Using an ABBA-BABA framework, this package summarizes patterns of gene tree discordance to infer introgression across large phylogenies. Conclusions: Using gene tree topologies, quaint overcomes the limitations of site-based methods, enabling the detection of introgression across broad phylogenomic contexts. This R package provides an accessible and reproducible tool for researchers investigating reticulate evolution.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript describes the quaint R package, which implements an ABBA-BABA framework applied to gene tree topologies (rather than nucleotide site patterns) to summarize discordance and infer introgression across a species tree. It positions the package as overcoming limitations of site-based D-statistics for broad phylogenomic datasets and provides an accessible, reproducible tool for reticulate evolution studies.

Significance. A validated implementation would offer a practical extension of topology-based introgression tests to large phylogenies where site-based methods may be computationally limited. However, the absence of any performance benchmarks, simulation results, or comparisons means the claimed advantage over site-based methods remains untested, limiting the immediate significance of the contribution.

major comments (2)
  1. [Abstract / Conclusions] The central claim (Abstract, Conclusions) that using gene tree topologies 'overcomes the limitations of site-based methods' is unsupported: the manuscript contains no simulation benchmarks, false-positive rates under ILS-only scenarios, or direct comparisons of the topology-based statistic to site-based D-statistics on the same datasets. This validation is load-bearing for the claim that discordance patterns reliably indicate introgression rather than ILS.
  2. [Methods] No section describes the statistical properties of the topology-based ABBA-BABA implementation, such as the exact mapping from gene-tree counts to the D-like statistic, handling of multifurcations, or correction for multiple testing across the phylogeny. Without this, it is unclear whether the package inherits or mitigates the known ILS confounding issues of the original framework.
minor comments (2)
  1. The manuscript should include a dedicated 'Usage' or 'Example' section with a reproducible workflow on a small empirical or simulated dataset to demonstrate package output and interpretation.
  2. [Premise] Clarify the relationship to prior topology-based ABBA-BABA applications cited in the premise; the novelty appears to be the software implementation rather than the method itself.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive comments on our manuscript describing the quaint R package. We address each major comment below and will make the indicated revisions to improve clarity and support for the claims.

  1. Referee: [Abstract / Conclusions] The central claim (Abstract, Conclusions) that using gene tree topologies 'overcomes the limitations of site-based methods' is unsupported: the manuscript contains no simulation benchmarks, false-positive rates under ILS-only scenarios, or direct comparisons of the topology-based statistic to site-based D-statistics on the same datasets. This validation is load-bearing for the claim that discordance patterns reliably indicate introgression rather than ILS.

    Authors: We agree that the claim in the Abstract and Conclusions is not supported by empirical evidence in the current manuscript, as no benchmarks, simulations, or comparisons are included. The package implements an existing topology-based extension of the ABBA-BABA framework, but we will revise the Abstract and Conclusions to remove or qualify the claim of overcoming limitations. We will add a new Results section with simulation studies evaluating false-positive rates under ILS-only scenarios and direct comparisons to site-based D-statistics on equivalent datasets to provide the necessary validation. revision: yes

  2. Referee: [Methods] No section describes the statistical properties of the topology-based ABBA-BABA implementation, such as the exact mapping from gene-tree counts to the D-like statistic, handling of multifurcations, or correction for multiple testing across the phylogeny. Without this, it is unclear whether the package inherits or mitigates the known ILS confounding issues of the original framework.

    Authors: We acknowledge that the Methods section lacks these details. We will expand it to include: the exact mapping from gene-tree topology counts to the D-like statistic (including definitions of ABBA/BABA patterns in quartet terms), explicit rules for handling multifurcating gene trees, and the multiple-testing correction procedure used across the phylogeny. We will also clarify that the topology-based approach inherits the same theoretical susceptibility to ILS confounding as the site-based ABBA-BABA test and does not mitigate it, ensuring the description is transparent and accurate. revision: yes
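
The summary does not say which multiple-testing procedure is meant. As one standard option, a Benjamini-Hochberg adjustment over per-quartet p-values could look like the Python sketch below. The choice of procedure, the function name, and the example p-values are assumptions for illustration only.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted

# Made-up p-values, one per tested quartet.
quartet_p = [0.01, 0.04, 0.03, 0.20]
adjusted_p = benjamini_hochberg(quartet_p)
```

Quartets that share taxa or branches of the species tree are not independent tests, so any such correction would need to be justified for the dependence structure of the phylogeny in question.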

Circularity Check

0 steps flagged

Software package paper with no internal derivation chain

This is a software description paper that implements an existing ABBA-BABA framework on gene tree topologies, explicitly referencing prior literature for the method. No equations, derivations, parameter fitting, or predictions are presented that could reduce to the paper's own inputs by construction. The central contribution is the R package itself, not a novel theoretical result whose validity depends on self-citation or self-definition. External benchmarks and assumptions about ILS vs. introgression are outside the scope of circularity analysis.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

No mathematical model, free parameters, or new entities are introduced in the abstract. The work is a software wrapper around existing ABBA-BABA logic applied to topologies.
