arxiv: 2604.14571 · v1 · submitted 2026-04-16 · 📊 stat.ME · stat.CO

Bayesian sparse principal coordinates analysis with delta-tolerant linear approximation for microbiome data

Pith reviewed 2026-05-10 11:02 UTC · model grok-4.3

classification 📊 stat.ME stat.CO
keywords Bayesian sparse methods · principal coordinates analysis · microbiome beta-diversity · sparse linear surrogate · delta-tolerance diagnostic · global-local priors · Bray-Curtis distance

The pith

Bayesian sparse principal coordinates analysis approximates leading PCoA axes with sparse linear combinations of taxa abundances while flagging when the fit supports taxon-level interpretation.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops BSPCoA to solve the interpretability problem in microbiome beta-diversity ordinations. Classical principal coordinates analysis produces axes from pairwise dissimilarities that cannot be read directly as effects of individual taxa. The method replaces those axes with a sparse linear surrogate in the observed abundance data, places three-parameter beta normal global-local priors on the coefficients to select influential taxa, and supplies a delta-tolerance diagnostic that measures how much the surrogate distorts the original geometry. When the diagnostic stays small, researchers obtain both an ordination close to the classical result and a short list of taxa that drive the main patterns, as illustrated in simulations and the Hadza seasonal data.
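
For orientation, the classical PCoA step that BSPCoA starts from can be sketched as below. This is a minimal illustration of Gower-style double centering and eigendecomposition, not the authors' code. The function name `classical_pcoa` and the toy data are assumptions made for this example.

```python
import numpy as np

def classical_pcoa(D, k=2):
    """Classical PCoA on a symmetric dissimilarity matrix D (illustrative sketch)."""
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n        # centering matrix
    B = -0.5 * J @ (D ** 2) @ J                # double-centered squared dissimilarities
    evals, evecs = np.linalg.eigh(B)           # eigenvalues in ascending order
    order = np.argsort(evals)[::-1][:k]        # indices of the k largest eigenvalues
    evals, evecs = evals[order], evecs[:, order]
    # Non-Euclidean dissimilarities can yield negative eigenvalues; clip them at zero.
    coords = evecs * np.sqrt(np.clip(evals, 0.0, None))
    return coords, evals

# Toy check: three samples on a line at positions 0, 3, and 4.
pts = np.array([[0.0], [3.0], [4.0]])
D = np.abs(pts - pts.T)
coords, evals = classical_pcoa(D, k=1)

# One axis is enough here, so the coordinates reproduce the input distances.
recovered = np.abs(coords - coords.T)
```

The rows of `coords` are sample positions along the leading axes. Nothing in them refers to individual taxa, which is the interpretability gap the paper targets.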

Core claim

BSPCoA approximates the leading principal coordinates defined by any ecologically meaningful dissimilarity by a sparse linear combination of the raw taxa abundances, using global-local shrinkage priors to induce row sparsity and posterior uncertainty, together with a delta-tolerance diagnostic that quantifies the approximation error and thereby indicates when taxon-level interpretation remains faithful to the original ordination.

What carries the argument

The delta-tolerant linear surrogate obtained by Bayesian sparse regression of the principal coordinates on the taxa abundance matrix.
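
A rough, non-Bayesian sketch of the surrogate idea follows. It regresses coordinates on centered abundances by ordinary least squares and reports a relative residual. The relative Frobenius error is only a stand-in: this page does not give the paper's exact delta-tolerance definition, and the paper uses Bayesian sparse regression rather than least squares. `linear_surrogate` and the simulated data are hypothetical.

```python
import numpy as np

def linear_surrogate(X, Y):
    """Least-squares fit of coordinates Y (n x k) on centered abundances X (n x p)."""
    Xc = X - X.mean(axis=0)
    B, *_ = np.linalg.lstsq(Xc, Y, rcond=None)      # p x k coefficient matrix
    Y_hat = Xc @ B
    # Stand-in discrepancy measures (assumption, not the paper's delta definition).
    rel_err = np.linalg.norm(Y - Y_hat) / np.linalg.norm(Y)
    per_axis = np.linalg.norm(Y - Y_hat, axis=0) / np.linalg.norm(Y, axis=0)
    row_norms = np.linalg.norm(B, axis=1)           # one value per taxon
    return B, rel_err, per_axis, row_norms

rng = np.random.default_rng(0)
X = rng.poisson(5, size=(40, 6)).astype(float)     # simulated counts, 40 samples x 6 taxa
Xc = X - X.mean(axis=0)

# Case 1: coordinates that are exactly linear in taxa 0 and 3.
Y_lin = Xc[:, [0, 3]]
# Case 2: coordinates that depend on the same taxa through squares.
Y_sq = X[:, [0, 3]] ** 2
Y_sq = Y_sq - Y_sq.mean(axis=0)

B_lin, err_lin, axis_lin, norms_lin = linear_surrogate(X, Y_lin)
B_sq, err_sq, axis_sq, norms_sq = linear_surrogate(X, Y_sq)
```

In the linear case the residual is numerically zero and the two largest row norms belong to taxa 0 and 3. In the squared case the residual is strictly positive, which is the kind of distortion a discrepancy diagnostic is meant to expose.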

If this is right

  • Taxa selected by the sparse surrogate become direct candidates for biological follow-up in studies of community differences.
  • The same procedure applies without modification to Bray-Curtis, Hellinger, and other non-Euclidean distances used in ecology.
  • Posterior intervals on the surrogate coefficients give a measure of uncertainty for each taxon's contribution to an axis.
  • When the delta-tolerance diagnostic is small, the method recovers nearly the same ordination geometry as classical PCoA while adding sparsity.
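
The two dissimilarities named above can be computed directly from an abundance matrix. The sketch below uses the usual Bray-Curtis formula and the Hellinger distance on row-normalized abundances, as in Legendre and Gallagher (2001). It assumes every sample has a nonzero total count. The function names are chosen for this example.

```python
import numpy as np

def bray_curtis(X):
    """Pairwise Bray-Curtis dissimilarity between rows of a nonnegative matrix X."""
    X = np.asarray(X, dtype=float)
    num = np.abs(X[:, None, :] - X[None, :, :]).sum(axis=2)
    den = (X[:, None, :] + X[None, :, :]).sum(axis=2)
    return num / den

def hellinger(X):
    """Pairwise Hellinger distance between rows, using relative abundances."""
    X = np.asarray(X, dtype=float)
    R = np.sqrt(X / X.sum(axis=1, keepdims=True))   # square-root relative abundances
    diff = R[:, None, :] - R[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2))

# Three toy samples over three taxa.
X = np.array([[10, 0, 0],
              [0, 10, 0],
              [5, 5, 0]])
D_bc = bray_curtis(X)   # D_bc[0, 1] is 1.0 (no shared taxa), D_bc[0, 2] is 0.5
D_h = hellinger(X)      # D_h[0, 1] is sqrt(2), the maximum for disjoint samples
```

Either matrix can be passed to a PCoA routine. Both are nonlinear functions of the abundances, which is why the quality of a linear surrogate has to be checked rather than assumed.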

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The approach could be tested on other high-dimensional compositional data sets where researchers need to link dissimilarity-based patterns back to measured variables.
  • Low delta-tolerance on the first few axes would support using the selected taxa as predictors in subsequent models of environmental or host factors.
  • If delta-tolerance grows with axis number, investigators might restrict biological interpretation to the first one or two coordinates.

Load-bearing premise

The dominant patterns captured by pairwise dissimilarities can be recovered to useful accuracy by a sparse linear combination of the observed taxa counts.

What would settle it

A data set in which the delta-tolerance values remain large across the leading axes even after sparsity selection, showing that the linear surrogate systematically distorts the classical ordination geometry.

Figures

Figures reproduced from arXiv: 2604.14571 by Hsin-Hsiung Huang, Liangliang Zhang, Ruitao Liu, Shao-Hsuan Wang.

Figure 1: Classical PCoA and BSPCoA for the Hadza gut microbiome data under the Hellinger …

Figure 2: Heatmap of the posterior median BSPCoA loading matrix for the Hadza data. Larger …

Original abstract

Principal coordinates analysis (PCoA) is a standard exploratory tool for microbiome beta-diversity studies, but its axes are defined by pairwise dissimilarities and therefore do not directly identify the taxa driving an ordination. We propose Bayesian sparse principal coordinates analysis (BSPCoA), a post hoc framework that approximates the leading principal coordinates by a sparse linear surrogate in the observed taxa. A delta-tolerance diagnostic quantifies the discrepancy between the classical ordination and its best linear surrogate, clarifying when taxon-level interpretation is well supported. We place three-parameter beta normal global-local priors on the surrogate coefficients to induce row sparsity, obtain posterior uncertainty, and select influential taxa. The method reduces to sparse principal component analysis under Euclidean distance, while remaining applicable to ecologically meaningful dissimilarities such as Bray-Curtis and Hellinger distances. We conduct simulation studies to demonstrate that BSPCoA provides an approximately linear representation of the dominant ordination geometry while enhancing interpretability in sparse microbiome settings. In the Hadza gut microbiome data, the method produces an ordination close to that of classical PCoA while highlighting a parsimonious set of taxa associated with seasonal variation.
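
The Euclidean special case mentioned in the abstract rests on a standard identity: PCoA on Euclidean distances returns the PCA scores of the centered data, up to the sign of each axis. The sketch below checks that identity numerically on simulated data. It illustrates only the non-sparse equivalence, not the paper's sparse estimator.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(30, 5))                 # simulated data, 30 samples x 5 variables
Xc = X - X.mean(axis=0)
n = X.shape[0]

# PCoA side: double-center the squared Euclidean distances.
sq_dist = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
J = np.eye(n) - np.ones((n, n)) / n
B = -0.5 * J @ sq_dist @ J                   # equals the centered Gram matrix Xc @ Xc.T
evals, evecs = np.linalg.eigh(B)
top = np.argsort(evals)[::-1][:2]
pcoa_scores = evecs[:, top] * np.sqrt(evals[top])

# PCA side: project centered data on the leading right singular vectors.
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
pca_scores = Xc @ Vt[:2].T

# Align axis signs, then measure the largest entrywise gap.
signs = np.sign((pcoa_scores * pca_scores).sum(axis=0))
max_gap = np.abs(pcoa_scores - pca_scores * signs).max()
```

Here `max_gap` is at the level of floating-point error, so in the Euclidean case the principal coordinates are exactly linear in the variables. For Bray-Curtis or Hellinger inputs no such identity holds, and a linear surrogate is only an approximation.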

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 0 minor

Summary. The paper proposes Bayesian sparse principal coordinates analysis (BSPCoA), a post-hoc framework that approximates leading PCoA axes via sparse linear surrogates in observed taxa abundances using three-parameter beta normal global-local priors, along with a delta-tolerance diagnostic to quantify discrepancy from classical ordination. It reduces to sparse PCA under Euclidean distance, remains applicable to Bray-Curtis and Hellinger dissimilarities, and is supported by simulation studies plus a Hadza gut microbiome application showing close approximation and enhanced interpretability.

Significance. If the delta-tolerance reliably flags valid linear approximations, the method could meaningfully improve taxon-level interpretation of PCoA results in microbiome beta-diversity analyses, especially for non-Euclidean distances where direct sparse PCA does not apply. The simulation evidence for approximate linearity and the real-data demonstration of parsimonious taxon selection are clear strengths that support practical utility in sparse settings.

major comments (1)
  1. Simulation studies: the reported support for 'approximately linear representation of the dominant ordination geometry' does not include scenarios that test whether the delta-tolerance identifies biologically relevant taxa when the dissimilarity (e.g., Bray-Curtis) introduces nonlinear mappings or interactions from abundances to coordinates; this directly bears on the central claim that the surrogate enhances interpretability without distortion for ecologically meaningful distances.

Simulated Authors' Rebuttal

1 response · 0 unresolved

We thank the referee for their constructive review and for recognizing the potential utility of BSPCoA for improving taxon-level interpretation of PCoA results. We address the single major comment below.

Point-by-point responses
  1. Referee: Simulation studies: the reported support for 'approximately linear representation of the dominant ordination geometry' does not include scenarios that test whether the delta-tolerance identifies biologically relevant taxa when the dissimilarity (e.g., Bray-Curtis) introduces nonlinear mappings or interactions from abundances to coordinates; this directly bears on the central claim that the surrogate enhances interpretability without distortion for ecologically meaningful distances.

    Authors: We agree that explicitly testing the delta-tolerance under stronger nonlinear mappings would strengthen the evidence. Our existing simulations already incorporate Bray-Curtis and Hellinger dissimilarities, which induce nonlinear transformations from abundances to coordinates, and demonstrate that low delta values coincide with faithful recovery of the dominant geometry and the taxa that generate it. To directly address the referee's concern, we will add new simulation scenarios that include explicit nonlinear interactions and higher-order terms between abundances and the underlying coordinates. In these cases we will report whether the delta-tolerance continues to flag approximations that preserve the biologically relevant taxa without distortion. This addition will be included in the revised manuscript.

    revision: yes

Circularity Check

0 steps flagged

No circularity in BSPCoA derivation or delta-tolerance diagnostic

The paper presents BSPCoA as a post-hoc approximation of PCoA axes via sparse linear surrogates on taxa abundances, with the delta-tolerance serving as an independent diagnostic of coordinate discrepancy rather than a self-referential prediction. The reduction to sparse PCA under Euclidean distance is a special case of the framework, not a circular redefinition. No equations or steps in the abstract reduce fitted parameters or self-cited results to the target claims by construction; the Bayesian priors and simulation validation are standard and externally verifiable. The derivation chain remains self-contained against the independently computed PCoA dissimilarities.

Axiom & Free-Parameter Ledger

1 free parameters · 2 axioms · 0 invented entities

The approach relies on standard Bayesian inference and sparsity-inducing priors from the literature, plus the domain assumption that linear taxa surrogates can capture dominant ordination geometry for ecologically relevant distances.

free parameters (1)
  • hyperparameters of three-parameter beta normal global-local priors
    These control the degree of row sparsity in the surrogate coefficients and are chosen to induce the desired selection of influential taxa.
axioms (2)
  • standard math: Bayesian posterior sampling yields valid uncertainty quantification and taxon selection for the linear surrogate
    Invoked as the mechanism for obtaining posterior uncertainty and selecting taxa.
  • domain assumption: A linear combination of taxa abundances can serve as an interpretable surrogate for PCoA axes defined by dissimilarities
    This is the core premise of the post-hoc approximation framework.
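
To make the free parameter above concrete, the sketch below draws coefficient rows from a three-parameter beta normal (TPBN) scale mixture, using the gamma-gamma hierarchy given by Armagan, Clyde, and Dunson (2011). Several things here are assumptions for illustration: the hyperparameter values `a`, `b`, and `phi`, the choice to share one variance across all columns of a row, and the identity column covariance. The paper's actual settings are not stated on this page.

```python
import numpy as np

def draw_tpbn_rows(p, k, a=0.5, b=0.5, phi=1.0, seed=0):
    """Draw a p x k coefficient matrix with one TPBN-distributed variance per row."""
    rng = np.random.default_rng(seed)
    # lambda_j ~ Gamma(shape=b, rate=phi); NumPy takes scale = 1 / rate.
    lam = rng.gamma(shape=b, scale=1.0 / phi, size=p)
    # tau_j | lambda_j ~ Gamma(shape=a, rate=lambda_j)
    tau = rng.gamma(shape=a, scale=1.0 / lam)
    # Row j shares the variance tau_j across its k entries (assumed identity covariance).
    B = rng.normal(size=(p, k)) * np.sqrt(tau)[:, None]
    return B, tau

B, tau = draw_tpbn_rows(p=200, k=2)
row_norms = np.linalg.norm(B, axis=1)

# Most rows are shrunk toward zero while a few stay large.
share_small = np.mean(row_norms < 0.1)
```

With `a = b = 0.5` and `phi = 1`, each row variance follows the same distribution as a squared standard half-Cauchy, which is the horseshoe case. Smaller values of `phi` pull more rows toward zero, which is how these hyperparameters control row sparsity.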

pith-pipeline@v0.9.0 · 5508 in / 1448 out tokens · 71631 ms · 2026-05-10T11:02:30.603410+00:00 · methodology

Reference graph

Works this paper leans on

14 extracted references · 14 canonical work pages

  1. Anderson, M. J. and Willis, T. J. (2003). Canonical analysis of principal coordinates: A useful method of constrained ordination for ecology. Ecology 84, 511-525

  2. Armagan, A., Clyde, M., and Dunson, D. B. (2011). Generalized beta mixtures of Gaussians. In Advances in Neural Information Processing Systems 24, 523-531

  3. Bai, R. (2018). Bayesian High-Dimensional Models with Scale-Mixture Shrinkage Priors. PhD dissertation, University of Florida

  4. Bai, R. and Ghosh, M. (2018). High-dimensional multivariate posterior consistency under global-local shrinkage priors. Journal of Multivariate Analysis 167, 157-170

  5. Carvalho, C. M., Polson, N. G., and Scott, J. G. (2010). The horseshoe estimator for sparse signals. Biometrika 97, 465-480

  6. Gower, J. C. (1966). Some distance properties of latent root and vector methods used in multivariate analysis. Biometrika 53, 325-338

  7. Guan, Y. and Dy, J. (2009). Sparse probabilistic principal component analysis. In Proceedings of the 12th International Conference on Artificial Intelligence and Statistics, 185-192

  8. Legendre, P. and Anderson, M. J. (1999). Distance-based redundancy analysis: Testing multispecies responses in multifactorial ecological experiments. Ecological Monographs 69, 1-24

  9. Legendre, P. and Gallagher, E. D. (2001). Ecologically meaningful transformations for ordination of species data. Oecologia 129, 271-280

  10. Lin, L. and Fong, D. K. H. (2019). Bayesian multidimensional scaling procedure with variable selection. Computational Statistics & Data Analysis 129, 1-13

  11. Oh, M. S. and Raftery, A. E. (2001). Bayesian multidimensional scaling and choice of dimension. Journal of the American Statistical Association 96, 1031-1044

  12. Smits, S. A., Leach, J., Sonnenburg, E. D., et al. (2017). Seasonal cycling in the gut microbiome of the Hadza hunter-gatherers of Tanzania. Science 357, 802-806

  13. Wang, S.-H., Bai, R., and Huang, H.-H. (2025). Two-step mixed-type multivariate Bayesian sparse variable selection with shrinkage priors. Electronic Journal of Statistics 19, 397-457

  14. Zou, H., Hastie, T., and Tibshirani, R. (2006). Sparse principal component analysis. Journal of Computational and Graphical Statistics 15, 265-286