Composition as Direction: An Active-Set Ray-Based Model for Sparse High-Dimensional Compositional Data
Pith reviewed 2026-06-30 09:11 UTC · model grok-4.3
The pith
Compositions are reframed as observed directions of latent abundance vectors, with an active-set process and ray-based Gaussian evaluation to handle exact zeros and dependence in high dimensions.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central claim is that mapping compositions to the nonnegative orthant of the unit hypersphere and specifying an active-set process that governs which components are present allows the positive subcomposition to be modeled by evaluating a latent Gaussian density along positive rays of the active subspace, with the radius treated as an auxiliary variable. Such a construction separates the active-set process from the positive subcomposition, preserves a latent Gaussian interpretation, accommodates arbitrary latent dependence, and remains computationally feasible in high-dimensional settings. Conceptually, the framework treats a composition as an observed direction of a latent abundance vector with an unobserved magnitude and an explicitly modeled active set.
What carries the argument
The Active-set Ray-based Compositional (ARC) construction that maps compositions to the nonnegative orthant of the unit hypersphere and evaluates the latent Gaussian along positive rays conditional on the active set.
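To make the mapping concrete, here is a minimal Python sketch. It assumes the map is plain L2 normalisation of the composition, which is one natural reading of "direction of a latent abundance vector"; the paper may define the map differently. The function names `composition_to_direction` and `direction_to_composition` are hypothetical.

```python
import numpy as np

def composition_to_direction(comp, tol=0.0):
    """Map a composition to a unit vector and report its active set.

    Assumption: the sphere map is L2 normalisation (not taken from the paper).
    """
    comp = np.asarray(comp, dtype=float)
    if np.any(comp < 0) or not np.isclose(comp.sum(), 1.0):
        raise ValueError("expected a nonnegative vector summing to 1")
    active = comp > tol                      # True where the component is present
    direction = comp / np.linalg.norm(comp)  # point on the nonnegative orthant of the unit sphere
    return direction, active

def direction_to_composition(direction):
    """Inverse map: rescale a nonnegative direction so that it sums to 1."""
    direction = np.asarray(direction, dtype=float)
    return direction / direction.sum()

comp = np.array([0.5, 0.0, 0.3, 0.2, 0.0])
u, active = composition_to_direction(comp)
```

Exact zeros in the composition stay exact zeros in the direction, so the active set can be read off either representation.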
If this is right
- Exact zeros are handled directly by the active-set process without truncation or folding.
- Arbitrary latent dependence among components is retained through the underlying Gaussian structure.
- The positive subcomposition and the active-set process are modeled separately yet remain jointly tractable.
- The radius is introduced as an auxiliary variable to enable the ray-based density evaluation.
- Compositions are interpreted as directions of latent abundance vectors whose magnitude is unobserved.
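The ray-based evaluation with an auxiliary radius can be written out as a short formula. This is a sketch in assumed notation, not the paper's own statement: k = |S| is the number of active components, u is a unit vector with positive entries in the active subspace, and \phi_k is the k-variate Gaussian density with hypothetical parameters \mu_S and \Sigma_S.

```latex
% Assumed notation (not from the source): k = |S|, u a unit vector with positive
% entries in the active subspace, \phi_k the k-variate Gaussian density.
q(u \mid S) \;\propto\; \int_0^\infty r^{k-1}\,
  \phi_k\!\left(r u;\ \mu_S, \Sigma_S\right) dr,
\qquad u \in \mathbb{S}^{k-1}_{+} .
```

The factor r^{k-1} is the usual polar-coordinate volume element. The proportionality sign is deliberate: how the paper normalises over the positive orthant is not stated in the material summarised here.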
Where Pith is reading between the lines
- The directional representation may suggest analogous ray-based models for other data supported on simplices.
- In applied work the separation of active-set and positive-subcomposition steps could simplify variable-selection routines for microbiome or geochemical data.
- The auxiliary-radius device might be reusable in other latent-Gaussian models that must respect positivity or unit-sum constraints.
- Empirical comparisons on real sparse compositional datasets would test whether the computational gains translate into improved downstream prediction or clustering.
Load-bearing premise
The mapping of compositions to the nonnegative orthant of the unit hypersphere together with evaluation of the latent Gaussian density along positive rays of the active subspace preserves support on the simplex and separates the active-set process from the positive subcomposition without bias or intractability.
What would settle it
Generate high-dimensional compositional data from a known projected-Gaussian process that includes exact zeros and a specified dependence structure, then check whether the ARC model recovers the generating parameters with low bias and scales computationally while a truncated projected-Gaussian baseline does not.
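The data-generating half of this test can be sketched in a few lines of Python. Everything specific here is an assumption made for illustration: independent Bernoulli draws for the active set, a rejection step that keeps only positive latent vectors, and the hypothetical function name `simulate_sparse_compositions`. Fitting ARC to the output is not shown, since the estimation procedure is not described in the material above.

```python
import numpy as np

def simulate_sparse_compositions(n, mean, cov, p_active, rng, max_tries=10_000):
    """Toy generator of compositions with exact zeros (illustrative assumptions only)."""
    mean = np.asarray(mean, dtype=float)
    cov = np.asarray(cov, dtype=float)
    d = mean.size
    out = np.zeros((n, d))
    for i in range(n):
        # Active set: independent Bernoulli(p_active) draws, forced to be non-empty.
        active = rng.random(d) < p_active
        if not active.any():
            active[rng.integers(d)] = True
        idx = np.flatnonzero(active)
        mu_s = mean[idx]
        cov_s = cov[np.ix_(idx, idx)]
        # Latent abundances: Gaussian restricted to the positive orthant by rejection.
        for _ in range(max_tries):
            y = rng.multivariate_normal(mu_s, cov_s)
            if np.all(y > 0):
                break
        else:
            raise RuntimeError("rejection step failed; positive orthant has too little mass")
        # Only the direction is kept: rescale the active block to sum to 1.
        out[i, idx] = y / y.sum()
    return out

rng = np.random.default_rng(0)
d = 6
mean = np.full(d, 2.0)
cov = 0.3 * np.eye(d) + 0.2  # equicorrelated latent dependence
X = simulate_sparse_compositions(200, mean, cov, p_active=0.6, rng=rng)
```

The rejection loop is cheap here because the latent mean sits well inside the positive orthant. Its acceptance rate falls as the dimension grows or the mean approaches the boundary, which is the same cost that makes truncation-based baselines hard to scale.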
Original abstract
[Working Draft] Compositional data are central to microbial, ecological, and environmental research, yet often have four features that are difficult to accommodate jointly: exact zeros, latent dependence among components, high-dimensionality, and a unit-sum constraint that induces a non-Euclidean geometry. Conventional Dirichlet-type and logistic-normal models address these features only partially. Projected Gaussian models offer a directional representation that captures exact zeros and latent dependence; however, support correctness on the simplex requires either truncation or folding, both of which become computationally prohibitive as the dimension grows. We develop an Active-set Ray-based Compositional (ARC) framework, which retains the benefits of projected Gaussian models while remaining computationally feasible in high-dimensional settings. In this framework, we map compositions to the nonnegative orthant of the unit hypersphere and specify an active-set process that governs which components are present. Conditional on the active set, the positive subcomposition is modeled by evaluating a latent Gaussian density along positive rays of the active subspace with the radius treated as an auxiliary variable. Such a construction (i) separates the active-set process that governs which components are present from the positive subcomposition on the active components, (ii) preserves a latent Gaussian interpretation, and (iii) accommodates arbitrary latent dependence. Thus, the framework is conducive to high-dimensional applications in which exact zeros and shared positive responses are scientifically central. Conceptually, the proposed framework reframes a composition as an observed direction of a latent abundance vector with an unobserved magnitude and an explicitly modeled active set.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes the Active-set Ray-based Compositional (ARC) framework for sparse high-dimensional compositional data featuring exact zeros, latent dependence, and the unit-sum constraint. Compositions are mapped to the nonnegative orthant of the unit hypersphere; an active-set process governs which components are present; conditional on the active set S, the positive subcomposition is obtained by evaluating a latent Gaussian density along positive rays in the |S|-dimensional active subspace while treating radius as auxiliary. The construction is asserted to separate the active-set process from the subcomposition, preserve a latent Gaussian interpretation, accommodate arbitrary dependence, and remain computationally feasible where truncation/folding in projected Gaussian models is prohibitive. Conceptually, a composition is reframed as an observed direction of a latent abundance vector with unobserved magnitude and explicitly modeled active set.
Significance. If the ray-based construction is shown to be measure-theoretically correct, the framework would supply a tractable directional model that jointly handles exact zeros and high-dimensional latent dependence without the computational cost of projected-Gaussian truncation or folding. This would be relevant for microbial, ecological, and environmental applications where sparse compositional data with shared positive responses are central. The reframing of composition as direction of a latent vector with auxiliary magnitude is a conceptually clean perspective that aligns with the geometry of the simplex.
Major comments (1)
- [Abstract (model construction) and any section presenting the density or likelihood] The load-bearing claim that the ray parameterization with auxiliary radius induces the correct marginal measure on simplex faces (and separates active-set process from subcomposition without bias) requires an explicit change-of-variables derivation. The manuscript must supply the Jacobian factor arising from the sphere-to-simplex map together with the radial integral of the latent Gaussian, and verify that the resulting density is proper with respect to the appropriate measure on each face. Without this derivation (or an equivalent normalization argument), the model risks incorrect probabilities on positive subcompositions; this directly affects the retained benefits of projected Gaussians asserted in the abstract.
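The normalisation concern can be illustrated numerically in two dimensions. The Python sketch below integrates the ray density over the full circle and over the positive arc only, for a latent Gaussian with identity covariance. The parameter values, the fixed radial cutoff `r_max`, and all function names are assumptions chosen for this example; none come from the manuscript.

```python
import math
import numpy as np

def gaussian_pdf(y, mu, cov):
    """Multivariate normal density evaluated at each row of y."""
    diff = y - mu
    prec = np.linalg.inv(cov)
    quad = np.einsum("ni,ij,nj->n", diff, prec, diff)
    norm = math.sqrt((2.0 * math.pi) ** mu.size * np.linalg.det(cov))
    return np.exp(-0.5 * quad) / norm

def trapezoid(vals, grid):
    """Composite trapezoid rule on a given grid."""
    return float(np.sum((vals[1:] + vals[:-1]) * np.diff(grid)) / 2.0)

def ray_density(u, mu, cov, r_max=12.0, n_r=4001):
    """Integrate r^(k-1) * phi_k(r u) over r in [0, r_max] (cutoff is an assumption)."""
    r = np.linspace(0.0, r_max, n_r)
    vals = r ** (mu.size - 1) * gaussian_pdf(r[:, None] * u[None, :], mu, cov)
    return trapezoid(vals, r)

def arc_mass(theta_lo, theta_hi, mu, cov, n_theta=721):
    """Integrate the ray density over an arc of the unit circle (k = 2 only)."""
    thetas = np.linspace(theta_lo, theta_hi, n_theta)
    dens = np.array([
        ray_density(np.array([math.cos(t), math.sin(t)]), mu, cov) for t in thetas
    ])
    return trapezoid(dens, thetas)

mu = np.array([1.0, 0.5])
cov = np.eye(2)
mass_full = arc_mass(0.0, 2.0 * math.pi, mu, cov)      # whole circle
mass_positive = arc_mass(0.0, 0.5 * math.pi, mu, cov)  # positive rays only

# With identity covariance the positive-quadrant probability factorises.
phi = lambda x: 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
orthant_prob = phi(mu[0]) * phi(mu[1])
```

In this toy case the mass on the positive rays matches the probability that the latent vector falls in the positive quadrant, which is below one. A density evaluated only along positive rays therefore needs an explicit normalising step, and that step is what the requested derivation would have to supply.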
Simulated Author's Rebuttal
We thank the referee for the careful reading and for highlighting the need for a rigorous measure-theoretic justification of the ARC construction. We address the single major comment below and will revise the manuscript accordingly.
Referee: [Abstract (model construction) and any section presenting the density or likelihood] The load-bearing claim that the ray parameterization with auxiliary radius induces the correct marginal measure on simplex faces (and separates active-set process from subcomposition without bias) requires an explicit change-of-variables derivation. The manuscript must supply the Jacobian factor arising from the sphere-to-simplex map together with the radial integral of the latent Gaussian, and verify that the resulting density is proper with respect to the appropriate measure on each face. Without this derivation (or an equivalent normalization argument), the model risks incorrect probabilities on positive subcompositions; this directly affects the retained benefits of projected Gaussians asserted in the abstract.
Authors: We agree that an explicit change-of-variables derivation is required to substantiate the claims. The current draft presents the ray-based construction and its conceptual advantages but omits the detailed Jacobian and normalization steps. In the revision we will insert a dedicated derivation (a new subsection in Section 3 together with an appendix) that (i) obtains the Jacobian of the sphere-to-simplex map, (ii) evaluates the radial integral of the latent Gaussian, and (iii) verifies that the induced density is proper with respect to the (|S|-1)-dimensional Hausdorff measure on each face of the simplex, where S is the active set for that face. This will confirm separation of the active-set process without bias and will strengthen the comparison with projected-Gaussian models.
Revision: yes
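For orientation, the change of variables at issue can be sketched with a standard central-projection identity. This is not the authors' derivation. It assumes the sphere-to-simplex map is radial rescaling, u = c / \|c\|_2, and writes k = |S| for the number of active components, so that the corresponding face has dimension k-1. The symbol q(u | S) denotes a density with respect to surface measure on the unit sphere and is likewise assumed notation.

```latex
% Solid-angle identity for the face {c > 0, 1^T c = 1}, whose unit normal is 1/\sqrt{k}:
%   d\Omega = dA / ( \sqrt{k}\, \|c\|_2^{k} ).
p(c \mid S) \;=\; \frac{q\!\left(c/\|c\|_2 \mid S\right)}{\sqrt{k}\,\|c\|_2^{\,k}}
\quad\text{(w.r.t. $(k-1)$-dimensional Hausdorff measure on the face)} .

% If q(u \mid S) \propto \int_0^\infty r^{k-1} \phi_k(r u; \mu_S, \Sigma_S)\, dr,
% substituting r = t\,\|c\|_2 gives
p(c \mid S) \;\propto\; \int_0^\infty t^{k-1}\,
  \phi_k\!\left(t c;\ \mu_S, \Sigma_S\right) dt .
```

Under these assumptions the norm factors cancel, and t plays the role of the unobserved total abundance. The same expression follows from the direct substitution y = t c, whose Jacobian is t^{k-1}.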
Circularity Check
No circularity detected; derivation presented as independent modeling construction
The provided abstract and description introduce the ARC framework as a novel mapping of compositions to the nonnegative orthant of the unit hypersphere, combined with an active-set process and ray-based latent Gaussian evaluation treating radius as auxiliary. No equations, fitted parameters, or self-citations are exhibited that reduce any claimed prediction or preservation property to a tautological input by construction. The separation of active-set process from subcomposition and the preservation of latent Gaussian interpretation are asserted as consequences of the chosen construction rather than derived from prior fitted quantities or self-referential definitions. This is the most common honest finding when no load-bearing reductions are visible in the text.