Compositional regression using principal nested spheres
Pith reviewed 2026-05-15 07:29 UTC · model grok-4.3
The pith
Compositional regression can be carried out by embedding simplex data in the positive orthant of a sphere, reducing it via principal nested spheres to a cylinder with one circular and several linear scores, regressing there, and mapping predictions back to the simplex.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
By embedding compositional data into the positive orthant of the sphere and applying principal nested spheres, one obtains a cylindrical intermediate space with a leading circular score and Euclidean higher-order scores. Regression proceeds directly in this space, after which the estimates are mapped back to the original simplex.
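The embedding and back-mapping steps can be illustrated with a minimal sketch. It assumes the common square-root transform, which this page does not confirm (the abstract only says responses are "embedded in the positive orthant of the sphere"). The function names `simplex_to_sphere` and `sphere_to_simplex` are hypothetical.

```python
import math

def simplex_to_sphere(p):
    """Assumed square-root map: composition p -> unit vector in the positive orthant."""
    if any(x < 0 for x in p) or abs(sum(p) - 1.0) > 1e-9:
        raise ValueError("p must be non-negative and sum to one")
    return [math.sqrt(x) for x in p]

def sphere_to_simplex(z):
    """Back-map: rescale z to unit length, then square each coordinate."""
    norm_sq = sum(x * x for x in z)
    return [x * x / norm_sq for x in z]

p = [0.2, 0.3, 0.5]
z = simplex_to_sphere(p)        # coordinates of z square to p, so z has unit length
p_back = sphere_to_simplex(z)   # recovers p up to floating-point error
print(z, p_back)
```

Under this assumed map, squaring a unit vector always yields non-negative parts that sum to one, which is one way the back-mapped predictions could stay on the simplex even if a fitted point leaves the positive orthant.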
What carries the argument
Principal Nested Spheres (PNS) applied to sphere-embedded compositional responses, producing a cylindrical space with one circular coordinate and the rest Euclidean for regression.
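To make the cylindrical coordinates concrete, here is a sketch for the three-part case only, where the embedded data lie on the ordinary sphere and the fitted subsphere is a single circle. It assumes the circle's axis `v` and angular radius `r` are already known; fitting them is the job of PNS and is not shown. The scores are left unscaled, which may differ from the scaling used in the paper, and all function names are hypothetical.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def unit(a):
    n = math.sqrt(dot(a, a))
    return [x / n for x in a]

def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def tangent_basis(v):
    """Orthonormal pair (e1, e2) spanning the plane orthogonal to the unit vector v."""
    helper = [1.0, 0.0, 0.0] if abs(v[0]) < 0.9 else [0.0, 1.0, 0.0]
    e1 = unit([h - dot(helper, v) * c for h, c in zip(helper, v)])
    return e1, cross(v, e1)

def to_cylinder(x, v, r):
    """Unit vector x -> (circular score phi, signed geodesic residual xi)."""
    e1, e2 = tangent_basis(v)
    phi = math.atan2(dot(x, e2), dot(x, e1))
    xi = math.acos(max(-1.0, min(1.0, dot(x, v)))) - r
    return phi, xi

def from_cylinder(phi, xi, v, r):
    """Inverse map: (phi, xi) -> unit vector on the sphere."""
    e1, e2 = tangent_basis(v)
    a = r + xi
    return [math.cos(a) * v[i]
            + math.sin(a) * (math.cos(phi) * e1[i] + math.sin(phi) * e2[i])
            for i in range(3)]

v = unit([1.0, 1.0, 1.0])   # assumed circle axis
r = 0.5                     # assumed angular radius of the circle
x = [math.sqrt(0.2), math.sqrt(0.3), math.sqrt(0.5)]
phi, xi = to_cylinder(x, v, r)
x_back = from_cylinder(phi, xi, v, r)
print(phi, xi, x_back)
```

With more than three parts there would be one circular score and several residual scores, one for each subsphere dropped along the way.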
If this is right
- Standard regression models can be used on compositional responses without directly violating the sum-to-one constraint.
- The leading circular score isolates the dominant nonlinear variation while higher-order Euclidean scores permit linear adjustments.
- Back-mapping of fitted values guarantees that all predictions remain valid compositions.
- The framework extends the idea of intermediate-space regression to other manifold-valued response types.
Where Pith is reading between the lines
- If the cylindrical reduction works reliably, analogous intermediate mappings could simplify regression on other curved manifolds such as positive definite matrices or tree spaces.
- The circular leading score may correspond to interpretable physical or chemical cycles in applications, suggesting targeted validation against domain knowledge.
- Allowing multiple circular scores in the intermediate space could handle compositional data with several independent nonlinear features.
Load-bearing premise
The sphere embedding together with the principal nested spheres reduction must preserve the main nonlinear relationships present in the original simplex data so that regression performed in the cylinder remains meaningful once mapped back.
What would settle it
Generate synthetic compositional responses from a known regression function on the simplex, embed and reduce them with principal nested spheres, fit the model in the cylinder, map the predictions back, and verify whether the recovered relationship matches the generating function within sampling variability.
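A toy version of this check might look like the sketch below, for three-part compositions only. Several pieces are stand-ins rather than the paper's method: the square-root embedding is assumed, a least-squares plane fit replaces the PNS circle fit, and ordinary least squares is applied to the angle because the simulated angles stay clear of the wrap-around. All names, the generating function and the noise levels are hypothetical.

```python
import math
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def unit(a):
    n = math.sqrt(dot(a, a))
    return [x / n for x in a]

def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0]]

def basis(v):
    """Orthonormal pair spanning the plane orthogonal to the unit vector v."""
    helper = [1.0, 0.0, 0.0] if abs(v[0]) < 0.9 else [0.0, 1.0, 0.0]
    e1 = unit([h - dot(helper, v) * c for h, c in zip(helper, v)])
    return e1, cross(v, e1)

def to_cyl(x, v, r):
    e1, e2 = basis(v)
    return math.atan2(dot(x, e2), dot(x, e1)), math.acos(max(-1.0, min(1.0, dot(x, v)))) - r

def from_cyl(phi, xi, v, r):
    e1, e2 = basis(v)
    a = r + xi
    return [math.cos(a) * v[i] + math.sin(a) * (math.cos(phi) * e1[i] + math.sin(phi) * e2[i])
            for i in range(3)]

def fit_circle(points, iters=500):
    """Algebraic stand-in for PNS: least-squares plane through the points."""
    n = len(points)
    mean = [sum(p[i] for p in points) / n for i in range(3)]
    cov = [[sum((p[i] - mean[i]) * (p[j] - mean[j]) for p in points) / n
            for j in range(3)] for i in range(3)]
    trace = cov[0][0] + cov[1][1] + cov[2][2]
    # Power iteration on (trace * I - cov) converges to the smallest-variance direction.
    shifted = [[(trace if i == j else 0.0) - cov[i][j] for j in range(3)] for i in range(3)]
    v = unit([1.0, 0.5, 0.25])
    for _ in range(iters):
        v = unit([dot(row, v) for row in shifted])
    if dot(v, mean) < 0:
        v = [-c for c in v]
    return v, math.acos(dot(v, mean))

def ols(t, y):
    """Intercept and slope of a simple least-squares line."""
    n = len(t)
    tm, ym = sum(t) / n, sum(y) / n
    slope = sum((a - tm) * (b - ym) for a, b in zip(t, y)) / sum((a - tm) ** 2 for a in t)
    return ym - slope * tm, slope

random.seed(0)
v_true, r_true = unit([1.0, 1.0, 1.0]), 0.5
t = [random.uniform(-1.0, 1.0) for _ in range(200)]

def truth(ti):
    """Known generating function: a composition moving along a circle as t changes."""
    return [c * c for c in from_cyl(0.2 + 1.2 * ti, 0.0, v_true, r_true)]

# Noisy responses: perturb the angle and the residual, then square to get compositions.
comps = []
for ti in t:
    z = from_cyl(0.2 + 1.2 * ti + random.gauss(0, 0.05), random.gauss(0, 0.005), v_true, r_true)
    comps.append([c * c for c in z])

# Embed, reduce, regress in the cylinder, map back.
z_obs = [[math.sqrt(c) for c in p] for p in comps]
v_hat, r_hat = fit_circle(z_obs)
scores = [to_cyl(z, v_hat, r_hat) for z in z_obs]
a_phi, b_phi = ols(t, [s[0] for s in scores])
a_xi, b_xi = ols(t, [s[1] for s in scores])
pred = [[c * c for c in from_cyl(a_phi + b_phi * ti, a_xi + b_xi * ti, v_hat, r_hat)] for ti in t]
max_err = max(abs(p - q) for ti, row in zip(t, pred) for p, q in zip(row, truth(ti)))
print(f"max abs error in any component: {max_err:.4f}")
```

Because the data here are generated from the same cylindrical model that is then fitted, this only checks that the pipeline is self-consistent. A more demanding test would generate responses from a model defined directly on the simplex.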
Original abstract
Regression with compositional responses is challenging due to the nonlinear geometry of the simplex and the limitations of Euclidean methods. We propose a regression framework for manifold-valued data based on mappings to statistically tractable intermediate spaces. For compositional data, responses are embedded in the positive orthant of the sphere and analysed using Principal Nested Spheres (PNS), yielding a cylindrical intermediate space with a circular leading score and Euclidean higher-order scores. Regression is performed in this intermediate space and fitted values are mapped back to the simplex. A simulation study demonstrates good performance of PNS-based regression. An application to environmental chemical exposure data illustrates the interpretability and practical utility of the method.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes a regression framework for compositional responses by embedding them in the positive orthant of the sphere, applying Principal Nested Spheres (PNS) to obtain a cylindrical intermediate space (circular leading score plus Euclidean higher-order scores), performing regression in that space, and mapping fitted values back to the simplex. It reports a simulation study with good performance and an application to environmental chemical exposure data to illustrate interpretability.
Significance. If the PNS-derived cylindrical space adequately preserves the essential nonlinear structure of the original simplex data, the method supplies a geometrically motivated alternative to direct Euclidean regression on compositional data. The simulation study and real-data example constitute concrete evidence of performance and utility, which strengthens the contribution for a methodological statistics paper.
major comments (1)
- The abstract states that regression is performed in the cylindrical intermediate space, but does not specify the model used for the circular leading score (e.g., circular regression, projected linear model, or other). This detail is load-bearing for reproducibility and for evaluating whether the back-mapping step preserves regression properties; it should be stated explicitly with the relevant equations in the methods section.
minor comments (1)
- The abstract mentions 'good performance' in the simulation without naming the metrics or baseline comparators; adding one sentence on these would improve clarity for readers.
Simulated Author's Rebuttal
We thank the referee for the positive evaluation and the recommendation of minor revision. The single major comment is addressed below; we agree that explicit specification of the regression model for the circular component will improve clarity and reproducibility.
Point-by-point responses
- Referee: The abstract states that regression is performed in the cylindrical intermediate space, but does not specify the model used for the circular leading score (e.g., circular regression, projected linear model, or other). This detail is load-bearing for reproducibility and for evaluating whether the back-mapping step preserves regression properties; it should be stated explicitly with the relevant equations in the methods section.
  Authors: We agree that the specific regression model for the circular leading score should be stated explicitly. The current manuscript describes regression in the cylindrical space (circular score plus Euclidean scores) in Section 3 but does not isolate the circular component with equations. In the revised version we will (i) update the abstract to read 'using circular regression for the leading score and linear regression for the higher-order scores' and (ii) add the explicit model equations (including the link function and any projection step) to the methods section so that the back-mapping properties can be directly assessed.
  Revision: yes
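Since the model for the circular score is not given on this page, the following is only one hypothetical possibility, loosely in the spirit of the "projected linear model" the referee mentions: regress the cosine and sine of the angle on the covariate by least squares and recover the predicted angle with `atan2`. It is not claimed to be the authors' model, and the names `fit_line`, `fit_circular` and `wrap` are illustrative.

```python
import math

def fit_line(t, y):
    """Intercept and slope of a simple least-squares line."""
    n = len(t)
    t_mean, y_mean = sum(t) / n, sum(y) / n
    slope = (sum((a - t_mean) * (b - y_mean) for a, b in zip(t, y))
             / sum((a - t_mean) ** 2 for a in t))
    return y_mean - slope * t_mean, slope

def fit_circular(t, phi):
    """Regress cos(phi) and sin(phi) on t separately; predict the angle with atan2."""
    c0, c1 = fit_line(t, [math.cos(p) for p in phi])
    s0, s1 = fit_line(t, [math.sin(p) for p in phi])
    return lambda t_new: math.atan2(s0 + s1 * t_new, c0 + c1 * t_new)

def wrap(angle):
    """Wrap an angle to the interval [-pi, pi]."""
    return math.atan2(math.sin(angle), math.cos(angle))

t = [-1.0, -0.5, 0.0, 0.5, 1.0]
phi = [wrap(3.0 + 0.3 * ti) for ti in t]   # the angles cross the +/- pi cut
predict = fit_circular(t, phi)
errors = [abs(wrap(predict(ti) - (3.0 + 0.3 * ti))) for ti in t]
print(errors)
```

An ordinary linear fit to the wrapped angles would be thrown off by the jump from near pi to near -pi, which is why the choice of circular model matters for the leading score.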
Circularity Check
No significant circularity detected
The proposed framework maps compositional responses to the positive orthant of the sphere, applies established Principal Nested Spheres (PNS) to produce a cylindrical intermediate space (circular leading score plus Euclidean coordinates), performs ordinary regression in that space, and back-maps fitted values to the simplex. This sequence relies on prior geometric constructions for PNS and standard regression techniques; no equation reduces a fitted parameter to a prediction by construction, no central claim is justified solely by self-citation, and no ansatz is smuggled in. The simulation study and environmental application provide external checks rather than tautological confirmation. The derivation chain remains self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption: Compositional data on the simplex can be isometrically embedded into the positive orthant of the sphere.
- domain assumption: Principal nested spheres yield a cylindrical space whose leading circular score and higher Euclidean scores are suitable for linear regression.
Lean theorems connected to this paper
- IndisputableMonolith/Foundation/AlexanderDuality.lean, theorem alexander_duality_circle_linking (tag: echoes)
  echoes: this paper passage has the same mathematical shape or conceptual pattern as the Recognition theorem, but is not a direct formal dependency.
  Passage: "responses are embedded in the positive orthant of the sphere and analysed using Principal Nested Spheres (PNS), yielding a cylindrical intermediate space with a circular leading score and Euclidean higher-order scores"
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.