The $\alpha$--regression for compositional data: a unified framework for standard, temporal and spatial regression models including compositional predictors

Michail Tsagris; Yannis Pantazis

arxiv: 2510.12663 · v6 · submitted 2025-10-14 · 📊 stat.ME

Pith reviewed 2026-05-18 07:33 UTC · model grok-4.3

classification 📊 stat.ME

keywords compositional data · alpha regression · spatial regression · nonlinear least squares · log-ratio methods · marginal effects · geographically weighted regression · spatial filtering

The pith

α-regression uses a data-driven power transform to unify standard, temporal and spatial models for compositional data.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents an α-regression framework that applies a flexible power transformation, controlled by a single parameter α, to compositional responses. This choice interpolates between raw-scale analysis and log-ratio methods while handling zeros directly and casting the problem as non-linear least squares. The approach yields asymptotic results, marginal effects, visual diagnostics, robust variants, spline extensions, and the ability to include compositional predictors for simple time-series models. Four spatial extensions are introduced: spatially lagged covariates with direct/indirect effect decomposition, spatial autoregression, geographically weighted regression, and eigenvector spatial filtering. Real-data applications show these models match or exceed prior methods and that spatial versions improve predictions by accounting for dependence.
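To make the power transformation concrete, here is a minimal sketch of the α-transformation as it is commonly defined in the compositional-data literature: a power-normalised composition, centred, scaled by 1/α, and multiplied by a Helmert sub-matrix. The summary above does not reproduce the paper's formula, so the exact form, the function names `helmert_sub` and `alpha_transform`, and the example data are assumptions made for illustration.

```python
import numpy as np

def helmert_sub(D):
    """(D-1) x D Helmert sub-matrix: orthonormal rows, each orthogonal to the ones vector."""
    H = np.zeros((D - 1, D))
    for i in range(1, D):
        H[i - 1, :i] = 1.0
        H[i - 1, i] = -float(i)
        H[i - 1] /= np.sqrt(i * (i + 1))
    return H

def alpha_transform(x, alpha):
    """Map compositions (rows of x) to R^(D-1). Assumed form, not taken from the paper."""
    x = np.atleast_2d(np.asarray(x, dtype=float))
    D = x.shape[1]
    H = helmert_sub(D)
    if alpha == 0:
        # Limiting case: centred log-ratio followed by H (needs strictly positive parts).
        logx = np.log(x)
        return (logx - logx.mean(axis=1, keepdims=True)) @ H.T
    u = x ** alpha                       # zeros stay zero when alpha > 0
    u /= u.sum(axis=1, keepdims=True)    # power-transformed composition
    return ((D * u - 1.0) / alpha) @ H.T

# Hypothetical data: the second composition contains a zero part.
x = np.array([[0.2, 0.3, 0.5],
              [0.0, 0.4, 0.6]])
z_half = alpha_transform(x, 0.5)             # finite even with the zero
z_near_log = alpha_transform(x[:1], 1e-6)    # close to the log-ratio limit
z_log = alpha_transform(x[:1], 0.0)
```

With α = 1 the map is linear in the raw composition, and as α approaches 0 it approaches the log-ratio coordinates, which is the interpolation the paragraph above describes.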

Core claim

The central claim is that parameterizing a power transformation by α converts compositional regression into a non-linear least squares problem. Its solution supplies marginal effects, handles zeros without imputation, supports compositional predictors for temporal modeling, and extends directly to four spatial specifications that capture autocorrelation or local variation. Empirically, the resulting predictions and dependence modeling are reported to be at least as good as, and often better than, those obtained from log-ratio transformations.

What carries the argument

The α power transformation inside the regression link, which is estimated jointly with the coefficients via the Levenberg-Marquardt algorithm and yields both marginal effects and spatial decompositions.
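The estimation step can be sketched as follows, under stated assumptions. The fitted composition is taken to be a multinomial-logit-type mean with the first component as reference, the residuals are differences of α-transformed observed and fitted compositions, and α is held fixed for simplicity rather than estimated jointly. `scipy.optimize.least_squares` with `method="lm"` is a real Levenberg-Marquardt routine; the helper names, the mean function, and the simulated data are hypothetical and not taken from the paper.

```python
import numpy as np
from scipy.optimize import least_squares

def helmert_sub(D):
    H = np.zeros((D - 1, D))
    for i in range(1, D):
        H[i - 1, :i] = 1.0
        H[i - 1, i] = -float(i)
        H[i - 1] /= np.sqrt(i * (i + 1))
    return H

def alpha_transform(x, alpha):
    """Assumed alpha-transformation for alpha > 0 (rows of x are compositions)."""
    D = x.shape[1]
    u = x ** alpha
    u = u / u.sum(axis=1, keepdims=True)
    return ((D * u - 1.0) / alpha) @ helmert_sub(D).T

def fitted_composition(X, B):
    """Assumed logit-type mean: reference component has linear predictor zero."""
    eta = np.column_stack([np.zeros(X.shape[0]), X @ B])
    eta -= eta.max(axis=1, keepdims=True)      # numerical stability
    e = np.exp(eta)
    return e / e.sum(axis=1, keepdims=True)

def fit_alpha_regression(Y, X, alpha):
    """Minimise the sum of squared transformed residuals for a fixed alpha."""
    n, D = Y.shape
    p = X.shape[1]
    Zy = alpha_transform(Y, alpha)

    def residuals(b):
        M = fitted_composition(X, b.reshape(p, D - 1))
        return (Zy - alpha_transform(M, alpha)).ravel()

    sol = least_squares(residuals, np.zeros(p * (D - 1)), method="lm")
    return sol.x.reshape(p, D - 1), sol

# Noise-free synthetic check: the fit should recover the generating coefficients.
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(60), rng.uniform(-1.0, 1.0, 60)])
B_true = np.array([[0.5, -0.3],
                   [1.0, 0.8]])
Y = fitted_composition(X, B_true)
B_hat, sol = fit_alpha_regression(Y, X, alpha=0.5)
```

One way to make α data-driven in such a sketch would be to repeat the fit over a grid of α values and compare out-of-sample error on the composition scale; whether the paper does this or optimises α inside the same criterion is not settled by the summary above.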

If this is right

Marginal effects of predictors on compositional responses become directly interpretable without back-transformation.
Spatial spillover can be decomposed into direct and indirect components via the lagged-X model.
Inclusion of compositional predictors yields a straightforward time-series formulation.
Geographically weighted and eigenvector-filtered versions allow local or global capture of spatial dependence.
Natural splines and robust estimation can be inserted without altering the overall non-linear least squares structure.
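As an illustration of the first point, marginal effects have a closed form if the fitted composition is a multinomial-logit-type mean, which is an assumption here since the summary does not state the paper's mean function. In that case the derivative of component j with respect to predictor k is μ_j(β_kj − Σ_l μ_l β_kl). The function names and coefficient values below are hypothetical.

```python
import numpy as np

def fitted_composition(x, B):
    """Assumed logit-type mean for one covariate vector x; first component is the reference."""
    eta = np.concatenate([[0.0], x @ B])
    e = np.exp(eta - eta.max())
    return e / e.sum()

def marginal_effects(x, B):
    """Matrix of d mu_j / d x_k: rows index predictors, columns index components."""
    mu = fitted_composition(x, B)
    # The reference component carries zero coefficients.
    B_full = np.column_stack([np.zeros(B.shape[0]), B])
    return mu * (B_full - (B_full @ mu)[:, None])

# Hypothetical coefficients; x[0] is an intercept, so only row 1 is a real effect.
B = np.array([[0.5, -0.3],
              [1.0, 0.8]])
x = np.array([1.0, 0.3])
ME = marginal_effects(x, B)
```

Each row of `ME` sums to zero because the fitted components always sum to one: an increase in one share is offset by decreases in the others.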

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same α machinery could be tested on other simplex-constrained data such as probability distributions or budget shares.
Data-driven α selection might serve as a diagnostic for whether raw-scale or log-scale modeling is preferable in a given scientific domain.
Embedding the framework inside larger spatial econometric toolkits would allow joint estimation with non-compositional outcomes.

Load-bearing premise

A single data-driven power parameter α can produce a transformation that respects compositional constraints and yields reliable coefficients and predictions for the observed data.

What would settle it

On a fresh spatial compositional dataset, if the α-regression models produce higher out-of-sample prediction error or fail to improve upon log-ratio benchmarks after spatial dependence is accounted for, the claimed advantage would be refuted.

Original abstract

The paper revisits the $\alpha$--regression framework for compositional data. The model uses a flexible power transformation parameterized by $\alpha$ to interpolate between raw data analysis and log--ratio methods, naturally handling zeros without imputation while allowing data--driven transformation selection. We formulate $\alpha$--regression as a non--linear least squares problem, study its asymptotic properties, provide efficient estimation via the Levenberg--Marquardt algorithm, derive marginal effects for interpretation, and provide a visual inspection of the effect of each predictor. We further discuss robustified versions, the inclusion of natural splines, and the incorporation of compositional predictors which further facilitate the formulation of a simple time series model. The framework is extended to spatial settings through four models. a) The $\alpha$--spatially--lagged X regression model, which incorporates spatial spillover effects via spatially--lagged covariates, with decomposition into direct and indirect effects. b) The $\alpha$--spatial autoregressive model that allows for spatial autocorrelation. c) The geographically--weighted $\alpha$--regression, which allows coefficients to vary spatially for capturing local relationships. d) The $\alpha$--eigenvector spatial filtering that is computationally efficient and captures spatial dependence via the eigenvectors of the kernelized distance matrix. Applications to four real datasets illustrate that the models perform on par with or outperform existing models in the literature. The examples showcase that spatial extensions capture the dependence and improve the predictive performance. Overall, the examples provide evidence that the log--ratio methodology does not lead to the optimal results.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note · private letter to a colleague

This paper unifies α-regression for compositional data with temporal and four spatial extensions under NLS, but the outperformance claims over log-ratio methods rest on thin empirical validation.

The main point here is that Tsagris and Pantazis have extended the α-regression idea for compositional data into one framework that covers standard regression, time series via compositional predictors, and four spatial models. They set it up as nonlinear least squares with a power transform, tuned by α, that bridges raw-data and log-ratio analyses while handling zeros directly. This is the concrete new piece, and it gives users options instead of forcing one transformation.

Referee Report

3 major / 2 minor

Summary. The manuscript proposes the α-regression framework for compositional data using a flexible power transformation with parameter α that interpolates between raw-data and log-ratio approaches while naturally handling zeros. It formulates the model as a nonlinear least squares problem, derives asymptotic properties for the base estimator, provides Levenberg-Marquardt estimation, derives marginal effects, and extends the approach to temporal models with compositional predictors as well as four spatial variants: spatially lagged X (with direct/indirect effects), spatial autoregressive, geographically weighted, and eigenvector spatial filtering. Applications to four real datasets are used to claim that the models perform on par with or outperform existing methods, that spatial extensions capture dependence and improve predictions, and that log-ratio methods are not optimal.

Significance. If the central empirical and theoretical claims hold, the work supplies a unified, data-driven alternative to log-ratio transformations that accommodates zeros without imputation and extends naturally to spatial and temporal settings. The explicit derivation of marginal effects together with the four spatial extensions could be useful for practitioners in ecology, geochemistry, and economics who require interpretable regression with compositional responses or predictors.

major comments (3)

[Asymptotic properties] § on asymptotic properties: consistency and asymptotic normality are stated for the base NLS estimator, but the manuscript provides no explicit argument showing that the same results continue to hold for the α-spatial autoregressive model or the eigenvector spatial filtering model once spatial dependence is introduced; this is load-bearing for the validity of standard errors and inference in the spatial applications.
[Applications] Applications section: the claim that the α-regression and its spatial extensions outperform or match existing models rests on four real datasets, yet the text does not report the precise out-of-sample metric (e.g., spatially blocked cross-validation versus ordinary CV), whether α is estimated jointly or by profile likelihood, or any statistical test of the performance differences versus log-ratio baselines; without these details the superiority could be driven by extra flexibility rather than the framework itself.
[Eigenvector spatial filtering] § on eigenvector spatial filtering: the kernel used to form the distance matrix and the rule for selecting the number of eigenvectors are not specified, so it is impossible to reproduce the reported spatial dependence capture or to assess sensitivity of the predictive gains.
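The distinction between spatially blocked and ordinary cross-validation raised in the second comment can be illustrated with a small sketch. It assigns whole grid cells, rather than individual observations, to folds, so that held-out points are not surrounded by near-identical training neighbours. The grid-based blocking rule, the function name `spatial_block_folds`, and the simulated coordinates are assumptions for illustration; the summary does not say which scheme the paper uses.

```python
import numpy as np

def spatial_block_folds(coords, n_blocks_per_axis, n_folds, seed=0):
    """Assign 2-D locations to CV folds by grid cell (one fold per cell)."""
    coords = np.asarray(coords, dtype=float)
    lo, hi = coords.min(axis=0), coords.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)
    # Cell index per axis; clip so points on the upper edge fall in the last cell.
    cell = (n_blocks_per_axis * (coords - lo) / span).astype(int)
    cell = np.minimum(cell, n_blocks_per_axis - 1)
    block_id = cell[:, 0] * n_blocks_per_axis + cell[:, 1]
    # Shuffle the occupied blocks, then deal them out to folds round-robin.
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(np.unique(block_id))
    fold_of_block = {int(b): i % n_folds for i, b in enumerate(shuffled)}
    folds = np.array([fold_of_block[int(b)] for b in block_id])
    return folds, block_id

# Hypothetical locations on the unit square.
rng = np.random.default_rng(1)
coords = rng.uniform(size=(200, 2))
folds, block_id = spatial_block_folds(coords, n_blocks_per_axis=4, n_folds=4)
```

Ordinary cross-validation would instead shuffle the 200 observations individually, which tends to understate prediction error when nearby responses are correlated.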

minor comments (2)

[Model formulation] The role of the free parameter α and its estimation procedure should be stated more explicitly in the model definition to avoid ambiguity for readers new to compositional data.
[Marginal effects] Figures showing marginal effects would be clearer if they included pointwise confidence bands derived from the asymptotic variance.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the careful reading and constructive comments on our manuscript. We address each major comment in turn below, indicating the revisions we will make to strengthen the paper.

Referee: [Asymptotic properties] § on asymptotic properties: consistency and asymptotic normality are stated for the base NLS estimator, but the manuscript provides no explicit argument showing that the same results continue to hold for the α-spatial autoregressive model or the eigenvector spatial filtering model once spatial dependence is introduced; this is load-bearing for the validity of standard errors and inference in the spatial applications.

Authors: We agree that the asymptotic properties require explicit extension to the spatial models. In the revised manuscript we will add a dedicated subsection outlining the regularity conditions (e.g., on the spatial weight matrix and eigenvalue decay) under which consistency and asymptotic normality continue to hold for both the α-spatial autoregressive model and the eigenvector spatial filtering estimator. We will provide proof sketches that build on the base NLS arguments and reference relevant results from the spatial econometrics literature.
revision: yes

Referee: [Applications] Applications section: the claim that the α-regression and its spatial extensions outperform or match existing models rests on four real datasets, yet the text does not report the precise out-of-sample metric (e.g., spatially blocked cross-validation versus ordinary CV), whether α is estimated jointly or by profile likelihood, or any statistical test of the performance differences versus log-ratio baselines; without these details the superiority could be driven by extra flexibility rather than the framework itself.

Authors: We appreciate the request for greater transparency. In the revision we will explicitly describe the out-of-sample procedure (including whether spatially blocked cross-validation is employed), confirm that α is estimated jointly via the nonlinear least-squares criterion, and add formal statistical comparisons (e.g., paired tests or Diebold-Mariano tests) of the performance differences relative to the log-ratio baselines. These additions will clarify that reported gains are not merely due to extra flexibility.
revision: yes

Referee: [Eigenvector spatial filtering] § on eigenvector spatial filtering: the kernel used to form the distance matrix and the rule for selecting the number of eigenvectors are not specified, so it is impossible to reproduce the reported spatial dependence capture or to assess sensitivity of the predictive gains.

Authors: We acknowledge the omission. The revised manuscript will specify the kernel (a Gaussian kernel whose bandwidth is chosen by cross-validation) used to construct the distance matrix and the eigenvector selection rule (eigenvectors retained when their eigenvalues exceed a data-driven threshold chosen to minimize out-of-sample error). These details will ensure full reproducibility and permit sensitivity checks.
revision: yes
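The construction described in this last response can be sketched as follows: build a Gaussian kernel from pairwise distances, take its eigenvectors, and keep those whose eigenvalues exceed a threshold. Two details are assumptions added here and not stated in the response: the kernel matrix is double-centred before the eigendecomposition (as is usual in Moran eigenvector approaches), and the threshold is expressed as a fraction of the largest eigenvalue. The function name `esf_basis` and the fixed bandwidth and threshold values are hypothetical; in practice both would be tuned, for example by cross-validation.

```python
import numpy as np

def esf_basis(coords, bandwidth, threshold):
    """Spatial filter basis from a Gaussian kernel of pairwise distances (illustrative)."""
    coords = np.asarray(coords, dtype=float)
    n = coords.shape[0]
    # Squared Euclidean distances between all pairs of locations.
    d2 = ((coords[:, None, :] - coords[None, :, :]) ** 2).sum(axis=2)
    K = np.exp(-d2 / (2.0 * bandwidth ** 2))
    # Assumption: double-centre so retained eigenvectors are orthogonal to the intercept.
    C = np.eye(n) - np.ones((n, n)) / n
    M = C @ K @ C
    M = (M + M.T) / 2.0                      # enforce exact symmetry
    vals, vecs = np.linalg.eigh(M)           # ascending eigenvalues
    order = np.argsort(vals)[::-1]
    vals, vecs = vals[order], vecs[:, order]
    keep = vals > threshold * vals[0]        # assumed relative-threshold rule
    return vecs[:, keep], vals[keep]

# Hypothetical locations on the unit square.
rng = np.random.default_rng(2)
coords = rng.uniform(size=(80, 2))
E, lam = esf_basis(coords, bandwidth=0.2, threshold=0.05)
# The columns of E would be appended to the covariate matrix as extra predictors.
```

Because the basis is computed once from the coordinates, adding it to the design leaves the non-linear least squares structure unchanged, which is consistent with the abstract's description of this variant as computationally efficient.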

Circularity Check

0 steps flagged

No significant circularity in derivation chain

The paper presents a methodological framework that formulates α-regression as a non-linear least squares problem, derives asymptotic properties for the base estimator using standard arguments, specifies estimation via the Levenberg-Marquardt algorithm, computes marginal effects, and defines four spatial extensions through explicit model equations. These steps rely on independent statistical constructions and algorithmic choices rather than reducing any claimed result to a fitted parameter or self-citation by construction. Applications to four real datasets serve as empirical illustrations of performance rather than load-bearing derivations, with no evidence of self-definitional loops, renamed known results, or uniqueness theorems imported from prior author work. The central claims about outperformance and spatial dependence capture rest on data-driven comparisons that remain falsifiable outside the fitted values.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The framework rests on the suitability of the power transformation for compositional data and the validity of extending it to spatial dependence structures; α is estimated rather than fixed a priori.

free parameters (1)

alpha
The power transformation parameter is estimated from the data within the non-linear least squares problem rather than fixed in advance.

axioms (1)

domain assumption: Compositional data can be modeled via power transformations that naturally accommodate zeros without imputation while preserving the sum-to-one constraint.
This assumption underpins the entire α-regression formulation and its extensions as described in the abstract.
