pith. sign in

arxiv: 2606.18365 · v1 · pith:XKBLTJPXnew · submitted 2026-06-16 · 📊 stat.ME

Logarithmic energy distances and Gini covariance for Hilbert-valued random elements

Pith reviewed 2026-06-26 23:13 UTC · model grok-4.3

classification 📊 stat.ME
keywords energy distancelogarithmic kernelGini covarianceHilbert spacemaximum mean discrepancyk-sample testasymptotic theoryfunctional data
0
0 comments X

The pith

As alpha approaches zero, normalized energy distances in Hilbert spaces converge to a logarithmic version using log of the norm that still characterizes equality of distributions.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper examines the limit behavior of generalized energy distances based on the kernel ||x-y||^alpha for alpha in (0,2) as alpha decreases to zero in real separable Hilbert spaces. With a suitable normalization depending on alpha, the distance converges to one defined by the kernel log||x-y||. This logarithmic energy distance keeps the key property that it equals zero precisely when the two random elements share the same distribution. The authors also obtain a representation of this distance through Gaussian-kernel maximum mean discrepancies and introduce a logarithmic Gini covariance for the k-sample problem, complete with structural properties, asymptotic theory, and permutation implementation.

Core claim

After suitable normalization, the energy distance with kernel ||x-y||^alpha converges to the logarithmic energy distance with kernel log||x-y|| as alpha ↓ 0. This logarithmic version retains the characterization that the distance is zero if and only if the two random elements have the same distribution in a real separable Hilbert space. It admits a representation in terms of Gaussian-kernel maximum mean discrepancies. Motivated by this, a logarithmic Gini covariance is defined for the k-sample problem, with representations in terms of pairwise distances, a characterization theorem, and asymptotic theory for the empirical version.

What carries the argument

The logarithmic energy distance defined via the kernel (x,y) mapsto log||x-y||, which arises as the normalized limit of generalized energy distances and supports the representation via Gaussian-kernel MMDs.

If this is right

  • The logarithmic energy distance characterizes equality of distributions for Hilbert-valued random elements.
  • A logarithmic Gini covariance statistic applies to testing equality of distributions in the k-sample problem.
  • Asymptotic distributions under the null and alternatives are available for the empirical logarithmic Gini covariance.
  • Permutation-based procedures implement the test based on the logarithmic Gini covariance.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The boundary case may strengthen links between energy statistics and kernel methods for high-dimensional or functional data.
  • The logarithmic form could inspire similar limit investigations for other powered kernels or in non-Hilbert spaces.
  • Applications in high-dimensional inference may benefit when standard distances suffer from concentration effects.

Load-bearing premise

The limit of the suitably normalized energy distance exists as alpha approaches zero from above in a real separable Hilbert space.

What would settle it

Two distinct distributions on a separable Hilbert space whose logarithmic energy distance equals zero would disprove the retained characterization property.

Figures

Figures reproduced from arXiv: 2606.18365 by M. Dolores Jim\'enez-Gamero, Norbert Henze.

Figure 1
Figure 1. Figure 1: The Berkeley Growth Data. interval E1 Elog E0 interval E1 Elog E0 [1, 4) 0.011 0.025 0.027 [13, 18) 0.000 0.000 0.000 [4, 13) 0.228 0.381 0.392 [1, 18] 0.000 0.000 0.000 [PITH_FULL_IMAGE:figures/full_fig_p016_1.png] view at source ↗
read the original abstract

For $\alpha\in(0,2)$, the generalized energy distance and the Gini covariance statistic are based on kernels of the form $(x,y)\mapsto \|x-y\|^\alpha$, where $\|\cdot\|$ denotes the norm in a real separable Hilbert space. This paper investigates the boundary regime $\alpha\downarrow 0$. After suitable normalization, the corresponding energy distance converges to a logarithmic energy distance involving the kernel $(x,y)\mapsto\log\|x-y\|$. We establish that the resulting logarithmic energy distance retains the fundamental characterization property of ordinary energy distances in separable Hilbert spaces and derive a representation in terms of Gaussian-kernel maximum mean discrepancies. Motivated by this representation, we introduce a logarithmic Gini covariance for the $k$-sample problem and investigate its structural and asymptotic properties. In particular, we derive a representation in terms of pairwise logarithmic energy distances, establish a characterization theorem for equality of distributions, develop asymptotic null and alternative theory for the corresponding empirical statistic, and discuss permutation-based implementation. The logarithmic framework reveals a new boundary phenomenon within the family of energy-type statistics and provides connections with kernel methods, functional data analysis, and high-dimensional inference.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper studies the boundary regime α↓0 of the generalized energy distance and Gini covariance based on kernels ||x−y||^α (α∈(0,2)) for random elements in separable Hilbert spaces. After suitable α-dependent normalization it claims convergence to a logarithmic energy distance with kernel log||x−y||, proves that this limit retains the characterization property of ordinary energy distances, derives an MMD representation with Gaussian kernels, and introduces a logarithmic Gini covariance for the k-sample problem together with its structural properties, asymptotic theory, and permutation implementation.

Significance. If the normalized limit and characterization theorem are rigorously established, the work supplies a new boundary case linking energy distances to kernel methods and MMD, with potential utility in functional data analysis and high-dimensional inference. The explicit MMD representation and the permutation-based implementation are concrete strengths that would make the logarithmic Gini statistic immediately usable.

major comments (2)
  1. [normalization step / characterization theorem] The central claim that the normalized α-energy distance converges to the logarithmic version (abstract and the derivation leading to Eq. (log-energy)) requires a justification that the limit and expectation may be interchanged for arbitrary distributions on the Hilbert space. In infinite dimensions the family {||X−Y||^α}α∈(0,ε) need not admit a uniform integrable dominant, so the paper must supply either an explicit dominating function, a monotone-convergence argument, or a truncation-plus-remainder estimate that works uniformly over the class of distributions for which the α-energy distance is defined.
  2. [MMD representation] The representation of the logarithmic energy distance in terms of Gaussian-kernel MMD (the claim following the limit) is load-bearing for the subsequent Gini-covariance construction; the manuscript should state the precise conditions on the Gaussian bandwidth under which the representation holds and verify that these conditions are compatible with the Hilbert-space setting used for the characterization theorem.
minor comments (2)
  1. Notation for the normalized logarithmic distance should be introduced once and used consistently; currently the abstract and the later Gini section appear to employ slightly different scaling constants.
  2. The asymptotic null and alternative theory for the empirical logarithmic Gini statistic would benefit from an explicit statement of the moment conditions required on the underlying random elements.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful reading and constructive comments on our manuscript. The two major comments identify technical points that require clarification or additional justification. We respond to each below and will revise the manuscript accordingly.

read point-by-point responses
  1. Referee: [normalization step / characterization theorem] The central claim that the normalized α-energy distance converges to the logarithmic version (abstract and the derivation leading to Eq. (log-energy)) requires a justification that the limit and expectation may be interchanged for arbitrary distributions on the Hilbert space. In infinite dimensions the family {||X−Y||^α}α∈(0,ε) need not admit a uniform integrable dominant, so the paper must supply either an explicit dominating function, a monotone-convergence argument, or a truncation-plus-remainder estimate that works uniformly over the class of distributions for which the α-energy distance is defined.

    Authors: We agree that an explicit justification for interchanging the limit and the expectation is required in the infinite-dimensional setting. In the revised version we will insert a dedicated lemma that supplies a truncation-plus-remainder argument. The argument proceeds by truncating the norm at a large but finite level M, applying the dominated-convergence theorem on the truncated part (where the integrand is bounded), and controlling the remainder uniformly over all distributions that possess finite α-energy distance by using the monotonicity of t ↦ t^α for α ∈ (0,2) together with the triangle inequality in the Hilbert norm. This establishes the desired interchange without requiring a single integrable dominant for the whole family. revision: yes

  2. Referee: [MMD representation] The representation of the logarithmic energy distance in terms of Gaussian-kernel MMD (the claim following the limit) is load-bearing for the subsequent Gini-covariance construction; the manuscript should state the precise conditions on the Gaussian bandwidth under which the representation holds and verify that these conditions are compatible with the Hilbert-space setting used for the characterization theorem.

    Authors: We will add an explicit statement that the MMD representation holds for every bandwidth σ > 0. Because the underlying space is a separable Hilbert space, the Gaussian kernel exp(−‖x−y‖²/(2σ²)) is positive definite and the associated RKHS is well-defined. The proof of the representation relies only on the Fourier transform of the Gaussian and on the fact that the logarithmic kernel arises as the α → 0 limit; these steps remain valid for any σ > 0 and impose no further restrictions beyond those already used for the characterization theorem. A short remark will be inserted to confirm this compatibility. revision: yes

Circularity Check

0 steps flagged

No significant circularity; derivations are self-contained

full rationale

The paper defines the logarithmic energy distance explicitly as the normalized limit of the α-energy distance (α↓0) with kernel log‖x−y‖, then separately proves that this object retains the characterization property for equality of distributions and admits an MMD representation with Gaussian kernels. These steps are carried out via direct limiting arguments and kernel identities in separable Hilbert spaces; no load-bearing step reduces by construction to a fitted parameter, a self-referential definition, or a self-citation chain whose validity depends on the present work. The derivation chain is therefore independent of its target conclusions.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

Based solely on the abstract, the primary background assumptions are the separability of the Hilbert space and the existence of the normalized limit; no explicit free parameters or new postulated entities are introduced.

axioms (1)
  • domain assumption The underlying space is a real separable Hilbert space.
    Required for the norm to be defined on the random elements and for the energy-distance theory to apply.

pith-pipeline@v0.9.1-grok · 5732 in / 1378 out tokens · 48151 ms · 2026-06-26T23:13:46.816109+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

18 extracted references · 2 canonical work pages

  1. [1]

    Aeberhard and M

    S. Aeberhard and M. Forina (1992). Wine [Dataset]. UCI Machine Learning Repository. https://doi.org/10.24432/C5PC7J. 17

  2. [2]

    Baringhaus and C

    L. Baringhaus and C. Franz,On a new multivariate two-sample test, J. Multiv. Anal.88(2004), 190–206

  3. [3]

    Baringhaus and C

    L. Baringhaus and C. Franz,Rigid motion invariant two-sample tests. Statist. Sinica20(2010), 1333–1361

  4. [4]

    X. Dang, D. Nguyen, Y. Chen and J. Zhang,A new Gini correlation between quantitative and qualitative variables. Scand. J. Stat.48(2021), 1314–1314

  5. [5]

    Ebner and N

    B. Ebner and N. Henze,Test for multivariate normality – a critical review with emphasis on weightedL 2-statistics, TEST29(2020), 845–892

  6. [6]

    Iris [dataset],

    R. Fisher(1936). Iris [Dataset]. UCI Machine Learning Repository. https://doi.org/10.24432/C56C76

  7. [7]

    Hardy,Divergent Series, Oxford University Press, Oxford 1949

    G.H. Hardy,Divergent Series, Oxford University Press, Oxford 1949

  8. [8]

    Henze,Extreme smoothing and testing for multivariate normality, Statist

    N. Henze,Extreme smoothing and testing for multivariate normality, Statist. & Prob. Lett.35 (1997), 203–213

  9. [9]

    Henze,Asymptotic Stochastics

    N. Henze,Asymptotic Stochastics. An introduction with a view towards statistics, Mathematics Study Resources Vol. 10, Springer, Heidelberg 2024

  10. [10]

    Jim´ enez-Gamero and M.R

    M.D. Jim´ enez-Gamero and M.R. Sillero-Denamiel,Thek-sample problem using Gini covari- ance for largek, J. Multiv. Anal.210(2025), 105463

  11. [11]

    R: A language and environment for statistical computing

    R Core Team (2022). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. https://www.R-project.org/

  12. [12]

    fda: Functional Data Analysis

    Ramsay J (2025). fda: Functional Data Analysis. R package version 6.3.0, https://CRAN.R- project.org/package=fda

  13. [13]

    Rizzo and G.J

    M.L. Rizzo and G.J. Sz´ ekely.DISCO analysis: A nonparametric extension of analysis of vari- ance.Ann. Appl. Stat. 4 (2) (2010) 1034–1055

  14. [14]

    Serfling,Approximation Theorems of Mathematical Statistics,Wiley, New York 1980

    R.J. Serfling,Approximation Theorems of Mathematical Statistics,Wiley, New York 1980

  15. [15]

    Sang and X

    Y. Sang and X. Dang.Asymptotic normality of Gini correlation in high dimension with appli- cations to the K-sample problem. Electron. J. Stat.17(2023) 2539–2574

  16. [16]

    Schoenberg, Metric spaces and positive definite functions (1938).Trans

    I.J. Schoenberg, Metric spaces and positive definite functions (1938).Trans. Amer. Math. Soc. 44, 522–536

  17. [17]

    Sz´ ekely and M.L

    G.J. Sz´ ekely and M.L. Rizzo.Energy statistics: A class of statistics based on distances.J. Stat. Plann. Infer.143(2013), 1249-–1272

  18. [18]

    Zhang, X

    J.T. Zhang, X. Liang, and S. Xiao.On the two-sample Behrens-Fisher problem for functional data. J. Statist. Theory Pract.4(2010), 571–587. 18