
arxiv: 2502.08461 · v2 · submitted 2025-02-12 · 🧮 math.ST · stat.AP · stat.ME · stat.TH

On the Dirichlet-kernel Gasser-Müller estimator and its competitors for fixed design regression on the simplex

Pith reviewed 2026-05-23 04:09 UTC · model grok-4.3

classification 🧮 math.ST · stat.AP · stat.ME · stat.TH
keywords Dirichlet kernel · Gasser-Müller estimator · fixed design regression · simplex · nonparametric regression · asymptotic normality · mean integrated squared error · simulation study

The pith

A Dirichlet-kernel Gasser-Müller estimator for fixed-design regression on the simplex has explicit pointwise bias, variance, asymptotic normality, and mean integrated squared error.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a Dirichlet-kernel Gasser-Müller estimator that extends the univariate construction of Chen to nonparametric regression when the design points lie on the simplex. It derives the pointwise bias and variance of the estimator, proves its asymptotic normality, and obtains an asymptotic formula for its mean integrated squared error. Finite-sample simulations compare the new estimator with the Dirichlet-kernel Nadaraya-Watson and local-linear estimators and find that the local-linear version achieves the lowest error while the proposed estimator achieves the highest. The estimators are also applied to relate soil composition to pH levels in the GEMAS European dataset.

Core claim

The Dirichlet-kernel Gasser-Müller estimator is constructed by weighting the observed responses with a Dirichlet kernel centered at the target point inside the simplex. Its pointwise bias admits an expansion whose leading term is of the order of the bandwidth, and its variance admits an expansion of order one over the product of the sample size and the bandwidth raised to the power of the dimension. The properly centered and scaled estimator converges in distribution to a normal random variable. The mean integrated squared error admits an explicit asymptotic expansion whose leading terms can be minimized with respect to the bandwidth.
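
For orientation, the construction can be written schematically. Both displays are assumptions made for illustration and are not quoted from the paper: the first is the univariate beta-kernel Gasser-Müller form associated with Chen (2000), and the second is a presumed simplex analog in which B_1, ..., B_n denotes a hypothetical partition of the simplex into cells, each containing one design point. The paper's own definition should be consulted for the exact cells and notation.

```latex
% Univariate case on [0,1] (assumed form, after Chen 2000)
\hat m_{n,b}(x) = \sum_{i=1}^{n} Y_i \int_{s_{i-1}}^{s_i}
    K_{x/b+1,\,(1-x)/b+1}(t)\, \mathrm{d}t,
\qquad 0 = s_0 < s_1 < \dots < s_n = 1 .

% Presumed analog on the d-dimensional simplex (hypothetical cells B_i)
\hat m_{n,b}(\boldsymbol{s}) = \sum_{i=1}^{n} Y_i \int_{B_i}
    K_{\boldsymbol{s}/b+\boldsymbol{1},\,(1-\|\boldsymbol{s}\|_1)/b+1}(\boldsymbol{x})\,
    \mathrm{d}\boldsymbol{x} .
```

Here K with a parameter subscript denotes the beta (respectively Dirichlet) density with those parameters, and b > 0 is the bandwidth.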

What carries the argument

The Dirichlet-kernel Gasser-Müller estimator, a weighted average of responses that replaces the usual kernel with a Dirichlet kernel satisfying the usual moment and positivity conditions on the simplex.
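
A minimal sketch of how such a kernel can be evaluated, assuming the parameterization commonly used for Dirichlet kernels on the simplex: a Dirichlet density with parameters (s_1/b + 1, ..., s_d/b + 1, (1 - s_1 - ... - s_d)/b + 1). The function name `dirichlet_kernel` and the boundary convention are illustrative choices, not taken from the paper.

```python
import math


def dirichlet_kernel(s, b, x):
    """Evaluate an assumed Dirichlet kernel centered at s, with bandwidth b, at x.

    s and x are points of the d-dimensional simplex given by their first d
    coordinates; the remaining coordinate is 1 minus their sum.
    """
    if b <= 0:
        raise ValueError("bandwidth b must be positive")
    s_full = list(s) + [1.0 - sum(s)]
    x_full = list(x) + [1.0 - sum(x)]
    # Simplification for this sketch: anything not strictly inside the simplex gets 0.
    if min(x_full) <= 0.0:
        return 0.0
    # Assumed parameterization: alpha_j = s_j / b + 1 for every coordinate.
    alphas = [sj / b + 1.0 for sj in s_full]
    # Work on the log scale so that small bandwidths do not overflow.
    log_norm = math.lgamma(sum(alphas)) - sum(math.lgamma(a) for a in alphas)
    log_dens = log_norm + sum(
        (a - 1.0) * math.log(xj) for a, xj in zip(alphas, x_full)
    )
    return math.exp(log_dens)
```

For d = 1 this reduces to a beta density, which gives an easy sanity check: `dirichlet_kernel([0.5], 1.0, [0.5])` is the Beta(1.5, 1.5) density at 0.5, that is, 4/π.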

If this is right

  • The bias and variance expansions yield the optimal bandwidth rate that balances the two terms in the mean integrated squared error.
  • Asymptotic normality supplies the limiting distribution needed to form pointwise approximate confidence intervals at interior points.
  • The explicit mean integrated squared error formula permits analytic comparison of asymptotic efficiency among the three Dirichlet-kernel estimators.
  • The simulation ranking indicates that the local-linear version should be used in preference to the Gasser-Müller version for data sets of moderate size.
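
The bandwidth trade-off in the first bullet can be illustrated with a generic calculation. This is a sketch under stated assumptions: A and B are unspecified positive constants, and p is the exponent of the bandwidth in the variance term (p = d if the orders are exactly as summarized above). None of these symbols is taken from the paper, whose theorems give the actual constants and exponent.

```latex
\mathrm{AMISE}(b) = A\, b^{2} + \frac{B}{n\, b^{p}}, \qquad A, B > 0,

\frac{\mathrm{d}}{\mathrm{d}b}\,\mathrm{AMISE}(b)
  = 2 A\, b - \frac{p\, B}{n\, b^{p+1}} = 0
\;\Longrightarrow\;
b^{\star} = \Bigl(\frac{p\, B}{2 A\, n}\Bigr)^{1/(p+2)},

\mathrm{AMISE}(b^{\star}) \asymp n^{-2/(p+2)} .
```

Both terms are of the same order at the minimizer, which is the usual sense in which the optimal bandwidth balances squared bias against variance.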

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The consistent underperformance of the Gasser-Müller weights relative to local-linear weights on the simplex suggests that the weighting scheme itself, rather than the kernel family, is the dominant source of the difference.
  • The real-data illustration on compositional predictors implies that any of the three estimators can be applied directly to problems in which the covariates are constrained to sum to one.
  • Replacing the fixed design by a random design drawn from a Dirichlet distribution would require only minor changes to the bias and variance derivations already obtained.

Load-bearing premise

The fixed design points lie on the simplex and the Dirichlet kernel satisfies the moment and positivity conditions needed for the bias and variance expansions to hold.

What would settle it

A Monte Carlo experiment on a known twice-differentiable regression function on the simplex with sample sizes growing to several thousand in which the empirical distribution of the normalized estimator fails to approach a normal limit would falsify the asymptotic normality claim.
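
A scaled-down version of such an experiment can be sketched in the univariate case (d = 1), where the simplex is the interval [0, 1] and the kernel is a beta density. Everything here is an assumption made for illustration: the cell boundaries (midpoints between consecutive design points), the test function, the uniform noise, and the names `beta_kernel`, `gm_weights`, and `coverage`. The estimator is centered at its exact mean, so the check isolates the shape of the sampling distribution and says nothing about bias.

```python
import math
import random


def beta_kernel(t, x, b):
    """Beta(x/b + 1, (1 - x)/b + 1) density at t in (0, 1) (assumed kernel form)."""
    a1, a2 = x / b + 1.0, (1.0 - x) / b + 1.0
    log_norm = math.lgamma(a1 + a2) - math.lgamma(a1) - math.lgamma(a2)
    return math.exp(
        log_norm + (a1 - 1.0) * math.log(t) + (a2 - 1.0) * math.log(1.0 - t)
    )


def gm_weights(design, x, b, sub=20):
    """Gasser-Mueller-type weights: kernel mass of each cell, by the midpoint rule.

    design must be sorted; cells are delimited by midpoints between neighbors.
    """
    n = len(design)
    edges = [0.0] + [(design[i] + design[i + 1]) / 2.0 for i in range(n - 1)] + [1.0]
    weights = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        h = (hi - lo) / sub
        weights.append(
            h * sum(beta_kernel(lo + (k + 0.5) * h, x, b) for k in range(sub))
        )
    return weights


def coverage(n=200, b=0.05, x=0.5, reps=2000, sigma=0.5, seed=0):
    """Fraction of replications whose standardized estimate lies in [-1.96, 1.96]."""
    rng = random.Random(seed)
    design = [(i + 0.5) / n for i in range(n)]  # hypothetical regular fixed design
    signal = [math.sin(2.0 * math.pi * t) for t in design]  # hypothetical m
    w = gm_weights(design, x, b)
    mean = sum(wi * mi for wi, mi in zip(w, signal))
    sd = sigma * math.sqrt(sum(wi * wi for wi in w))
    half = sigma * math.sqrt(3.0)  # uniform noise on [-half, half] has sd sigma
    hits = 0
    for _ in range(reps):
        est = sum(wi * (mi + rng.uniform(-half, half)) for wi, mi in zip(w, signal))
        hits += abs((est - mean) / sd) <= 1.96
    return hits / reps
```

If the normal approximation is adequate, `coverage()` should land near 0.95. Repeating the call for growing `n`, with `b` shrinking accordingly, gives a rough picture of whether the empirical distribution approaches the normal limit.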

Figures

Figures reproduced from arXiv: 2502.08461 by Christian Genest, Frédéric Ouimet, Hanen Daayeb, Nicolas Klutchnikoff, Salah Khardani.

Figure 3.1: The black dots represent the sequence of design points
Figure 6.1: Plot of leave-one-out cross-validation criterion as a function of the bandwidth for the GEMAS dataset.
Figure 6.2: Density plot of the estimated pH in CaCl2 as a function of the proportion of sand and silt.
Original abstract

A Dirichlet-kernel Gasser-Müller (D-GM) estimator is introduced for fixed design regression on the simplex, extending the univariate analog due to Chen [Statist. Sinica, vol. 10(1) (2000), pp. 73-91]. Its pointwise bias and variance, asymptotic normality, and mean integrated squared error are investigated. Some simulation experiments are conducted to compare its small-sample performance with that of two recently proposed alternatives: the Dirichlet-kernel Nadaraya-Watson (D-NW) and local linear (D-LL) estimators. The simulation results reveal that the D-LL estimator is best among the D-LL, D-NW, and D-GM estimators and that the proposed D-GM estimator is worst. A real data analysis is also reported for the GEMAS dataset to analyze the relationship between soil composition and pH levels across various agricultural and grazing lands in Europe.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper introduces a Dirichlet-kernel Gasser-Müller (D-GM) estimator for fixed-design nonparametric regression on the simplex, extending Chen (2000). It derives pointwise bias, variance, asymptotic normality, and MISE; conducts simulations comparing D-GM to D-NW and D-LL estimators (finding D-LL best and D-GM worst); and applies the methods to the GEMAS soil-composition dataset for pH modeling.

Significance. If the asymptotic expansions hold, the work provides a new fixed-design estimator for regression on compositional data, along with simulation evidence favoring the local-linear version over the Gasser-Müller and Nadaraya-Watson versions on the simplex. The inclusion of both theoretical derivations and a real-data illustration is a positive feature.

major comments (2)
  1. [Sections containing the bias/variance and MISE theorems] The bias, variance, normality, and MISE derivations extend Chen (2000) but rest on the Dirichlet kernel satisfying the standard moment conditions (integral equals 1, appropriate first-moment vanishing for bias order h, positivity, and tail decay) when integrated against the fixed-design measure on the simplex. The manuscript states the extension without re-deriving or explicitly verifying these integrals under the simplex geometry and boundary behavior; this verification is load-bearing for all four theoretical results.
  2. [Simulation section] The simulation design and results (D-GM worst) are consistent with the possibility that the kernel conditions fail to transfer directly, yet the paper offers no diagnostic (e.g., numerical check of the kernel moments on the simplex) that would confirm or refute the source of the performance gap.
minor comments (1)
  1. Notation for the simplex, the fixed-design points, and the precise definition of the Dirichlet kernel should be introduced in a dedicated preliminary subsection before the estimator is defined.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful reading and for identifying these key points about the theoretical assumptions and the simulation diagnostics. We address each major comment below.

Point-by-point responses
  1. Referee: [Sections containing the bias/variance and MISE theorems] The bias, variance, normality, and MISE derivations extend Chen (2000) but rest on the Dirichlet kernel satisfying the standard moment conditions (integral equals 1, appropriate first-moment vanishing for bias order h, positivity, and tail decay) when integrated against the fixed-design measure on the simplex. The manuscript states the extension without re-deriving or explicitly verifying these integrals under the simplex geometry and boundary behavior; this verification is load-bearing for all four theoretical results.

    Authors: We agree that the manuscript does not contain an explicit re-derivation or numerical/symbolic verification of the moment conditions for the Dirichlet kernel under the fixed-design measure on the simplex, including boundary effects. While the structure follows Chen (2000), this verification is indeed necessary to rigorously support the bias, variance, normality, and MISE results. In the revised manuscript we will add a new subsection (or appendix) that explicitly verifies the required integrals: the kernel integrates to 1, the first-moment condition holds at the appropriate order in h, positivity is preserved, and the tail decay is sufficient, all with respect to the simplex geometry and the fixed-design points. revision: yes

  2. Referee: [Simulation section] The simulation design and results (D-GM worst) are consistent with the possibility that the kernel conditions fail to transfer directly, yet the paper offers no diagnostic (e.g., numerical check of the kernel moments on the simplex) that would confirm or refute the source of the performance gap.

    Authors: We concur that the observed ranking (D-LL best, D-GM worst) could be consistent with the moment conditions not transferring directly, and that the absence of a diagnostic leaves this possibility unexamined. We will add, in the revised simulation section, a numerical check that evaluates the relevant kernel moments (integral, first moment, etc.) when integrated against the empirical fixed-design measure on the simplex for the bandwidths and sample sizes used in the experiments. This diagnostic will be reported alongside the existing simulation results. revision: yes
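
A minimal sketch of what one piece of such a diagnostic could look like, assuming the Dirichlet kernel centered at s with bandwidth b has parameters (s_1/b + 1, ..., s_d/b + 1, (1 - s_1 - ... - s_d)/b + 1). It compares the closed-form mean of that Dirichlet distribution with a Monte Carlo estimate and reports the offset from s. The function names are illustrative, and the check is made against the kernel's own distribution on the simplex, not against the empirical fixed-design measure mentioned in the response.

```python
import random


def kernel_params(s, b):
    """Assumed Dirichlet-kernel parameters for target s and bandwidth b."""
    s_full = list(s) + [1.0 - sum(s)]
    return [sj / b + 1.0 for sj in s_full]


def exact_mean(s, b):
    """Closed-form mean of the first d coordinates: alpha_j / sum(alpha)."""
    alphas = kernel_params(s, b)
    total = sum(alphas)
    return [a / total for a in alphas[:-1]]


def mean_offset(s, b):
    """Kernel mean minus target point; each entry shrinks linearly in b."""
    return [m - sj for m, sj in zip(exact_mean(s, b), s)]


def sampled_mean(s, b, draws=20000, seed=0):
    """Monte Carlo mean, sampling the Dirichlet via normalized gamma variates."""
    rng = random.Random(seed)
    alphas = kernel_params(s, b)
    d = len(s)
    acc = [0.0] * d
    for _ in range(draws):
        g = [rng.gammavariate(a, 1.0) for a in alphas]
        tot = sum(g)
        for j in range(d):
            acc[j] += g[j] / tot
    return [v / draws for v in acc]
```

Under this parameterization the offset in coordinate j equals b(1 - (d + 1)s_j) / (1 + b(d + 1)). The first moment is therefore centered at s only up to a term of order b, which is the kind of statement the requested verification would need to spell out.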

Circularity Check

0 steps flagged

No significant circularity; derivations are extensions of external prior work

full rationale

The paper extends the univariate construction of Chen (2000) to the simplex and derives the bias, variance, asymptotic normality, and MISE of the D-GM estimator. No quoted steps reduce by construction to fitted inputs, self-definitions, or load-bearing self-citations. The Chen citation is external and independent. Concerns about moment conditions on the simplex are validity issues, not circularity. This matches the default expectation of a self-contained extension.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract provides no explicit free parameters, axioms or invented entities; the construction relies on standard kernel assumptions carried over from the univariate case.

pith-pipeline@v0.9.0 · 5722 in / 1230 out tokens · 43097 ms · 2026-05-23T04:09:57.667359+00:00 · methodology


Reference graph

Works this paper leans on

51 extracted references · 51 canonical work pages

  1. R. J. Adler and J. E. Taylor. Random Fields and Geometry. Springer Monographs in Mathematics. Springer, New York, 2007. ISBN 978-0-387-48112-8. MR2319516 http://www.ams.org/mathscinet-getitem?mr=MR2319516

  2. L. R. Belzile, A. Desgagné, C. Genest, and F. Ouimet. Normal approximations for the multivariate inverse Gaussian distribution and asymmetric kernel smoothing on d-dimensional half-spaces. Preprint, 45 pp., 2024. arXiv:2209.04757 https://arxiv.org/abs/2209.04757

  3. S. Bernstein. Démonstration du théorème de Weierstrass, fondée sur le calcul des probabilités. Commun. Soc. Math. Kharkow, 2(13): 1-2, 1912

  4. K. Bertin, C. Genest, N. Klutchnikoff, and F. Ouimet. Minimax properties of Dirichlet kernel density estimators. J. Multivariate Anal., 195: Paper No. 105158, 16 pp., 2023. MR4544604 http://www.ams.org/mathscinet-getitem?mr=MR4544604

  5. T. Bouezmarni and J. V. K. Rombouts. Nonparametric density estimation for multivariate bounded data. J. Statist. Plann. Inference, 140(1): 139-152, 2010. MR2568128 http://www.ams.org/mathscinet-getitem?mr=MR2568128

  6. S. Bouzebda, A. Nezzal, and I. Elhattab. Limit theorems for nonparametric conditional U-statistics smoothed by asymmetric kernels. AIMS Math., 9(9): 26195-26282, 2024. MR4796622 http://www.ams.org/mathscinet-getitem?mr=MR4796622

  7. B. M. Brown and S. X. Chen. Beta-Bernstein smoothing for regression curves with compact support. Scand. J. Statist., 26(1): 47-59, 1999. MR1685301 http://www.ams.org/mathscinet-getitem?mr=MR1685301

  8. S. X. Chen. Beta kernel estimators for density functions. Comput. Statist. Data Anal., 31(2): 131-145, 1999. MR1718494 http://www.ams.org/mathscinet-getitem?mr=MR1718494

  9. S. X. Chen. Beta kernel smoothers for regression curves. Statist. Sinica, 10(1): 73-91, 2000. MR1742101 http://www.ams.org/mathscinet-getitem?mr=MR1742101

  10. S. X. Chen. Local linear smoothers using asymmetric kernels. Ann. Inst. Statist. Math., 54(2): 312-323, 2002. MR1910175 http://www.ams.org/mathscinet-getitem?mr=MR1910175

  11. M.-Y. Cheng, J. Fan, and J. S. Marron. On automatic boundary corrections. Ann. Statist., 25(4): 1691-1708, 1997. MR1463570 http://www.ams.org/mathscinet-getitem?mr=MR1463570

  12. W. S. Cleveland. Robust locally weighted regression and smoothing scatterplots. J. Amer. Statist. Assoc., 74(368): 829-836, 1979. MR556476 http://www.ams.org/mathscinet-getitem?mr=MR556476

  13. H. Daayeb, C. Genest, S. Khardani, N. Klutchnikoff, and F. Ouimet. Dirichlet Kernel Regression, 2025. Available online at https://github.com/FredericOuimetMcGill

  14. L. Devroye, L. Györfi, G. Lugosi, and H. Walk. On the measure of Voronoi cells. J. Appl. Probab., 54(2): 394-408, 2017. MR3668473 http://www.ams.org/mathscinet-getitem?mr=MR3668473

  15. J. Fan. Design-adaptive nonparametric regression. J. Amer. Statist. Assoc., 87(420): 998-1004, 1992. MR1209561 http://www.ams.org/mathscinet-getitem?mr=MR1209561

  16. J. Fan. Local linear regression smoothers and their minimax efficiencies. Ann. Statist., 21(1): 196-216, 1993. MR1212173 http://www.ams.org/mathscinet-getitem?mr=MR1212173

  17. J. Fan and I. Gijbels. Variable bandwidth and local linear regression smoothers. Ann. Statist., 20(4): 2008-2036, 1992. MR1193323 http://www.ams.org/mathscinet-getitem?mr=MR1193323

  18. J. Fan and I. Gijbels. Local Polynomial Modelling and Its Applications, volume 66 of Monographs on Statistics and Applied Probability. Chapman & Hall, London, 1996. ISBN 0-412-98321-4. MR1383587 http://www.ams.org/mathscinet-getitem?mr=MR1383587

  19. B. Funke and M. Hirukawa. Bias correction for local linear regression estimation using asymmetric kernels via the skewing method. Econom. Stat., 20(C): 109-130, 2021. MR4302589 http://www.ams.org/mathscinet-getitem?mr=MR4302589

  20. B. Funke and M. Hirukawa. On uniform consistency of nonparametric estimators smoothed by the gamma kernel. Preprint, 29 pp., 2024

  21. T. Gasser and H.-G. Müller. Kernel estimation of regression functions. In Smoothing techniques for curve estimation (Proc. Workshop, Heidelberg, 1979), volume 757 of Lecture Notes in Math., pages 23-68. Springer, Berlin, 1979. MR564251 http://www.ams.org/mathscinet-getitem?mr=MR564251

  22. T. Gasser, H.-G. Müller, and V. Mammitzsch. Kernels for nonparametric curve estimation. J. Roy. Statist. Soc. Ser. B, 47(2): 238-252, 1985. MR564251 http://www.ams.org/mathscinet-getitem?mr=MR564251

  23. C. Genest and F. Ouimet. Local linear smoothing for regression surfaces on the simplex using Dirichlet kernels. Preprint, 20 pp., 2024. arXiv:2408.07209 https://arxiv.org/abs/2408.07209

  24. I. Gibbs and L. Chen. Asymptotic properties of random Voronoi cells with arbitrary underlying density. Adv. in Appl. Probab., 52(2): 655-680, 2020. MR4123649 http://www.ams.org/mathscinet-getitem?mr=MR4123649

  25. M. Hirukawa, I. Murtazashvili, and A. Prokhorov. Uniform convergence rates for nonparametric estimators smoothed by the beta kernel. Scand. J. Stat., 49(3): 1353-1382, 2022. ISSN 0303-6898,1467-9469. MR4471289 http://www.ams.org/mathscinet-getitem?mr=MR4471289

  26. M. Hirukawa, I. Murtazashvili, and A. Prokhorov. Yet another look at the omitted variable bias. Econometric Rev., 42(1): 1-27, 2023. ISSN 0747-4938,1532-4168. MR4556820 http://www.ams.org/mathscinet-getitem?mr=MR4556820

  27. M. C. Jones. Simple boundary correction for kernel density estimation. Stat Comput., 3: 135-146, 1993. doi:10.1007/BF00147776 https://www.doi.org/10.1007/BF00147776

  28. V. Ya. Katkovnik. Linear and nonlinear methods of nonparametric regression analysis. Avtomatika, (5): 35-46, 93, 1979. MR582402 http://www.ams.org/mathscinet-getitem?mr=MR582402

  29. C. C. Kokonendji and S. M. Somé. On multivariate associated kernels to estimate general density functions. J. Korean Statist. Soc., 47(1): 112-126, 2018. MR3760293 http://www.ams.org/mathscinet-getitem?mr=MR3760293

  30. H.-G. Müller. Nonparametric Regression Analysis of Longitudinal Data, volume 46 of Lecture Notes in Statistics. Springer-Verlag, Berlin, 1988. ISBN 3-540-96844-X. MR960887 http://www.ams.org/mathscinet-getitem?mr=MR960887

  31. H.-G. Müller. Smooth optimum kernel estimators near endpoints. Biometrika, 78(3): 521-530, 1991. MR1130920 http://www.ams.org/mathscinet-getitem?mr=MR1130920

  32. H.-G. Müller. Surface and function approximation with nonparametric regression. Rend. Sem. Mat. Fis. Milano, 63: 171-211 (1995), 1993. MR1369600 http://www.ams.org/mathscinet-getitem?mr=MR1369600

  33. H.-G. Müller and K. A. Prewitt. Multiparameter bandwidth processes and adaptive surface smoothing. J. Multivariate Anal., 47(1): 1-21, 1993. MR1239102 http://www.ams.org/mathscinet-getitem?mr=MR1239102

  34. È. A. Nadaraja. On a regression estimate. Teor. Verojatnost. i Primenen., 9: 157-159, 1964. MR166874 http://www.ams.org/mathscinet-getitem?mr=MR166874

  35. F. Ouimet. A symmetric matrix-variate normal local approximation for the Wishart distribution and some applications. J. Multivariate Anal., 189: Paper No. 104923, 17 pp., 2022. MR4358612 http://www.ams.org/mathscinet-getitem?mr=MR4358612

  36. F. Ouimet and R. Tolosana-Delgado. Asymptotic properties of Dirichlet kernel density estimators. J. Multivariate Anal., 187: Paper No. 104832, 25 pp., 2022. MR4319409 http://www.ams.org/mathscinet-getitem?mr=MR4319409

  37. M. B. Priestley and M. T. Chao. Non-parametric function fitting. J. Roy. Statist. Soc. Ser. B, 34, 1972. MR331616 http://www.ams.org/mathscinet-getitem?mr=MR331616

  38. C. Reimann, P. Filzmoser, K. Fabian, K. Hron, M. Birke, A. Demetriades, E. Dinelli, A. Ladenberger, and The GEMAS Project Team. The concept of compositional data analysis in practice - Total major element concentrations in agricultural and grazing land soils of Europe. Sci. Total Environ., 426: 196-210, 2012. doi:10.1016/j.scitotenv.2012.02.032 http...

  39. D. Ruppert and M. P. Wand. Multivariate locally weighted least squares regression. Ann. Statist., 22(3): 1346-1370, 1994. MR1311979 http://www.ams.org/mathscinet-getitem?mr=MR1311979

  40. J. Shi and W. Song. Asymptotic results in gamma kernel regression. Comm. Statist. Theory Methods, 45(12): 3489-3509, 2016. MR3494026 http://www.ams.org/mathscinet-getitem?mr=MR3494026

  41. B. W. Silverman. Density Estimation for Statistics and Data Analysis. Monographs on Statistics and Applied Probability. Chapman & Hall, London, 1986. ISBN 0-412-24620-1. MR848134 http://www.ams.org/mathscinet-getitem?mr=MR848134

  42. S. M. Somé and C. C. Kokonendji. Effects of associated kernels in nonparametric multiple regressions. J. Stat. Theory Pract., 10(2): 456-471, 2016. MR3499725 http://www.ams.org/mathscinet-getitem?mr=MR3499725

  43. U. Stadtmüller. Asymptotic properties of nonparametric curve estimates. Period. Math. Hungar., 17(2): 83-108, 1986. MR858109 http://www.ams.org/mathscinet-getitem?mr=MR858109

  44. C. J. Stone. Consistent nonparametric regression. Ann. Statist., 5(4): 595-645, 1977. MR443204 http://www.ams.org/mathscinet-getitem?mr=MR443204

  45. C. J. Stone. Optimal rates of convergence for nonparametric estimators. Ann. Statist., 8(6): 1348-1360, 1980. MR594650 http://www.ams.org/mathscinet-getitem?mr=MR594650

  46. C. J. Stone. Optimal global rates of convergence for nonparametric regression. Ann. Statist., 10(4): 1040-1053, 1982. MR673642 http://www.ams.org/mathscinet-getitem?mr=MR673642

  47. A. Tenbusch. Nonparametric curve estimation with Bernstein estimates. Metrika, 45(1): 1-30, 1997. MR1437794 http://www.ams.org/mathscinet-getitem?mr=MR1437794

  48. L. Wasserman. All of Nonparametric Statistics. Springer Texts in Statistics. Springer, New York, 2006. ISBN 978-0387-25145-5; 0-387-25145-6. MR2172729 http://www.ams.org/mathscinet-getitem?mr=MR2172729

  49. G. S. Watson. Smooth regression analysis. Sankhyā Ser. A, 26: 359-372, 1964. MR185765 http://www.ams.org/mathscinet-getitem?mr=MR185765

  50. S. Zhang and R. J. Karunamuni. On kernel density estimation near endpoints. J. Statist. Plann. Inference, 70(2): 301-316, 1998. MR1649872 http://www.ams.org/mathscinet-getitem?mr=MR1649872

  51. S. Zhang and R. J. Karunamuni. On nonparametric density estimation at the boundary. J. Nonparametr. Statist., 12(2): 197-221, 2000. MR1752313 http://www.ams.org/mathscinet-getitem?mr=MR1752313