Distribution-robust mean estimation via smoothed random perturbations

Matthew J. Holland

arxiv: 1906.10300 · v1 · pith:MDFXBXSFnew · submitted 2019-06-25 · 🧮 math.ST · stat.TH

Pith reviewed 2026-05-25 16:35 UTC · model grok-4.3

keywords: mean estimation, finite variance, robust estimators, random perturbations, exponential tails, soft truncation, sub-Gaussian bounds

The pith

Integrating random noise into soft-truncated sample means produces estimators with exponential tails governed only by the second moment.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a new family of mean estimators that start from a soft-truncated version of the empirical mean and then average that quantity over an auxiliary random perturbation. For several concrete noise families, the resulting integral admits a closed-form expression. Using relative-entropy arguments, the author shows that the probability of large deviations decays exponentially at a rate controlled solely by the second moment of the data-generating distribution. This matters because it yields practical estimators whose performance approaches the sub-Gaussian benchmark while requiring only a finite second moment, the sole assumption made in the problem setup.

Core claim

What carries the argument

The smoothed estimator formed by integrating a soft-truncated empirical mean against an auxiliary random perturbation (additive or multiplicative) drawn from a noise family that permits closed-form evaluation.
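A minimal sketch of this construction, under stated assumptions: the integral over the perturbation is approximated by Monte Carlo rather than evaluated in closed form, the noise is Gaussian, ψ is taken to be the soft truncation u − u³/6 on [−√2, √2] held constant outside that interval, and the names `smoothed_mean`, `s`, `noise_sd`, and `draws` are hypothetical. It is meant to illustrate the idea, not to reproduce the paper's formulas.

```python
import math
import random

SQRT2 = math.sqrt(2.0)


def psi(u):
    """Soft truncation: u - u^3/6 on [-sqrt(2), sqrt(2)], constant outside (assumed)."""
    u = max(-SQRT2, min(SQRT2, u))
    return u - u ** 3 / 6.0


def smoothed_mean(xs, s, noise_sd=1.0, multiplicative=True, draws=200, rng=None):
    """Monte Carlo stand-in for integrating the soft-truncated mean over noise.

    s is a positive scale applied before truncation (hypothetical parameter).
    """
    rng = rng or random.Random(0)
    total = 0.0
    for x in xs:
        acc = 0.0
        for _ in range(draws):
            eps = rng.gauss(0.0, noise_sd)  # Gaussian noise is an assumption
            perturbed = x * (1.0 + eps) if multiplicative else x + eps
            acc += psi(perturbed / s)
        total += acc / draws  # average over the perturbation
    return s * total / len(xs)  # rescale the averaged truncated values


# Usage on a small heavy-tailed sample; the scale below is illustrative only.
data_rng = random.Random(1)
data = [data_rng.lognormvariate(0.0, 1.0) for _ in range(20)]
scale = math.sqrt(sum(x * x for x in data))
print(smoothed_mean(data, s=scale, noise_sd=0.5))
```

With zero noise and a very large scale the sketch reduces to the ordinary sample mean, while a single huge observation contributes at most s · 2√2/3, which is the sense in which the truncation is soft.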

If this is right

The estimator admits a closed-form expression for the noise families examined.
Deviation probabilities decay exponentially with a rate determined only by the second moment.
Performance remains close to sub-Gaussian across a range of distributions when the mean-to-standard-deviation ratio varies.
Both additive and multiplicative noise versions inherit the same tail guarantees.
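Schematically, a guarantee of this kind has the following shape. The constant $c$, the estimator symbol $\hat{x}$, and the exact form are assumptions made for illustration; the paper's own statements may differ in constants and in how the confidence level $\delta$ enters.

```latex
\mathbf{P}\left\{ \left|\hat{x} - \mathbf{E}X\right| > c\,\sqrt{\frac{\mathbf{E}X^{2}\,\log(\delta^{-1})}{n}} \right\} \leq 2\delta,
\qquad 0 < \delta < 1 .
```

The point of the display is that only the second moment $\mathbf{E}X^{2}$ appears on the right-hand side of the deviation threshold, and the dependence on $\delta$ is logarithmic rather than polynomial.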

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same smoothing idea could be applied to other location functionals whose empirical versions admit soft truncation.
Empirical sensitivity to the mean-standard-deviation ratio suggests that a data-driven choice of noise scale may further improve finite-sample behavior.
Because the construction uses only second-moment information, it offers a template for robust procedures in settings where higher moments are unavailable or infinite.

Load-bearing premise

The chosen noise distributions permit both closed-form integration and the direct application of relative-entropy inequalities that convert second-moment information into exponential tail bounds.

What would settle it

A concrete counter-example would be a sequence of finite-variance distributions together with sample sizes for which the observed deviation probability of the new estimator exceeds the exponential bound predicted by the second moment alone.
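A check of this kind can be set up as a small simulation harness. The sketch below is an assumption-laden illustration: `exceedance_rate` and `chebyshev_bound` are hypothetical helper names, the sample mean stands in for the estimator under test, and the Chebyshev inequality stands in for the paper's exponential bound, whose constants are not reproduced here.

```python
import math
import random
import statistics


def exceedance_rate(estimator, sampler, true_mean, n, threshold, trials=2000, seed=0):
    """Fraction of trials in which |estimate - true_mean| exceeds the threshold."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        xs = [sampler(rng) for _ in range(n)]
        if abs(estimator(xs) - true_mean) > threshold:
            hits += 1
    return hits / trials


def chebyshev_bound(variance, n, threshold):
    """Chebyshev bound for the sample mean; a placeholder for the bound under test."""
    return min(1.0, variance / (n * threshold ** 2))


# log-Normal(0, 1) data: mean exp(1/2), variance (e - 1) * e.
true_mean = math.exp(0.5)
variance = (math.e - 1.0) * math.e
rate = exceedance_rate(
    statistics.fmean,
    lambda rng: rng.lognormvariate(0.0, 1.0),
    true_mean,
    n=20,
    threshold=1.5,
)
print(rate, chebyshev_bound(variance, 20, 1.5))
```

To probe the claim itself, one would pass the smoothed estimator as `estimator` and replace `chebyshev_bound` with the bound stated in the paper; an observed rate persistently above that bound, beyond simulation error, would be the counter-example described above.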

Figures

Figures reproduced from arXiv: 1906.10300 by Matthew J. Holland.

**Figure 1.** Graph of ψ(u) (in green), along with upper and lower bounds given in (10). Let W denote an arbitrary random variable. We consider computation of E ψ(a + bW), where expectation is taken with respect to W, and a ∈ ℝ and b > 0 are respectively shift and scale parameters. To streamline implementation, for integer k > 0 and input u ∈ ℝ, we introduce the notation M^k_{a,b}(u) := E[W^k I{a + bW ≤ u}] (6), D^k_{a,b}(u…
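To make the truncated-moment notation in this excerpt concrete, the sketch below evaluates E[W^k I{a + bW ≤ u}] by Monte Carlo and compares the k = 1 case with a closed form. It assumes W is standard Normal, for which the standard identity E[W I{W ≤ t}] = −φ(t) holds with φ the standard Normal density; the function names are hypothetical and the paper's own closed forms are not reproduced.

```python
import math
import random


def m_k_monte_carlo(k, a, b, u, draws=200_000, seed=0):
    """Monte Carlo estimate of E[W^k * 1{a + b*W <= u}] for W ~ N(0, 1)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(draws):
        w = rng.gauss(0.0, 1.0)
        if a + b * w <= u:
            total += w ** k
    return total / draws


def m_1_gaussian(a, b, u):
    """Closed form for k = 1 and standard Normal W: -phi((u - a) / b), with b > 0."""
    t = (u - a) / b
    return -math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)


print(m_k_monte_carlo(1, a=0.3, b=2.0, u=1.0), m_1_gaussian(0.3, 2.0, 1.0))
```

Closed forms of this type are what allow the smoothed quantity E ψ(a + bW) to be computed without sampling, since ψ is a piecewise polynomial in its argument.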

**Figure 2.** Deviations |X̂ − E_P X| averaged over all trials, plotted as a function of the ratio r(X) = E X / sd(X). Sample size is n = 20, variance level is low. Left: Normal data. Right: log-Normal data.

**Figure 3.** Deviations |X̂ − E_P X| averaged over all trials, plotted as a function of the sample size n. Mean to standard deviation ratio is r(X) = 1.0, variance level is low. Left: Normal data. Right: log-Normal data.

**Figure 4.** Histograms of deviations |X̂ − E_P X| for all estimators being evaluated, with accompanying two-sided 1 − 2δ confidence intervals. Data is log-Normal, sample size is n = 20, variance level is low, mean to standard deviation ratio is r(X) = 1.0. Since rare values are difficult to see, the dashed black vertical line indicates the largest observed deviation.

**Figure 5.** Boxplots of deviations |X̂ − E_P X| over all trials. Sample size is n = 20, mean to standard deviation ratio is r(X) = 1.0. Left column: Normal data. Right column: log-Normal data. The rows correspond to low-high variance levels.

Original abstract

We consider the problem of mean estimation assuming only finite variance. We study a new class of mean estimators constructed by integrating over random noise applied to a soft-truncated empirical mean estimator. For appropriate choices of noise, we show that this can be computed in closed form, and utilizing relative entropy inequalities, these estimators enjoy deviations with exponential tails controlled by the second moment of the underlying distribution. We consider both additive and multiplicative noise, and several noise distribution families in our analysis. Furthermore, we empirically investigate the sensitivity to the mean-standard deviation ratio for numerous concrete manifestations of the estimator class of interest. Our main take-away is that an inexpensive new estimator can achieve nearly sub-Gaussian performance for a wide variety of data distributions.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

This paper gives a new smoothed-perturbation estimator for the mean that gets exponential tails from only finite variance, with closed forms for several noise families.

The core contribution is a class of mean estimators built by integrating random additive or multiplicative noise over a soft-truncated empirical mean. For Gaussian, Laplace, and a few other distributions the integral has a closed form, and the tail bounds follow from relative-entropy inequalities that depend only on the second moment. The derivations are supplied explicitly for both noise types, which removes the usual higher-moment requirements that block most sub-Gaussian results. The empirical section checks sensitivity to the mean-to-sd ratio across several concrete versions of the estimator and shows the performance stays close to sub-Gaussian for a range of distributions. That combination of weak assumptions, closed-form computation, and explicit bounds is the part worth paying attention to. The construction does not appear to collapse to existing trimmed or median-based methods. The main limitation is that the simulations emphasize parameter sensitivity rather than direct comparisons against other finite-variance robust estimators such as Catoni-type or median-of-means procedures, so the practical improvement over simpler alternatives is not fully quantified. The truncation level and noise scale still need to be set, even if the paper argues they can be chosen without knowledge of the distribution. Readers working on robust estimation or heavy-tailed data analysis will find the technical development useful. The paper is coherent on its own terms and supplies the derivations needed to check the central claims, so it merits a serious referee.
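For readers unfamiliar with the baselines named in this note, median-of-means can be sketched in a few lines. This is a generic textbook version written for illustration, with a hypothetical function name and block-splitting scheme; it is not taken from the paper or from any particular library.

```python
import random
import statistics


def median_of_means(xs, k, rng=None):
    """Split the shuffled sample into k blocks, average each, return the median."""
    rng = rng or random.Random(0)
    xs = list(xs)
    rng.shuffle(xs)  # random partition of the sample
    k = max(1, min(k, len(xs)))  # keep every block non-empty
    blocks = [xs[i::k] for i in range(k)]
    return statistics.median(statistics.fmean(block) for block in blocks)


sample = [1.0, 1.0, 1.0, 1.0, 1.0, 1000.0]
print(statistics.fmean(sample), median_of_means(sample, k=3))
```

A single outlier can corrupt at most one block mean, so the median over blocks ignores it, whereas the plain average does not. A direct comparison of the smoothed estimator against a baseline like this is the kind of evidence the note finds missing.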

Referee Report

0 major / 3 minor

Summary. The paper proposes a class of mean estimators for distributions with only finite second moment, constructed by integrating a soft-truncated empirical mean against additive or multiplicative random perturbations drawn from chosen noise families. For suitable noise distributions, the resulting estimator admits a closed-form expression. Using relative-entropy inequalities, the author derives exponential tail bounds on the deviation from the true mean whose rate depends only on the second moment. An empirical study examines sensitivity of the estimator to the mean-to-standard-deviation ratio across several distributions and concludes that the method achieves nearly sub-Gaussian performance for a wide range of data.

Significance. If the closed-form derivations and the relative-entropy tail bounds hold, the work supplies a computationally inexpensive, distribution-robust mean estimator whose performance guarantees require only finite variance and no higher-moment or distribution-specific tuning parameters. The explicit constructions for Gaussian, Laplace and related families, together with the empirical sensitivity analysis, constitute a concrete contribution to robust statistics under minimal assumptions.

minor comments (3)

[§3] §3 (closed-form derivations): the explicit integral evaluations for the additive Gaussian and Laplace cases are stated to be closed-form, but the manuscript does not list the final simplified expressions in the main text; placing them in an appendix reduces readability of the central claim.
[Table 1, Figure 2] Table 1 and Figure 2: the reported sensitivity curves are shown only for a fixed sample size (n = 20 according to the Figure 2 caption); adding a brief statement on how the curves change with n would strengthen the empirical section.
[Eq. (2)] Notation: the soft-truncation threshold is denoted differently in the definition (Eq. (2)) and in the subsequent tail-bound statements; a single consistent symbol would eliminate minor confusion.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for their positive summary of the manuscript, the assessment of its significance, and the recommendation for minor revision. No specific major comments appear in the report.

Circularity Check

0 steps flagged

No significant circularity; derivation self-contained

The paper constructs smoothed estimators by integrating random perturbations over a soft-truncated empirical mean and derives closed-form expressions for specific additive and multiplicative noise families (Gaussian, Laplace, etc.). Tail bounds are obtained via standard relative-entropy inequalities whose rate depends only on the second moment; these inequalities are external mathematical facts, not fitted or self-referential. No load-bearing step reduces by construction to a fitted parameter, self-citation chain, or ansatz imported from the author's prior work. The argument supplies explicit derivations that close without invoking the target performance as an input.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the finite-variance assumption and the existence of suitable noise distributions that permit closed-form integration and application of relative-entropy tail bounds.

axioms (1)

Domain assumption: the underlying distribution has a finite second moment (variance).
Explicitly stated as the only assumption in the problem setup.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
Relation between the paper passage and the cited Recognition theorem: unclear.
Paper passage: "We study a new class of mean estimators constructed by integrating over random noise applied to a soft-truncated empirical mean estimator... utilizing relative entropy inequalities, these estimators enjoy deviations with exponential tails controlled by the second moment"

IndisputableMonolith/Foundation/AlphaCoordinateFixation.lean · costAlphaLog_high_calibrated_iff · unclear
Relation between the paper passage and the cited Recognition theorem: unclear.
Paper passage: "ψ(u) = u − u³/6 on [−√2, √2] ... −log(1 − u + u²/2) ≤ ψ(u) ≤ log(1 + u + u²/2)"
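The logarithmic sandwich quoted in this passage can be checked numerically. The sketch below assumes that ψ is held constant at ±2√2/3 outside [−√2, √2], which the passage does not state; the grid, tolerance, and function names are likewise illustrative choices.

```python
import math

SQRT2 = math.sqrt(2.0)


def psi(u):
    """u - u^3/6 on [-sqrt(2), sqrt(2)]; constant outside (assumed extension)."""
    u = max(-SQRT2, min(SQRT2, u))
    return u - u ** 3 / 6.0


def lower(u):
    """Lower bound -log(1 - u + u^2/2); log1p keeps accuracy near u = 0."""
    return -math.log1p(-u + u * u / 2.0)


def upper(u):
    """Upper bound log(1 + u + u^2/2)."""
    return math.log1p(u + u * u / 2.0)


def bounds_hold(lo=-5.0, hi=5.0, steps=2001, tol=1e-12):
    """Check lower(u) <= psi(u) <= upper(u) on an evenly spaced grid."""
    for i in range(steps):
        u = lo + (hi - lo) * i / (steps - 1)
        if not (lower(u) - tol <= psi(u) <= upper(u) + tol):
            return False
    return True


print(bounds_hold())
```

Both quadratics 1 ± u + u²/2 are bounded below by 1/2, so the logarithms are defined for every real u. Near zero the upper bound exceeds ψ only by a term of order u⁴, which is why the check uses a small tolerance.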

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

15 extracted references · 15 canonical work pages · 2 internal anchors

[1] Abramowitz, M. and Stegun, I. A. (1964). Handbook of Mathematical Functions With Formulas, Graphs, and Mathematical Tables, volume 55 of National Bureau of Standards Applied Mathematics Series. US National Bureau of Standards.
[2] Bauckhage, C. (2013). Computing the Kullback-Leibler divergence between two Weibull distributions. arXiv preprint arXiv:1310.3713.
[3] Brownlees, C., Joly, E., and Lugosi, G. (2015). Empirical risk minimization for heavy-tailed losses. Annals of Statistics, 43(6):2507–2536.
[4] Catoni, O. (2012). Challenging the empirical mean and empirical variance: a deviation study. Annales de l’Institut Henri Poincaré, Probabilités et Statistiques, 48(4):1148–1185.
[5] Catoni, O. and Giulini, I. (2017). Dimension-free PAC-Bayesian bounds for matrices, vectors, and linear least squares regression. arXiv preprint arXiv:1712.02747.
[6] Chen, Y., Su, L., and Xu, J. (2017). Distributed statistical machine learning in adversarial settings: Byzantine gradient descent. In Proceedings of the ACM on Measurement and Analysis of Computing Systems, volume 1. ACM.
[7] Devroye, L., Lerasle, M., Lugosi, G., and Oliveira, R. I. (2016). Sub-gaussian mean estimators. Annals of Statistics, 44(6):2695–2725.
[8] Holland, M. J. (2019a). PAC-Bayes under potentially heavy tails. arXiv preprint arXiv:1905.07900.
[9] Holland, M. J. (2019b). Robust descent using smoothed multiplicative noise. In 22nd International Conference on Artificial Intelligence and Statistics (AISTATS), volume 89 of Proceedings of Machine Learning Research, pages 703–711.
[10] Holland, M. J. and Ikeda, K. (2017). Robust regression using biased objectives. Machine Learning, 106(9):1643–1679.
[11] Holland, M. J. and Ikeda, K. (2019). Better generalization with less data using robust gradient descent. In 36th International Conference on Machine Learning (ICML), volume 97 of Proceedings of Machine Learning Research.
[12] Johnson, N. L. (1949). Systems of frequency curves generated by methods of translation. Biometrika, 36(1/2):149–176.
[13] Lerasle, M. and Oliveira, R. I. (2011). Robust empirical mean estimators. arXiv preprint arXiv:1112.3914.
[14] Stuart, A. and Ord, J. K. (1994). Kendall’s Advanced Theory of Statistics Volume 1: Distribution Theory. Hodder Arnold, 6th edition.
[15] Villa, C. and Rubio, F. J. (2018). Objective priors for the number of degrees of freedom of a multivariate t distribution and the t-copula. Computational Statistics & Data Analysis, 124:197–219.
