Bias robustness of depth estimators in multivariate settings

Jorge G. Adrover; Marcelo Ruiz

arxiv: 2505.07383 · v3 · submitted 2025-05-12 · 🧮 math.ST · stat.TH

Pith reviewed 2026-05-22 16:35 UTC · model grok-4.3

classification 🧮 math.ST stat.TH

keywords statistical depth · Tukey's median · scatter matrices · maximum bias · breakdown point · robust estimation · halfspace depth · multivariate statistics

The pith

Deepest scatter matrices have explicit maximum bias curves, contamination sensitivities, and breakdown points.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper derives the maximum asymptotic bias curve, contamination sensitivity, and finite-sample breakdown point for the deepest scatter matrices in multivariate settings. These quantities summarize how the estimators behave under replacement contamination, extending the robustness analysis that began with Tukey's median for location. The authors also examine error bounds that simultaneously control statistical convergence rates and robustness for Tukey's median, depth-based scatter matrices, and multivariate regression estimators. Slight modifications of the same inequalities are shown to trace out the maximum-bias behavior of these deepest estimators. All halfspace depths under consideration are recovered from a single unifying notion called residual smallness depth.

Core claim

We explicitly obtain the maximum bias curve, contamination sensitivity and breakdown point of the deepest scatter matrices. In the multivariate and regression setting we analyse recently introduced error bounds that provide a unified framework for studying both the statistical convergence rate and robustness of Tukey's median, depth-based scatter matrices and multivariate regression estimators. We observe that slight variations in these inequalities allow us to visualize the maximum bias behavior of the deepest estimators. We also point out that all the halfspace depths under consideration can be obtained from a unifying concept called residual smallness depth.

What carries the argument

Deepest scatter matrices constructed from halfspace depth, which identify the most central fits to a multivariate distribution and thereby limit the influence of outliers.
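The following minimal Python/NumPy sketch makes the halfspace-depth machinery concrete. It is not the paper's code. It approximates two quantities by minimizing over a finite set of random unit directions: Tukey's halfspace depth of a point, and one common form of scatter halfspace depth of a candidate matrix Γ. For the scatter depth, the form and the known centre at 0 are assumptions made here. The exact depths take the infimum over all directions, so these values are upper bounds. The helper names and the number of directions are hypothetical, illustrative choices. In this form, the deepest scatter matrix is the Γ that maximizes `matrix_depth`.

```python
import numpy as np

# Illustrative sketch only; helper names are hypothetical, not from the paper.


def random_directions(d, k, rng):
    """Draw k unit vectors uniformly distributed on the sphere in R^d."""
    u = rng.standard_normal((k, d))
    return u / np.linalg.norm(u, axis=1, keepdims=True)


def tukey_depth(z, X, directions):
    """Approximate halfspace (Tukey) depth of the point z w.r.t. the sample X.

    The exact depth is the minimum over all unit vectors u of the fraction of
    observations with u'X_i >= u'z; using finitely many directions can only
    overestimate it, so the result is an upper bound.
    """
    proj_X = X @ directions.T  # (n, k): projections of the data
    proj_z = directions @ z    # (k,): projections of z
    return float(np.min(np.mean(proj_X >= proj_z, axis=0)))


def matrix_depth(Gamma, X, directions):
    """Approximate scatter halfspace depth of the matrix Gamma w.r.t. X, in the
    assumed form inf_u min{P(|u'X|^2 <= u'Gamma u), P(|u'X|^2 >= u'Gamma u)},
    with the location taken as known and equal to 0 (again an upper bound)."""
    sq_proj = (X @ directions.T) ** 2                               # (n, k)
    quad = np.einsum("kd,de,ke->k", directions, Gamma, directions)  # u'Gamma u
    below = np.mean(sq_proj <= quad, axis=0)
    above = np.mean(sq_proj >= quad, axis=0)
    return float(np.min(np.minimum(below, above)))


# Usage on a bivariate standard normal sample (all settings are arbitrary).
rng = np.random.default_rng(0)
X = rng.standard_normal((5000, 2))
U = random_directions(2, 500, rng)
print(tukey_depth(np.zeros(2), X, U))             # near 1/2: the centre of symmetry
print(tukey_depth(np.array([10.0, 10.0]), X, U))  # 0.0: far outside the data cloud
c = 0.4549                                        # approx. median of chi-square(1)
print(matrix_depth(c * np.eye(2), X, U))          # near 1/2: the population-deepest matrix here
print(matrix_depth(10.0 * np.eye(2), X, U))       # near 0: far too large a scatter
```

Random directions keep the sketch short and dimension-agnostic. Exact halfspace depth algorithms exist for low dimensions, but they are considerably more involved.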

If this is right

The maximum asymptotic bias of the deepest scatter matrices can be written in closed form for any contamination fraction.
The breakdown point of these matrices is obtained directly from the same bias-curve analysis.
Error bounds derived for Tukey's median extend with only minor changes to depth-based scatter and regression estimators, yielding joint rate-and-robustness guarantees.
All halfspace depths under consideration arise as special cases of residual smallness depth, so results proved for one apply to the others.
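The paper's closed-form curves for deepest scatter matrices are not reproduced here. As a hedged univariate analogue, consider the classical maximum bias curve of the median at the standard normal, B(ε) = Φ⁻¹(1/(2(1 − ε))). This is a textbook result, not one from this paper. It shows how a single bias curve yields both quantities named above. The breakdown point is the smallest ε at which B(ε) is infinite. The contamination sensitivity is the limit of B(ε)/ε as ε → 0, which here equals √(π/2) ≈ 1.2533. The helper names are hypothetical.

```python
import math
from statistics import NormalDist

# Univariate analogue (not from the paper); helper names are hypothetical.
STD_NORMAL = NormalDist()  # standard normal N(0, 1)


def median_max_bias(eps: float) -> float:
    """Maximum asymptotic bias of the median at N(0, 1) when a fraction eps
    of the mass is contaminated: B(eps) = Phi^{-1}(1 / (2 (1 - eps)))."""
    if not 0.0 <= eps < 1.0:
        raise ValueError("eps must lie in [0, 1)")
    p = 1.0 / (2.0 * (1.0 - eps))
    if p >= 1.0:  # eps >= 1/2: the contamination can carry the median anywhere
        return math.inf
    return STD_NORMAL.inv_cdf(p)


def breakdown_point(bias_curve, tol: float = 1e-10) -> float:
    """Smallest eps at which the bias curve becomes infinite, by bisection
    (assumes the curve is finite below that point and infinite above it)."""
    lo, hi = 0.0, 1.0 - 1e-12
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if math.isinf(bias_curve(mid)):
            hi = mid
        else:
            lo = mid
    return hi


def contamination_sensitivity(bias_curve, eps: float = 1e-6) -> float:
    """Slope of the bias curve at 0, approximated by B(eps) / eps."""
    return bias_curve(eps) / eps


for eps in (0.0, 0.1, 0.2, 0.3, 0.4, 0.45):
    print(f"eps={eps:.2f}  B(eps)={median_max_bias(eps):.4f}")
print("breakdown point ~", round(breakdown_point(median_max_bias), 6))
print("contamination sensitivity ~", round(contamination_sensitivity(median_max_bias), 4))
```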

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The explicit bias curves could be used to choose the depth parameter that minimizes worst-case bias for a target contamination level before data are observed.
Because the same error-bound framework covers location, scatter, and regression, a single proof technique may yield robustness results across all three problems in high dimensions.
Residual smallness depth supplies a common language that might let researchers import bias calculations from one depth notion to another without repeating the entire derivation.

Load-bearing premise

The derivations assume that the underlying probability measures admit well-defined halfspace depths and that contamination follows a replacement model allowing explicit bias calculations.
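Under the replacement model, m of the n observations are overwritten by arbitrary values. The finite-sample breakdown point is the smallest fraction m/n that can carry the estimate arbitrarily far. The illustrative sketch below is not from the paper. It probes this fraction for the sample mean and the sample median in one dimension by replacing the first m points with a very large value. Because it tries only one family of replacements, it can only certify an upper bound on the true breakdown point. The constants `big` and `threshold` are arbitrary assumptions, and the helper names are hypothetical.

```python
import numpy as np

# Illustrative probe (not from the paper); helper names are hypothetical.


def replace_contaminate(x, m, value):
    """Replacement contamination: overwrite m of the n observations with value.
    Which m are replaced is irrelevant for permutation-invariant estimators."""
    xc = np.array(x, dtype=float)  # np.array copies its input by default
    xc[:m] = value
    return xc


def empirical_breakdown(estimator, x, big=1e12, threshold=1e6):
    """Smallest fraction m/n of replaced points (all set to big) that pushes the
    estimate beyond threshold; an upper bound on the replacement breakdown point."""
    n = len(x)
    for m in range(n + 1):
        if abs(estimator(replace_contaminate(x, m, big))) > threshold:
            return m / n
    return 1.0


rng = np.random.default_rng(1)
x = rng.standard_normal(101)
print("mean:  ", empirical_breakdown(np.mean, x))    # a single replaced point suffices
print("median:", empirical_breakdown(np.median, x))  # needs a majority of the sample
```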

What would settle it

Simulate replacement contamination on a known elliptical distribution and check whether the finite-sample bias of the deepest scatter matrix under the least favourable contamination matches, up to sampling error, the explicitly derived maximum-bias curve at each contamination level.
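A scaled-down version of this check can be prototyped before tackling the multivariate deepest scatter matrix. The sketch below is a univariate stand-in, not the paper's estimator or formula. It assumes dimension one, a known centre at 0, and the scatter depth min{P(X² ≤ s), P(X² ≥ s)}, whose deepest value s is the median of X². A short calculation gives the two worst-case directions for this estimator:

- Sending the replaced points far away (explosion) gives s = [Φ⁻¹(1/2 + 1/(4(1 − ε)))]².
- Sending them to 0 (implosion) gives s = [Φ⁻¹(1/2 + (1 − 2ε)/(4(1 − ε)))]².

The simulation compares these curves with the finite-sample estimate. The sample size, seed, outlier value and helper names are hypothetical choices.

```python
import numpy as np
from statistics import NormalDist

# Univariate stand-in (not from the paper); helper names are hypothetical.
PHI_INV = NormalDist().inv_cdf


def deepest_scale_1d(x):
    """Deepest 'scatter' in one dimension with centre 0: the median of x**2."""
    return float(np.median(np.asarray(x, dtype=float) ** 2))


def explosion_curve(eps):
    """Asymptotic median of X^2 under (1 - eps) N(0, 1) + eps * (mass far away)."""
    q = PHI_INV(0.5 + 1.0 / (4.0 * (1.0 - eps)))
    return q * q


def implosion_curve(eps):
    """Asymptotic median of X^2 under (1 - eps) N(0, 1) + eps * (mass at 0)."""
    q = PHI_INV(0.5 + (1.0 - 2.0 * eps) / (4.0 * (1.0 - eps)))
    return q * q


def simulate(eps, outlier, n=200_000, seed=0):
    """Replace floor(eps * n) standard normal draws by outlier, then estimate."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(n)
    x[: int(np.floor(eps * n))] = outlier
    return deepest_scale_1d(x)


for eps in (0.0, 0.1, 0.2, 0.3, 0.4):
    print(
        f"eps={eps:.1f}  explode: sim={simulate(eps, 1e6):.4f} "
        f"curve={explosion_curve(eps):.4f}  implode: sim={simulate(eps, 0.0):.4f} "
        f"curve={implosion_curve(eps):.4f}"
    )
```

In the multivariate case, the same loop would replace `deepest_scale_1d` with a deepest scatter matrix solver. It would also replace the scalar curves with the paper's explicit bias expression.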

Original abstract

The concept of statistical depth extends the notions of the median and quantiles to other statistical models. These procedures aim to formalize the idea of identifying deeply embedded fits to a model that are less influenced by contamination. In the multivariate case, Tukey's median was a groundbreaking concept for multivariate location estimation, and its counterpart for scatter matrices has recently attracted considerable interest. The breakdown point and the maximum asymptotic bias are key concepts used to summarize an estimator's behavior under contamination. We explicitly obtain the maximum bias curve, contamination sensitivity and breakdown point of the deepest scatter matrices. In the multivariate and regression setting we analyse recently introduced error bounds that provide a unified framework for studying both the statistical convergence rate and robustness of Tukey's median, depth-based scatter matrices and multivariate regression estimators. We observe that slight variations in these inequalities allow us to visualize the maximum bias behavior of the deepest estimators. We also point out that all the halfspace depths under consideration can be obtained from a unifying concept called residual smallness depth. A numerical study is performed to compare the finite sample bias performance of several robust estimators in the multivariate setting.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note · private letter to a colleague

They derive maximum bias curves and breakdown points for the deepest scatter matrices and unify several halfspace depths under residual smallness depth, but the curves may come from visualized inequalities rather than from bounds shown to be attained.

The key takeaway is that this paper works out the maximum bias curves and breakdown points for depth-based scatter estimators in the multivariate setting, along with a unifying notion of residual smallness depth. They also tweak existing error bounds to study bias and run a numerical comparison of finite-sample bias for several robust estimators in the multivariate case. What stands out as new is the explicit work on the deepest scatter matrices themselves, which extends beyond the more common location results like Tukey's median. The unification under residual smallness depth pulls several halfspace depth ideas together in one place, and the numerical study gives a practical check on how these estimators behave with finite samples.

This is solid, targeted work for the area. The main soft spot is whether those bias curves are the exact maximum or just sketched from bounds. The abstract notes that slight variations in the inequalities allow visualization of the maximum bias behavior. If the derivations give upper bounds without showing that some contaminating distribution actually attains them, then the explicit curves might not be as sharp as the central claim suggests. That point needs checking in the proofs. The assumptions about well-defined halfspace depths and replacement contamination are standard and not a big issue.

This paper is for researchers working on robust multivariate statistics and statistical depth. A reader who follows the depth literature would find the bias characterizations and the unification useful, and the simulations add some grounding. It has enough new derivations plus empirical content to deserve a serious referee. I would recommend sending it to peer review.

Referee Report

1 major / 2 minor

Summary. The manuscript claims to explicitly derive the maximum bias curve, contamination sensitivity, and breakdown point of deepest scatter matrices under a replacement contamination model. It shows that the halfspace depths under consideration can all be obtained from a unifying residual smallness depth. It also analyzes recently introduced error bounds that provide a framework for both convergence rates and robustness of Tukey's median, depth-based scatter matrices, and multivariate regression estimators. Further, it notes that slight variations in the inequalities visualize maximum bias behavior, and it includes a numerical study comparing finite-sample bias of several robust estimators.

Significance. If the error bounds are shown to be sharp (attained as equalities), the explicit maximum bias curves and breakdown points would strengthen the theoretical understanding of robustness for depth-based scatter estimators, extending known results for location to scatter and regression settings. The unifying residual smallness depth concept and the connection between statistical and robustness bounds via the same inequalities represent a potential strength for the field.

major comments (1)

[Abstract] The claim to 'explicitly obtain the maximum bias curve' is load-bearing for the central contribution, yet the abstract immediately qualifies this by stating that 'slight variations in these inequalities allow us to visualize the maximum bias behavior.' It is unclear whether the derived bounds become equalities for some contaminating measure at each contamination level (as required for the exact supremum bias under the replacement model) or remain strict inequalities that only upper-bound the bias. This distinction must be resolved with explicit attainment arguments or counterexamples in the derivations.
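The distinction between an upper bound and an attained supremum can be seen in the simplest case where the answer is classical: the univariate median at N(0, 1). This is an analogue, not the paper's scatter-matrix setting. Here the bound B(ε) = Φ⁻¹(1/(2(1 − ε))) is attained, not merely approached. Any point mass placed at or beyond ±B(ε) moves the median exactly to ±B(ε), while point masses closer to the centre produce strictly smaller bias. The sketch computes the median of (1 − ε)N(0, 1) + εδ_x by bisection on the mixture CDF. The grid of x values and the helper name are hypothetical choices.

```python
from statistics import NormalDist

# Univariate analogue (not from the paper); helper name is hypothetical.
PHI = NormalDist().cdf
PHI_INV = NormalDist().inv_cdf


def contaminated_median(eps, x, lo=-50.0, hi=50.0, iters=200):
    """Median inf{t : F(t) >= 1/2} of F = (1 - eps) N(0, 1) + eps * delta_x,
    found by bisection on the monotone (possibly discontinuous) mixture CDF."""
    def cdf(t):
        return (1.0 - eps) * PHI(t) + (eps if t >= x else 0.0)

    for _ in range(iters):  # invariant: cdf(lo) < 1/2 <= cdf(hi)
        mid = 0.5 * (lo + hi)
        if cdf(mid) >= 0.5:
            hi = mid
        else:
            lo = mid
    return hi


eps = 0.2
bound = PHI_INV(1.0 / (2.0 * (1.0 - eps)))  # B(0.2), roughly 0.319
for x in (-5.0, -0.1, 0.0, 0.2, 0.5, 5.0, 1e6):
    bias = abs(contaminated_median(eps, x))
    print(f"point mass at {x:>9}: bias = {bias:.6f}  (bound {bound:.6f})")
```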

minor comments (2)

[Abstract] The term 'deepest scatter matrices' is used without an immediate definition or pointer to the precise estimator (e.g., the specific depth function or optimization criterion); a brief clarifying sentence or reference would improve readability for readers unfamiliar with recent depth-based scatter proposals.
[Numerical study] The numerical study section would benefit from reporting the exact contamination levels and sample sizes used, as well as any observed discrepancies between the visualized bias curves and the finite-sample results.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their careful reading and for identifying an important point of potential ambiguity in how the maximum bias results are presented. We address the major comment below and will revise the manuscript accordingly to strengthen the exposition.

Referee: [Abstract] The claim to 'explicitly obtain the maximum bias curve' is load-bearing for the central contribution, yet the abstract immediately qualifies this by stating that 'slight variations in these inequalities allow us to visualize the maximum bias behavior.' It is unclear whether the derived bounds become equalities for some contaminating measure at each contamination level (as required for the exact supremum bias under the replacement model) or remain strict inequalities that only upper-bound the bias. This distinction must be resolved with explicit attainment arguments or counterexamples in the derivations.

Authors: We agree that the distinction between an upper bound and an attained supremum is central to the claim of an explicit maximum bias curve. In the derivations (Theorems 3.1 and 4.2 and the subsequent corollaries), the error inequalities obtained from the residual smallness depth are upper bounds on the asymptotic bias of the deepest scatter matrix. These bounds are sharp: for each fixed contamination level ε, there exist explicit contaminating distributions (point-mass contaminations placed at infinity in the direction that minimizes the halfspace depth) for which equality is attained. The phrase “slight variations in these inequalities” is used only to describe how different rearrangements of the same bound yield the plotted maximum-bias curve; the curve itself is the least upper bound and is achieved. We will revise the abstract to state explicitly that the bounds are attained and will add a short remark after Theorem 3.1 that constructs the attaining contamination measure. This change removes the ambiguity while preserving the original derivations.
revision: yes

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The work rests on standard domain assumptions in robust statistics regarding the existence of halfspace depth and the form of contamination.

axioms (1)

domain assumption · Halfspace depth is well-defined for the multivariate distributions under consideration.
Invoked throughout the analysis of Tukey's median and scatter matrices.

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

The Threshold Breakdown Point
math.ST 2026-05 unverdicted novelty 7.0

Introduces threshold breakdown point and m-sensitivity as new finite-sample robustness measures for M-estimators and tests, with consistency, asymptotic normality, and multiplier bootstrap inference.
The Threshold Breakdown Point
math.ST 2026-05 unverdicted novelty 7.0

Defines threshold breakdown point and m-sensitivity for M-estimators, derives their properties, extends to hypothesis testing, and supplies consistency, asymptotic normality, and multiplier bootstrap results.