Robust discriminant analysis
Pith reviewed 2026-05-23 22:25 UTC · model grok-4.3
The pith
Standard discriminant analysis fails under outliers or label errors, but robust versions using resistant location and scatter estimates stay reliable.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The paper establishes that classical discriminant analysis estimates the location and scatter of each class with non-robust estimators, which makes it sensitive to outliers and mislabeled observations. It then reviews robust discriminant analysis methods that substitute resistant estimators of location and scatter, so that classifications remain reliable in the presence of deviating cases, and it covers graphical tools for visualizing the results.
What carries the argument
Robust estimates of location and scatter computed separately for each class, which replace the arithmetic mean and sample covariance matrix when forming the discriminant rules.
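To make the substitution concrete, here is a minimal univariate sketch. It is not taken from the paper: the toy data, the function names, and the choice of median and MAD as the robust pair are all assumptions made for illustration. The methods the paper reviews work with multivariate estimators of location and scatter, which this one-dimensional example only mimics.

```python
import math
import statistics

def classical_fit(values):
    """Arithmetic mean and sample standard deviation (non-robust)."""
    return statistics.mean(values), statistics.stdev(values)

def robust_fit(values):
    """Median and MAD; the factor 1.4826 makes the MAD consistent
    for the standard deviation at the normal model."""
    center = statistics.median(values)
    mad = statistics.median([abs(v - center) for v in values])
    return center, 1.4826 * mad

def quadratic_score(x, center, scale, prior):
    """Univariate Gaussian discriminant score with a class-specific scale."""
    return -math.log(scale) - 0.5 * ((x - center) / scale) ** 2 + math.log(prior)

def classify(x, training, fit):
    """Assign x to the class with the highest discriminant score."""
    n_total = sum(len(values) for values in training.values())
    scores = {}
    for label, values in training.items():
        center, scale = fit(values)
        scores[label] = quadratic_score(x, center, scale, len(values) / n_total)
    return max(scores, key=scores.get)

# Hypothetical toy data: class "a" contains one gross outlier (25.0).
training = {
    "a": [1.0, 1.2, 0.8, 1.1, 0.9, 1.3, 0.7, 25.0],
    "b": [3.0, 3.2, 2.8, 3.1, 2.9, 3.3, 2.7, 3.0],
}

x_new = 3.6  # lies just beyond class "b", far from the bulk of class "a"
print("classical rule:", classify(x_new, training, classical_fit))
print("robust rule:   ", classify(x_new, training, robust_fit))
```

In this toy setting the single outlier pulls the classical center of class "a" to 4.0 and inflates its standard deviation to about 8.5, so the classical rule assigns the new point to "a". The median and MAD stay near the bulk of the class, and the robust rule assigns the point to "b".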
If this is right
- Classifications obtained from robust discriminant analysis stay stable when a moderate fraction of the data are outliers.
- Mislabeled observations exert reduced influence on the estimated class boundaries.
- Graphical diagnostic tools allow identification of the suspicious points that affect the analysis.
- Both linear and quadratic forms of discriminant analysis can be made robust by the same replacement of estimators.
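For reference, the linear and quadratic rules mentioned in the last point can be written in standard textbook form as below. The notation is an assumption and may differ from the paper's: $\hat\mu_k$, $\hat\Sigma_k$ and $\hat\pi_k$ denote the estimated center, scatter matrix and prior probability of class $k$, and $\hat\Sigma$ is a common scatter matrix shared by all classes.

```latex
% Quadratic rule: class-specific scatter matrices
d_k^{Q}(x) = -\tfrac{1}{2}\ln\det\hat\Sigma_k
             -\tfrac{1}{2}(x-\hat\mu_k)^{\top}\hat\Sigma_k^{-1}(x-\hat\mu_k)
             +\ln\hat\pi_k

% Linear rule: one common scatter matrix for all classes
d_k^{L}(x) = \hat\mu_k^{\top}\hat\Sigma^{-1}x
             -\tfrac{1}{2}\hat\mu_k^{\top}\hat\Sigma^{-1}\hat\mu_k
             +\ln\hat\pi_k

% In both cases x is assigned to the class with the largest score
\hat{k}(x) = \arg\max_k \, d_k(x)
```

The classical rules plug in the arithmetic mean and sample covariance matrix. The robust versions keep the same formulas and plug in robust estimates instead. How the common scatter matrix $\hat\Sigma$ is obtained robustly differs between proposals, so it is treated as given here.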
Where Pith is reading between the lines
- The same substitution of robust estimators could be tested in related supervised methods such as logistic regression to check for similar gains.
- In high-dimensional settings the robust estimators would need dimension-reduction steps first, an extension the paper does not address.
- Benchmark comparisons on public datasets with known contamination levels would give quantitative evidence of the improvement over classical rules.
- The approach suggests that any classification procedure relying on second-moment summaries could benefit from analogous robust replacements.
Load-bearing premise
Robust estimators of location and scatter can be computed for each class and will yield classifications that remain reliable when the data contain outliers or label errors.
What would settle it
A controlled experiment on data with 5-10 percent added outliers or swapped labels, in which the robust methods produce higher misclassification rates than classical discriminant analysis, would refute the reliability claim.
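One possible shape for such an experiment is sketched below, under strong simplifying assumptions that do not come from the paper: one-dimensional Gaussian classes, equal priors, outliers added to one class only, and median/MAD as the robust estimators. All names and parameter values are hypothetical.

```python
import math
import random
import statistics

def fit_classical(xs):
    return statistics.mean(xs), statistics.stdev(xs)

def fit_robust(xs):
    med = statistics.median(xs)
    mad = statistics.median([abs(x - med) for x in xs])
    return med, 1.4826 * mad

def score(x, center, scale):
    # Gaussian log-density up to a constant; equal priors are assumed.
    return -math.log(scale) - 0.5 * ((x - center) / scale) ** 2

def error_rate(fit, train_a, train_b, test_a, test_b):
    """Fit on (possibly contaminated) training data, evaluate on clean test data."""
    params_a, params_b = fit(train_a), fit(train_b)
    wrong = sum(score(x, *params_a) <= score(x, *params_b) for x in test_a)
    wrong += sum(score(x, *params_b) < score(x, *params_a) for x in test_b)
    return wrong / (len(test_a) + len(test_b))

def run_experiment(contamination=0.10, n_train=200, n_test=2000, seed=0):
    rng = random.Random(seed)
    train_a = [rng.gauss(0.0, 1.0) for _ in range(n_train)]
    train_b = [rng.gauss(3.0, 1.0) for _ in range(n_train)]
    # Replace a fraction of class a by gross outliers located far beyond class b.
    n_bad = round(contamination * n_train)
    train_a[:n_bad] = [rng.gauss(30.0, 1.0) for _ in range(n_bad)]
    # The test data are clean, so errors reflect the fitted rule only.
    test_a = [rng.gauss(0.0, 1.0) for _ in range(n_test)]
    test_b = [rng.gauss(3.0, 1.0) for _ in range(n_test)]
    return {
        "classical": error_rate(fit_classical, train_a, train_b, test_a, test_b),
        "robust": error_rate(fit_robust, train_a, train_b, test_a, test_b),
    }

result = run_experiment()
print(result)
```

A full study would repeat this over many seeds, contamination levels and contamination types (including swapped labels), and would use the multivariate estimators the paper reviews. A consistent finding that the robust error rate exceeds the classical one in such settings would count against the reliability claim.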
Original abstract
Discriminant analysis (DA) is one of the most popular methods for classification due to its conceptual simplicity, low computational cost, and often solid performance. In its standard form, DA uses the arithmetic mean and sample covariance matrix to estimate the center and scatter of each class. We discuss and illustrate how this makes standard DA very sensitive to suspicious data points, such as outliers and mislabeled cases. We then present an overview of techniques for robust DA, which are more reliable in the presence of deviating cases. In particular, we review DA based on robust estimates of location and scatter, along with graphical diagnostic tools for visualizing the results of DA.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript reviews how classical discriminant analysis (DA), which relies on the arithmetic mean and sample covariance for each class, is highly sensitive to outliers and mislabeled observations. It then surveys robust DA approaches that substitute robust estimators of location and scatter, and discusses associated graphical diagnostic tools for visualizing results under contamination.
Significance. As a descriptive review rather than a source of new theorems or algorithms, the paper's value lies in consolidating known results from robust statistics literature and illustrating the practical limitations of standard DA. If the survey is balanced and cites the key references accurately, it could serve as a useful entry point for practitioners, but it does not advance novel methodology or provide original empirical validation.
Minor comments (3)
- The abstract states that robust DA techniques 'are more reliable in the presence of deviating cases' but does not specify the contamination models or breakdown points considered; adding a brief qualification would improve precision without altering the review character.
- The description of standard DA sensitivity would benefit from a short numerical illustration (e.g., a small contaminated dataset) early in the text to make the claim concrete before moving to the overview of robust methods.
- Ensure that each cited robust estimator (e.g., MCD, M-estimators) is accompanied by at least one key reference at first mention, so that readers can locate the original methodological papers.
Simulated Author's Rebuttal
We thank the referee for the positive evaluation and recommendation of minor revision. The report provides no specific major comments to address point-by-point, and we agree with the characterization of the manuscript as a descriptive review consolidating existing results rather than introducing new methodology.
Circularity Check
Review paper with no derivation chain or predictions
This manuscript is an overview article surveying standard discriminant analysis sensitivity to outliers and existing robust alternatives based on robust location/scatter estimators. No new theorems, derivations, fitted parameters, or predictions are advanced; the text reviews prior techniques and diagnostics from the robust statistics literature without introducing self-referential steps or reducing claims to inputs by construction. The central claims are descriptive restatements of established properties, with no load-bearing self-citations or ansatzes that could create circularity.