pith. sign in

arxiv: 1907.00211 · v1 · pith:O67MYHZDnew · submitted 2019-06-29 · 💻 cs.LG · cs.CV· stat.ML

Robust Linear Discriminant Analysis Using Ratio Minimization of L1,2-Norms

Pith reviewed 2026-05-25 13:00 UTC · model grok-4.3

classification 💻 cs.LG cs.CVstat.ML
keywords robust linear discriminant analysisL1,2-norm ratio minimizationoutlier robustnesssubspace learningnon-smooth optimizationconvergence analysisdiscriminative projection
0
0 comments X

The pith

Linear discriminant analysis becomes robust to outliers by minimizing an L1,2-norm ratio instead of the squared L2-norm ratio.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper replaces the traditional LDA objective, which minimizes a ratio of squared L2-norms, with a ratio of L1,2-norms to reduce sensitivity to outliers. It derives an efficient algorithm to optimize this non-smooth ratio objective and supplies a convergence proof for the iteration. Experiments on synthetic data and nine real datasets are used to illustrate that the resulting projections separate classes more reliably under contamination. A sympathetic reader would care because LDA is a core tool for dimensionality reduction and classification, and outlier sensitivity limits its use on noisy data.

Core claim

We propose a novel robust linear discriminant analysis method based on the L1,2-norm ratio minimization. We derive a new efficient algorithm to solve this challenging problem, and provide a theoretical analysis on the convergence of our algorithm. The proposed algorithm is easy to implement, and converges fast in practice. Extensive experiments on both synthetic data and nine real benchmark data sets show the effectiveness of the proposed robust LDA method.

What carries the argument

L1,2-norm ratio minimization, which replaces the squared L2-norm ratio in the LDA objective and is solved by a new iterative algorithm whose convergence is analyzed.

If this is right

  • The new algorithm solves a previously intractable non-smooth ratio minimization problem for LDA.
  • Convergence of the iteration is guaranteed by the supplied theoretical analysis.
  • The method produces subspaces that separate classes more reliably than standard LDA when outliers are present.
  • The approach extends L1-norm robustness ideas from PCA to the discriminative ratio setting of LDA.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same ratio-minimization technique could be tested on other subspace methods that currently rely on L2-norm ratios.
  • If the algorithm scales well, it might support real-time updates of LDA models on streaming data with occasional outliers.
  • The convergence proof might be reusable for similar ratio objectives that arise in clustering or metric learning.

Load-bearing premise

That switching to an L1,2-norm ratio automatically confers robustness to outliers in the LDA objective without separate verification for the ratio-minimization case.

What would settle it

A controlled experiment on data with added outliers where the L1,2-norm ratio LDA yields lower classification accuracy or poorer class separation than standard LDA would falsify the robustness claim.

Figures

Figures reproduced from arXiv: 1907.00211 by Feiping Nie, Heng Huang, Hua Wang, Zheng Wang.

Figure 1
Figure 1. Figure 1: Two groups of data points sampled from two different Gaussian [PITH_FULL_IMAGE:figures/full_fig_p004_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: Classification accuracy of compared methods on all nine data sets [PITH_FULL_IMAGE:figures/full_fig_p024_2.png] view at source ↗
read the original abstract

As one of the most popular linear subspace learning methods, the Linear Discriminant Analysis (LDA) method has been widely studied in machine learning community and applied to many scientific applications. Traditional LDA minimizes the ratio of squared L2-norms, which is sensitive to outliers. In recent research, many L1-norm based robust Principle Component Analysis methods were proposed to improve the robustness to outliers. However, due to the difficulty of L1-norm ratio optimization, so far there is no existing work to utilize sparsity-inducing norms for LDA objective. In this paper, we propose a novel robust linear discriminant analysis method based on the L1,2-norm ratio minimization. Minimizing the L1,2-norm ratio is a much more challenging problem than the traditional methods, and there is no existing optimization algorithm to solve such non-smooth terms ratio problem. We derive a new efficient algorithm to solve this challenging problem, and provide a theoretical analysis on the convergence of our algorithm. The proposed algorithm is easy to implement, and converges fast in practice. Extensive experiments on both synthetic data and nine real benchmark data sets show the effectiveness of the proposed robust LDA method.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper proposes a novel robust LDA method by minimizing an L1,2-norm ratio (instead of the conventional squared-L2 ratio) in the between/within scatter objective. It derives a new iterative algorithm for the resulting non-smooth ratio optimization problem, supplies a convergence analysis for that algorithm, and reports experiments on synthetic outliers plus nine real benchmark datasets.

Significance. If the robustness property is established, the work would usefully extend L1-norm ideas from PCA to the supervised LDA setting and supply a practical solver for a previously unaddressed non-smooth ratio problem. The algorithmic derivation and convergence guarantee are the clearest technical contributions.

major comments (2)
  1. [Introduction / §2] Introduction and §2: the claim that the L1,2-norm ratio automatically confers outlier robustness is motivated solely by citation to prior L1-PCA results; no influence-function argument, breakdown-point bound, or perturbation analysis is supplied for the specific LDA ratio objective. This assumption is load-bearing for the central claim.
  2. [§4] §4 (experiments): synthetic-outlier tests evaluate an unproven modeling assumption rather than a derived robustness property; without the missing analysis, the performance gains cannot be attributed to the L1,2 formulation with certainty.
minor comments (2)
  1. [Abstract] Abstract: the nine real datasets and the quantitative metrics (accuracy, etc.) are not named.
  2. [§2] Notation: the precise definition of the L1,2-norm (row-wise or column-wise) should be stated explicitly before the objective is written.

Simulated Author's Rebuttal

2 responses · 1 unresolved

We thank the referee for the careful review and constructive feedback. We respond point-by-point to the major comments below.

read point-by-point responses
  1. Referee: [Introduction / §2] Introduction and §2: the claim that the L1,2-norm ratio automatically confers outlier robustness is motivated solely by citation to prior L1-PCA results; no influence-function argument, breakdown-point bound, or perturbation analysis is supplied for the specific LDA ratio objective. This assumption is load-bearing for the central claim.

    Authors: We agree that the robustness claim for the L1,2-norm ratio in the LDA objective rests on an analogy to L1-PCA results rather than a dedicated analysis (influence function, breakdown point, or perturbation) specific to the supervised ratio formulation. This is a substantive gap in the current manuscript. In revision we will rephrase the introduction and §2 to present the robustness as a motivated conjecture supported by the L1-norm literature and by the empirical results, without asserting an automatically conferred property. We do not supply the requested theoretical analysis. revision: partial

  2. Referee: [§4] §4 (experiments): synthetic-outlier tests evaluate an unproven modeling assumption rather than a derived robustness property; without the missing analysis, the performance gains cannot be attributed to the L1,2 formulation with certainty.

    Authors: The synthetic experiments demonstrate practical behavior under controlled outlier contamination, while the nine real datasets provide complementary evidence. We concur that, absent the missing theoretical analysis, performance differences remain empirical observations rather than direct proof of the L1,2 mechanism. In revision we will add a clarifying sentence in §4 stating that the reported gains constitute empirical support for the proposed method. revision: partial

standing simulated objections not resolved
  • A formal robustness analysis (influence function, breakdown point, or perturbation analysis) for the specific L1,2-norm ratio objective in LDA.

Circularity Check

0 steps flagged

No significant circularity; new algorithm and convergence analysis are independent of inputs

full rationale

The paper introduces a novel objective (L1,2-norm ratio for LDA) and derives a new solver plus convergence proof for the non-smooth ratio minimization. No step reduces by construction to a fitted parameter, self-citation chain, or renamed prior result. The robustness motivation cites external L1-PCA literature rather than self-citations that bear the central load. The derivation chain (objective definition → algorithm → convergence theorem) is self-contained and does not equate outputs to inputs via the enumerated patterns. Score 0 is the appropriate default when no explicit reduction is quotable.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The paper's contribution is algorithmic; it introduces no new free parameters, no invented entities, and relies only on standard domain assumptions about data matrices and class labels.

axioms (1)
  • domain assumption Data matrices admit well-defined within-class and between-class scatter structures suitable for LDA.
    Implicit prerequisite for applying any LDA variant.

pith-pipeline@v0.9.0 · 5740 in / 1199 out tokens · 43446 ms · 2026-05-25T13:00:29.896862+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

35 extracted references · 35 canonical work pages

  1. [1]

    A new and fast implementation for null space based linear discriminant analysis,

    D. Chu and G. S. Thye, “A new and fast implementation for null space based linear discriminant analysis,” Pattern Recognition, vol. 43(4), pp. 1373–1379, 2010

  2. [2]

    Characterization of a family of algorithms for generalized dis- criminant analysis on undersampled problems,

    J. Ye, “Characterization of a family of algorithms for generalized dis- criminant analysis on undersampled problems,” Journal of Machine Learning Research, vol. 6, pp. 483–502, 2005

  3. [3]

    Multi-class discriminant kernel learning via convex programming,

    J. Ye, S. Ji, and J. Chen, “Multi-class discriminant kernel learning via convex programming,” The Journal of Machine Learning Research , vol. 9, pp. 719–758, 2008

  4. [4]

    Computational and theoretical analysis of null space and orthogonal linear discriminant analysis,

    J. Ye and T. Xiong, “Computational and theoretical analysis of null space and orthogonal linear discriminant analysis,” Journal of Machine Learning Research, vol. 7, pp. 581–599, 2006

  5. [5]

    Least squares linear discriminant analysis,

    J. Ye, “Least squares linear discriminant analysis,” Proceedings of the 24th international conference on Machine learning , pp. 1087–1093, 2007

  6. [6]

    Hierarchical linear discriminant analysis for beamforming,

    J. Choo, B. L. Drake, and H. Park, “Hierarchical linear discriminant analysis for beamforming,” Siam Data Mining (SDM) , pp. 894–905, 2009

  7. [7]

    Adaptive nonlinear discriminant analysis by regularized minimum squared errors,

    H. Kim, B. L. Drake, and H. Park, “Adaptive nonlinear discriminant analysis by regularized minimum squared errors,” IEEE Trans. Knowl. Data Eng. , vol. 18(5), pp. 603–612, 2006. 20

  8. [8]

    A comparison of generalized linear discrim- inant analysis algorithms,

    C. H. Park and H. Park, “A comparison of generalized linear discrim- inant analysis algorithms,” Pattern Recognition, vol. 41(3), pp. 1083– 1097, 2008

  9. [9]

    Submanifold-preserving dis- criminant analysis with an auto-optimized graph,

    F. Nie, Z. Wang, R. Wang, and X. Li, “Submanifold-preserving dis- criminant analysis with an auto-optimized graph,” IEEE transactions on cybernetics, 2019

  10. [10]

    Trace ratio vs. ratio trace for dimensionality reduction,

    H. Wang, S. Yan, D. Xu, X. Tang, and T. S. Huang, “Trace ratio vs. ratio trace for dimensionality reduction,” in CVPR, 2007

  11. [11]

    Trace ratio problem revisited,

    Y. Jia, F. Nie, and C. Zhang, “Trace ratio problem revisited,” IEEE Transactions on Neural Networks , vol. 20, no. 4, pp. 729–735, 2009

  12. [12]

    P. J. Huber, Robust Statistics. Wiley, 1981

  13. [13]

    Robust lda classification by subsampling,

    S. Fidler and A. Leonardis, “Robust lda classification by subsampling,” Conference on Computer Vision and Pattern Recognition Workshop , p. 97, 2003

  14. [14]

    Robust fisher discriminant analysis,

    S. Kim, A. Magnani, and S. Boyd, “Robust fisher discriminant analysis,” Advances in Neural Information Processing Systems , p. 97, 2006

  15. [15]

    Classification efficiencies for robust linear discriminant analysis,

    C. Croux, P. Filzmoser, and K. Joossens, “Classification efficiencies for robust linear discriminant analysis,”Statistica Sinica, pp. 581–599, 2008

  16. [16]

    A new formulation of linear discriminant analysis for robust dimensionality reduction,

    H. Zhao, Z. Wang, and F. Nie, “A new formulation of linear discriminant analysis for robust dimensionality reduction,” IEEE Transactions on Knowledge and Data Engineering , vol. 31, no. 4, pp. 629–640, 2018

  17. [17]

    A l1-norm pca and heuris- tic approach,

    A. Baccini, P. Besse, and A. de Faguerolles, “A l1-norm pca and heuris- tic approach,” International Conference on Ordinal and Symbolic Data Analysis, pp. 359–368, 1996

  18. [18]

    Robust l1 principal component analysis and its bayesian vari- ational inference,

    J. Gao, “Robust l1 principal component analysis and its bayesian vari- ational inference,” Neural Computation, vol. 20, pp. 555–572, 2008

  19. [19]

    Robust l1 norm factorization in the presence of outliers and missing data by alternative convex programming,

    Q. Ke and T. Kanade, “Robust l1 norm factorization in the presence of outliers and missing data by alternative convex programming,” IEEE Conf. Computer Vision and Pattern Recognition , pp. 592–599, 2004. 21

  20. [20]

    R1-pca: Rotational invariant l1- norm principal component analysis for robust subspace factorization,

    C. Ding, D. Zhou, X. He, and H. Zha, “R1-pca: Rotational invariant l1- norm principal component analysis for robust subspace factorization,” Int’l Conf. Machine Learning , 2006

  21. [21]

    Principal component analysis based on l1-norm maximiza- tion,

    N. Kwak, “Principal component analysis based on l1-norm maximiza- tion,” IEEE Transactions on Pattern Analysis and Machine Intelligence , vol. 30, pp. 1672–1680, 2008

  22. [22]

    Robust principal component analysis: Exact recovery of corrupted,

    J. Wright, A. Ganesh, S. Rao, Y. Peng, and Y. Ma, “Robust principal component analysis: Exact recovery of corrupted,” Advances in Neural Information Processing Systems , p. 116, 2009

  23. [23]

    Optimal mean robust principal compo- nent analysis,

    F. Nie, J. Yuan, and H. Huang, “Optimal mean robust principal compo- nent analysis,” in Proceedings of the 31st International Conference on Machine Learning (ICML) , 2014, pp. 1062–1070

  24. [24]

    Robust l1-norm two-dimensional linear discriminant analysis,

    C.-N. Li, Y.-H. Shao, and N.-Y. Deng, “Robust l1-norm two-dimensional linear discriminant analysis,” Neural Networks, vol. 65, pp. 92–104, 2015

  25. [25]

    L1-norm dis- tance linear discriminant analysis based on an effective iterative algo- rithm,

    Q. Ye, J. Yang, F. Liu, C. Zhao, N. Ye, and T. Yin, “L1-norm dis- tance linear discriminant analysis based on an effective iterative algo- rithm,” IEEE Transactions on Circuits and Systems for Video Technol- ogy, vol. 28, no. 1, pp. 114–129, 2016

  26. [26]

    A non-greedy algorithm for l1-norm lda,

    Y. Liu, Q. Gao, S. Miao, X. Gao, F. Nie, and Y. Li, “A non-greedy algorithm for l1-norm lda,” IEEE Transactions on Image Processing , vol. 26, no. 2, pp. 684–695, 2016

  27. [27]

    Linear discriminant analysis based on l1-norm maximization,

    F. Zhong and J. Zhang, “Linear discriminant analysis based on l1-norm maximization,” IEEE Transactions on Image Processing , vol. 22, no. 8, pp. 3018–3027, 2013

  28. [28]

    Efficient and robust feature selection via joint ℓ2,1-norms minimization,

    F. Nie, H. Huang, X. Cai, and C. Ding, “Efficient and robust feature selection via joint ℓ2,1-norms minimization,” in NIPS, 2010

  29. [29]

    Http://www.cl.cam.ac.uk/research/dtg/attarchive/ facedatabase.html

  30. [30]

    From few to many: Illumination cone models for face recognition under variable lighting and pose,

    A. Georghiades, P. Belhumeur, and D. Kriegman, “From few to many: Illumination cone models for face recognition under variable lighting and pose,” IEEE Trans. Pattern Anal. Mach. Intelligence , vol. 23, no. 6, pp. 643–660, 2001. 22

  31. [31]

    The cmu pose, illumination, and expression database,

    T. Sim and S. Baker, “The cmu pose, illumination, and expression database,” IEEE Transactions on PAMI, vol. 25, no. 12, pp. 1615–1617, 2003

  32. [32]

    Fukunaga, Introduction to statistical pattern recognition

    K. Fukunaga, Introduction to statistical pattern recognition . Academic Press, 1990

  33. [33]

    Locality Preserving Projections,

    X. He and P. Niyogi, “Locality Preserving Projections,” in NIPS, 2003

  34. [34]

    Efficient and robust feature extraction by maximum margin criterion,

    H. Li, T. Jiang, and K. Zhang, “Efficient and robust feature extraction by maximum margin criterion,” in NIPS, 2004

  35. [35]

    Discriminant Laplacian Embedding,

    H. Wang, H. Huang, and C. Ding, “Discriminant Laplacian Embedding,” in AAAI, 2010. 23 0 5% 10% 15% 20% 25% 30% 35% 40%0.4 0.5 0.6 0.7 0.8 Noise in training data Classification accuracy k−NN SVM MMC LDA LPP−Supervised trLDA DLE rLDA (our method) (a) Vehicle data set. 0 5% 10% 15% 20% 25% 30% 35% 40%0.7 0.75 0.8 0.85 0.9 0.95 1 Noise in training data Classi...