A New Regression Lens on Multi-Class Classification

Bingqing Li; Marten Wegkamp; Xin Bing

arXiv: 2402.14260 · v4 · submitted 2024-02-22 · stat.ME

Pith reviewed 2026-05-24 04:10 UTC · model grok-4.3

classification stat.ME

keywords linear discriminant analysis · multivariate response regression · multi-class classification · regularized regression · reduced-rank regression · excess misclassification risk · l1 regularization

The pith

An explicit link between LDA discriminant directions and multivariate regression coefficients yields a new framework for multi-class classification.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that the directions separating classes under linear discriminant analysis correspond directly to the coefficients from regressing class-indicator variables on the features. This correspondence converts the classification task into a standard multivariate regression problem. Any regression procedure, including regularized or nonparametric variants, can therefore be substituted while retaining the original LDA decision boundaries. The authors further supply a general method to bound the excess misclassification risk of the resulting classifier for arbitrary regression estimators.

Core claim

Under the modeling assumptions used to derive the LDA classifier, the discriminant directions are explicit linear functions of the regression coefficients obtained from a multivariate response regression of the class indicators. This identity produces a regression-based multi-class classifier whose decision rule matches LDA exactly, yet admits structured, regularized, and nonparametric regression methods. The same identity also supports a uniform strategy for proving excess-risk bounds that apply to every regression procedure employed in the framework.

What carries the argument

The explicit algebraic relationship that maps LDA discriminant directions to the coefficient matrix of a multivariate response regression.
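
An editorial sketch of why an identity of this form can hold, under simplifying assumptions made here for illustration and not taken from the paper: the features are centered, $E[X] = 0$, and the regression has no intercept. The symbols $M$, $D_\pi$, and the particular $H$ below are notation introduced for this sketch; the paper's own normalization may differ.

```latex
% Setup: Y \in \{e_1, \dots, e_L\}, \pi_\ell = P(Y = e_\ell) > 0,
% E[X \mid Y = e_\ell] = \mu_\ell, Cov(X \mid Y = e_\ell) = \Sigma_w, E[X] = 0.
\begin{align*}
M &= (\mu_1, \dots, \mu_L) \in \mathbb{R}^{p \times L}, \qquad
D_\pi = \mathrm{diag}(\pi_1, \dots, \pi_L), \\
\Sigma &= E[X X^\top] = \Sigma_w + M D_\pi M^\top, \qquad
E[X Y^\top] = M D_\pi, \\
B &= \Sigma^{-1} E[X Y^\top] = \Sigma^{-1} M D_\pi, \qquad
B^* = \Sigma_w^{-1} M, \\
\Sigma B^* &= (\Sigma_w + M D_\pi M^\top) \Sigma_w^{-1} M
  = M D_\pi \bigl( D_\pi^{-1} + M^\top \Sigma_w^{-1} M \bigr), \\
B^* &= B H^{-1}, \qquad H^{-1} = D_\pi^{-1} + M^\top \Sigma_w^{-1} M .
\end{align*}
```

Here $H^{-1}$ is the sum of a positive definite and a positive semidefinite matrix, so it is invertible. Only the class means and the common within-class covariance enter this algebra; the Gaussian assumption is what makes $B^*$ the directions of the Bayes rule.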

If this is right

Any structured or regularized regression method can be used directly for multi-class classification while preserving LDA decision boundaries.
Excess misclassification risk bounds can be derived uniformly for every regression procedure placed inside the framework.
Complete theoretical guarantees now exist for l1-regularized regression and reduced-rank regression in the LDA setting.
The same regression formulation supports nonparametric methods whose risk properties translate immediately into classification guarantees.
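
A minimal two-step sketch of how a regression fit could be turned into a classifier: regress the one-hot labels on the centered features, project the features onto the fitted coefficients, then run ordinary LDA on the low-dimensional projections. Whether this matches the authors' estimator is an assumption; the function names `lda_scores` and `regression_classifier`, the ridge penalty, and the simulated data are all hypothetical choices made for illustration.

```python
import numpy as np

def lda_scores(Z, Y, Z_new):
    """Linear discriminant scores from features Z (n x d) and one-hot labels Y (n x L)."""
    n, L = Y.shape
    counts = Y.sum(axis=0)
    means = (Y.T @ Z) / counts[:, None]            # L x d class means
    resid = Z - Y @ means                          # within-class residuals
    pooled = resid.T @ resid / (n - L)             # pooled within-class covariance
    # The projected features are rank deficient, so use a pseudo-inverse
    # with an explicit cutoff for numerically zero singular values.
    directions = np.linalg.pinv(pooled, rcond=1e-10) @ means.T   # d x L
    offsets = -0.5 * np.sum(means.T * directions, axis=0) + np.log(counts / n)
    return Z_new @ directions + offsets

def regression_classifier(X, Y, X_new, ridge=0.0):
    """Step 1: (ridge) least squares of Y on centered X. Step 2: LDA on X @ B."""
    center = X.mean(axis=0)
    Xc, Xn = X - center, X_new - center
    p = X.shape[1]
    B = np.linalg.solve(Xc.T @ Xc + ridge * np.eye(p), Xc.T @ Y)   # p x L
    return np.argmax(lda_scores(Xc @ B, Y, Xn @ B), axis=1)

# Illustrative data: three Gaussian classes with identity covariance.
rng = np.random.default_rng(1)
p, L, n_per = 6, 3, 150
class_means = rng.normal(scale=1.5, size=(L, p))
labels = np.repeat(np.arange(L), n_per)
X = class_means[labels] + rng.normal(size=(labels.size, p))
Y = np.eye(L)[labels]
X_new = class_means[rng.integers(L, size=300)] + rng.normal(size=(300, p))

center = X.mean(axis=0)
pred_lda = np.argmax(lda_scores(X - center, Y, X_new - center), axis=1)
pred_ols = regression_classifier(X, Y, X_new)               # plain least squares
pred_ridge = regression_classifier(X, Y, X_new, ridge=10.0) # regularized variant
agreement = np.mean(pred_lda == pred_ols)
```

With plain least squares in the first step, the projected-feature classifier should reproduce the classical LDA predictions on this sample, so `agreement` is the quantity to inspect. Swapping in the ridge fit changes only the first step, which is the sense in which other regression procedures can be substituted.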

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Progress on high-dimensional or sparse multivariate regression immediately supplies new classification procedures with accompanying risk bounds.
The regression lens may be applied to other linear classifiers by deriving analogous coefficient-to-direction identities.
Empirical work could test whether the regression formulation improves finite-sample performance even when the Gaussian assumption is mildly violated.

Load-bearing premise

The algebraic relationship between discriminant directions and regression coefficients holds exactly when the class-conditional distributions are Gaussian and share a common covariance matrix.

What would settle it

Generate data from equal-covariance Gaussian classes, compute both the LDA directions and the regression coefficients, and check whether they satisfy the claimed linear relationship; mismatch on such data would disprove the identity.
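
The check described above can be sketched numerically. The following is an illustrative simulation, not the paper's code: it uses centered features, a no-intercept least-squares fit, and a sample analogue of the linking matrix, `H_inv`, derived for this sketch under those assumptions. The dimensions, seed, and variable names are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
p, L, n_per = 5, 3, 200

# Equal-covariance Gaussian classes.
A = rng.normal(size=(p, p))
sigma_w = A @ A.T + p * np.eye(p)
class_means = rng.normal(scale=2.0, size=(L, p))
labels = np.repeat(np.arange(L), n_per)
noise = rng.multivariate_normal(np.zeros(p), sigma_w, size=labels.size)
X = class_means[labels] + noise
X = X - X.mean(axis=0)                 # center the features
Y = np.eye(L)[labels]                  # one-hot class indicators

counts = Y.sum(axis=0)
M = (X.T @ Y) / counts                 # p x L matrix of (centered) class means
resid = X - Y @ M.T
S_w = resid.T @ resid                  # within-class scatter matrix

# LDA directions (up to the scalar 1/(n - L) from using scatter, not covariance).
B_lda = np.linalg.solve(S_w, M)
# Least-squares coefficients of the indicators on the features, no intercept.
B_ols = np.linalg.lstsq(X, Y, rcond=None)[0]

# Sample analogue of the linking matrix (an assumption of this sketch).
H_inv = np.diag(1.0 / counts) + M.T @ B_lda
rel_gap = np.max(np.abs(B_ols @ H_inv - B_lda)) / np.max(np.abs(B_lda))
h_rank = np.linalg.matrix_rank(H_inv)
print(f"relative gap: {rel_gap:.2e}, rank of H_inv: {h_rank}")
```

A relative gap at the level of floating-point error, together with a full-rank `H_inv`, is what the claimed linear relationship predicts; a gap far above rounding error on such data would count against it.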

Figures

Figures reproduced from arXiv: 2402.14260 by Bingqing Li, Marten Wegkamp, Xin Bing.

**Figure 2.** The averaged misclassification errors in low-rank model (1).

**Figure 3.** The averaged misclassification errors in low-rank model (2).

**Figure 4.** Visualization in the space of the first two discriminant vectors.

Original abstract

Linear Discriminant Analysis (LDA) is a fundamental method for classification. Its simple linear structure facilitates interpretation, and it is naturally suited to multi-class settings. LDA is also closely connected to several classical multivariate techniques, including Fisher's discriminant analysis, canonical correlation analysis, and linear regression. In this paper, we strengthen the connection between LDA and multivariate response regression by establishing an explicit relationship between discriminant directions and regression coefficients. This characterization yields a new regression-based framework for multi-class classification that accommodates structured, regularized, and even non-parametric regression methods. In contrast to existing regression-based approaches, our formulation is particularly amenable to theoretical analysis: we develop a general strategy for deriving bounds on the excess misclassification risk of the proposed classifier across all such regression procedures. As concrete applications, we provide complete theoretical guarantees for two widely used methods -- $\ell_1$-regularization and reduced-rank regression -- neither of which has previously been fully analyzed in the LDA context. The theoretical results are supported by extensive simulation studies and empirical evaluations on real data.
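
The two regression methods named in the abstract can be sketched as first-step estimators of the coefficient matrix. This is an illustrative implementation under stated assumptions, not the authors' code: the entrywise form of the ℓ1 penalty, the proximal-gradient (ISTA) solver, the unweighted form of reduced-rank regression, the tuning values, and all function names are choices made here.

```python
import numpy as np

def soft_threshold(A, t):
    """Entrywise soft-thresholding, the proximal map of t * sum |A_jk|."""
    return np.sign(A) * np.maximum(np.abs(A) - t, 0.0)

def lasso_multiresponse(X, Y, lam, n_iter=500):
    """ISTA for (1/2n) * ||Y - X B||_F^2 + lam * sum |B_jk|."""
    n, p = X.shape
    step = n / np.linalg.norm(X, 2) ** 2      # 1 / Lipschitz constant of the gradient
    B = np.zeros((p, Y.shape[1]))
    for _ in range(n_iter):
        grad = X.T @ (X @ B - Y) / n
        B = soft_threshold(B - step * grad, step * lam)
    return B

def reduced_rank(X, Y, rank):
    """Classical reduced-rank regression: project the least-squares fit onto
    the top right singular vectors of the fitted values."""
    B_ols = np.linalg.lstsq(X, Y, rcond=None)[0]
    _, _, Vt = np.linalg.svd(X @ B_ols, full_matrices=False)
    V = Vt[:rank].T
    return B_ols @ V @ V.T

# Illustrative data: 3 classes, 20 features, only the first 3 informative.
rng = np.random.default_rng(2)
p, L, n_per = 20, 3, 100
class_means = np.zeros((L, p))
class_means[:, :3] = rng.normal(scale=2.0, size=(L, 3))
labels = np.repeat(np.arange(L), n_per)
X = class_means[labels] + rng.normal(size=(labels.size, p))
X = X - X.mean(axis=0)
Y = np.eye(L)[labels]

lam_max = np.max(np.abs(X.T @ Y)) / X.shape[0]   # above this, the fit is all zeros
B_l1 = lasso_multiresponse(X, Y, lam=0.1 * lam_max)
B_zero = lasso_multiresponse(X, Y, lam=1.1 * lam_max)
B_rr = reduced_rank(X, Y, rank=1)
```

Either coefficient matrix can then be passed to a second step that converts regression coefficients into a classification rule. The penalty level and target rank would normally be chosen by cross-validation or a theoretical rule rather than fixed as they are here.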

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note

Private letter to a colleague

The paper makes the LDA-to-multivariate-regression link explicit and delivers complete excess-risk bounds for ℓ1-regularized and reduced-rank regression, but the general strategy for arbitrary regressions needs consistency rates that the abstract does not supply.

The core advance is an explicit population-level map from LDA discriminant directions to the coefficients of a multivariate response regression. That map lets the authors treat the LDA classifier as a plug-in regression estimator and then bound its excess misclassification risk for any regression procedure that targets those coefficients. They work out the details completely for ℓ1-regularized regression and for reduced-rank regression, neither of which had full LDA-style guarantees before. The simulations and real-data checks are standard but sufficient to show the methods behave as the bounds predict under the usual Gaussian assumptions. That part is useful and cleanly executed.

The general excess-risk strategy is the softer spot. It is stated to apply to structured, regularized, and even non-parametric regressions, yet the bound only closes if the regression estimator is consistent for the population least-squares coefficients. Without rate assumptions on the regression error, the excess-risk claim becomes vacuous for methods that do not converge at the required speed. The concrete theorems are supplied only for the two linear/structured estimators, so the broader claim rests on an implicit consistency condition that is not spelled out in the abstract. If the full proofs add the missing rates or restrict the general statement, the concern disappears; otherwise it needs tightening.

The work is aimed at statisticians who already care about LDA extensions and regression-based classification. A reader in that niche will find the explicit relationship and the two complete analyses worth citing. It is coherent on its own terms and formally grounded enough to merit referee time rather than a desk reject.

Referee Report

2 major / 1 minor

Summary. The paper establishes an explicit relationship between LDA discriminant directions and coefficients from multivariate response regression under Gaussian class-conditional distributions with shared covariance. This yields a regression-based multi-class classifier that can incorporate structured, regularized, or non-parametric regression estimators, together with a general strategy for bounding excess misclassification risk and complete theoretical guarantees for ℓ1-regularized and reduced-rank regression.

Significance. If the derivations hold, the work supplies a theoretically analyzable regression lens on LDA that permits modern regression tools while retaining decision-boundary equivalence under the stated assumptions. The provision of full risk bounds for two concrete estimators (neither previously fully analyzed in the LDA setting) and the accompanying empirical studies constitute a clear contribution.

major comments (2)

[Abstract, second paragraph] The claim that the framework 'accommodates ... even non-parametric regression methods' and supplies a 'general strategy for deriving bounds on the excess misclassification risk ... across all such regression procedures' is load-bearing. The population-level equivalence holds exactly only under the LDA assumptions; for non-parametric estimators the resulting classifier recovers the LDA rule only upon consistency to the population least-squares coefficients. The manuscript must state whether the general bound strategy is unconditional or implicitly requires regression consistency rates (which are not guaranteed for arbitrary non-parametric procedures).

[Theoretical development: the section deriving the explicit relationship and the general bound strategy] The excess-risk bound for arbitrary regression procedures should be stated with an explicit hypothesis on the regression estimator (e.g., a rate condition on ||β̂ - β||). Without this, the bound for non-parametric methods is either vacuous or reduces to the consistency case already covered by the concrete ℓ1 and reduced-rank analyses.

minor comments (1)

[Abstract] The abstract states that the relationship 'holds exactly under the modeling assumptions used to derive the LDA classifier'; the corresponding theorem should restate these assumptions (Gaussian class-conditionals, common covariance) verbatim for clarity.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful reading and constructive comments. We agree that the abstract and theoretical sections require clarification on the conditions for the general excess-risk bound strategy, particularly its dependence on regression consistency. We will make the necessary revisions to address both points.

Referee: [Abstract, second paragraph] The claim that the framework 'accommodates ... even non-parametric regression methods' and supplies a 'general strategy for deriving bounds on the excess misclassification risk ... across all such regression procedures' is load-bearing. The population-level equivalence holds exactly only under the LDA assumptions; for non-parametric estimators the resulting classifier recovers the LDA rule only upon consistency to the population least-squares coefficients. The manuscript must state whether the general bound strategy is unconditional or implicitly requires regression consistency rates (which are not guaranteed for arbitrary non-parametric procedures).

Authors: We agree that the population-level equivalence between the LDA rule and the regression-based classifier holds under the stated Gaussian assumptions, and that non-parametric estimators recover the LDA decision boundary only when consistent for the population coefficients. The general bound strategy expresses excess misclassification risk in terms of the regression estimation error; without a consistency rate on this error the bound does not guarantee vanishing excess risk. We will revise the abstract to state explicitly that the framework accommodates regression procedures (including non-parametric ones) for which consistency rates are available, and that the general strategy yields bounds conditional on the regression error. This removes any implication of unconditional validity.

Revision: yes

Referee: [Theoretical development: the section deriving the explicit relationship and the general bound strategy] The excess-risk bound for arbitrary regression procedures should be stated with an explicit hypothesis on the regression estimator (e.g., a rate condition on ||β̂ - β||). Without this, the bound for non-parametric methods is either vacuous or reduces to the consistency case already covered by the concrete ℓ1 and reduced-rank analyses.

Authors: The referee correctly identifies that the current presentation of the general bound leaves the dependence on regression error implicit. We will add an explicit hypothesis in the theoretical development section (e.g., 'Assume ||β̂ - β|| = O_p(r_n) with r_n → 0'). The excess-risk bound will then be stated under this hypothesis, with the concrete ℓ1 and reduced-rank analyses supplying the specific rates that satisfy it. This distinguishes the general strategy from the fully analyzed cases and prevents the bound from appearing vacuous for arbitrary non-parametric estimators.

Revision: yes

Circularity Check

0 steps flagged

No significant circularity; central relationship derived from standard LDA population assumptions

The paper establishes an explicit population-level relationship between LDA discriminant directions and multivariate regression coefficients under the usual Gaussian class-conditional model with shared covariance. This identity is a direct algebraic consequence of the model assumptions and is not obtained by fitting parameters to data or by renaming a fitted quantity as a prediction. The subsequent regression-based classifier framework and excess-risk bounds are developed from this identity and apply to arbitrary regression procedures (with concrete guarantees only for structured linear estimators). No self-citation is invoked as a load-bearing uniqueness theorem, no ansatz is smuggled via prior work, and the derivation does not reduce to its own inputs by construction. The provided abstract and context give no evidence of any of the enumerated circularity patterns.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

Review performed on abstract only; ledger therefore limited to standard background assumptions implied by LDA.

axioms (1)

Domain assumption: class-conditional distributions are multivariate Gaussian with a common covariance matrix (the standard LDA modeling assumption).
Required for the discriminant directions to be explicit linear functions of the regression coefficients, as claimed.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel
Relation between the paper passage and the cited Recognition theorem: unclear
Paper passage: We strengthen the connection between LDA and multivariate response regression by establishing an explicit relationship between discriminant directions and regression coefficients... $B^* = B H^{-1}$ for some invertible $L \times L$ matrix $H$

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction
Relation between the paper passage and the cited Recognition theorem: unclear
Paper passage: Under the assumption that the distributions of $X \mid Y = e_\ell$ are Gaussian $N_p(\mu_\ell, \Sigma_w)$, we provide... a general strategy for analyzing the excess misclassification risk

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

57 extracted references · 57 canonical work pages

[1]

, " * write output.state after.block = add.period write newline

ENTRY address author booktitle chapter edition editor howpublished institution journal key month note number organization pages publisher school series title type url volume year label extra.label sort.label short.list INTEGERS output.state before.all mid.sentence after.sentence after.block FUNCTION init.state.consts #0 'before.all := #1 'mid.sentence := ...

work page
[2]

write newline

" write newline "" before.all 'output.state := FUNCTION n.dashify 't := "" t empty not t #1 #1 substring "-" = t #1 #2 substring "--" = not "--" * t #2 global.max substring 't := t #1 #1 substring "-" = "-" * t #2 global.max substring 't := while if t #1 #1 substring * t #2 global.max substring 't := if while FUNCTION word.in bbl.in capitalize " " * FUNCT...

work page
[3]

, Grinshtein, V

Abramovich, F. , Grinshtein, V. and Levy, T. (2021). Multiclass classification by sparse multinomial logistic regression. IEEE Transactions on Information Theory 67 4637--4646

work page 2021
[4]

and Pensky, M

Abramovich, F. and Pensky, M. (2019). Classification with many classes: Challenges and pluses. Journal of Multivariate Analysis 174 104536

work page 2019
[5]

, Chung, H

Ahn, J. , Chung, H. C. and Jeon, Y. (2021). Trace ratio optimization for high-dimensional multi-class discrimination. Journal of Computational and Graphical Statistics 30 192--203

work page 2021
[6]

linear discriminant regularized regression

Bing, X. , Li, B. and Wegkamp, M. (2025). Supplement to "linear discriminant regularized regression"

work page 2025
[7]

and Wegkamp, M

Bing, X. and Wegkamp, M. (2023). Optimal discriminant analysis in high-dimensional latent factor models. The Annals of Statistics 51 1232--1257

work page 2023
[8]

and Wegkamp, M

Bing, X. and Wegkamp, M. H. (2019). Adaptive estimation of the rank of the coefficient matrix in high-dimensional multivariate response regression models. Ann. Statist. 47 3157--3184

work page 2019
[9]

and Van de Geer, S

B\"uhlmann, P. and Van de Geer, S. (2011). Statistics for High-Dimensional Data. Springer

work page 2011
[10]

, She, Y

Bunea, F. , She, Y. and Wegkamp, M. H. (2011). Optimal selection of reduced rank estimators of high-dimensional matrices. Ann. Statist. 39 1282--1309

work page 2011
[11]

and Liu, W

Cai, T. and Liu, W. (2011). A direct estimation approach to sparse linear discriminant analysis. J. Amer. Statist. Assoc. 106 1566--1577

work page 2011
[12]

and Zhang, L

Cai, T. and Zhang, L. (2019). High dimensional linear discriminant analysis: optimality, adaptive algorithm and missing data. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 81 675--705

work page 2019
[13]

Campbell, N. A. (1980). Shrunken estimators in discriminant and canonical variate analysis. Journal of the Royal Statistical Society: Series C (Applied Statistics) 29 5--14

work page 1980
[14]

and Sun, Q

Chen, H. and Sun, Q. (2022). Distributed sparse multicategory discriminant analysis. In International Conference on Artificial Intelligence and Statistics. PMLR

work page 2022
[15]

, Dong, H

Chen, K. , Dong, H. and Chan, K.-S. (2013). Reduced rank regression via adaptive nuclear norm penalization. Biometrika 100 901--920

work page 2013
[16]

, Hastie, T

Clemmensen, L. , Hastie, T. , Witten, D. and Ersb ll, B. (2011). Sparse discriminant analysis. Technometrics 53 406--413

work page 2011
[17]

, Young, F

De Leeuw, J. , Young, F. W. and Takane, Y. (1976). Additive structure in qualitative data: An alternating least squares method with optimal scaling features. Psychometrika 41 471--503

work page 1976
[18]

Dettling, M. (2004). Bagboosting for tumor classification with gene expression data. Bioinformatics 20 3583--3593

work page 2004
[19]

and Fan, Y

Fan, J. and Fan, Y. (2008). High-dimensional classification using features annealed independence rules . The Annals of Statistics 36 2605--2637

work page 2008
[20]

, Feng, Y

Fan, J. , Feng, Y. and Tong, X. (2012). A road to classification in high dimensional space: the regularized optimal affine discriminant. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 74 745--771

work page 2012
[21]

Friedman, J. H. (1989). Regularized discriminant analysis. J. Amer. Statist. Assoc 84 165--175

work page 1989
[22]

Gaynanova, I. (2020). Prediction and estimation consistency of sparse multi-class penalized optimal scoring . Bernoulli 26 286--322

work page 2020
[23]

, Booth, J

Gaynanova, I. , Booth, J. G. and Wells, M. T. (2016). Simultaneous sparse estimation of canonical vectors in the p >> n setting. Journal of the American Statistical Association 111 696--706

work page 2016
[24]

Giraud, C. (2011). Low rank multivariate regression. Electron. J. Statist. 5 775--799

work page 2011
[25]

Giraud, C. (2021). Introduction to High-Dimensional Statistics. No. 139 in Monographs on Statistics and Applied Probability, CRC Press, Taylor & Francis Group

work page 2021
[26]

, Hastie, T

Guo, Y. , Hastie, T. and Tibshirani, R. (2007). Regularized linear discriminant analysis and its application in microarrays. Biostatistics 8 86--100

work page 2007
[27]

, Buja, A

Hastie, T. , Buja, A. and Tibshirani, R. (1995). Penalized discriminant analysis. The Annals of Statistics 23 73--102

work page 1995
[28]

, Tibshirani, R

Hastie, T. , Tibshirani, R. and Buja, A. (1994). Flexible discriminant analysis by optimal scoring. Journal of the American statistical association 89 1255--1270

work page 1994
[29]

Izenman, A. J. (1975). Reduced-rank regression for the multivariate linear model. Journal of Multivariate Analysis 5 248--264

work page 1975
[30]

Izenman, A. J. (2008). Modern Multivariate Statistical Techniques: Regression, Classification, and Manifold Learning. Series: Springer Texts in Statistics

work page 2008
[31]

, Ahn, J

Jung, S. , Ahn, J. and Jeon, Y. (2019). Penalized orthogonal iteration for sparse estimation of generalized eigenvalue problem. Journal of Computational and Graphical Statistics 28 710--721

work page 2019
[32]

, Lounici, K

Koltchinskii, V. , Lounici, K. and Tsybakov, A. B. (2011). Nuclear-norm penalization and optimal rates for noisy low-rank matrix completion . The Annals of Statistics 39 2302 -- 2329

work page 2011
[33]

and Kim, J

Lee, K. and Kim, J. (2015). On the equivalence of linear discriminant analysis and least squares. In Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29

work page 2015
[34]

, Dogan, \"U

Lei, Y. , Dogan, \"U . , Zhou, D.-X. and Kloft, M. (2019). Data-dependent generalization bounds for multi-class classification. IEEE Transactions on Information Theory 65 2995--3021

work page 2019
[35]

and Abramovich, F

Levy, T. and Abramovich, F. (2023). Generalization error bounds for multiclass sparse linear classifiers. Journal of Machine Learning Research 24 1--35

work page 2023
[36]

, Yang, Y

Mai, Q. , Yang, Y. and Zou, H. (2019). Multiclass sparse discriminant analysis. Statistica Sinica 29 97--111

work page 2019
[37]

, Zou, H

Mai, Q. , Zou, H. and Yuan, M. (2012). A direct approach to sparse discriminant analysis in ultra-high dimensions . Biometrika 99 29--42

work page 2012
[38]

and Zhu, J

Mukherjee, A. and Zhu, J. (2011). Reduced rank ridge regression and its kernel extensions. Statistical analysis and data mining: the ASA data science journal 4 612--622

work page 2011
[39]

and Hastie, T

Nibbering, D. and Hastie, T. (2022). Multiclass-penalized logistic regression we develop a model for clustering classes in multi-class logistic regression. Comput. Statist. Data Anal. 169

work page 2022
[40]

, Chen, H

Nie, F. , Chen, H. , Xiang, S. , Zhang, C. , Yan, S. and Li, X. (2022). On the equivalence of linear discriminant analysis and least squares regression. IEEE Transactions on Neural Networks and Learning Systems

work page 2022
[41]

, Zhou, L

Qiao, Z. , Zhou, L. and Huang, J. Z. (2009). Sparse linear discriminant analysis with applications to high dimensional low sample size data. IAENG International Journal of Applied Mathematics 39

work page 2009
[42]

, Tamayo, P

Ramaswamy, S. , Tamayo, P. , Rifkin, R. , Mukherjee, S. , Yeang, C.-H. , Angelo, M. , Ladd, C. , Reich, M. , Latulippe, E. , Mesirov, J. P. et al. (2001). Multiclass cancer diagnosis using tumor gene expression signatures. Proceedings of the National Academy of Sciences 98 15149--15154

work page 2001
[43]

and Zhou, S

Rudelson, M. and Zhou, S. (2012). Reconstruction from anisotropic random measurements. In Conference on Learning Theory. JMLR Workshop and Conference Proceedings

work page 2012
[44]

Safo, S. E. and Ahn, J. (2016). General sparse multi-class linear discriminant analysis. Comput. Stat. Data Anal. 99 81--90

work page 2016
[45]

Seber, G. A. (2009). Multivariate observations. John Wiley & Sons

work page 2009
[46]

, Wang, Y

Shao, J. , Wang, Y. , Deng, X. and Wang, S. (2011). Sparse linear discriminant analysis by thresholding for high dimensional data . The Annals of Statistics 39 1241--1265

work page 2011
[47]

Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society. Series B (Methodological) 58 267--288

work page 1996
[48]

, Hastie, T

Tibshirani, R. , Hastie, T. , Narasimhan, B. and Chu, G. (2002). Diagnosis of multiple cancer types by shrunken centroids of gene expression. Proceedings of the National Academy of Sciences 99 6567--6572

work page 2002
[49]

Tsybakov, A. B. (2004). Optimal aggregation of classifiers in statistical learning. The Annals of Statistics 32 135--166

work page 2004
[50]

, Jiang, B

Wang, C. , Jiang, B. and Zhu, L. (2021). Penalized interaction estimation for ultrahigh dimensional quadratic regression. Statistica Sinica 31 1549--1570

work page 2021
[51]

Witten, D. M. and Tibshirani, R. (2011). Penalized classification using fisher's linear discriminant. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 73 753--772

work page 2011
[52]

, Wipf, D

Wu, Y. , Wipf, D. and Yun, J.-M. (2015). Understanding and evaluating sparse linear discriminant analysis. In Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics (G. Lebanon and S. V. N. Vishwanathan, eds.), vol. 38 of Proceedings of Machine Learning Research. PMLR, San Diego, California, USA

work page 2015
[53]

Ye, J. (2007). Least squares linear discriminant analysis. In Proceedings of the 24th international conference on Machine learning

work page 2007
[54]

Young, F. W. , Takane, Y. and de Leeuw, J. (1978). The principal components of mixed measurement level multivariate data: An alternating least squares method with optimal scaling features. Psychometrika 43 279--281

work page 1978
[55]

and Lin, Y

Yuan, M. and Lin, Y. (2006). Model selection and estimation in regression with grouped variables. Journal of the Royal Statistical Society Series B: Statistical Methodology 68 49--67

work page 2006
[56]

, Mai, Q

Zeng, J. , Mai, Q. and Zhang, X. (2024). Subspace estimation with automatic dimension and variable selection in sufficient dimension reduction. Journal of the American Statistical Association 119 343--355

work page 2024
[57]

and Hastie, T

Zou, H. and Hastie, T. (2005). Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society Series B: Statistical Methodology 67 301--320

work page 2005

[1] [1]

, " * write output.state after.block = add.period write newline

ENTRY address author booktitle chapter edition editor howpublished institution journal key month note number organization pages publisher school series title type url volume year label extra.label sort.label short.list INTEGERS output.state before.all mid.sentence after.sentence after.block FUNCTION init.state.consts #0 'before.all := #1 'mid.sentence := ...

work page

[2] [2]

write newline

" write newline "" before.all 'output.state := FUNCTION n.dashify 't := "" t empty not t #1 #1 substring "-" = t #1 #2 substring "--" = not "--" * t #2 global.max substring 't := t #1 #1 substring "-" = "-" * t #2 global.max substring 't := while if t #1 #1 substring * t #2 global.max substring 't := if while FUNCTION word.in bbl.in capitalize " " * FUNCT...

work page

[3] [3]

, Grinshtein, V

Abramovich, F. , Grinshtein, V. and Levy, T. (2021). Multiclass classification by sparse multinomial logistic regression. IEEE Transactions on Information Theory 67 4637--4646

work page 2021

[4] [4]

and Pensky, M

Abramovich, F. and Pensky, M. (2019). Classification with many classes: Challenges and pluses. Journal of Multivariate Analysis 174 104536

work page 2019

[5] [5]

, Chung, H

Ahn, J. , Chung, H. C. and Jeon, Y. (2021). Trace ratio optimization for high-dimensional multi-class discrimination. Journal of Computational and Graphical Statistics 30 192--203

work page 2021

[6] [6]

linear discriminant regularized regression

Bing, X. , Li, B. and Wegkamp, M. (2025). Supplement to "linear discriminant regularized regression"

work page 2025

[7] [7]

and Wegkamp, M

Bing, X. and Wegkamp, M. (2023). Optimal discriminant analysis in high-dimensional latent factor models. The Annals of Statistics 51 1232--1257

work page 2023

[8] [8]

and Wegkamp, M

Bing, X. and Wegkamp, M. H. (2019). Adaptive estimation of the rank of the coefficient matrix in high-dimensional multivariate response regression models. Ann. Statist. 47 3157--3184

work page 2019

[9] [9]

and Van de Geer, S

B\"uhlmann, P. and Van de Geer, S. (2011). Statistics for High-Dimensional Data. Springer

work page 2011

[10] [10]

, She, Y

Bunea, F. , She, Y. and Wegkamp, M. H. (2011). Optimal selection of reduced rank estimators of high-dimensional matrices. Ann. Statist. 39 1282--1309

work page 2011

[11] [11]

and Liu, W

Cai, T. and Liu, W. (2011). A direct estimation approach to sparse linear discriminant analysis. J. Amer. Statist. Assoc. 106 1566--1577

work page 2011

[12] [12]

and Zhang, L

Cai, T. and Zhang, L. (2019). High dimensional linear discriminant analysis: optimality, adaptive algorithm and missing data. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 81 675--705

work page 2019

[13] [13]

Campbell, N. A. (1980). Shrunken estimators in discriminant and canonical variate analysis. Journal of the Royal Statistical Society: Series C (Applied Statistics) 29 5--14

work page 1980

[14] [14]

and Sun, Q

Chen, H. and Sun, Q. (2022). Distributed sparse multicategory discriminant analysis. In International Conference on Artificial Intelligence and Statistics. PMLR

work page 2022

[15] [15]

, Dong, H

Chen, K. , Dong, H. and Chan, K.-S. (2013). Reduced rank regression via adaptive nuclear norm penalization. Biometrika 100 901--920

work page 2013

[16] [16]

, Hastie, T

Clemmensen, L. , Hastie, T. , Witten, D. and Ersb ll, B. (2011). Sparse discriminant analysis. Technometrics 53 406--413

work page 2011

[17] [17]

, Young, F

De Leeuw, J. , Young, F. W. and Takane, Y. (1976). Additive structure in qualitative data: An alternating least squares method with optimal scaling features. Psychometrika 41 471--503

work page 1976

[18] [18]

Dettling, M. (2004). Bagboosting for tumor classification with gene expression data. Bioinformatics 20 3583--3593

[19]

Fan, J. and Fan, Y. (2008). High-dimensional classification using features annealed independence rules. The Annals of Statistics 36 2605--2637

[20]

Fan, J., Feng, Y. and Tong, X. (2012). A road to classification in high dimensional space: the regularized optimal affine discriminant. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 74 745--771

[21]

Friedman, J. H. (1989). Regularized discriminant analysis. J. Amer. Statist. Assoc. 84 165--175

[22]

Gaynanova, I. (2020). Prediction and estimation consistency of sparse multi-class penalized optimal scoring. Bernoulli 26 286--322

[23]

Gaynanova, I., Booth, J. G. and Wells, M. T. (2016). Simultaneous sparse estimation of canonical vectors in the p >> n setting. Journal of the American Statistical Association 111 696--706

[24]

Giraud, C. (2011). Low rank multivariate regression. Electron. J. Statist. 5 775--799

[25]

Giraud, C. (2021). Introduction to High-Dimensional Statistics. No. 139 in Monographs on Statistics and Applied Probability, CRC Press, Taylor & Francis Group

[26]

Guo, Y., Hastie, T. and Tibshirani, R. (2007). Regularized linear discriminant analysis and its application in microarrays. Biostatistics 8 86--100

[27]

Hastie, T., Buja, A. and Tibshirani, R. (1995). Penalized discriminant analysis. The Annals of Statistics 23 73--102

[28]

Hastie, T., Tibshirani, R. and Buja, A. (1994). Flexible discriminant analysis by optimal scoring. Journal of the American Statistical Association 89 1255--1270

[29]

Izenman, A. J. (1975). Reduced-rank regression for the multivariate linear model. Journal of Multivariate Analysis 5 248--264

[30]

Izenman, A. J. (2008). Modern Multivariate Statistical Techniques: Regression, Classification, and Manifold Learning. Series: Springer Texts in Statistics

[31]

Jung, S., Ahn, J. and Jeon, Y. (2019). Penalized orthogonal iteration for sparse estimation of generalized eigenvalue problem. Journal of Computational and Graphical Statistics 28 710--721

[32]

Koltchinskii, V., Lounici, K. and Tsybakov, A. B. (2011). Nuclear-norm penalization and optimal rates for noisy low-rank matrix completion. The Annals of Statistics 39 2302--2329

[33]

Lee, K. and Kim, J. (2015). On the equivalence of linear discriminant analysis and least squares. In Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29

[34]

Lei, Y., Dogan, Ü., Zhou, D.-X. and Kloft, M. (2019). Data-dependent generalization bounds for multi-class classification. IEEE Transactions on Information Theory 65 2995--3021

[35]

Levy, T. and Abramovich, F. (2023). Generalization error bounds for multiclass sparse linear classifiers. Journal of Machine Learning Research 24 1--35

[36]

Mai, Q., Yang, Y. and Zou, H. (2019). Multiclass sparse discriminant analysis. Statistica Sinica 29 97--111

[37]

Mai, Q., Zou, H. and Yuan, M. (2012). A direct approach to sparse discriminant analysis in ultra-high dimensions. Biometrika 99 29--42

[38]

Mukherjee, A. and Zhu, J. (2011). Reduced rank ridge regression and its kernel extensions. Statistical Analysis and Data Mining: The ASA Data Science Journal 4 612--622

[39]

Nibbering, D. and Hastie, T. (2022). Multiclass-penalized logistic regression. Comput. Statist. Data Anal. 169

[40]

Nie, F., Chen, H., Xiang, S., Zhang, C., Yan, S. and Li, X. (2022). On the equivalence of linear discriminant analysis and least squares regression. IEEE Transactions on Neural Networks and Learning Systems

[41]

Qiao, Z., Zhou, L. and Huang, J. Z. (2009). Sparse linear discriminant analysis with applications to high dimensional low sample size data. IAENG International Journal of Applied Mathematics 39

[42]

Ramaswamy, S., Tamayo, P., Rifkin, R., Mukherjee, S., Yeang, C.-H., Angelo, M., Ladd, C., Reich, M., Latulippe, E., Mesirov, J. P. et al. (2001). Multiclass cancer diagnosis using tumor gene expression signatures. Proceedings of the National Academy of Sciences 98 15149--15154

[43]

Rudelson, M. and Zhou, S. (2012). Reconstruction from anisotropic random measurements. In Conference on Learning Theory. JMLR Workshop and Conference Proceedings

[44]

Safo, S. E. and Ahn, J. (2016). General sparse multi-class linear discriminant analysis. Comput. Stat. Data Anal. 99 81--90

[45]

Seber, G. A. (2009). Multivariate Observations. John Wiley & Sons

[46]

Shao, J., Wang, Y., Deng, X. and Wang, S. (2011). Sparse linear discriminant analysis by thresholding for high dimensional data. The Annals of Statistics 39 1241--1265

[47]

Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society. Series B (Methodological) 58 267--288

[48]

Tibshirani, R., Hastie, T., Narasimhan, B. and Chu, G. (2002). Diagnosis of multiple cancer types by shrunken centroids of gene expression. Proceedings of the National Academy of Sciences 99 6567--6572

[49]

Tsybakov, A. B. (2004). Optimal aggregation of classifiers in statistical learning. The Annals of Statistics 32 135--166

[50]

Wang, C., Jiang, B. and Zhu, L. (2021). Penalized interaction estimation for ultrahigh dimensional quadratic regression. Statistica Sinica 31 1549--1570

[51]

Witten, D. M. and Tibshirani, R. (2011). Penalized classification using Fisher's linear discriminant. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 73 753--772

[52]

Wu, Y., Wipf, D. and Yun, J.-M. (2015). Understanding and evaluating sparse linear discriminant analysis. In Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics (G. Lebanon and S. V. N. Vishwanathan, eds.), vol. 38 of Proceedings of Machine Learning Research. PMLR, San Diego, California, USA

[53]

Ye, J. (2007). Least squares linear discriminant analysis. In Proceedings of the 24th International Conference on Machine Learning

[54]

Young, F. W., Takane, Y. and de Leeuw, J. (1978). The principal components of mixed measurement level multivariate data: An alternating least squares method with optimal scaling features. Psychometrika 43 279--281

[55]

Yuan, M. and Lin, Y. (2006). Model selection and estimation in regression with grouped variables. Journal of the Royal Statistical Society Series B: Statistical Methodology 68 49--67

[56]

Zeng, J., Mai, Q. and Zhang, X. (2024). Subspace estimation with automatic dimension and variable selection in sufficient dimension reduction. Journal of the American Statistical Association 119 343--355

[57]

Zou, H. and Hastie, T. (2005). Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society Series B: Statistical Methodology 67 301--320
