Polynomial-Time Robust Multiclass Linear Classification under Gaussian Marginals

Giannis Iakovidis; Ilias Diakonikolas; Mingchen Ma

arxiv: 2605.21428 · v1 · pith:KRMGBZTOnew · submitted 2026-05-20 · 💻 cs.LG · cs.DS

Polynomial-Time Robust Multiclass Linear Classification under Gaussian Marginals

Ilias Diakonikolas , Giannis Iakovidis , Mingchen Ma This is my paper

Pith reviewed 2026-05-21 05:47 UTC · model grok-4.3

classification 💻 cs.LG cs.DS

keywords robust multiclass learninglinear classifiersGaussian marginalsagnostic learningimproper learningpolynomial-time algorithmsdimension-independent guarantees

0 comments

The pith

New structural results allow polynomial-time robust learning of multiclass linear classifiers under Gaussian marginals with dimension-independent error bounds.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes structural properties of multiclass linear classifiers when the input features are Gaussian. These properties are used to construct efficient algorithms for agnostic robust learning that achieve error guarantees depending on the number of classes but not on the dimension. For any number of classes k the main algorithm attains an error of roughly k to the power 3/2 times the square root of the optimal error plus a small epsilon. A refined method for three classes achieves an error linear in the optimal error plus epsilon. This advances the binary case theory to multiple classes while overcoming exponential dependencies in prior work for k at least 3.

Core claim

The authors show that the standard multiclass perceptron requires super-polynomially many samples even with clean labels and Gaussian marginals. They then introduce a pairwise improper-learning framework that yields an efficient learner with error ~O(k^{3/2} sqrt(opt)) + epsilon for general k. They also present a localization-based framework that achieves error O(opt) + epsilon for k=3 and poly(k) opt + epsilon for geometrically regular classifiers.

What carries the argument

Pairwise improper-learning framework that leverages Gaussian marginals to reduce multiclass robust learning to binary tasks.

If this is right

Robust multiclass linear classification becomes feasible in high dimensions without exponential dependence on accuracy.
Error guarantees scale with the number of classes k but stay independent of the ambient dimension d.
For three classes, the learner can achieve error arbitrarily close to that of the best linear classifier.
The multiclass perceptron algorithm is inefficient for k greater than 2 even under clean Gaussian data.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The structural results may generalize to other concentrated distributions like sub-Gaussian or log-concave ones.
This could lead to extensions for learning more complex hypothesis classes in multiclass settings.
In practice, the poly-time algorithms might be implemented for high-dimensional classification tasks in machine learning applications.

Load-bearing premise

The x-marginal distribution is Gaussian, which is required to derive the structural results enabling the polynomial-time algorithms and dimension-independent guarantees.

What would settle it

A concrete high-dimensional dataset drawn from a Gaussian distribution where the proposed learner produces a hypothesis whose error exceeds the stated bound by more than epsilon or requires super-polynomial time.

Figures

Figures reproduced from arXiv: 2605.21428 by Giannis Iakovidis, Ilias Diakonikolas, Mingchen Ma.

**Figure 2.** Figure 2: Illustration of the geometric quantities used for localization. [PITH_FULL_IMAGE:figures/full_fig_p010_2.png] view at source ↗

read the original abstract

We study the task of agnostic learning of multiclass linear classifiers under the Gaussian distribution. Given labeled examples $(x, y)$ from a distribution over $\mathbb{R}^d \times [k]$, with Gaussian $x$-marginal, the goal is to output a hypothesis whose error is comparable to that of the best $k$-class linear classifier. While the binary case $k=2$ has a well-developed algorithmic theory, much less is known for $k \ge 3$. Even for $k=3$, prior robust algorithms incur exponential dependence on the inverse of the desired accuracy in both complexity and representation size. In this work, we develop new structural results for multiclass linear classifiers and use them to design fully polynomial-time robust learners with dimension-independent error guarantees. Our first result shows that the standard multiclass perceptron algorithm requires super-polynomially many samples and updates, even with clean labels and Gaussian marginals, revealing a basic obstruction absent in the binary case. Our main positive result is a pairwise improper-learning framework which yields an efficient learner with error $\widetilde O(k^{3/2}\sqrt{\mathrm{opt}})+\epsilon$ for general $k$. Additionally, we develop a sharper localization-based framework which leads to error $O(\mathrm{opt})+\epsilon$ for $k=3$, and error $\mathrm{poly}(k)\mathrm{opt}+\epsilon$ for geometrically regular $k$-class linear classifiers.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

This paper gives the first poly-time robust learners for multiclass linear classifiers under Gaussians, with dimension-independent error via new structural results and a pairwise framework.

read the letter

The main point is that they replace the exponential dependence in prior multiclass robust learning with polynomial time and error bounds that stay independent of dimension. They first prove that the multiclass perceptron needs super-polynomial samples even under clean labels and Gaussian marginals, which cleanly separates the multiclass case from binary and shows why standard methods fail. That negative result is useful on its own. Their positive results then build on new structural facts about multiclass linear separators to get a pairwise improper learner with error roughly k to the 3/2 times square root of opt plus epsilon for general k, plus a localization approach that reaches linear dependence on opt for k=3 and poly(k) times opt for geometrically regular cases. The analysis uses Gaussian concentration in a direct way, and the stress-test indicates the proofs hold without hidden exponential blowups or circular steps. The Gaussian marginal assumption is stated up front and is load-bearing for the structural results, so the work does not claim broader distribution families. The sqrt(opt) factor for general k is not as tight as the k=3 case, but it is still a clear improvement over prior exponential dependence while keeping everything polynomial. This is aimed at theoretical machine learning researchers working on robust and agnostic linear learning. It fills a specific gap between the well-understood binary setting and the multiclass regime. The claims are concrete, the lower bound is interesting, and the algorithms are constructive, so the paper deserves a serious referee. I would send it out for review.

Referee Report

2 major / 2 minor

Summary. The paper studies agnostic learning of multiclass linear classifiers under Gaussian x-marginals. It proves a super-polynomial sample lower bound for the standard multiclass perceptron even with clean labels, then introduces a pairwise improper-learning framework yielding error bound ~O(k^{3/2} sqrt(opt)) + epsilon for general k, and a localization-based framework achieving O(opt) + epsilon error for k=3 and poly(k) opt + epsilon for geometrically regular k-class linear classifiers.

Significance. If the Gaussian-specific structural results hold, the work supplies the first polynomial-time robust multiclass linear learners with dimension-independent guarantees, closing a gap left by prior exponential-in-1/epsilon algorithms for k >= 3. The perceptron lower bound and the two algorithmic frameworks (pairwise and localization) are concrete advances that exploit Gaussian concentration and pairwise comparisons.

major comments (2)

§4 (pairwise improper-learning framework): the error accumulation across the k choose 2 binary subproblems is bounded by k^{3/2} sqrt(opt); the derivation of this precise exponent should be expanded to confirm it is tight and does not hide additional logarithmic or dimension-dependent factors.
§5 (localization-based framework for k=3): the O(opt) + epsilon guarantee for three classes relies on a geometric regularity condition that is not required in the general-k result; clarify whether this condition is necessary or can be removed without degrading the bound.

minor comments (2)

Abstract: the notation ~O(k^{3/2} sqrt(opt)) should be defined explicitly (including the precise meaning of the tilde) and matched to the body text.
Introduction: add a table comparing runtime, sample complexity, and error dependence on k and opt against the best prior robust multiclass algorithms.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the positive recommendation and constructive feedback on our manuscript. We address each major comment below and will revise the paper accordingly to improve clarity.

read point-by-point responses

Referee: §4 (pairwise improper-learning framework): the error accumulation across the k choose 2 binary subproblems is bounded by k^{3/2} sqrt(opt); the derivation of this precise exponent should be expanded to confirm it is tight and does not hide additional logarithmic or dimension-dependent factors.

Authors: We thank the referee for this suggestion. In the revised version of the manuscript, we will expand the derivation in Section 4 with additional intermediate steps. The k^{3/2} factor arises from bounding the sum of pairwise errors (via an l1-to-l2 norm inequality over binom(k,2) terms) combined with the per-pair robust binary error of O(sqrt(opt)) obtained from the Gaussian marginals; the analysis uses only dimension-free concentration and introduces neither logarithmic nor dimension-dependent factors. We will also add a brief tightness discussion based on the existing lower-bound construction. revision: yes
Referee: §5 (localization-based framework for k=3): the O(opt) + epsilon guarantee for three classes relies on a geometric regularity condition that is not required in the general-k result; clarify whether this condition is necessary or can be removed without degrading the bound.

Authors: We appreciate the opportunity to clarify. The O(opt) + epsilon guarantee for k=3 in the localization framework does not rely on the geometric regularity condition. That condition is introduced only when extending the localization approach to general k to obtain the poly(k) opt bound. For the special case k=3 our proof exploits the fixed number of classes directly and avoids the regularity assumption entirely. We will insert an explicit remark in Section 5 distinguishing the two settings and confirming that the condition can be removed for k=3 without affecting the stated bound. revision: yes

Circularity Check

0 steps flagged

No significant circularity; derivation is self-contained

full rationale

The paper develops independent structural results for multiclass linear classifiers under Gaussian marginals, including a super-polynomial lower bound for the multiclass perceptron and new pairwise improper-learning and localization frameworks. These steps rely on Gaussian concentration properties and direct algorithmic constructions rather than reducing to fitted parameters, self-citations, or prior ansatzes from the same authors. The error bounds (e.g., Õ(k^{3/2}√opt) + ε) follow from the newly proven structural lemmas without circular redefinition or renaming of known results. The central claims rest on explicit proofs that are self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claims rest on the problem setting and new structural results developed in the paper. No free parameters or invented entities are mentioned.

axioms (1)

domain assumption The feature marginal on x is Gaussian.
This is the explicit distributional assumption of the learning problem studied.

pith-pipeline@v0.9.0 · 5799 in / 1218 out tokens · 49988 ms · 2026-05-21T05:47:53.874153+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

38 extracted references · 38 canonical work pages

[1]

The power of localization for efficiently learning linear separators with noise

Pranjal Awasthi, Maria Florina Balcan, and Philip M Long. The power of localization for efficiently learning linear separators with noise. Journal of the ACM (JACM) , 63(6):1--27, 2017

work page 2017
[2]

Learning from noisy examples

Dana Angluin and Philip Laird. Learning from noisy examples. Machine learning , 2(4):343--370, 1988

work page 1988
[3]

Survey on multiclass classification methods

Mohamed Aly. Survey on multiclass classification methods. Neural Netw , 19(1-9):2, 2005

work page 2005
[4]

Agnostic learning of general relu activation using gradient descent

Pranjal Awasthi, Alex Tang, and Aravindan Vijayaraghavan. Agnostic learning of general relu activation using gradient descent. In The Eleventh International Conference on Learning Representations, ICLR 2023 . OpenReview.net, 2023

work page 2023
[5]

Bandit multiclass linear classification: Efficient algorithms for the separable case

Alina Beygelzimer, David Pal, Balazs Szorenyi, Devanathan Thiruvenkatachari, Chen-Yu Wei, and Chicheng Zhang. Bandit multiclass linear classification: Efficient algorithms for the separable case. In International Conference on Machine Learning , pages 624--633. PMLR, 2019

work page 2019
[6]

Ultraconservative online algorithms for multiclass problems

Koby Crammer and Yoram Singer. Ultraconservative online algorithms for multiclass problems. Journal of Machine Learning Research , 3(Jan):951--991, 2003

work page 2003
[7]

Complexity theoretic limitations on learning halfspaces

Amit Daniely. Complexity theoretic limitations on learning halfspaces. In Proceedings of the forty-eighth annual ACM symposium on Theory of Computing , pages 105--117, 2016

work page 2016
[8]

Approximation schemes for relu regression

Ilias Diakonikolas, Surbhi Goel, Sushrut Karmalkar, Adam R Klivans, and Mahdi Soltanolkotabi. Approximation schemes for relu regression. In Conference on Learning Theory , pages 1452--1485. PMLR, 2020

work page 2020
[9]

Kane, and Nikos Zarifis

Ilias Diakonikolas, Giannis Iakovidis, Daniel M. Kane, and Nikos Zarifis. Robust Learning of Multi-index Models via Iterative Subspace Approximation . In 2025 IEEE 66th Annual Symposium on Foundations of Computer Science (FOCS) , pages 2231--2239, Los Alamitos, CA, USA, December 2025. IEEE Computer Society

work page 2025
[10]

Which is the best multiclass svm method? an empirical study

Kai-Bo Duan and S Sathiya Keerthi. Which is the best multiclass svm method? an empirical study. In International workshop on multiple classifier systems , pages 278--285. Springer, 2005

work page 2005
[11]

Algorithmic high-dimensional robust statistics

Ilias Diakonikolas and Daniel M Kane. Algorithmic high-dimensional robust statistics . Cambridge university press, 2023

work page 2023
[12]

Active learning of general halfspaces: Label queries vs membership queries

Ilias Diakonikolas, Daniel Kane, and Mingchen Ma. Active learning of general halfspaces: Label queries vs membership queries. Advances in Neural Information Processing Systems , 37:49180--49213, 2024

work page 2024
[13]

Robust regression of general ReLUs with queries

Ilias Diakonikolas, Daniel Kane, and Mingchen Ma. Robust regression of general ReLUs with queries. In The Thirty-ninth Annual Conference on Neural Information Processing Systems , 2025

work page 2025
[14]

Near-optimal cryptographic hardness of agnostically learning halfspaces and relu regression under gaussian marginals

Ilias Diakonikolas, Daniel Kane, and Lisheng Ren. Near-optimal cryptographic hardness of agnostically learning halfspaces and relu regression under gaussian marginals. In International Conference on Machine Learning , pages 7922--7938. PMLR, 2023

work page 2023
[15]

Learning geometric concepts with nasty noise

Ilias Diakonikolas, Daniel Kane, and Alistair Stewart. Learning geometric concepts with nasty noise. In Proceedings of the 50 th Annual ACM SIGACT Symposium on Theory of Computing, STOC 2018 , pages 1061--1073, 2018

work page 2018
[16]

Non-convex sgd learns halfspaces with adversarial label noise

Ilias Diakonikolas, Vasilis Kontonis, Christos Tzamos, and Nikos Zarifis. Non-convex sgd learns halfspaces with adversarial label noise. Advances in Neural Information Processing Systems , 33:18540--18549, 2020

work page 2020
[17]

Learning general halfspaces with adversarial label noise via online gradient descent

Ilias Diakonikolas, Vasilis Kontonis, Christos Tzamos, and Nikos Zarifis. Learning general halfspaces with adversarial label noise via online gradient descent. In International Conference on Machine Learning , pages 5118--5141. PMLR, 2022

work page 2022
[18]

Near-optimal sq lower bounds for agnostically learning halfspaces and relus under gaussian marginals

Ilias Diakonikolas, Daniel Kane, and Nikos Zarifis. Near-optimal sq lower bounds for agnostically learning halfspaces and relus under gaussian marginals. Advances in Neural Information Processing Systems , 33:13586--13596, 2020

work page 2020
[19]

Statistical query hardness of multiclass linear classification with random classification noise

Ilias Diakonikolas, Mingchen Ma, Lisheng Ren, and Christos Tzamos. Statistical query hardness of multiclass linear classification with random classification noise. In International Conference on Machine Learning , pages 13693--13712. PMLR, 2025

work page 2025
[20]

Agnostic learning of a single neuron with gradient descent

Spencer Frei, Yuan Cao, and Quanquan Gu. Agnostic learning of a single neuron with gradient descent. Advances in neural information processing systems , 33:5417--5428, 2020

work page 2020
[21]

Agnostic learning of arbitrary relu activation under gaussian marginals

Anxin Guo and Aravindan Vijayaraghavan. Agnostic learning of arbitrary relu activation under gaussian marginals. In The Thirty Eighth Annual Conference on Learning Theory , volume 291 of Proceedings of Machine Learning Research , pages 2592--2631. PMLR , 2025

work page 2025
[22]

Decision theoretic generalizations of the pac model for neural net and other learning applications

David Haussler. Decision theoretic generalizations of the pac model for neural net and other learning applications. Information and computation , 100(1):78--150, 1992

work page 1992
[23]

A comparison of methods for multiclass support vector machines

Chih-Wei Hsu and Chih-Jen Lin. A comparison of methods for multiclass support vector machines. IEEE transactions on Neural Networks , 13(2):415--425, 2002

work page 2002
[24]

Extreme learning machine for regression and multiclass classification

Guang-Bin Huang, Hongming Zhou, Xiaojian Ding, and Rui Zhang. Extreme learning machine for regression and multiclass classification. IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics) , 42(2):513--529, 2011

work page 2011
[25]

The gain from ordering in online learning

Vasilis Kontonis, Mingchen Ma, and Christos Tzamos. The gain from ordering in online learning. Advances in Neural Information Processing Systems , 36, 2024

work page 2024
[26]

Toward efficient agnostic learning

Michael J Kearns, Robert E Schapire, and Linda M Sellie. Toward efficient agnostic learning. In Proceedings of the fifth annual workshop on Computational learning theory , pages 341--352, 1992

work page 1992
[27]

Efficient bandit algorithms for online multiclass prediction

Sham M Kakade, Shai Shalev-Shwartz, and Ambuj Tewari. Efficient bandit algorithms for online multiclass prediction. In Proceedings of the 25th international conference on Machine learning , pages 440--447, 2008

work page 2008
[28]

Large margin dags for multiclass classification

John Platt, Nello Cristianini, and John Shawe-Taylor. Large margin dags for multiclass classification. Advances in neural information processing systems , 12, 1999

work page 1999
[29]

The perceptron: a probabilistic model for information storage and organization in the brain

Frank Rosenblatt. The perceptron: a probabilistic model for information storage and organization in the brain. Psychological review , 65(6):386, 1958

work page 1958
[30]

Understanding machine learning: From theory to algorithms

Shai Shalev-Shwartz and Shai Ben-David. Understanding machine learning: From theory to algorithms . Cambridge university press, 2014

work page 2014
[31]

On the consistency of multiclass classification methods

Ambuj Tewari and Peter L Bartlett. On the consistency of multiclass classification methods. Journal of Machine Learning Research , 8(5), 2007

work page 2007
[32]

Hardness of agnostically learning halfspaces from worst-case lattice problems

Stefan Tiegel. Hardness of agnostically learning halfspaces from worst-case lattice problems. In The Thirty Sixth Annual Conference on Learning Theory , pages 3029--3064. PMLR, 2023

work page 2023
[33]

High-dimensional probability: An introduction with applications in data science , volume 47

Roman Vershynin. High-dimensional probability: An introduction with applications in data science , volume 47. Cambridge university press, 2018

work page 2018
[34]

Learning a single neuron with bias using gradient descent

Gal Vardi, Gilad Yehudai, and Ohad Shamir. Learning a single neuron with bias using gradient descent. Advances in Neural Information Processing Systems , 34:28690--28700, 2021

work page 2021
[35]

Robustly learning monotone single-index models

Puqian Wang, Nikos Zarifis, Ilias Diakonikolas, and Jelena Diakonikolas. Robustly learning monotone single-index models. In The Thirty-ninth Annual Conference on Neural Information Processing Systems , 2025

work page 2025
[36]

Revisiting perceptron: Efficient and label-optimal learning of halfspaces

Songbai Yan and Chicheng Zhang. Revisiting perceptron: Efficient and label-optimal learning of halfspaces. Advances in Neural Information Processing Systems , 30, 2017

work page 2017
[37]

Robustly learning single-index models via alignment sharpness

Nikos Zarifis, Puqian Wang, Ilias Diakonikolas, and Jelena Diakonikolas. Robustly learning single-index models via alignment sharpness. In Proceedings of the 41st International Conference on Machine Learning , volume 235 of Proceedings of Machine Learning Research , pages 58197--58243. PMLR, 21--27 Jul 2024

work page 2024
[38]

Robustly learning monotone generalized linear models via data augmentation

Nikos Zarifis, Puqian Wang, Ilias Diakonikolas, and Jelena Diakonikolas. Robustly learning monotone generalized linear models via data augmentation. In The Thirty Eighth Annual Conference on Learning Theory , volume 291 of Proceedings of Machine Learning Research , pages 5921--5990. PMLR , 2025

work page 2025

[1] [1]

The power of localization for efficiently learning linear separators with noise

Pranjal Awasthi, Maria Florina Balcan, and Philip M Long. The power of localization for efficiently learning linear separators with noise. Journal of the ACM (JACM) , 63(6):1--27, 2017

work page 2017

[2] [2]

Learning from noisy examples

Dana Angluin and Philip Laird. Learning from noisy examples. Machine learning , 2(4):343--370, 1988

work page 1988

[3] [3]

Survey on multiclass classification methods

Mohamed Aly. Survey on multiclass classification methods. Neural Netw , 19(1-9):2, 2005

work page 2005

[4] [4]

Agnostic learning of general relu activation using gradient descent

Pranjal Awasthi, Alex Tang, and Aravindan Vijayaraghavan. Agnostic learning of general relu activation using gradient descent. In The Eleventh International Conference on Learning Representations, ICLR 2023 . OpenReview.net, 2023

work page 2023

[5] [5]

Bandit multiclass linear classification: Efficient algorithms for the separable case

Alina Beygelzimer, David Pal, Balazs Szorenyi, Devanathan Thiruvenkatachari, Chen-Yu Wei, and Chicheng Zhang. Bandit multiclass linear classification: Efficient algorithms for the separable case. In International Conference on Machine Learning , pages 624--633. PMLR, 2019

work page 2019

[6] [6]

Ultraconservative online algorithms for multiclass problems

Koby Crammer and Yoram Singer. Ultraconservative online algorithms for multiclass problems. Journal of Machine Learning Research , 3(Jan):951--991, 2003

work page 2003

[7] [7]

Complexity theoretic limitations on learning halfspaces

Amit Daniely. Complexity theoretic limitations on learning halfspaces. In Proceedings of the forty-eighth annual ACM symposium on Theory of Computing , pages 105--117, 2016

work page 2016

[8] [8]

Approximation schemes for relu regression

Ilias Diakonikolas, Surbhi Goel, Sushrut Karmalkar, Adam R Klivans, and Mahdi Soltanolkotabi. Approximation schemes for relu regression. In Conference on Learning Theory , pages 1452--1485. PMLR, 2020

work page 2020

[9] [9]

Kane, and Nikos Zarifis

Ilias Diakonikolas, Giannis Iakovidis, Daniel M. Kane, and Nikos Zarifis. Robust Learning of Multi-index Models via Iterative Subspace Approximation . In 2025 IEEE 66th Annual Symposium on Foundations of Computer Science (FOCS) , pages 2231--2239, Los Alamitos, CA, USA, December 2025. IEEE Computer Society

work page 2025

[10] [10]

Which is the best multiclass svm method? an empirical study

Kai-Bo Duan and S Sathiya Keerthi. Which is the best multiclass svm method? an empirical study. In International workshop on multiple classifier systems , pages 278--285. Springer, 2005

work page 2005

[11] [11]

Algorithmic high-dimensional robust statistics

Ilias Diakonikolas and Daniel M Kane. Algorithmic high-dimensional robust statistics . Cambridge university press, 2023

work page 2023

[12] [12]

Active learning of general halfspaces: Label queries vs membership queries

Ilias Diakonikolas, Daniel Kane, and Mingchen Ma. Active learning of general halfspaces: Label queries vs membership queries. Advances in Neural Information Processing Systems , 37:49180--49213, 2024

work page 2024

[13] [13]

Robust regression of general ReLUs with queries

Ilias Diakonikolas, Daniel Kane, and Mingchen Ma. Robust regression of general ReLUs with queries. In The Thirty-ninth Annual Conference on Neural Information Processing Systems , 2025

work page 2025

[14] [14]

Near-optimal cryptographic hardness of agnostically learning halfspaces and relu regression under gaussian marginals

Ilias Diakonikolas, Daniel Kane, and Lisheng Ren. Near-optimal cryptographic hardness of agnostically learning halfspaces and relu regression under gaussian marginals. In International Conference on Machine Learning , pages 7922--7938. PMLR, 2023

work page 2023

[15] [15]

Learning geometric concepts with nasty noise

Ilias Diakonikolas, Daniel Kane, and Alistair Stewart. Learning geometric concepts with nasty noise. In Proceedings of the 50 th Annual ACM SIGACT Symposium on Theory of Computing, STOC 2018 , pages 1061--1073, 2018

work page 2018

[16] [16]

Non-convex sgd learns halfspaces with adversarial label noise

Ilias Diakonikolas, Vasilis Kontonis, Christos Tzamos, and Nikos Zarifis. Non-convex sgd learns halfspaces with adversarial label noise. Advances in Neural Information Processing Systems , 33:18540--18549, 2020

work page 2020

[17] [17]

Learning general halfspaces with adversarial label noise via online gradient descent

Ilias Diakonikolas, Vasilis Kontonis, Christos Tzamos, and Nikos Zarifis. Learning general halfspaces with adversarial label noise via online gradient descent. In International Conference on Machine Learning , pages 5118--5141. PMLR, 2022

work page 2022

[18] [18]

Near-optimal sq lower bounds for agnostically learning halfspaces and relus under gaussian marginals

Ilias Diakonikolas, Daniel Kane, and Nikos Zarifis. Near-optimal sq lower bounds for agnostically learning halfspaces and relus under gaussian marginals. Advances in Neural Information Processing Systems , 33:13586--13596, 2020

work page 2020

[19] [19]

Statistical query hardness of multiclass linear classification with random classification noise

Ilias Diakonikolas, Mingchen Ma, Lisheng Ren, and Christos Tzamos. Statistical query hardness of multiclass linear classification with random classification noise. In International Conference on Machine Learning , pages 13693--13712. PMLR, 2025

work page 2025

[20] [20]

Agnostic learning of a single neuron with gradient descent

Spencer Frei, Yuan Cao, and Quanquan Gu. Agnostic learning of a single neuron with gradient descent. Advances in neural information processing systems , 33:5417--5428, 2020

work page 2020

[21] [21]

Agnostic learning of arbitrary relu activation under gaussian marginals

Anxin Guo and Aravindan Vijayaraghavan. Agnostic learning of arbitrary relu activation under gaussian marginals. In The Thirty Eighth Annual Conference on Learning Theory , volume 291 of Proceedings of Machine Learning Research , pages 2592--2631. PMLR , 2025

work page 2025

[22] [22]

Decision theoretic generalizations of the pac model for neural net and other learning applications

David Haussler. Decision theoretic generalizations of the pac model for neural net and other learning applications. Information and computation , 100(1):78--150, 1992

work page 1992

[23] [23]

A comparison of methods for multiclass support vector machines

Chih-Wei Hsu and Chih-Jen Lin. A comparison of methods for multiclass support vector machines. IEEE transactions on Neural Networks , 13(2):415--425, 2002

work page 2002

[24] [24]

Extreme learning machine for regression and multiclass classification

Guang-Bin Huang, Hongming Zhou, Xiaojian Ding, and Rui Zhang. Extreme learning machine for regression and multiclass classification. IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics) , 42(2):513--529, 2011

work page 2011

[25] [25]

The gain from ordering in online learning

Vasilis Kontonis, Mingchen Ma, and Christos Tzamos. The gain from ordering in online learning. Advances in Neural Information Processing Systems , 36, 2024

work page 2024

[26] [26]

Toward efficient agnostic learning

Michael J Kearns, Robert E Schapire, and Linda M Sellie. Toward efficient agnostic learning. In Proceedings of the fifth annual workshop on Computational learning theory , pages 341--352, 1992

work page 1992

[27] [27]

Efficient bandit algorithms for online multiclass prediction

Sham M Kakade, Shai Shalev-Shwartz, and Ambuj Tewari. Efficient bandit algorithms for online multiclass prediction. In Proceedings of the 25th international conference on Machine learning , pages 440--447, 2008

work page 2008

[28] [28]

Large margin dags for multiclass classification

John Platt, Nello Cristianini, and John Shawe-Taylor. Large margin dags for multiclass classification. Advances in neural information processing systems , 12, 1999

work page 1999

[29] [29]

The perceptron: a probabilistic model for information storage and organization in the brain

Frank Rosenblatt. The perceptron: a probabilistic model for information storage and organization in the brain. Psychological review , 65(6):386, 1958

work page 1958

[30] [30]

Understanding machine learning: From theory to algorithms

Shai Shalev-Shwartz and Shai Ben-David. Understanding machine learning: From theory to algorithms . Cambridge university press, 2014

work page 2014

[31] [31]

On the consistency of multiclass classification methods

Ambuj Tewari and Peter L Bartlett. On the consistency of multiclass classification methods. Journal of Machine Learning Research , 8(5), 2007

work page 2007

[32] [32]

Hardness of agnostically learning halfspaces from worst-case lattice problems

Stefan Tiegel. Hardness of agnostically learning halfspaces from worst-case lattice problems. In The Thirty Sixth Annual Conference on Learning Theory , pages 3029--3064. PMLR, 2023

work page 2023

[33] [33]

High-dimensional probability: An introduction with applications in data science , volume 47

Roman Vershynin. High-dimensional probability: An introduction with applications in data science , volume 47. Cambridge university press, 2018

work page 2018

[34] [34]

Learning a single neuron with bias using gradient descent

Gal Vardi, Gilad Yehudai, and Ohad Shamir. Learning a single neuron with bias using gradient descent. Advances in Neural Information Processing Systems , 34:28690--28700, 2021

work page 2021

[35] [35]

Robustly learning monotone single-index models

Puqian Wang, Nikos Zarifis, Ilias Diakonikolas, and Jelena Diakonikolas. Robustly learning monotone single-index models. In The Thirty-ninth Annual Conference on Neural Information Processing Systems , 2025

work page 2025

[36] [36]

Revisiting perceptron: Efficient and label-optimal learning of halfspaces

Songbai Yan and Chicheng Zhang. Revisiting perceptron: Efficient and label-optimal learning of halfspaces. Advances in Neural Information Processing Systems , 30, 2017

work page 2017

[37] [37]

Robustly learning single-index models via alignment sharpness

Nikos Zarifis, Puqian Wang, Ilias Diakonikolas, and Jelena Diakonikolas. Robustly learning single-index models via alignment sharpness. In Proceedings of the 41st International Conference on Machine Learning , volume 235 of Proceedings of Machine Learning Research , pages 58197--58243. PMLR, 21--27 Jul 2024

work page 2024

[38] [38]

Robustly learning monotone generalized linear models via data augmentation

Nikos Zarifis, Puqian Wang, Ilias Diakonikolas, and Jelena Diakonikolas. Robustly learning monotone generalized linear models via data augmentation. In The Thirty Eighth Annual Conference on Learning Theory , volume 291 of Proceedings of Machine Learning Research , pages 5921--5990. PMLR , 2025

work page 2025