arXiv preprint arXiv:1911.01544

The generalization error of max-margin linear classifiers: High-dimensional asymptotics in the overparametrized regime · 1911 · arXiv 1911.01544

4 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

When Stronger Triggers Backfire: A High-Dimensional Theory of Backdoor Attacks

cs.LG · 2026-05-21 · unverdicted · novelty 8.0

In the proportional high-dimensional regime, stronger backdoor training triggers improve clean accuracy and make attack success non-monotonic for regularized GLMs on Gaussian mixtures, with closed-form proofs for squared loss and fixed-point extensions to convex losses.

Feature Learning in Linear-Width Two-Layer Networks: Two vs. One Step of Gradient Descent

stat.ML · 2026-05-18 · unverdicted · novelty 7.0 · 2 refs

Two steps of gradient descent on first-layer weights in linear-width two-layer networks produce a spiked random matrix with floor(alpha2/(1/2-alpha1)) outliers, each a learned direction, and batch reuse allows capturing directions with information exponent exceeding one.

How to Scale Mixture-of-Experts: From muP to the Maximally Scale-Stable Parameterization

cs.LG · 2026-05-13 · unverdicted · novelty 7.0

The authors derive a Maximally Scale-Stable Parameterization (MSSP) for MoE models that achieves robust learning-rate transfer and monotonic performance gains with scale across co-scaling regimes of width, experts, and sparsity.

Double descent for least-squares interpolation on contaminated data: A simulation study

cs.LG · 2026-04-15 · unverdicted · novelty 6.0

Simulations show that least-squares interpolation on contaminated data exhibits double descent with superior generalization over robust alternatives at high overparameterization.

citing papers explorer

When Stronger Triggers Backfire: A High-Dimensional Theory of Backdoor Attacks

cs.LG · 2026-05-21 · unverdicted · none · ref 31

Feature Learning in Linear-Width Two-Layer Networks: Two vs. One Step of Gradient Descent

stat.ML · 2026-05-18 · unverdicted · none · ref 127 · 2 links

How to Scale Mixture-of-Experts: From muP to the Maximally Scale-Stable Parameterization

cs.LG · 2026-05-13 · unverdicted · none · ref 186

Double descent for least-squares interpolation on contaminated data: A simulation study

cs.LG · 2026-04-15 · unverdicted · none · ref 5
