Weak-to-strong generalization is nearly inevitable in linear logistic regression for most student-teacher pairs without any model capacity mismatch.
arXiv preprint arXiv:2501.19105
3 Pith papers cite this work.
Years: 2026 (3)
Verdicts: UNVERDICTED (3)
citing papers explorer
- Weak-to-Strong Generalization is Nearly Inevitable (in Linear Models)
  Weak-to-strong generalization is nearly inevitable in linear logistic regression for most student-teacher pairs without any model capacity mismatch.
- Feature Learning in Linear-Width Two-Layer Networks: Two vs. One Step of Gradient Descent
  In the linear-width regime, the second GD step yields a spiked random matrix whose number of outliers is floor(alpha2 / (1/2 - alpha1)), and batch reuse enables learning directions with information exponent greater than one under suitable alpha scalings.
- On the Blessing of Pre-training in Weak-to-Strong Generalization
  Pre-training provides a geometric warm start in a single-index model that enables weak-to-strong generalization up to a supervisor-limited bound, with empirical phase-transition evidence in LLMs.