arXiv preprint arXiv:2501.19105

Relating Misfit to Gain in Weak-to-Strong Generalization Beyond the Squared Loss · arXiv 2501.19105

3 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

Weak-to-Strong Generalization is Nearly Inevitable (in Linear Models)

cs.LG · 2026-05-07 · unverdicted · novelty 8.0

Weak-to-strong generalization is nearly inevitable in linear logistic regression for most student-teacher pairs without any model capacity mismatch.

Feature Learning in Linear-Width Two-Layer Networks: Two vs. One Step of Gradient Descent

stat.ML · 2026-05-18 · unverdicted · novelty 7.0

In the linear-width regime, the second GD step yields a spiked random matrix whose number of outliers is floor(alpha2 / (1/2 - alpha1)), and batch reuse enables learning directions with information exponent greater than one under suitable alpha scalings.

On the Blessing of Pre-training in Weak-to-Strong Generalization

cs.LG · 2026-05-07 · unverdicted · novelty 6.0

Pre-training provides a geometric warm start in a single-index model that enables weak-to-strong generalization up to a supervisor-limited bound, with empirical phase-transition evidence in LLMs.

citing papers explorer

Showing 3 of 3 citing papers.

Weak-to-Strong Generalization is Nearly Inevitable (in Linear Models)

cs.LG · 2026-05-07 · unverdicted · none · ref 9

Weak-to-strong generalization is nearly inevitable in linear logistic regression for most student-teacher pairs without any model capacity mismatch.

Feature Learning in Linear-Width Two-Layer Networks: Two vs. One Step of Gradient Descent

stat.ML · 2026-05-18 · unverdicted · none · ref 252

In the linear-width regime, the second GD step yields a spiked random matrix whose number of outliers is floor(alpha2 / (1/2 - alpha1)), and batch reuse enables learning directions with information exponent greater than one under suitable alpha scalings.

On the Blessing of Pre-training in Weak-to-Strong Generalization

cs.LG · 2026-05-07 · unverdicted · none · ref 121

Pre-training provides a geometric warm start in a single-index model that enables weak-to-strong generalization up to a supervisor-limited bound, with empirical phase-transition evidence in LLMs.
