Weak-to-strong generalization even in random feature networks, provably. arXiv preprint arXiv:2503.02877

Medvedev, M · arXiv 2503.02877

3 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

Weak-to-Strong Generalization is Nearly Inevitable (in Linear Models)

cs.LG · 2026-05-07 · unverdicted · novelty 8.0

Weak-to-strong generalization is nearly inevitable in linear logistic regression for most student-teacher pairs without any model capacity mismatch.

On the Blessing of Pre-training in Weak-to-Strong Generalization

cs.LG · 2026-05-07 · unverdicted · novelty 6.0

Pre-training provides a geometric warm start in a single-index model that enables weak-to-strong generalization up to a supervisor-limited bound, with empirical phase-transition evidence in LLMs.

Discrepancies are Virtue: Weak-to-Strong Generalization through Lens of Intrinsic Dimension

cs.LG · 2025-02-07 · unverdicted · novelty 6.0

In ridgeless regression with low intrinsic dimension, discrepancy between weak and strong models reduces W2S generalization variance by dim(V_s)/N in the discrepant subspace while inheriting it in the overlap.

citing papers explorer

Weak-to-Strong Generalization is Nearly Inevitable (in Linear Models) cs.LG · 2026-05-07 · unverdicted · none · ref 10
On the Blessing of Pre-training in Weak-to-Strong Generalization cs.LG · 2026-05-07 · unverdicted · none · ref 124
Discrepancies are Virtue: Weak-to-Strong Generalization through Lens of Intrinsic Dimension cs.LG · 2025-02-07 · unverdicted · none · ref 16
