arXiv preprint arXiv:2402.15505 , year=

Co-supervised learning: Improving weak-to-strong generalization with hierarchical mixture of experts , author= · arXiv 2402.15505

4 Pith papers cite this work. Polarity classification is still indexing.

4 Pith papers citing it

read on arXiv browse 4 citing papers

representative citing papers

Feature Learning in Linear-Width Two-Layer Networks: Two vs. One Step of Gradient Descent

stat.ML · 2026-05-18 · unverdicted · novelty 7.0 · 2 refs

Two steps of gradient descent on first-layer weights in linear-width two-layer networks produce a spiked random matrix with floor(alpha2/(1/2-alpha1)) outliers, each a learned direction, and batch reuse allows capturing directions with information exponent exceeding one.

On the Blessing of Pre-training in Weak-to-Strong Generalization

cs.LG · 2026-05-07 · unverdicted · novelty 6.0

Pre-training provides a geometric warm start in a single-index model that enables weak-to-strong generalization up to a supervisor-limited bound, with empirical phase-transition evidence in LLMs.

Discrepancies are Virtue: Weak-to-Strong Generalization through Lens of Intrinsic Dimension

cs.LG · 2025-02-07 · unverdicted · novelty 6.0

In ridgeless regression with low intrinsic dimension, discrepancy between weak and strong models reduces W2S generalization variance by dim(V_s)/N in the discrepant subspace while inheriting it in the overlap.

Generalizable Video Quality Assessment via Weak-to-Strong Learning

cs.CV · 2025-05-06

citing papers explorer

Showing 4 of 4 citing papers.

Feature Learning in Linear-Width Two-Layer Networks: Two vs. One Step of Gradient Descent stat.ML · 2026-05-18 · unverdicted · none · ref 263 · 2 links
Two steps of gradient descent on first-layer weights in linear-width two-layer networks produce a spiked random matrix with floor(alpha2/(1/2-alpha1)) outliers, each a learned direction, and batch reuse allows capturing directions with information exponent exceeding one.
On the Blessing of Pre-training in Weak-to-Strong Generalization cs.LG · 2026-05-07 · unverdicted · none · ref 100
Pre-training provides a geometric warm start in a single-index model that enables weak-to-strong generalization up to a supervisor-limited bound, with empirical phase-transition evidence in LLMs.
Discrepancies are Virtue: Weak-to-Strong Generalization through Lens of Intrinsic Dimension cs.LG · 2025-02-07 · unverdicted · none · ref 15
In ridgeless regression with low intrinsic dimension, discrepancy between weak and strong models reduces W2S generalization variance by dim(V_s)/N in the discrepant subspace while inheriting it in the overlap.
Generalizable Video Quality Assessment via Weak-to-Strong Learning cs.CV · 2025-05-06 · unreviewed · ref 9

arXiv preprint arXiv:2402.15505 , year=

fields

years

verdicts

representative citing papers

citing papers explorer