Journal of Machine Learning Research

Rademacher and Gaussian complexities: Risk bounds and structural results

4 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

Equivalence of Coarse and Fine-Grained Models for Learning with Distribution Shift

cs.DS · 2026-05-07 · unverdicted · novelty 8.0 · 2 refs

An efficient black-box reduction from PQ to TDS learning for any Boolean concept class in the distribution-free setting implies hardness for TDS learning of halfspaces, while membership queries enable efficient PQ learning of halfspaces via iterative Forster transforms.

Multi-Head Attention as Ensemble Nadaraya-Watson Estimation: Variance Reduction, Decorrelation, and Optimal Head Diversity

stat.ML · 2026-05-18 · unverdicted · novelty 7.0

Multi-head attention is an ensemble of Nadaraya-Watson estimators whose MSE decreases monotonically with a new spectral Head Diversity Index measuring subspace decorrelation, yielding optimal head count and dimension scaling laws under fixed total dimension.

On the Blessing of Pre-training in Weak-to-Strong Generalization

cs.LG · 2026-05-07 · unverdicted · novelty 6.0

Pre-training provides a geometric warm start in a single-index model that enables weak-to-strong generalization up to a supervisor-limited bound, with empirical phase-transition evidence in LLMs.

Quantifying Multimodal Capabilities: Formal Generalization Guarantees in Pairwise Metric Learning

cs.LG · 2026-05-02 · unverdicted · novelty 4.0

Derives novel generalization error bounds for multimodal pairwise metric learning showing that fine-grained modality features reduce hypothesis space complexity via enhanced complementarity.

citing papers explorer


Equivalence of Coarse and Fine-Grained Models for Learning with Distribution Shift
cs.DS · 2026-05-07 · unverdicted · none · ref 60 · 2 links

Multi-Head Attention as Ensemble Nadaraya-Watson Estimation: Variance Reduction, Decorrelation, and Optimal Head Diversity
stat.ML · 2026-05-18 · unverdicted · none · ref 20

On the Blessing of Pre-training in Weak-to-Strong Generalization
cs.LG · 2026-05-07 · unverdicted · none · ref 19

Quantifying Multimodal Capabilities: Formal Generalization Guarantees in Pairwise Metric Learning
cs.LG · 2026-05-02 · unverdicted · none · ref 16
