Journal of Machine Learning Research
4 Pith papers cite this work. Polarity classification is still indexing.
Years: 2026 (4)
Verdicts: UNVERDICTED (4)
citing papers explorer
- Equivalence of Coarse and Fine-Grained Models for Learning with Distribution Shift
  An efficient black-box reduction from PQ to TDS learning for any Boolean concept class in the distribution-free setting implies hardness for TDS learning of halfspaces, while membership queries enable efficient PQ learning of halfspaces via iterative Forster transforms.
- Multi-Head Attention as Ensemble Nadaraya-Watson Estimation: Variance Reduction, Decorrelation, and Optimal Head Diversity
  Multi-head attention is an ensemble of Nadaraya-Watson estimators whose MSE decreases monotonically with a new spectral Head Diversity Index measuring subspace decorrelation, yielding optimal head count and dimension scaling laws under fixed total dimension.
- On the Blessing of Pre-training in Weak-to-Strong Generalization
  Pre-training provides a geometric warm start in a single-index model that enables weak-to-strong generalization up to a supervisor-limited bound, with empirical phase-transition evidence in LLMs.
- Quantifying Multimodal Capabilities: Formal Generalization Guarantees in Pairwise Metric Learning
  Derives novel generalization error bounds for multimodal pairwise metric learning showing that fine-grained modality features reduce hypothesis space complexity via enhanced complementarity.