Machine Learning , volume=
3 Pith papers cite this work. Polarity classification is still indexing.
Years: 2026 (3). Verdicts: unverdicted (3).
citing papers explorer
- Multi-Head Attention as Ensemble Nadaraya-Watson Estimation: Variance Reduction, Decorrelation, and Optimal Head Diversity
  Multi-head attention is an ensemble of Nadaraya-Watson estimators whose MSE decreases monotonically with a new spectral Head Diversity Index measuring subspace decorrelation, yielding optimal head count and dimension scaling laws under fixed total dimension.
- Post-Screening Portfolio Selection
  A Lasso-based screening step followed by low-dimensional mean-variance optimization on the selected assets improves high-dimensional portfolio construction, with a defactoring extension for strong factors.
- JASPER: Joint Bayesian Analysis of Spatial Expression via Regression
  JASPER is a new joint Bayesian regression model for spatial transcriptomics that accounts for correlations between genes to better identify spatially varying genes.