Minimax-optimal rates for sparse additive models over kernel classes via convex programming
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: stat.ML (1)
Years: 2025 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers
Shallow Neural Networks Learn Low-Degree Spherical Polynomials with Feature Learning by Learnable Channel Attention
A carefully designed two-layer neural network with channel attention, trained by gradient descent, achieves the minimax optimal sample complexity $\Theta(d^{\ell_0}/\varepsilon)$ for learning degree-$\ell_0$ spherical polynomials.