Deep ReLU network approximation of functions on a manifold
3 Pith papers cite this work. Polarity classification is still being indexed.
Fields: cs.LG (3). Verdicts: UNVERDICTED (3). Citation roles: background (1). Citation polarities: background (1).
Citing papers
- Understanding In-Context Learning on Structured Manifolds: Bridging Attention to Kernel Methods
  Transformers perform kernel-based prediction for Hölder regression on manifolds and, given sufficiently many training tasks, achieve minimax rates that depend on the intrinsic dimension.
- Transformers for Learning on Noisy and Task-Level Manifolds: Approximation and Generalization Insights
  Transformers achieve approximation and generalization error bounds for noisy manifold regression that scale with the intrinsic dimension of the task-level manifold.
- A Composite Activation Function for Learning Stable Binary Representations
  HTAF is a sigmoid-tanh composite that approximates the Heaviside function, enabling stable gradient-based training of networks with binary activations. The resulting ICBMs discretize stably and perform competitively on image tasks.