Journal of Functional Analysis
Two Pith papers cite this work.
Citation summary:
Years: 2026 (2 papers)
Citation roles: method (1)
Citation polarities: use method (1)
Verdicts: unverdicted (2)
Citing papers:
- Spectrum-Adaptive Generalization Bounds for Trained Deep Transformers
  Spectrum-adaptive post-hoc generalization bounds for multi-layer Transformers are derived using layerwise Schatten quantities whose indices are chosen after training based on singular-value profiles.
- Understanding In-Context Learning for Nonlinear Regression with Transformers: Attention as Featurizer
  Transformers can be built to act as nonlinear featurizers via attention, supporting in-context regression with proven generalization bounds on synthetic tasks.
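As a rough illustration of what "layerwise Schatten quantities whose indices are chosen after training" can mean, the sketch below computes the Schatten p-norm ||W||_{S_p} = (Σ_i σ_i^p)^{1/p} of each weight matrix from its singular values σ_i and then picks an index p per layer. The function names and the selection rule (minimize ||W||_{S_p}·√p over a small grid) are hypothetical assumptions made for illustration only; they are not the criterion used in the citing paper.

```python
import numpy as np


def schatten_norm(singular_values, p):
    """Schatten p-norm from a vector of singular values: (sum_i s_i^p)^(1/p)."""
    s = np.asarray(singular_values, dtype=float)
    if np.isinf(p):
        return float(s.max())  # p = infinity recovers the spectral norm
    return float(np.sum(s ** p) ** (1.0 / p))


def layerwise_schatten_profile(weights, p_grid):
    """For each weight matrix, map every p in p_grid to ||W||_{S_p}."""
    profile = []
    for W in weights:
        s = np.linalg.svd(W, compute_uv=False)  # singular values only
        profile.append({p: schatten_norm(s, p) for p in p_grid})
    return profile


def choose_index_post_hoc(norms_by_p, penalty):
    """HYPOTHETICAL rule: pick the p minimizing ||W||_{S_p} * penalty(p).

    Larger p shrinks the norm; the penalty stands in for whatever
    p-dependent factor a bound might carry. Purely illustrative.
    """
    return min(norms_by_p, key=lambda p: norms_by_p[p] * penalty(p))


if __name__ == "__main__":
    e = np.eye(8)
    weights = [
        3.0 * e,                      # flat spectrum: eight singular values equal to 3
        3.0 * np.outer(e[0], e[1]),   # rank one: a single singular value of 3
    ]
    profile = layerwise_schatten_profile(weights, p_grid=[1, 2, 4, 8])
    choices = [choose_index_post_hoc(n, penalty=lambda p: p ** 0.5) for n in profile]
    print(choices)  # expected: [4, 1]
```

Under this toy penalty, the flat spectrum pushes the chosen index up to p = 4, while the rank-one matrix selects p = 1. This shows how a post-hoc choice can adapt to each layer's singular-value profile.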