Transformers can be built to act as nonlinear featurizers via attention, supporting in-context regression with proven generalization bounds on synthetic tasks.
Arvind Mahankali, Tatsunori B
5 Pith papers cite this work. Polarity classification is still indexing.
verdicts
UNVERDICTED 5representative citing papers
Gated linear attention enables lower training and test errors in non-stationary in-context learning by adaptively modulating past inputs through a learnable recency bias under an autoregressive model of task evolution.
In a stylized one-layer transformer, pre-training encodes factual knowledge via relation-specific feature directions and attention patterns; fine-tuning extracts it through a relation-covering mechanism that succeeds when enough latent templates are triggered, with a failure regime explaining inauds
Non-linear transformers enable cross-domain generalization in in-context RL by representing value functions from different domains with shared weights inside a shared RKHS.
The paper surveys definitions, techniques, applications, and challenges in in-context learning for large language models.
citing papers explorer
-
Understanding In-Context Learning for Nonlinear Regression with Transformers: Attention as Featurizer
Transformers can be built to act as nonlinear featurizers via attention, supporting in-context regression with proven generalization bounds on synthetic tasks.
-
Learning to Adapt: In-Context Learning Beyond Stationarity
Gated linear attention enables lower training and test errors in non-stationary in-context learning by adaptively modulating past inputs through a learnable recency bias under an autoregressive model of task evolution.
-
Provable Knowledge Acquisition and Extraction in One-Layer Transformers
In a stylized one-layer transformer, pre-training encodes factual knowledge via relation-specific feature directions and attention patterns; fine-tuning extracts it through a relation-covering mechanism that succeeds when enough latent templates are triggered, with a failure regime explaining inauds
-
One for All: A Non-Linear Transformer can Enable Cross-Domain Generalization for In-Context Reinforcement Learning
Non-linear transformers enable cross-domain generalization in in-context RL by representing value functions from different domains with shared weights inside a shared RKHS.
-
A Survey on In-context Learning
The paper surveys definitions, techniques, applications, and challenges in in-context learning for large language models.