Transformers performing in-context learning implicitly implement gradient descent, ridge regression, and least-squares predictors for linear models, with behavior shifting based on model depth, width, and data noise.
What can transformers learn in-context? a case study of simple function classes.ArXiv, abs/2208.01066
6 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
representative citing papers
Randomly replacing labels in in-context demonstrations barely hurts performance, showing that label space, input distribution, and sequence format drive in-context learning more than ground-truth labels.
STNPs extend TNPs with a spectral aggregator that estimates context spectra, forms spectral mixtures, and injects task-adaptive frequency features to better handle periodicity.
TabICL scales in-context learning to large tabular data via column-then-row attention for row embeddings followed by a transformer, matching TabPFNv2 speed and performance while outperforming it and CatBoost on datasets over 10K samples.
Non-linear transformers enable cross-domain generalization in in-context RL by representing value functions from different domains with shared weights inside a shared RKHS.
In-context learning shows persistent interference from prior examples, with more misleading linear examples degrading quadratic predictions and training curricula modulating recovery speed.
citing papers explorer
-
What learning algorithm is in-context learning? Investigations with linear models
Transformers performing in-context learning implicitly implement gradient descent, ridge regression, and least-squares predictors for linear models, with behavior shifting based on model depth, width, and data noise.
-
Rethinking the Role of Demonstrations: What Makes In-Context Learning Work?
Randomly replacing labels in in-context demonstrations barely hurts performance, showing that label space, input distribution, and sequence format drive in-context learning more than ground-truth labels.
-
Spectral Transformer Neural Processes
STNPs extend TNPs with a spectral aggregator that estimates context spectra, forms spectral mixtures, and injects task-adaptive frequency features to better handle periodicity.
-
TabICL: A Tabular Foundation Model for In-Context Learning on Large Data
TabICL scales in-context learning to large tabular data via column-then-row attention for row embeddings followed by a transformer, matching TabPFNv2 speed and performance while outperforming it and CatBoost on datasets over 10K samples.
-
One for All: A Non-Linear Transformer can Enable Cross-Domain Generalization for In-Context Reinforcement Learning
Non-linear transformers enable cross-domain generalization in in-context RL by representing value functions from different domains with shared weights inside a shared RKHS.
-
When Context Sticks: Studying Interference in In-Context Learning
In-context learning shows persistent interference from prior examples, with more misleading linear examples degrading quadratic predictions and training curricula modulating recovery speed.