Standard softmax-attention transformers can approximate the Gaussian kernel ridge regression predictor by implementing preconditioned Richardson iteration during their forward pass.
2 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary: background 1
citation-polarity summary
fields: cs.LG 2
years: 2026 2
verdicts: UNVERDICTED 2
roles: background 1
polarities: unclear 1
representative citing papers
ACTS improves Thompson sampling in high-dimensional Bayesian optimization by adaptively reducing the search space using gradients from surrogate samples to produce better maximizer samples.
citing papers explorer
- Transformers Can Implement Preconditioned Richardson Iteration for In-Context Gaussian Kernel Regression
  Standard softmax-attention transformers can approximate the Gaussian kernel ridge regression predictor by implementing preconditioned Richardson iteration during their forward pass.
- Adaptive Candidate Point Thompson Sampling for High-Dimensional Bayesian Optimization
  ACTS improves Thompson sampling in high-dimensional Bayesian optimization by adaptively reducing the search space using gradients from surrogate samples to produce better maximizer samples.