Bandit algorithms
4 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
years: 2026 (4)
roles: other (1)
polarities: unclear (1)
citing papers explorer
- Beyond Linear Attention: Softmax Transformers Implement In-Context Reinforcement Learning
  Softmax Transformers implement in-context RL through equivalence to weighted softmax TD updates, with error decay under contraction and parameters as global minimizers of pretraining loss.
- Convergence and Emergence of In-Context Reinforcement Learning with Chain of Thought
  With specific linear Transformer parameters, CoT generation equals iterative TD updates, yielding geometric error decay with CoT length until a context-length statistical floor, and those parameters globally minimize the pretraining loss.
- SMART: A Spectral Transfer Approach to Multi-Task Learning
  SMART transfers knowledge in multi-task linear regression via spectral subspace similarity assumptions, achieving near-minimax Frobenius error rates while requiring only a fitted source model.
- Causal inference for social network formation
  Random team assignments in a professional firm reveal that indirect ties strongly increase new direct tie formation, while effects of degree and local density are smaller and less robust.