Venue: Advances in Neural Information Processing Systems (NeurIPS)
Two Pith papers cite this work. Polarity classification is still being indexed.
Citation summary
Fields: cs.LG (2)
Years: 2026 (2)
Verdicts: UNVERDICTED (2)
Roles: background (1)
Polarities: background (1)

Citing papers
- Beyond Linear Attention: Softmax Transformers Implement In-Context Reinforcement Learning
  Softmax Transformers implement in-context RL through an equivalence to weighted softmax temporal-difference (TD) updates; the error decays under a contraction condition, and the corresponding parameters are global minimizers of the pretraining loss.
- Convergence and Emergence of In-Context Reinforcement Learning with Chain of Thought
  With specific linear Transformer parameters, chain-of-thought (CoT) generation is equivalent to iterative TD updates. The error therefore decays geometrically with CoT length until it reaches a statistical floor set by the context length, and those parameters globally minimize the pretraining loss.
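The mechanism in the second summary (iterated TD updates whose error shrinks geometrically until it hits a floor set by how much data the context holds) can be illustrated without any Transformer. The sketch below is a hypothetical toy, not the construction from either paper. It runs plain batch tabular TD(0) on transitions sampled from a random Markov reward process, treating the sampled transitions as the "context" and each update as one "CoT step". Every name here (`make_mrp`, `sample_context`, `td_step`, `run_demo`, and so on) is an illustrative assumption.

```python
import random


def make_mrp(num_states, seed=0):
    """Hypothetical toy Markov reward process: random dense transitions, per-state rewards."""
    rng = random.Random(seed)
    P = []
    for _ in range(num_states):
        row = [rng.random() for _ in range(num_states)]
        total = sum(row)
        P.append([p / total for p in row])
    R = [rng.uniform(-1.0, 1.0) for _ in range(num_states)]
    return P, R


def true_values(P, R, gamma, iters=200):
    """Solve V = R + gamma * P V by value iteration (a gamma-contraction)."""
    n = len(R)
    V = [0.0] * n
    for _ in range(iters):
        V = [R[s] + gamma * sum(P[s][t] * V[t] for t in range(n)) for s in range(n)]
    return V


def sample_context(P, R, length, seed=1):
    """Sample one trajectory of (state, reward, next_state) transitions: the 'context'."""
    rng = random.Random(seed)
    n = len(R)
    s = rng.randrange(n)
    context = []
    for _ in range(length):
        s_next = rng.choices(range(n), weights=P[s])[0]
        context.append((s, R[s], s_next))
        s = s_next
    return context


def td_step(V, context, gamma, alpha):
    """One batch TD(0) update: average the TD errors over all context transitions."""
    delta = [0.0] * len(V)
    for s, r, s_next in context:
        delta[s] += r + gamma * V[s_next] - V[s]
    return [v + alpha * d / len(context) for v, d in zip(V, delta)]


def empirical_fixed_point(context, num_states, gamma, iters=200):
    """Fixed point of td_step: values of the MRP estimated from the context counts."""
    counts = [[0] * num_states for _ in range(num_states)]
    rewards = [0.0] * num_states
    for s, r, s_next in context:
        counts[s][s_next] += 1
        rewards[s] = r  # rewards are deterministic per state in this toy
    V = [0.0] * num_states
    for _ in range(iters):
        new_V = []
        for s in range(num_states):
            total = sum(counts[s])
            if total == 0:
                new_V.append(V[s])  # unvisited states are never updated by td_step
                continue
            backup = sum(c * V[t] for t, c in enumerate(counts[s])) / total
            new_V.append(rewards[s] + gamma * backup)
        V = new_V
    return V


def run_demo(num_states=5, gamma=0.5, context_len=500, steps=400, alpha=1.0):
    """Track max-norm error to the true values and to the empirical fixed point."""
    P, R = make_mrp(num_states)
    v_true = true_values(P, R, gamma)
    context = sample_context(P, R, context_len)
    v_hat = empirical_fixed_point(context, num_states, gamma)
    V = [0.0] * num_states
    err_true, err_fp = [], []
    for _ in range(steps):
        err_true.append(max(abs(a - b) for a, b in zip(V, v_true)))
        err_fp.append(max(abs(a - b) for a, b in zip(V, v_hat)))
        V = td_step(V, context, gamma, alpha)
    floor = max(abs(a - b) for a, b in zip(v_hat, v_true))
    return err_true, err_fp, floor
```

With a step size `alpha` of at most 1, each batch update is a max-norm contraction toward the empirical fixed point, so `err_fp` shrinks geometrically. `err_true` instead levels off at `floor`, the gap between that fixed point and the true values, which only a longer context (more sampled transitions) can reduce.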