arXiv preprint arXiv:2307.11046

Abel, David, Barreto, Andr · 2023 · arXiv 2307.11046

4 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

Beyond Linear Attention: Softmax Transformers Implement In-Context Reinforcement Learning

cs.LG · 2026-05-08 · unverdicted · novelty 8.0 · 2 refs

Softmax Transformers implement in-context RL through equivalence to weighted softmax TD updates, with error decay under contraction and parameters as global minimizers of pretraining loss.

Convergence and Emergence of In-Context Reinforcement Learning with Chain of Thought

cs.LG · 2026-05-08 · unverdicted · novelty 8.0

With specific linear Transformer parameters, CoT generation equals iterative TD updates, yielding geometric error decay with CoT length until a context-length statistical floor, and those parameters globally minimize the pretraining loss.

Optimal control of the future via prospective learning with control

stat.ML · 2025-11-11 · unverdicted · novelty 5.0

Prospective Learning with Control proves ERM asymptotically achieves the Bayes optimal policy in non-stationary reset-free settings and outperforms time-aware RL on a 1D foraging benchmark.

LIFE -- an energy efficient advanced continual learning agentic AI framework for frontier systems

cs.AI · 2026-04-14 · unverdicted · novelty 4.0

LIFE is a proposed agentic framework that combines four components to enable incremental, flexible, and energy-efficient continual learning for HPC operations such as latency spike mitigation.

citing papers explorer

Showing 4 of 4 citing papers.

Beyond Linear Attention: Softmax Transformers Implement In-Context Reinforcement Learning
cs.LG · 2026-05-08 · unverdicted · none · ref 124 · 2 links

Convergence and Emergence of In-Context Reinforcement Learning with Chain of Thought
cs.LG · 2026-05-08 · unverdicted · none · ref 93

Optimal control of the future via prospective learning with control
stat.ML · 2025-11-11 · unverdicted · none · ref 16

LIFE -- an energy efficient advanced continual learning agentic AI framework for frontier systems
cs.AI · 2026-04-14 · unverdicted · none · ref 4
