Continuous Latent Contexts Enable Efficient Online Learning in Transformers

Transformers equipped with continuous latent context tokens can implement foundational online decision-making algorithms such as weighted majority and Q-learning, and a trained small model outperforms larger LLMs on synthetic online prediction tasks.
Specifically, for 1 ≤ i ≤ 10 it trains on sequences truncated to 5i steps, and for 11 ≤ i ≤ 13 on sequences truncated to 50 + 15(i − 10) steps.
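As a concrete illustration, here is a minimal sketch of that truncation schedule. The function name `truncation_length` and the stage-index argument are assumptions made for illustration, not identifiers from the paper.

```python
def truncation_length(i: int) -> int:
    """Sequence length for training stage i, per the schedule above (assumed helper)."""
    if 1 <= i <= 10:
        return 5 * i           # stages 1..10: 5, 10, ..., 50 steps
    if 11 <= i <= 13:
        return 50 + 15 * (i - 10)  # stages 11..13: 65, 80, 95 steps
    raise ValueError("stage index i must be in 1..13")

# Stages 1..13 yield lengths 5, 10, ..., 50, 65, 80, 95.
print([truncation_length(i) for i in range(1, 14)])
```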
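For reference, the sketch below shows the classic deterministic weighted majority algorithm that the summary names; the binary-prediction setup, the function name, and the penalty parameter `eta` are illustrative assumptions and not the paper's construction.

```python
def weighted_majority(expert_predictions, outcomes, eta=0.5):
    """Classic weighted majority over binary experts (illustrative sketch).

    expert_predictions: list of T lists, each holding one 0/1 prediction per expert.
    outcomes: list of T true 0/1 labels.
    eta: multiplicative penalty in (0, 1) applied to experts that err.
    Returns the learner's sequence of predictions.
    """
    n = len(expert_predictions[0])
    weights = [1.0] * n
    learner_preds = []
    for preds, y in zip(expert_predictions, outcomes):
        # Predict by weighted vote of the experts.
        vote_one = sum(w for w, p in zip(weights, preds) if p == 1)
        vote_zero = sum(w for w, p in zip(weights, preds) if p == 0)
        learner_preds.append(1 if vote_one >= vote_zero else 0)
        # Multiplicatively down-weight every expert that was wrong.
        weights = [w * (1 - eta) if p != y else w for w, p in zip(weights, preds)]
    return learner_preds
```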