Title resolution pending
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: cs.LG (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers
Stabilizing Extrapolation in Looped Transformers via Learned Stochastic Stopping
Stochastic loop counts during training of looped transformers reduce OOD variance on binary addition, Dyck-1, Unique Set and Copy tasks, with learned RL-Halting further improving the accuracy-stability trade-off.