Large constant learning rates in a two-factor linear transformer model can induce cycles, bounded chaos, or divergence rather than convergence to a single in-context linear-regression solution.
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: stat.ML (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers:
- Large-Step Training Dynamics of a Two-Factor Linear Transformer Model
Large constant learning rates in a two-factor linear transformer model can induce cycles, bounded chaos, or divergence rather than convergence to a single in-context linear-regression solution.