1 Pith paper cites this work.
Citing papers by field: cs.LG (1)
Citing papers by year: 2025 (1)
Citing papers by verdict: UNVERDICTED (1)

Representative citing papers:
Understanding the Staged Dynamics of Transformers in Learning Latent Structure
Transformers learn the components of a latent structure in discrete stages during training. They compose learned rules more robustly than they decompose complex examples into those rules, and specific layers show identifiable windows of plasticity.