Abrupt learning in transformers: A case study on matrix completion, 2024
2 Pith papers cite this work. Polarity classification is still indexing.
Years: 2026 (2)
Verdicts: UNVERDICTED (2)
Citing papers:
- A Systematic Study of Behavioral Cloning for Scientific Data Annotation
  Introduces 9 synthetic annotation tasks and benchmarks for behavioral cloning, finding hierarchical skill learning, scaling benefits, effective multi-task pretraining, and shared internal representations of task phases and mistakes.
- Emergent Capabilities Arise Randomly from Learning Sparse Attention Patterns
  Emergent capabilities arise stochastically from abrupt learning of sparse attention patterns on synthetic linear map and cellular automata tasks, with larger models learning them earlier on average.