Grokking of Hierarchical Structure in Vanilla Transformers
2 Pith papers cite this work. Polarity classification is still indexing.
Fields: cs.LG (2)
Years: 2026 (2)
Verdicts: UNVERDICTED (2)
Representative citing papers
An exposure-based split on BLiMP data reveals delayed generalization in five grammatical phenomena during LLM pre-training, with post-generalization shifts in concept vector predictiveness and attention patterns.
citing papers explorer
- Natural Ungrokking: Asymmetric Control of Which Rules Survive Pretraining
  During pretraining, language models exhibit natural ungrokking where learned rules are forgotten based on their support frequency in the corpus, with asymmetric editability of rule survival.
- A Pre-Training Analogue of Grokking in Language Models: Tracing Delayed Grammatical Generalization
  An exposure-based split on BLiMP data reveals delayed generalization in five grammatical phenomena during LLM pre-training, with post-generalization shifts in concept vector predictiveness and attention patterns.