A first-passage time model produces the law T_grok - T_mem = (1 / 2 kappa_LL eta lambda) log(V_mem / V_star) that predicts grokking delays with 17.7% MAPE on held-out AdamW runs after calibrating two parameters on one cell.
Output:results/campaign1/
1 Pith paper cite this work. Polarity classification is still indexing.
1
Pith paper citing it
fields
cs.LG 1years
2026 1verdicts
UNVERDICTED 1representative citing papers
citing papers explorer
-
First-Passage Prediction of Grokking Delay: ACalibrated Law under AdamW with Causal Validation
A first-passage time model produces the law T_grok - T_mem = (1 / 2 kappa_LL eta lambda) log(V_mem / V_star) that predicts grokking delays with 17.7% MAPE on held-out AdamW runs after calibrating two parameters on one cell.