and Bergen, Benjamin K
3 Pith papers cite this work.
years: 2026 (3)
verdicts: UNVERDICTED (3)
representative citing papers
GPT-4o exhibits daily and weekly periodic fluctuations in performance on a fixed physics task, accounting for about 20% of observed variance.
citing papers explorer
- Child-directed speech facilitates production, not comprehension, in BabyLMs
CDS-trained BabyLMs show earlier and more appropriate production in a new frame-completion task while FineWeb-edu models lead on comprehension benchmarks, indicating current tests underestimate CDS benefits.
- How Human-Like Are Large Language Models? A Register-Aware Linguistic Evaluation Framework
A new evaluation framework using MMD on Biber features shows LLMs deviate from human linguistic distributions across registers, with closest models varying by register rather than size.