Linguistic Inquiry
5 Pith papers cite this work.
fields
cs.CL 5
citing papers explorer
- Language Acquisition Device in Large Language Models
  Pre-pretraining on MP-STRUCT matches k-Shuffle Dyck baselines in efficiency while adding human-like resistance to implausible languages, and challenges the need for C-RASP definability in effective PPT languages.
- Brain Score Tracks Shared Properties of Languages: Evidence from Many Natural Languages and Structured Sequences
  Brain Score remains similar when language models are trained on diverse natural languages or on structured non-language data like DNA and code, indicating the metric tracks shared structural extraction but is not diagnostic of human-like language processing.
- Why are language models less surprised than humans? Testing the Parse Multiplicity Mismatch Hypothesis
  Varying the number of simultaneous parses in RNNGs increases predicted garden-path effects but does not fully reconcile LM surprisal with human reading times.
- Fine-Grained Analysis of Shared Syntactic Mechanisms in Language Models
  Language models employ a highly localized shared mechanism for filler-gap dependencies but no unified mechanism for NPI licensing, and activation patching generalizes better than supervised alignment search.
- Lil-Bevo: Explorations of Strategies for Training Language Models in More Humanlike Ways
  Lil-Bevo applies music pretraining, curriculum learning on sequence length, and targeted masking to small LMs in the BabyLM challenge, finding modest gains from short sequences but overall limited performance.