arxiv: 2605.15440 · v1 · pith:VEH5UWC6new · submitted 2026-05-14 · 💻 cs.CL

Why are language models less surprised than humans? Testing the Parse Multiplicity Mismatch Hypothesis

Pith reviewed 2026-05-19 14:45 UTC · model grok-4.3

classification 💻 cs.CL
keywords surprisal theory · garden path sentences · language models · sentence processing · syntactic ambiguity · recurrent neural network grammars · parsing · reading times

The pith

Reducing the number of simultaneous parses in language models increases predicted garden path effects but not enough to match human reading difficulties.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Surprisal theory links the processing difficulty of a word to how predictable that word is in context. Language models predict reading times well in natural text but underestimate difficulty in sentences with temporary syntactic ambiguities, such as garden paths. The paper tests whether this underestimation occurs because models can consider more possible sentence interpretations at the same time than humans can. Using recurrent neural network grammars and limiting the beam of active parses when calculating surprisal, the authors find that fewer parses do make the models predict larger difficulty spikes at the disambiguating word. However, even with very limited parses, the effect sizes remain smaller than those seen in human experiments.
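
For reference, the quantity at issue can be written out as below. This is the standard definition of surprisal; the symbols $w_t$, $w_d$ (the disambiguating word), and $\Delta_{\mathrm{GP}}$ are notation introduced here for illustration rather than taken from the paper, and the choice of base 2 (bits) is an assumption.

```latex
S(w_t) = -\log_2 P(w_t \mid w_1, \dots, w_{t-1})
\qquad
\Delta_{\mathrm{GP}} = S(w_d \mid \text{ambiguous context}) - S(w_d \mid \text{unambiguous context})
```

On this reading, a model "underestimates" a garden path effect when the reading-time difference predicted from $\Delta_{\mathrm{GP}}$ is smaller than the difference measured in human readers.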

Core claim

The parse multiplicity mismatch hypothesis posits that language models are less surprised than humans by garden path sentences because they maintain a larger number of simultaneous active parses. Using word-synchronous beam search in RNNGs, the authors vary the beam size to control the number of parses and compute surprisals. They show that smaller beams increase the magnitude of predicted garden path effects, but these increases are insufficient to account for the full size of the effects observed in human reading time data.

What carries the argument

Word-synchronous beam search in Recurrent Neural Network Grammars (RNNGs), which limits the number of simultaneous parses used to compute next-word surprisal.
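
A toy sketch of why a narrower beam can raise surprisal at a disambiguating word. It assumes surprisal is computed as the negative log ratio of total beam probability mass after versus before a word, which is one common way to read surprisal off a beam. The two parses, their probabilities, and the names `prune` and `beam_surprisals` are made up for illustration; this is not the authors' RNNG implementation.

```python
import math

def prune(beam, k):
    """Keep the k highest-probability parses from a {parse: prefix_prob} dict."""
    ranked = sorted(beam.items(), key=lambda item: item[1], reverse=True)
    return dict(ranked[:k])

def beam_surprisals(initial_beam, steps, k):
    """Surprisal (bits) per word when only k parses survive each word boundary.

    steps[i][parse] is the hypothetical probability that `parse` assigns to
    word i (including any structure-building actions before that word).
    """
    beam = prune(initial_beam, k)
    surprisals = []
    for word_probs in steps:
        # Extend every surviving parse through the next word.
        extended = {p: mass * word_probs[p] for p, mass in beam.items()}
        # Surprisal is the log ratio of beam mass after vs. before the word.
        surprisals.append(-math.log2(sum(extended.values()) / sum(beam.values())))
        beam = prune(extended, k)
    return surprisals

# Hypothetical state just before a disambiguating word: the main-verb parse
# is preferred; the reduced-relative parse survives only if the beam allows it.
before = {"main_verb": 0.9, "reduced_relative": 0.1}
# The disambiguating word is nearly impossible under the preferred parse.
disambiguating_word = [{"main_verb": 0.001, "reduced_relative": 0.5}]

wide = beam_surprisals(before, disambiguating_word, k=2)[0]
narrow = beam_surprisals(before, disambiguating_word, k=1)[0]
print(f"k=2: {wide:.2f} bits, k=1: {narrow:.2f} bits")
```

With these invented numbers, the beam of two yields about 4.30 bits and the beam of one about 9.97 bits: once the dispreferred parse has been pruned, the disambiguating word can only be explained by the parse that finds it very unlikely.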

If this is right

  • Smaller numbers of active parses lead to larger predicted processing difficulty at points of syntactic disambiguation.
  • Current LM surprisal measures, even with reduced parse multiplicity, still underpredict human garden path effects.
  • Other differences between human and model parsing mechanisms must be responsible for the remaining gap between model-predicted and human effect sizes.
  • Surprisal from models with constrained parse sets remains a partial but incomplete account of human sentence processing difficulty.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Future work could test whether incorporating human-like memory limitations or reanalysis costs into models would better match human data.
  • Similar experiments with other types of ambiguity or in different languages might reveal if parse multiplicity plays a larger role elsewhere.
  • If models cannot be made to match humans by limiting parses, researchers may need to focus on how humans integrate information across parses rather than how many they keep active.

Load-bearing premise

That controlling the beam size in word-synchronous search with RNNGs accurately represents the number of distinct interpretations a human sentence parser can maintain in parallel.

What would settle it

An experiment that directly measures or estimates how many sentence interpretations humans maintain during parsing of garden path sentences and compares that number to the beam size required to match human effect magnitudes.

Figures

Figures reproduced from arXiv: 2605.15440 by Brian Dillon, Tal Linzen, William Timkey.

Figure 1: (Top) An example of a garden path sentence with its initially preferred and globally …
Figure 2: Comparing the computation of next word probabilities in standard LMs, and in …
Figure 3: Goodness of fit (increase in log-likelihood compared to baseline model; higher is …
Figure 4: Predicted garden path effects summed across the disambiguating word and two …
Figure 5: Predicted garden path effects at the first spillover region, measured in milliseconds …
Original abstract

Surprisal theory posits that the processing difficulty of a word is determined by its predictability in context, offering a potential link between human sentence processing and next-word predictions from language models. While language model (LM) surprisals successfully predict reading times in naturalistic text, they systematically underpredict the magnitude of difficulty observed in controlled studies of syntactic ambiguity, particularly in garden path sentences. This mismatch might arise from differences in the computational constraints between humans and LMs. Here we test one such hypothesis, specifically, that LMs may be able to simultaneously consider a greater number of distinct sentence interpretations at once, compared to humans. Using Recurrent Neural Network Grammars (RNNGs) with word-synchronous beam search, we systematically vary the number of simultaneous parses used to compute word surprisal, and then use these surprisals to predict human reading times. Reducing the number of simultaneous active parses indeed increases the magnitude of predicted garden path effects, but not nearly enough to capture the full magnitude of the effects in humans. This suggests that differences in the number of simultaneous parses available to LMs and humans cannot reconcile LM-based surprisal with human sentence processing.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 2 minor

Summary. The manuscript tests whether the underprediction of garden-path effects by language-model surprisals relative to human reading times can be explained by a difference in parse multiplicity: LMs may maintain more simultaneous syntactic analyses than humans. The authors use Recurrent Neural Network Grammars (RNNGs) together with word-synchronous beam search, systematically vary beam width to control the number of active parses, recompute word surprisals, and regress these surprisals against human reading-time data from controlled garden-path experiments. They report that smaller beam sizes increase the magnitude of predicted garden-path effects, yet the increase remains substantially smaller than the effects measured in humans.

Significance. If the beam-width manipulation validly isolates the number of simultaneous parses, the result indicates that parse-multiplicity differences alone cannot reconcile LM surprisal with human processing difficulty and therefore directs attention to other computational distinctions (e.g., integration mechanisms or resource allocation). The work supplies a direct, controllable test of a specific hypothesis about parsing constraints and demonstrates that RNNG surprisals can be modulated in the predicted direction, which is a methodological strength.

major comments (1)
  1. [Methods] Methods section on word-synchronous beam search: the central claim that varying beam width cleanly manipulates the number of distinct active parses (and thereby tests the multiplicity hypothesis) rests on an unverified assumption. Beam pruning is probability-driven and can discard low-probability but structurally distinct continuations before they affect surprisal; simultaneously, narrower beams degrade next-word prediction quality on unambiguous material. No beam-diversity metrics or surprisal calibration checks on unambiguous control sentences at matched beam sizes are reported, leaving open the possibility that the observed increase in garden-path magnitude is partly an artifact of poorer global model calibration rather than a pure multiplicity effect.
minor comments (2)
  1. [Results] Results: specify exactly how the magnitude comparison between model-predicted and human garden-path effects is quantified (e.g., ratio of regression coefficients, Cohen’s d, or raw millisecond differences).
  2. [Figures] Figure captions: ensure that error bars or confidence intervals are described and that the number of items per condition is stated.
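
One way the magnitude comparison raised in the first minor comment could be made explicit is sketched below. It assumes a linear linking function from surprisal to reading time, with a slope in milliseconds per bit taken from a fitted regression. The function names and every number are placeholders chosen for illustration, not values from the paper.

```python
def predicted_effect_ms(slope_ms_per_bit, surprisal_ambiguous, surprisal_control):
    """Predicted garden path effect under a linear surprisal-to-RT link."""
    return slope_ms_per_bit * (surprisal_ambiguous - surprisal_control)

def fraction_of_human_effect(predicted_ms, human_ms):
    """Share of the empirical effect captured by the model prediction."""
    return predicted_ms / human_ms

# Placeholder inputs, NOT results from the paper.
effect = predicted_effect_ms(slope_ms_per_bit=3.0,
                             surprisal_ambiguous=9.0,
                             surprisal_control=4.0)
share = fraction_of_human_effect(effect, human_ms=100.0)
print(f"predicted effect: {effect:.1f} ms, share of human effect: {share:.2f}")
```

Reporting both the raw millisecond difference and the ratio makes it clear whether "not nearly enough" refers to an absolute or a proportional shortfall.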

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their constructive feedback. We address the major comment below and indicate planned revisions to strengthen the methodological claims.

Point-by-point responses
  1. Referee: Methods section on word-synchronous beam search: the central claim that varying beam width cleanly manipulates the number of distinct active parses (and thereby tests the multiplicity hypothesis) rests on an unverified assumption. Beam pruning is probability-driven and can discard low-probability but structurally distinct continuations before they affect surprisal; simultaneously, narrower beams degrade next-word prediction quality on unambiguous material. No beam-diversity metrics or surprisal calibration checks on unambiguous control sentences at matched beam sizes are reported, leaving open the possibility that the observed increase in garden-path magnitude is partly an artifact of poorer global model calibration rather than a pure multiplicity effect.

    Authors: We agree that explicit verification of beam composition would strengthen the interpretation. Word-synchronous beam search in RNNGs maintains the k highest-probability partial derivations at each word boundary, so beam width directly limits the number of active syntactic analyses used for marginalizing next-word probability. While probability-driven pruning can in principle drop low-probability but distinct structures, this is precisely the mechanism that reduces parse multiplicity, which is the quantity our hypothesis targets. To address calibration concerns, we will add (i) beam-diversity statistics (unique parse trees and their structural entropy) across beam widths and (ii) surprisal calibration plots and perplexity on unambiguous control items at the same beam sizes used in the garden-path regressions. These supplementary analyses will be included in the revised manuscript to demonstrate that the increase in garden-path magnitude is not solely an artifact of degraded global prediction quality.

    Revision: partial
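
The beam-diversity statistics mentioned in the response could be computed along the following lines. This is a minimal sketch: the response does not define "structural entropy", so the code substitutes Shannon entropy over normalized beam probabilities as an assumed stand-in, and the function name `beam_diversity` and the bracketed parse strings are hypothetical.

```python
import math
from collections import defaultdict

def beam_diversity(beam):
    """Return (number of unique parses, Shannon entropy in bits) for a beam.

    beam is a list of (parse_string, probability) pairs; duplicate parse
    strings are merged before the entropy is computed.
    """
    mass = defaultdict(float)
    for parse, prob in beam:
        mass[parse] += prob
    total = sum(mass.values())
    if total <= 0:
        raise ValueError("beam has no probability mass")
    entropy = -sum((m / total) * math.log2(m / total)
                   for m in mass.values() if m > 0)
    return len(mass), entropy

# Hypothetical beam with two structurally distinct, equally weighted parses.
n_unique, entropy_bits = beam_diversity([
    ("(S (NP the horse) (VP raced))", 0.5),
    ("(S (NP the horse (VP raced)))", 0.5),
])
print(n_unique, entropy_bits)
```

Tracking both numbers across beam widths would show whether a nominal beam of k items actually holds k distinct analyses or is dominated by a single one.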

Circularity Check

0 steps flagged

No circularity: empirical manipulation and external comparison

The paper performs an empirical test by using word-synchronous beam search in RNNGs to vary the number of active parses, deriving surprisal values from the resulting distributions, and then comparing those values against independent human reading-time measurements in garden-path sentences. No derivation step equates a model output to its input by construction, renames a known result, or relies on a self-citation chain for a uniqueness claim; the central finding (reduced multiplicity increases effect size but remains insufficient) is obtained by direct measurement against external data and is therefore falsifiable without internal reduction.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The work rests on surprisal theory as the linking hypothesis between model probabilities and reading times, plus the assumption that RNNG beam search approximates human parallel parsing capacity. No new entities are introduced.

free parameters (1)
  • beam width / number of simultaneous parses
    Systematically varied to test the hypothesis; values are chosen by the experimenter rather than derived.
axioms (1)
  • [domain assumption] Surprisal theory: processing difficulty is determined by word predictability in context
    Invoked in the opening paragraph as the foundation linking LM output to human reading times.

pith-pipeline@v0.9.0 · 5731 in / 1191 out tokens · 62728 ms · 2026-05-19T14:45:48.657231+00:00 · methodology


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

  • IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear

    Relation between the paper passage and the cited Recognition theorem: unclear.

    Using Recurrent Neural Network Grammars (RNNGs) with word-synchronous beam search, we systematically vary the number of simultaneous parses used to compute word surprisal... Reducing the number of simultaneous active parses indeed increases the magnitude of predicted garden path effects, but not nearly enough...

What do these tags mean?

  • matches: The paper's claim is directly supported by a theorem in the formal canon.
  • supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
  • extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
  • uses: The paper appears to rely on the theorem as machinery.
  • contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
  • unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
