The Frequency Confound in Language-Model Surprisal and Metaphor Novelty

Omar Momen; Sina Zarrieß

arxiv: 2605.06506 · v2 · pith:UC5ZIRQInew · submitted 2026-05-07 · 💻 cs.CL

Pith reviewed 2026-05-20 22:55 UTC · model grok-4.3

classification 💻 cs.CL

keywords surprisal · metaphor novelty · lexical frequency · language models · Pythia · training checkpoints · confound analysis

The pith

Word frequency predicts metaphor novelty judgments more strongly than language model surprisal.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper investigates why language model surprisal correlates with human ratings of metaphor novelty. It compares surprisal estimates from multiple Pythia model sizes and training stages against two lexical frequency measures. Frequency consistently outperforms surprisal as a predictor. The surprisal-novelty link peaks early in training and then weakens, while the surprisal-frequency association rises over the same period. This pattern suggests that frequency effects may underlie many reported surprisal findings on novelty and processing difficulty.

Core claim

Across eight Pythia model sizes and 154 training checkpoints, word frequency measures are stronger predictors of metaphor novelty ratings than surprisal estimates. The surprisal-novelty association peaks at an early training stage and then declines, while the surprisal-frequency association strengthens over the same period.

What carries the argument

Correlation comparison of surprisal versus two lexical frequency measures as predictors of metaphor novelty ratings, tracked across model scales and training checkpoints.
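
The comparison described above can be sketched in a few lines. This is a minimal illustration, not the authors' pipeline: the helper names and the toy values are hypothetical, and Spearman's rho is computed by hand (Pearson correlation of average ranks) so that the block needs only the standard library.

```python
from statistics import mean

def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of positions i..j, shifted to 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def pearson(x, y):
    """Plain Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

def spearman(x, y):
    """Spearman's rho as the Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Toy, made-up values for six metaphors (not data from the paper).
novelty = [0.1, 0.3, 0.2, 0.8, 0.6, 0.9]
surprisal = [2.0, 5.5, 3.1, 4.0, 7.2, 6.0]
nlf = [3.0, 4.1, 3.5, 8.0, 6.5, 9.2]  # negative log frequency

rho_surprisal = spearman(surprisal, novelty)
rho_nlf = spearman(nlf, novelty)
print(f"surprisal vs novelty: {rho_surprisal:.3f}")
print(f"NLF vs novelty:       {rho_nlf:.3f}")
```

In this toy set the frequency measure orders the items exactly as the novelty ratings do, so its rho is higher than the one for surprisal. As the referee report notes, a comparison of this kind says nothing about unique explanatory power.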

If this is right

Reported optimal surprisal settings for modeling metaphor novelty may reflect frequency confounds rather than contextual predictability.
Lexical frequency may serve as the primary underlying factor in associations between surprisal and processing difficulty.
Surprisal from later training stages adds little predictive value for novelty once frequency is accounted for.
Studies relying on surprisal for cognitive modeling of metaphors should include frequency controls to isolate true contextual effects.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Future work could test whether simpler frequency-only models match or exceed surprisal performance on novelty prediction tasks.
Similar frequency confounds may affect surprisal correlations in other domains such as sentence acceptability or reading time studies.
Reanalyzing prior surprisal papers on linguistic novelty with frequency controls could clarify which effects are genuinely contextual.

Load-bearing premise

The metaphor novelty ratings dataset reflects human judgments without residual influence from lexical frequency, and the two frequency measures capture the full confound.

What would settle it

Statistically partialling frequency out of surprisal and finding that the remaining surprisal-novelty correlation stays significant would falsify the central claim.
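
A first-order partial correlation is one way to run that check. The sketch below uses the standard formula on Pearson correlations; applying it to ranks would give a Spearman-style analogue. The function names and the four toy items are hypothetical and are constructed so that surprisal and novelty share nothing beyond frequency.

```python
from statistics import mean

def pearson(x, y):
    """Plain Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

def partial_correlation(x, y, z):
    """Correlation of x and y after removing the linear effect of z from both."""
    r_xy = pearson(x, y)
    r_xz = pearson(x, z)
    r_yz = pearson(y, z)
    return (r_xy - r_xz * r_yz) / ((1 - r_xz ** 2) * (1 - r_yz ** 2)) ** 0.5

# Toy construction: both variables are frequency plus independent noise.
frequency = [-1, -1, 1, 1]
surprisal = [-2, 0, 0, 2]  # frequency + noise a
novelty = [-2, 0, 2, 0]    # frequency + noise b (uncorrelated with a)

raw = pearson(surprisal, novelty)
controlled = partial_correlation(surprisal, novelty, frequency)
print(f"raw r = {raw:.3f}, partial r given frequency = {controlled:.3f}")
```

Here the raw correlation of 0.5 vanishes once frequency is controlled for, which is the pattern the central claim predicts. A partial correlation that stayed clearly above zero would point the other way.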

Figures

Figures reproduced from arXiv: 2605.06506 by Omar Momen, Sina Zarrieß.

**Figure 1.** Effect of model size on associations between Metaphor Novelty Scores and Surprisal (solid); Negative Log Word Frequency in general language use (NLF-Human) (dash); and Negative Log Word Frequency in Pythia’s pretraining data (NLF-LM) (dots). Blue lines track Spearman correlation, and red lines track AUC to detect novel metaphors (score ≥ 0.5).

**Figure 2.** Effect of pretraining data/steps for Pythia-70M on associations between Metaphor Novelty Scores and Surprisal (solid); Negative Log Word Frequency in general language use (NLF-Human) (dash); and Negative Log Word Frequency in Pythia’s pretraining data (NLF-LM) (dots). Blue lines track Spearman correlation, and red lines track AUC.

**Figure 3.** Effect of model scale on correlation between Surprisal and Frequency. NLF-Human (dash); and NLF-LM (dots).

Excerpt from the paper's Section 4 (Discussion), Word Frequency: Our results agree with previous work (Do Dinh et al., 2018; Reimann and Scheffler, 2024) showing that lexical frequency is strongly associated with metaphor novelty scores. Additionally, we show that frequency–novelty association is substantially stronger than surprisal–…
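
The Figure 1 caption mentions two quantities that are easy to make concrete: negative log frequency (NLF) and AUC for detecting novel metaphors at a novelty score of 0.5 or above. The sketch below is an assumption-laden illustration: the corpus counts, the corpus size and the function names are invented, and AUC is computed in its pairwise (Mann-Whitney) form rather than with any particular library.

```python
import math

def negative_log_frequency(count, total):
    """NLF of a word from its corpus count and the corpus size (natural log)."""
    return -math.log(count / total)

def auc(scores, labels):
    """Probability that a positive item outscores a negative one; ties count half."""
    pos = [s for s, is_pos in zip(scores, labels) if is_pos]
    neg = [s for s, is_pos in zip(scores, labels) if not is_pos]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical counts and ratings for five metaphorically used words.
novelty = [0.1, 0.7, 0.4, 0.9, 0.2]
counts = [5000, 40, 900, 300, 12]
corpus_size = 1_000_000

nlf = [negative_log_frequency(c, corpus_size) for c in counts]
is_novel = [score >= 0.5 for score in novelty]  # threshold from the caption
auc_nlf = auc(nlf, is_novel)
print(f"AUC of NLF for detecting novel metaphors: {auc_nlf:.3f}")
```

The rare word with a low novelty rating (count 12) is what pulls the toy AUC below 1.0: it outranks both novel items on NLF, so two of the six positive-negative pairs are ordered wrongly.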

Original abstract

Language-model (LM) surprisal is widely used as a proxy for contextual predictability and has been reported to correlate with metaphor novelty judgments. However, surprisal is tightly intertwined with lexical frequency. We explore this interaction on metaphor novelty ratings using two different word frequency measures. We analyse surprisal estimates from eight Pythia model sizes and 154 training checkpoints. Across settings, word frequency is a stronger predictor of metaphor novelty than surprisal. Across training stages, the surprisal–novelty association peaks at an early stage and then falls again, mirroring a similarly timed increase in the surprisal–frequency association. These results suggest that the often-reported optimal LM surprisal settings may incorrectly associate contextual predictability with metaphor novelty and processing difficulty, whereas lexical frequency may be the major underlying factor.
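
For readers less familiar with the quantity, surprisal is the negative log probability a model assigns to a word in its context. When a word spans several subword tokens, the chain rule lets the token log-probabilities be summed. The sketch below assumes natural-log token probabilities as input and reports bits. The function name and the probabilities are hypothetical, and the abstract does not say which unit or word-boundary convention the paper uses.

```python
import math

def word_surprisal_bits(token_logprobs):
    """Surprisal of a word in bits, from natural-log probabilities of its tokens.

    By the chain rule, log p(word | context) is the sum of the conditional
    log-probabilities of the word's subword tokens.
    """
    return -sum(token_logprobs) / math.log(2)

# Hypothetical model outputs for a word split into two subword tokens.
token_logprobs = [math.log(0.25), math.log(0.5)]
surprisal = word_surprisal_bits(token_logprobs)
print(f"word surprisal: {surprisal:.2f} bits")
```

Unlike surprisal, a frequency measure such as NLF ignores the context entirely, which is why the two can be compared as competing predictors.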

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

Frequency beats surprisal for metaphor novelty in separate correlations, but the paper needs joint controls to show it has unique explanatory power.

The main takeaway is that word frequency predicts metaphor novelty ratings more strongly than LM surprisal does, and that the surprisal-novelty link rises early in training and then drops as surprisal aligns more closely with the frequency measures. The authors track this across eight Pythia sizes and 154 checkpoints with two frequency counts, which gives a clear picture of how the relationships shift during training. The early-peak-then-decline pattern is a concrete observation that extends earlier notes on frequency-surprisal overlap into this specific domain. That part of the work is useful and worth having on record.

The soft spot sits in the predictor comparison itself. The abstract and stress-test note both point to separate associations rather than a joint model or partial correlations that would isolate frequency's contribution after accounting for its overlap with surprisal. Without those controls, the claim that frequency is the stronger or underlying factor rests on shared variance and does not yet rule out the alternative that surprisal contributes independently. If the full paper includes multiple regression results or partials for the key tests, this concern shrinks; if it stays at univariate comparisons, the central result is less decisive than it first appears.

The study uses pre-trained models and existing ratings, so there is no circularity in the method. Citations hit the standard surprisal and metaphor literature without heavy self-reference.

This paper is for researchers who use LM surprisal to stand in for human predictability or novelty judgments. Anyone modeling processing difficulty or computational metaphor would get a practical reminder about frequency confounds. It deserves peer review because the empirical scope is broad enough and the question is real, even if the statistical controls need tightening before the conclusions can be taken as settled.

Referee Report

2 major / 2 minor

Summary. The paper investigates the potential frequency confound in using language-model surprisal as a predictor of metaphor novelty ratings. It analyzes surprisal estimates from eight Pythia model sizes across 154 training checkpoints, paired with two distinct word frequency measures, and reports that frequency is a stronger predictor of novelty judgments than surprisal across settings. It further finds that the surprisal-novelty association peaks early in training and subsequently declines, in parallel with a similarly timed increase in the surprisal-frequency association, suggesting that lexical frequency rather than contextual predictability may drive prior findings on metaphor processing.

Significance. If the central claims survive appropriate statistical controls for shared variance, the work would be significant for computational psycholinguistics and NLP. It challenges the interpretation of LM surprisal as a direct proxy for predictability in metaphor novelty and processing difficulty studies, while leveraging a large set of model checkpoints and dual frequency measures to strengthen the empirical case. This could prompt re-examination of surprisal-based explanations in related domains.

major comments (2)

[Results] Results section: the claim that word frequency is a 'stronger predictor' of metaphor novelty than surprisal rests on separate associations (Spearman correlations and AUC scores, per the figure captions) rather than a joint model. Because surprisal and frequency are known to correlate, this does not establish unique explanatory power; a multiple regression or partial-correlation analysis controlling for their shared variance is required to support the conclusion.
[Training stages] Training-stage analysis (likely §4 or equivalent): the reported peak-then-decline pattern in the surprisal-novelty association, and its mirroring of the surprisal-frequency rise, needs to be re-evaluated after partialling out the other variable. Without such controls, the timing alignment could be an artifact of the underlying correlation rather than an independent developmental trajectory.
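
The joint model requested in these comments can be illustrated with ordinary least squares on two predictors. This is a hedged sketch, not an analysis from the paper: the function name is hypothetical, the data are synthetic, and the coefficients are obtained by solving the centered 2x2 normal equations directly instead of calling a statistics package.

```python
from statistics import mean

def fit_two_predictors(x, z, y):
    """OLS fit of y = b0 + bx * x + bz * z via the centered normal equations."""
    mx, mz, my = mean(x), mean(z), mean(y)
    xc = [v - mx for v in x]
    zc = [v - mz for v in z]
    yc = [v - my for v in y]
    sxx = sum(a * a for a in xc)
    szz = sum(a * a for a in zc)
    sxz = sum(a * b for a, b in zip(xc, zc))
    sxy = sum(a * b for a, b in zip(xc, yc))
    szy = sum(a * b for a, b in zip(zc, yc))
    det = sxx * szz - sxz ** 2  # zero when the predictors are perfectly collinear
    bx = (sxy * szz - szy * sxz) / det
    bz = (szy * sxx - sxy * sxz) / det
    b0 = my - bx * mx - bz * mz
    return b0, bx, bz

# Synthetic example: novelty built as a known mix of surprisal and NLF.
surprisal = [1.0, 2.0, 3.0, 4.0, 5.0]
nlf = [2.0, 1.0, 4.0, 3.0, 6.0]
novelty = [1.0 + 0.2 * s + 0.7 * f for s, f in zip(surprisal, nlf)]

intercept, b_surprisal, b_nlf = fit_two_predictors(surprisal, nlf, novelty)
print(f"surprisal coefficient: {b_surprisal:.3f}, NLF coefficient: {b_nlf:.3f}")
```

Because both predictors enter the same model, each coefficient reflects the contribution of one variable with the other held fixed. Separate correlations cannot show this when surprisal and frequency overlap.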

minor comments (2)

[Methods] Clarify the exact statistical tests and any data exclusion criteria used for the 154 checkpoints and novelty ratings in the methods section.
[Figures] Ensure all figures plotting correlations across training stages include confidence intervals or significance markers for the key associations.
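
One common way to attach confidence intervals to a rank correlation is a percentile bootstrap over items. The sketch below is an assumption, not the procedure used in the paper: the function names, the toy data, the number of resamples and the fixed seed are all illustrative choices.

```python
import random
from statistics import mean

def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            out[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return out

def spearman(x, y):
    """Pearson correlation of the ranks; None if either side is constant."""
    rx, ry = ranks(x), ranks(y)
    mx, my = mean(rx), mean(ry)
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        return None
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    return cov / (var_x * var_y) ** 0.5

def bootstrap_ci(x, y, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for Spearman's rho over resampled items."""
    rng = random.Random(seed)
    n = len(x)
    stats = []
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]
        rho = spearman([x[i] for i in idx], [y[i] for i in idx])
        if rho is not None:  # skip degenerate resamples with no variation
            stats.append(rho)
    stats.sort()
    lower = stats[int((alpha / 2) * len(stats))]
    upper = stats[min(len(stats) - 1, int((1 - alpha / 2) * len(stats)))]
    return lower, upper

# Toy, made-up values for eight items.
surprisal = [2.0, 5.5, 3.1, 4.0, 7.2, 6.0, 4.4, 3.8]
novelty = [0.1, 0.3, 0.2, 0.8, 0.6, 0.9, 0.5, 0.4]
low, high = bootstrap_ci(surprisal, novelty)
print(f"95% bootstrap interval for rho: [{low:.2f}, {high:.2f}]")
```

Running the same function at each checkpoint would give the band that this comment asks the training-stage figures to show.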

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their detailed and constructive feedback on our manuscript. We appreciate the opportunity to clarify and strengthen our analyses regarding the frequency confound in LM surprisal for metaphor novelty. Below, we address each major comment point by point.

Referee: [Results] Results section: the claim that word frequency is a 'stronger predictor' of metaphor novelty than surprisal rests on separate associations (Spearman correlations and AUC scores, per the figure captions) rather than a joint model. Because surprisal and frequency are known to correlate, this does not establish unique explanatory power; a multiple regression or partial-correlation analysis controlling for their shared variance is required to support the conclusion.

Authors: We agree that separate correlations do not fully establish unique explanatory power given the known correlation between surprisal and frequency. To address this, we have conducted additional multiple regression analyses where metaphor novelty is regressed on both surprisal and frequency simultaneously, as well as partial correlations. In the revised manuscript, we report that frequency remains a significant predictor even after controlling for surprisal, whereas surprisal's unique contribution is weaker or non-significant across most model sizes and checkpoints. This supports our original claim while providing a more rigorous test of unique variance.

revision: yes

Referee: [Training stages] Training-stage analysis (likely §4 or equivalent): the reported peak-then-decline pattern in the surprisal-novelty association, and its mirroring of the surprisal-frequency rise, needs to be re-evaluated after partialling out the other variable. Without such controls, the timing alignment could be an artifact of the underlying correlation rather than an independent developmental trajectory.

Authors: We acknowledge the importance of controlling for the other variable in the training-stage analyses to rule out artifacts from their correlation. We have re-analyzed the data using partial correlations: specifically, the partial correlation between surprisal and novelty controlling for frequency at each checkpoint, and vice versa where relevant. The results show that the early peak and subsequent decline in the surprisal-novelty association persist after partialling out frequency, although the magnitude is reduced. Similarly, the increase in the surprisal-frequency association over training remains evident. We have updated the relevant figures and text in the revision to include these controlled analyses, which reinforce the interpretation that lexical frequency plays a key role.

revision: yes

Circularity Check

0 steps flagged

No circularity: empirical correlation study with independent inputs

The paper performs an empirical analysis of pre-existing metaphor novelty ratings against surprisal values computed from public Pythia checkpoints and two external frequency measures. No derivation chain, fitted parameters, or predictions are defined in terms of the target associations; the reported correlations and training-stage trends are computed directly from these independent data sources. No self-citation load-bearing steps, ansatz smuggling, or renaming of known results appear in the load-bearing claims. The central findings rest on observable data patterns rather than reducing to the paper's own equations or prior author work by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The analysis rests on the validity of an existing metaphor novelty rating dataset and the assumption that the two frequency measures are appropriate and independent of the surprisal estimates. No new entities or free parameters are introduced; the work uses off-the-shelf Pythia checkpoints.

axioms (1)

Domain assumption: Human metaphor novelty ratings provide a reliable ground-truth measure of processing difficulty or novelty that can be compared directly to model-derived quantities.
Invoked implicitly when treating novelty ratings as the dependent variable to be predicted by surprisal or frequency.

Reference graph

Works this paper leans on

80 extracted references · 80 canonical work pages

[1]

What Goes Into a LM Acceptability Judgment? Rethinking the Impact of Frequency and Length

Tjuatja, Lindia and Neubig, Graham and Linzen, Tal and Hao, Sophie. What Goes Into a LM Acceptability Judgment? Rethinking the Impact of Frequency and Length. Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers). 2025. doi:10.18653/v1...

work page doi:10.18653/v1/2025.naacl-long.109 2025
[2]

Word Frequency and Predictability Dissociate in Naturalistic Reading

Word Frequency and Predictability Dissociate in Naturalistic Reading. Open Mind.

work page
[3]

The Career of Metaphor

The Career of Metaphor. Psychological Review.

work page
[4]

When is a Metaphor Actually Novel? Annotating Metaphor Novelty in the Context of Automatic Metaphor Detection

Reimann, Sebastian and Scheffler, Tatjana. When is a Metaphor Actually Novel? Annotating Metaphor Novelty in the Context of Automatic Metaphor Detection. Proceedings of the 18th Linguistic Annotation Workshop (LAW-XVIII). 2024

work page 2024
[5]

The Study of Language

The Study of Language. 2006.

work page 2006
[6]

The Origin of Speech

The Origin of Speech. Scientific American. September 1960.

work page 1960
[7]

A Comparative Approach to Assessing Linguistic Creativity of Large Language Models and Humans

A Comparative Approach to Assessing Linguistic Creativity of Large Language Models and Humans. Procedia Computer Science. 2025.

work page 2025
[8]

Predictive power of word surprisal for reading times is a linear function of language model quality

Goodkind, Adam and Bicknell, Klinton. Predictive power of word surprisal for reading times is a linear function of language model quality. Proceedings of the 8th Workshop on Cognitive Modeling and Computational Linguistics (CMCL 2018). 2018. doi:10.18653/v1/W18-0102

work page doi:10.18653/v1/w18-0102 2018
[9]

Frequency Explains the Inverse Correlation of Large Language Models' Size, Training Data Amount, and Surprisal's Fit to Reading Times

Oh, Byung-Doh and Yue, Shisen and Schuler, William. Frequency Explains the Inverse Correlation of Large Language Models' Size, Training Data Amount, and Surprisal's Fit to Reading Times. Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers). 2024. doi:10.18653/v1/2024.eacl-long.162

work page doi:10.18653/v1/2024.eacl-long.162 2024
[10]

Transformer-Based Language Model Surprisal Predicts Human Reading Times Best with About Two Billion Training Tokens

Oh, Byung-Doh and Schuler, William. Transformer-Based Language Model Surprisal Predicts Human Reading Times Best with About Two Billion Training Tokens. Findings of the Association for Computational Linguistics: EMNLP 2023. 2023. doi:10.18653/v1/2023.findings-emnlp.128

work page doi:10.18653/v1/2023.findings-emnlp.128 2023
[11]

Surprisal and Metaphor Novelty Judgments: Moderate Correlations and Divergent Scaling Effects Revealed by Corpus-Based and Synthetic Datasets

Momen, Omar and Sitter, Emilie and Herrmann, Berenike and Zarrieß, Sina. Surprisal and Metaphor Novelty Judgments: Moderate Correlations and Divergent Scaling Effects Revealed by Corpus-Based and Synthetic Datasets. Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers). 2026...

work page doi:10.18653/v1/2026.eacl-long.378 2026
[12]

Leading Whitespaces of Language Models' Subword Vocabulary Pose a Confound for Calculating Word Probabilities

Oh, Byung-Doh and Schuler, William. Leading Whitespaces of Language Models' Subword Vocabulary Pose a Confound for Calculating Word Probabilities. Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing. 2024. doi:10.18653/v1/2024.emnlp-main.202

work page doi:10.18653/v1/2024.emnlp-main.202 2024
[13]

How to Compute the Probability of a Word

Pimentel, Tiago and Meister, Clara. How to Compute the Probability of a Word. Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing. 2024. doi:10.18653/v1/2024.emnlp-main.1020

work page doi:10.18653/v1/2024.emnlp-main.1020 2024
[14]

The Pile: An 800GB Dataset of Diverse Text for Language Modeling

The Pile: An 800GB Dataset of Diverse Text for Language Modeling. 2020.

work page 2020
[15]

Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling

Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling. 2023.

work page 2023
[16]

Best-Worst Scaling More Reliable than Rating Scales: A Case Study on Sentiment Intensity Annotation

Kiritchenko, Svetlana and Mohammad, Saif. Best-Worst Scaling More Reliable than Rating Scales: A Case Study on Sentiment Intensity Annotation. Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers). 2017. doi:10.18653/v1/P17-2074

work page doi:10.18653/v1/p17-2074 2017
[17]

Weeding out Conventionalized Metaphors: A Corpus of Novel Metaphor Annotations

Do Dinh, Erik-Lân and Wieland, Hannah and Gurevych, Iryna. Weeding out Conventionalized Metaphors: A Corpus of Novel Metaphor Annotations. Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing. 2018. doi:10.18653/v1/D18-1171

work page doi:10.18653/v1/d18-1171 2018
[18]

Episodic Memory Demands Modulate Novel Metaphor Use during Event Narration

Episodic Memory Demands Modulate Novel Metaphor Use during Event Narration. Proceedings of the 43rd Annual Meeting of the Cognitive Science Society.

work page
[19]

Bigger is not always better: The importance of human-scale language modeling for psycholinguistics

Ethan Gotlieb Wilcox and Michael Y. Hu and Aaron Mueller and Alex Warstadt and Leshem Choshen and Chengxu Zhuang and Adina Williams and Ryan Cotterell and Tal Linzen. Bigger is not always better: The importance of human-scale language modeling for psycholinguistics. 2025. doi:10.1016/j.jml.2025.104650

work page doi:10.1016/j.jml.2025.104650 2025
[20]

Pearson, Karl. Proceedings of the Royal Society of London. 1895.

work page
[21]

Spearman, Charles. The American Journal of Psychology. 1904.

work page 1904
[22]

Cureton, Edward E. Psychometrika. 1956.

work page 1956
[23]

Hanley, James A. and McNeil, Barbara J. Radiology. 1982.

work page 1982
[24]

Fawcett, Tom. Pattern Recognition Letters. 2006.

work page 2006
[25]

Bradley, Andrew P. Pattern Recognition. 1997.

work page 1997
[26]

Glass, Gene V. Educational and Psychological Measurement. 1966.

work page 1966
[27]

Causal Estimation of Tokenisation Bias

Lesci, Pietro and Meister, Clara and Hofmann, Thomas and Vlachos, Andreas and Pimentel, Tiago. Causal Estimation of Tokenisation Bias. Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2025. doi:10.18653/v1/2025.acl-long.1374

work page doi:10.18653/v1/2025.acl-long.1374 2025
[28]

Gibbs, Jr, Raymond W. Embodiment and Cognitive Science.

work page
[29]

Arzouan, Yossi and Goldstein, Abraham and Faust, Miriam. Brain Research.

work page
[30]

Kövecses, Zoltán. 2002. doi:10.1093/oso/9780195145113.001.0001

work page doi:10.1093/oso/9780195145113.001.0001 2002
[31]

Effects of Familiarity and Aptness on Metaphor Processing

Effects of Familiarity and Aptness on Metaphor Processing. Journal of Experimental Psychology: Learning, Memory, and Cognition.

work page
[32]

From Novel to Familiar: Tuning the Brain for Metaphors

From Novel to Familiar: Tuning the Brain for Metaphors. NeuroImage.

work page
[33]

Large Language Model Displays Emergent Ability to Interpret Novel Literary Metaphors

Large Language Model Displays Emergent Ability to Interpret Novel Literary Metaphors. 2024.

work page 2024
[34]

Jey Han Lau and Alexander Clark and Shalom Lappin. Cognitive Science.

work page
[35]

Revisiting the Uniform Information Density Hypothesis

Revisiting the Uniform Information Density Hypothesis. Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing. 2021. doi:10.18653/v1/2021.emnlp-main.74

work page doi:10.18653/v1/2021.emnlp-main.74 2021
[36]

A Systematic Assessment of Syntactic Generalization in Neural Language Models

A Systematic Assessment of Syntactic Generalization in Neural Language Models. Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. 2020. doi:10.18653/v1/2020.acl-main.158

work page doi:10.18653/v1/2020.acl-main.158 2020
[37]

Expectation-based syntactic comprehension

Expectation-based syntactic comprehension. Cognition. 2008.

work page 2008
[38]

Cory Shain and Clara Meister and Tiago Pimentel and Ryan Cotterell and Roger Levy. Proceedings of the National Academy of Sciences. 2024. https://www.pnas.org/doi/pdf/10.1073/pnas.2307876121

work page doi:10.1073/pnas.2307876121 2024
[39]

Ethan Gotlieb Wilcox and Jon Gauthier and Jennifer Hu and Peng Qian and Roger Levy. CoRR. 2020. arXiv:2006.01912

work page arXiv 2020
[40]

Making the Unseen Seen: The Role of Signaling and Novelty in Rating Metaphors

Making the Unseen Seen: The Role of Signaling and Novelty in Rating Metaphors. Journal of Psycholinguistic Research. 2024.

work page 2024
[41]

Roncero, C. and de Almeida, R. G. Language and Cognition. 2014.

work page 2014
[42]

Cardillo, E. R. and Watson, C. E. and Schmidt, G. L. and Kranjec, A. and Chatterjee, A. Frontiers in Psychology. 2012.

work page 2012
[43]

Cardillo, E. R. and Schmidt, G. L. and Kranjec, A. and Chatterjee, A. Behavior Research Methods. 2010.

work page 2010
[44]

Introducing the LCC Metaphor Datasets

Mohler, Michael and Brunson, Mary and Rink, Bryan and Tomlinson, Marc. Introducing the LCC Metaphor Datasets. Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC '16). 2016

work page 2016
[45]

Reijnierse, W. Gudrun and Burgers, Christian and Krennmayr, Tina and Steen, Gerard J. Corpora. 2019. doi:10.3366/cor.2019.0176

work page doi:10.3366/cor.2019.0176 2019
[46]

On the Role of Context in Reading Time Prediction

Opedal, Andreas and Chodroff, Eleanor and Cotterell, Ryan and Wilcox, Ethan. On the Role of Context in Reading Time Prediction. Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing. 2024. doi:10.18653/v1/2024.emnlp-main.179

work page doi:10.18653/v1/2024.emnlp-main.179 2024
[47]

Gill Philip. The Routledge Handbook of Metaphor and Language. 2016. doi:10.4324/9781315672953

work page doi:10.4324/9781315672953 2016
[48]

Metaphorical Polysemy Detection: Conventional Metaphor Meets Word Sense Disambiguation

Maudslay, Rowan Hall and Teufel, Simone. Metaphorical Polysemy Detection: Conventional Metaphor Meets Word Sense Disambiguation. Proceedings of the 29th International Conference on Computational Linguistics. 2022

work page 2022
[49]

Scaling in Cognitive Modelling: a Multilingual Approach to Human Reading Times

de Varda, Andrea and Marelli, Marco. Scaling in Cognitive Modelling: a Multilingual Approach to Human Reading Times. Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers). 2023. doi:10.18653/v1/2023.acl-short.14

work page doi:10.18653/v1/2023.acl-short.14 2023
[50]

George Lakoff and Mark Johnson. 1980.

work page 1980
[51]

Why Does Surprisal From Larger Transformer-Based Language Models Provide a Poorer Fit to Human Reading Times?

Oh, Byung-Doh and Schuler, William. Why Does Surprisal From Larger Transformer-Based Language Models Provide a Poorer Fit to Human Reading Times?. Transactions of the Association for Computational Linguistics. 2023. doi:10.1162/tacl_a_00548

work page doi:10.1162/tacl_a_00548 2023
[52]

The Inverse Scaling Effect of Pre-Trained Language Model Surprisal Is Not Due to Data Leakage

Oh, Byung-Doh and Zhu, Hongao and Schuler, William. The Inverse Scaling Effect of Pre-Trained Language Model Surprisal Is Not Due to Data Leakage. Findings of the Association for Computational Linguistics: ACL 2025. 2025. doi:10.18653/v1/2025.findings-acl.91

work page doi:10.18653/v1/2025.findings-acl.91 2025
[53]

Levy, Roger. Cognition. 2008.

work page 2008
[54]

A Probabilistic

Hale, John. Proceedings of the Second Meeting of the North American Chapter of the Association for Computational Linguistics on Language Technologies. 2001. doi:10.3115/1073336.1073357

work page doi:10.3115/1073336.1073357 2001
[55]

Machine psychology

Machine psychology. arXiv preprint arXiv:2303.13988.

work page arXiv
[56]

Using cognitive psychology to understand GPT-3

Using cognitive psychology to understand GPT-3. Proceedings of the National Academy of Sciences. 2023.

work page 2023
[57]

On the Predictive Power of Neural Language Models for Human Real-Time Comprehension Behavior

On the Predictive Power of Neural Language Models for Human Real-Time Comprehension Behavior. 2020.

work page 2020
[58]

Jennifer Hu and Kyle Mahowald and Gary Lupyan and Anna Ivanova and Roger Levy. Proceedings of the National Academy of Sciences. 2024.

work page 2024
[59]

Eye movements in reading and information processing: 20 years of research

Eye movements in reading and information processing: 20 years of research. Psychological Bulletin.

work page
[60]

The effect of word predictability on reading time is logarithmic

Nathaniel J. Smith and Roger Levy. The effect of word predictability on reading time is logarithmic. 2013. doi:10.1016/j.cognition.2013.02.013

work page doi:10.1016/j.cognition.2013.02.013 2013
[61]

A study on surprisal and semantic relatedness for eye-tracking data prediction

A study on surprisal and semantic relatedness for eye-tracking data prediction. Frontiers in Psychology. 2023. doi:10.3389/fpsyg.2023.1112365

work page doi:10.3389/fpsyg.2023.1112365 2023
[62]

Eye Tracking Based Cognitive Evaluation of Automatic Readability Assessment Measures

Eye Tracking Based Cognitive Evaluation of Automatic Readability Assessment Measures. 2025.

work page 2025
[63]

Claude E. Shannon. Bell System Technical Journal. 1948.

work page 1948
[64]

George Lakoff and Mark Johnson. 1980.

work page 1980
[65]

MIP: A Method for Identifying Metaphorically Used Words in Discourse

MIP: A Method for Identifying Metaphorically Used Words in Discourse. 2007.

work page 2007
[66]

Comprehending conventional and novel metaphors: An ERP study

Vicky Tzuyin Lai and Tim Curran and Lise Menn. Comprehending conventional and novel metaphors: An ERP study. 2009. doi:10.1016/j.brainres.2009.05.088

work page doi:10.1016/j.brainres.2009.05.088 2009
[67]

Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference

Warner, Benjamin and Chaffin, Antoine and Clavi. Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference. Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2025. doi:10.18653/v1/2025.acl-long.127

work page doi:10.18653/v1/2025.acl-long.127 2025
[68]

DeBERTa: Decoding-enhanced BERT with Disentangled Attention

DeBERTa: Decoding-enhanced BERT with Disentangled Attention. 2021.

work page 2021
[69]

Qwen2.5 Technical Report

Qwen2.5 Technical Report. 2024.

work page 2024
[70]

The Llama 3 Herd of Models

The Llama 3 Herd of Models. 2024.

work page 2024
[71]

Language Models are Unsupervised Multitask Learners

Language Models are Unsupervised Multitask Learners. 2019.

work page 2019
[72]

G.J. Steen and A.G. Dorst and J.B. Herrmann and A.A. Kaal and T. Krennmayr and T. Pasma. A method for linguistic metaphor identification. From MIP to MIPVU. 2010.
[73]

Alfred V. Aho and Jeffrey D. Ullman. 1972.
[74]

Publications Manual. 1983.
[75]

Ashok K. Chandra and Dexter C. Kozen and Larry J. Stockmeyer. 1981. doi:10.1145/322234.322243
[76]

Andrew, Galen and Gao, Jianfeng. Scalable training of
[77]

Dan Gusfield. 1997.
[78]

Mohammad Sadegh Rasooli and Joel R. Tetreault. Computing Research Repository. 2015.
[79]

Ando, Rie Kubota and Zhang, Tong. A Framework for Learning Predictive Structures from Multiple Tasks and Unlabeled Data. Journal of Machine Learning Research.
[80]

Dong, Chuanming and Gambette, Philippe and Dominguès, Catherine. Extracting. doi:10.5220/0010656700003064
