Assessing Cognitive Effort in L2 Idiomatic Processing: An Eye-Tracking Dataset

César Rennó-Costa; Eduardo Santos; Juliana Carvalho

arXiv: 2605.04857 · v1 · submitted 2026-05-06 · 💻 cs.CL · cs.AI · cs.CV

Pith reviewed 2026-05-08 17:21 UTC · model grok-4.3

keywords eye-tracking · L2 idiom processing · cognitive effort · regressive eye movements · proficiency levels · dataset validation · figurative language

The pith

L2 learners of English show more regressive eye movements on idioms at lower proficiency levels in a new eye-tracking dataset.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper creates and validates an eye-tracking dataset from Portuguese L1 speakers of English across all CEFR levels to measure cognitive effort during idiom processing. It reports that lower proficiency links to higher rates of regressive eye movements, consistent with a literal-first strategy that carries extra cost. The work also establishes that 60 Hz sampling captures enough detail to identify fixations and regressions in reading. The dataset is offered as a benchmark for testing how well human and machine models align on figurative language understanding.

Core claim

The central claim is that a new eye-tracking dataset recorded at 60 Hz from L2 learners across proficiency levels A1-C2 reliably indexes cognitive costs in idiomatic processing through ocular metrics, as evidenced by a strong inverse correlation between proficiency and regressive movements, while confirming sufficient data density for macro-cognitive events.

What carries the argument

Ocular metrics of fixations and regressions at 60 Hz sampling, used to index the cognitive cost of literal-first strategies in L2 idiom reading.
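To make the sampling-rate argument concrete, the arithmetic below shows roughly how many gaze samples fall inside an event at 60 Hz. The event durations are illustrative assumptions (typical textbook orders of magnitude for reading), not values reported in the paper, and `samples_per_event` is a hypothetical helper name.

```python
# Back-of-the-envelope sample counts at a given sampling rate.
# The event durations below are illustrative assumptions, not values from the paper.

def samples_per_event(duration_ms: float, rate_hz: float = 60.0) -> float:
    """Expected number of gaze samples falling inside an event of the given duration."""
    return duration_ms * rate_hz / 1000.0


SAMPLE_INTERVAL_MS = 1000.0 / 60.0  # about 16.7 ms between samples at 60 Hz

for label, duration_ms in [("fixation (assumed 200 ms)", 200.0),
                           ("saccade (assumed 30 ms)", 30.0)]:
    print(f"{label}: {samples_per_event(duration_ms):.1f} samples")
```

Under these assumed durations a fixation spans about a dozen samples while a saccade spans only one or two, which is why 60 Hz is plausible for counting fixations and regressions but coarse for measuring saccade dynamics.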

If this is right

The dataset supplies a cognitively grounded benchmark for evaluating alignment between large language models and human figurative processing.
Entry-level 60 Hz eye-trackers can support research on macro-cognitive events in second-language reading.
Proficiency level modulates the frequency of regressions during idiom comprehension, with lower levels showing greater reliance on literal analysis.
Integration into broader modeling initiatives enables direct comparisons of idiomaticity across human and artificial systems.
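As a minimal sketch of how regression frequency could be computed, one common operationalization treats a regression as any saccade that lands on an earlier word than the one just fixated. This definition is an assumption here, since this page does not state which definition the paper uses, and `count_regressions` and the example sequence are hypothetical.

```python
# Count regressions in a sequence of fixated word positions.
# Assumption: a regression is any move to a word index lower than the previous one.

def count_regressions(fixated_word_indices):
    """Number of transitions that land on an earlier word than the one just fixated."""
    pairs = zip(fixated_word_indices, fixated_word_indices[1:])
    return sum(1 for previous, current in pairs if current < previous)


# Made-up scanpath over a sentence: the reader goes back twice (2 -> 1 and 4 -> 2).
scanpath = [0, 1, 2, 1, 2, 3, 4, 2, 5]
print(count_regressions(scanpath))
```

Refixations on the same word are not counted as regressions under this definition; a study could reasonably choose otherwise.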

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same metrics could be applied to test whether language models internally simulate regression-like reprocessing when handling idioms.
Educators might identify specific idioms that trigger extra regressions at particular proficiency thresholds to target instruction.
Comparison with native-speaker eye data from the same idioms would quantify the added L2 processing cost in concrete terms.

Load-bearing premise

That regressive eye movements directly and reliably measure the extra cognitive cost of literal-first idiom processing in L2 learners and that 60 Hz hardware records these events without missing critical details that would weaken the proficiency correlation.

What would settle it

A replication with the same idioms and proficiency-grouped participants that finds no significant inverse correlation between CEFR level and regression frequency, or a higher-rate recording that shows systematic fixation differences missed at 60 Hz.
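A replication of this kind would come down to a rank correlation between ordinal CEFR level and regression frequency. The sketch below uses Spearman's rho implemented with the standard library only; the choice of Spearman is an assumption (the page does not say which statistic the paper reports), and the regression counts are made up for illustration.

```python
from math import sqrt

# Spearman rank correlation between CEFR level and regression counts.
# Assumptions: CEFR is coded 1..6; the regression counts are invented.

def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)


def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


CEFR_CODE = {"A1": 1, "A2": 2, "B1": 3, "B2": 4, "C1": 5, "C2": 6}
levels = ["A1", "A2", "B1", "B2", "C1", "C2"]
regressions = [9, 8, 6, 5, 3, 2]  # made-up per-participant counts

rho = spearman([CEFR_CODE[level] for level in levels], regressions)
print(f"Spearman rho = {rho:.2f}")
```

A rho near zero in a well-powered replication would count against the reported inverse relationship; a value near -1, as in this invented data, would support it.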

Figures

Figures reproduced from arXiv: 2605.04857 by César Rennó-Costa, Eduardo Santos, Juliana Carvalho.

**Figure 1.** Example of stimulus presentation during the experiment, showing sentence

**Figure 2.** Experimental setup showing participant seating position, display, and eye

**Figure 3.** Overview of the eye-tracking dataset directory structure.

Original abstract

This paper presents the development and validation of an eye-tracking dataset designed to investigate how second-language (L2) learners process idiomatic expressions. While native speakers often rely on direct retrieval of figurative meanings, L2 speakers frequently adopt a literal-first approach, which incurs measurable cognitive costs. This resource captures these costs through ocular metrics recorded from Portuguese L1 speakers of English across all CEFR proficiency levels (A1-C2). Although the study uses entry-level 60 Hz hardware (Tobii Pro Spark), we demonstrate that this sampling rate provides sufficient data density to detect macro-cognitive events such as fixations and regressions in reading. Preliminary analysis validates the dataset by revealing a strong inverse correlation between language proficiency and regressive eye movements. Integrated into the MIA (Modeling Idiomaticity in Human and Artificial Language Processing) initiative, this dataset serves as a cognitively grounded benchmark for evaluating both human processing models and the alignment of large language models with human-like figurative understanding.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note

private letter to a colleague

New eye-tracking dataset for L2 idiom processing across CEFR levels, but the 60 Hz validation rests on untested assumptions about regression detection.

This paper releases an eye-tracking dataset on how Portuguese L1 speakers process English idioms at every CEFR level from A1 to C2. The resource uses 60 Hz Tobii hardware and reports a strong inverse correlation between proficiency and regressive eye movements as preliminary validation. That is the core offering: a new, openly shared collection tied to the MIA initiative for human-LLM comparison on figurative language.

Dataset releases like this are practical and fill a narrow gap in the literature on L2 figurative processing. The decision to use entry-level equipment is also sensible if it lowers barriers for future studies.

The soft spots sit in the validation. The abstract states the correlation and claims sufficient data density for macro events like fixations and regressions, yet supplies no participant numbers, no statistical details, no detection algorithm, and no comparison to higher sampling rates. If lower-proficiency readers produce shorter or more variable saccades, the 60 Hz rate could systematically under-count regressions and create an artifactual correlation rather than a clean cognitive signal. That assumption is load-bearing and currently unsupported.

Readers working on cognitive models of idiom processing, L2 acquisition, or benchmarks for LLM alignment would get direct value from the data itself. Dataset papers of this type deserve referee time because the field needs more shared eye-tracking resources, even when the initial analysis is light. I would send it for peer review with specific requests for the missing methods details and a robustness check on the sampling rate.

Referee Report

2 major / 1 minor

Summary. The manuscript presents a new eye-tracking dataset collected from Portuguese L1 speakers of English across CEFR proficiency levels A1-C2 to study cognitive effort in L2 idiomatic processing. It uses entry-level 60 Hz Tobii Pro Spark hardware and claims this rate yields sufficient data density for detecting fixations and regressions. Preliminary analysis is reported to show a strong inverse correlation between proficiency and regressive eye movements, positioning the resource as a benchmark within the MIA initiative for human and AI models of figurative language.

Significance. If the sampling-rate validation and correlation can be substantiated with full methodological details, the dataset would supply a useful, cognitively grounded resource for psycholinguistic work on literal-first strategies in L2 idiom processing and for benchmarking LLM alignment with human figurative comprehension. The emphasis on accessible hardware could also facilitate wider data collection.

major comments (2)

[Abstract] The claim that 60 Hz sampling 'provides sufficient data density to detect macro-cognitive events such as fixations and regressions' is unsupported. No section describes the fixation/saccade detection algorithm (velocity or dispersion thresholds), reports a comparison against higher-rate ground truth, or tests for systematic under-counting of regressions in lower-proficiency readers who may produce shorter or more variable saccades. This directly undermines the interpretability of the reported proficiency-regression correlation as evidence of dataset validity rather than a possible measurement artifact.
[Preliminary analysis / Results section] The manuscript states a 'strong inverse correlation' between CEFR proficiency and regressive eye movements but provides no participant counts per level, stimulus details, exact statistical tests, effect sizes, error bars, or controls for confounds such as overall reading speed or individual saccade variability. These omissions prevent verification that the correlation supports the central validation claim.
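For readers unfamiliar with what a dispersion-threshold detector looks like, the sketch below implements the generic I-DT idea on synthetic 60 Hz gaze samples. It is not the paper's algorithm, which this page does not describe; the 25-pixel dispersion limit, the 6-sample minimum (roughly 100 ms at 60 Hz), and the gaze trace are all assumptions chosen for illustration.

```python
# Generic dispersion-threshold (I-DT) fixation detection on (x, y) gaze samples.
# Assumptions: pixel coordinates, 60 Hz sampling, illustrative thresholds.

def dispersion(window):
    """Dispersion of a window: horizontal extent plus vertical extent."""
    xs = [p[0] for p in window]
    ys = [p[1] for p in window]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


def detect_fixations_idt(samples, max_dispersion=25.0, min_samples=6):
    """Return fixations as (start_index, end_index, centroid_x, centroid_y)."""
    fixations = []
    start, n = 0, len(samples)
    while start + min_samples <= n:
        end = start + min_samples  # exclusive end of the candidate window
        if dispersion(samples[start:end]) <= max_dispersion:
            # Grow the window while the samples stay within the dispersion limit.
            while end < n and dispersion(samples[start:end + 1]) <= max_dispersion:
                end += 1
            window = samples[start:end]
            cx = sum(p[0] for p in window) / len(window)
            cy = sum(p[1] for p in window) / len(window)
            fixations.append((start, end - 1, cx, cy))
            start = end
        else:
            start += 1  # slide past a sample that belongs to a saccade
    return fixations


# Synthetic trace: fixation near x=100, saccade, fixation near x=300,
# saccade back, fixation near x=180 (a leftward, regression-like move).
gaze = ([(100 + i % 3, 200) for i in range(8)] + [(200, 200)]
        + [(300 + i % 3, 200) for i in range(8)] + [(240, 200)]
        + [(180 + i % 3, 200) for i in range(7)])

fixations = detect_fixations_idt(gaze)
leftward_moves = sum(1 for a, b in zip(fixations, fixations[1:]) if b[2] < a[2])
print(len(fixations), "fixations,", leftward_moves, "leftward move(s)")
```

The referee's concern maps directly onto the two parameters: a short refixation lasting fewer than `min_samples` samples is silently dropped, so any group that produces briefer regressive fixations would be under-counted.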

minor comments (1)

[Abstract] The abstract mentions integration into the MIA initiative but does not clarify how the dataset will be released or what specific evaluation protocols it supports for human vs. model comparisons.

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback, which identifies key areas for strengthening the methodological transparency and reporting of our dataset. We respond to each major comment below and will incorporate revisions as outlined.

Referee: The claim that 60 Hz sampling 'provides sufficient data density to detect macro-cognitive events such as fixations and regressions' is unsupported. No section describes the fixation/saccade detection algorithm (velocity or dispersion thresholds), reports a comparison against higher-rate ground truth, or tests for systematic under-counting of regressions in lower-proficiency readers who may produce shorter or more variable saccades. This directly undermines the interpretability of the reported proficiency-regression correlation as evidence of dataset validity rather than a possible measurement artifact.

Authors: We agree that the manuscript requires expanded methodological detail to substantiate the sampling-rate claim. In revision we will fully describe the fixation and saccade detection algorithm, including the exact velocity and dispersion thresholds applied. We will add an analysis of saccade duration and variability across CEFR levels to evaluate potential under-counting of regressions. Although paired higher-rate ground-truth recordings are not available, we will cite established reading-research literature on 60 Hz sufficiency for macro-events and discuss this limitation explicitly. These additions will clarify that the observed correlation is not an artifact.

revision: partial

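The velocity-threshold family of detectors mentioned in this response can be sketched in a few lines. This is the generic I-VT idea under stated assumptions, not the authors' method: coordinates are in pixels, the 1000 px/s threshold is arbitrary, and `classify_ivt` is a hypothetical name.

```python
from math import hypot

# Generic velocity-threshold (I-VT) labelling of inter-sample intervals.
# Assumptions: pixel coordinates, 60 Hz sampling, arbitrary speed threshold.

def classify_ivt(samples, rate_hz=60.0, velocity_threshold=1000.0):
    """Label each interval between consecutive samples as 'fix' or 'sac'."""
    dt = 1.0 / rate_hz  # seconds between consecutive samples
    labels = []
    for (x0, y0), (x1, y1) in zip(samples, samples[1:]):
        speed = hypot(x1 - x0, y1 - y0) / dt  # point-to-point speed in px/s
        labels.append("sac" if speed > velocity_threshold else "fix")
    return labels


print(classify_ivt([(100, 200), (101, 200), (300, 200), (301, 200)]))
```

At 60 Hz each saccade contributes only one or two intervals, so point-to-point speed is a coarse estimate and the result is sensitive to the threshold; reporting the exact value, as the response promises, is what makes such an analysis reproducible.
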
Referee: The manuscript states a 'strong inverse correlation' between CEFR proficiency and regressive eye movements but provides no participant counts per level, stimulus details, exact statistical tests, effect sizes, error bars, or controls for confounds such as overall reading speed or individual saccade variability. These omissions prevent verification that the correlation supports the central validation claim.

Authors: We acknowledge the reporting gaps in the preliminary analysis. The revised manuscript will supply participant counts per CEFR level, complete stimulus specifications, the precise statistical tests performed (with p-values), effect sizes accompanied by error bars or confidence intervals, and confound controls via partial-correlation or regression models that account for reading speed and saccade variability. This will permit full verification of the correlation's validity.

revision: yes
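The partial-correlation control promised here can be illustrated with the standard first-order formula, which removes the linear contribution of one covariate from both variables. The sketch assumes a single covariate (reading speed) and uses invented numbers; the variable names and data are hypothetical and do not come from the dataset.

```python
from math import sqrt

# First-order partial correlation r_xy.z: association between x and y
# after removing the linear effect of a single covariate z.
# Assumption: all values below are invented for illustration.

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)


def partial_correlation(x, y, z):
    """Correlation of x and y controlling for z."""
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (r_xy - r_xz * r_yz) / sqrt((1.0 - r_xz ** 2) * (1.0 - r_yz ** 2))


cefr_level = [1, 2, 3, 4, 5, 6]        # A1..C2 coded 1..6
reading_speed = [2, 1, 4, 3, 6, 5]     # made-up words per second
regressions = [9, 9, 6, 7, 3, 4]       # made-up regression counts

raw = pearson(cefr_level, regressions)
controlled = partial_correlation(cefr_level, regressions, reading_speed)
print(f"raw r = {raw:.2f}, controlling for reading speed r = {controlled:.2f}")
```

If the controlled coefficient shrinks toward zero relative to the raw one, reading speed explains much of the apparent proficiency effect; a regression model would be the natural extension once saccade variability is added as a second covariate.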

Circularity Check

0 steps flagged

No circularity: empirical dataset validation rests on independent correlation finding

The paper presents an eye-tracking dataset and reports a preliminary empirical finding of an inverse correlation between CEFR proficiency and regressive eye movements. No equations, derivations, fitted parameters, or self-citations are invoked to derive or validate this result; the correlation is computed directly from the collected data. The claim that 60 Hz sampling suffices for macro-event detection is presented as a demonstration from the recordings themselves rather than a self-referential fit or imported uniqueness theorem. The analysis chain is self-contained against external benchmarks (standard psycholinguistic ocular metrics) with no reduction of outputs to inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The paper rests on the domain assumption that eye-movement regressions index cognitive effort in literal-first idiom processing; no free parameters or new entities are introduced.

axioms (1)

Domain assumption: Eye-tracking metrics such as number of regressions reflect cognitive effort and literal-first processing strategies in L2 idiom comprehension.
Invoked to interpret the inverse proficiency correlation as evidence of reduced cognitive cost.

pith-pipeline@v0.9.0 · 5472 in / 1284 out tokens · 52230 ms · 2026-05-08T17:21:28.244975+00:00 · methodology
