Fixation-related potentials reveal that confusing program code elicits a late frontal positivity
Pith reviewed 2026-05-23 07:34 UTC · model grok-4.3
The pith
Confusing program code elicits a late frontal positivity 400-700 ms after fixation, resembling responses to unexpected words in sentences.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Relative to clean counterparts without an atom of confusion, confusing code elicits a late frontal positivity about 400 to 700 ms after the first fixation on the atom of confusion. This frontal positivity resembles an event-related potential component found during natural language processing that is elicited by unexpected but plausible words in sentence context. The authors therefore suggest that the brain engages similar neurocognitive mechanisms in response to unexpected and informative inputs in program code and in natural language: in both domains, such inputs update a comprehender's situation model, which is essential for extracting information from quickly unfolding input.
What carries the argument
Fixation-related potentials recorded while programmers read code snippets containing atoms of confusion versus clean matched counterparts.
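To make the measurement concrete, here is a minimal sketch of how a fixation-related potential could be computed from a single EEG channel: cut an epoch around each fixation onset, subtract the pre-fixation baseline, average across fixations, and take the mean amplitude in the 400 to 700 ms window. The function names, the 100 Hz sampling rate, the -200 ms baseline, and the synthetic signals are all assumptions made for illustration; they are not taken from the paper's pipeline.

```python
from statistics import fmean


def extract_epoch(signal, onset, sfreq, tmin=-0.2, tmax=0.8):
    """Cut one epoch around a fixation onset and subtract the pre-fixation baseline.

    Assumes tmin < 0 so that a baseline interval exists before the fixation.
    """
    start = onset + round(tmin * sfreq)
    stop = onset + round(tmax * sfreq)
    if start < 0 or stop > len(signal):
        return None  # epoch would run past the recording; drop the trial
    epoch = signal[start:stop]
    baseline = fmean(epoch[: onset - start])  # samples before fixation onset
    return [value - baseline for value in epoch]


def fixation_related_potential(signal, onsets, sfreq, tmin=-0.2, tmax=0.8):
    """Average the baseline-corrected epochs across fixations, sample by sample."""
    epochs = [extract_epoch(signal, onset, sfreq, tmin, tmax) for onset in onsets]
    epochs = [epoch for epoch in epochs if epoch is not None]
    return [fmean(samples) for samples in zip(*epochs)]


def window_mean(frp, sfreq, tmin=-0.2, window=(0.4, 0.7)):
    """Mean amplitude in a latency window given in seconds after fixation onset."""
    first = round((window[0] - tmin) * sfreq)
    last = round((window[1] - tmin) * sfreq)
    return fmean(frp[first:last])


# Synthetic single-channel data (hypothetical): a flat 5 microvolt signal, plus a
# 2 microvolt deflection 400-700 ms after each fixation in the "confusing" condition.
SFREQ = 100          # samples per second (assumed)
ONSETS = [50, 150]   # sample indices of first fixations on the atom (assumed)
clean = [5.0] * 300
confusing = list(clean)
for onset in ONSETS:
    for i in range(onset + 40, onset + 70):
        confusing[i] += 2.0

frp_clean = fixation_related_potential(clean, ONSETS, SFREQ)
frp_confusing = fixation_related_potential(confusing, ONSETS, SFREQ)
effect = window_mean(frp_confusing, SFREQ) - window_mean(frp_clean, SFREQ)
print(f"window mean difference: {effect:.2f} microvolts")
```

A real analysis would work on multichannel data with artifact rejection and would typically rely on an established EEG toolbox; the sketch only shows the time-locking and averaging logic.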
If this is right
- The brain applies comparable mechanisms to update situation models for unexpected inputs in both code and natural language.
- Situation-model updating supports information extraction from rapidly unfolding sequential input in programming.
- The results carry implications for how programmers understand and maintain code.
- The work opens routes for collaboration between software engineering and psycholinguistics.
Where Pith is reading between the lines
- Designers of programming languages or tools might reduce cognitive cost by minimizing atoms of confusion.
- The neural similarity raises the possibility that methods from language comprehension research could be tested for improving code reading.
- The finding could be extended to measure how other code properties affect the same frontal positivity component.
Load-bearing premise
The late frontal positivity arises specifically from the confusing nature of the atom rather than from any other differences between the code snippets or reading conditions.
What would settle it
An adequately powered experiment that presents perfectly matched confusing and clean code snippets and records no difference in the 400-700 ms frontal positivity would falsify the central claim.
Original abstract
As software pervades more and more areas of our professional and personal lives, there is an ever-increasing need to maintain software and for programmers to efficiently write and understand program code. In the first study of its kind, we analyze fixation-related potentials (FRPs) to explore the online processing of program code patterns that are confusing to programmers, but not to the computer (so-called atoms of confusion), and their underlying neurocognitive mechanisms in an ecologically valid setting. Relative to clean counterparts in program code without an atom of confusion, confusing code elicits a late frontal positivity of about 400 to 700 ms after first looking at the atom of confusion. This frontal positivity resembles an event-related potential (ERP) component found during natural language processing that is elicited by unexpected but plausible words in sentence context. Thus, we suggest that the brain engages similar neurocognitive mechanisms in response to unexpected and informative inputs in program code and in natural language. In both domains, these inputs update a comprehender's situation model, which is essential for information extraction from a quickly unfolding input. Our results have far-reaching implications for programming and pave the way for interdisciplinary collaborations between software engineering and psycholinguistics.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript reports the first use of fixation-related potentials (FRPs) in an ecologically valid setting to study neurocognitive responses to 'atoms of confusion' in program code. It claims that, relative to clean counterparts, confusing code elicits a late frontal positivity (approximately 400-700 ms after first fixation on the atom), resembling the frontal positivity ERP component observed in natural-language processing for unexpected but plausible words; the authors interpret this as evidence that the brain uses similar mechanisms to update situation models during code and language comprehension.
Significance. If the central result is robust, the work opens an interdisciplinary bridge between software engineering and psycholinguistics by demonstrating measurable, time-locked neural signatures of code confusion during natural reading. The ecological-validity framing and the explicit link to an established language ERP component are strengths; successful replication could inform both theories of program comprehension and practical guidelines for code readability.
major comments (2)
- [Methods] Methods (stimulus construction): the manuscript does not report quantitative matching criteria or statistical checks confirming that clean counterparts differ from confusing snippets only in the presence of the atom of confusion. Variables such as token count, line length, indentation depth, syntactic complexity, and visual salience must be shown to be balanced (or covaried) before the late frontal positivity can be attributed specifically to confusability rather than to any other systematic difference between the two sets of snippets.
- [Results] Results (FRP analysis): while the abstract states a directional effect in the 400-700 ms window, the manuscript must supply participant N, trial counts after exclusion, exact statistical tests (including any correction for multiple comparisons across electrodes/time windows), and effect-size or peak-amplitude measures with confidence intervals. Without these, the claim that the positivity is reliably elicited by the atom remains unverifiable.
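The balance check requested in the Methods comment could be sketched as follows for one surface variable, token count. Because each confusing snippet has a clean counterpart, the sketch treats the data as matched pairs and uses an exact sign-flip permutation test together with a standardized mean difference. The choice of test, the function name, and the token counts are assumptions for illustration and do not come from the manuscript.

```python
from itertools import product
from statistics import fmean, stdev


def paired_balance_check(confusing, clean):
    """Compare one stimulus property across matched confusing/clean snippet pairs.

    Returns the mean paired difference, a standardized mean difference, and an
    exact two-sided sign-flip permutation p-value (feasible for small stimulus sets,
    since it enumerates all 2**n sign patterns).
    """
    diffs = [a - b for a, b in zip(confusing, clean)]
    observed = fmean(diffs)

    # Under the null hypothesis each paired difference is equally likely
    # to carry either sign, so enumerate every sign assignment.
    extreme = 0
    total = 0
    for signs in product((1, -1), repeat=len(diffs)):
        total += 1
        permuted = fmean([s * d for s, d in zip(signs, diffs)])
        if abs(permuted) >= abs(observed) - 1e-12:
            extreme += 1

    spread = stdev(diffs)
    smd = observed / spread if spread > 0 else 0.0
    return {"mean_diff": observed, "smd": smd, "p_value": extreme / total}


# Hypothetical token counts for eight snippet pairs (not data from the paper).
tokens_confusing = [12, 15, 9, 20, 14, 11, 17, 13]
tokens_clean = [13, 15, 10, 19, 14, 12, 16, 13]

report = paired_balance_check(tokens_confusing, tokens_clean)
print(report)
```

A large p-value alone does not establish that the sets are balanced, so reporting the standardized difference alongside it, and covarying any variable that does differ, is the safer reading of the referee's request.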
minor comments (2)
- [Introduction] The abstract and introduction repeatedly use the phrase 'first study of its kind'; a brief literature note on prior EEG/ERP work on code comprehension would clarify the precise novelty.
- [Figures] Figure captions should explicitly state the electrode montage, reference, and time window used for the grand-average waveforms shown.
Simulated Author's Rebuttal
We thank the referee for the detailed and constructive review. The comments highlight important issues of stimulus control and statistical reporting, and addressing them will strengthen the manuscript. We respond to each point below and will revise the manuscript to incorporate the requested details.
- Referee: [Methods] Methods (stimulus construction): the manuscript does not report quantitative matching criteria or statistical checks confirming that clean counterparts differ from confusing snippets only in the presence of the atom of confusion. Variables such as token count, line length, indentation depth, syntactic complexity, and visual salience must be shown to be balanced (or covaried) before the late frontal positivity can be attributed specifically to confusability rather than to any other systematic difference between the two sets of snippets.
  Authors: We agree that explicit quantitative matching is necessary to isolate the effect of the atom of confusion. The original stimulus set was constructed by starting from published atoms of confusion and creating minimal clean variants that differ only in the targeted syntactic or semantic feature; however, the manuscript did not include formal balance checks. In the revision we will add a new table (or supplementary table) reporting means, standard deviations, and statistical comparisons (independent-samples t-tests or Wilcoxon tests as appropriate) for token count, line length, indentation depth, syntactic complexity (e.g., cyclomatic complexity or number of AST nodes), and visual salience (e.g., pixel-level contrast or saliency-map metrics) between confusing and clean snippets. Any variables showing reliable differences will be entered as covariates in the FRP models or discussed as potential confounds.
  revision: yes
- Referee: [Results] Results (FRP analysis): while the abstract states a directional effect in the 400-700 ms window, the manuscript must supply participant N, trial counts after exclusion, exact statistical tests (including any correction for multiple comparisons across electrodes/time windows), and effect-size or peak-amplitude measures with confidence intervals. Without these, the claim that the positivity is reliably elicited by the atom remains unverifiable.
  Authors: We accept that the current version under-reports key sample and inferential statistics. The revised manuscript will explicitly state the final participant N after artifact and behavioral exclusion criteria, the number of trials per condition retained for analysis, the precise statistical procedure (including the electrode/time-window selection method and any correction for multiple comparisons such as cluster-based permutation testing or FDR), and effect-size estimates (Cohen’s d or partial eta-squared) together with 95% confidence intervals for the amplitude difference in the 400–700 ms frontal window. These values were computed during analysis but were not fully documented in the submitted text; they will now appear in the Results section and a new supplementary table.
  revision: yes
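The effect-size reporting promised in the Results response could take roughly the following shape: a within-participant Cohen's d (often written d_z) for the confusing-minus-clean amplitude difference, plus a percentile bootstrap 95% confidence interval for the mean difference. The bootstrap is one of several reasonable ways to obtain the interval and is an assumption here, as are the function names and the per-participant values, which are invented for illustration and are not results from the study.

```python
import random
from statistics import fmean, stdev


def cohens_dz(diffs):
    """Within-participant effect size: mean paired difference over its standard deviation."""
    return fmean(diffs) / stdev(diffs)


def bootstrap_ci(diffs, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean paired difference."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(diffs)
    means = sorted(fmean(rng.choices(diffs, k=n)) for _ in range(n_boot))
    lower = means[int((alpha / 2) * n_boot)]
    upper = means[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Hypothetical per-participant differences in mean frontal amplitude
# (confusing minus clean, 400-700 ms window, microvolts). Not data from the paper.
amplitude_diffs = [1.2, 0.4, 0.9, -0.3, 1.5, 0.7, 0.2, 1.1, 0.6, 0.8]

mean_diff = fmean(amplitude_diffs)
d_z = cohens_dz(amplitude_diffs)
ci_low, ci_high = bootstrap_ci(amplitude_diffs)
print(f"mean difference = {mean_diff:.2f}, d_z = {d_z:.2f}, "
      f"95% CI = [{ci_low:.2f}, {ci_high:.2f}]")
```

An interval that excludes zero would support a reliable positivity in that window, though it does not replace the correction for multiple comparisons across electrodes and time windows that the referee also asks for.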
Circularity Check
No circularity: empirical observation of FRPs with no derivation or fitting chain
The paper reports results from an ERP/FRP experiment comparing brain responses to confusing code atoms versus clean counterparts. No equations, parameters, models, or derivations are present in the provided text. The central claim rests on measured electrophysiological data rather than any self-definitional, fitted-prediction, or self-citation reduction. Self-citations, if present, are not load-bearing for any claimed derivation because none exists. This matches the default case of a self-contained empirical study.
Axiom & Free-Parameter Ledger
axioms (2)
- [standard math] Standard assumptions in ERP/FRP analysis, such as trial averaging and baseline correction, hold for the recorded signals
- [domain assumption] The experimental reading task approximates real-world program comprehension
Reference graph
Works this paper leans on
- [1] LaToza, T., Venolia, G. & DeLine, R. “Maintaining mental models: A study of developer work habits”. In Osterweil, L. J., Rombach, H. D. & Soffa, M. L. (eds.) Proc Int Conf Softw Eng, 492–501 (ACM, 2006)
- [2] Tiarks, R. What programmers really do: An observational study. Softwaretechnik-Trends 31, 36–37 (2011)
- [3] Minelli, R., Mocci, A. & Lanza, M. “I know what you did last summer - an investigation of how developers spend their time”. In Lucia, A. D., Bird, C. & Oliveto, R. (eds.) Proceedings of the 2015 IEEE 23rd International Conference on Program Comprehension ICPC, 25–35 (IEEE, 2015)
- [4] Meyer, A. N. et al. Detecting developers’ task switches and types. IEEE Transactions on Software Engineering 48, 225–240 (2022)
- [5] Meyer, A. N., Barr, E. T., Bird, C. & Zimmermann, T. Today was a good day: The daily life of software developers. IEEE Transactions on Software Engineering 47, 863–880 (2021)
- [6] Meyer, A. N., Barton, L. E., Murphy, G. C., Zimmermann, T. & Fritz, T. The work life of developers: Activities, switches and perceived productivity. IEEE Transactions on Software Engineering 43, 1178–1193 (2017)
- [7] Gopstein, D. et al. “Understanding misunderstandings in source code”. In Bodden, E., Schäfer, W., van Deursen, A. & Zisman, A. (eds.) Proceedings of the 2017 11th Joint Meeting on Foundations of Software Engineering ESEC/FSE, 129–139 (ACM, 2017)
- [8] Gopstein, D., Zhou, H. H., Frankl, P. & Cappos, J. “Prevalence of confusing code in software projects”. In Zaidman, A., Kamei, Y. & Hill, E. (eds.) Proceedings of the 15th International Conference on Mining Software Repositories MSR (2018)
- [9] Medeiros, F. et al. An investigation of misunderstanding code patterns in C open-source software projects. Empir Softw Eng 24, 1693–1726 (2019)
- [10] Pinheiro, O., Rocha, L. & Viana, W. “How they relate and leave: Understanding atoms of confusion in open-source Java projects”. In Moonen, L., Newman, C. D. & Gorla, A. (eds.) 2023 IEEE 23rd International Working Conference on Source Code Analysis and Manipulation SCAM, 119–130 (IEEE, 2023)
- [11] Gopstein, D., Fayard, A.-L., Apel, S. & Cappos, J. “Thinking aloud about confusing code: A qualitative investigation of program comprehension and atoms of confusion”. In Devanbu, P., Cohen, M. B. & Zimmermann, T. (eds.) 28th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering ESEC/FSE, 605–616 (Asso...
- [12] de Oliveira, B. et al. “Atoms of confusion: The eyes do not lie”. In Cavalcante, E., Dantas, F. & Batista, T. (eds.) 34th Brazilian Symposium on Software Engineering SBES, 243–252 (ACM, 2020)
- [13] Langhout, C. & Aniche, M. “Atoms of confusion in Java”. In O’Conner, L. (ed.) 29th IEEE/ACM International Conference on Program Comprehension ICPC, 25–35 (2021)
- [14] Wheeler, D. The Apple goto fail vulnerability: lessons learned. https://dwheeler.com/essays/apple-goto-fail.html (2014). Accessed: 2025-01-20
- [15] Boyes, H., Norris, P., Bryant, I. & Watson, T. Trustworthy software: lessons from goto fail & heartbleed bugs. In 9th IET International Conference on System Safety and Cyber Security (2014) (IET, 2014)
- [16] Yeh, M., Gopstein, D., Yan, Y. & Zhuang, Y. “Detecting and comparing brain activity in short program comprehension using EEG”. In 2017 IEEE Frontiers in Education Conference FIE, 1–5 (IEEE, 2017)
- [17] da Costa, J. A. S. et al. Seeing confusion through a new lens: On the impact of atoms of confusion on novices’ code comprehension. Empir Softw Eng 28, 81 (2023)
- [18] Yeh, M. K.-C., Yan, Y., Zhuang, Y. & DeLong, L. A. Identifying program confusion using electroencephalogram measurements. Behaviour & Information Technology 41, 2528–2545 (2022)
- [19] Siegmund, J. et al. “Understanding understanding source code with functional magnetic resonance imaging”. In Jalote, P., C., L. & van der Hoek, A. (eds.) Proc Int Conf Softw Eng, 378–389 (ACM, 2014)
- [20] Peitek, N. et al. A look into programmers’ heads. IEEE Transactions on Software Engineering 46, 442–462 (2020)
- [21] Durães, J., Madeira, H., Castelhano, J., Duarte, I. & Castelo-Branco, M. “WAP: Understanding the brain at software debugging”. In Bradbury, J. (ed.) 27th IEEE International Symposium on Software Reliability Engineering ISSRE, 87–92 (IEEE, 2016)
- [22] Castelhano, J. et al. The role of the insula in intuitive expert bug detection in computer code: An fMRI study. Brain Imaging Behav 13, 623–637 (2019)
- [23] Floyd, B., Santander, T. & Weimer, W. “Decoding the representation of code in the brain: An fMRI study of code review and expertise”. In Uchitel, S., Orso, A. & Robillard, M. P. (eds.) Proc Int Conf Softw Eng, 175–186 (IEEE, 2017)
- [24] Luck, S. An introduction to the event-related potential technique (MIT Press, 2014)
- [25] Swaab, T. Y., Ledoux, K., Camblin, C. C. & Boudewyn, M. A. The Oxford handbook of event-related potential components, chap. “Language-related ERP components” (Oxford University Press, 2011)
- [26] Thornhill, D. E. & Petten, C. V. Lexical versus conceptual anticipation during sentence processing: Frontal positivity and N400 ERP components. Int J Psychophysiol 83, 382–392 (2012)
- [27] Kuperberg, G. R., Brothers, T. & Wlotko, E. W. A tale of two positivities and the N400: Distinct neural signatures are evoked by confirmed and violated predictions at different levels of representation. J Cogn Neurosci 32, 12–35 (2020)
- [28] Kutas, M. & Hillyard, S. A. Reading senseless sentences: Brain potentials reflect semantic incongruity. Science 207, 203–205 (1980)
- [29] Kasparian, K. & Steinhauer, K. Confusing similar words: ERP correlates of lexical-semantic processing in first language attrition and late second language acquisition. Neuropsychologia 93, 200–217 (2016)
- [30] Núñez-Peña, M. & Honrubia-Serrano, M. P600 related to rule violation in an arithmetic task. Brain Res Cogn Brain Res 18, 130–141 (2004)
- [31] Osterhout, L. & Holcomb, P. J. Event-related brain potentials elicited by syntactic anomaly. J Mem Lang 31, 785–806 (1992)
- [32] Friederici, A. D., Gunter, T. C., Hahne, A. & Mauth, K. The relative timing of syntactic and semantic processes in sentence comprehension. Neuroreport 15, 165–169 (2004)
- [33] Paczynski, M. & Kuperberg, G. R. Multiple influences of semantic memory on sentence processing: Distinct effects of semantic relatedness on violations of real-world event/state knowledge and animacy selection restrictions. J Mem Lang 67, 426–448 (2012)
- [34] Brothers, T., Wlotko, E. W., Warnke, L. & Kuperberg, G. R. Going the extra mile: Effects of discourse context on two late positivities during language comprehension. Neurobiol Lang (Camb) 1, 135–160 (2020)
- [35] Kuperberg, G. R. Separate streams or probabilistic inference? What the N400 can tell us about the comprehension of events. Lang Cogn Neurosci 31, 602–616 (2016)
- [36]
- [37] Kuperberg, G. R. et al. Multimodal neuroimaging evidence for looser lexico-semantic networks in schizophrenia: Evidence from masked indirect semantic priming. Neuropsychologia 124, 337–349 (2019)
- [38]
- [39] Aljehane, S., Sharif, B. & Maletic, J. “Determining differences in reading behavior between experts and novices by investigating eye movement on source code constructs during a bug fixing task”. In Bulling, A. et al. (eds.) Proc Eye Track Res Appl Symp, 1–6 (ACM, 2021)
- [40] Busjahn, T. et al. “Eye movements in code reading: Relaxing the linear order”. In Lucia, A. D., Bird, C. & Oliveto, R. (eds.) Proceedings of the 2015 IEEE 23rd International Conference on Program Comprehension ICPC, 255–265 (IEEE, 2015)
- [41] Peitek, N. et al. “Correlates of programmer efficacy and their link to experience: A combined EEG and eye-tracking study”. In Roychoudhury, A., Cadar, C. & Kim, M. (eds.) Proceedings of the 30th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, 120–131 (2022)
- [42]
- [43] Wen, Y., Mirault, J. & Grainger, J. Fast syntax in the brain: Electrophysiological evidence from the rapid parallel visual presentation paradigm (RPVP). J Exp Psychol Learn Mem Cogn 47, 99 (2021)
- [44] Hutzler, F. et al. Welcome to the real world: Validating fixation-related brain potentials for ecologically valid settings. Brain Res 1172, 124–129 (2007)
- [45] Federmeier, K. D. Thinking ahead: The role and roots of prediction in language comprehension. Psychophysiology 44, 491–505 (2007)
- [46] Höltje, G. & Mecklinger, A. Benefits and costs of predictive processing: How sentential constraint and word expectedness affect memory formation. Brain Res 1788, 147942 (2022)
- [47] Hubbard, R. J. & Federmeier, K. D. The impact of linguistic prediction violations on downstream recognition memory and sentence recall. J Cogn Neurosci 36, 1–23 (2024)
- [48] Friederici, A. D. Towards a neural basis of auditory sentence processing. Trends Cogn Sci 6, 78–84 (2002)
- [49] Kutas, M. In the company of other words: Electrophysiological evidence for single-word and sentence context effects. Lang Cogn Process 8, 533–572 (1993)
- [50] Ness, T. & Meltzer-Asscher, A. Lexical inhibition due to failed prediction: Behavioral evidence and ERP correlates. J Exp Psychol Learn Mem Cogn 44, 1269 (2018)
- [51] Chow, W.-Y., Smith, C., Lau, E. & Phillips, C. A “bag-of-arguments” mechanism for initial verb predictions. Lang Cogn Neurosci 31, 577–596 (2016)
- [52] Freunberger, D. & Roehm, D. Semantic prediction in language comprehension: evidence from brain potentials. Lang Cogn Neurosci 31, 1193–1205 (2016)
- [53] Zirnstein, M., van Hell, J. G. & Kroll, J. F. Cognitive control ability mediates prediction costs in monolinguals and bilinguals. Cognition 176, 87–106 (2018)
- [54] Ng, S., Payne, B. R., Steen, A. A., Stine-Morrow, E. A. & Federmeier, K. D. Use of contextual information and prediction by struggling adult readers: Evidence from reading times and event-related potentials. Sci Stud Read 21, 359–375 (2017)
- [55] Hubbard, R. J., Rommers, J., Jacobs, C. L. & Federmeier, K. D. Downstream behavioral and electrophysiological consequences of word prediction on recognition memory. Front Hum Neurosci 13, 291 (2019)
- [56] Brothers, T., Morgan, E., Yacovone, A. & Kuperberg, G. Multiple predictions during language comprehension: Friends, foes, or indifferent companions? Cognition 241, 105602 (2023)
- [57]
- [58] Zhuang, Y., Yan, Y., DeLong, L. A. & Yeh, M. K. Do developer perceptions have borders? Comparing C code responses across continents. Softw Qual J 32, 431–457 (2024)
- [59] Siegmund, J., Kästner, C., Liebig, J., Apel, S. & Hanenberg, S. Measuring and modeling programming experience. Empir Softw Eng 19, 1299–1334 (2014)
- [60] Oldfield, R. C. The assessment and analysis of handedness: The Edinburgh inventory. Neuropsychologia 9, 97–113 (1971)
- [61] Hessels, R., Niehorster, D., Kemner, C. & Hooge, I. “Noise-robust fixation detection in eye movement data: Identification by two-means clustering (I2MC)”. Behav Res Methods 49, 1802–1823 (2017)
- [62] Carr, J., Pescuma, V., Furlan, M., Ktori, M. & Crepaldi, D. Algorithms for the automated correction of vertical drift in eye-tracking data. Behav Res Methods 54, 287–310 (2022)
- [63] Landis, J. R. & Koch, G. G. The measurement of observer agreement for categorical data. Biometrics 33, 159–174 (1977)
- [64] Jasper, H. Report of the committee on methods of clinical examination in electroencephalography. Electroencephalogr Clin Neurophysiol 10, 370–375 (1958)
- [65] Candia-Rivera, D. & Valenza, G. Cluster permutation analysis for EEG series based on non-parametric Wilcoxon–Mann–Whitney statistical tests. SoftwareX 19, 101170 (2022)
- [66] Groppe, D. M., Urbach, T. P. & Kutas, M. Mass univariate analysis of event-related brain potentials/fields II: Simulation studies. Psychophysiology 48, 1726–1737 (2011)
Author contribution: Annabelle Bergum, Anna-Maria Maurer, Norman Peitek, Regine Bader, Axel Mecklinger, Janet Siegmund, and Sven Apel developed the concept and the general research idea. A...