
arxiv: 2605.23008 · v1 · pith:WGZB573Gnew · submitted 2026-05-21 · 💻 cs.SE

On the Reliability of Code Comprehension Proxies

Pith reviewed 2026-05-25 05:34 UTC · model grok-4.3

classification 💻 cs.SE
keywords: code comprehension · proxies · reliability · Delphi method · software engineering · input-output questions · response time · syntax questions

The pith

Proxies from input-output questions that measure response time align best with expert rankings of code comprehensibility, while syntax-based proxies do not.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes a ground-truth ranking of how comprehensible eight code snippets are by having five professional engineers reach consensus through a Delphi protocol. It then measures fourteen different proxies on the same snippets using responses from forty-four students and checks which proxies match the expert ranking. Proxies built from questions about what a program does, especially when they record how long participants take to answer, track the expert view most closely. Proxies built from questions about program syntax match poorly no matter whether they record accuracy or time. This matters because many existing studies of code readability and maintainability rely on one or another of these proxies, so the choice directly affects how much trust to place in their conclusions.

Core claim

By first creating an expert consensus ranking of eight code snippets via the Delphi protocol and then correlating that ranking with fourteen literature-derived proxies collected from forty-four students, the study concludes that input-output proxies measured by response time are especially reliable while syntax proxies are especially unreliable regardless of measurement strategy.

What carries the argument

Correlation of student-derived comprehension proxies against an expert Delphi consensus ranking of the same code snippets.
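
As a rough illustration of what such a comparison involves, the sketch below computes a rank correlation between an expert ranking and one proxy. The abstract says only that a correlation analysis was conducted, so the choice of Spearman's rho, the function names, and every number here are assumptions for illustration, not the paper's method or data.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1..n; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman_rho(xs, ys):
    """Spearman's rho: Pearson correlation computed on average ranks."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical values for eight snippets (NOT taken from the paper).
expert_rank = [1, 2, 3, 4, 5, 6, 7, 8]  # 1 = most comprehensible
median_response_time = [12.0, 15.5, 14.0, 22.0, 30.5, 28.0, 41.0, 55.0]  # seconds

rho = spearman_rho(expert_rank, median_response_time)
print(f"rho = {rho:.3f}")
```

A rho near 1 would mean that snippets the experts found harder also took students longer to answer; for an accuracy-based proxy the expected sign would flip, since harder snippets should lower accuracy.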

If this is right

  • Empirical studies of code comprehension should favor input-output questions measured by response time over other common proxies.
  • Existing findings that rest on syntax-question proxies should be treated as less reliable.
  • The choice of proxy affects whether a study can claim to approximate how comprehensible code is to practicing engineers.
  • Future replication studies can test the same set of proxies on new code snippets to confirm the pattern.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Tool builders who want to predict which code will be hard to read could incorporate quick input-output questions with timing.
  • Education research on teaching programming might shift emphasis away from syntax-only quiz formats when measuring student understanding.
  • The Delphi approach itself could be applied to other software-engineering judgment tasks where ground truth is hard to obtain.

Load-bearing premise

The ground-truth comprehensibility ranking produced by the five-expert Delphi consensus accurately reflects how comprehensible the code snippets are to software engineers in general.

What would settle it

A new panel of professional software engineers, following the same Delphi protocol on the same eight snippets, produces a ranking that differs substantially from the original five-expert ranking.
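
One way to quantify how far a replication panel's ranking departs from the original is a rank-agreement statistic. The sketch below uses Kendall's tau for two tie-free rankings; the choice of statistic, the function name, and both rankings are assumptions for illustration, and the text above does not define a threshold for "substantially".

```python
from itertools import combinations


def kendall_tau(rank_a, rank_b):
    """Kendall's tau-a for two rankings of the same items, assuming no ties."""
    concordant = discordant = 0
    for i, j in combinations(range(len(rank_a)), 2):
        # A pair is concordant when both rankings order items i and j the same way.
        sign = (rank_a[i] - rank_a[j]) * (rank_b[i] - rank_b[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(rank_a) * (len(rank_a) - 1) // 2
    return (concordant - discordant) / n_pairs


# Hypothetical rankings of the eight snippets (NOT data from the paper).
original_panel = [1, 2, 3, 4, 5, 6, 7, 8]
replication_panel = [2, 1, 3, 4, 6, 5, 7, 8]  # two adjacent pairs swapped

tau = kendall_tau(original_panel, replication_panel)
print(f"tau = {tau:.3f}")
```

A tau of 1 means the two panels agree on every pair of snippets and -1 means they disagree on every pair, so values well below 1 would indicate the kind of divergence described above.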

Figures

Figures reproduced from arXiv: 2605.23008 by Erfan Arvan, Martin Kellogg, Nadeeshan De Silva, Oscar Chaparro.

Figure 1: Simplified taxonomy of question categories derived from …
Figure 2: Two example code snippets used in the study: …
Figure 3: Correlation between proxies and expert-determined rank …
Figure 5: Distribution of correlations between aggregated student …
Figure 9: Distribution of correlations between per-student proxy …
Figure 7: Distribution of correlations between aggregated student …
Figure 4: Correlation between aggregated student proxies and expert …
Figure 11: Distribution of correlations between per-student proxy …
Original abstract

Prior work on code comprehension uses different comprehension proxies (for example, Likert-scale ratings or answers to input-output questions about program snippets, usually collected from students) to approximate whether code is comprehensible to software engineers, but the relative reliability of these proxies is not known. This paper investigates the relative reliability of a collection of proxies common in the extant literature with a pair of human studies. First, we conducted an expert-consensus study with a panel of five professional software engineers to establish a ground-truth comprehensibility ranking of eight code snippets by adapting the Delphi expert-consensus protocol. The Delphi protocol is widely used for expert consensus under conditions of uncertainty in other domains, such as medicine and national-security forecasting, but to our knowledge, this is its first application in software engineering. Second, we conducted a study with 44 student participants who completed tasks, allowing us to measure 14 comprehension proxies derived from the literature on the same set of eight code snippets. Finally, we conducted a correlation analysis on the results, concluding that proxies 1) derived from input-output questions and 2) that measure response time rather than accuracy are especially reliable. We also found that proxies derived from questions about program syntax (rather than semantics) are especially unreliable, regardless of measurement strategy, which draws into question the reliability of parts of the existing comprehensibility literature.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 1 minor

Summary. The manuscript reports two human studies on code comprehension proxies: an expert Delphi consensus in which five professional software engineers rank eight code snippets by comprehensibility, and a study in which 44 students complete tasks that yield 14 proxies from the literature. A correlation analysis over the two then concludes that input-output question proxies and response-time measures are more reliable, while syntax-based proxies are less reliable.

Significance. If the Delphi-derived ground truth is representative of software engineers in general, the findings offer practical guidance for selecting reliable proxies in future code comprehension research, addressing a gap in the literature regarding proxy reliability. The application of the Delphi protocol is noted as novel in SE.

major comments (3)
  1. [Expert-consensus study] The ground-truth ranking rests on a Delphi consensus from only five experts with no reported validation against larger or more diverse panels of practicing engineers; because all 14 proxy reliability conclusions are derived solely from rank correlations against this single ordering, any instability or bias in the ground truth directly undermines the central claims about which proxies are 'especially reliable.'
  2. [Correlation analysis] No statistical details (e.g., Spearman or Kendall coefficients, p-values, confidence intervals), sample-size justification, or handling of confounds (e.g., order effects, fatigue) are supplied for the 44-student study or the subsequent correlations, so it is not possible to verify whether the data support the stated conclusions on proxy reliability.
  3. [Expert-consensus study] Delphi protocol: The description states that the Delphi protocol was adapted but supplies no specifics on number of rounds, feedback mechanisms, anonymity procedures, or convergence criteria, leaving unclear whether the five-expert consensus meets the standards used in other domains where Delphi is established.
minor comments (1)
  1. The manuscript would benefit from a table or appendix reporting the raw proxy scores, the expert ranking, and the full set of correlation results to allow independent evaluation.
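
To make the request for p-values concrete: with only eight snippets, an exact two-sided p-value for a rank correlation can be obtained by enumerating all 8! = 40,320 orderings. The sketch below does this for Spearman's rho on tie-free rankings. The function names and the proxy ranking are hypothetical, and nothing here is claimed to be the procedure the authors used.

```python
from itertools import permutations


def spearman_no_ties(rank_x, rank_y):
    """Spearman's rho for two tie-free rankings of the same n items."""
    n = len(rank_x)
    d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * d2 / (n * (n * n - 1))


def exact_two_sided_p(rank_x, rank_y):
    """Share of all n! reorderings of rank_y whose |rho| is at least the observed |rho|."""
    observed = abs(spearman_no_ties(rank_x, rank_y))
    hits = total = 0
    for perm in permutations(rank_y):
        total += 1
        # Small tolerance guards against floating-point noise on equal values.
        if abs(spearman_no_ties(rank_x, perm)) >= observed - 1e-12:
            hits += 1
    return hits / total


# Hypothetical rankings of eight snippets (NOT data from the paper).
expert_rank = [1, 2, 3, 4, 5, 6, 7, 8]
proxy_rank = [1, 3, 2, 4, 6, 5, 7, 8]

rho = spearman_no_ties(expert_rank, proxy_rank)
p_value = exact_two_sided_p(expert_rank, proxy_rank)
print(f"rho = {rho:.3f}, exact two-sided p = {p_value:.5f}")
```

Because the study evaluates 14 proxies against the same ordering, any such p-values would also need to be read with multiple comparisons in mind.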

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their constructive comments on our manuscript. We address each major comment below, indicating revisions where appropriate to strengthen the paper while maintaining the integrity of our findings.

Point-by-point responses
  1. Referee: [Expert-consensus study] The ground-truth ranking rests on a Delphi consensus from only five experts with no reported validation against larger or more diverse panels of practicing engineers; because all 14 proxy reliability conclusions are derived solely from rank correlations against this single ordering, any instability or bias in the ground truth directly undermines the central claims about which proxies are 'especially reliable.'

    Authors: We acknowledge the small panel size as a limitation of the study. The Delphi method is designed for small expert groups to achieve consensus under uncertainty, and five professional engineers is consistent with applications in other fields. However, we agree that lack of external validation is a concern for generalizability. In revision, we will add an expanded limitations subsection discussing potential instability in the ground truth, report any available inter-expert agreement metrics from the process, and explicitly recommend future validation with larger panels. This does not change our core claims but contextualizes them appropriately. revision: partial

  2. Referee: [Correlation analysis] No statistical details (e.g., Spearman or Kendall coefficients, p-values, confidence intervals), sample-size justification, or handling of confounds (e.g., order effects, fatigue) are supplied for the 44-student study or the subsequent correlations, so it is not possible to verify whether the data support the stated conclusions on proxy reliability.

    Authors: We will revise the methods and results sections to include all requested details. This includes reporting Spearman rank correlation coefficients with p-values and confidence intervals for the proxy correlations, a sample-size justification based on prior code comprehension studies and power considerations for detecting moderate correlations, and explicit description of confound mitigation (snippet order was randomized across participants to address order effects; sessions were limited in duration to reduce fatigue, though fatigue was not directly measured). These additions will allow verification of the conclusions. revision: yes

  3. Referee: [Expert-consensus study] Delphi protocol: The description states that the Delphi protocol was adapted but supplies no specifics on number of rounds, feedback mechanisms, anonymity procedures, or convergence criteria, leaving unclear whether the five-expert consensus meets the standards used in other domains where Delphi is established.

    Authors: We will expand the methods section with full protocol details: two rounds were conducted; after round 1, anonymized aggregate rankings and rationales were shared as feedback; experts remained anonymous to each other throughout; convergence was reached when no participant changed their ranking in round 2. These specifics align with standard Delphi practices in other domains and will be added to clarify the adaptation. revision: yes
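
The round structure described in the third response can be sketched as follows. This is only an illustration of the stated stopping rule (no participant changes their ranking between rounds). The mean-position aggregation rule, the expert and snippet labels, and the rankings are all assumptions, since the text above does not say how aggregate rankings were computed.

```python
from statistics import mean


def aggregate_ranking(rankings):
    """Order snippets by mean position across experts (an ASSUMED aggregation rule)."""
    snippets = next(iter(rankings.values()))
    mean_pos = {s: mean(r.index(s) for r in rankings.values()) for s in snippets}
    # Ties in mean position are broken alphabetically to keep the output deterministic.
    return sorted(snippets, key=lambda s: (mean_pos[s], s))


def has_converged(previous_round, current_round):
    """True when every expert submitted the same ranking as in the previous round."""
    return all(current_round[e] == previous_round[e] for e in current_round)


# Hypothetical round-1 rankings: five anonymous experts, four snippets for brevity.
round_1 = {
    "E1": ["S1", "S2", "S3", "S4"],
    "E2": ["S2", "S1", "S3", "S4"],
    "E3": ["S1", "S2", "S4", "S3"],
    "E4": ["S1", "S2", "S3", "S4"],
    "E5": ["S1", "S3", "S2", "S4"],
}

# Anonymized aggregate that would be shared with the panel as feedback.
feedback = aggregate_ranking(round_1)

# Hypothetical round 2 in which nobody revises their ranking after seeing the feedback.
round_2 = {expert: list(ranking) for expert, ranking in round_1.items()}
converged = has_converged(round_1, round_2)
print(feedback, converged)
```

Note that under this stopping rule the panel can converge while experts still disagree with one another, so the final ordering depends on whichever aggregation rule is applied.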

Circularity Check

0 steps flagged

No significant circularity; empirical study relies on independent data collection

full rationale

The paper's derivation consists of two separate human-subject studies (Delphi expert consensus for ground-truth ranking of eight snippets; student tasks yielding 14 proxies) followed by rank correlation analysis. No equations, fitted parameters, or self-citations are present that reduce any claimed result to its own inputs by construction. The reliability conclusions are statistical outcomes of fresh data against an externally elicited consensus; the central claims therefore remain independent of the measurement process itself.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The paper is an empirical human-subjects study with no mathematical model, free parameters, or postulated entities.


