arxiv: 2604.22653 · v1 · submitted 2026-04-24 · 💻 cs.SE

Verifier Warnings Do Not Improve Comprehensibility Prediction

Pith reviewed 2026-05-08 11:24 UTC · model grok-4.3

classification 💻 cs.SE
keywords code comprehensibility · verifier warnings · machine learning prediction · software verification · empirical study · syntactic features · prediction performance

The pith

Adding verifier warning counts does not improve machine learning models for predicting code comprehensibility.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper examines whether the total number of warnings from a formal verifier can serve as a useful input feature to boost the accuracy of machine learning models that predict how understandable humans find code. Researchers took existing models that rely on syntactic code properties and developer information, added the verifier warning sum as an extra feature, and ran a controlled comparison. The results showed no meaningful gain in predictive performance from including the warnings. This outcome indicates that the correlation between warning counts and human comprehensibility judgments does not translate into better discrimination when models already have access to syntactic and developer data.

Core claim

We performed a control-treatment experiment incorporating the verifier warning sum feature into machine learning models from the literature, and conducted a comparative analysis of their performance against models trained only on syntactic and developer features. We found no significant difference in the prediction performance of models with and without the warnings feature. Our findings suggest that while a correlation exists, the verifier warning sum offers limited discriminative power: combining syntactic and developer features is just as effective for predicting human-judged code comprehensibility.

What carries the argument

The control-treatment experiment that adds the sum of verifier warnings as an input feature to existing ML models and measures any change in prediction accuracy compared to models using only syntactic and developer features.
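
The shape of such a control-treatment comparison can be sketched in a few lines. This is an illustrative sketch only, not the authors' pipeline: the synthetic data, the feature names (`syntactic`, `developer`, `warning_sum`), the nearest-centroid classifier, and the fold scheme are all assumptions made for the example. The point it shows is that both arms share the same folds and labels and differ only in the one extra column.

```python
import random
from statistics import mean

def make_dataset(n=200, seed=0):
    """Synthetic stand-in data (hypothetical): the label depends on the two
    baseline features; warning_sum is a noisy copy of the syntactic feature."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        syntactic = rng.gauss(0.0, 1.0)
        developer = rng.gauss(0.0, 1.0)
        label = 1 if syntactic + 0.5 * developer + rng.gauss(0.0, 0.5) > 0 else 0
        warning_sum = syntactic + rng.gauss(0.0, 2.0)  # correlated but redundant
        rows.append(([syntactic, developer], warning_sum, label))
    return rows

def fit_centroids(X, y):
    """Nearest-centroid 'model': one mean vector per class (placeholder model)."""
    centroids = {}
    for label in set(y):
        points = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [mean(column) for column in zip(*points)]
    return centroids

def predict(centroids, x):
    """Return the label whose centroid is closest to x."""
    def sq_dist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))

def cross_val_accuracy(X, y, k=5):
    """Per-fold accuracy, using the same deterministic folds for every arm."""
    scores = []
    for fold in range(k):
        test_idx = set(range(fold, len(X), k))
        train_X = [x for i, x in enumerate(X) if i not in test_idx]
        train_y = [t for i, t in enumerate(y) if i not in test_idx]
        model = fit_centroids(train_X, train_y)
        hits = [predict(model, X[i]) == y[i] for i in sorted(test_idx)]
        scores.append(sum(hits) / len(hits))
    return scores

rows = make_dataset()
y = [label for _, _, label in rows]
control_X = [base for base, _, _ in rows]            # baseline features only
treatment_X = [base + [w] for base, w, _ in rows]    # baseline + warning_sum

control_scores = cross_val_accuracy(control_X, y)
treatment_scores = cross_val_accuracy(treatment_X, y)
print("control  :", round(mean(control_scores), 3))
print("treatment:", round(mean(treatment_scores), 3))
```

Keeping the folds identical across arms matters because it makes the per-fold scores paired, so any later comparison looks at the difference the extra feature makes rather than at fold-to-fold noise.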

If this is right

  • Machine learning models for code comprehensibility can rely on syntactic and developer features alone without loss of predictive power.
  • The total count of verifier warnings does not supply enough unique information to justify its inclusion in comprehensibility predictors.
  • Empirical studies can treat verifier warning sums as optional rather than required when building or evaluating such models.
  • Correlation between warnings and comprehensibility does not guarantee that the warnings will improve downstream predictive tasks.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Verifier tools might need to produce different kinds of warnings or summaries if their outputs are to help with human-focused code quality predictions.
  • The limited value of warning sums could stem from overlap with existing syntactic metrics, suggesting future work to test alternative aggregations of verifier output.
  • This result may affect how software teams decide whether to run formal verifiers primarily for comprehensibility assessment versus other goals.
  • Replication on code from different domains or languages could reveal whether the finding holds beyond the datasets used here.

Load-bearing premise

The machine learning models and datasets drawn from prior literature are appropriate and sufficient to detect any real contribution from the verifier warning feature if one exists.

What would settle it

Re-running the same models on a new dataset where the version that includes verifier warning sums achieves statistically significantly higher accuracy than the version without them.
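
One way to make "statistically significantly higher" concrete is a paired test on per-fold scores. The sketch below uses an exact sign-flip permutation test, which is a swapped-in technique chosen for illustration; the summary above does not say which test the paper uses. The score lists are hypothetical numbers, not results from the paper.

```python
from itertools import product
from statistics import mean

def paired_permutation_p_value(control, treatment):
    """Two-sided exact sign-flip permutation test on paired score differences.

    Under the null hypothesis the sign of each paired difference is arbitrary,
    so every sign assignment is enumerated and compared with the observed mean.
    """
    diffs = [t - c for c, t in zip(control, treatment)]
    observed = abs(mean(diffs))
    at_least_as_extreme = 0
    total = 0
    for signs in product((1, -1), repeat=len(diffs)):
        total += 1
        permuted = abs(mean([s * d for s, d in zip(signs, diffs)]))
        if permuted >= observed - 1e-12:  # small tolerance for float noise
            at_least_as_extreme += 1
    return at_least_as_extreme / total

# Hypothetical per-fold accuracies, invented for this example.
control = [0.71, 0.68, 0.74, 0.70, 0.69, 0.72, 0.73, 0.70]
no_gain = [0.72, 0.67, 0.74, 0.71, 0.68, 0.72, 0.74, 0.69]
clear_gain = [c + 0.05 for c in control]

p_no_gain = paired_permutation_p_value(control, no_gain)
p_clear_gain = paired_permutation_p_value(control, clear_gain)
print("no gain   : p =", p_no_gain)
print("clear gain: p =", p_clear_gain)
```

Exact enumeration costs 2^n sign assignments, so it only suits a small number of paired scores; with many folds or repetitions a random sample of sign assignments would be the usual substitute.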

Figures

Figures reproduced from arXiv: 2604.22653 by Martin Kellogg, Nadeeshan De Silva, Oscar Chaparro.

Figure 1: Overview of our methodology for evaluating the impact of verifier warnings on code comprehensibility prediction.

Original abstract

Proponents of software verification suggest that code simplicity is linked to the effort to verify code, hypothesizing that formal verifiers produce fewer false positive warnings and require less manual intervention when analyzing simpler code. A recent meta-analysis study found empirical support for this hypothesis: a small correlation between the sum of verifier warnings and human-derived code comprehensibility metrics. Based on this finding, we conjectured that using the sum of verifier tool (verifier) warnings to represent program semantic information as an input feature to machine learning (ML) models for code comprehensibility prediction can enhance their performance, when combined with traditional syntactic and developer features. To test this conjecture, we performed a control-treatment experiment incorporating the verifier warning sum feature into machine learning models from the literature, and conducted a comparative analysis of their performance against models trained only on syntactic and developer features. We found no significant difference in the prediction performance of models with and without the warnings feature. Our findings suggest that while a correlation exists, the verifier warning sum offers limited discriminative power: combining syntactic and developer features is just as effective for predicting human-judged code comprehensibility.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper reports a control-treatment experiment testing whether adding the sum of verifier warnings as an input feature improves the performance of machine learning models (drawn from prior literature) for predicting human-judged code comprehensibility. The authors find no significant difference in predictive performance between models using only syntactic and developer features versus those that also include the verifier-warning-sum feature, and conclude that the warning sum offers limited discriminative power beyond the baseline features.

Significance. A robust null result would indicate that the small correlation between verifier warnings and comprehensibility metrics identified in prior meta-analyses does not yield practically useful gains in ML prediction tasks. This would help bound the utility of verification artifacts for comprehensibility modeling and reinforce that syntactic plus developer features are sufficient, potentially guiding future work away from incorporating verifier outputs in this domain.

major comments (2)
  1. [Methods] Methods section: The manuscript provides no information on dataset size, number of code samples, how the data were split for training/testing, or the specific ML models and hyperparameters employed. These omissions prevent evaluation of whether the experiment was adequately powered to detect a small performance lift consistent with the meta-analytic correlation cited in the introduction.
  2. [Results] Results section: The claim of 'no significant difference' is reported without effect sizes, confidence intervals on the performance delta, or a power analysis. Given that the motivating meta-analysis reports only a small correlation, the absence of these statistics leaves open the possibility of a type-II error and undermines the stronger conclusion that the verifier warning sum 'offers limited discriminative power.'
minor comments (1)
  1. [Abstract] The abstract and introduction refer to 'models from the literature' without naming the specific models or citing the exact prior papers; adding these references would improve reproducibility and context.
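
To give a sense of scale for the power concern, the sketch below estimates how many samples are needed to detect a correlation of a given size, using the standard Fisher z approximation. It is a rough guide under stated assumptions: the value r = 0.1 is a hypothetical stand-in for "small" (the text above gives no number), and power to detect a correlation is not the same as power to detect a lift in model performance.

```python
from math import atanh, ceil
from statistics import NormalDist

def samples_needed_for_correlation(r, alpha=0.05, power=0.80):
    """Approximate sample size to detect correlation r with a two-sided test.

    Uses the Fisher z transformation: n = ((z_alpha + z_beta) / atanh(r))^2 + 3.
    """
    standard_normal = NormalDist()
    z_alpha = standard_normal.inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = standard_normal.inv_cdf(power)           # quantile for target power
    return ceil(((z_alpha + z_beta) / atanh(r)) ** 2 + 3)

# r = 0.1 and r = 0.3 are hypothetical effect sizes chosen for illustration.
n_small = samples_needed_for_correlation(0.1)
n_medium = samples_needed_for_correlation(0.3)
print("samples for r = 0.1:", n_small)
print("samples for r = 0.3:", n_medium)
```

The required sample size grows roughly with 1/r², which is why a dataset that comfortably detects a medium effect can still be far too small for a small one.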

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for these constructive comments, which highlight important omissions in the reporting of our experimental design and results. We agree that additional details are needed to allow proper evaluation of statistical power and the strength of the null finding. We will revise the manuscript to address both points.

Point-by-point responses
  1. Referee: [Methods] Methods section: The manuscript provides no information on dataset size, number of code samples, how the data were split for training/testing, or the specific ML models and hyperparameters employed. These omissions prevent evaluation of whether the experiment was adequately powered to detect a small performance lift consistent with the meta-analytic correlation cited in the introduction.

    Authors: We agree that these methodological details are essential for reproducibility and for assessing whether the study was adequately powered. In the revised manuscript we will expand the Methods section to report the total number of code samples, the exact train/test split procedure and ratios, the specific machine-learning models used, and all hyperparameter values. We will also add an a-priori or post-hoc power analysis that references the small correlation size reported in the motivating meta-analysis.
    revision: yes

  2. Referee: [Results] Results section: The claim of 'no significant difference' is reported without effect sizes, confidence intervals on the performance delta, or a power analysis. Given that the motivating meta-analysis reports only a small correlation, the absence of these statistics leaves open the possibility of a type-II error and undermines the stronger conclusion that the verifier warning sum 'offers limited discriminative power.'

    Authors: We concur that effect sizes, confidence intervals around the performance difference, and a power analysis should be reported to support the interpretation of the null result. The revised Results section will include these quantities (e.g., Cohen’s d or AUC differences with 95% CIs) together with the power calculation. This will allow readers to judge both statistical and practical significance, and we will temper the language of the conclusion if the power analysis indicates the study may have been underpowered for a small effect.
    revision: yes
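
The quantities named in this response can be computed from paired per-fold scores. The sketch below is a minimal illustration, not the authors' analysis: it computes a paired Cohen's d (mean difference divided by the standard deviation of the differences) and a percentile bootstrap 95% confidence interval for the mean performance delta. The score lists, the resample count, and the seed are hypothetical choices for the example.

```python
import random
from statistics import mean, stdev

def paired_cohens_d(control, treatment):
    """Paired effect size: mean difference over the std. dev. of differences.

    Undefined (division by zero) when every paired difference is identical.
    """
    diffs = [t - c for c, t in zip(control, treatment)]
    return mean(diffs) / stdev(diffs)

def bootstrap_ci(control, treatment, n_resamples=5000, level=0.95, seed=0):
    """Percentile bootstrap interval for the mean paired difference."""
    rng = random.Random(seed)
    diffs = [t - c for c, t in zip(control, treatment)]
    resampled_means = sorted(
        sum(sample) / len(sample)
        for sample in (rng.choices(diffs, k=len(diffs)) for _ in range(n_resamples))
    )
    lower = resampled_means[round((1 - level) / 2 * n_resamples)]
    upper = resampled_means[round((1 + level) / 2 * n_resamples) - 1]
    return lower, upper

# Hypothetical per-fold accuracies, invented for this example.
control = [0.71, 0.68, 0.74, 0.70, 0.69, 0.72, 0.73, 0.70]
treatment = [0.72, 0.67, 0.75, 0.70, 0.70, 0.71, 0.74, 0.70]

effect_size = paired_cohens_d(control, treatment)
ci_low, ci_high = bootstrap_ci(control, treatment)
mean_delta = mean(t - c for c, t in zip(control, treatment))
print("mean delta:", round(mean_delta, 4))
print("Cohen's d :", round(effect_size, 3))
print("95% CI    :", (round(ci_low, 4), round(ci_high, 4)))
```

An interval that straddles zero is consistent with no difference, but its width is what tells the reader how large a difference the data could still be hiding.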

Circularity Check

0 steps flagged

Empirical comparison of ML models with/without verifier warnings feature

The paper performs a control-treatment experiment: it takes existing ML models and datasets from the literature, adds the verifier-warning-sum feature as an additional input, and reports that performance metrics show no statistically significant difference versus the syntactic-plus-developer baseline. This is a direct empirical measurement, not a derivation, equation, or fitted parameter that reduces to its own inputs by construction. The meta-analysis citation supplies the motivating hypothesis but is not invoked as a uniqueness theorem or ansatz that forces the result; the experiment tests the performance implication of that hypothesis and finds no support for it. No self-citation chain, self-definitional loop, or renaming of a known result is present in the load-bearing steps.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The work is a straightforward empirical test that relies on standard machine learning evaluation practices and statistical comparison methods already established in the literature.

axioms (1)
  • [standard math] Standard assumptions of statistical significance testing for model performance comparison.
    Used to conclude there is no significant difference between the two model sets.

pith-pipeline@v0.9.0 · 5491 in / 1074 out tokens · 43295 ms · 2026-05-08T11:24:55.595232+00:00 · methodology


venue: EASE 2026, 9–12 June, 2026, Glasgow, Scotland, United Kingdom