pith. machine review for the scientific record.

arxiv: 2603.16791 · v2 · submitted 2026-03-17 · 💻 cs.SE

Recognition: 2 theorem links · Lean Theorem

Improving Code Comprehension through Cognitive-Load Aware Automated Refactoring for Novice Programmers


Pith reviewed 2026-05-15 09:44 UTC · model grok-4.3

classification 💻 cs.SE
keywords refactoring, novice programmers, code comprehension, cognitive load, Cognitive-Driven Development, Cyclomatic complexity, automated refactoring

The pith

Cognitively guided refactoring improves novice programmers' code comprehension by reducing control-flow complexity.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper introduces a method to automatically refactor code so that novice programmers understand it better, using principles from Cognitive-Driven Development. The approach, called CDDRefactorER, constrains transformations to reduce nesting and complexity metrics without altering what the code does. Evaluations on two standard programming benchmarks (MBPP and APPS) with two AI models show large drops in failed refactorings and in unintended complexity increases. In a study with actual novices, the refactored code led to better identification of functions and easier reading of structure. If the approach holds up, novices could learn from code examples more readily without extra explanations.
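The paper's own transformation rules are not reproduced on this page. As a hypothetical illustration of the kind of constrained, behavior-preserving refactoring it describes (reduced nesting plus a direct boolean return), consider:

```python
# Hypothetical before/after in the spirit of the paper's constraints --
# not the authors' code or prompts.

def is_valid_score_before(score):
    # Nested conditionals and a verbose boolean pattern: harder for novices.
    if score is not None:
        if 0 <= score <= 100:
            return True
        else:
            return False
    else:
        return False

def is_valid_score_after(score):
    # Guard clause plus a direct boolean return: same behavior, less nesting.
    if score is None:
        return False
    return 0 <= score <= 100

# Behavior is preserved across representative inputs.
for s in (None, -1, 0, 50, 100, 101):
    assert is_valid_score_before(s) == is_valid_score_after(s)
```

The "after" version has strictly fewer decision points, which is what the complexity metrics in the paper's evaluation would register as an improvement.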

Core claim

The central claim is that cognitively guided refactoring, operationalized in CDDRefactorER, provides a practical mechanism for enhancing novice code comprehension by constraining transformations to reduce control-flow complexity while preserving behavior and structural similarity, as evidenced by reduced refactoring failures and improved human comprehension metrics.

What carries the argument

CDDRefactorER, the automated refactoring approach that applies constrained transformations from Cognitive-Driven Development to lower Cyclomatic and Cognitive complexity.

Load-bearing premise

That the specific constrained transformations reliably preserve original behavior and structural similarity while reducing cognitive load as measured by complexity metrics.
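One of the two metrics this premise leans on, Cyclomatic complexity, is mechanical to compute: decision points plus one. A minimal sketch over Python ASTs (the paper's actual measurement tooling is not specified in the visible text; production tools such as radon handle more constructs):

```python
import ast

def cyclomatic_complexity(source: str) -> int:
    """Approximate McCabe cyclomatic complexity: 1 + number of decision points.

    A minimal sketch, not the paper's instrumentation.
    """
    tree = ast.parse(source)
    decisions = 0
    for node in ast.walk(tree):
        if isinstance(node, (ast.If, ast.For, ast.While, ast.IfExp,
                             ast.ExceptHandler, ast.Assert)):
            decisions += 1
        elif isinstance(node, ast.BoolOp):
            # Each extra operand of `and`/`or` adds a short-circuit branch.
            decisions += len(node.values) - 1
    return decisions + 1

nested = (
    "def f(x):\n"
    "    if x is not None:\n"
    "        if 0 <= x <= 100:\n"
    "            return True\n"
    "        else:\n"
    "            return False\n"
    "    return False\n"
)
flat = (
    "def f(x):\n"
    "    if x is None:\n"
    "        return False\n"
    "    return 0 <= x <= 100\n"
)
# Flattening the nesting lowers the metric the premise relies on.
assert cyclomatic_complexity(nested) > cyclomatic_complexity(flat)
```

Whether such metric reductions track actual novice cognitive load is exactly what the human-subject study is needed to establish.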

What would settle it

Repeating the human-subject study and finding no statistically significant improvement in novice comprehension scores for the refactored code would undermine the central claim.

Figures

Figures reproduced from arXiv: 2603.16791 by Alif Al Hasan, Fariha Tanjim Shifat, Mia Mohammad Imran, Subarna Saha.

Figure 1. Examples from Reddit where novice programmers …

Figure 2. Overview of the Methodology. Boolean Returns replaces verbose conditional patterns with direct boolean expressions [69]. Descriptive Naming improves identifier clarity [59, 61], and Sequential Flow encourages chronological ordering and grouping of statements to support comprehension [64]. Each strategy is defined through explicit transformation rules and illustrated with concrete examples in the prompt. …

Figure 3. CodeBLEU similarity distributions after refactoring.

Figure 4. Original code (top), erroneous baseline refactoring …
read the original abstract

Novice programmers often struggle to comprehend code due to vague naming, deep nesting, and poor structural organization. While explanations may offer partial support, they typically do not restructure the code itself. We propose code refactoring as cognitive scaffolding, where cognitively guided refactoring automatically restructures code to improve clarity. We operationalize this in CDDRefactorER, an automated approach grounded in Cognitive-Driven Development that constrains transformations to reduce control-flow complexity while preserving behavior and structural similarity. We evaluate CDDRefactorER using two benchmark datasets (MBPP and APPS) against two models (gpt-5-nano and kimi-k2), and a controlled human-subject study with novice programmers. Across datasets and models, CDDRefactorER reduces refactoring failures by 54-71% and substantially lowers the likelihood of increased Cyclomatic and Cognitive complexity during refactoring, compared to unconstrained prompting. Results from the human study show consistent improvements in novice code comprehension, with function identification increasing by 31.3% and structural readability by 22.0%. The findings suggest that cognitively guided refactoring offers a practical and effective mechanism for enhancing novice code comprehension.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

1 major / 1 minor

Summary. The paper claims that CDDRefactorER, an LLM-based automated refactoring approach grounded in Cognitive-Driven Development, constrains transformations to reduce control-flow complexity (Cyclomatic and Cognitive) while preserving behavior and structural similarity. On MBPP and APPS benchmarks it reports 54-71% fewer refactoring failures and lower rates of complexity increase versus unconstrained prompting; a controlled human study with novices reports 31.3% higher function identification and 22.0% higher structural readability.

Significance. If the behavior-preservation claim holds, the work supplies a practical, cognitively grounded mechanism for improving novice code comprehension that could be integrated into programming tools and education platforms. The use of independent public benchmarks plus a separate human study is a strength.

major comments (1)
  1. [Evaluation] Evaluation section: the central claim requires that constrained transformations preserve original semantics, yet the manuscript provides no description of post-refactoring test execution, output-equivalence checks, or structural-diff metrics on MBPP/APPS. Without these, the reported reductions in failures and complexity cannot be interpreted as evidence of safe refactoring.
minor comments (1)
  1. [Human study] The abstract and visible sections omit sample size, task design details, and statistical controls for the human study; these should be added for reproducibility.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive feedback. We address the major comment regarding the evaluation of semantic preservation below and agree that additional details are needed.

read point-by-point responses
  1. Referee: [Evaluation] Evaluation section: the central claim requires that constrained transformations preserve original semantics, yet the manuscript provides no description of post-refactoring test execution, output-equivalence checks, or structural-diff metrics on MBPP/APPS. Without these, the reported reductions in failures and complexity cannot be interpreted as evidence of safe refactoring.

    Authors: We agree that the manuscript should explicitly describe how behavior preservation was verified to support the central claims. Although the CDD constraints are intended to ensure equivalence by restricting transformations to behavior-preserving operations (such as renaming and restructuring without logic changes), we acknowledge the absence of verification details. In the revised version, we will add a dedicated subsection to the Evaluation section detailing: post-refactoring execution of test cases from MBPP and APPS to confirm output equivalence; use of structural-diff metrics (e.g., AST similarity) to quantify structural preservation; and any manual or automated equivalence checks performed. This will allow the reported failure reductions and complexity improvements to be interpreted as evidence of safe refactoring. revision: yes
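The verification the rebuttal promises can be sketched generically: run both versions of a program on the benchmark's test inputs and compare outcomes. A minimal, hypothetical harness (the authors' actual protocol is not described in the visible text; `orig`, `refac`, and `broken` are illustrative functions, not paper artifacts):

```python
# Hypothetical sketch of post-refactoring output-equivalence checking,
# in the spirit of the rebuttal's promised protocol -- not the authors' harness.

def outputs_equivalent(original, refactored, test_inputs):
    """True iff both callables agree (return value, or exception type) on all inputs."""
    for args in test_inputs:
        try:
            expected = ("ok", original(*args))
        except Exception as e:
            expected = ("err", type(e))
        try:
            actual = ("ok", refactored(*args))
        except Exception as e:
            actual = ("err", type(e))
        if expected != actual:
            return False
    return True

# A flattening refactor preserves behavior...
def orig(x):
    if x >= 0:
        if x % 2 == 0:
            return True
        else:
            return False
    return False

def refac(n):
    return n >= 0 and n % 2 == 0

assert outputs_equivalent(orig, refac, [(i,) for i in range(-3, 6)])

# ...while a logic change is caught.
def broken(n):
    return n % 2 == 0  # drops the sign check

assert not outputs_equivalent(orig, broken, [(i,) for i in range(-3, 6)])
```

For MBPP/APPS this would run each dataset's bundled test cases against the pre- and post-refactoring program, which is what would let the reported failure reductions be read as evidence of safe refactoring.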

Circularity Check

0 steps flagged

Minor reliance on established CDD framework with independent benchmarks and human evaluation

full rationale

The paper grounds CDDRefactorER in the existing Cognitive-Driven Development framework and evaluates refactoring success, complexity metrics, and novice comprehension gains on external public datasets (MBPP, APPS) plus a separate controlled human study. No equations, fitted parameters, or self-citations reduce the central claims to inputs defined by the same data. Behavior preservation is enforced via prompt constraints rather than derived from the evaluation itself, so the derivation chain remains non-circular.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the domain assumption that cognitive complexity metrics can guide refactoring rules that improve human comprehension without introducing new fitted parameters or invented entities.

axioms (1)
  • domain assumption Cognitive-Driven Development principles can be operationalized as constraints on code transformations that reduce control-flow complexity while preserving behavior
    Invoked to justify the design of CDDRefactorER

pith-pipeline@v0.9.0 · 5510 in / 1110 out tokens · 24317 ms · 2026-05-15T09:44:33.064147+00:00 · methodology

discussion (0)


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

73 extracted references · 73 canonical work pages · 5 internal anchors

  1. [1]

    2017. Code Transformation. ScienceDirect Topics, Computer Science. https://www.sciencedirect.com/topics/computer-science/code-transformation (accessed January 11, 2026)

  2. [2]

    2025. CDDRefactorER. https://chatgpt.com/g/g-6803de5d95fc81919a4cdbcb210b8200-cddrefactorgpt

  3. [3]

    2025. Replication Package. https://zenodo.org/records/18153415

  4. [4]

    Felix Adler, Gordon Fraser, Eva Grundinger, et al. 2021. Improving Readability of Scratch Programs with Search-based Refactoring. In 2021 IEEE 21st International Working Conference on Source Code Analysis and Manipulation (SCAM). IEEE Computer Society, Los Alamitos, CA, USA, 120–130

  5. [5]

    Eman Abdullah AlOmar, Mohamed Wiem Mkaouer, and Ali Ouni. 2024. Automating Source Code Refactoring in the Classroom. In Proceedings of the 55th ACM Technical Symposium on Computer Science Education V. 1 (Portland, OR, USA) (SIGCSE 2024). Association for Computing Machinery, New York, NY, USA, 60–66

  6. [6]

    Eman Abdullah AlOmar, Luo Xu, Sofia Martinez, et al. 2025. ChatGPT for Code Refactoring: Analyzing Topics, Interaction, and Effective Prompts. 35th IEEE International Conference on Collaborative Advances in Software and Computing (CASCON) (2025)

  7. [7]

    Jacob Austin, Augustus Odena, Maxwell Nye, et al. 2021. Program synthesis with large language models. arXiv preprint arXiv:2108.07732 (2021)

  8. [8]

    Leonardo Ferreira Barbosa, Victor Hugo Pinto, Alberto Luiz Oliveira Tavares de Souza, et al. 2022. To What Extent Cognitive-Driven Development Improves Code Readability?. In Proceedings of the 16th ACM / IEEE International Symposium on Empirical Software Engineering and Measurement (Helsinki, Finland) (ESEM ’22). Association for Computing Machinery, New York...

  9. [9]

    Shraddha Barke, Michael B James, and Nadia Polikarpova. 2023. Grounded copilot: How programmers interact with code-generating models. Proceedings of the ACM on Programming Languages 7, OOPSLA1 (2023), 85–111

  10. [10]

    Arie Bennett and Cruz Izu. 2025. Replicating a SOLO approach to Measure Students’ Ability to Improve Code Efficiency. In Proceedings of the ACM Global on Computing Education Conference 2025 Vol 1 (Gaborone, Botswana) (CompEd 2025). Association for Computing Machinery, New York, NY, USA, 43–49

  11. [11]

    João Henrique Berssanette and Antonio Carlos de Francisco. 2021. Cognitive load theory in the context of teaching and learning computer programming: A systematic literature review. IEEE Transactions on Education 65, 3 (2021), 440–449

  12. [12]

    Teresa Busjahn, Carsten Schulte, and Andreas Busjahn. 2011. Analysis of code reading to gain more insight in program comprehension. In Proceedings of the 11th Koli Calling International Conference on Computing Education Research (Koli, Finland) (Koli Calling ’11). Association for Computing Machinery, New York, NY, USA, 1–9

  13. [13]

    G Ann Campbell. 2018. Cognitive complexity: An overview and evaluation. In Proceedings of the 2018 International Conference on Technical Debt (Gothenburg, Sweden) (TechDebt ’18). Association for Computing Machinery, New York, NY, USA, 57–58

  14. [14]

    Eduardo Carneiro Oliveira, Hieke Keuning, and Johan Jeuring. 2024. Investigating student reasoning in method-level code refactoring: A think-aloud study. In Proceedings of the 24th Koli Calling International Conference on Computing Education Research. 1–11

  15. [15]

    Eduardo Carneiro Oliveira, Hieke Keuning, and Johan Jeuring. 2025. Uncovering Behavioral Patterns in Student–LLM Conversations during Code Refactoring Tasks. In Proceedings of the 25th Koli Calling International Conference on Computing Education Research (Koli Calling ’25). Association for Computing Machinery, New York, NY, USA, Article 39, 11 pages

  16. [16]

    Gary Charness, Uri Gneezy, and Michael A Kuhn. 2012. Experimental methods: Between-subject and within-subject design. Journal of Economic Behavior & Organization 81, 1 (2012), 1–8

  17. [17]

    Mark Chen, Jerry Tworek, Heewoo Jun, et al. 2021. Evaluating Large Language Models Trained on Code. arXiv:2107.03374 [cs.LG]

  18. [18]

    Norman Cliff. 1993. Dominance statistics: Ordinal analyses to answer ordinal questions. Psychological Bulletin 114, 3 (1993), 494

  19. [19]

    Bart Du Bois, Serge Demeyer, and Jan Verelst. 2005. Does the "Refactor to Understand" Reverse Engineering Pattern Improve Program Comprehension?. In Proceedings of the Ninth European Conference on Software Maintenance and Reengineering (CSMR ’05). IEEE Computer Society, USA, 334–343

  20. [20]

    Rodrigo Duran, Albina Zavgorodniaia, and Juha Sorva. 2022. Cognitive load theory in computing education research: A review. ACM Transactions on Computing Education (TOCE) 22, 4 (2022), 1–27

  21. [21]

    Emma Ericsson. 2023. Evaluating Similarity-Based Refactoring Recommendations. Student Paper

  22. [22]

    Matteo Esposito, Andrea Janes, Terhi Kilamo, et al. 2025. Early Career Developers’ Perceptions of Code Understandability: A Study of Complexity Metrics. IEEE Access 13 (2025), 135027–135042

  23. [23]

    Sarah Fakhoury, Yuzhan Ma, Venera Arnaoudova, et al. 2018. The effect of poor source code lexicon and readability on developers’ cognitive load. In Proceedings of the 26th Conference on Program Comprehension (Gothenburg, Sweden) (ICPC ’18). Association for Computing Machinery, New York, NY, USA, 286–296

  24. [24]

    Sarah Fakhoury, Devjeet Roy, Adnan Hassan, et al. 2019. Improving source code readability: Theory and practice. In 2019 IEEE/ACM 27th International Conference on Program Comprehension (ICPC). IEEE, 2–12

  25. [25]

    Zhangyin Feng, Daya Guo, Duyu Tang, et al. 2020. CodeBERT: A Pre-Trained Model for Programming and Natural Languages. In Findings of the Association for Computational Linguistics: EMNLP 2020, Online Event, 16-20 November 2020 (Findings of ACL, Vol. EMNLP 2020). Association for Computational Linguistics, 1536–1547

  26. [26]

    Ronivaldo Ferreira, Victor Hugo Santiago C. Pinto, Cleidson R. B. de Souza, et al. 2024. Assisting Novice Developers Learning in Flutter Through Cognitive-Driven Development. In Proceedings of the 38th Brazilian Symposium on Software Engineering, SBES 2024, Curitiba, Brazil, September 30 - October 4, 2024. SBC, 367–376

  27. [27]

    Martin Fowler. 2018. Refactoring: Improving the Design of Existing Code. Addison-Wesley Professional

  28. [28]

    Lucian José Gonçales, Kleinner Farias, and Bruno C da Silva. 2021. Measuring the cognitive load of software developers: An extended Systematic Mapping Study. Information and Software Technology 136 (2021), 106563

  29. [29]

    Dan Gopstein, Jake Iannacone, Yu Yan, et al. 2017. Understanding misunderstandings in source code. In Proceedings of the 2017 11th Joint Meeting on Foundations of Software Engineering (Paderborn, Germany) (ESEC/FSE 2017). Association for Computing Machinery, New York, NY, USA, 129–139

  30. [30]

    Anthony G Greenwald. 1976. Within-subjects designs: To use or not to use? Psychological Bulletin 83, 2 (1976), 314

  31. [31]

    Gao Hao, Haytham Hijazi, João Durães, et al. 2023. On the accuracy of code complexity metrics: A neuroscience-based guideline for improvement. Frontiers in Neuroscience 16 (2023), 1065366

  32. [32]

    Alif Al Hasan, Subarna Saha, and Mia Mohammad Imran. 2026. Learning Programming in Informal Spaces: Using Emotion as a Lens to Understand Novice Struggles on r/learnprogramming. In Proceedings of the 2026 IEEE/ACM 48th International Conference on Software Engineering: Software Engineering Education and Training (ICSE-SEET ’26). ACM, Rio de Janeiro, Brazil, 1–12

  33. [33]

    Dan Hendrycks, Steven Basart, Saurav Kadavath, et al. 2021. Measuring Coding Challenge Competence With APPS. In Proceedings of the Neural Information Processing Systems Track on Datasets and Benchmarks, J. Vanschoren and S. Yeung (Eds.), Vol. 1

  34. [34]

    Felienne Hermans and Efthimia Aivaloglou. 2016. Do code smells hamper novice programming? A controlled experiment on Scratch programs. In 2016 IEEE 24th International Conference on Program Comprehension (ICPC). IEEE Computer Society, Los Alamitos, CA, USA, 1–10

  35. [35]

    John Johnson, Sergio Lubo, Nishitha Yedla, et al. 2019. An Empirical Study Assessing Source Code Readability in Comprehension. In 2019 IEEE International Conference on Software Maintenance and Evolution (ICSME). IEEE Computer Society, Los Alamitos, CA, USA, 513–523

  36. [36]

    Shahedul Huq Khandkar. 2009. Open coding. University of Calgary 23, 2009 (2009), 2009

  37. [37]

    Claire Le Goues, Michael Pradel, and Abhik Roychoudhury. 2019. Automated program repair. Commun. ACM 62, 12 (2019), 56–65

  38. [38]

    Stephen MacNeil, Andrew Tran, Arto Hellas, et al. 2023. Experiences from Using Code Explanations Generated by Large Language Models in a Web Software Development E-Book. In Proceedings of the 54th ACM Technical Symposium on Computer Science Education V. 1 (Toronto ON, Canada) (SIGCSE 2023). Association for Computing Machinery, New York, NY, USA, 931–937

  39. [39]

    Philomena Marfo and G.A. Okyere. 2019. The accuracy of effect-size estimates under normals and contaminated normals in meta-analysis. Heliyon 5, 6 (2019), e01838

  40. [40]

    T.J. McCabe. 1976. A Complexity Measure. IEEE Transactions on Software Engineering 2, 04 (Dec. 1976), 308–320

  41. [41]

    Flavio Medeiros, Marcio Ribeiro, Rohit Gheyi, et al. 2018. Discipline Matters: Refactoring of Preprocessor Directives in the #ifdef Hell. IEEE Transactions on Software Engineering 44, 05 (May 2018), 453–469

  42. [42]

    G. A. Miller. 1956. The magical number seven, plus or minus two: Some limits on our capacity for processing information. Psychological Review 63, 2 (1956), 81–97

  43. [43]

    Rodrigo Morales, Foutse Khomh, and Giuliano Antoniol. 2020. RePOR: Mimicking humans on refactoring tasks. Are we there yet? Empirical Software Engineering 25, 4 (2020), 2960–2996

  44. [44]

    Marvin Muñoz Barón, Marvin Wyrich, and Stefan Wagner. 2020. An Empirical Validation of Cognitive Complexity as a Measure of Source Code Understandability. In Proceedings of the 14th ACM / IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM) (Bari, Italy) (ESEM ’20). Association for Computing Machinery, New York, NY, USA, ...

  45. [45]

    Sara Nurollahian, Hieke Keuning, and Eliane Wiese. 2025. Teaching Well-Structured Code: A Literature Review of Instructional Approaches. In 2025 IEEE/ACM 37th International Conference on Software Engineering Education and Training (CSEE&T). IEEE Computer Society, Los Alamitos, CA, USA, 205–216

  46. [46]

    Indranil Palit and Tushar Sharma. 2025. Reinforcement Learning vs Supervised Learning: A tug of war to generate refactored code accurately. In Proceedings of the 29th International Conference on Evaluation and Assessment in Software Engineering (EASE ’25). Association for Computing Machinery, New York, NY, USA, 429–440

  47. [47]

    Kang-il Park, Jack Johnson, Cole S. Peterson, et al. 2024. An eye tracking study assessing source code readability rules for program comprehension. Empirical Softw. Engg. 29, 6 (Oct. 2024), 60 pages

  48. [48]

    Norman Peitek, Sven Apel, Chris Parnin, et al. 2021. Program Comprehension and Code Complexity Metrics: An fMRI Study. In Proceedings of the 43rd International Conference on Software Engineering (Madrid, Spain) (ICSE ’21). IEEE Press, NJ, USA, 524–536

  49. [49]

    Anthony Peruma, Steven Simmons, Eman Abdullah AlOmar, et al. 2022. How do I refactor this? An empirical study on refactoring trends and topics in Stack Overflow. Empirical Software Engineering 27, 1 (2022), 11

  50. [50]

    Yonnel Chen Kuang Piao, Jean Carlors Paul, Leuson Da Silva, et al. 2025. Refactoring with LLMs: Bridging Human Expertise and Machine Understanding. arXiv:2510.03914 [cs.SE]

  51. [51]

    Gustavo Pinto and Alberto de Souza. 2023. Cognitive Driven Development helps software teams to keep code units under the limit! Journal of Systems and Software 206 (2023), 111830

  52. [52]

    Victor Hugo Santiago C. Pinto and Alberto Luiz Oliveira Tavares De Souza. 2022. Effects of Cognitive-driven Development in the Early Stages of the Software Development Life Cycle. In Proceedings of the 24th International Conference on Enterprise Information Systems - Volume 2: ICEIS

  53. [53]

    Victor Hugo Santiago C. Pinto, Alberto Luiz Oliveira Tavares de Souza, Yuri Matheus Barboza de Oliveira, et al. 2021. Cognitive-Driven Development: Preliminary Results on Software Refactorings. In Proceedings of the 16th International Conference on Evaluation of Novel Approaches to Software Engineering - ENASE. INSTICC, SciTePress, 92–102. doi:10.5220/00...

  54. [54]

    James Prather, Brent N Reeves, Paul Denny, et al. 2023. “It’s weird that it knows what I want”: Usability and interactions with copilot for novice programmers. ACM Transactions on Computer-Human Interaction 31, 1 (2023), 1–31

  55. [55]

    Raluca Budiu. 2023. Between-Subjects vs. Within-Subjects Study Design. https://www.nngroup.com/articles/between-within-subjects/. Accessed: 2026-01-10

  56. [56]

    Shuo Ren, Daya Guo, Shuai Lu, et al. 2020. CodeBLEU: a Method for Automatic Evaluation of Code Synthesis. arXiv:2009.10297 [cs.SE]

  57. [57]

    Devjeet Roy, Sarah Fakhoury, John Lee, et al. 2020. A Model to Detect Readability Improvements in Incremental Changes. In Proceedings of the 28th International Conference on Program Comprehension (Seoul, Republic of Korea) (ICPC ’20). Association for Computing Machinery, New York, NY, USA, 25–36

  58. [58]

    Baptiste Rozière, Jonas Gehring, Fabian Gloeckle, et al. 2024. Code Llama: Open Foundation Models for Code. arXiv:2308.12950 [cs.CL]

  59. [59]

    Simone Scalabrino, Mario Linares-Vasquez, Denys Poshyvanyk, et al. 2016. Improving code readability models with textual features. In 2016 IEEE 24th International Conference on Program Comprehension (ICPC). IEEE Computer Society, Los Alamitos, CA, USA, 1–10

  60. [60]

    Sandro Schulze, Jörg Liebig, Janet Siegmund, et al. 2013. Does the discipline of preprocessor annotations matter? A controlled experiment. In Proceedings of the 12th International Conference on Generative Programming: Concepts & Experiences (Indianapolis, Indiana, USA) (GPCE ’13). Association for Computing Machinery, New York, NY, USA, 65–74

  61. [61]

    Giulia Sellitto, Emanuele Iannone, Zadia Codabux, et al. 2022. Toward Understanding the Impact of Refactoring on Program Comprehension. In IEEE International Conference on Software Analysis, Evolution and Reengineering, SANER 2022, Honolulu, HI, USA, March 15-18, 2022. IEEE, 731–742

  62. [62]

    Janet Siegmund, Norman Peitek, Chris Parnin, et al. 2017. Measuring neural efficiency of program comprehension. In Proceedings of the 2017 11th Joint Meeting on Foundations of Software Engineering (Paderborn, Germany) (ESEC/FSE 2017). Association for Computing Machinery, New York, NY, USA, 140–150

  63. [63]

    José Aldo Silva Da Costa and Rohit Gheyi. 2023. Evaluating the Code Comprehension of Novices with Eye Tracking. In Proceedings of the XXII Brazilian Symposium on Software Quality (Brasília, Brazil) (SBQS ’23). Association for Computing Machinery, New York, NY, USA, 332–341

  64. [64]

    John Sweller. 1988. Cognitive load during problem solving: Effects on learning. Cognitive Science 12, 2 (1988), 257–285

  65. [65]

    Alberto Luiz Oliveira Tavares de Souza and Victor Hugo Santiago Costa Pinto

  66. [66]

    Toward a Definition of Cognitive-Driven Development. In 2020 IEEE International Conference on Software Maintenance and Evolution (ICSME). IEEE Computer Society, Los Alamitos, CA, USA, 776–778

  67. [67]

    Kimi Team, Yifan Bai, Yiping Bao, et al. 2025. Kimi K2: Open Agentic Intelligence. arXiv:2507.20534 [cs.LG]

  68. [68]

    Peeratham Techapalokul and Eli Tilevich. 2019. Position: Manual Refactoring (by Novice Programmers) Considered Harmful. In 2019 IEEE Blocks and Beyond Workshop (B&B). IEEE Computer Society, Los Alamitos, CA, USA, 79–80

  69. [69]

    Garry L White and Marcos P Sivitanides. 2002. A theory of the relationships between cognitive requirements of computer programming languages and programmers’ cognitive characteristics. Journal of Information Systems Education 13, 1 (2002), 59–66

  70. [70]

    Eliane S. Wiese, Anna N. Rafferty, and Armando Fox. 2019. Linking code readability, structure, and comprehension among novices: it’s complicated. In Proceedings of the 41st International Conference on Software Engineering: Software Engineering Education and Training (Montreal, Quebec, Canada) (ICSE-SEET ’19). IEEE Press, NJ, USA, 84–94

  71. [71]

    Frank Wilcoxon. 1945. Individual comparisons by ranking methods. Biometrics Bulletin 1, 6 (1945), 80–83

  72. [72]

    Yisen Xu, Feng Lin, Jinqiu Yang, et al. 2025. MANTRA: Enhancing Automated Method-Level Refactoring with Contextual RAG and Multi-Agent LLM Collaboration. arXiv:2503.14340 [cs.SE]

  73. [73]

    Albert Ziegler, Eirini Kalliamvakou, X Alice Li, et al. 2024. Measuring GitHub Copilot’s impact on productivity. Commun. ACM 67, 3 (2024), 54–63