Relationships Between Trust, Compliance, and Performance for Novice Programmers Using AI Code Generation
Pith reviewed 2026-05-10 02:53 UTC · model grok-4.3
The pith
Novice programmers show no direct link between trust in AI code tools and compliance with AI suggestions.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Subjective trust in AIDEs did not predict subsequent compliance with AI-generated code during timed programming tasks. Compliance was positively associated with performance, and higher performance led to greater trust on follow-up measures. The findings corroborate experience-driven changes in trust but indicate that trust does not directly influence compliance behavior for novices using generative AI code tools.
What carries the argument
The triad of relationships (and observed absences of relationship) among subjective trust scores from questionnaires, objective compliance rates (adoption of AI suggestions), and performance metrics in novice programming tasks with AI assistance.
Load-bearing premise
That the subjective trust questionnaires and objective compliance and performance metrics validly and reliably measure the intended constructs for novice programmers working under time pressure with AI code generation tools.
What would settle it
A replication study with novice programmers that finds a statistically significant positive correlation between pre-task trust scores and rates of compliance with AI suggestions, independent of performance, would contradict the reported lack of relationship.
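The settling test reduces to a plain correlation check. A minimal sketch with invented numbers (the scores below are illustrative, not data from the study):

```python
from scipy.stats import pearsonr

# Hypothetical per-participant scores: pre-task trust on a 1-7 scale,
# compliance as the proportion of AI suggestions accepted.
trust = [3.5, 4.0, 5.5, 2.0, 6.0, 4.5, 3.0, 5.0, 6.5, 2.5]
compliance = [0.40, 0.55, 0.70, 0.30, 0.80, 0.50, 0.45, 0.65, 0.85, 0.35]

# A replication contradicting the paper's null would show a
# significantly positive r, independent of performance.
r, p = pearsonr(trust, compliance)
print(f"r = {r:.3f}, p = {p:.4f}")
```

In a real replication the correlation would also need to hold after partialling out performance, since the paper reports compliance and performance as linked.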
Original abstract
Objective. To explore how novice programmers' trust in Artificial Intelligence-driven Development Environments (AIDEs) relates to their coding performance and AI compliance while programming under time pressure. Background. Computer programming has undergone rapid upheaval due to state-of-the-art AIDEs, which provide clever automation for many aspects of software development. A longstanding interest of researchers of automation more generally has been the attitude of trust. Decades of research seek to explain how influencing trust can help to achieve desirable outcomes in different domains, but very limited work has provided similar focus on trust in AIDEs. Method. We collected subjective measures of trust along with objective measures of performance and AIDE compliance from a diverse group of 27 novice programmers between two study locations. Results. Our results corroborated traditional understandings of how trust changes through experiences. However, we did not find a relationship between trust and subsequent compliance during programming tasks. Greater compliance was associated with strong performance, and strong performance led to greater subsequent trust. Conclusion. Our findings raise new questions about the utility of trust in the context of interacting with AIDEs and generative AI. We call for further research into the effect of trust on compliance to recommendations from imperfect AI. Application. This work can inform the design of training and educational content for generative AI use within and beyond software development. Instructional designers should consider risks of AI misuse and disuse and focus on promoting desirable interaction outcomes, regardless of trust's connection to them.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript reports an empirical user study with 27 novice programmers across two sites who completed programming tasks under time pressure using AI code generation tools. Subjective trust was measured via questionnaires while objective compliance (adherence to AI suggestions) and performance were recorded. Results indicate that trust changes with experience in expected ways, but no relationship was found between trust and subsequent compliance; greater compliance was associated with stronger performance, and stronger performance predicted greater subsequent trust. The authors conclude that trust may have limited utility for predicting desirable outcomes with AIDEs and call for further research on trust, compliance, and AI misuse/disuse.
Significance. If the results hold after statistical clarification, the work offers a useful counterpoint to decades of automation-trust literature by showing that trust does not predict compliance in novice AIDE use, while performance and compliance appear mutually reinforcing. This could inform training design and tool interfaces that prioritize performance feedback over trust calibration, and it highlights risks of over- or under-reliance on generative AI in education.
major comments (3)
- [Abstract] Abstract and Results: the directional findings and null result on trust-compliance are reported from only 27 participants with no statistical tests, effect sizes, confidence intervals, power analysis, or controls for individual differences or repeated measures, preventing assessment of whether the absence of a relationship is robust or an artifact of low power.
- [Method] Method: the subjective trust questionnaires and the operational definitions of objective compliance and performance metrics are not described in sufficient detail to evaluate their validity and reliability for novices working under time pressure with imperfect AI suggestions.
- [Results] Results: the claim that 'strong performance led to greater subsequent trust' and the compliance-performance association require explicit reporting of correlation or regression coefficients, p-values, and effect sizes to support the directional conclusions.
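The low-power worry in the first major comment can be made concrete with a quick sensitivity calculation (a back-of-envelope sketch, not a figure from the paper), using the standard Fisher z approximation for a two-sided correlation test:

```python
import math
from scipy.stats import norm

def min_detectable_r(n, alpha=0.05, power=0.80):
    """Smallest |r| reliably detectable at sample size n,
    via the Fisher z approximation for a two-sided test."""
    z_needed = norm.ppf(1 - alpha / 2) + norm.ppf(power)
    return math.tanh(z_needed / math.sqrt(n - 3))

# At the study's n = 27, only large correlations are detectable,
# so a null trust-compliance result could simply reflect low power.
print(f"n = 27 -> minimum detectable |r| ~ {min_detectable_r(27):.2f}")
```

By this approximation, n = 27 can reliably detect only correlations of roughly |r| ≥ 0.5, which is exactly why the referee asks for power analysis and interval estimates around the null.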
minor comments (1)
- [Abstract] The abstract could more precisely state the number of tasks, sites, and any pre-registered hypotheses to improve transparency.
Simulated Author's Rebuttal
We thank the referee for their constructive comments on our manuscript. We address each of the major comments below and have made revisions to strengthen the statistical reporting and methodological details.
Point-by-point responses
- Referee: [Abstract] Abstract and Results: the directional findings and null result on trust-compliance are reported from only 27 participants with no statistical tests, effect sizes, confidence intervals, power analysis, or controls for individual differences or repeated measures, preventing assessment of whether the absence of a relationship is robust or an artifact of low power.
  Authors: We recognize the importance of rigorous statistical analysis, particularly with a modest sample size. In the revised manuscript, we now report correlation coefficients, p-values, effect sizes, and confidence intervals for all key relationships, including the null finding on trust and compliance. A post-hoc power analysis has been added to the Results section, along with a discussion of the study's exploratory nature and limitations regarding individual differences and repeated measures. We note that an a priori power analysis was not conducted because the study was designed to be exploratory. revision: yes
- Referee: [Method] Method: the subjective trust questionnaires and the operational definitions of objective compliance and performance metrics are not described in sufficient detail to evaluate their validity and reliability for novices working under time pressure with imperfect AI suggestions.
  Authors: We agree that additional detail is warranted. The revised Method section now includes the full trust questionnaire with all items and response scales, along with citations to prior validation studies. Operational definitions have been clarified: compliance is defined as the proportion of AI-generated suggestions that were accepted and incorporated into the code, and performance is measured by task accuracy and completion efficiency under the imposed time constraints. We have also described how these metrics account for the imperfect nature of AI suggestions. revision: yes
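The clarified definitions map directly onto log-derived quantities. A hypothetical sketch (the event fields, test counts, and time limit are invented for illustration, not taken from the study):

```python
# Hypothetical per-suggestion log entries for one participant.
events = [
    {"suggestion_shown": True, "accepted": True},
    {"suggestion_shown": True, "accepted": False},
    {"suggestion_shown": True, "accepted": True},
    {"suggestion_shown": True, "accepted": True},
]

# Compliance: proportion of shown AI suggestions that were accepted
# and incorporated into the code.
shown = [e for e in events if e["suggestion_shown"]]
compliance = sum(e["accepted"] for e in shown) / len(shown)

# Performance: task accuracy plus completion efficiency under the
# time limit (one plausible operationalization, not the paper's exact one).
tests_passed, tests_total = 7, 10
time_used_s, time_limit_s = 1200, 1800
accuracy = tests_passed / tests_total
efficiency = 1 - time_used_s / time_limit_s
print(f"compliance = {compliance:.2f}, accuracy = {accuracy:.2f}, "
      f"efficiency = {efficiency:.2f}")
```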
- Referee: [Results] Results: the claim that 'strong performance led to greater subsequent trust' and the compliance-performance association require explicit reporting of correlation or regression coefficients, p-values, and effect sizes to support the directional conclusions.
  Authors: We have updated the Results section to provide the requested statistical details. For the performance-trust relationship, we report the correlation coefficient, p-value, and effect size from our analysis. Similarly for the compliance-performance association. These additions substantiate the directional claims with quantitative evidence. revision: yes
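One standard way to present the interval estimates the referee requests is a Fisher z confidence interval around each correlation. A sketch (the r = 0.05 value is illustrative of a near-zero trust-compliance correlation, not the paper's actual statistic):

```python
import math
from scipy.stats import norm

def pearson_ci(r, n, conf=0.95):
    """Confidence interval for a Pearson r via the Fisher z transform."""
    z = math.atanh(r)
    half = norm.ppf(1 - (1 - conf) / 2) / math.sqrt(n - 3)
    return math.tanh(z - half), math.tanh(z + half)

# Illustrative: a near-zero correlation at the study's n = 27.
lo, hi = pearson_ci(0.05, 27)
print(f"r = 0.05, n = 27 -> 95% CI [{lo:.2f}, {hi:.2f}]")
```

The interval spans from a moderate negative to a moderate positive relationship, which is why the null on trust-compliance needs explicit interval reporting rather than a bare non-significant p-value.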
Circularity Check
No circularity: empirical user study with direct data reporting
full rationale
The paper is a straightforward empirical study collecting subjective trust ratings and objective compliance/performance metrics from 27 novice programmers. It reports observed associations (or lack thereof) without any equations, fitted models, predictions derived from inputs, or self-referential definitions. No derivation chain exists that could reduce to its own inputs by construction. The null finding on trust-compliance and positive links between compliance-performance and performance-trust are presented as direct results from the collected data, not as outputs forced by prior assumptions or self-citations.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption: Subjective questionnaires provide a valid measure of trust in AI coding tools.
- domain assumption: Compliance with AI suggestions and coding performance can be objectively and unambiguously quantified from task logs.