pith. machine review for the scientific record.

arXiv: 2604.01114 · v2 · submitted 2026-04-01 · 💻 cs.HC · cs.AI · cs.CY · cs.ET

Recognition: no theorem link

Trust and Reliance on AI in Education: AI Literacy and Need for Cognition as Moderators


Pith reviewed 2026-05-13 21:52 UTC · model grok-4.3

classification: 💻 cs.HC · cs.AI · cs.CY · cs.ET
keywords: AI trust, appropriate reliance, AI literacy, need for cognition, educational AI, human-AI interaction, overreliance, programming education

The pith

Higher trust in an AI assistant is associated with lower appropriate reliance on its suggestions in programming tasks, and this association is moderated by AI literacy and need for cognition.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper examines how students' trust in an AI assistant relates to their appropriate reliance during Python programming problem-solving. The 432 undergraduate participants received both accurate and misleading AI suggestions, and their responses were scored on whether they accepted correct suggestions and rejected incorrect ones. Results showed a non-linear pattern in which higher trust correlated with weaker discrimination between good and bad recommendations. This pattern was significantly moderated by students' AI literacy and need for cognition, with higher levels of these traits associated with better calibration of reliance. The work underscores the role of learner characteristics in shaping effective use of AI tools in education.

Core claim

In a study with 432 undergraduate participants completing Python output-prediction problems while receiving recommendations and explanations from an AI chatbot that included both accurate and intentionally misleading suggestions, higher trust in the assistant was associated with lower appropriate reliance. Appropriate reliance was measured behaviorally as the extent to which students accepted correct suggestions and rejected incorrect ones. The relationship was non-linear and significantly moderated by AI literacy and need for cognition, such that students scoring higher on these traits showed stronger discrimination between correct and incorrect AI output.

What carries the argument

A behavioral operationalization of appropriate reliance as acceptance of correct AI suggestions paired with rejection of incorrect ones, combined with moderation analyses that use measured levels of AI literacy and need for cognition.
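A minimal sketch of how such a behavioral score could be computed is given here. It assumes the score is the proportion of trials with an appropriate decision; the source says only that reliance was measured per problem and aggregated per participant, so the exact aggregation is an assumption, and the names `Trial` and `appropriate_reliance` are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Trial:
    """One problem seen by one participant (hypothetical structure)."""
    ai_correct: bool  # was the AI suggestion correct on this problem?
    accepted: bool    # did the participant's answer follow the suggestion?


def appropriate_reliance(trials: List[Trial]) -> float:
    """Proportion of trials with an appropriate decision.

    A decision counts as appropriate when a correct suggestion is accepted
    or an incorrect suggestion is rejected. Using a proportion (rather than
    a raw count) is an assumption made for this sketch.
    """
    if not trials:
        raise ValueError("need at least one trial")
    appropriate = sum(1 for t in trials if t.accepted == t.ai_correct)
    return appropriate / len(trials)


# Illustrative data only, not taken from the study.
example = [
    Trial(ai_correct=True, accepted=True),    # appropriate: accepted a correct suggestion
    Trial(ai_correct=True, accepted=False),   # inappropriate: rejected a correct suggestion
    Trial(ai_correct=False, accepted=False),  # appropriate: rejected an incorrect suggestion
    Trial(ai_correct=False, accepted=True),   # inappropriate: accepted an incorrect suggestion
]
score = appropriate_reliance(example)
```

A participant who accepts every suggestion scores well only on the trials where the AI happens to be right, which is why this kind of measure can separate selective reliance from blanket acceptance.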

If this is right

  • Educational AI interfaces should include prompts that encourage evaluation of individual suggestions rather than reliance on overall trust.
  • Building AI literacy can help students maintain appropriate reliance even when trust is high.
  • Students with higher need for cognition tend to discriminate better between accurate and misleading AI output.
  • System designs should target supports for reflective evaluation during problem-solving instead of attempting to lower trust uniformly.
  • The non-linear shape of the trust-reliance curve implies that interventions must account for varying levels of trust rather than treating it as a simple linear risk factor.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same trust-reliance pattern may appear in non-programming domains such as essay feedback or math tutoring, suggesting a broader need for literacy-focused supports.
  • Interfaces could test default settings that require brief evaluation steps before AI output is incorporated, independent of user trust scores.
  • Training programs that raise AI literacy might reduce overreliance even among users who already report high trust in the system.
  • Task difficulty and prior domain knowledge could interact with the observed moderators and should be controlled or varied in follow-up work.

Load-bearing premise

The behavioral measure of accepting correct suggestions and rejecting incorrect ones accurately reflects real-world critical evaluation of AI output without confounds from task difficulty, prior knowledge, or how the AI was presented.

What would settle it

An experiment that requires students to justify acceptance or rejection of each AI suggestion before finalizing their answer and then measures whether the trust-reliance relationship flattens or reverses compared with the original no-justification condition.

Figures

Figures reproduced from arXiv: 2604.01114 by Griffin Pitts, Neha Rani, Weedguet Mildort.

Figure 1: Human-AI reliance framework [29,36]
Adjacent text: … incorrect AI recommendations, though these effects vary by task demands and context [2,3,13,36]. Related system-design research has explored ways to support more reflective use of AI, with cognitive forcing functions showing promise for reducing overreliance [2], while other approaches, such as partial explanations, have produced mixed results [13,39]. Despite these adva…
Figure 2: Experimental interface with AI chatbot recommendation
Adjacent text: 3.2 Measures. Behavioral Measures. Reliance behavior was measured at the problem level and then aggregated across trials for each participant. On each problem, the AI …
Figure 3: Post-task trust predicting appropriate reliance.
Adjacent text: … indicating that students who reported higher trust tended to make fewer decisions appropriately in this task. Given that the assistant intentionally included misleading recommendations, this negative association is consistent with the possibility that higher trust was linked to less scrutiny of recommendations before acceptance, resulting in less selective…
Figure 4: Interaction models showing predicted appropriate reliance across trust at three levels of each moderator. Shaded bands represent 95% confidence intervals.
Adjacent text: … task that these characteristics altered how trust related to appropriate reliance. As discussed in Section 5, this pattern may reflect the specific features of the present task and survey measures. Our study examined selective reliance during brief Pyth…
Original abstract

As generative AI systems are integrated into educational settings, students often encounter AI-generated output while working through learning tasks, either by requesting help or through integrated tools. Trust in AI can influence how students interpret and use that output, including whether they evaluate it critically or exhibit overreliance. We investigate how students' trust relates to their appropriate reliance on an AI assistant during programming problem-solving tasks, and whether this relationship differs by learner characteristics. With 432 undergraduate participants, students completed Python output-prediction problems while receiving recommendations and explanations from an AI chatbot, including accurate and intentionally misleading suggestions. We operationalize reliance behaviorally as the extent to which students' responses reflected appropriate use of the AI assistant's suggestions, accepting them when they were correct and rejecting them when they were incorrect. Pre- and post-task surveys assessed trust in the assistant, AI literacy, need for cognition, programming self-efficacy, and programming literacy. Results showed a non-linear relationship in which higher trust was associated with lower appropriate reliance, suggesting weaker discrimination between correct and incorrect recommendations. This relationship was significantly moderated by students' AI literacy and need for cognition. These findings highlight the need for future work on instructional and system supports that encourage more reflective evaluation of AI assistance during problem-solving.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper reports an empirical study involving 432 undergraduate participants who completed Python output-prediction tasks while receiving suggestions from an AI chatbot that included both accurate and intentionally misleading recommendations. Trust in the AI was measured via pre- and post-task surveys, along with AI literacy, need for cognition, programming self-efficacy, and programming literacy. Appropriate reliance was operationalized behaviorally as accepting correct suggestions and rejecting incorrect ones. The central findings are a non-linear relationship in which higher trust was associated with lower appropriate reliance (weaker discrimination between correct and incorrect AI output) and significant moderation of this relationship by AI literacy and need for cognition.

Significance. If the reported non-linear trust-reliance link and its moderation by AI literacy and need for cognition prove robust, the work contributes empirical evidence on how learner characteristics shape critical evaluation of generative AI in educational problem-solving. This has direct implications for the design of AI-assisted learning tools that aim to reduce overreliance while supporting students with varying levels of AI familiarity and cognitive motivation.

major comments (2)
  1. [Methods and Results sections] Methods/Results: Programming self-efficacy and programming literacy were measured in the pre- and post-task surveys, yet the abstract and reported analyses give no indication that these variables were entered as covariates in the regressions testing the trust-reliance relationship or the AI-literacy/need-for-cognition moderation effects. Because the behavioral reliance score requires participants to independently judge suggestion correctness, any association between trust and reliance may be confounded by domain expertise; re-running the models with these controls is necessary to establish that the headline effects are attributable to trust rather than prior programming knowledge.
  2. [Results section] Results: The abstract states that a non-linear relationship was observed and that moderation was significant, but provides no details on the specific statistical models (e.g., polynomial terms, interaction terms, or model comparison), effect sizes, or whether the non-linearity was pre-registered or exploratory. Without these elements, it is difficult to assess the strength and replicability of the moderation findings.
minor comments (1)
  1. [Abstract] The abstract would be strengthened by briefly stating the statistical approach used to detect non-linearity and the exact nature of the moderation (e.g., simple slopes or Johnson-Neyman regions).
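
The kind of model these comments ask the authors to report can be written compactly. The following is one plausible specification, not the paper's confirmed model. All symbols are assumptions introduced here for illustration: $AR_i$ is participant $i$'s appropriate reliance, $T_i$ is trust, $M_i$ is a moderator (AI literacy or need for cognition), and $\mathbf{c}_i$ collects the covariates raised in major comment 1 (programming self-efficacy and programming literacy).

```latex
\begin{align}
AR_i &= \beta_0 + \beta_1 T_i + \beta_2 T_i^{2} + \beta_3 M_i + \beta_4 T_i M_i
        + \boldsymbol{\gamma}^{\top}\mathbf{c}_i + \varepsilon_i \\
\frac{\partial\, \mathbb{E}[AR_i]}{\partial T_i} &= \beta_1 + 2\beta_2 T_i + \beta_4 M_i
\end{align}
```

Under this sketch, non-linearity corresponds to $\beta_2 \neq 0$ and moderation to $\beta_4 \neq 0$. A simple-slopes analysis would evaluate the second line at chosen values of $M_i$, for example one standard deviation below the mean, at the mean, and one standard deviation above. Whether the moderator should also interact with $T_i^{2}$ is a modeling choice the source does not settle.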

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive comments, which have helped us strengthen the reporting and robustness of our analyses. We address each major comment below and have revised the manuscript to incorporate the suggested improvements.

Point-by-point responses
  1. Referee: [Methods and Results sections] Methods/Results: Programming self-efficacy and programming literacy were measured in the pre- and post-task surveys, yet the abstract and reported analyses give no indication that these variables were entered as covariates in the regressions testing the trust-reliance relationship or the AI-literacy/need-for-cognition moderation effects. Because the behavioral reliance score requires participants to independently judge suggestion correctness, any association between trust and reliance may be confounded by domain expertise; re-running the models with these controls is necessary to establish that the headline effects are attributable to trust rather than prior programming knowledge.

    Authors: We agree that controlling for programming self-efficacy and programming literacy is important to rule out potential confounding by domain expertise. In the revised manuscript, we have re-run all regression models with these variables included as covariates. The non-linear relationship between trust and appropriate reliance, as well as the significant moderation by AI literacy and need for cognition, remain statistically significant after including these controls. We have updated the Results section to report the covariate-adjusted models and revised the abstract to note that the headline effects hold after controlling for programming knowledge. revision: yes

  2. Referee: [Results section] Results: The abstract states that a non-linear relationship was observed and that moderation was significant, but provides no details on the specific statistical models (e.g., polynomial terms, interaction terms, or model comparison), effect sizes, or whether the non-linearity was pre-registered or exploratory. Without these elements, it is difficult to assess the strength and replicability of the moderation findings.

    Authors: We acknowledge that the original reporting lacked sufficient detail on the statistical procedures. In the revised manuscript, we have expanded the Results section to specify the use of quadratic polynomial terms to model non-linearity, the inclusion of interaction terms for the moderation analyses, model comparison statistics (e.g., via likelihood ratio tests), and effect sizes (including changes in R-squared). We also explicitly state that the non-linear relationship was identified through exploratory analysis and was not pre-registered. The abstract has been updated to reference these modeling details and effect size information. revision: yes
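
The nested-model comparison described in this response can be illustrated with a small, self-contained sketch. Everything in it is an assumption made for illustration: the data are synthetic (not the study's data), the coefficients are made up, the variable names (`trust`, `literacy`, `prog`) are hypothetical, and a pure-Python least-squares fit via the normal equations stands in for whatever statistics package the authors used. It shows only how a change in R-squared arises when a quadratic trust term and a trust-by-moderator interaction are added on top of a covariate-adjusted linear model.

```python
import random


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def r_squared(columns, y):
    """Fit OLS with an intercept via the normal equations and return R^2."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(X[0])
    XtX = [[sum(X[i][a] * X[i][b] for i in range(n)) for b in range(k)]
           for a in range(k)]
    Xty = [sum(X[i][a] * y[i] for i in range(n)) for a in range(k)]
    beta = solve(XtX, Xty)
    fitted = [sum(bj * xj for bj, xj in zip(beta, row)) for row in X]
    mean_y = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


# Synthetic, standardized predictors; coefficients are arbitrary.
rng = random.Random(0)
n = 432
trust = [rng.gauss(0, 1) for _ in range(n)]
literacy = [rng.gauss(0, 1) for _ in range(n)]  # stand-in moderator
prog = [rng.gauss(0, 1) for _ in range(n)]      # stand-in covariate
y = [0.6 - 0.05 * t - 0.03 * t * t + 0.04 * t * m + 0.05 * p + rng.gauss(0, 0.1)
     for t, m, p in zip(trust, literacy, prog)]

trust_sq = [t * t for t in trust]
trust_x_lit = [t * m for t, m in zip(trust, literacy)]

# Nested models: linear -> + quadratic trust -> + trust x moderator.
r2_linear = r_squared([trust, literacy, prog], y)
r2_quad = r_squared([trust, trust_sq, literacy, prog], y)
r2_full = r_squared([trust, trust_sq, literacy, trust_x_lit, prog], y)

delta_r2_quadratic = r2_quad - r2_linear
delta_r2_interaction = r2_full - r2_quad
```

Because each model nests the previous one, R-squared cannot decrease from step to step; whether an increase is statistically meaningful is what a likelihood ratio test or an F-test on the added terms would assess.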

Circularity Check

0 steps flagged

No significant circularity; purely empirical measurement and statistical testing

full rationale

The paper reports an empirical study with 432 participants completing Python tasks while receiving AI suggestions. Appropriate reliance is measured directly as the behavioral count of accepting correct suggestions and rejecting incorrect ones; trust, AI literacy, need for cognition, and programming self-efficacy are assessed via pre/post surveys. Statistical tests then examine the non-linear trust-reliance relationship and its moderation by literacy and cognition. No equations, fitted parameters, derivations, or self-citational uniqueness claims appear; the reported associations are data-driven rather than constructed by definition or prior self-reference.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The study rests on standard assumptions from experimental psychology and statistics with no new free parameters, axioms beyond domain norms, or invented entities.

axiom (1)
  • [standard math] Standard assumptions for linear regression and moderation analysis (linearity, independence, homoscedasticity) hold for the reported effects.
    Implicit in the claim of significant moderation effects.

pith-pipeline@v0.9.0 · 5535 in / 1199 out tokens · 43553 ms · 2026-05-13T21:52:37.137691+00:00 · methodology


Forward citations

Cited by 3 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Engineering Students' Usage and Perceptions of GitHub Copilot in Open-Source Projects

    cs.SE · 2026-04 · unverdicted · novelty 5.0

    Students primarily used Copilot chat and code generation features during open-source contributions, with usage patterns varying significantly by gender, programming skill, and AI experience.

  2. Retrieval-Augmented Tutoring for Algorithm Tracing and Problem-Solving in AI Education

    cs.AI · 2026-05 · unverdicted · novelty 4.0

    KITE is a RAG-based tutoring system delivering intent-aware Socratic feedback from course content that improves accuracy of simulated student responses on algorithm tracing and procedural questions.

  3. Perceived Importance of Cognitive Skills Among Computing Students in the Era of AI

    cs.CY · 2026-04 · unverdicted · novelty 4.0

    Computing students anticipate that all 11 cognitive skills will decline in importance with greater future AI integration.

Reference graph

Works this paper leans on

39 extracted references · 39 canonical work pages · cited by 3 Pith papers

  1. [1] Ahmad, S.F., Han, H., Alam, M.M., Rehmat, M., Irshad, M., Arraño-Muñoz, M., Ariza-Montes, A., et al.: Impact of artificial intelligence on human loss in decision making, laziness and safety in education. Humanities and Social Sciences Communications 10(1), 1–14 (2023)

  2. [2] Buçinca, Z., Malaya, M.B., Gajos, K.Z.: To trust or to think: cognitive forcing functions can reduce overreliance on ai in ai-assisted decision-making. Proceedings of the ACM on Human-computer Interaction 5(CSCW1), 1–21 (2021)

  3. [3] Cacioppo, J.T., Petty, R.E.: The need for cognition. Journal of personality and social psychology 42(1), 116 (1982)

  4. [4] Carolus, A., Koch, M.J., Straka, S., Latoschik, M.E., Wienrich, C.: Mails - meta ai literacy scale: Development and testing of an ai literacy questionnaire based on well-founded competency models and psychological change- and meta-competencies. Computers in Human Behavior: Artificial Humans 1(2), 100014 (2023)

  5. [5] Choi, G.W., Kim, S.H., Lee, D., Moon, J.: Utilizing generative ai for instructional design: Exploring strengths, weaknesses, opportunities, and threats. TechTrends 68(4), 832–844 (2024)

  6. [6] Duhaylungsod, A.V., Chavez, J.V.: Chatgpt and other ai users: Innovative and creative utilitarian value and mindset shift. Journal of Namibian Studies 33(2023), 4367–4378 (2023)

  7. [7] Ehsan, U., Liao, Q.V., Muller, M., Riedl, M.O., Weisz, J.D.: Expanding explainability: Towards social transparency in ai systems. In: Proceedings of the 2021 CHI conference on human factors in computing systems. pp. 1–19 (2021)

  8. [8] Gaube, S., Suresh, H., Raue, M., Merritt, A., Berkowitz, S.J., Lermer, E., Coughlin, J.F., Guttag, J.V., Colak, E., Ghassemi, M.: Do as ai say: susceptibility in deployment of clinical decision-aids. NPJ digital medicine 4(1), 31 (2021)

  9. [9] Glikson, E., Woolley, A.W.: Human trust in artificial intelligence: Review of empirical research. Academy of management annals 14(2), 627–660 (2020)

  10. [10] Goddard, K., Roudsari, A., Wyatt, J.C.: Automation bias: a systematic review of frequency, effect mediators, and mitigators. Journal of the American Medical Informatics Association 19(1), 121–127 (2012)

  11. [11] Lins de Holanda Coelho, G., HP Hanel, P., J. Wolf, L.: The very efficient assessment of need for cognition: Developing a six-item version. Assessment 27(8), 1870–1885 (2020)

  12. [12] Jin, L., Lin, B., Hong, M., So, H.J., Zhang, K.: Learning by teaching: Enhancing music learning through llm-based teachable agents. In: International Conference on Artificial Intelligence in Education. pp. 148–155. Springer (2025)

  13. [13] de Jong, S., Paananen, V., Tag, B., van Berkel, N.: Cognitive forcing for better decision-making: Reducing overreliance on ai systems through partial explanations. Proceedings of the ACM on Human-Computer Interaction 9(2), 1–30 (2025)

  14. [14] Kahneman, D.: Thinking, fast and slow. macmillan (2011)

  15. [15] Kizilcec, R.F.: How much information? effects of transparency on trust in an algorithmic interface. In: Proceedings of the 2016 CHI conference on human factors in computing systems. pp. 2390–2395 (2016)

  16. [16] Krupp, L., Steinert, S., Kiefer-Emmanouilidis, M., Avila, K.E., Lukowicz, P., Kuhn, J., Küchemann, S., Karolus, J.: Challenges and opportunities of moderating usage of large language models in education. In: Proceedings of the International Conference on Mobile and Ubiquitous Multimedia. pp. 249–254 (2024)

  17. [17] Lee, J.D., See, K.A.: Trust in automation: Designing for appropriate reliance. Human factors 46(1), 50–80 (2004)

  18. [18] Li, M.D., Little, B.P.: Appropriate reliance on artificial intelligence in radiology education. Journal of the American College of Radiology 20(11), 1126–1130 (2023)

  19. [19] Lin, X.: Factors influencing college students’ use of ai chatbots for learning–empirical study based on tam extended model. In: 2024 5th International Conference on Artificial Intelligence and Electromechanical Automation (AIEA). pp. 151–159. IEEE (2024)

  20. [20] Ma, B., Guo, L., Yang, T., Ding, J.: How generative ai impact student emotion and engagement in programming tasks? In: International Conference on Artificial Intelligence in Education. pp. 236–243. Springer (2025)

  21. [21] Nazaretsky, T., Mejia-Domenzain, P., Swamy, V., Frej, J., Käser, T.: The critical role of trust in adopting ai-powered educational technology for learning: An instrument for measuring student perceptions. Computers and Education: Artificial Intelligence p. 100368 (2025)

  22. [22] Parasuraman, R., Riley, V.: Humans and automation: Use, misuse, disuse, abuse. Human factors 39(2), 230–253 (1997)

  23. [23] Phung, T., Choi, H., Wu, M., Singla, A., Brooks, C.: Plan more, debug less: Applying metacognitive theory to ai-assisted programming education. In: International Conference on Artificial Intelligence in Education. pp. 3–17. Springer (2025)

  24. [24] Pitts, G., Hridi, A.P., Narayanan, A.B.L.: A survey of llm-based applications in programming education: Balancing automation and human oversight. In: Proceedings of the Fourth Workshop on Bridging Human-Computer Interaction and Natural Language Processing (HCI+ NLP). pp. 255–262 (2025)

  25. [25] Pitts, G., Marcus, V., Motamedi, S.: A proposed model of learners’ acceptance and trust of pedagogical conversational ai. In: Proceedings of the Eleventh ACM Conference on Learning@ Scale. pp. 427–432 (2024)

  26. [26] Pitts, G., Marcus, V.M., Motamedi, S.: Student perspectives on the benefits and risks of ai in education. In: 2025 ASEE Annual Conference & Exposition (2025)

  27. [27] Pitts, G., Motamedi, S.: Understanding human-ai trust in education. Telematics and Informatics Reports p. 100270 (2025)

  28. [28] Pitts, G., Motamedi, S.: What drives students’ use of ai chatbots? technology acceptance in conversational ai. arXiv preprint arXiv:2602.20547 (2026)

  29. [29] Pitts, G., Rani, N., Mildort, W., Cook, E.M.: Students’ reliance on ai in higher education: identifying contributing factors. In: International Conference on Human-Computer Interaction. pp. 86–97. Springer (2025)

  30. [30] Ranalli, J.: L2 student engagement with automated feedback on writing: Potential for learning and issues of trust. Journal of Second Language Writing 52, 100816 (2021)

  31. [31] Rani, N.: Investigating User Trust in Context-Aware Recommender Systems in Search as Learning. University of Florida (2023)

  32. [32] Rani, N., Chu, S.L., Mei, V.R.: Investigating the effects of different levels of user control on the effectiveness of context-aware recommender systems for web-based search. In: CHI Conference on Human Factors in Computing Systems Extended Abstracts. pp. 1–6 (2022)

  33. [33] Rani, N., Kantamani, S., Chu, S.L.: Exploring the impact of user feedback for trust in context-aware recommender systems in search-as-learning. In: International Conference on Human-Computer Interaction. pp. 83–99. Springer (2025)

  34. [34] Sabharwal, D., Kabha, R., Srivastava, K.: Artificial intelligence (ai)-powered virtual assistants and their effect on human productivity and laziness: Study on students of delhi-ncr (india) & fujairah (uae). Journal of Content, Community and Communication 17(9), 162–174 (2023)

  35. [35] Tsai, M.J., Wang, C.Y., Hsu, P.F.: Developing the computer programming self-efficacy scale for computer literacy education. Journal of Educational Computing Research 56(8), 1345–1360 (2019)

  36. [36] Vasconcelos, H., Jörke, M., Grunde-McLaughlin, M., Gerstenberg, T., Bernstein, M.S., Krishna, R.: Explanations can reduce overreliance on ai systems during decision-making. Proceedings of the ACM on Human-Computer Interaction 7(CSCW1), 1–38 (2023)

  37. [37] Von Eschenbach, W.J.: Transparency and the black box problem: Why we do not trust ai. Philosophy & Technology 34(4), 1607–1622 (2021)

  38. [38] Zhai, C., Wibowo, S., Li, L.D.: The effects of over-reliance on ai dialogue systems on students’ cognitive abilities: a systematic review. Smart Learning Environments 11(1), 28 (2024)

  39. [39] Zhang, Y., Liao, Q.V., Bellamy, R.K.: Effect of confidence and explanation on accuracy and trust calibration in ai-assisted decision making. In: Proceedings of the 2020 conference on fairness, accountability, and transparency. pp. 295–305 (2020)