Same Feedback, Different Source: How AI vs. Human Feedback Attribution and Credibility Shape Learner Behavior in Computing Education
Pith reviewed 2026-05-13 18:25 UTC · model grok-4.3
The pith
Feedback attributed to a human increases time spent on coding tasks only when learners find the attribution credible; learners who disbelieve it do worse than those given transparently AI-attributed feedback.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
In a three-condition study, participants who accepted the human attribution spent more time on task than those who received AI-attributed feedback with the same delivery delay, while participants who disbelieved the human attribution produced less complex code and spent less time than those who received transparent AI feedback. Separately, the delivery delay increased code complexity without changing time measures.
What carries the argument
The three-condition design that holds feedback content constant while varying only source label and delivery timing, allowing attribution belief to be isolated from speed of response.
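A minimal sketch of why three conditions are enough to separate the two factors. The condition names, the `Condition` class, and the `differing_factors` helper are hypothetical labels introduced here for illustration; only the three cells themselves (AI instant, AI delayed, human delayed) come from the study description.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Condition:
    name: str          # hypothetical identifier, not from the paper
    source_label: str  # what participants are told about the feedback source
    delayed: bool      # whether feedback delivery is delayed


# The three cells described in the abstract; feedback content is the same in all.
CONDITIONS = [
    Condition("ai_instant", "AI", False),
    Condition("ai_delayed", "AI", True),
    Condition("human_delayed", "human TA", True),
]


def differing_factors(a: Condition, b: Condition) -> list[str]:
    """Return the manipulated factors on which two conditions differ."""
    factors = []
    if a.source_label != b.source_label:
        factors.append("source_label")
    if a.delayed != b.delayed:
        factors.append("delayed")
    return factors


if __name__ == "__main__":
    for a, b in combinations(CONDITIONS, 2):
        diff = differing_factors(a, b)
        status = "clean contrast" if len(diff) == 1 else "confounded"
        print(f"{a.name} vs {b.name}: {diff} ({status})")
```

Under these labels, the AI-delayed vs. human-delayed pair differs only in source label, and the AI-instant vs. AI-delayed pair differs only in timing. Comparing AI-instant directly with human-delayed would vary both factors at once.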
Load-bearing premise
Participants' self-reported belief in the human attribution accurately reflects their actual perception of the source and is not shaped by unmeasured prior expectations about AI quality.
What would settle it
A follow-up experiment that independently manipulates credibility (for example by adding verification steps for the human source) and measures whether time-on-task differences disappear when belief rates are equalized across conditions.
Original abstract
As AI systems increasingly take on instructional roles - providing feedback, guiding practice, evaluating work - a fundamental question emerges: does it matter to learners who they believe is on the other side? We investigated this using a three-condition experiment (N=148) in which participants completed a creative coding tutorial and received feedback generated by the same large language model, attributed to either an AI system (with instant or delayed delivery) or a human teaching assistant (with matched delayed delivery). This three-condition design separates the effect of source attribution from the confound of delivery timing, which prior studies have not controlled. Source attribution and timing had distinct effects on different outcomes: participants who believed the human attribution spent more time on task than those receiving equivalently timed AI-attributed feedback (d=0.61, p=.013, uncorrected), while the delivery delay independently increased output complexity without affecting time measures. An exploratory analysis revealed that 46% of participants in the human-attributed condition did not believe the attribution, and these participants showed worse outcomes than those receiving transparent AI feedback (code complexity d=0.77, p=.003; time on task d=0.70, p=.007). These findings suggest that believed human presence may carry motivational value, but that this value depends on credibility. For computing educators, transparent AI attribution may be the lower-risk default in contexts where human attribution would not be credible.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript reports a three-condition experiment (N=148) in which participants completed a creative coding tutorial and received identical LLM-generated feedback attributed to either an AI system (instant or delayed delivery) or a human teaching assistant (delayed delivery). The design isolates source attribution from timing confounds. Results indicate that participants who believed the human attribution spent more time on task than those receiving delayed AI-attributed feedback (d=0.61, p=.013), while delivery delay independently increased output complexity. An exploratory split revealed 46% disbelief in the human attribution, with disbelievers showing worse code complexity (d=0.77) and time on task (d=0.70) than transparent AI recipients. The authors conclude that believed human presence carries motivational value only when credible, favoring transparent AI attribution as a lower-risk default.
Significance. If the central findings hold after addressing statistical and measurement concerns, the work offers timely empirical guidance for AI integration in computing education. The controlled three-condition design that separates attribution from timing is a clear methodological contribution. The practical implication—that transparent AI attribution may be preferable when human attribution risks low credibility—directly informs instructional practice and has potential to shape how educators deploy generative AI feedback tools.
major comments (3)
- [Results] Results section, primary time-on-task comparison: the reported d=0.61 (p=.013) between believers and the AI-delayed condition is presented without correction for multiple comparisons. The manuscript also reports additional exploratory tests (e.g., d=0.77, p=.003 and d=0.70, p=.007), so the uncorrected p-value for the key motivational claim requires either adjustment or explicit justification.
- [Results] Results section, exploratory disbelief analysis: the 46% disbelief rate and subsequent subgroup comparisons rest on a post-task self-report of belief with no pre-experimental covariates for AI skepticism, general motivation, or prior expectations. Because this belief variable is measured after randomization and task completion, the observed differences (d=0.61 time on task) could reflect selection effects from unmeasured traits rather than a causal effect of attribution, weakening the claim that 'believed human presence may carry motivational value.'
- [Methods] Methods section, belief measurement: the exact wording of the belief question, response scale, and timing of administration are not described in sufficient detail. Without this information it is difficult to evaluate demand characteristics or the possibility that participants rationalized their effort after seeing the feedback.
minor comments (2)
- [Abstract] Abstract and Results: consistently flag that the primary p-value is uncorrected and that the disbelief analysis is exploratory.
- [Discussion] Discussion: expand the practical recommendations with concrete scenarios (e.g., class size, student AI familiarity) where human attribution is likely to lack credibility.
Simulated Author's Rebuttal
Thank you for the constructive feedback. We address each major comment below and indicate the revisions made to the manuscript.
point-by-point responses
- Referee: [Results] Results section, primary time-on-task comparison: the reported d=0.61 (p=.013) between believers and the AI-delayed condition is presented without correction for multiple comparisons. The manuscript also reports additional exploratory tests (e.g., d=0.77, p=.003 and d=0.70, p=.007), so the uncorrected p-value for the key motivational claim requires either adjustment or explicit justification.
  Authors: The primary comparison (believers vs. delayed AI) was our a priori hypothesis-driven analysis, while the other tests were labeled as exploratory. We will revise the Results section to explicitly justify the lack of correction for the primary test by noting its pre-specification and provide Bonferroni-adjusted p-values for all reported tests as a sensitivity analysis. This approach balances statistical rigor with the interpretive value of the planned comparison.
  revision: partial
- Referee: [Results] Results section, exploratory disbelief analysis: the 46% disbelief rate and subsequent subgroup comparisons rest on a post-task self-report of belief with no pre-experimental covariates for AI skepticism, general motivation, or prior expectations. Because this belief variable is measured after randomization and task completion, the observed differences (d=0.61 time on task) could reflect selection effects from unmeasured traits rather than a causal effect of attribution, weakening the claim that 'believed human presence may carry motivational value.'
  Authors: We concur that this analysis is exploratory and the post-hoc measurement of belief precludes strong causal claims. The differences may indeed be influenced by unmeasured traits such as baseline motivation. In the revised manuscript, we will clarify in both the Results and Discussion that this is an exploratory finding, explicitly discuss the potential for selection effects, and revise the concluding language to emphasize association rather than causation. We will also propose pre-registered follow-up studies with baseline measures.
  revision: yes
- Referee: [Methods] Methods section, belief measurement: the exact wording of the belief question, response scale, and timing of administration are not described in sufficient detail. Without this information it is difficult to evaluate demand characteristics or the possibility that participants rationalized their effort after seeing the feedback.
  Authors: We appreciate this request for detail. The belief question was: 'To what extent did you believe the feedback was provided by a human teaching assistant?' answered on a 5-point scale from 'Not at all' to 'Completely,' administered immediately after participants reviewed the feedback but before the post-task survey. We will insert this exact wording, scale, and timing into the Methods section of the revised version to facilitate evaluation of demand characteristics.
  revision: yes
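The Bonferroni sensitivity analysis mentioned in the first response can be sketched as follows. This is an illustration under a stated assumption: the family is taken to be only the three p-values quoted on this page, whereas the manuscript may report more tests, which would make the adjustment stricter. The `bonferroni` function and the dictionary labels are hypothetical names introduced here.

```python
def bonferroni(p_values: dict[str, float]) -> dict[str, float]:
    """Multiply each p-value by the number of tests in the family, capped at 1."""
    m = len(p_values)
    return {label: min(1.0, p * m) for label, p in p_values.items()}


# The three p-values quoted in the abstract and referee report.
# Assumption: these form the whole family; the paper may contain further tests.
REPORTED = {
    "believers vs delayed AI, time on task": 0.013,
    "disbelievers vs transparent AI, code complexity": 0.003,
    "disbelievers vs transparent AI, time on task": 0.007,
}

if __name__ == "__main__":
    for label, p_adj in bonferroni(REPORTED).items():
        print(f"{label}: adjusted p = {p_adj:.3f}")
```

With a family of three, the adjusted values are roughly .039, .009, and .021, so all three would remain below .05. A larger family could push the primary comparison past that threshold, which is why the size of the family matters for the referee's concern.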
Circularity Check
No circularity: purely empirical experiment with direct behavioral measures
The paper reports results from a three-condition randomized experiment (N=148) measuring time on task, code complexity, and self-reported belief via direct observation and post-task survey. No equations, fitted parameters, or derivations appear; outcomes are compared statistically without reducing any result to prior quantities by construction. Self-citations are absent from the provided text and not invoked to justify uniqueness or ansatzes. The central claims rest on observed differences (e.g., d=0.61 for believers vs. AI) rather than any definitional or predictive loop.
Axiom & Free-Parameter Ledger
axioms (1)
- standard math: Standard assumptions for independent-samples t-tests and Cohen's d effect sizes
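For readers unfamiliar with the statistics named in this ledger, here is a minimal sketch of Cohen's d with a pooled standard deviation and the matching equal-variance (Student) t statistic. The function names and the toy numbers are invented for illustration and are not the study's data; whether the authors used the pooled form or a Welch variant is not stated on this page.

```python
import math
from statistics import mean, variance


def pooled_sd(group_a: list[float], group_b: list[float]) -> float:
    """Pooled standard deviation of two independent samples."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return math.sqrt(pooled_var)


def cohens_d(group_a: list[float], group_b: list[float]) -> float:
    """Standardized mean difference: (mean_a - mean_b) / pooled SD."""
    return (mean(group_a) - mean(group_b)) / pooled_sd(group_a, group_b)


def student_t(group_a: list[float], group_b: list[float]) -> float:
    """Equal-variance t statistic, written in terms of Cohen's d."""
    n_a, n_b = len(group_a), len(group_b)
    return cohens_d(group_a, group_b) * math.sqrt(n_a * n_b / (n_a + n_b))


if __name__ == "__main__":
    # Toy values only (e.g. minutes on task); NOT taken from the paper.
    toy_a = [2.0, 4.0, 6.0]
    toy_b = [1.0, 3.0, 5.0]
    print(f"d = {cohens_d(toy_a, toy_b):.2f}, t = {student_t(toy_a, toy_b):.2f}")
```

The relation between the two statistics shows why an effect size alone does not fix the p-value: the same d yields a larger t as group sizes grow.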