
arxiv: 2604.03075 · v1 · submitted 2026-04-03 · 💻 cs.HC

Same Feedback, Different Source: How AI vs. Human Feedback Attribution and Credibility Shape Learner Behavior in Computing Education

Pith reviewed 2026-05-13 18:25 UTC · model grok-4.3

classification 💻 cs.HC
keywords AI feedback · human attribution · learner behavior · computing education · feedback credibility · source perception · tutorial interaction

The pith

Feedback that learners believe came from a human is associated with more time spent on coding tasks, but only when they find the attribution credible; learners who disbelieve it fare worse than those given transparently labeled AI feedback.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper reports a controlled experiment in which the same AI-generated feedback on a creative coding task was attributed either to an AI system or to a human teaching assistant. Belief that the feedback came from a human correlated with more time spent on the task, yet nearly half the participants in the human-attributed condition rejected the attribution and then performed worse than those who received clearly labeled AI feedback. Delivery delay, independent of attribution, raised the complexity of the code produced. These patterns indicate that any motivational benefit of human attribution hinges on whether learners accept the source as real.

Core claim

In a three-condition study, participants who accepted the human attribution spent more time on task than those receiving the same-timed AI-attributed feedback, while participants who disbelieved the human attribution produced less complex code and spent less time than those receiving transparent AI feedback; a separate timing delay increased code complexity without changing time measures.

What carries the argument

The three-condition design that holds feedback content constant while varying only source label and delivery timing, allowing attribution belief to be isolated from speed of response.
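The logic of that design can be sketched in a few lines of Python. The condition names and the `differing_factors` helper are hypothetical labels chosen for illustration, not identifiers from the paper; the sketch only shows which pairwise comparison varies which factor.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Condition:
    name: str          # hypothetical label, not the paper's own naming
    source_label: str  # what participants were told about the feedback source
    delivery: str      # "instant" or "delayed"


# The three conditions as described in the abstract. Feedback content is
# held constant (same LLM), so it is not modeled as a factor here.
AI_INSTANT = Condition("ai_instant", "AI", "instant")
AI_DELAYED = Condition("ai_delayed", "AI", "delayed")
HUMAN_DELAYED = Condition("human_delayed", "human TA", "delayed")


def differing_factors(a: Condition, b: Condition) -> list[str]:
    """Return the manipulated factors on which two conditions differ."""
    factors = []
    if a.source_label != b.source_label:
        factors.append("source_label")
    if a.delivery != b.delivery:
        factors.append("delivery")
    return factors


print(differing_factors(AI_INSTANT, AI_DELAYED))     # ['delivery']
print(differing_factors(AI_DELAYED, HUMAN_DELAYED))  # ['source_label']
print(differing_factors(AI_INSTANT, HUMAN_DELAYED))  # ['source_label', 'delivery']
```

Only the first two comparisons differ on a single factor, which is why the delayed AI condition serves as the reference for the attribution effect; comparing instant AI with the human-attributed condition would confound label with timing.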

Load-bearing premise

Participants' self-reported belief in the human attribution accurately reflects their actual perception of the source and is not shaped by unmeasured prior expectations about AI quality.

What would settle it

A follow-up experiment that independently manipulates credibility (for example by adding verification steps for the human source) and measures whether time-on-task differences disappear when belief rates are equalized across conditions.

Figures

Figures reproduced from arXiv: 2604.03075 by Caitlin Morris, Pattie Maes.

Figure 1: Experiment design. … experience, interest in learning creative coding, and baseline attitudes toward AI tools and human teaching assistants. These attitude measures were collected at screening, before condition assignment, to prevent contamination from the experimental manipulation. Participants who met eligibility criteria (interest in learning programming, English fluency, desktop/laptop access, non-advan…
Figure 2: Behavioral measures by group. (L) Time on task in minutes per module. (R) Code complexity score, experience-adjusted.
Figure 3: Self-reported feedback behavior by condition.
Figure 4: Selected motivation survey items by condition.
Original abstract

As AI systems increasingly take on instructional roles - providing feedback, guiding practice, evaluating work - a fundamental question emerges: does it matter to learners who they believe is on the other side? We investigated this using a three-condition experiment (N=148) in which participants completed a creative coding tutorial and received feedback generated by the same large language model, attributed to either an AI system (with instant or delayed delivery) or a human teaching assistant (with matched delayed delivery). This three-condition design separates the effect of source attribution from the confound of delivery timing, which prior studies have not controlled. Source attribution and timing had distinct effects on different outcomes: participants who believed the human attribution spent more time on task than those receiving equivalently timed AI-attributed feedback (d=0.61, p=.013, uncorrected), while the delivery delay independently increased output complexity without affecting time measures. An exploratory analysis revealed that 46% of participants in the human-attributed condition did not believe the attribution, and these participants showed worse outcomes than those receiving transparent AI feedback (code complexity d=0.77, p=.003; time on task d=0.70, p=.007). These findings suggest that believed human presence may carry motivational value, but that this value depends on credibility. For computing educators, transparent AI attribution may be the lower-risk default in contexts where human attribution would not be credible.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The manuscript reports a three-condition experiment (N=148) in which participants completed a creative coding tutorial and received identical LLM-generated feedback attributed to either an AI system (instant or delayed delivery) or a human teaching assistant (delayed delivery). The design isolates source attribution from timing confounds. Results indicate that participants who believed the human attribution spent more time on task than those receiving delayed AI-attributed feedback (d=0.61, p=.013), while delivery delay independently increased output complexity. An exploratory split revealed 46% disbelief in the human attribution, with disbelievers showing worse code complexity (d=0.77) and time on task (d=0.70) than transparent AI recipients. The authors conclude that believed human presence carries motivational value only when credible, favoring transparent AI attribution as a lower-risk default.

Significance. If the central findings hold after addressing statistical and measurement concerns, the work offers timely empirical guidance for AI integration in computing education. The controlled three-condition design that separates attribution from timing is a clear methodological contribution. The practical implication—that transparent AI attribution may be preferable when human attribution risks low credibility—directly informs instructional practice and has potential to shape how educators deploy generative AI feedback tools.

major comments (3)
  1. [Results] Results section, primary time-on-task comparison: the reported d=0.61 (p=.013) between believers and the AI-delayed condition is presented without correction for multiple comparisons. The manuscript also reports additional exploratory tests (e.g., d=0.77, p=.003 and d=0.70, p=.007), so the uncorrected p-value for the key motivational claim requires either adjustment or explicit justification.
  2. [Results] Results section, exploratory disbelief analysis: the 46% disbelief rate and subsequent subgroup comparisons rest on a post-task self-report of belief with no pre-experimental covariates for AI skepticism, general motivation, or prior expectations. Because this belief variable is measured after randomization and task completion, the observed differences (d=0.61 time on task) could reflect selection effects from unmeasured traits rather than a causal effect of attribution, weakening the claim that 'believed human presence may carry motivational value.'
  3. [Methods] Methods section, belief measurement: the exact wording of the belief question, response scale, and timing of administration are not described in sufficient detail. Without this information it is difficult to evaluate demand characteristics or the possibility that participants rationalized their effort after seeing the feedback.
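The adjustment raised in major comment 1 can be illustrated with the three p-values quoted in the abstract. This is a minimal sketch, assuming those three tests form the whole family; the paper may report more comparisons, in which case the adjusted values would be larger. The `bonferroni` and `holm` functions are written out here for illustration rather than taken from any library.

```python
def bonferroni(p_values):
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def holm(p_values):
    """Holm step-down adjustment; never less powerful than Bonferroni."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        candidate = min(1.0, (m - rank) * p_values[idx])
        running_max = max(running_max, candidate)  # enforce monotonicity
        adjusted[idx] = running_max
    return adjusted


# p-values as quoted in the abstract; the family size of 3 is an assumption.
labels = [
    "believers vs. delayed AI, time on task",
    "disbelievers vs. transparent AI, code complexity",
    "disbelievers vs. transparent AI, time on task",
]
reported = [0.013, 0.003, 0.007]

for label, b, h in zip(labels, bonferroni(reported), holm(reported)):
    print(f"{label}: Bonferroni {b:.3f}, Holm {h:.3f}")
# believers vs. delayed AI, time on task: Bonferroni 0.039, Holm 0.014
# disbelievers vs. transparent AI, code complexity: Bonferroni 0.009, Holm 0.009
# disbelievers vs. transparent AI, time on task: Bonferroni 0.021, Holm 0.014
```

Under this three-test assumption all adjusted values stay below .05, but the margin for the primary comparison is narrow, so the size of the family the authors declare matters for the headline claim.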
minor comments (2)
  1. [Abstract] Abstract and Results: consistently flag that the primary p-value is uncorrected and that the disbelief analysis is exploratory.
  2. [Discussion] Discussion: expand the practical recommendations with concrete scenarios (e.g., class size, student AI familiarity) where human attribution is likely to lack credibility.

Simulated Author's Rebuttal

3 responses · 0 unresolved

Thank you for the constructive feedback. We address each major comment below and indicate the revisions made to the manuscript.

Point-by-point responses
  1. Referee: [Results] Results section, primary time-on-task comparison: the reported d=0.61 (p=.013) between believers and the AI-delayed condition is presented without correction for multiple comparisons. The manuscript also reports additional exploratory tests (e.g., d=0.77, p=.003 and d=0.70, p=.007), so the uncorrected p-value for the key motivational claim requires either adjustment or explicit justification.

    Authors: The primary comparison (believers vs. delayed AI) was our a priori hypothesis-driven analysis, while the other tests were labeled as exploratory. We will revise the Results section to explicitly justify the lack of correction for the primary test by noting its pre-specification and provide Bonferroni-adjusted p-values for all reported tests as a sensitivity analysis. This approach balances statistical rigor with the interpretive value of the planned comparison. revision: partial

  2. Referee: [Results] Results section, exploratory disbelief analysis: the 46% disbelief rate and subsequent subgroup comparisons rest on a post-task self-report of belief with no pre-experimental covariates for AI skepticism, general motivation, or prior expectations. Because this belief variable is measured after randomization and task completion, the observed differences (d=0.61 time on task) could reflect selection effects from unmeasured traits rather than a causal effect of attribution, weakening the claim that 'believed human presence may carry motivational value.'

    Authors: We concur that this analysis is exploratory and the post-hoc measurement of belief precludes strong causal claims. The differences may indeed be influenced by unmeasured traits such as baseline motivation. In the revised manuscript, we will clarify in both the Results and Discussion that this is an exploratory finding, explicitly discuss the potential for selection effects, and revise the concluding language to emphasize association rather than causation. We will also propose pre-registered follow-up studies with baseline measures. revision: yes

  3. Referee: [Methods] Methods section, belief measurement: the exact wording of the belief question, response scale, and timing of administration are not described in sufficient detail. Without this information it is difficult to evaluate demand characteristics or the possibility that participants rationalized their effort after seeing the feedback.

    Authors: We appreciate this request for detail. The belief question was: 'To what extent did you believe the feedback was provided by a human teaching assistant?' answered on a 5-point scale from 'Not at all' to 'Completely,' administered immediately after participants reviewed the feedback but before the post-task survey. We will insert this exact wording, scale, and timing into the Methods section of the revised version to facilitate evaluation of demand characteristics. revision: yes
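The selection concern in point 2 can be made concrete with a toy simulation. Everything in it is hypothetical: the latent "engagement" trait, the effect sizes, and the logistic belief model are invented for illustration and are not estimates from the study. The sketch shows only that splitting a randomized arm on a post-task belief measure can produce a believer advantage even when the source label has no causal effect at all.

```python
import math
import random
from statistics import mean


def simulate(n_per_arm=5000, seed=0):
    """Return mean minutes for (delayed AI, believers, disbelievers)."""
    rng = random.Random(seed)

    def minutes(trait):
        # Time on task depends only on the latent trait plus noise.
        # The source label never enters this function: zero causal effect.
        return 20.0 + 5.0 * trait + rng.gauss(0.0, 5.0)

    ai_delayed = [minutes(rng.gauss(0.0, 1.0)) for _ in range(n_per_arm)]

    believers, disbelievers = [], []
    for _ in range(n_per_arm):
        trait = rng.gauss(0.0, 1.0)
        # Assumption: more engaged participants are more likely to believe
        # the human label (logistic link, purely illustrative).
        p_believe = 1.0 / (1.0 + math.exp(-trait))
        group = believers if rng.random() < p_believe else disbelievers
        group.append(minutes(trait))

    return mean(ai_delayed), mean(believers), mean(disbelievers)


ai, bel, dis = simulate()
print(f"delayed AI: {ai:.1f}  believers: {bel:.1f}  disbelievers: {dis:.1f}")
```

In this setup believers average more minutes than the AI arm and disbelievers fewer, mirroring the direction of the reported pattern without any attribution effect. That does not show the paper's result is an artifact; it shows why baseline covariates or an independent credibility manipulation are needed to tell the two explanations apart.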

Circularity Check

0 steps flagged

No circularity: purely empirical experiment with direct behavioral measures

Full rationale

The paper reports results from a three-condition randomized experiment (N=148) measuring time on task, code complexity, and self-reported belief via direct observation and post-task survey. No equations, fitted parameters, or derivations appear; outcomes are compared statistically without reducing any result to prior quantities by construction. Self-citations are absent from the provided text and not invoked to justify uniqueness or ansatzes. The central claims rest on observed differences (e.g., d=0.61 for believers vs. AI) rather than any definitional or predictive loop.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the validity of the experimental manipulation of attribution and on participants' self-reported belief in the source.

axioms (1)
  • [standard math] Standard assumptions for independent-samples t-tests and Cohen's d effect sizes
    Used to compute and report the d=0.61 and p=.013 values.
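
For readers unfamiliar with these quantities, the following is a minimal sketch of Cohen's d with a pooled standard deviation and the matching equal-variance t statistic. It assumes the classic pooled-variance definitions; the page does not say which variant the paper used. The sample numbers are made up and are not data from the study.

```python
import math
from statistics import mean, variance


def cohens_d(a, b):
    """Standardized mean difference using the pooled sample variance."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / math.sqrt(pooled_var)


def student_t(a, b):
    """Equal-variance independent-samples t statistic, expressed via d."""
    na, nb = len(a), len(b)
    return cohens_d(a, b) * math.sqrt(na * nb / (na + nb))


# Made-up minutes-on-task values for two small groups (illustration only).
group_a = [2.0, 4.0, 6.0]
group_b = [1.0, 3.0, 5.0]

print(cohens_d(group_a, group_b))             # 0.5
print(round(student_t(group_a, group_b), 4))  # 0.6124
```

The relation t = d · sqrt(n_a · n_b / (n_a + n_b)) makes explicit that the p-value depends on group sizes as well as on d, which is relevant when a condition is split into believer and disbeliever subgroups.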

pith-pipeline@v0.9.0 · 5556 in / 1297 out tokens · 76013 ms · 2026-05-13T18:25:40.812964+00:00 · methodology

