Towards More Empathic Programming Environments: An Experimental Empathic AI-Enhanced IDE

Aaron Daniel Go; Jocelynn Cu; Justin Rainier Go; Kurt Christian Andaya; Roemer Gabriel Caliboso

arxiv: 2604.19142 · v1 · submitted 2026-04-21 · 💻 cs.SE · cs.CY


Pith reviewed 2026-05-10 02:41 UTC · model grok-4.3


keywords: empathic AI · novice programmers · programming IDE · AI-assisted learning · error correction · user study · C programming · learning outcomes


The pith

An empathic AI IDE for novice C programmers matches standard tools on most measures but users find it more helpful for fixing errors.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces Ceci as a Caring Empathic C IDE that uses AI to offer emotional support and encourage learning instead of supplying code solutions directly. It describes a small pilot study in which eleven novice programmers completed a task either with Ceci or with VSCode plus ChatGPT, then completed workload surveys and usability questions. Results showed no meaningful differences in how effective or educational the tools seemed or how much effort they required, yet participants using Ceci rated it significantly higher for help during error correction. A sympathetic reader would care because the work tests whether adding empathy to AI coding tools can reduce over-reliance while still supporting beginners in the moment they get stuck.

Core claim

The study reports that Ceci, the Caring Empathic C IDE, which prioritizes learning and emotional support over direct code generation, shows no significant differences from VSCode paired with ChatGPT in perceived effectiveness, learning outcomes, or workload among novice programmers (n = 11). Ceci does receive significantly higher ratings for helpfulness when correcting errors (p = 0.022).

What carries the argument

Ceci, the Caring Empathic C IDE, which embeds empathic AI responses into the programming environment to focus on emotional encouragement and learner growth rather than immediate code provision.

If this is right

Empathic responses can be added to an IDE without raising users' reported workload.
The main observed benefit appears in how helpful the tool feels when users are fixing mistakes.
Empathic features by themselves are unlikely to produce broad gains in learning or reduced effort.
Future designs should test larger groups, varied tasks, and deeper integration of empathic elements with other supports.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Pairing empathic feedback with guided hints that still require the user to write code might produce stronger learning gains than empathy alone.
The same interface approach could be tested in other languages or technical subjects where novices often feel frustrated.
AI coding assistants may need separate controls for empathy level so users can choose support without automatic code generation.

Load-bearing premise

The empathic features were correctly designed and placed in the interface, and the chosen survey measures plus small participant group were sufficient to detect any real differences in learning or workload.

What would settle it

A follow-up experiment with fifty or more novices, multiple distinct coding tasks, and direct measures of skill retention or code quality over repeated sessions would show whether the empathic condition produces measurably better learning results than the control condition.

Figures

Figures reproduced from arXiv: 2604.19142 by Aaron Daniel Go, Jocelynn Cu, Justin Rainier Go, Kurt Christian Andaya, Roemer Gabriel Caliboso.

**Figure 1.** Schematic Diagram of Methodology.

**Figure 2.** Prototype Interface Design.

**Figure 3.** NASA TLX Results per Group. The Ceci group seems to experience a slightly higher task load index than the ChatGPT group, but the difference is not statistically significant.

Original abstract

As generative AI becomes integral to software development, the risk of over-reliance and diminished critical thinking grows. This study introduces "Ceci," our Caring Empathic C IDE designed to support novice programmers by prioritizing learning and emotional support over direct code generation. The researchers conducted a comparative pilot study between Ceci and VSCode + ChatGPT [9, 40]. Participants completed a coding task and were evaluated using the NASA-TLX workload assessment and a post-test usability survey. Although the sample size was small (n = 11), results show that there is no significant difference in perceived effectiveness, learning and workload between the Experimental Ceci group and the Control group, though Ceci users reported significantly greater perceived helpfulness in error correction (p = 0.0220). These findings suggest that empathic responses may not be sufficient on their own to enhance the learner's outcomes, perceptions, or reduce workload. Overall, this study provides a foundational framework for future research. Such research should explore larger sample sizes, diverse programming tasks, and additional empathic features to better understand the potential of empathic programming environments in supporting novice programmers; they must also ensure that the empathic features are well-integrated in the user interface.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

Because of its low statistical power, this n=11 pilot of an empathic IDE cannot support the claim that empathic features are insufficient.


This is a small-scale pilot comparing a custom empathic IDE called Ceci against standard tools with ChatGPT. The n=11 sample makes the mostly null results hard to interpret as evidence that empathic features fall short.

The authors built an original setup in which the IDE gives caring responses aimed at learning rather than direct answers. Participants did a coding task, then filled out the NASA-TLX and a survey. Ceci scored better on perceived helpfulness for fixing errors at p=0.022, but showed no difference in effectiveness, learning, or workload.

The strength is in trying to address over-reliance through interface design instead of just post-hoc advice. Using established scales like NASA-TLX lets others build on it directly. The paper is honest about being a pilot and calls for larger follow-ups.

The main weakness is the power issue. With so few participants, most comparisons should be expected to come up empty even if real effects exist. Treating those null results as support for 'may not be sufficient' overstates what the data can say. The one significant result lacks an effect size, and there is no mention of multiple comparison adjustments. Also, without screenshots or detailed examples of the empathic prompts, it is difficult to judge whether the feature was implemented strongly enough to test the idea.

Readers working on AI-assisted education or novice programming tools would find this useful as an early data point. It shows one way to prototype a 'caring' assistant and what basic metrics to track. The work deserves peer review. It is a genuine attempt with original data in a relevant area, even if the current version needs tightening on statistics and claims. Referees could help shape it into a solid short report that guides bigger experiments.

Referee Report

1 major / 2 minor

Summary. The paper introduces 'Ceci', a Caring Empathic C IDE that prioritizes emotional support and learning for novice programmers over direct code generation. It reports a pilot comparative study (n=11) of Ceci versus VSCode+ChatGPT on a coding task, assessed via NASA-TLX workload scores and a post-task usability survey. No significant differences were found in perceived effectiveness, learning, or workload, but Ceci users reported significantly greater helpfulness in error correction (p=0.022). The authors conclude that empathic responses may not be sufficient on their own and recommend larger studies with additional features.

Significance. If replicated with adequate power and refined measures, the work would usefully constrain expectations for empathic AI in programming education: it suggests that empathy alone may not outperform standard generative-AI assistance on core outcomes such as perceived learning or workload. The study supplies a concrete experimental framework (task, instruments, and comparison) that future work can build upon, particularly if it incorporates power analyses and effect-size reporting.

major comments (1)

[Abstract and Conclusion] The claim that empathic responses 'may not be sufficient on their own to enhance the learner's outcomes, perceptions, or reduce workload' rests on interpreting non-significant differences in effectiveness, learning, and workload as positive evidence of insufficiency. With n=11 split across groups, standard power calculations for typical Likert-scale or usability differences (Cohen's d ≈ 0.5–0.8) yield power well below 0.5; the null results are therefore inconclusive rather than supportive of the central claim. The single significant result (p=0.022) is reported without effect size, confidence intervals, or multiplicity correction.
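The power concern can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the authors' analysis: the 5 versus 6 group split, the normally distributed scores, and the exact two-sided Mann-Whitney U test are all assumptions made for illustration, and the function names are hypothetical.

```python
import itertools
import random


def u_statistic(a, b):
    """Mann-Whitney U for sample a: count of (x, y) pairs with x > y; ties count 0.5."""
    return sum(1.0 if x > y else 0.5 if x == y else 0.0 for x in a for y in b)


def exact_null_u(n_a, n_b):
    """Exact null distribution of U without ties, by enumerating every rank split."""
    ranks = range(n_a + n_b)
    counts = {}
    for chosen in itertools.combinations(ranks, n_a):
        chosen_set = set(chosen)
        rest = [r for r in ranks if r not in chosen_set]
        u = u_statistic(chosen, rest)
        counts[u] = counts.get(u, 0) + 1
    total = sum(counts.values())
    return {u: c / total for u, c in counts.items()}


def two_sided_p(u, null, n_a, n_b):
    """Exact two-sided p-value: null mass at least as far from the null mean as u."""
    center = n_a * n_b / 2
    distance = abs(u - center)
    return sum(p for v, p in null.items() if abs(v - center) >= distance - 1e-12)


def simulated_power(d, n_a=5, n_b=6, alpha=0.05, trials=2000, seed=0):
    """Share of simulated studies that reject H0 when the true shift is d (in SD units)."""
    rng = random.Random(seed)
    null = exact_null_u(n_a, n_b)
    rejections = 0
    for _ in range(trials):
        a = [rng.gauss(d, 1.0) for _ in range(n_a)]    # assumed treatment group
        b = [rng.gauss(0.0, 1.0) for _ in range(n_b)]  # assumed control group
        if two_sided_p(u_statistic(a, b), null, n_a, n_b) < alpha:
            rejections += 1
    return rejections / trials


if __name__ == "__main__":
    for d in (0.5, 0.8):
        print(f"d = {d}: estimated power = {simulated_power(d):.2f}")
```

Under these assumptions the estimated power for d between 0.5 and 0.8 stays far below the conventional 0.8 target, which is why a non-significant comparison here says little about whether an effect exists.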

minor comments (2)

[Methods] The manuscript provides limited detail on the concrete empathic response mechanisms implemented in Ceci and their precise integration into the IDE UI; explicit examples or screenshots would allow readers to evaluate whether the features were delivered as intended.
[Results] The p=0.022 finding on error-correction helpfulness should be accompanied by the exact test statistic, degrees of freedom, effect size, and any pre-specified analysis plan to permit assessment of its robustness.
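As an illustration of the effect-size reporting requested above, the sketch below computes Cliff's delta, a non-parametric effect size suited to ordinal Likert ratings. The ratings are hypothetical placeholders, not data from the paper, and the function name is an assumption for this example.

```python
def cliffs_delta(a, b):
    """Cliff's delta: P(x > y) - P(x < y) over all cross-group pairs, in [-1, 1]."""
    greater = sum(1 for x in a for y in b if x > y)
    less = sum(1 for x in a for y in b if x < y)
    return (greater - less) / (len(a) * len(b))


# Hypothetical 5-point Likert ratings of error-correction helpfulness.
ceci_ratings = [5, 5, 4, 5, 4]        # assumed experimental group (n = 5)
control_ratings = [3, 4, 3, 2, 4, 3]  # assumed control group (n = 6)

delta = cliffs_delta(ceci_ratings, control_ratings)

if __name__ == "__main__":
    print(f"Cliff's delta = {delta:.3f}")
```

A value near 0 means the two groups overlap almost completely, while values near 1 or -1 mean one group rates the tool higher in nearly every pairwise comparison. Reporting it next to the p-value lets readers judge magnitude as well as significance.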

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive and detailed feedback on our pilot study. We agree that the small sample size makes the non-significant results inconclusive and that the reporting of the significant finding requires improvement. We will revise the abstract and conclusion to more cautiously frame the findings as preliminary, emphasize the pilot nature of the work, and avoid any implication that null results constitute evidence of insufficiency. We will also enhance statistical reporting for the significant result.


Referee: the claim that empathic responses 'may not be sufficient on their own to enhance the learner's outcomes, perceptions, or reduce workload' rests on interpreting non-significant differences in effectiveness, learning, and workload as positive evidence of insufficiency. With n=11 split across groups, standard power calculations for typical Likert-scale or usability differences (Cohen's d ≈ 0.5–0.8) yield power well below 0.5; the null results are therefore inconclusive rather than supportive of the central claim.

Authors: We agree that the non-significant results cannot be interpreted as positive evidence of insufficiency given the low power of the pilot (n=11). Our intent was to present the absence of observed benefits in this initial comparison as motivation for further research rather than a definitive conclusion. However, we recognize that the current phrasing risks overinterpretation. In the revised manuscript we will reword the abstract and conclusion to state that the pilot findings are inconclusive on the question of sufficiency, explicitly note the limited statistical power, and stress that larger, pre-registered studies with power analyses are needed to evaluate whether empathic features can improve the measured outcomes. revision: yes
Referee: The single significant result (p=0.022) is reported without effect size, confidence intervals, or multiplicity correction.

Authors: We thank the referee for this point. In the revision we will add the effect size (Cohen’s d or appropriate non-parametric equivalent) and 95% confidence interval for the difference in perceived helpfulness during error correction. We will also clarify that the analysis was exploratory with a modest number of comparisons and discuss whether a multiplicity adjustment is warranted; if applied, we will report both unadjusted and adjusted values. These additions will improve transparency and allow readers to better assess the result. revision: yes
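The multiplicity adjustment discussed in this exchange can be sketched with the Holm-Bonferroni procedure. This is one possible adjustment, not the one the authors commit to. Only p = 0.022 comes from the paper; the other three p-values and the assumption of four comparisons are hypothetical placeholders.

```python
def holm_adjust(p_values):
    """Holm-Bonferroni adjusted p-values, returned in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The smallest p is multiplied by m, the next by m - 1, and so on.
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Enforce monotonicity so adjusted values never decrease with rank.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


comparisons = {
    "error_correction_helpfulness": 0.022,  # reported in the paper
    "perceived_effectiveness": 0.30,        # hypothetical
    "perceived_learning": 0.45,             # hypothetical
    "workload_nasa_tlx": 0.60,              # hypothetical
}

adjusted = dict(zip(comparisons, holm_adjust(list(comparisons.values()))))

if __name__ == "__main__":
    for name, p in adjusted.items():
        print(f"{name}: unadjusted = {comparisons[name]:.3f}, Holm = {p:.3f}")
```

With four comparisons, the smallest p-value is multiplied by four, so 0.022 would become 0.088 under this particular set of assumptions. The real outcome depends on how many comparisons the authors actually ran.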

Circularity Check

0 steps flagged

Purely empirical pilot study with no derivations or fitted predictions


The paper is a comparative user study (n=11) reporting survey and NASA-TLX results between Ceci and VSCode+ChatGPT. No equations, parameter fitting, predictions derived from prior fits, or self-citation chains appear in the abstract or described methods. Conclusions follow directly from the observed p-values and null findings without any reduction to inputs by construction. This matches the default expectation of a self-contained empirical evaluation.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on the validity of standard survey instruments (NASA-TLX) and the assumption that the empathic responses were implemented as intended in the tool.

axioms (2)

domain assumption: NASA-TLX workload assessment is a valid and sensitive measure for this context. Used to compare groups without additional validation in the study.
ad hoc to paper: The empathic responses in Ceci were correctly designed and delivered to participants. Core to the experimental manipulation but not independently verified in the abstract.
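For readers unfamiliar with the first axiom's instrument, the sketch below shows how a Raw TLX score is commonly computed: the unweighted mean of the six NASA-TLX subscales, each rated from 0 to 100. Whether the paper used this raw variant or the weighted one is not stated here, so the choice is an assumption, and the participant ratings are hypothetical.

```python
from statistics import mean

# The six NASA-TLX subscales.
SUBSCALES = (
    "mental_demand",
    "physical_demand",
    "temporal_demand",
    "performance",
    "effort",
    "frustration",
)


def raw_tlx(ratings):
    """Raw TLX: unweighted mean of the six subscale ratings (each 0 to 100)."""
    missing = [s for s in SUBSCALES if s not in ratings]
    if missing:
        raise ValueError(f"missing subscales: {missing}")
    for s in SUBSCALES:
        if not 0 <= ratings[s] <= 100:
            raise ValueError(f"{s} must be between 0 and 100")
    return mean(ratings[s] for s in SUBSCALES)


# Hypothetical ratings from one participant.
participant = {
    "mental_demand": 70,
    "physical_demand": 10,
    "temporal_demand": 50,
    "performance": 40,
    "effort": 60,
    "frustration": 30,
}

score = raw_tlx(participant)

if __name__ == "__main__":
    print(f"Raw TLX = {score:.1f}")
```

A per-group comparison would apply this to every participant and then compare the two sets of scores, which is where the sensitivity of the measure with so few participants becomes the open question.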



Reference graph

Works this paper leans on

46 extracted references · 46 canonical work pages

[1] S. K. Ahmed, R. A. Mohammed, A. J. Nashwan, R. H. Ibrahim, A. Q. Abdalla, B. M. Ameen, and R. M. Khdhir. 2025. Using thematic analysis in qualitative research. Journal of Medicine, Surgery, and Public Health 6 (2025), 100198.
[2] K. Aldrup, B. Carstensen, and U. Klusmann. 2022. Is empathy the key to effective teaching? A systematic review of its association with teacher-student interactions and student outcomes. Educational Psychology Review 34, 1 (2022), 1177–1216.
[3] N. Bosch and S. D’Mello. 2015. The affective experience of novice computer programmers. International Journal of Artificial Intelligence in Education 27, 1 (2015), 181–206.
[4] G. Carreira, L. Silva, A. J. Mendes, and H. G. Oliveira. 2022. Pyo, a chatbot assistant for introductory programming students. In 2022 International Symposium on Computers in Education (SIIE).
[5] G. Castellano, A. Paiva, A. Kappas, R. Aylett, H. Hastie, W. Barendregt, F. Nabais, and S. Bull. 2013. Towards empathic virtual and robotic tutors. In Lecture Notes in Computer Science. 733–736.
[6] C. K. Y. Chan and L. H. Y. Tsi. 2023. The AI revolution in education: Will AI replace or assist teachers in higher education? (2023). arXiv:2305.01185
[7] L. Colligan, H. W. W. Potts, C. T. Finn, and R. A. Sinkin. 2015. Cognitive workload changes for nurses transitioning from a legacy system with paper documentation to a commercial electronic health record. International Journal of Medical Informatics 84, 7 (2015), 469–476.
[8] L. M. Collins. 2007. Research design and methods. In Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16(3):297–334. Elsevier, 433–442.
[9] D. Deckker and S. Sumanasekara. 2024. The role of ChatGPT in software development and code generation. Wrexham University, Tech report.
[10] P. Eibl, S. Sabouri, and S. Chattopadhyay. 2025. Exploring the challenges and opportunities of AI-assisted codebase generation. (2025). arXiv:2508.07966
[11] D. Ford and C. Parnin. 2015. Exploring causes of frustration for software developers. (2015). doi:10.1109/chase.2015.19
[12] R. García-Pérez, J.-M. Santos-Delgado, and O. Buzón-García. 2016. Virtual empathy as digital competence in education 3.0. International Journal of Educational Technology in Higher Education 13 (2016).
[13] M. Goroshit and M. Hen. 2014. Does emotional self-efficacy predict teachers’ self-efficacy and empathy? Journal of Education and Training Studies 2, 3 (2014).
[14] S. Groothuijsen, A. van den Beemt, J. C. Remmers, and L. W. van Meeuwen. 2024. AI chatbots in programming education. Computers and Education: Artificial Intelligence, 7:100290.
[15] F. Gu, Z. Liang, H. Li, and J. Ma. 2025. The Matthew effect of AI programming assistants: A hidden bias in software evolution. (2025). arXiv:2509.23261
[16] R. Gupta, H. Goyal, D. Kumar, A. Mehra, S. Sharma, K. Mittal, and J. S. Challa.
[17] Sakshm AI: Advancing AI-assisted coding education for engineering students in India through socratic tutoring and comprehensive feedback. (2025). arXiv:2503.12479
[18] S. G. Hart and L. E. Staveland. 1988. Development of NASA-TLX (Task Load Index): Results of empirical and theoretical research. Advances in Psychology 52 (1988), 139–183.
[19] S. Hobert. 2019. Say hello to “Coding Tutor”! Design and evaluation of a chatbot-based learning system supporting students. (2019).
[20] K. Hone. 2006. Empathic agents to reduce user frustration: The effects of varying agent characteristics. Interacting with Computers 18, 2 (2006), 227–245. doi:10.1016/j.intcom.2005.05.003
[21] I. Hossain, C. Hundhausen, A. Tariq, S. Haque, Y. Qiao, and B. Mulanda. 2025. The effects of GitHub Copilot on computing students’ programming effectiveness, efficiency, and processes in brownfield programming tasks. (2025). doi:10.1145/3702652.3744219
[22] M. L. Kamins and C. S. Dweck. 1999. Person versus process praise and criticism: Implications for contingent self-worth and coping. Developmental Psychology 35, 3 (1999), 835–847.
[23] O. Kurniawan, E. Chandra, C. M. Poskitt, Y. Noller, K. Tsu, and C. Jegourel. 2025. Designing for novice debuggers: A pilot study on an AI-assisted debugging tool. (2025). arXiv:2509.21067
[24] L. Lasha, M. Grigolia, and L. Machaidze. 2023. Role of AI chatbots in education: systematic literature review. (2023).
[25] C. Li. 2025. AIVA: An AI-based Virtual Companion for Emotion-aware Interaction. arXiv:2509.03212 [cs.CV] https://arxiv.org/abs/2509.03212
[26] H. Li, Z. Wang, L. Ding, J. Zhang, and G. Wang. 2025. The facts about the effects of pedagogical agents on learners’ cognitive load: a meta-analysis based on 24 studies. Frontiers in Psychology 16 (2025).
[27] J. Liu, X. Tang, L. Li, P. Chen, and Y. Liu. 2023. Which is a better programming assistant? A comparative study between ChatGPT and Stack Overflow. (2023). arXiv:2308.13851
[28] R. Liu, J. Zhao, B. Xu, C. Perez, and D. J. Malan. 2025. Improving AI in CS50: Leveraging human feedback for better learning. SIGCSE TS 2025 (2025), 715–721.
[29] S. Marwan, G. Gao, S. Fisk, T. W. Price, and T. Barnes. 2020. Adaptive immediate feedback can improve novice programming engagement and intention to persist in computer science. In Proceedings of the 2020 ACM Conference on International Computing Education Research.
[30] V. May, D. Misra, Y. Luo, A. Sridhar, J. Gehring, and J. Silvio. 2025. Freshbrew: A benchmark for evaluating AI agents on Java code migration. (2025). arXiv:2510.04852
[31] S. Mondal, C. K. Roy, H. Wang, J. Arguello, and S. Mathan. 2025. Can we trust the AI pair programmer? Copilot for API misuse detection and correction. (2025). arXiv:2509.16795
[32] H. Mozannar, G. Bansal, A. Fourney, and E. Horvitz. 2024. Reading between the lines: Modeling user behavior and costs in AI-assisted programming. (2024).
[33] C. M. Mueller and C. S. Dweck. 1998. Praise for intelligence can undermine children’s motivation and performance. Journal of Personality and Social Psychology 75, 1 (1998), 33–52.
[34] E. Novak, K. McDaniel, and J. Li. 2023. Factors that impact student frustration in digital learning environments. Computers and Education Open 5 (2023), 100153.
[35] E. Ortega-Ochoa, J. Q. Pérez, M. Arguedas, T. Daradoumis, and J. Manuel. 2024. The effectiveness of empathic chatbot feedback for developing computer competencies, motivation, self-regulation, and metacognitive reasoning in online higher education. Internet of Things 25 (2024), 101101.
[36] P. Phanudom, T. Hirao, R. Gaikovina Kula, and H. Iida. 2021. Interactive chatbot for supporting students in online Python programming class. (2021).
[37] R. A. Poldrack, T. Lu, and G. Beguš. 2023. AI-assisted coding: Experiments with GPT-4. (2023). arXiv:2304.13187
[38] A. S. Raamkumar and Y. Yang. 2022. Empathetic conversational systems: A review of current advances, gaps, and opportunities. IEEE Transactions on Affective Computing (2022), 1–20.
[39] T. Raykov and G. A. Marcoulides. 2017. Thanks coefficient alpha, we still need you! Educational and Psychological Measurement 79, 1 (2017), 200–210.
[40] J. Schmidhuber, S. Schlögl, and C. Ploder. 2021. Cognitive load and productivity implications in human-chatbot interaction. arXiv / tech report. (2021).
[41] A. Silver. 2025. Celebrating 50 Million Developers. Microsoft.
[42] F. Sun, L. Li, S. Meng, X. Teng, T. Payne, and P. Craig. 2025. Integrating emotional intelligence, memory architecture, and gestures to achieve empathetic humanoid robot interaction in an educational setting. (2025). arXiv:2505.19803
[43] J. H. Sundjaja, R. Shrestha, and K. Krishan. 2023. McNemar and Mann-Whitney U Tests. StatPearls.
[44] M. Vijayvergiya, M. Salawa, I. Budiselić, D. Zheng, P. Lamblin, M. Ivanković, J. Carin, M. Lewko, J. Andonov, G. Petrović, D. Tarlow, P. Maniatis, and R. Just.
[45] AI-assisted assessment of coding practices in modern code review. arXiv / conference report. (2024).
[46] Z. Zhou. 2022. Empathy in education: A critical review. International Journal for the Scholarship of Teaching and Learning 16, 3 (2022).
