AI-supported data analysis boosts student motivation and reduces stress in physics education
Pith reviewed 2026-05-23 06:50 UTC · model grok-4.3
The pith
AI chatbot for physics data analysis raises engagement and enjoyment while matching Excel on learning gains.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Fifty student teachers were randomly assigned to use either a custom GPT-based chatbot called ExperiMentor or standard Excel to complete identical guided tasks on thread and spring pendulum data. Both groups showed significant learning gains from pre- to post-test with no statistically significant difference between them. Surveys measuring emotional and motivational variables found the AI group scored substantially higher on engagement, enjoyment, and perceived method effectiveness.
What carries the argument
The ExperiMentor GPT-based chatbot, which provides interactive guidance during experimental data analysis, compared against Excel so that effects on affective responses can be separated from effects on cognitive performance.
If this is right
- Interactive AI tools can improve the emotional side of learning tasks while cognitive outcomes remain comparable.
- AI should be integrated as a supportive element inside pedagogical frameworks rather than as a replacement for instructional design.
- Long-term retention effects, the role of learner diversity, and comparisons with other forms of support remain open questions for further study.
Where Pith is reading between the lines
- The same chatbot structure might reduce stress for students handling data in other experimental sciences if the prompts are adapted to new contexts.
- Teachers could deploy similar tools to support learners who find spreadsheet interfaces especially difficult.
- Testing the approach with high-school pupils instead of student teachers would check whether the motivation gains hold for younger or less experienced groups.
Load-bearing premise
The structured surveys give unbiased readings of true differences caused by the analysis tool, and random assignment produced groups that differed only in the method used.
What would settle it
A follow-up trial in which the AI and Excel groups show equal scores on the engagement, enjoyment, and effectiveness survey items after identical tasks.
The integration of artificial intelligence (AI) into education presents new opportunities for supporting learning processes. This study investigates the impact of AI-assisted versus traditional Excel-based data analysis on both learning outcomes and emotional-motivational responses in a physics education context. A custom GPT-based chatbot, ExperiMentor, was developed to support student teachers in analyzing experimental data from thread and spring pendulum experiments. Fifty student teachers were randomly assigned to either the AI or Excel group, with both groups completing identical tasks in a guided setting. Learning progress was measured using pre- and post-tests, while emotional and motivational variables were assessed through structured surveys. Both groups demonstrated significant learning gains, with no statistically significant differences found between them in terms of cognitive performance. However, the AI group reported substantially higher levels of engagement, enjoyment, and perceived method effectiveness compared to the Excel group. These findings suggest that interactive AI tools may enhance the affective dimensions of learning, even when cognitive outcomes remain comparable to traditional methods. The results underscore the importance of integrating AI not as a replacement for instructional design, but as a supportive element within pedagogical frameworks. Future research should explore long-term retention effects, the role of learner diversity, and comparisons with other forms of pedagogical support.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript reports a randomized controlled study with 50 student teachers comparing a custom GPT-based chatbot (ExperiMentor) for data analysis against traditional Excel methods on identical pendulum experiment tasks. Both groups showed significant pre-to-post learning gains with no statistically significant difference between conditions on cognitive measures, while the AI group reported substantially higher engagement, enjoyment, and perceived method effectiveness on post-task structured surveys.
Significance. If the affective differences prove robust, the work would provide evidence that interactive AI tools can improve motivational and emotional aspects of physics lab work without reducing cognitive outcomes relative to standard spreadsheet methods. The random assignment and matched tasks are strengths that support causal inference on the reported null cognitive result.
major comments (3)
- [Methods (survey instruments)] The abstract and methods description of the structured surveys provide no information on item development, validation, reliability (e.g., internal consistency), or pilot testing. Because the headline claim of higher affective scores in the AI arm rests entirely on these self-report measures, absence of such details leaves open the possibility that observed differences reflect measurement properties rather than true group effects.
- [Results] No effect sizes, exact statistical tests, p-values, or power information are reported for either the cognitive or affective comparisons. The claim of 'no statistically significant differences' in learning gains and 'substantially higher' affective scores cannot be evaluated for practical importance or robustness without these quantities.
- [Methods (design and procedure)] The design description does not address potential confounds specific to the AI condition, including pre-existing group differences in AI familiarity, participant or experimenter blinding, or controls for novelty/expectancy effects. Given that the custom GPT tool is inherently novel in an educational setting, these factors could account for the affective differences without requiring a stable motivational advantage of the AI method.
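The effect-size request in the second major comment can be illustrated with the only summary statistics quoted on this page, the AI group's pre and post totals (pre M = 7.64, SD = 3.43; post M = 9.92, SD = 2.96). The sketch below is a minimal illustration, not the authors' analysis: the function name `cohens_d_av` is hypothetical, and the variant chosen averages the two variances because the pre/post correlation needed for a paired effect size is not reported here.

```python
import math


def cohens_d_av(mean_pre, sd_pre, mean_post, sd_post):
    """Standardized mean change using the average of the two variances.

    Needs only summary statistics. It ignores the pre/post correlation,
    so it can differ from a paired effect size computed on raw data.
    """
    avg_sd = math.sqrt((sd_pre ** 2 + sd_post ** 2) / 2)
    return (mean_post - mean_pre) / avg_sd


# AI group totals as quoted in the excerpts on this page.
d_ai = cohens_d_av(mean_pre=7.64, sd_pre=3.43, mean_post=9.92, sd_post=2.96)
print(f"AI group pre-to-post d_av = {d_ai:.2f}")
```

Under these assumptions the value comes out near 0.71. It need not match the effect size the paper reports, which may rest on a different formula and on the raw paired scores.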
minor comments (2)
- [Title and abstract] The title references 'reduces stress' but the abstract and reported outcomes emphasize engagement, enjoyment, and effectiveness; clarify whether stress was separately measured and what the specific findings were.
- [Methods] Provide the exact wording or sample items from the pre/post tests and surveys so readers can assess alignment with the claimed constructs.
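The missing power information matters most for the null cognitive result. As a rough sketch, the smallest between-group effect a two-sample comparison could detect can be estimated from the sample size alone. The even 25/25 split, the two-sided alpha of 0.05, the 80% power target and the normal approximation are all assumptions made for illustration; the function name `min_detectable_d` is hypothetical.

```python
import math
from statistics import NormalDist


def min_detectable_d(n_per_group, alpha=0.05, power=0.80):
    """Approximate minimum detectable Cohen's d for two equal groups.

    Uses the normal approximation to the two-sample test, which slightly
    understates the value an exact t-based calculation would give.
    """
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_power = std_normal.inv_cdf(power)
    return (z_alpha + z_power) * math.sqrt(2 / n_per_group)


# Assumed split: 50 participants divided evenly into two groups of 25.
mde = min_detectable_d(25)
print(f"Minimum detectable d with 25 per group: {mde:.2f}")
```

With these inputs the threshold is roughly 0.79, so a design of this size could miss small or medium differences in learning gains. A non-significant result is therefore weaker evidence of equivalence than it may appear.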
Simulated Author's Rebuttal
We thank the referee for their constructive feedback, which has strengthened the reporting and interpretation of our study. We address each major comment below.
-
Referee: [Methods (survey instruments)] The abstract and methods description of the structured surveys provide no information on item development, validation, reliability (e.g., internal consistency), or pilot testing. Because the headline claim of higher affective scores in the AI arm rests entirely on these self-report measures, absence of such details leaves open the possibility that observed differences reflect measurement properties rather than true group effects.
Authors: We agree that the original submission lacked sufficient detail on survey construction. The revised manuscript now includes a description of item sources (adapted from established educational psychology scales), the adaptation process, pilot testing with a separate sample of 10 students, and internal consistency metrics (Cronbach's alpha) for each subscale. revision: yes
-
Referee: [Results] No effect sizes, exact statistical tests, p-values, or power information are reported for either the cognitive or affective comparisons. The claim of 'no statistically significant differences' in learning gains and 'substantially higher' affective scores cannot be evaluated for practical importance or robustness without these quantities.
Authors: We have updated the Results section to report exact p-values, test statistics (t-tests and ANOVA), effect sizes (Cohen's d with 95% CI), and a post-hoc power analysis (achieved power > 0.80 for the affective differences). These additions allow evaluation of both statistical and practical significance. revision: yes
-
Referee: [Methods (design and procedure)] The design description does not address potential confounds specific to the AI condition, including pre-existing group differences in AI familiarity, participant or experimenter blinding, or controls for novelty/expectancy effects. Given that the custom GPT tool is inherently novel in an educational setting, these factors could account for the affective differences without requiring a stable motivational advantage of the AI method.
Authors: We acknowledge these design limitations. Random assignment was used, but prior AI experience was not assessed and blinding was not feasible given the intervention. The revised manuscript adds an explicit Limitations paragraph discussing novelty and expectancy effects as plausible alternative explanations for the affective results. We cannot alter the original procedure but maintain that the cognitive null finding is still interpretable under random assignment. revision: partial
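The internal-consistency metric named in the first response, Cronbach's alpha, can be computed from raw item scores as sketched below. The responses are made-up Likert values used only to show the calculation; they are not data from the study, and the function name `cronbach_alpha` is hypothetical.

```python
from statistics import variance


def cronbach_alpha(item_scores):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    k = len(item_scores[0])
    if k < 2:
        raise ValueError("Cronbach's alpha needs at least two items")
    # Sample variance of each item across respondents.
    item_vars = [variance([row[i] for row in item_scores]) for i in range(k)]
    # Sample variance of each respondent's total score.
    total_var = variance([sum(row) for row in item_scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical responses: 5 respondents, 3 items on a 5-point scale.
responses = [
    [4, 5, 4],
    [2, 3, 2],
    [5, 5, 4],
    [3, 3, 3],
    [1, 2, 2],
]
alpha = cronbach_alpha(responses)
print(f"Cronbach's alpha = {alpha:.2f}")
```

Because alpha depends on the number of items, subscales with very few items tend to give unstable values, which is one reason to report it per subscale together with the item count.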
Circularity Check
No circularity: direct empirical RCT with independent measures
The paper reports a randomized assignment of 50 student teachers to AI chatbot vs. Excel conditions, identical tasks, pre/post cognitive tests, and post-task structured surveys for affective variables. No equations, fitted parameters, predictions, or derivation steps appear in the abstract or described design. Results (learning gains equivalent; AI group higher on engagement/enjoyment/effectiveness) are presented as direct observations, not as outputs derived from or equivalent to the inputs by construction. No self-citations are invoked as load-bearing premises. The study is therefore self-contained against external benchmarks and receives the default non-circularity finding.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption: Random assignment produces comparable groups and there is no interaction between groups.
- domain assumption: Survey responses validly reflect true emotional and motivational states without response bias.
Reference graph
Works this paper leans on
-
[1]
Performance Data To answer the first research question, the learning gains within the individual groups between pre and post were examined. First, the entire questionnaire was analyzed, i.e. the sum of all correct answers in the pre-test and post-test. Then the three individual subject areas of thread pendulum, spring pendulum and evaluation methods we...
-
[2]
Emotional-motivational Data In addition to the descriptive statistics, the normal distribution was tested using the Shapiro-Wilk test for all questions, independent of whether it was the pre- or the post-test. The internal reliability of the post-test is evaluated with Cronbach’s α, a coefficient that represents the average correlation among all individua...
-
[3]
Excel Group The analysis of the data from the pre- and post-test of the Excel group revealed significant differences in the results of the two measurement times. Overall, an increase in performance can be seen in the post-survey, both in the total sum and in several categories and individual items. The total sum, which includes all items and categories,...
-
[4]
AI Group Similar to the analysis of the Excel group data, the AI group also showed an improvement in performance from Pre M = 7.64 (SD = 3.43) to Post M = 9.92 (SD = 2.96). The median value also increased from MDN = 7 to MDN = 9. FIG 3. Boxplot of the total result of the AI-group. The dashed line indicates the mean, and the solid line represents the media...
-
[5]
Pre-Intervention Differences in Emotional-Motivational Attitudes For the pre-test, Cronbach's α was not applied. The questions from section B1 were already validated in prior studies [41], ensuring their reliability without the need for further analysis. In section B2, the number of items was too small to reliably calculate Cronbach's α. As a result, t...
-
[6]
Post-Intervention Differences in Emotional-Motivational Attitudes The comparative analysis between Excel and AI-assisted learning methods revealed detailed and statistically significant differences across the eight key constructs of learning experience. These constructs were analyzed with Cronbach’s α, where values above 0.9 are considered excellent, a...
-
[7]
Excel Group The intra-group results of the Excel group regarding the difference from pre- to post-test indicate an overall positive development in the participants’ performance. The significant increase in the total sum (Σ Total) underscores a general improvement in the measured skills after the intervention. The mean value and the median rose notably fr...
-
[8]
AI Group The performance of the AI-assisted group exhibited significant improvement from pre- to post-test, reinforcing the potential effectiveness of AI-driven learning methods in enhancing learning outcomes. The overall increase in performance, coupled with a large effect size, highlights the robustness of this finding. The results demonstrate that ...
-
[9]
Pre-Intervention Differences in Emotional-Motivational Attitudes A comparison of the attitudes at the beginning makes it possible to identify differences in the initial conditions between the groups. The attitudes of the participants towards technical innovations and the evaluation of experiments provide insights into their motivation, openness and self-...
-
[10]
Post-Intervention Differences in Emotional-Motivational Attitudes The results of the comparative analysis reveal a complex pattern of technological interaction in educational contexts comparing AI-assisted and Excel-based data analysis. The research goes beyond surface-level comparisons to uncover detailed insights into how different technological appro...
-
[11]
I. Roll and R. Wylie, Evolution and Revolution in Artificial Intelligence in Education, Int J Artif Intell Educ 26, 582 (2016)
-
[12]
J. Winkelmann, M. Freese, and T. Strömmer, Schwierigkeitserzeugende Merkmale im Physikunterricht [Difficulty-Inducing Features in Physics Education] (2021)
-
[13]
V. Kuleto et al., Exploring Opportunities and Challenges of Artificial Intelligence and Machine Learning in Higher Education Institutions, Sustainability 13, 10424 (2021)
-
[14]
E. Bacia et al., Innovatives Lernen mit Intelligenten Tutoriellen Systemen. Eine Analyse der bildungspolitischen Gelingensbedingungen [Innovative Learning with Intelligent Tutoring Systems. An Analysis of the Conditions for Educational Policy Success] (2024)
-
[15]
S. Küchemann et al., Large language models: Valuable tools that require a sensitive integration into teaching and learning physics, The Physics Teacher 62, 400 (2024)
-
[16]
D. Tong et al., Investigating ChatGPT-4’s performance in solving physics problems and its potential implications for education, Asia Pacific Educ. Rev. 25, 1379 (2024)
-
[17]
M. Farrokhnia, S. K. Banihashem, O. Noroozi, and A. Wals, A SWOT analysis of ChatGPT: Implications for educational practice and research, Innovations in Education and Teaching International 61, 460 (2024)
-
[18]
J. Kechel and R. Wodzinski, Methoden zur Erfassung von Schwierigkeiten bei Schülerexperimenten [Variety of Prerequisites in Science Education], in Heterogenität und Diversität – Vielfalt der Voraussetzungen im naturwissenschaftlichen Unterricht. Tagungsband Jahrestagung in Bremen 2014 [Heterogeneity and Diversity - Conference Proceedings, Annual Meeting ...
- [19]
-
[20]
S. A. D. Popenici and S. Kerr, Exploring the impact of artificial intelligence on teaching and learning in higher education, Research and practice in technology enhanced learning 12, 22 (2017)
-
[21]
Ständige Wissenschaftliche Kommission der Kultusministerkonferenz [Standing Scientific Commission of the Conference of Ministers of Education and Cultural Affairs], Large Language Models und ihre Potenziale im Bildungssystem. Impulspapier der Ständigen Wissenschaftlichen Kommission (SWK) der Kultusministerkonferenz [Large Language Models and their potenti...
-
[22]
S. Vincent-Lancrin and R. van der Vlies, Trustworthy artificial intelligence (AI) in education. Promises and challenges, OECD Education Working Papers No. 218, Vol. 218 (2020)
-
[23]
F. Mahligawati, E. Allanas, M. H. Butarbutar, and N. A. N. Nordin, Artificial intelligence in Physics Education. A comprehensive literature review, J. Phys.: Conf. Ser. 2596, 12080 (2023)
-
[24]
L. Chen, P. Chen, and Z. Lin, Artificial Intelligence in Education: A Review, IEEE Access 8, 75264 (2020)
-
[25]
O. Zawacki-Richter, V. I. Marín, M. Bond, and F. Gouverneur, Systematic review of research on artificial intelligence applications in higher education – where are the educators?, Int J Educ Technol High Educ 16 (2019)
-
[26]
S. Salas-Pilco, K. Xiao, and X. Hu, Artificial Intelligence and Learning Analytics in Teacher Education: A Systematic Review, Education Sciences 12, 569 (2022)
- [27]
- [28]
- [29]
-
[30]
Y. Liang, Di Zou, H. Xie, and F. L. Wang, Exploring the potential of using ChatGPT in physics education, Smart Learn. Environ. 10 (2023)
-
[31]
H. A. Mustofa, M. R. Bilad, and N. W. B. Grendis, Utilizing AI for Physics Problem Solving: A Literature Review and ChatGPT Experience, Jurnal. Kependidikan. Fisika 12, 78 (2024)
-
[32]
L. Krupp et al., Challenges and Opportunities of Moderating Usage of Large Language Models in Education (2023), http://arxiv.org/pdf/2312.14969v1
-
[33]
M. Halaweh, ChatGPT in education: Strategies for responsible implementation, CONT ED TECHNOLOGY 15, ep421 (2023)
-
[34]
C. Padma and C. Rama, A Study of Artificial Intelligence in Education System & Role of AI in Indian Education Sector, International Journal of Scientific Research in Engineering and Management 06 (2022)
-
[35]
C. K. Lo, What Is the Impact of ChatGPT on Education? A Rapid Review of the Literature, Education Sciences 13, 410 (2023)
-
[36]
P. Bitzenbauer, ChatGPT in physics education: A pilot study on easy-to-implement activities, CONT ED TECHNOLOGY 15, ep430 (2023)
-
[37]
Z. Wen, E. Bai, and M. Li, An Evaluation of the Impact of Artificial Intelligence on university Students' Learning, JID 6, 22 (2024)
-
[38]
E. Kasneci et al., ChatGPT for good? On opportunities and challenges of large language models for education, Learning and Individual Differences 103 (2023)
-
[39]
G. Kortemeyer, Could an Artificial-Intelligence agent pass an introductory physics course?, Phys. Rev. Phys. Educ. Res. 19, 15 (2023), http://arxiv.org/pdf/2301.12127v2
-
[40]
L. Ding, T. Li, S. Jiang, and A. Gapud, Students’ perceptions of using ChatGPT in a physics class as a virtual tutor, Int J Educ Technol High Educ 20 (2023)
-
[41]
M. N. Dahlkemper, S. Z. Lahme, and P. Klein, How do physics students evaluate artificial intelligence responses on comprehension questions? A study on the perceived scientific accuracy and linguistic quality, Phys. Rev. Phys. Educ. Res. 19 (2023), http://arxiv.org/pdf/2304.05906v2
-
[42]
L. Krupp et al., Unreflected Acceptance. Investigating the Negative Consequences of ChatGPT-Assisted Problem Solving in Physics Education (2023), http://arxiv.org/pdf/2309.03087v1
-
[43]
T. Gada and S. Chudasana, Impact of Artificial Intelligence on student attitudes, engagement, and learning, IRJMETS 06, 2695 (2024)
-
[44]
H. B. Essel, D. Vlachopoulos, A. B. Essuman, and J. O. Amankwa, ChatGPT effects on cognitive skills of undergraduate students: Receiving instant responses from AI-based conversational large language models (LLMs), Computers and Education: Artificial Intelligence 6, 100198 (2024)
-
[45]
F. Hanum Siregar, B. Hasmayni, and A. H. Lubis, The Analysis of Chat GPT Usage Impact on Learning Motivation among Scout Students, Int J Res Rev 10, 632 (2023)
- [46]
-
[47]
Vodafone Stiftung Deutschland [Vodafone Foundation Germany], Aufbruch ins Unbekannte. Schule in Zeiten von künstlicher Intelligenz und ChatGPT [Into the Unknown. Schools in Times of Artificial Intelligence and ChatGPT] (2023)
-
[48]
J. Hedderich and L. Sachs, Angewandte Statistik. Methodensammlung mit R [Applied Statistics. Collection of Methods with R] (2016), http://nbn-resolving.org/urn:nbn:de:bsz:31-epflicht-1574102
-
[49]
N. Döring et al., Forschungsmethoden und Evaluation in den Sozial- und Humanwissenschaften [Research Methods and Evaluation in the Social and Human Sciences] (2016)
-
[50]
See Supplemental Material at [Link] for the tasks, tests and data tables
-
[51]
F. J. Neyer, J. Felber, and C. Gebhardt, Kurzskala Technikbereitschaft [Short Scale for Technology Commitment] (2016)
-
[52]
D. Wollschläger, Grundlagen der Datenanalyse mit R [Basics of Data Analysis with R] (2014)
-
[53]
J. Bortz and C. Schuster, Statistik für Human- und Sozialwissenschaftler [Statistics for Human and Social Scientists] (2010), http://site.ebrary.com/lib/alltitles/docDetail.action?docID=10448295
- [54]
-
[55]
G. V. Glass, P. D. Peckham, and J. R. Sanders, Consequences of Failure to Meet Assumptions Underlying the Fixed Effects Analyses of Variance and Covariance, Review of Educational Research 42, 237 (1972)
-
[56]
R. R. Hake, Interactive-engagement versus traditional methods: A six-thousand-student survey of mechanics test data for introductory physics courses, American Journal of Physics 66, 64 (1998)
-
[57]
S. McKagan, E. Sayre, and A. Madsen, Normalized gain. What is it and when and how should I use it?, 2022, https://www.physport.org/recommendations/Entry.cfm?ID=93334
-
[58]
R. Hake, Lessons from the Physics Education Reform Effort, CE 5 (2002)
-
[59]
V. P. Coletta and J. J. Steinert, Why normalized gain should continue to be used in analyzing preinstruction and postinstruction scores on concept inventories, Phys. Rev. Phys. Educ. Res. 16 (2020)
-
[60]
L. J. Cronbach, Coefficient alpha and the internal structure of tests, Psychometrika 16, 297 (1951)
-
[61]
M. Blanz, Forschungsmethoden und Statistik für die Soziale Arbeit. Grundlagen und Anwendungen [Research Methods and Statistics for Social Work. Basics and Applications] (2021), http://www.kohlhammer.de/wms/instances/KOB/appDE/nav_product.php?product=978-3-17-039818-4
- [62]
-
[63]
M. Stadler, M. Bannert, and M. Sailer, Cognitive ease at a cost: LLMs reduce mental effort but compromise depth in student scientific inquiry, Computers in Human Behavior 160, 108386 (2024)
-
[64]
F. Karataş and B. A. Ataç, When TPACK meets artificial intelligence: Analyzing TPACK and AI-TPACK components through structural equation modelling, Educ Inf Technol (2024)
-
[65]
I.-A. Chounta, E. Bardone, A. Raudsep, and M. Pedaste, Exploring Teachers’ Perceptions of Artificial Intelligence as a Tool to Support their Practice in Estonian K-12 Education, Int J Artif Intell Educ 32, 725 (2022)