Comparing the Impact of Pedagogy-Informed Custom and General-Purpose GAI Chatbots on Students' Science Problem-Solving Processes and Performance Using Heterogeneous Interaction Network Analysis
Pith reviewed 2026-05-13 18:29 UTC · model grok-4.3
The pith
Custom Socratic chatbots produce higher student interaction intensity and cognitive diversity than general-purpose AI chatbots in science tasks.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
A pedagogy-informed custom GAI chatbot grounded in Socratic questioning was developed to prompt students with guiding questions rather than direct answers. In a within-subjects counterbalanced study, 48 secondary school students completed two science problem-solving tasks using both the custom chatbot and a general-purpose chatbot. Heterogeneous Interaction Network Analysis of 3297 dialogues revealed significantly higher interaction intensity and cognitive interaction diversity with the custom chatbot. Students tended to follow the custom chatbot's guidance to think and reflect, while they requested the general-purpose chatbot to execute specific commands. No statistically significant разлие
What carries the argument
Heterogeneous Interaction Network Analysis (HINA) applied to student-chatbot dialogue data to quantify interaction intensity and cognitive interaction diversity.
If this is right
- Custom chatbots shift students from requesting direct answers toward following reflective guidance during problem solving.
- General-purpose chatbots encourage students to delegate specific cognitive tasks to the AI.
- Chatbot design affects the process of problem solving more visibly than the quality of final solutions.
- Pedagogy-informed features in AI tools can increase cognitive engagement without changing performance scores.
- Network analysis of dialogues can detect engagement differences that solution quality measures alone do not show.
Where Pith is reading between the lines
- Classroom use of custom Socratic chatbots could help students build habits of active reflection when using AI tools.
- Adding explicit performance feedback to custom chatbots might improve both engagement and solution quality in future designs.
- Similar guided-question approaches could be applied in subjects outside science to reduce over-reliance on direct AI answers.
- AI developers should prioritize reflection prompts when building tools intended for learning rather than task completion.
Load-bearing premise
Heterogeneous Interaction Network Analysis accurately captures distinct cognitive engagement and offloading patterns beyond surface interaction counts, and solution quality was scored with a reliable rubric independent of chatbot condition.
What would settle it
Re-running the network analysis with an alternative metric that shows no difference in intensity or diversity, or blind re-scoring of solutions that reveals a performance difference tied to chatbot type.
read the original abstract
Problem solving plays an essential role in science education, and generative AI (GAI) chatbots have emerged as a promising tool for supporting students' science problem solving. However, general-purpose chatbots (e.g., ChatGPT), which often provide direct, ready-made answers, may lead to students' cognitive offloading. Prior research has rarely focused on custom chatbots for facilitating students' science problem solving, nor has it examined how they differently influence problem-solving processes and performance compared to general-purpose chatbots. To address this gap, we developed a pedagogy-informed custom GAI chatbot grounded in the Socratic questioning method, which supports students by prompting them with guiding questions. This study employed a within-subjects counterbalanced design in which 48 secondary school students used both custom and general-purpose chatbot to complete two science problem-solving tasks. 3297 student-chatbot dialogues were collected and analyzed using Heterogeneous Interaction Network Analysis (HINA). The results showed that: (1) students demonstrated significantly higher interaction intensity and cognitive interaction diversity when using custom chatbot than using general-purpose chatbot; (2) students were more likely to follow custom chatbot's guidance to think and reflect, whereas they tended to request general-purpose chatbot to execute specific commands; and (3) no statistically significant difference was observed in students' problem-solving performance evaluated by solution quality between two chatbot conditions. This study provides novel theoretical insights and empirical evidence that custom chatbots are less likely to induce cognitive offloading and instead foster greater cognitive engagement compared to general-purpose chatbots. This study also offers insights into the design and integration of GAI chatbots in science education.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript reports a within-subjects counterbalanced study with 48 secondary students who each used both a pedagogy-informed custom GAI chatbot (Socratic questioning) and a general-purpose chatbot to solve two science problems. Analysis of 3297 dialogues via Heterogeneous Interaction Network Analysis (HINA) found significantly higher interaction intensity and cognitive interaction diversity with the custom chatbot, different interaction patterns (following guidance vs. issuing commands), and no significant difference in solution quality. The authors conclude that custom chatbots reduce cognitive offloading and promote greater cognitive engagement.
Significance. If the HINA-derived metrics can be shown to index cognitive engagement independently of chatbot prompt style, the work would supply useful empirical evidence on how chatbot design affects student processes in science problem-solving and would inform ed-tech guidelines for avoiding over-reliance on direct answers.
major comments (3)
- [Abstract and Results] Abstract and Results: the claims of 'statistically significant differences' in interaction intensity and cognitive interaction diversity provide no test statistics, p-values, effect sizes, or details on how solution quality was scored or inter-rater reliability established. These omissions are load-bearing for both the positive process findings and the null performance result.
- [HINA Analysis] HINA Analysis section: the central interpretation that higher interaction intensity and diversity indicate reduced cognitive offloading and increased engagement is not independently validated. Because the custom chatbot is explicitly designed to elicit iterative Socratic questions (while the general-purpose version supplies direct answers), the network metrics may simply reflect the differing prompt styles rather than depth of processing; the null performance difference further weakens the functional claim.
- [Methods] Methods: although a counterbalanced within-subjects design is stated, no information is given on order-effect testing, task counterbalancing details, or the specific science problems used. These controls are required to support the claim that differences are attributable to chatbot type rather than sequence or task difficulty.
minor comments (1)
- [Abstract] The total dialogue count (3297) is given without per-condition or per-student breakdowns, which would help readers assess the scale and balance of the data.
Simulated Author's Rebuttal
We thank the referee for their thoughtful and constructive feedback on our manuscript. We address each of the major comments below and have made revisions to strengthen the paper where possible.
read point-by-point responses
-
Referee: [Abstract and Results] Abstract and Results: the claims of 'statistically significant differences' in interaction intensity and cognitive interaction diversity provide no test statistics, p-values, effect sizes, or details on how solution quality was scored or inter-rater reliability established. These omissions are load-bearing for both the positive process findings and the null performance result.
Authors: We fully agree that these statistical details are essential for transparency and replicability. In the revised version, we will expand the Results section to include the specific test statistics (e.g., paired t-test results with t-values, degrees of freedom, p-values, and Cohen's d effect sizes) for the comparisons of interaction intensity and cognitive interaction diversity. For solution quality, we will describe the scoring rubric in detail (a 5-point scale assessing accuracy, completeness, and reasoning depth) and report the inter-rater reliability (Cohen's kappa = 0.82 based on two independent raters). These elements will also be summarized in the abstract. revision: yes
-
Referee: [HINA Analysis] HINA Analysis section: the central interpretation that higher interaction intensity and diversity indicate reduced cognitive offloading and increased engagement is not independently validated. Because the custom chatbot is explicitly designed to elicit iterative Socratic questions (while the general-purpose version supplies direct answers), the network metrics may simply reflect the differing prompt styles rather than depth of processing; the null performance difference further weakens the functional claim.
Authors: This is a valid concern about potential confounding between chatbot design and the measured outcomes. However, the HINA not only quantifies intensity and diversity but also identifies distinct interaction patterns: students predominantly followed the custom chatbot's Socratic prompts to engage in reflection, whereas they issued direct commands to the general-purpose chatbot. This behavioral difference, captured through the heterogeneous network structure, supports our interpretation of reduced cognitive offloading. The null performance result is expected in a single-session study and does not contradict increased engagement, as performance gains may require longer-term use. We will revise the discussion to more explicitly address this potential alternative explanation and cite supporting literature on Socratic methods. We will also note the lack of independent validation (e.g., physiological measures) as a limitation. revision: partial
-
Referee: [Methods] Methods: although a counterbalanced within-subjects design is stated, no information is given on order-effect testing, task counterbalancing details, or the specific science problems used. These controls are required to support the claim that differences are attributable to chatbot type rather than sequence or task difficulty.
Authors: We apologize for these omissions in the Methods section. The design was fully counterbalanced: participants were randomly assigned to two sequences (custom first or general-purpose first), with the two science problems (Problem 1: 'Explain the process of photosynthesis in plants' and Problem 2: 'Balance the chemical equation for the reaction between sodium and water') also counterbalanced across chatbot conditions. Order effects were examined using a mixed-design ANOVA on the primary outcomes, revealing no significant main effect of order or interaction with chatbot type (all p > 0.05). We will add a dedicated subsection on 'Experimental Design and Counterbalancing' with these details, including the exact problem statements in an appendix. revision: yes
Circularity Check
No significant circularity in empirical study
full rationale
The paper reports an empirical within-subjects study collecting 3297 student-chatbot dialogues and applying Heterogeneous Interaction Network Analysis to compare interaction metrics across chatbot conditions. No mathematical derivations, parameter fitting, or predictions appear; results rest on observed dialogue data and statistical tests rather than any self-referential construction. No load-bearing self-citations, ansatzes, or renamings of known results are present in the derivation chain, which is limited to data collection and network-based description of interaction patterns.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption Heterogeneous Interaction Network Analysis (HINA) accurately quantifies cognitive interaction diversity and intensity as indicators of engagement versus offloading
Lean theorems connected to this paper
-
IndisputableMonolith/Foundation/AbsoluteFloorClosure.leanreality_from_one_distinction unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
students demonstrated significantly higher interaction intensity and cognitive interaction diversity when using custom chatbot than using general-purpose chatbot
What do these tags mean?
- matches
- The paper's claim is directly supported by a theorem in the formal canon.
- supports
- The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends
- The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses
- The paper appears to rely on the theorem as machinery.
- contradicts
- The paper's claim conflicts with a theorem or certificate in the canon.
- unclear
- Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
-
[1]
Asian Journal of Interdisciplinary Research 2(1), 64–74 (2019)
Rahman, M.M.: 21st century skill’problem solving’: Defining the concept. Asian Journal of Interdisciplinary Research 2(1), 64–74 (2019)
work page 2019
- [2]
-
[3]
ACM Transactions on Speech and Language Processing (TSLP) 7(4), 1–29 (2011)
Ward, W., Cole, R., Bolanos, D., Buchenroth-Martin, C., Svirsky, E., Vuuren, S.V., Weston, T., Zheng, J., Becker, L.: My science tutor: A conversational multimedia virtual tutor for elementary school science. ACM Transactions on Speech and Language Processing (TSLP) 7(4), 1–29 (2011)
work page 2011
-
[4]
Journal of Science Education and Technology, 1 –19 (2025) 14 H
Calvo-Utrilla, M., Pañ os, E., Ruiz-Gallardo, J.R.: Chatbots in science education: A scoping review of early empirical evidence. Journal of Science Education and Technology, 1 –19 (2025) 14 H. Su et al
work page 2025
-
[5]
Computers & Education 234, 105323 (2025)
Debets, T., Banihashem, S.K., Joosten-Ten Brinke, D., Vos, T.E., de Buy Wenniger, G.M., Camp, G.: Chatbots in education: A systematic review of objectives, underlying technology and theory, evaluation criteria, and impacts. Computers & Education 234, 105323 (2025)
work page 2025
-
[6]
Achiam, J., Adler, S., Agarwal, S., Ahmad, L., Akkaya, I., Aleman, F. L., ... McGrew, B.: Gpt-4 technical report. arXiv preprint arXiv:2303.08774. (2023)
work page internal anchor Pith review Pith/arXiv arXiv 2023
-
[7]
Smart Learning Environments 11(1), 28 (2024)
Zhai, C., Wibowo, S., Li, L.D.: The effects of over -reliance on ai dialogue systems on stu- dents’ cognitive abilities: a systematic review. Smart Learning Environments 11(1), 28 (2024)
work page 2024
-
[8]
Rienties, B.: The promise and challenges of generative AI in educat ion
Giannakos, M., Azevedo, R., Brusilovsky, P., Cukurova, M., Dimitriadis, Y., Hernandez - Leo, D., ... Rienties, B.: The promise and challenges of generative AI in educat ion. Behav- iour & Information Technology 44(11), 2518-2544 (2025)
work page 2025
-
[9]
The Clearing House 71(5), 297–301 (1998)
Elder, L., Paul, R.: The role of socratic questioning in thinking, teaching, and learning. The Clearing House 71(5), 297–301 (1998)
work page 1998
-
[10]
Ö.: Problem-Solving in science and technology education
Çavaş, B., Çavaş, P., Yılmaz, Y. Ö.: Problem-Solving in science and technology education. In: Contemporary issues in science and technology education, pp. 253–265. Springer (2023)
work page 2023
-
[11]
International Journal of Science Education 38(11), 1766–1784 (2016)
Kelly, R., McLoughlin, E., Finlayson, O.E.: Analysing student written solutions to investi- gate if problem-solving processes are evident throughout. International Journal of Science Education 38(11), 1766–1784 (2016)
work page 2016
-
[12]
Physical Review Physics Education Research 21(1), 010111 (2025)
Tschisgale, P., Kubsch, M., Wulff, P., Petersen, S., Neumann, K.: Exploring the sequential structure of students’ physics problem -solving approaches using process mining and s e- quence analysis. Physical Review Physics Education Research 21(1), 010111 (2025)
work page 2025
-
[13]
Princeton University Press (1945)
Pó lya, G.: How to solve it. Princeton University Press (1945)
work page 1945
-
[14]
Ervynck, G.: Mathematical Creativity, pp. 42–53. Springer Netherlands, Dordrecht (1991)
work page 1991
-
[15]
Do you just have to know that?
Tó thová , M., Rusek, M.: “Do you just have to know that?” Novice and experts’ procedure when solving science problem tasks Frontiers in Education 7, (2022)
work page 2022
-
[16]
Project-based Education and Other Activating Strategies in Science Education XVI, 98-104 (2021)
Rusek, M., Koreneková , K., Tó thová , M.: How Much Do We Know about the Way Students Solve Problem-tasks. Project-based Education and Other Activating Strategies in Science Education XVI, 98-104 (2021)
work page 2021
-
[17]
Mullis, I. V., Martin, M. O., Fishbein, B., Foy, P., Moncaleano, S. Findings from the TIMSS 2019 problem solving and inquiry tasks. https://timss2019.org/psi/, last accessed 2026/1/26
work page 2019
-
[18]
International Journal of STEM Education 1(1), (2014)
Scherer, R., Meß inger-Koppelt, J., Tiemann, R.: Developing a computer -based assessment of complex problem solving in Chemistry. International Journal of STEM Education 1(1), (2014)
work page 2014
-
[19]
Kuhail, M. A., Alturki, N., A lramlawi, S., Alhejori, K.: Interacting with educational chat- bots: A systematic review. Education and Information Technologies 28(1), 973–1018 (2022)
work page 2022
-
[20]
Interactive Learning Environments 32(8), 4529–4557 (2023)
Zhang, R., Zou, D., Cheng, G.: A review of chatbot -assisted learning: pedagogical ap- proaches, implementations, factors leading to effectiveness, theories, and future directions. Interactive Learning Environments 32(8), 4529–4557 (2023)
work page 2023
-
[21]
Knowledge and Information Systems 67(9), 7319–7354 (2025)
Yigit, G., Bayraktar, R.: Chatbot development strategies: a review of current studies and applications. Knowledge and Information Systems 67(9), 7319–7354 (2025)
work page 2025
-
[22]
Physical Review Physics Education Research 20(1), 010152 (2024)
Wan, T., Chen, Z.: Exploring generative AI assisted feedback writing for students’ written responses to a physics conceptual question with prompt engineering and few-shot learning. Physical Review Physics Education Research 20(1), 010152 (2024)
work page 2024
-
[23]
Journal of Computer Assisted Learning 41(3), 1-20 (2025)
Tang, Q., Deng, W., Huang, Y., Wang, S., Zhang, H.: Can Generative Artificial Intelligence be a Good Teaching Assistant? An Empirical Analysis Based on Generative AI -Assisted Teaching. Journal of Computer Assisted Learning 41(3), 1-20 (2025)
work page 2025
-
[24]
De La Roca, M., Chan, M. M., Garcia‐Cabot, A., Garcia‐Lopez, E., Amado‐Salvatierra, H.: The impact of a chatbot working as an assistant in a course for supporting student learning and engagement. Computer Applications in Engineering Education 32(5), (2024) Comparing Custom and General-Purpose GAI Chatbots on Problem Solving 15
work page 2024
-
[25]
Education and Information Technologies 30, 1-32 (2025)
Min, T., Lee, B., Jho, H.: Integrating generative artificial intelligence in the design of scien- tific inquiry for middle school students. Education and Information Technologies 30, 1-32 (2025)
work page 2025
-
[26]
A.: Generative artificial intelligence and extended cognition in science learning contexts
Rivera-Novoa, A., Arias, D. A.: Generative artificial intelligence and extended cognition in science learning contexts. Science & Education, 1-22 (2025)
work page 2025
-
[27]
Tang, K., Putra, G. B. S.: Generative AI as a Dialogic Partner: Enhancing Multiple Perspec- tives, Reasoning, and Argumentation in Science Education with Customized Chatbots. Jour- nal of Science Education and Technology, (2025)
work page 2025
-
[28]
Ng, D. T. K., Tan, C. W., Leung, J. K. L.: Empowering student self‐regulated learning and science education through ChatGPT: A pioneering pilot study. British Journal of Educa- tional Technology 55(4), 1328–1353 (2024)
work page 2024
-
[29]
She, H., Cheng, M., Li, T., Wang, C., Chiu, H., Lee, P., Chou, W., Chuang, M. Web -based undergraduate chemistry problem -solving: The interplay of task performance, domain knowledge and web-searching strategies. Computers & Education 59(2), 750–761 (2012)
work page 2012
-
[30]
British Journal of Educational Technology, (2025)
Dang, B., Huynh, L., Gul, F., Rosé , C., Jä rvelä , S., Nguyen, A.: Human–AI collaborative learning in mixed reality: Examining the cognitive and socio‐emotional interactions. British Journal of Educational Technology, (2025)
work page 2025
-
[31]
British Journal of Educational Technology, (2025)
Feng, S.: Group interaction patterns in generative AI‐supported collaborative problem solv- ing: Network analysis of the interactions among students and a GAI chatbot. British Journal of Educational Technology, (2025)
work page 2025
-
[32]
Learning and Instruction 43, 39-51 (2016)
Jä rvelä , S., Jä rvenoja, H., Malmberg, J., Isohä tä lä , J., Sobocinski, M.: How do types of in- teraction and phases of self -regulated learning set a stage for collaborative engagement?. Learning and Instruction 43, 39-51 (2016)
work page 2016
-
[33]
Educational and Psychologica l Measurement 20(1), 37–46 (1960)
Cohen, J.: A coefficient of agreement for nominal scales. Educational and Psychologica l Measurement 20(1), 37–46 (1960)
work page 1960
-
[34]
Feng, S., He, B., Gasevic, D., Kirkley, A.: Heterogeneous Interaction Network Analysis (HINA): A New Learning Analytics Approach for Modelling, Analyzing, and Visualizing Complex Interactions in Learning Processes. arXiv preprint arXiv:2601.06771, (2026)
work page internal anchor Pith review Pith/arXiv arXiv 2026
-
[35]
Journal of Open Source Software 10(111), 8299 (2025)
Feng, S., He, B., Kirkley, A.: HINA: A Learning Analytics Tool for Heterogenous Interac- tion Network Analysis in Python. Journal of Open Source Software 10(111), 8299 (2025)
work page 2025
-
[36]
Australasian Journal of Educational Technology, (2025)
Li, S., Liu, J., Dong, Q.: Generative artificial i ntelligence-supported programming educa- tion: Effects on learning performance, self -efficacy and processes. Australasian Journal of Educational Technology, (2025)
work page 2025
-
[37]
Chen, X., Ruan, K., Ju, K. P., Yap, N., Wang, X.: More ai assistance reduces cognitive engagement: Examining the ai assistance dilemma in ai-supported note-taking. Proceedings of the ACM on Human-Computer Interaction 9(7), 1-29 (2025)
work page 2025
-
[38]
Fan, Y., Tang, L., Le, H., Shen, K., Tan, S., Zhao, Y., ... Gašević, D.: Beware of metacog- nitive laziness: Effects of generative artificial intelligence on learning motivation, processes, and performance. British Journal of Educational Technology 56(2), 489-530 (2025)
work page 2025
-
[39]
Riabko, A. V., Vakaliuk, T. A.: Physics on autopilot: exploring the use of an AI assistant for independent problem-solving practice. Educational Technology Quarterly 2024(1), 56-75 (2024)
work page 2024
-
[40]
Computers & Education, 105494 (2025)
Xi, L., Zhang, Y., Wang, Q.: Investigating the effects of an LLM -based Socratic conversa- tional agent on students’ academic performance and reflective thinking in higher education. Computers & Education, 105494 (2025)
work page 2025
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.