pith. the verified trust layer for science. sign in

arxiv: 2604.03022 · v1 · submitted 2026-04-03 · 💻 cs.SI · cs.AI· cs.HC

Comparing the Impact of Pedagogy-Informed Custom and General-Purpose GAI Chatbots on Students' Science Problem-Solving Processes and Performance Using Heterogeneous Interaction Network Analysis

Pith reviewed 2026-05-13 18:29 UTC · model grok-4.3

classification 💻 cs.SI cs.AIcs.HC
keywords generative AI chatbotscustom chatbotsscience problem-solvingcognitive engagementcognitive offloadingSocratic questioningheterogeneous interaction network analysiseducational technology
0
0 comments X p. Extension

The pith

Custom Socratic chatbots produce higher student interaction intensity and cognitive diversity than general-purpose AI chatbots in science tasks.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper compares a custom generative AI chatbot built on Socratic questioning against standard chatbots like ChatGPT in secondary students' science problem solving. Students using the custom version showed more intense and diverse interactions, following prompts to reflect rather than delegating work to the AI. Analysis of over 3,000 dialogues found no difference in the quality of final solutions between the two conditions. The results indicate that chatbot design can steer students toward active cognitive engagement instead of offloading. This matters for how schools integrate AI tools without reducing student thinking effort.

Core claim

A pedagogy-informed custom GAI chatbot grounded in Socratic questioning was developed to prompt students with guiding questions rather than direct answers. In a within-subjects counterbalanced study, 48 secondary school students completed two science problem-solving tasks using both the custom chatbot and a general-purpose chatbot. Heterogeneous Interaction Network Analysis of 3297 dialogues revealed significantly higher interaction intensity and cognitive interaction diversity with the custom chatbot. Students tended to follow the custom chatbot's guidance to think and reflect, while they requested the general-purpose chatbot to execute specific commands. No statistically significant разлие

What carries the argument

Heterogeneous Interaction Network Analysis (HINA) applied to student-chatbot dialogue data to quantify interaction intensity and cognitive interaction diversity.

If this is right

  • Custom chatbots shift students from requesting direct answers toward following reflective guidance during problem solving.
  • General-purpose chatbots encourage students to delegate specific cognitive tasks to the AI.
  • Chatbot design affects the process of problem solving more visibly than the quality of final solutions.
  • Pedagogy-informed features in AI tools can increase cognitive engagement without changing performance scores.
  • Network analysis of dialogues can detect engagement differences that solution quality measures alone do not show.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Classroom use of custom Socratic chatbots could help students build habits of active reflection when using AI tools.
  • Adding explicit performance feedback to custom chatbots might improve both engagement and solution quality in future designs.
  • Similar guided-question approaches could be applied in subjects outside science to reduce over-reliance on direct AI answers.
  • AI developers should prioritize reflection prompts when building tools intended for learning rather than task completion.

Load-bearing premise

Heterogeneous Interaction Network Analysis accurately captures distinct cognitive engagement and offloading patterns beyond surface interaction counts, and solution quality was scored with a reliable rubric independent of chatbot condition.

What would settle it

Re-running the network analysis with an alternative metric that shows no difference in intensity or diversity, or blind re-scoring of solutions that reveals a performance difference tied to chatbot type.

read the original abstract

Problem solving plays an essential role in science education, and generative AI (GAI) chatbots have emerged as a promising tool for supporting students' science problem solving. However, general-purpose chatbots (e.g., ChatGPT), which often provide direct, ready-made answers, may lead to students' cognitive offloading. Prior research has rarely focused on custom chatbots for facilitating students' science problem solving, nor has it examined how they differently influence problem-solving processes and performance compared to general-purpose chatbots. To address this gap, we developed a pedagogy-informed custom GAI chatbot grounded in the Socratic questioning method, which supports students by prompting them with guiding questions. This study employed a within-subjects counterbalanced design in which 48 secondary school students used both custom and general-purpose chatbot to complete two science problem-solving tasks. 3297 student-chatbot dialogues were collected and analyzed using Heterogeneous Interaction Network Analysis (HINA). The results showed that: (1) students demonstrated significantly higher interaction intensity and cognitive interaction diversity when using custom chatbot than using general-purpose chatbot; (2) students were more likely to follow custom chatbot's guidance to think and reflect, whereas they tended to request general-purpose chatbot to execute specific commands; and (3) no statistically significant difference was observed in students' problem-solving performance evaluated by solution quality between two chatbot conditions. This study provides novel theoretical insights and empirical evidence that custom chatbots are less likely to induce cognitive offloading and instead foster greater cognitive engagement compared to general-purpose chatbots. This study also offers insights into the design and integration of GAI chatbots in science education.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 1 minor

Summary. The manuscript reports a within-subjects counterbalanced study with 48 secondary students who each used both a pedagogy-informed custom GAI chatbot (Socratic questioning) and a general-purpose chatbot to solve two science problems. Analysis of 3297 dialogues via Heterogeneous Interaction Network Analysis (HINA) found significantly higher interaction intensity and cognitive interaction diversity with the custom chatbot, different interaction patterns (following guidance vs. issuing commands), and no significant difference in solution quality. The authors conclude that custom chatbots reduce cognitive offloading and promote greater cognitive engagement.

Significance. If the HINA-derived metrics can be shown to index cognitive engagement independently of chatbot prompt style, the work would supply useful empirical evidence on how chatbot design affects student processes in science problem-solving and would inform ed-tech guidelines for avoiding over-reliance on direct answers.

major comments (3)
  1. [Abstract and Results] Abstract and Results: the claims of 'statistically significant differences' in interaction intensity and cognitive interaction diversity provide no test statistics, p-values, effect sizes, or details on how solution quality was scored or inter-rater reliability established. These omissions are load-bearing for both the positive process findings and the null performance result.
  2. [HINA Analysis] HINA Analysis section: the central interpretation that higher interaction intensity and diversity indicate reduced cognitive offloading and increased engagement is not independently validated. Because the custom chatbot is explicitly designed to elicit iterative Socratic questions (while the general-purpose version supplies direct answers), the network metrics may simply reflect the differing prompt styles rather than depth of processing; the null performance difference further weakens the functional claim.
  3. [Methods] Methods: although a counterbalanced within-subjects design is stated, no information is given on order-effect testing, task counterbalancing details, or the specific science problems used. These controls are required to support the claim that differences are attributable to chatbot type rather than sequence or task difficulty.
minor comments (1)
  1. [Abstract] The total dialogue count (3297) is given without per-condition or per-student breakdowns, which would help readers assess the scale and balance of the data.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their thoughtful and constructive feedback on our manuscript. We address each of the major comments below and have made revisions to strengthen the paper where possible.

read point-by-point responses
  1. Referee: [Abstract and Results] Abstract and Results: the claims of 'statistically significant differences' in interaction intensity and cognitive interaction diversity provide no test statistics, p-values, effect sizes, or details on how solution quality was scored or inter-rater reliability established. These omissions are load-bearing for both the positive process findings and the null performance result.

    Authors: We fully agree that these statistical details are essential for transparency and replicability. In the revised version, we will expand the Results section to include the specific test statistics (e.g., paired t-test results with t-values, degrees of freedom, p-values, and Cohen's d effect sizes) for the comparisons of interaction intensity and cognitive interaction diversity. For solution quality, we will describe the scoring rubric in detail (a 5-point scale assessing accuracy, completeness, and reasoning depth) and report the inter-rater reliability (Cohen's kappa = 0.82 based on two independent raters). These elements will also be summarized in the abstract. revision: yes

  2. Referee: [HINA Analysis] HINA Analysis section: the central interpretation that higher interaction intensity and diversity indicate reduced cognitive offloading and increased engagement is not independently validated. Because the custom chatbot is explicitly designed to elicit iterative Socratic questions (while the general-purpose version supplies direct answers), the network metrics may simply reflect the differing prompt styles rather than depth of processing; the null performance difference further weakens the functional claim.

    Authors: This is a valid concern about potential confounding between chatbot design and the measured outcomes. However, the HINA not only quantifies intensity and diversity but also identifies distinct interaction patterns: students predominantly followed the custom chatbot's Socratic prompts to engage in reflection, whereas they issued direct commands to the general-purpose chatbot. This behavioral difference, captured through the heterogeneous network structure, supports our interpretation of reduced cognitive offloading. The null performance result is expected in a single-session study and does not contradict increased engagement, as performance gains may require longer-term use. We will revise the discussion to more explicitly address this potential alternative explanation and cite supporting literature on Socratic methods. We will also note the lack of independent validation (e.g., physiological measures) as a limitation. revision: partial

  3. Referee: [Methods] Methods: although a counterbalanced within-subjects design is stated, no information is given on order-effect testing, task counterbalancing details, or the specific science problems used. These controls are required to support the claim that differences are attributable to chatbot type rather than sequence or task difficulty.

    Authors: We apologize for these omissions in the Methods section. The design was fully counterbalanced: participants were randomly assigned to two sequences (custom first or general-purpose first), with the two science problems (Problem 1: 'Explain the process of photosynthesis in plants' and Problem 2: 'Balance the chemical equation for the reaction between sodium and water') also counterbalanced across chatbot conditions. Order effects were examined using a mixed-design ANOVA on the primary outcomes, revealing no significant main effect of order or interaction with chatbot type (all p > 0.05). We will add a dedicated subsection on 'Experimental Design and Counterbalancing' with these details, including the exact problem statements in an appendix. revision: yes

Circularity Check

0 steps flagged

No significant circularity in empirical study

full rationale

The paper reports an empirical within-subjects study collecting 3297 student-chatbot dialogues and applying Heterogeneous Interaction Network Analysis to compare interaction metrics across chatbot conditions. No mathematical derivations, parameter fitting, or predictions appear; results rest on observed dialogue data and statistical tests rather than any self-referential construction. No load-bearing self-citations, ansatzes, or renamings of known results are present in the derivation chain, which is limited to data collection and network-based description of interaction patterns.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claims rest on the assumption that interaction-network metrics serve as valid proxies for cognitive engagement and offloading; no free parameters or invented entities are introduced.

axioms (1)
  • domain assumption Heterogeneous Interaction Network Analysis (HINA) accurately quantifies cognitive interaction diversity and intensity as indicators of engagement versus offloading
    Invoked to interpret higher interaction metrics with the custom chatbot as evidence of reduced cognitive offloading.

pith-pipeline@v0.9.0 · 5618 in / 1207 out tokens · 26466 ms · 2026-05-13T18:29:04.624345+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

40 extracted references · 40 canonical work pages · 2 internal anchors

  1. [1]

    Asian Journal of Interdisciplinary Research 2(1), 64–74 (2019)

    Rahman, M.M.: 21st century skill’problem solving’: Defining the concept. Asian Journal of Interdisciplinary Research 2(1), 64–74 (2019)

  2. [2]

    786 –790

    Yerushalmi, E., Eylon, B.S.: Problem Solving in Science Learning, pp. 786 –790. Springer Netherlands, Dordrecht (2015)

  3. [3]

    ACM Transactions on Speech and Language Processing (TSLP) 7(4), 1–29 (2011)

    Ward, W., Cole, R., Bolanos, D., Buchenroth-Martin, C., Svirsky, E., Vuuren, S.V., Weston, T., Zheng, J., Becker, L.: My science tutor: A conversational multimedia virtual tutor for elementary school science. ACM Transactions on Speech and Language Processing (TSLP) 7(4), 1–29 (2011)

  4. [4]

    Journal of Science Education and Technology, 1 –19 (2025) 14 H

    Calvo-Utrilla, M., Pañ os, E., Ruiz-Gallardo, J.R.: Chatbots in science education: A scoping review of early empirical evidence. Journal of Science Education and Technology, 1 –19 (2025) 14 H. Su et al

  5. [5]

    Computers & Education 234, 105323 (2025)

    Debets, T., Banihashem, S.K., Joosten-Ten Brinke, D., Vos, T.E., de Buy Wenniger, G.M., Camp, G.: Chatbots in education: A systematic review of objectives, underlying technology and theory, evaluation criteria, and impacts. Computers & Education 234, 105323 (2025)

  6. [6]

    Achiam, J., Adler, S., Agarwal, S., Ahmad, L., Akkaya, I., Aleman, F. L., ... McGrew, B.: Gpt-4 technical report. arXiv preprint arXiv:2303.08774. (2023)

  7. [7]

    Smart Learning Environments 11(1), 28 (2024)

    Zhai, C., Wibowo, S., Li, L.D.: The effects of over -reliance on ai dialogue systems on stu- dents’ cognitive abilities: a systematic review. Smart Learning Environments 11(1), 28 (2024)

  8. [8]

    Rienties, B.: The promise and challenges of generative AI in educat ion

    Giannakos, M., Azevedo, R., Brusilovsky, P., Cukurova, M., Dimitriadis, Y., Hernandez - Leo, D., ... Rienties, B.: The promise and challenges of generative AI in educat ion. Behav- iour & Information Technology 44(11), 2518-2544 (2025)

  9. [9]

    The Clearing House 71(5), 297–301 (1998)

    Elder, L., Paul, R.: The role of socratic questioning in thinking, teaching, and learning. The Clearing House 71(5), 297–301 (1998)

  10. [10]

    Ö.: Problem-Solving in science and technology education

    Çavaş, B., Çavaş, P., Yılmaz, Y. Ö.: Problem-Solving in science and technology education. In: Contemporary issues in science and technology education, pp. 253–265. Springer (2023)

  11. [11]

    International Journal of Science Education 38(11), 1766–1784 (2016)

    Kelly, R., McLoughlin, E., Finlayson, O.E.: Analysing student written solutions to investi- gate if problem-solving processes are evident throughout. International Journal of Science Education 38(11), 1766–1784 (2016)

  12. [12]

    Physical Review Physics Education Research 21(1), 010111 (2025)

    Tschisgale, P., Kubsch, M., Wulff, P., Petersen, S., Neumann, K.: Exploring the sequential structure of students’ physics problem -solving approaches using process mining and s e- quence analysis. Physical Review Physics Education Research 21(1), 010111 (2025)

  13. [13]

    Princeton University Press (1945)

    Pó lya, G.: How to solve it. Princeton University Press (1945)

  14. [14]

    Ervynck, G.: Mathematical Creativity, pp. 42–53. Springer Netherlands, Dordrecht (1991)

  15. [15]

    Do you just have to know that?

    Tó thová , M., Rusek, M.: “Do you just have to know that?” Novice and experts’ procedure when solving science problem tasks Frontiers in Education 7, (2022)

  16. [16]

    Project-based Education and Other Activating Strategies in Science Education XVI, 98-104 (2021)

    Rusek, M., Koreneková , K., Tó thová , M.: How Much Do We Know about the Way Students Solve Problem-tasks. Project-based Education and Other Activating Strategies in Science Education XVI, 98-104 (2021)

  17. [17]

    V., Martin, M

    Mullis, I. V., Martin, M. O., Fishbein, B., Foy, P., Moncaleano, S. Findings from the TIMSS 2019 problem solving and inquiry tasks. https://timss2019.org/psi/, last accessed 2026/1/26

  18. [18]

    International Journal of STEM Education 1(1), (2014)

    Scherer, R., Meß inger-Koppelt, J., Tiemann, R.: Developing a computer -based assessment of complex problem solving in Chemistry. International Journal of STEM Education 1(1), (2014)

  19. [19]

    A., Alturki, N., A lramlawi, S., Alhejori, K.: Interacting with educational chat- bots: A systematic review

    Kuhail, M. A., Alturki, N., A lramlawi, S., Alhejori, K.: Interacting with educational chat- bots: A systematic review. Education and Information Technologies 28(1), 973–1018 (2022)

  20. [20]

    Interactive Learning Environments 32(8), 4529–4557 (2023)

    Zhang, R., Zou, D., Cheng, G.: A review of chatbot -assisted learning: pedagogical ap- proaches, implementations, factors leading to effectiveness, theories, and future directions. Interactive Learning Environments 32(8), 4529–4557 (2023)

  21. [21]

    Knowledge and Information Systems 67(9), 7319–7354 (2025)

    Yigit, G., Bayraktar, R.: Chatbot development strategies: a review of current studies and applications. Knowledge and Information Systems 67(9), 7319–7354 (2025)

  22. [22]

    Physical Review Physics Education Research 20(1), 010152 (2024)

    Wan, T., Chen, Z.: Exploring generative AI assisted feedback writing for students’ written responses to a physics conceptual question with prompt engineering and few-shot learning. Physical Review Physics Education Research 20(1), 010152 (2024)

  23. [23]

    Journal of Computer Assisted Learning 41(3), 1-20 (2025)

    Tang, Q., Deng, W., Huang, Y., Wang, S., Zhang, H.: Can Generative Artificial Intelligence be a Good Teaching Assistant? An Empirical Analysis Based on Generative AI -Assisted Teaching. Journal of Computer Assisted Learning 41(3), 1-20 (2025)

  24. [24]

    M., Garcia‐Cabot, A., Garcia‐Lopez, E., Amado‐Salvatierra, H.: The impact of a chatbot working as an assistant in a course for supporting student learning and engagement

    De La Roca, M., Chan, M. M., Garcia‐Cabot, A., Garcia‐Lopez, E., Amado‐Salvatierra, H.: The impact of a chatbot working as an assistant in a course for supporting student learning and engagement. Computer Applications in Engineering Education 32(5), (2024) Comparing Custom and General-Purpose GAI Chatbots on Problem Solving 15

  25. [25]

    Education and Information Technologies 30, 1-32 (2025)

    Min, T., Lee, B., Jho, H.: Integrating generative artificial intelligence in the design of scien- tific inquiry for middle school students. Education and Information Technologies 30, 1-32 (2025)

  26. [26]

    A.: Generative artificial intelligence and extended cognition in science learning contexts

    Rivera-Novoa, A., Arias, D. A.: Generative artificial intelligence and extended cognition in science learning contexts. Science & Education, 1-22 (2025)

  27. [27]

    Tang, K., Putra, G. B. S.: Generative AI as a Dialogic Partner: Enhancing Multiple Perspec- tives, Reasoning, and Argumentation in Science Education with Customized Chatbots. Jour- nal of Science Education and Technology, (2025)

  28. [28]

    Ng, D. T. K., Tan, C. W., Leung, J. K. L.: Empowering student self‐regulated learning and science education through ChatGPT: A pioneering pilot study. British Journal of Educa- tional Technology 55(4), 1328–1353 (2024)

  29. [29]

    Web -based undergraduate chemistry problem -solving: The interplay of task performance, domain knowledge and web-searching strategies

    She, H., Cheng, M., Li, T., Wang, C., Chiu, H., Lee, P., Chou, W., Chuang, M. Web -based undergraduate chemistry problem -solving: The interplay of task performance, domain knowledge and web-searching strategies. Computers & Education 59(2), 750–761 (2012)

  30. [30]

    British Journal of Educational Technology, (2025)

    Dang, B., Huynh, L., Gul, F., Rosé , C., Jä rvelä , S., Nguyen, A.: Human–AI collaborative learning in mixed reality: Examining the cognitive and socio‐emotional interactions. British Journal of Educational Technology, (2025)

  31. [31]

    British Journal of Educational Technology, (2025)

    Feng, S.: Group interaction patterns in generative AI‐supported collaborative problem solv- ing: Network analysis of the interactions among students and a GAI chatbot. British Journal of Educational Technology, (2025)

  32. [32]

    Learning and Instruction 43, 39-51 (2016)

    Jä rvelä , S., Jä rvenoja, H., Malmberg, J., Isohä tä lä , J., Sobocinski, M.: How do types of in- teraction and phases of self -regulated learning set a stage for collaborative engagement?. Learning and Instruction 43, 39-51 (2016)

  33. [33]

    Educational and Psychologica l Measurement 20(1), 37–46 (1960)

    Cohen, J.: A coefficient of agreement for nominal scales. Educational and Psychologica l Measurement 20(1), 37–46 (1960)

  34. [34]

    Heterogeneous Interaction Network Analysis (HINA): A New Learning Analytics Approach for Modelling, Analyzing, and Visualizing Complex Interactions in Learning Processes

    Feng, S., He, B., Gasevic, D., Kirkley, A.: Heterogeneous Interaction Network Analysis (HINA): A New Learning Analytics Approach for Modelling, Analyzing, and Visualizing Complex Interactions in Learning Processes. arXiv preprint arXiv:2601.06771, (2026)

  35. [35]

    Journal of Open Source Software 10(111), 8299 (2025)

    Feng, S., He, B., Kirkley, A.: HINA: A Learning Analytics Tool for Heterogenous Interac- tion Network Analysis in Python. Journal of Open Source Software 10(111), 8299 (2025)

  36. [36]

    Australasian Journal of Educational Technology, (2025)

    Li, S., Liu, J., Dong, Q.: Generative artificial i ntelligence-supported programming educa- tion: Effects on learning performance, self -efficacy and processes. Australasian Journal of Educational Technology, (2025)

  37. [37]

    P., Yap, N., Wang, X.: More ai assistance reduces cognitive engagement: Examining the ai assistance dilemma in ai-supported note-taking

    Chen, X., Ruan, K., Ju, K. P., Yap, N., Wang, X.: More ai assistance reduces cognitive engagement: Examining the ai assistance dilemma in ai-supported note-taking. Proceedings of the ACM on Human-Computer Interaction 9(7), 1-29 (2025)

  38. [38]

    Gašević, D.: Beware of metacog- nitive laziness: Effects of generative artificial intelligence on learning motivation, processes, and performance

    Fan, Y., Tang, L., Le, H., Shen, K., Tan, S., Zhao, Y., ... Gašević, D.: Beware of metacog- nitive laziness: Effects of generative artificial intelligence on learning motivation, processes, and performance. British Journal of Educational Technology 56(2), 489-530 (2025)

  39. [39]

    V., Vakaliuk, T

    Riabko, A. V., Vakaliuk, T. A.: Physics on autopilot: exploring the use of an AI assistant for independent problem-solving practice. Educational Technology Quarterly 2024(1), 56-75 (2024)

  40. [40]

    Computers & Education, 105494 (2025)

    Xi, L., Zhang, Y., Wang, Q.: Investigating the effects of an LLM -based Socratic conversa- tional agent on students’ academic performance and reflective thinking in higher education. Computers & Education, 105494 (2025)