Modularizing Educational LLM-Agency for Fostering Responsible Learning Assistance
Pith reviewed 2026-06-29 07:23 UTC · model grok-4.3
The pith
Modularizing an LLM agent into stage-specific components lets pedagogical rules be injected at each step of exercise solving.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
An agentic architecture that splits exercise-solving assistance into separate modules for different stages enables the direct incorporation of pedagogical constraints, producing guidance that is more controllable, transparent, and overseeable than monolithic LLM chatbots.
What carries the argument
Stage-specific modules within an agentic architecture that each embed targeted pedagogical advice during exercise solving.
If this is right
- Guidance at each stage can discourage direct answers in favor of hints that preserve student effort.
- Educators gain the ability to inspect or modify rules inside individual modules without retraining the whole system.
- Risks such as over-reliance on AI or loss of creativity can be addressed at the module level rather than through prompt engineering alone.
- The same modular skeleton can accept new pedagogical rules for different subjects or age groups.
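The consequences listed above can be made concrete with a small sketch. It is not the paper's implementation: the stage names, the `StageModule` class, the rule strings, and the `echo_llm` stand-in are all assumptions introduced here for illustration, since the reviewed manuscript does not specify its modules. The sketch only shows the structural idea that each stage holds its own editable rule list.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical stage names; the paper's actual module list is not given in this review.
STAGES = ["understand_task", "plan_approach", "work_step", "reflect"]


@dataclass
class StageModule:
    """One stage of exercise solving with its own pedagogical rules."""
    name: str
    rules: List[str] = field(default_factory=list)

    def build_prompt(self, student_message: str) -> str:
        # The rules of this stage only are injected into the prompt.
        rule_text = "\n".join(f"- {r}" for r in self.rules)
        return f"[stage: {self.name}]\nRules:\n{rule_text}\nStudent: {student_message}"


def make_agent() -> Dict[str, StageModule]:
    """Build one module per stage and attach illustrative rules."""
    agent = {name: StageModule(name) for name in STAGES}
    agent["work_step"].rules.append("Give a hint, not the final answer.")
    agent["reflect"].rules.append("Ask the student to explain why the result is plausible.")
    return agent


def respond(agent: Dict[str, StageModule], stage: str,
            student_message: str, llm: Callable[[str], str]) -> str:
    """Route the message through the module for the given stage."""
    return llm(agent[stage].build_prompt(student_message))


def echo_llm(prompt: str) -> str:
    # Stand-in for a real model call; it returns the prompt unchanged.
    return prompt


agent = make_agent()
# An educator edits one module without touching the others.
agent["work_step"].rules.append("Use vocabulary suitable for ages 10 to 12.")
print(respond(agent, "work_step", "What is 3/4 + 1/8?", echo_llm))
```

Because the rules live in plain per-stage lists, inspecting or swapping them for another subject or age group needs no retraining. Whether a real model then follows them is a separate question.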
Where Pith is reading between the lines
- The approach could be tested by measuring whether modular systems maintain pedagogical fidelity better than monolithic ones across multi-turn conversations.
- If successful, the design points toward reusable module libraries that different educational institutions could adapt without building full agents from scratch.
- It raises the further question of how to validate that module-level rules actually produce the intended learning outcomes rather than just sounding pedagogical.
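A minimal sketch of the first test idea, measuring pedagogical fidelity per turn, is given here. Everything in it is an assumption made for illustration: the fidelity criterion (the reply must not contain the final answer verbatim), the function names, and the toy transcripts, which are invented inputs to exercise the function and not results from any system.

```python
from typing import List


def reveals_answer(reply: str, answer: str) -> bool:
    """Crude check: does the reply contain the final answer verbatim?"""
    return answer.lower() in reply.lower()


def fidelity_by_turn(conversations: List[List[str]], answer: str) -> List[float]:
    """Share of conversations whose reply at each turn index withholds the answer."""
    max_turns = max(len(c) for c in conversations)
    rates = []
    for t in range(max_turns):
        # Only conversations that reached turn t are counted at that turn.
        replies = [c[t] for c in conversations if len(c) > t]
        kept = sum(not reveals_answer(r, answer) for r in replies)
        rates.append(kept / len(replies))
    return rates


# Invented assistant replies for the exercise 3/4 + 1/8 = 7/8 (toy data, not findings).
toy_system_a = [
    ["What do the denominators have in common?", "Try rewriting 3/4 in eighths."],
    ["Which denominator could both fractions share?", "How many eighths is 3/4?"],
]
toy_system_b = [
    ["Think about common denominators.", "The answer is 7/8."],
    ["Try converting to eighths.", "Good question, what do you get?"],
]

print(fidelity_by_turn(toy_system_a, "7/8"))  # [1.0, 1.0]
print(fidelity_by_turn(toy_system_b, "7/8"))  # [1.0, 0.5]
```

A verbatim string match is a weak proxy. It would miss paraphrased answers, and it says nothing about learning outcomes, which is the validation gap raised in the last point above.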
Load-bearing premise
Pedagogical principles can be turned into concrete constraints inside separate modules while keeping the overall interaction coherent and helpful.
What would settle it
A controlled study in which students using the modular system show no measurable gains in critical-thinking or transfer skills compared with students using a standard monolithic LLM tutor.
Original abstract
The widespread adoption of AI chatbots in education will drastically change learning, making responsible deployment a critical concern. While large language models (LLMs) might have access to sources discussing insights from educational sciences, they are not particularly inclined to adhere to pedagogical concepts, risking negative effects on the learning process, such as a loss of transfer capabilities, critical thinking, or creativity. In this paper, we introduce an agentic AI chatbot architecture assisting students with exercise solving, specifically designed to contribute to more responsible AI use in education. We base our conceptual development on the identification of several desiderata for responsible LLM-based educational systems, argue for the structural shortcomings inherent in monolithic, out-of-the-box solutions, and instead suggest modularizing the agentic architecture. We propose specific modules for different stages of exercise solving, enabling incorporation of targeted pedagogical advice, guiding students through the learning process in a more controllable, transparent, and overseeable manner.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes a modular agentic architecture for LLM-based educational chatbots to assist with exercise solving. It identifies desiderata for responsible AI in education, critiques monolithic LLMs for failing to adhere to pedagogical concepts (risking loss of transfer, critical thinking, and creativity), and outlines stage-specific modules that purportedly enable targeted incorporation of educational science principles for more controllable, transparent, and overseeable guidance.
Significance. If the modular design can be realized with enforceable constraints, it could supply a reusable blueprint for aligning LLM agents with pedagogical goals in education, addressing a timely concern as AI chatbots proliferate. The contribution is primarily conceptual, resting on the definitional claim that decomposition by exercise-solving stage facilitates constraint injection; no empirical results or implementations are provided to demonstrate realized benefits.
major comments (2)
- [Abstract] Abstract and conceptual development section: the central claim that modularization overcomes the 'structural shortcomings inherent in monolithic, out-of-the-box solutions' and enables 'targeted pedagogical advice' is presented without any concrete specification of module interfaces, constraint mechanisms, or information flow between stages. This makes the controllability and transparency benefits definitional rather than demonstrated.
- [conceptual development] The translation of pedagogical concepts into module-level constraints (the weakest assumption noted in the design) is asserted but not illustrated with even a single worked example of how, e.g., a 'critical thinking' desideratum would be encoded in a particular stage module without coherence loss. This is load-bearing for the claim of responsible learning assistance.
minor comments (1)
- The manuscript would benefit from an explicit list or diagram of the proposed modules and their inputs/outputs to make the architecture reproducible from the text.
Simulated Author's Rebuttal
We thank the referee for the detailed and constructive report. The comments correctly identify that the current manuscript remains at a high level of abstraction. We will revise to add the requested concreteness while preserving the paper's conceptual focus.
- Referee: [Abstract] Abstract and conceptual development section: the central claim that modularization overcomes the 'structural shortcomings inherent in monolithic, out-of-the-box solutions' and enables 'targeted pedagogical advice' is presented without any concrete specification of module interfaces, constraint mechanisms, or information flow between stages. This makes the controllability and transparency benefits definitional rather than demonstrated.
  Authors: We agree that the absence of explicit module interfaces and information-flow descriptions leaves the claimed advantages at the definitional level. In the revision we will insert a new subsection that defines the input/output signatures of each stage module, the format in which pedagogical constraints are injected, and the protocol by which stage outputs are passed to the next module.
  Revision: yes
- Referee: [conceptual development] The translation of pedagogical concepts into module-level constraints (the weakest assumption noted in the design) is asserted but not illustrated with even a single worked example of how, e.g., a 'critical thinking' desideratum would be encoded in a particular stage module without coherence loss. This is load-bearing for the claim of responsible learning assistance.
  Authors: We accept that a concrete worked example is necessary to substantiate the translation step. The revised manuscript will include at least one fully elaborated example showing how the desideratum of critical thinking is operationalized as a constraint within a designated stage module, including the prompt template, verification step, and mechanism for preserving coherence with adjacent stages.
  Revision: yes
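To show what the promised additions might look like, the following is a rough sketch of a stage interface, a prompt template, and a verification step for a critical-thinking constraint. It is not taken from the manuscript or the rebuttal. `StageState`, `PROMPT_TEMPLATE`, `CRITICAL_THINKING`, `verify`, `run_stage`, the stage name `check_work`, and the stubbed model are all hypothetical, and the check used (the reply must end with a question) is a deliberately crude stand-in for a real verifier.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class StageState:
    """Hypothetical record passed from one stage module to the next."""
    exercise: str
    student_message: str
    history: List[str]


PROMPT_TEMPLATE = (
    "You are the '{stage}' module of a tutoring agent.\n"
    "Constraint: {constraint}\n"
    "Exercise: {exercise}\n"
    "Student: {student_message}\n"
)

# Assumed encoding of the critical-thinking desideratum as a single constraint.
CRITICAL_THINKING = "End with one question that asks the student to justify or check a step."


def verify(reply: str) -> bool:
    """Crude verification step: the reply must end with a question."""
    return reply.strip().endswith("?")


def run_stage(stage: str, state: StageState, llm: Callable[[str], str],
              max_retries: int = 2) -> StageState:
    """Inject the constraint, verify the reply, retry, then fall back."""
    prompt = PROMPT_TEMPLATE.format(
        stage=stage, constraint=CRITICAL_THINKING,
        exercise=state.exercise, student_message=state.student_message,
    )
    for _ in range(max_retries + 1):
        reply = llm(prompt)
        if verify(reply):
            break
    else:
        # No attempt passed verification: use a fixed, constraint-compliant reply.
        reply = "Which step are you least sure about, and why?"
    # The full history travels with the state so the next stage can stay coherent.
    return StageState(state.exercise, state.student_message, state.history + [reply])


# Stubbed model: the first reply violates the constraint, the second satisfies it.
_replies = iter([
    "The slope is 2.",
    "You found slope 2. How could you check it with a second point?",
])


def stub_llm(prompt: str) -> str:
    return next(_replies)


state = StageState("Find the slope through (0,1) and (2,5).", "I think it's 2.", [])
new_state = run_stage("check_work", state, stub_llm)
print(new_state.history[-1])
```

Passing one explicit state object between stages is one possible answer to the referee's information-flow question. It makes each hand-off inspectable, at the cost of requiring every module to agree on the record's fields.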
Circularity Check
No significant circularity
The manuscript is a purely conceptual design proposal. It states desiderata for responsible educational LLMs, notes limitations of monolithic models, and outlines a modular architecture with stage-specific modules. No equations, fitted parameters, predictions, or uniqueness theorems appear. All central claims are definitional to the proposed design (i.e., the architecture is defined to incorporate pedagogical constraints) rather than derived from or reducing to prior fitted results or self-citations. The paper is therefore self-contained against external benchmarks and receives the default non-circularity finding.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption: Monolithic out-of-the-box LLMs are structurally insufficient for responsible educational use because they do not reliably adhere to pedagogical concepts.
- domain assumption: Pedagogical concepts from educational sciences can be incorporated via targeted module-level advice without compromising overall system coherence.