A Scoping Review of Large Language Model-Based Pedagogical Agents
Pith reviewed 2026-05-10 14:40 UTC · model grok-4.3
The pith
LLM-based pedagogical agents in education are characterized by four design dimensions: interaction approach, domain scope, role complexity, and system integration.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Following PRISMA-ScR guidelines and searching five databases, the analysis of 52 studies shows that LLM-based pedagogical agents are defined by four key design dimensions: interaction approach (reactive versus proactive), domain scope (domain-specific versus general-purpose), role complexity (single-role versus multi-role), and system integration (standalone versus integrated). These dimensions span diverse educational contexts and reveal patterns such as the rise of multi-agent systems for naturalistic interactions and combinations with immersive tools or analytics.
What carries the argument
The four design dimensions—interaction approach, domain scope, role complexity, and system integration—that organize and classify LLM-based pedagogical agents across educational applications.
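As an illustration only, the four dimensions can be read as a small classification scheme in which each agent gets one value per dimension. The sketch below is not taken from the paper: the class names, the `AgentProfile` type, and the example profile are hypothetical, and treating every dimension as strictly binary is a simplifying assumption.

```python
from dataclasses import dataclass
from enum import Enum
from itertools import product


class Interaction(Enum):
    REACTIVE = "reactive"
    PROACTIVE = "proactive"


class DomainScope(Enum):
    DOMAIN_SPECIFIC = "domain-specific"
    GENERAL_PURPOSE = "general-purpose"


class RoleComplexity(Enum):
    SINGLE_ROLE = "single-role"
    MULTI_ROLE = "multi-role"


class Integration(Enum):
    STANDALONE = "standalone"
    INTEGRATED = "integrated"


@dataclass(frozen=True)
class AgentProfile:
    """Hypothetical record: one value per design dimension."""
    interaction: Interaction
    domain_scope: DomainScope
    role_complexity: RoleComplexity
    integration: Integration

    def label(self) -> str:
        # Join the four dimension values into a readable description.
        parts = (self.interaction, self.domain_scope,
                 self.role_complexity, self.integration)
        return " / ".join(p.value for p in parts)


def all_profiles() -> list[AgentProfile]:
    # Two values on each of four dimensions gives 2**4 = 16 combinations.
    return [AgentProfile(*combo)
            for combo in product(Interaction, DomainScope,
                                 RoleComplexity, Integration)]


# Made-up example profile, not a coding of any reviewed study.
example = AgentProfile(Interaction.PROACTIVE, DomainScope.DOMAIN_SPECIFIC,
                       RoleComplexity.MULTI_ROLE, Integration.INTEGRATED)
print(example.label())      # proactive / domain-specific / multi-role / integrated
print(len(all_profiles()))  # 16
```

Under the binary assumption the framework spans 16 possible profiles. A real coding scheme might need a "mixed" value on some dimensions, for example for agents that combine reactive and proactive behavior.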
If this is right
- Educators and designers can select or create agents by combining specific values from the four dimensions to fit particular learning goals.
- Multi-agent setups may produce more realistic simulated learning environments than single agents.
- Linking agents to immersive technologies and learning analytics can expand their adaptability and feedback quality.
- Developers need to address privacy, accuracy, and autonomy concerns when scaling these systems in real classrooms.
- Identified gaps point to the need for studies on long-term effectiveness and integration challenges.
Where Pith is reading between the lines
- The four dimensions could be applied to compare LLM agents against earlier rule-based pedagogical agents to measure advances in natural language capabilities.
- Empirical tests of which dimension combinations improve student outcomes would turn the framework into actionable design guidelines.
- Rapid LLM improvements may require adding new dimensions over time, such as advanced reasoning or real-time personalization levels.
- The framework connects naturally to broader work on adaptive learning systems and intelligent tutoring by providing a shared vocabulary for agent features.
Load-bearing premise
The 52 studies retrieved from five databases in the specified period adequately represent the full emerging field of LLM pedagogical agents without major gaps from search limits or publication bias.
What would settle it
Publication of a new review or set of studies from the same period that identifies a significant number of agents whose designs fall outside the four dimensions or reveal different dominant trends.
Original abstract
This scoping review examines the emerging field of Large Language Model (LLM)-based pedagogical agents in educational settings. While traditional pedagogical agents have been extensively studied, the integration of LLMs represents a transformative advancement with unprecedented capabilities in natural language understanding, reasoning, and adaptation. Following PRISMA-ScR guidelines, we analyzed 52 studies across five major databases from November 2022 to January 2025. Our findings reveal diverse LLM-based agents spanning K-12, higher education, and informal learning contexts across multiple subject domains. We identified four key design dimensions characterizing these agents: interaction approach (reactive vs. proactive), domain scope (domain-specific vs. general-purpose), role complexity (single-role vs. multi-role), and system integration (standalone vs. integrated). Emerging trends include multi-agent systems that simulate naturalistic learning environments, virtual student simulation for agent evaluation, integration with immersive technologies, and combinations with learning analytics. We also discuss significant research gaps and ethical considerations regarding privacy, accuracy, and student autonomy. This review provides researchers and practitioners with a comprehensive understanding of LLM-based pedagogical agents while identifying crucial areas for future development in this rapidly evolving field.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper conducts a scoping review of Large Language Model-based pedagogical agents following PRISMA-ScR guidelines. It examines 52 studies retrieved from five major databases between November 2022 and January 2025, covering K-12, higher education, and informal learning contexts. The review identifies four key design dimensions: interaction approach (reactive vs. proactive), domain scope (domain-specific vs. general-purpose), role complexity (single-role vs. multi-role), and system integration (standalone vs. integrated). It also highlights emerging trends like multi-agent systems, virtual student simulations, integration with immersive technologies and learning analytics, while discussing research gaps and ethical considerations such as privacy, accuracy, and student autonomy.
Significance. Should the included studies provide a representative sample of the field, the identification of these four design dimensions offers a valuable conceptual framework for categorizing and advancing LLM-based pedagogical agents. This could significantly aid researchers in designing more effective educational AI tools and help practitioners navigate the options in this fast-growing area. The discussion of trends and gaps further contributes to setting the research agenda.
major comments (2)
- [Methods] Although the abstract claims adherence to PRISMA-ScR, it does not provide the search strings used, detailed inclusion/exclusion criteria, or measures of inter-rater reliability. These omissions hinder evaluation of whether the 52 studies comprehensively represent the field, particularly given the rapid expansion of LLM applications in education since 2022 and potential omissions of preprints or non-indexed works.
- [Findings on Design Dimensions] The four dimensions are asserted to characterize the agents based on the 52 studies, but there is no description of the analytical process (e.g., how studies were coded or if the dimensions emerged inductively). This makes the claim that these are the 'key' dimensions difficult to assess for completeness or alternative interpretations.
minor comments (3)
- The time period ends in January 2025; please specify the exact search date to allow for reproducibility.
- A PRISMA-ScR flow diagram should be included to visually represent the study selection process from identification to final inclusion.
- Ensure all abbreviations (e.g., LLM, PRISMA-ScR) are defined at first use in the main text.
Simulated Author's Rebuttal
We thank the referee for their constructive and insightful comments, which have helped us identify areas where the manuscript can be strengthened for greater transparency and rigor. We will revise the paper to address both major comments and believe these changes will improve the clarity of our methods and analytical approach without altering the core findings.
Point-by-point responses
- Referee: [Methods] Although the abstract claims adherence to PRISMA-ScR, it does not provide the search strings used, detailed inclusion/exclusion criteria, or measures of inter-rater reliability. These omissions hinder evaluation of whether the 52 studies comprehensively represent the field, particularly given the rapid expansion of LLM applications in education since 2022 and potential omissions of preprints or non-indexed works.
Authors: We agree that explicit details enhance reproducibility and allow better assessment of coverage. The full methods section adheres to PRISMA-ScR, but we will revise the manuscript to include the complete search strings (to be provided in a new Appendix A), expanded inclusion/exclusion criteria with justifications, and inter-rater reliability statistics (e.g., Cohen's kappa = 0.87 for screening). We will also add a limitations paragraph acknowledging the challenges of capturing all preprints and non-indexed works in this rapidly evolving field, while noting that our five-database search plus forward citation tracking aimed to maximize comprehensiveness. revision: yes
- Referee: [Findings on Design Dimensions] The four dimensions are asserted to characterize the agents based on the 52 studies, but there is no description of the analytical process (e.g., how studies were coded or if the dimensions emerged inductively). This makes the claim that these are the 'key' dimensions difficult to assess for completeness or alternative interpretations.
Authors: We appreciate this feedback on methodological transparency. The dimensions were derived inductively via thematic analysis: two authors independently coded agent features from the 52 studies using an initial open-coding approach, then iteratively refined categories through constant comparison until saturation. Disagreements were resolved via consensus. In the revised manuscript, we will add a new subsection under Methods detailing this process, including the coding protocol, example codings for each dimension, and how the four dimensions were selected as the most salient and non-overlapping. This will enable readers to evaluate completeness and consider alternatives. revision: yes
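For readers unfamiliar with the inter-rater reliability statistic mentioned in the first response, Cohen's kappa compares observed agreement p_o between two raters with the agreement p_e expected by chance: kappa = (p_o - p_e) / (1 - p_e). The sketch below is a minimal illustration of that formula. The function name and the toy screening labels are invented for this example and have no connection to the review's data or to the 0.87 figure in the simulated rebuttal.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must label the same non-empty set of items")
    n = len(rater_a)

    # Observed agreement: share of items given the same label by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Chance agreement: for each label, the product of the two raters'
    # marginal proportions, summed over all labels.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    labels = counts_a.keys() | counts_b.keys()
    expected = sum(counts_a[c] * counts_b[c] for c in labels) / (n * n)

    if expected == 1.0:
        # Both raters used one identical label throughout; kappa is
        # undefined here, so return 1.0 as a convention (an assumption).
        return 1.0
    return (observed - expected) / (1 - expected)


# Toy screening decisions for 10 hypothetical records (made-up data).
rater_a = ["include"] * 4 + ["exclude"] * 6
rater_b = ["include"] * 3 + ["exclude"] * 7

kappa = cohens_kappa(rater_a, rater_b)
print(round(kappa, 3))  # 0.783
```

In this toy case the raters agree on 9 of 10 records (p_o = 0.9), chance agreement is (4*3 + 6*7) / 100 = 0.54, and kappa is 0.36 / 0.46, roughly 0.78. That shows why kappa is lower than raw percent agreement.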
Circularity Check
No circularity: purely descriptive scoping review with no derivations or self-referential claims
Full rationale
The paper performs a PRISMA-ScR-guided scoping review of 52 studies retrieved from five databases, then maps observed patterns onto four design dimensions (interaction approach, domain scope, role complexity, system integration) extracted from the literature. No equations, fitted parameters, predictions, or uniqueness theorems appear; the dimensions are not defined in terms of themselves but are reported as emergent from the analyzed corpus. Self-citations, if present, are not load-bearing for the central mapping, and the work contains no ansatz smuggling, renaming of known results, or reduction of outputs to inputs by construction. The review is self-contained as a descriptive synthesis against external benchmarks.
Axiom & Free-Parameter Ledger
axioms (1)
- [standard math] PRISMA-ScR guidelines provide an adequate framework for scoping reviews of emerging technologies
Forward citations
Cited by 1 Pith paper
- Agentic Education: Using Claude Code to Teach Claude Code
cc-self-train is an adaptive project-based curriculum for mastering Claude Code featuring persona progression from Guide to Launcher, hook-based engagement adaptation, cross-domain unified feature sequencing, explicit...