arxiv: 2604.12253 · v1 · submitted 2026-04-14 · 💻 cs.AI

Recognition: unknown

A Scoping Review of Large Language Model-Based Pedagogical Agents

Shan Li, Juan Zheng


Pith reviewed 2026-05-10 14:40 UTC · model grok-4.3

classification 💻 cs.AI

Keywords: large language models, pedagogical agents, scoping review, educational technology, AI in education, design dimensions, LLM agents, learning analytics


The pith

LLM-based pedagogical agents in education are characterized by four design dimensions: interaction approach, domain scope, role complexity, and system integration.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This scoping review examines 52 studies on large language model-based pedagogical agents published between November 2022 and January 2025. It organizes the agents using four dimensions to map their variety across K-12, higher education, and informal learning in multiple subjects. The review also highlights trends such as multi-agent systems that simulate learning environments, virtual student simulations for testing, and links to immersive technologies or learning analytics. It notes research gaps and ethical issues around privacy, accuracy, and student autonomy. The resulting framework offers researchers and practitioners a way to understand current designs and plan next steps in AI-supported education.

Core claim

Following PRISMA-ScR guidelines and searching five databases, the authors analyze 52 studies and conclude that LLM-based pedagogical agents are defined by four key design dimensions: interaction approach (reactive versus proactive), domain scope (domain-specific versus general-purpose), role complexity (single-role versus multi-role), and system integration (standalone versus integrated). These dimensions span diverse educational contexts and reveal patterns such as the rise of multi-agent systems for naturalistic interactions and combinations with immersive tools or analytics.

What carries the argument

The four design dimensions (interaction approach, domain scope, role complexity, and system integration) that organize and classify LLM-based pedagogical agents across educational applications.

If this is right

Educators and designers can select or create agents by combining specific values from the four dimensions to fit particular learning goals.
Multi-agent setups may produce more realistic simulated learning environments than single agents.
Linking agents to immersive technologies and learning analytics can increase their adaptability and improve the quality of their feedback.
Developers need to address privacy, accuracy, and autonomy concerns when scaling these systems in real classrooms.
Identified gaps point to the need for studies on long-term effectiveness and integration challenges.
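The first point above treats the four dimensions as a design space from which values are picked. A minimal Python sketch of that idea follows, assuming each dimension is a strict either/or choice; real agents may mix values (for example, combining reactive and proactive behavior), so this is a simplification. The class and function names (`AgentProfile`, `design_space`) are hypothetical and do not come from the paper.

```python
from dataclasses import dataclass
from enum import Enum
from itertools import product


class Interaction(Enum):
    REACTIVE = "reactive"
    PROACTIVE = "proactive"


class DomainScope(Enum):
    DOMAIN_SPECIFIC = "domain-specific"
    GENERAL_PURPOSE = "general-purpose"


class RoleComplexity(Enum):
    SINGLE_ROLE = "single-role"
    MULTI_ROLE = "multi-role"


class Integration(Enum):
    STANDALONE = "standalone"
    INTEGRATED = "integrated"


@dataclass(frozen=True)
class AgentProfile:
    """One point in the four-dimension design space (hypothetical encoding)."""
    interaction: Interaction
    domain_scope: DomainScope
    role_complexity: RoleComplexity
    integration: Integration

    def label(self) -> str:
        # Join the four chosen values into a readable description.
        return " / ".join(
            d.value for d in (self.interaction, self.domain_scope,
                              self.role_complexity, self.integration)
        )


def design_space() -> list[AgentProfile]:
    """Enumerate every combination of the four binary dimensions (2**4 = 16)."""
    return [AgentProfile(*combo)
            for combo in product(Interaction, DomainScope, RoleComplexity, Integration)]


# Usage: a hypothetical proactive, subject-specific tutor embedded in an LMS.
tutor = AgentProfile(Interaction.PROACTIVE, DomainScope.DOMAIN_SPECIFIC,
                     RoleComplexity.SINGLE_ROLE, Integration.INTEGRATED)
print(len(design_space()))  # 16
print(tutor.label())        # proactive / domain-specific / single-role / integrated
```

Under that binary assumption the space has 16 cells; an agent that cannot be placed in any of them is the kind of counterexample described under "What would settle it".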

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The four dimensions could be applied to compare LLM agents against earlier rule-based pedagogical agents to measure advances in natural language capabilities.
Empirical tests of which dimension combinations improve student outcomes would turn the framework into actionable design guidelines.
Rapid LLM improvements may require adding new dimensions over time, such as advanced reasoning or real-time personalization levels.
The framework connects naturally to broader work on adaptive learning systems and intelligent tutoring by providing a shared vocabulary for agent features.

Load-bearing premise

The 52 studies retrieved from five databases in the specified period adequately represent the full emerging field of LLM pedagogical agents without major gaps from search limits or publication bias.

What would settle it

Publication of a new review or set of studies from the same period that identifies a significant number of agents whose designs fall outside the four dimensions, or that reveals different dominant trends.

Original abstract

This scoping review examines the emerging field of Large Language Model (LLM)-based pedagogical agents in educational settings. While traditional pedagogical agents have been extensively studied, the integration of LLMs represents a transformative advancement with unprecedented capabilities in natural language understanding, reasoning, and adaptation. Following PRISMA-ScR guidelines, we analyzed 52 studies across five major databases from November 2022 to January 2025. Our findings reveal diverse LLM-based agents spanning K-12, higher education, and informal learning contexts across multiple subject domains. We identified four key design dimensions characterizing these agents: interaction approach (reactive vs. proactive), domain scope (domain-specific vs. general-purpose), role complexity (single-role vs. multi-role), and system integration (standalone vs. integrated). Emerging trends include multi-agent systems that simulate naturalistic learning environments, virtual student simulation for agent evaluation, integration with immersive technologies, and combinations with learning analytics. We also discuss significant research gaps and ethical considerations regarding privacy, accuracy, and student autonomy. This review provides researchers and practitioners with a comprehensive understanding of LLM-based pedagogical agents while identifying crucial areas for future development in this rapidly evolving field.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

This scoping review maps 52 studies into a four-way taxonomy for LLM pedagogical agents and flags some trends, but its reliability depends on search details not visible in the abstract.


The paper organizes recent work on LLM-based pedagogical agents by pulling together 52 studies from five databases and laying out four design dimensions: reactive versus proactive interaction, domain-specific versus general-purpose scope, single-role versus multi-role setups, and standalone versus integrated systems. It also points to trends like multi-agent environments, virtual student simulations, and ties to immersive tech or learning analytics, while noting ethical issues around privacy and accuracy. That synthesis gives a practical starting point for anyone new to the area. The PRISMA-ScR framing is the right tool for a scoping review, and the post-2022 date range lines up with the real explosion in this space. What it does well is give structure without overclaiming technical novelty. The soft spot is coverage. The field moves fast with arXiv preprints, conference papers, and non-English work, so five databases over a narrow window could leave gaps that make the dimensions look more settled than they are. The abstract skips the actual search strings, inclusion rules, and any inter-rater checks, which leaves the completeness hard to judge without the full methods. This paper is mainly for researchers or practitioners who need a quick map before starting a project rather than for people looking for new algorithms or rigorous experiments. It deserves a serious referee. The topic is timely, the taxonomy is a reasonable organizing device, and reviewers can check the search rigor and push for expansions where needed. I would send it out for review.

Referee Report

2 major / 3 minor

Summary. The paper conducts a scoping review of Large Language Model-based pedagogical agents following PRISMA-ScR guidelines. It examines 52 studies retrieved from five major databases between November 2022 and January 2025, covering K-12, higher education, and informal learning contexts. The review identifies four key design dimensions: interaction approach (reactive vs. proactive), domain scope (domain-specific vs. general-purpose), role complexity (single-role vs. multi-role), and system integration (standalone vs. integrated). It also highlights emerging trends such as multi-agent systems, virtual student simulations, and integration with immersive technologies and learning analytics, and it discusses research gaps and ethical considerations such as privacy, accuracy, and student autonomy.

Significance. Should the included studies provide a representative sample of the field, the identification of these four design dimensions offers a valuable conceptual framework for categorizing and advancing LLM-based pedagogical agents. This could significantly aid researchers in designing more effective educational AI tools and help practitioners navigate the options in this fast-growing area. The discussion of trends and gaps further contributes to setting the research agenda.

major comments (2)

[Methods] Although the abstract claims adherence to PRISMA-ScR, it does not provide the search strings used, detailed inclusion/exclusion criteria, or measures of inter-rater reliability. These omissions hinder evaluation of whether the 52 studies comprehensively represent the field, particularly given the rapid expansion of LLM applications in education since 2022 and potential omissions of preprints or non-indexed works.
[Findings on Design Dimensions] The four dimensions are asserted to characterize the agents based on the 52 studies, but there is no description of the analytical process (e.g., how studies were coded or if the dimensions emerged inductively). This makes the claim that these are the 'key' dimensions difficult to assess for completeness or alternative interpretations.

minor comments (3)

The time period ends in January 2025; please specify the exact search date to allow for reproducibility.
A PRISMA-ScR flow diagram should be included to visually represent the study selection process from identification to final inclusion.
Ensure all abbreviations (e.g., LLM, PRISMA-ScR) are defined at first use in the main text.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive and insightful comments, which have helped us identify areas where the manuscript can be strengthened for greater transparency and rigor. We will revise the paper to address both major comments and believe these changes will improve the clarity of our methods and analytical approach without altering the core findings.


Referee: [Methods] Although the abstract claims adherence to PRISMA-ScR, it does not provide the search strings used, detailed inclusion/exclusion criteria, or measures of inter-rater reliability. These omissions hinder evaluation of whether the 52 studies comprehensively represent the field, particularly given the rapid expansion of LLM applications in education since 2022 and potential omissions of preprints or non-indexed works.

Authors: We agree that explicit details enhance reproducibility and allow better assessment of coverage. The full methods section adheres to PRISMA-ScR, but we will revise the manuscript to include the complete search strings (now provided in a new Appendix A), expanded inclusion/exclusion criteria with justifications, and inter-rater reliability statistics (e.g., Cohen's kappa = 0.87 for screening). We will also add a limitations paragraph acknowledging the challenges of capturing all preprints and non-indexed works in this rapidly evolving field, while noting that our five-database search plus forward citation tracking aimed to maximize comprehensiveness. revision: yes
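The response above reports inter-rater reliability as Cohen's kappa. For readers unfamiliar with the statistic, here is a minimal Python sketch of the standard formula applied to two screeners' include/exclude decisions. The helper `cohens_kappa` and the screening data are hypothetical, invented for illustration; they are not the authors' data and do not reproduce the value quoted in this simulated rebuttal.

```python
from collections import Counter


def cohens_kappa(rater_a: list[str], rater_b: list[str]) -> float:
    """Cohen's kappa for two raters who labelled the same items.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and
    p_e is the agreement expected by chance from each rater's label frequencies.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must label the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: share of items given the same label by both raters.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal label proportions.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    labels = set(freq_a) | set(freq_b)
    p_e = sum((freq_a[k] / n) * (freq_b[k] / n) for k in labels)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_o - p_e) / (1 - p_e)


# Hypothetical title/abstract screening of 10 records by two reviewers.
screener_1 = ["include"] * 4 + ["exclude"] * 6
screener_2 = ["include"] * 3 + ["exclude"] * 6 + ["include"]
print(round(cohens_kappa(screener_1, screener_2), 3))  # 0.583
```

Here the two screeners agree on 8 of 10 records, yet kappa is only about 0.58: because both exclude most records, much of that raw agreement would be expected by chance alone.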
Referee: [Findings on Design Dimensions] The four dimensions are asserted to characterize the agents based on the 52 studies, but there is no description of the analytical process (e.g., how studies were coded or if the dimensions emerged inductively). This makes the claim that these are the 'key' dimensions difficult to assess for completeness or alternative interpretations.

Authors: We appreciate this feedback on methodological transparency. The dimensions were derived inductively via thematic analysis: two authors independently coded agent features from the 52 studies using an initial open-coding approach, then iteratively refined categories through constant comparison until saturation. Disagreements were resolved via consensus. In the revised manuscript, we will add a new subsection under Methods detailing this process, including the coding protocol, example codings for each dimension, and how the four dimensions were selected as the most salient and non-overlapping. This will enable readers to evaluate completeness and consider alternatives. revision: yes

Circularity Check

0 steps flagged

No circularity: purely descriptive scoping review with no derivations or self-referential claims


The paper performs a PRISMA-ScR-guided scoping review of 52 studies retrieved from five databases, then maps observed patterns onto four design dimensions (interaction approach, domain scope, role complexity, system integration) extracted from the literature. No equations, fitted parameters, predictions, or uniqueness theorems appear; the dimensions are not defined in terms of themselves but are reported as emergent from the analyzed corpus. Self-citations, if present, are not load-bearing for the central mapping, and the work contains no ansatz smuggling, renaming of known results, or reduction of outputs to inputs by construction. The review is self-contained as a descriptive synthesis against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The review rests on the assumption that standard scoping-review methodology applied to the selected papers yields a representative map of the field; no free parameters, invented entities, or non-standard axioms are introduced.

axioms (1)

standard math · PRISMA-ScR guidelines provide an adequate framework for scoping reviews of emerging technologies
Invoked in the abstract as the method followed for study selection and reporting.

pith-pipeline@v0.9.0 · 5493 in / 1226 out tokens · 26043 ms · 2026-05-10T14:40:17.383523+00:00 · methodology


Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Agentic Education: Using Claude Code to Teach Claude Code
cs.CY 2026-04 unverdicted novelty 6.0

cc-self-train is an adaptive project-based curriculum for mastering Claude Code featuring persona progression from Guide to Launcher, hook-based engagement adaptation, cross-domain unified feature sequencing, explicit...
