FACET: Teacher-Centred LLM-Based Multi-Agent Systems-Towards Personalized Educational Worksheets

Jana Gonnermann-Müller; Jennifer Haase; Konstantin Fackeldey; Sebastian Pokutta

arxiv: 2508.11401 · v5 · pith:7WKBK2GPnew · submitted 2025-08-15 · 💻 cs.HC · cs.MA


Pith reviewed 2026-05-21 22:28 UTC · model grok-4.3


keywords multi-agent systems · large language models · personalized worksheets · mathematics education · learner profiles · teacher support · classroom heterogeneity


The pith

A multi-agent LLM system generates personalized math worksheets that match both student skills and motivation levels.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents the FACET framework as a way for teachers to produce individualized classroom materials that address differences in how well students know the material and what drives them to learn. It builds three agents that work together to simulate learner profiles, adapt content according to teaching principles, and verify the results. This matters because most existing AI tools focus narrowly on test scores and give teachers little help with the full range of student needs in mixed classrooms. If the approach holds up, it could let teachers supply suitable tasks for every student without manually redesigning every lesson.

Core claim

The FACET framework consists of learner agents that simulate diverse profiles incorporating topic proficiency and intrinsic motivation, a teacher agent that adapts instructional content according to didactical principles, and an evaluator agent that provides automated quality assurance. When applied to authentic grade 8 mathematics curriculum content, the system showed high stability and alignment between generated materials and learner profiles, with in-service teachers noting strong structure and suitability of the tasks in exploratory feedback.

What carries the argument

The FACET multi-agent architecture, in which learner agents create simulated student profiles, the teacher agent modifies content to fit didactical rules, and the evaluator agent performs quality checks.
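
The control flow of the three agent roles can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the paper's agents are LLM-backed, whereas the stubs here use fixed rules so the flow runs offline. Every name (`LearnerProfile`, `Worksheet`, `teacher_agent`, and so on), the 2x2 profile grid, and the adaptation rules are assumptions made for this sketch.

```python
from dataclasses import dataclass

# Hypothetical stand-ins for the three agent roles; no LLM is called.

@dataclass(frozen=True)
class LearnerProfile:
    proficiency: str  # "low" or "high" topic proficiency
    motivation: str   # "low" or "high" intrinsic motivation


@dataclass
class Worksheet:
    profile: LearnerProfile
    tasks: list
    scaffolding: bool      # placeholder rule: hints for low proficiency
    real_world_hook: bool  # placeholder rule: context for low motivation


def learner_agents():
    """Enumerate simulated profiles (the 2x2 grid is an assumption)."""
    levels = ("low", "high")
    return [LearnerProfile(p, m) for p in levels for m in levels]


def teacher_agent(base_tasks, profile):
    """Adapt base content to one profile (stand-in for an LLM call)."""
    tasks = list(base_tasks)
    if profile.proficiency == "high":
        tasks.append("Extension: generalize your result.")
    return Worksheet(
        profile=profile,
        tasks=tasks,
        scaffolding=profile.proficiency == "low",
        real_world_hook=profile.motivation == "low",
    )


def evaluator_agent(sheet):
    """Check that the worksheet is consistent with its target profile."""
    return (
        len(sheet.tasks) > 0
        and sheet.scaffolding == (sheet.profile.proficiency == "low")
        and sheet.real_world_hook == (sheet.profile.motivation == "low")
    )


def run_pipeline(base_tasks):
    """Generate one worksheet per simulated profile and evaluate each."""
    sheets = [teacher_agent(base_tasks, p) for p in learner_agents()]
    return [(sheet, evaluator_agent(sheet)) for sheet in sheets]


results = run_pipeline(["Solve 3x + 5 = 20."])
```

In a real system each stub would presumably wrap a prompted model call, and the evaluator would need criteria that are independent of the teacher agent's own rules; here they mirror each other only to keep the example short.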

If this is right

Teachers can obtain individualized worksheets quickly for classes that contain students with widely different skill levels and motivation.
Automated agent checks can keep the generated tasks consistent with the intended learner profile across many variations.
Teacher reviews indicate the output tasks have appropriate structure and fit classroom use.
The multi-agent design supplies a route to context-aware personalization that incorporates both cognitive and motivational factors.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

Real classroom deployment would show whether students respond differently to motivation-adjusted tasks than to skill-adjusted tasks alone.
Adding live student performance data could let the learner agents update their simulations over successive lessons.
Wider adoption might let teachers shift time from creating differentiated materials toward observing how students interact with them.

Load-bearing premise

Tests on simulated student profiles plus teacher comments on those simulations are enough to show the materials will meet real pedagogical needs in live classrooms.

What would settle it

A trial in which real grade 8 students complete the generated worksheets during regular class time and their engagement or learning gains are measured against a control group using standard materials.
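
One way such a comparison could be analysed is a standardized mean difference between the learning gains of the two groups. The sketch below assumes pre- and post-test scores per student and uses Cohen's d with a pooled standard deviation; the function names are hypothetical, and the paper itself reports no such trial or data.

```python
from math import sqrt
from statistics import mean, stdev


def learning_gains(pre, post):
    """Per-student gain: post-test score minus pre-test score."""
    if len(pre) != len(post):
        raise ValueError("pre and post must have the same length")
    return [after - before for before, after in zip(pre, post)]


def cohens_d(treatment, control):
    """Cohen's d between two groups, using the pooled sample std dev."""
    n1, n2 = len(treatment), len(control)
    if n1 < 2 or n2 < 2:
        raise ValueError("each group needs at least two observations")
    s1, s2 = stdev(treatment), stdev(control)
    pooled = sqrt(((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2))
    if pooled == 0:
        raise ValueError("pooled standard deviation is zero")
    return (mean(treatment) - mean(control)) / pooled
```

A classroom study would also have to account for clustering by class and teacher, which this two-group sketch ignores.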

Figures

Figures reproduced from arXiv: 2508.11401 by Jana Gonnermann-Müller, Jennifer Haase, Konstantin Fackeldey, Sebastian Pokutta.

**Figure 2.** Modular agent architecture of the FACET Framework.

**Figure 3.** Components of a personalized worksheet (colored boxes are not printed on the original worksheet; they are added ...)

Original abstract

The increasing heterogeneity of student populations poses significant challenges for teachers, particularly in mathematics education, where cognitive, motivational, and emotional differences strongly influence learning outcomes. While AI-driven personalization tools have emerged, most remain performance-focused, offering limited support for teachers and neglecting broader pedagogical needs. This paper presents the FACET framework, a teacher-facing, large language model (LLM)-based multi-agent system designed to generate individualized classroom materials that integrate both cognitive and motivational dimensions of learner profiles. The framework comprises three specialized agents: (1) learner agents that simulate diverse profiles incorporating topic proficiency and intrinsic motivation, (2) a teacher agent that adapts instructional content according to didactical principles, and (3) an evaluator agent that provides automated quality assurance. We tested the system using authentic grade 8 mathematics curriculum content and evaluated its feasibility through a) automated agent-based assessment of output quality and b) exploratory feedback from K-12 in-service teachers. Results from ten internal evaluations highlighted high stability and alignment between generated materials and learner profiles, and teacher feedback particularly highlighted structure and suitability of tasks. The findings demonstrate the potential of multi-agent LLM architectures to provide scalable, context-aware personalization in heterogeneous classroom settings, and outline directions for extending the framework to richer learner profiles and real-world classroom trials.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper introduces the FACET framework, a teacher-centered multi-agent LLM system for generating personalized grade-8 mathematics worksheets. It comprises learner agents that simulate profiles with topic proficiency and intrinsic motivation, a teacher agent that adapts content per didactical principles, and an evaluator agent for automated quality assurance. The system is tested on authentic curriculum content and evaluated via ten internal automated assessments (reporting high stability and profile alignment) plus exploratory qualitative feedback from K-12 teachers (highlighting task structure and suitability). The authors conclude that the results demonstrate the potential of such architectures for scalable, context-aware personalization in heterogeneous classrooms.

Significance. If the feasibility claim were supported by stronger evidence, the work could offer a practical contribution to HCI and AI in education by integrating cognitive and motivational learner dimensions in a teacher-facing tool. The multi-agent design and focus on didactical principles are potentially useful for reducing teacher workload, but the current evaluation does not yet establish measurable pedagogical benefits or real-world scalability.

major comments (2)

[Evaluation] Evaluation section: The central feasibility claim rests on ten internal automated evaluations and exploratory teacher feedback on simulated profiles, yet the manuscript provides no quantitative metrics, error analysis, baseline comparisons (e.g., against non-multi-agent LLM generation or standard worksheets), pre/post learning gains, or live classroom deployment data. This leaves the support for scalable personalization in heterogeneous settings thin and does not test actual student outcomes or classroom constraints.
[Results] Results and discussion: The reported 'high stability and alignment' and teacher notes on task suitability are grounded only in simulated grade-8 profiles; without measurement of actual student heterogeneity, engagement, or comparison to existing personalization methods, the assertion that the framework addresses real pedagogical needs in live classrooms remains untested.

minor comments (2)

[Abstract] The abstract states results from 'ten internal evaluations' but does not specify the exact quality metrics or stability criteria used by the evaluator agent; adding these details would improve clarity.
[Framework description] Notation for the three agents (learner, teacher, evaluator) is introduced clearly but could benefit from a single diagram or table summarizing their interactions and inputs/outputs.
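
To make the request for explicit stability criteria concrete, the sketch below shows one way repeated evaluator scores could be summarized. The paper does not state its metrics, so the numeric score scale, the tolerance band, and the function name `stability_summary` are all assumptions for illustration.

```python
from statistics import mean, pstdev


def stability_summary(scores, tolerance=0.5):
    """Summarize repeated evaluator scores for one worksheet/profile pair.

    `scores` holds one numeric rating per run (hypothetical scale).
    `tolerance` is an assumed band around the mean that counts as stable.
    """
    if not scores:
        raise ValueError("need at least one run")
    centre = mean(scores)
    within = sum(1 for s in scores if abs(s - centre) <= tolerance)
    return {
        "runs": len(scores),
        "mean": centre,
        "std": pstdev(scores),
        "share_within_tolerance": within / len(scores),
    }
```

Reporting the spread alongside the mean for each learner profile would let readers judge what "high stability" means over the ten runs.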

Simulated Authors' Rebuttal

2 responses · 1 unresolved

We thank the referee for the constructive feedback on our manuscript describing the FACET framework. We address the major comments point by point below, clarifying the exploratory scope of the work as a feasibility study of the multi-agent architecture while incorporating revisions to strengthen the presentation.

Point-by-point responses

Referee: [Evaluation] Evaluation section: The central feasibility claim rests on ten internal automated evaluations and exploratory teacher feedback on simulated profiles, yet the manuscript provides no quantitative metrics, error analysis, baseline comparisons (e.g., against non-multi-agent LLM generation or standard worksheets), pre/post learning gains, or live classroom deployment data. This leaves the support for scalable personalization in heterogeneous settings thin and does not test actual student outcomes or classroom constraints.

Authors: We agree that the evaluation is limited to internal automated assessments and exploratory teacher feedback on simulated profiles, as this manuscript presents an initial feasibility demonstration of the teacher-centered multi-agent system rather than a full efficacy or deployment study. The ten evaluations report quantitative metrics on stability and profile alignment; we have expanded the evaluation section in the revision to include additional details on these metrics and available error analysis. We have also added explicit discussion of the exploratory nature of the teacher feedback and framed the claims around technical feasibility and potential for personalization. Baseline comparisons to non-multi-agent LLM generation are a useful suggestion and are now noted as future work in the revised discussion. Pre/post learning gains and live classroom deployment data were not collected, as they fall outside the current scope focused on framework design and internal validation.

Revision: partial

Referee: [Results] Results and discussion: The reported 'high stability and alignment' and teacher notes on task suitability are grounded only in simulated grade-8 profiles; without measurement of actual student heterogeneity, engagement, or comparison to existing personalization methods, the assertion that the framework addresses real pedagogical needs in live classrooms remains untested.

Authors: The simulated grade-8 profiles were constructed to systematically represent heterogeneity in cognitive proficiency and intrinsic motivation, enabling controlled assessment of the framework's adaptation capabilities and output alignment. Teacher feedback focused on task structure and suitability for such profiles. We have revised the results and discussion sections to temper assertions, emphasizing that the findings demonstrate internal consistency and teacher-perceived potential rather than proven impact in live classrooms. We have incorporated additional context on related personalization methods for comparison and highlighted directions for future studies measuring real student heterogeneity and engagement.

Revision: partial

standing simulated objections not resolved

Live classroom deployment data and pre/post learning gains from actual students, which were not part of this initial feasibility study using simulated profiles.

Circularity Check

0 steps flagged

No significant circularity; claims rest on system description and external teacher feedback

full rationale

The paper presents a descriptive framework for a multi-agent LLM system and evaluates feasibility via ten internal automated runs plus qualitative teacher comments on simulated profiles. No mathematical derivations, equations, fitted parameters, or predictions are claimed. No self-citations are invoked as load-bearing uniqueness theorems or ansatzes. The central feasibility claim is supported by the described architecture and independent external feedback rather than reducing to its own inputs by construction. This is a standard non-circular design-and-evaluation paper.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

This is an applied systems and feasibility paper; the central claim rests on the described multi-agent design and limited internal testing rather than on formal axioms, fitted parameters, or newly postulated entities.



Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction
Relation between the paper passage and the cited Recognition theorem: unclear
Paper passage: "The framework comprises three specialized agents: (1) learner agents that simulate diverse profiles incorporating topic proficiency and intrinsic motivation, (2) a teacher agent that adapts instructional content according to didactical principles, and (3) an evaluator agent that provides automated quality assurance."

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel
Relation between the paper passage and the cited Recognition theorem: unclear
Paper passage: "Results from ten internal evaluations highlighted high stability and alignment between generated materials and learner profiles"

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Subliminal Transfer of Unsafe Behaviors in AI Agent Distillation
cs.AI · 2026-04 · unverdicted · novelty 7.0
Unsafe agent behaviors transfer subliminally through distillation from sanitized safe-task trajectories, with deletion rates reaching 100% in one setting versus 5% baseline.

Beyond One-Size-Fits-All Exercises: Personalizing Computer Science Worksheets with Large Language Models
cs.HC · 2026-04 · unverdicted · novelty 5.0
LLM-driven personalization of CS1 RegEx worksheets based on learner profiles raises completion to over 99% and boosts correctness by 18.2% for at-risk students while preserving perceived difficulty.

Reference graph

Works this paper leans on

46 extracted references · 46 canonical work pages · cited by 2 Pith papers

