Developing Models of Procedural Skills using an AI-assisted Text-to-Model Approach

Arpit Khandelwal; Ashok K. Goel; Rahul K. Dass; Shubham Puri; Xiao Jin

arxiv: 2604.17624 · v3 · submitted 2026-04-19 · 💻 cs.HC

Rahul K. Dass, Shubham Puri, Arpit Khandelwal, Xiao Jin, Ashok K. Goel

Pith reviewed 2026-05-10 05:02 UTC · model grok-4.3

classification 💻 cs.HC

keywords procedural skill modeling · text-to-model · TMK models · AI tutoring · LLM-assisted authoring · knowledge representation · educational technology · ontology-constrained generation

The pith

AI-assisted text-to-model conversion builds valid procedural skill models in 50-70 percent less expert time.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents a text-to-model methodology that uses large language models to turn instructional materials into complete Task-Method-Knowledge models for procedural skills. Ontology constraints and templates handle the structural scaffolding while experts focus on content refinement. Tested on a full graduate AI course, the method generated 23 models that support an existing AI coach called Ivy. Expert modeling time dropped by 50 to 70 percent, and the models stayed structurally valid and reproducible. This reduction in effort makes it realistic to equip entire courses with detailed, model-based AI tutoring instead of limiting such systems to small topics.

Core claim

The TTM methodology transforms instructional materials into schema-complete TMK models through ontology-constrained prompting and template-based generation, automating structural scaffolding while preserving expert oversight. Applied to a graduate-level online AI course, it produced 23 TMK models that give the Ivy AI coach full-course coverage, reduced expert modeling time by 50-70 percent, and yielded structurally valid, highly reproducible models.
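To make "ontology-constrained prompting and template-based generation" concrete, here is a minimal Python sketch. It assumes a hypothetical TMK vocabulary (`TMK_ONTOLOGY`) and hypothetical prompt wording; the paper's actual ontology, field names, and prompts are not reproduced here. What it illustrates is that the template, not the LLM, fixes which element types and fields may appear in the output.

```python
from string import Template

# Hypothetical TMK vocabulary: element types and the fields each must carry.
# The ontology used by the TTM method may name and structure these differently.
TMK_ONTOLOGY: dict[str, list[str]] = {
    "Task": ["name", "goal", "inputs", "outputs", "methods"],
    "Method": ["name", "states", "transitions"],
    "Knowledge": ["concepts", "relations"],
}

# Hypothetical prompt wording; $-placeholders are filled per lesson.
PROMPT_TEMPLATE = Template(
    "You are building a Task-Method-Knowledge (TMK) model.\n"
    "Use ONLY these element types and fields:\n$ontology\n"
    "Return JSON with top-level keys: $keys.\n"
    "Source material for the skill '$skill':\n$text\n"
)


def render_ontology(ontology: dict[str, list[str]]) -> str:
    """One line per element type, listing its required fields."""
    return "\n".join(f"- {kind}: {', '.join(fields)}" for kind, fields in ontology.items())


def build_prompt(skill: str, text: str, ontology: dict[str, list[str]] = TMK_ONTOLOGY) -> str:
    """Fill the template so the structure comes from the ontology, not from the LLM."""
    return PROMPT_TEMPLATE.substitute(
        ontology=render_ontology(ontology),
        keys=", ".join(ontology),  # dict keys, in insertion order
        skill=skill,
        text=text.strip(),
    )


if __name__ == "__main__":
    print(build_prompt("A* search", "Expand the node with the lowest f = g + h."))
```

Because the structure is injected from one fixed dictionary, every lesson's prompt asks for the same element types, which is one plausible reason such prompts yield consistent structure across runs.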

What carries the argument

Ontology-constrained LLM prompting combined with template-based generation in the text-to-model (TTM) approach, which automates creation of Task-Method-Knowledge (TMK) models from raw instructional text.
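A structural-validity check of the kind the pith relies on can be sketched with plain dataclasses. This is an illustration under assumptions: `Task`, `Method`, `TMKModel`, their fields, and the rules in `structural_errors` are hypothetical simplifications (for instance, a method is flattened to an ordered list of subtasks rather than the state-transition form TMK uses), not the paper's schema.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    name: str
    goal: str
    methods: list[str] = field(default_factory=list)   # methods that can accomplish the task
    primitive: bool = False                            # leaf task executed directly, needs no method
    concepts: list[str] = field(default_factory=list)  # knowledge concepts the task refers to


@dataclass
class Method:
    name: str
    steps: list[str]  # ordered subtask names (simplified stand-in for states/transitions)


@dataclass
class TMKModel:
    tasks: dict[str, Task]
    methods: dict[str, Method]
    knowledge: set[str]  # declared concept names


def structural_errors(model: TMKModel) -> list[str]:
    """Return schema violations; an empty list means the model is structurally valid."""
    errors: list[str] = []
    for task in model.tasks.values():
        if not task.goal:
            errors.append(f"task '{task.name}' has no goal")
        if not task.primitive and not task.methods:
            errors.append(f"task '{task.name}' is non-primitive but has no method")
        for m in task.methods:
            if m not in model.methods:
                errors.append(f"task '{task.name}' references unknown method '{m}'")
        for c in task.concepts:
            if c not in model.knowledge:
                errors.append(f"task '{task.name}' uses undeclared concept '{c}'")
    for method in model.methods.values():
        if not method.steps:
            errors.append(f"method '{method.name}' has no steps")
        for step in method.steps:
            if step not in model.tasks:
                errors.append(f"method '{method.name}' references unknown subtask '{step}'")
    return errors
```

A check like this only establishes structural validity; whether the goals and steps are semantically right still needs the expert review the paper keeps in the loop.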

If this is right

Full-course coverage with structured TMK models becomes feasible for AI tutoring systems.
The generated models remain structurally valid and highly reproducible across repeated runs.
The overall cost of building procedural skill representations drops enough to support widespread use.
Expert oversight stays limited to content checks rather than building structures from scratch.
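The pith does not say how reproducibility across runs was measured. One common way to quantify agreement between repeated generations is mean pairwise Jaccard similarity over the elements each run produces. The sketch below is a swapped-in illustration, not the paper's metric, and assumes each run has already been reduced to a set of normalized element labels such as `"Task:expand"` (a hypothetical encoding).

```python
from itertools import combinations


def jaccard(a: set[str], b: set[str]) -> float:
    """Size of the intersection over size of the union; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def reproducibility(runs: list[set[str]]) -> float:
    """Mean pairwise Jaccard similarity across repeated generation runs (1.0 = identical)."""
    pairs = list(combinations(runs, 2))
    if not pairs:
        raise ValueError("need at least two runs to measure reproducibility")
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    # Hypothetical element labels from three runs over the same lesson.
    runs = [
        {"Task:search", "Task:expand", "Method:best-first"},
        {"Task:search", "Task:expand", "Method:best-first"},
        {"Task:search", "Task:goal-test", "Method:best-first"},
    ]
    print(round(reproducibility(runs), 3))  # pairs score 1.0, 0.5, 0.5 -> 0.667
```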

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same scaffolding could be tested on procedural training outside computer science, such as medical or engineering procedures.
Automatic updates to models from new course materials could become possible with repeated application.
Lower modeling cost might allow more institutions to combine these representations with learner performance data for personalized coaching.

Load-bearing premise

That ontology-constrained prompting plus templates will keep the remaining expert refinement effort low enough to preserve the reported time savings in new courses and domains.

What would settle it

Applying the method to a new course and finding that average expert refinement time exceeds 30 percent of traditional modeling time, or that many models require major semantic corrections, would show the scalability does not hold.
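The time condition in this criterion is a simple ratio. A minimal sketch, assuming hypothetical per-model pairs of (expert refinement minutes, traditional modeling minutes), which the paper does not publish: because both averages are taken over the same models, comparing average refinement time with average traditional time is the same as comparing totals. The semantic-correction condition is not modeled, since "many" has no stated threshold.

```python
def refinement_ratio(timings: list[tuple[float, float]]) -> float:
    """Average refinement time divided by average traditional modeling time.

    Each pair is (expert refinement minutes, traditional modeling minutes) for one
    model. Both averages divide by the same count, so this equals total / total.
    """
    if not timings:
        raise ValueError("no timing data")
    refine_total = sum(r for r, _ in timings)
    manual_total = sum(t for _, t in timings)
    if manual_total <= 0:
        raise ValueError("traditional modeling time must be positive")
    return refine_total / manual_total


def scalability_challenged(timings: list[tuple[float, float]], threshold: float = 0.30) -> bool:
    """True if average refinement time exceeds `threshold` of traditional time."""
    return refinement_ratio(timings) > threshold


if __name__ == "__main__":
    # Hypothetical minutes for three models in a new course.
    new_course = [(40.0, 120.0), (55.0, 150.0), (30.0, 90.0)]
    print(refinement_ratio(new_course), scalability_challenged(new_course))
```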

Figures

Figures reproduced from arXiv: 2604.17624 by Arpit Khandelwal, Ashok K. Goel, Rahul K. Dass, Shubham Puri, Xiao Jin.

**Figure 1.** Human-in-the-loop text-to-model architecture illustrating schema-constrained generation and iterative refinement.
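The human-in-the-loop cycle in Figure 1 can be read as a generate, validate, refine loop. The sketch below is an interpretation, not the authors' implementation: `generate`, `validate`, and `expert_review` are hypothetical callables standing in for the LLM call, the schema check, and the expert's content pass, and feeding schema errors back into the prompt is an assumed retry strategy.

```python
from typing import Callable

Model = dict  # hypothetical: a TMK model serialized as a JSON-like dict


def author_model(
    generate: Callable[[str], Model],         # hypothetical LLM call: prompt -> draft model
    validate: Callable[[Model], list[str]],   # schema check: draft -> list of errors
    expert_review: Callable[[Model], Model],  # human content refinement
    prompt: str,
    max_attempts: int = 3,
) -> Model:
    """Regenerate until the draft is schema-valid, then hand it to the expert."""
    errors: list[str] = []
    feedback = ""
    for _ in range(max_attempts):
        draft = generate(prompt + feedback)
        errors = validate(draft)
        if not errors:
            # Structure is settled automatically; the expert refines content only.
            return expert_review(draft)
        # Assumed retry strategy: append the schema errors to the next prompt.
        feedback = "\nFix these schema errors:\n" + "\n".join(errors)
    raise RuntimeError(f"no schema-valid draft after {max_attempts} attempts: {errors}")


if __name__ == "__main__":
    # Stubs: the first draft lacks "tasks"; the retry, which sees the feedback, fixes it.
    gen = lambda p: {"tasks": ["search"]} if "Fix" in p else {}
    val = lambda m: [] if m.get("tasks") else ["missing 'tasks'"]
    print(author_model(gen, val, lambda m: {**m, "reviewed": True}, "Build a TMK model."))
    # -> {'tasks': ['search'], 'reviewed': True}
```

Keeping the expert step after validation mirrors the division of labor the paper describes: automation handles the scaffolding, people handle content.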

Original abstract

Scalable AI tutoring for procedural skill learning requires structured knowledge representations, yet constructing these representations remains a labor-intensive bottleneck. This paper introduces a new LLM-assisted text-to-model (TTM) methodology that transforms instructional materials into schema-complete Task-Method-Knowledge (TMK) models through ontology-constrained prompting and template-based generation, automating structural scaffolding while preserving expert oversight. Applied to a graduate-level online AI course, the methodology produced 23 TMK models - enabling full-course coverage for Ivy, a deployed AI coach that relies on TMK models to support learners' procedural understanding, for the first time. AI-assisted authoring reduced expert modeling time by 50-70% while producing structurally valid and highly reproducible models. We evaluate structural validity, semantic alignment, reproducibility, and refinement effort to characterize authoring scalability. Results indicate that the TTM methodology substantially lowers the cost of constructing structured procedural representations, making course-wide deployment of structured AI tutoring systems practically feasible.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note · private letter to a colleague

The paper gives a concrete LLM method for creating TMK models at course scale, but the supporting evidence for big efficiency gains is limited to one domain.

The main takeaway is that they built a pipeline using ontology-constrained prompting and templates to turn instructional text into complete TMK models, and it produced all 23 models needed for a full graduate AI course in their Ivy tutoring system. That is a practical step past the usual piecemeal modeling limit. The combination of constraints and templates is the part that feels new relative to earlier TMK work, and it kept the outputs in the right schema without constant manual fixes. They report the models were structurally valid and reproducible, which let the system move to course-wide coverage. That part holds up as useful engineering for anyone already working with structured procedural representations.

The soft spots sit in the evaluation and the scope. The 50-70% time reduction is stated without the raw timing numbers, the number of experts timed, or a clear baseline comparison. Semantic alignment gets mentioned but without scores, examples of mismatches, or how they measured it against expert knowledge. Reproducibility is called high but again without the actual agreement figures. All of this comes from a single AI course, so the low refinement effort and consistent alignment may not hold when the text is denser, less procedural, or from a different field. The stress-test note on single-domain results is on target here.

This is for teams building AI tutors that depend on detailed knowledge models like TMK. Readers who need a working example of LLM-assisted authoring will get something concrete from the method description and the real deployment. The thinking is straightforward and the application is grounded, so the paper deserves a serious referee even though the metrics section will need tightening.

Referee Report

3 major / 0 minor

Summary. The paper introduces a text-to-model (TTM) methodology that employs ontology-constrained LLM prompting combined with template-based generation to automatically produce schema-complete Task-Method-Knowledge (TMK) models from instructional text. The method is demonstrated on a graduate-level online AI course, yielding 23 TMK models that provide full-course coverage for the deployed Ivy AI coach. The authors claim that AI-assisted authoring reduces expert modeling time by 50-70% while achieving structural validity, semantic alignment, high reproducibility, and low refinement effort, with evaluations of these properties supporting the conclusion that the approach makes course-wide structured AI tutoring systems practically feasible.

Significance. If the time savings, model quality, and low refinement effort hold, the work would meaningfully address the knowledge-representation bottleneck in procedural AI tutoring, potentially enabling scalable deployment of structured coaching systems like Ivy across courses. The combination of automation with expert oversight is a practical strength for HCI and AI-education applications.

major comments (3)

[Abstract] Abstract: The evaluations of structural validity, semantic alignment, reproducibility, and refinement effort are described at a high level but provide no quantitative metrics, error analysis, measurement protocols, or inter-rater details, making it impossible to assess whether the models meet the thresholds needed for the feasibility claim.
[Abstract] Abstract: The central claim of a 50-70% reduction in expert modeling time is stated without baseline measurements, per-model timing data, or description of how time was logged across the 23 models; this directly underpins the scalability and practicality conclusions yet lacks supporting evidence.
[Abstract] Abstract and conclusion: The assertion that the TTM methodology makes course-wide deployment practically feasible rests on results from a single graduate AI course; no cross-domain experiments or analysis of how instructional-text structure and terminology density affect alignment and refinement effort are provided, leaving transferability untested.
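The second comment asks how a 50-70% range would be derived from timing data. Assuming per-model logs of manual baseline minutes and AI-assisted minutes (hypothetical data; the paper's logging protocol is not described here), the range could be reported as the minimum and maximum per-model reduction, alongside the mean.

```python
from statistics import mean


def percent_reduction(manual_minutes: float, assisted_minutes: float) -> float:
    """Share of expert time saved relative to the manual baseline, in percent."""
    if manual_minutes <= 0:
        raise ValueError("manual baseline must be positive")
    return 100.0 * (manual_minutes - assisted_minutes) / manual_minutes


def reduction_summary(logs: dict[str, tuple[float, float]]) -> tuple[float, float, float]:
    """(min, mean, max) percent reduction; logs map model id -> (manual, assisted) minutes."""
    cuts = [percent_reduction(manual, assisted) for manual, assisted in logs.values()]
    return min(cuts), mean(cuts), max(cuts)


if __name__ == "__main__":
    # Hypothetical logs for two models; not data from the paper.
    print(reduction_summary({"model-01": (120.0, 60.0), "model-02": (100.0, 30.0)}))
    # -> (50.0, 60.0, 70.0)
```

Note that a per-model manual baseline has to come from somewhere, either by timing a matched model built by hand or by an estimate, which is exactly the protocol detail the referee asks the authors to state.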

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive feedback. We address each major comment below and have revised the manuscript to strengthen the presentation of evidence and qualify claims where appropriate.

Referee: [Abstract] Abstract: The evaluations of structural validity, semantic alignment, reproducibility, and refinement effort are described at a high level but provide no quantitative metrics, error analysis, measurement protocols, or inter-rater details, making it impossible to assess whether the models meet the thresholds needed for the feasibility claim.

Authors: We agree the abstract summarizes at a high level. The full manuscript provides quantitative metrics, error analysis, protocols, and inter-rater details in the Evaluation section. We will revise the abstract to incorporate key quantitative results from the body of the paper to better substantiate the feasibility claim. revision: yes
Referee: [Abstract] Abstract: The central claim of a 50-70% reduction in expert modeling time is stated without baseline measurements, per-model timing data, or description of how time was logged across the 23 models; this directly underpins the scalability and practicality conclusions yet lacks supporting evidence.

Authors: The time savings are based on expert timing logs comparing traditional TMK modeling to the TTM-assisted process, with details provided in the results section of the manuscript. We will revise the abstract to briefly describe the baseline comparison and timing protocol used for the 23 models. revision: yes
Referee: [Abstract] Abstract and conclusion: The assertion that the TTM methodology makes course-wide deployment practically feasible rests on results from a single graduate AI course; no cross-domain experiments or analysis of how instructional-text structure and terminology density affect alignment and refinement effort are provided, leaving transferability untested.

Authors: We acknowledge the work is demonstrated on a single course. The manuscript now includes additional analysis in the discussion of how text structure and terminology density influenced results within this domain. The ontology-constrained approach is designed to be generalizable. We will revise the abstract and conclusion to qualify the feasibility claim as shown for this course while noting the need for further cross-domain validation as future work. revision: partial
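The first response cites inter-rater details for semantic alignment without naming the statistic. Cohen's kappa is one standard choice when two raters assign categorical labels to the same items; the sketch below assumes hypothetical per-element labels such as `"aligned"` and `"misaligned"` and is not taken from the paper.

```python
from collections import Counter


def cohens_kappa(rater_a: list[str], rater_b: list[str]) -> float:
    """Cohen's kappa: agreement between two raters corrected for chance agreement."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must label the same non-empty set of items")
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: product of each rater's marginal label frequencies.
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout; kappa is undefined
    return (observed - expected) / (1 - expected)


if __name__ == "__main__":
    a = ["aligned", "aligned", "misaligned", "aligned"]
    b = ["aligned", "misaligned", "misaligned", "aligned"]
    print(cohens_kappa(a, b))  # observed 0.75, chance 0.5 -> 0.5
```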

Circularity Check

0 steps flagged

No circularity in TTM methodology or evaluation chain

The paper describes an empirical AI-assisted text-to-model process for generating TMK representations from course materials, applies it to 23 models in one graduate AI course, and reports measured outcomes (time savings, structural validity, semantic alignment, reproducibility) against independent expert review and external criteria. No equations, fitted parameters, or predictions reduce to the method's own inputs by construction; the central feasibility claim rests on observable results from an external domain rather than self-referential definitions or self-citation chains.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The approach rests on the pre-existing TMK ontology and the assumption that LLMs can follow structural constraints reliably; no new free parameters or invented entities are introduced.

axioms (2)

domain assumption: TMK models are suitable and sufficient representations for procedural skills in AI tutoring systems
Invoked throughout as the target output format; drawn from prior work by Goel et al.
domain assumption: Ontology-constrained prompting and templates will produce structurally complete and semantically aligned models
Central premise of the TTM method stated in the abstract.

pith-pipeline@v0.9.0 · 5478 in / 1324 out tokens · 64489 ms · 2026-05-10T05:02:09.062955+00:00 · methodology

Reference graph

Works this paper leans on

37 extracted references · 37 canonical work pages

[1] Vincent Aleven, Bruce M McLaren, Jonathan Sewall, Martin Van Velsen, Octav Popescu, Sandra Demi, Michael Ringenberg, and Kenneth R Koedinger. 2016. Example-tracing tutors: Intelligent tutor development for non-programmers. International Journal of Artificial Intelligence in Education 26, 1 (2016), 224–269.

[2] Mona Alshahrani, Maha A Thafar, and Magbubah Essack. 2021. Application and evaluation of knowledge graph embeddings in biomedical data. PeerJ Computer Science 7 (2021), e341.

[3] Balakrishnan Chandrasekaran. 1986. Generic tasks in knowledge-based reasoning: High-level building blocks for expert system design. IEEE Intelligent Systems 1, 03 (1986), 23–30.

[4] Balakrishnan Chandrasekaran, Todd R Johnson, and Jack W Smith. 1992. Task-structure analysis for knowledge modeling. Commun. ACM 35, 9 (1992), 124–137.

[5] Yupeng Chang, Xu Wang, Jindong Wang, Yuan Wu, Linyi Yang, Kaijie Zhu, Hao Chen, Xiaoyuan Yi, Cunxiang Wang, Yidong Wang, et al. 2024. A survey on evaluation of large language models. ACM Transactions on Intelligent Systems and Technology 15, 3 (2024), 1–45.

[6] Rahul Dass, Thomas Bowlin, Zebing Li, Xiao Jin, and Ashok Goel. 2025. Improving Procedural Skill Explanations via Constrained Generation: A Symbolic-LLM Hybrid Architecture. arXiv preprint arXiv:2511.20942 (2025).

L@S ’26, June 29–July 3, 2026, Seoul, Republic of Korea. Rahul K. Dass, Shubham Puri, Arpit Khandelwal, Xiao Jin, and Ashok K. Goel

[7] Rahul K Dass, Rochan H Madhusudhana, Erin C Deye, Shashank Verma, Timothy A Bydlon, Grace Brazil, and Ashok K Goel. 2025. Ivy: a hybrid knowledge-based and generative AI coach for explaining procedural skills. In International Conference on Artificial Intelligence in Education. Springer, 233–246.

[8] Yuxin Dong, Jian Bai, Mei Li, and Wei Zhang. 2024. Large Language Models in Education: A Systematic Review. IEEE Transactions on Learning Technologies 17, 3 (2024), 123–145. doi:10.1109/TLT.2024.10589960

[9] Kutluhan Erol, James Hendler, and Dana S Nau. 1994. HTN planning: Complexity and expressivity. In AAAI, Vol. 94. 1123–1128.

[10–11] Aldo Gangemi, Carola Catenacci, Massimiliano Ciaramita, and Jos Lehmann. Modelling ontology evaluation and validation. In European semantic web conference. Springer, 140–154.

[12] Mingqi Gao, Xinyu Hu, Xunjian Yin, Jie Ruan, Xiao Pu, and Xiaojun Wan. 2025. LLM-based NLG evaluation: Current status and challenges. Computational Linguistics (2025), 1–27.

[13] Ashok K Goel and Spencer Rugaber. 2017. GAIA: A CAD-like environment for designing game-playing agents. IEEE Intelligent Systems 32, 3 (2017), 60–67.

[14] Hai Hoang, Stephen Lee-Urban, and Héctor Muñoz-Avila. 2005. Hierarchical plan representations for encoding strategic game AI. In Proceedings of the AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment, Vol. 1. 63–68.

[15] Daniel Kahneman. 2011. Thinking, fast and slow. Farrar, Straus and Giroux (2011).

[16] Kenneth R Koedinger and Vincent Aleven. 2007. Exploring the assistance dilemma in experiments with cognitive tutors. Educational Psychology Review 19, 3 (2007), 239–264.

[17] Kenneth R. Koedinger and Albert T. Corbett. 2006. Cognitive Tutors: Technology bringing learning science to the classroom. In The Cambridge Handbook of the Learning Sciences, R. Keith Sawyer (Ed.). Cambridge University Press, New York, NY, 61–78.

[18] Stephen Lee-Urban and Héctor Muñoz-Avila. [n. d.]. A Study of Process Languages for Planning Tasks. ICAPS 2006 ([n. d.]), 65.

[19] Angélique Létourneau, Marion Deslandes Martineau, Patrick Charland, John Alexander Karran, Jared Boasen, and Pierre Majorique Léger. 2025. A systematic review of AI-driven intelligent tutoring systems (ITS) in K-12 education. npj Science of Learning 10, 1 (2025), 29.

[20] Zixuan Li, Yutao Zeng, Yuxin Zuo, Weicheng Ren, Wenxuan Liu, Miao Su, Yucan Guo, Yantao Liu, Xiang Li, Zhilei Hu, Long Bai, Wei Li, Yidan Liu, Pan Yang, Xiaolong Jin, Jiafeng Guo, and Xueqi Cheng. 2024. KnowCoder: Coding Structured Knowledge into LLMs for Universal Information Extraction. In Proceedings of the 62nd Annual Meeting of the Association for Com...

[21] Cherie Lum, Erin Deye, Grace Brazil, Tim Bydlon, Shashank Verma, Rochan Madhusudhana, Rahul Dass, and Ashok Goel. 2025. Designing an AI coaching system for interactive video-based skill learning. In International Conference on Intelligent Tutoring Systems. Springer, 281–291.

[22] J William Murdock and Ashok K Goel. 2008. Meta-case-based reasoning: self-improvement through self-understanding. Journal of Experimental & Theoretical Artificial Intelligence 20, 1 (2008), 1–36.

[23] Tom Murray. 2003. An Overview of Intelligent Tutoring System Authoring Tools: Updated analysis of the state of the art. Authoring tools for advanced technology learning environments: Toward cost-effective adaptive, interactive and intelligent educational software (2003), 491–544.

[24] Tom Murray. 2003. An Overview of Intelligent Tutoring System Authoring Tools: Updated Analysis of the State of the Art. Springer Netherlands, Dordrecht, 491–544.

[25] Dana S Nau, Tsz-Chiu Au, Okhtay Ilghami, Ugur Kuter, J William Murdock, Dan Wu, and Fusun Yaman. 2003. SHOP2: An HTN planning system. Journal of Artificial Intelligence Research 20 (2003), 379–404.

[26] Rohan Patel, Emily Fox, John Williams, and Linh Nguyen. 2023. Can Large Language Models Generate Middle School Mathematics Explanations Better than Human Teachers? In Proceedings of the Annual Meeting of the American Educational Research Association. Chicago, IL. Preprint available from the National Science Foundation award repository.

[27] Bárbara Rodrigues, Rui Pinto, and Gil Gonçalves. 2025. A Systematic Literature Review of AI-Driven Intelligent Tutoring Systems in Engineering Education: Emphasizing Personalization, Feedback, and Student Monitoring. IEEE Access 13 (2025), 190152–190177.

[28–29] Khusniddin R Ruzimboev, Ikhtiyor D Avezmatov, and Boburjon I Shermatov. 2025. A Review of Neuro-Symbolic, Multi-Modal Intelligent Tutoring Systems for Advancing Adaptive and Personalized Learning. In 2025 10th International Conference on Computer Science and Engineering (UBMK). IEEE, 148–153.

[30–32] Anubhav Shrimal, Aryan Jain, Soumyajit Chowdhury, and Promod Yenigalla. 2025. PARSE: LLM Driven Schema Optimization for Reliable Entity Extraction. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing: Industry Track, Saloni Potdar, Lina Rojas-Barahona, and Sebastien Montella (Eds.). Association for Computational Linguistics, Suzhou (China), 2749–. doi:10.18653/v1/2025.emnlp-industry.184

[33] Robert Sottilare, Arthur Graesser, Xiangen Hu, and Keith Brawner. 2015. Design recommendations for intelligent tutoring systems: Authoring tools and expert modeling techniques. Robert Sottilare.

[34] Eleni Stroulia and Ashok K Goel. 1999. Evaluating PSMs in evolutionary design: The AUTOGNOSTIC experiments. International Journal of Human-Computer Studies 51, 4 (1999), 825–847.

[35] Kurt VanLehn. 2006. The Behavior of Tutoring Systems. International Journal of Artificial Intelligence in Education 16, 3 (2006), 227–265. doi:10.3233/IRG-2006-16(3)02

[36] Yue Zhang, Yafu Li, Leyang Cui, Deng Cai, Lemao Liu, Tingchen Fu, Xinting Huang, Enbo Zhao, Yu Zhang, Yulong Chen, et al. 2025. Siren’s Song in the AI Ocean: A Survey on Hallucination in Large Language Models. Computational Linguistics 51, 4 (2025), 1373–1418.

[37] Panpan Zhou, Zhengyong Zhang, and Fan Yang. 2025. Knowledge Graphs towards AI-assisted Smart Education. In Proceedings of the 2nd International Conference on Intelligent Education and Computer Technology. 866–871.
