Developing Models of Procedural Skills using an AI-assisted Text-to-Model Approach
Pith reviewed 2026-05-10 05:02 UTC · model grok-4.3
The pith
AI-assisted text-to-model conversion builds valid procedural skill models in 50-70 percent less expert time.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The text-to-model (TTM) methodology transforms instructional materials into schema-complete Task-Method-Knowledge (TMK) models through ontology-constrained prompting and template-based generation, automating structural scaffolding while preserving expert oversight. Applied to a graduate-level online AI course, it produced 23 TMK models that give the Ivy AI coach full-course coverage, and it reduced expert modeling time by 50-70 percent while producing structurally valid and highly reproducible models.
What carries the argument
Ontology-constrained LLM prompting combined with template-based generation in the text-to-model (TTM) approach, which automates creation of Task-Method-Knowledge (TMK) models from raw instructional text.
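As a rough illustration of what "schema-complete" could mean in practice, the sketch below defines a toy TMK structure and a structural check over it. The class names, the fields (`achieved_by`, `steps`, `uses`), and the validation rules are all assumptions made for this example; the paper's actual ontology and templates are not reproduced in this summary.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Task:
    name: str
    goal: str
    achieved_by: List[str] = field(default_factory=list)  # names of methods


@dataclass
class Method:
    name: str
    steps: List[str] = field(default_factory=list)
    uses: List[str] = field(default_factory=list)  # names of knowledge concepts


@dataclass
class TMKModel:
    tasks: List[Task]
    methods: List[Method]
    knowledge: List[str]


def structural_errors(model: TMKModel) -> List[str]:
    """Return schema violations; an empty list means the toy schema is satisfied."""
    errors = []
    method_names = {m.name for m in model.methods}
    concepts = set(model.knowledge)
    for task in model.tasks:
        if not task.achieved_by:
            errors.append(f"task '{task.name}' has no method")
        for ref in task.achieved_by:
            if ref not in method_names:
                errors.append(f"task '{task.name}' references unknown method '{ref}'")
    for method in model.methods:
        if not method.steps:
            errors.append(f"method '{method.name}' has no steps")
        for ref in method.uses:
            if ref not in concepts:
                errors.append(f"method '{method.name}' uses unknown concept '{ref}'")
    return errors


# Hypothetical model with one dangling reference ("heuristic" is never declared).
example = TMKModel(
    tasks=[Task("find_path", "reach the goal state", ["breadth_first_search"])],
    methods=[
        Method(
            "breadth_first_search",
            ["expand frontier", "test goal"],
            ["frontier", "heuristic"],
        )
    ],
    knowledge=["frontier", "state"],
)
example_errors = structural_errors(example)
```

A check of this kind only covers structure. Whether a step or concept is the right one for the course material is the semantic question that, per the paper, stays with the expert.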
If this is right
- Full-course coverage with structured TMK models becomes feasible for AI tutoring systems.
- The generated models remain structurally valid and highly reproducible across repeated runs.
- The overall cost of building procedural skill representations drops enough to support widespread use.
- Expert oversight stays limited to content checks rather than building structures from scratch.
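One way to put a number on "reproducible across repeated runs" is to compare the set of elements each run produces. The sketch below uses mean pairwise Jaccard similarity; this metric is an assumed stand-in, since this summary does not state how the paper measures reproducibility, and the element labels are invented for illustration.

```python
from itertools import combinations
from typing import List, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity of two sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def mean_pairwise_jaccard(runs: List[Set[str]]) -> float:
    """Average Jaccard similarity over every unordered pair of runs."""
    pairs = list(combinations(runs, 2))
    if not pairs:
        raise ValueError("need at least two runs to compare")
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


# Hypothetical element sets from three generation runs on the same text.
runs = [
    {"task:find_path", "method:bfs", "concept:frontier"},
    {"task:find_path", "method:bfs", "concept:frontier"},
    {"task:find_path", "method:bfs", "concept:queue"},
]
score = mean_pairwise_jaccard(runs)
```

A score of 1.0 would mean every run produced the same elements. Lower scores show how much the generated structure drifts between runs, although they say nothing about which run is correct.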
Where Pith is reading between the lines
- The same scaffolding could be tested on procedural training outside computer science, such as medical or engineering procedures.
- Automatic updates to models from new course materials could become possible with repeated application.
- Lower modeling cost might allow more institutions to combine these representations with learner performance data for personalized coaching.
Load-bearing premise
That ontology-constrained prompting plus templates will keep the remaining expert refinement effort low enough to preserve the reported time savings in new courses and domains.
What would settle it
Applying the method to a new course and finding that average expert refinement time exceeds 30 percent of traditional modeling time, or that many models require major semantic corrections, would show the scalability does not hold.
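The arithmetic behind this test is simple enough to sketch. A 50-70 percent reduction means assisted authoring takes 30-50 percent of the traditional time, so refinement alone exceeding 30 percent would leave little room for the claimed savings. The hours below are made-up placeholders, not data from the paper, and the function names are hypothetical.

```python
from typing import List


def time_reduction(baseline_hours: List[float], assisted_hours: List[float]) -> float:
    """Fraction of expert time saved, aggregated over all models."""
    return 1 - sum(assisted_hours) / sum(baseline_hours)


def refinement_share(baseline_hours: List[float], refinement_hours: List[float]) -> float:
    """Expert refinement time as a fraction of traditional modeling time."""
    return sum(refinement_hours) / sum(baseline_hours)


def scalability_fails(share: float, threshold: float = 0.30) -> bool:
    """True when refinement exceeds the 30 percent threshold named above."""
    return share > threshold


# Placeholder per-model hours for a hypothetical new course.
baseline = [10.0, 8.0, 12.0]
assisted = [4.0, 3.0, 5.0]
refinement = [3.5, 3.0, 4.0]

reduction = time_reduction(baseline, assisted)   # 1 - 12/30
share = refinement_share(baseline, refinement)   # 10.5/30
fails = scalability_fails(share)
```

With these placeholder numbers the reduction is 60 percent but the refinement share is 35 percent, so the course would count as evidence against scalability even though the headline saving falls inside the reported range.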
Original abstract
Scalable AI tutoring for procedural skill learning requires structured knowledge representations, yet constructing these representations remains a labor-intensive bottleneck. This paper introduces a new LLM-assisted text-to-model (TTM) methodology that transforms instructional materials into schema-complete Task-Method-Knowledge (TMK) models through ontology-constrained prompting and template-based generation, automating structural scaffolding while preserving expert oversight. Applied to a graduate-level online AI course, the methodology produced 23 TMK models - enabling full-course coverage for Ivy, a deployed AI coach that relies on TMK models to support learners' procedural understanding, for the first time. AI-assisted authoring reduced expert modeling time by 50-70% while producing structurally valid and highly reproducible models. We evaluate structural validity, semantic alignment, reproducibility, and refinement effort to characterize authoring scalability. Results indicate that the TTM methodology substantially lowers the cost of constructing structured procedural representations, making course-wide deployment of structured AI tutoring systems practically feasible.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces a text-to-model (TTM) methodology that employs ontology-constrained LLM prompting combined with template-based generation to automatically produce schema-complete Task-Method-Knowledge (TMK) models from instructional text. The method is demonstrated on a graduate-level online AI course, yielding 23 TMK models that provide full-course coverage for the deployed Ivy AI coach. The authors claim that AI-assisted authoring reduces expert modeling time by 50-70% while achieving structural validity, semantic alignment, high reproducibility, and low refinement effort, with evaluations of these properties supporting the conclusion that the approach makes course-wide structured AI tutoring systems practically feasible.
Significance. If the time savings, model quality, and low refinement effort hold, the work would meaningfully address the knowledge-representation bottleneck in procedural AI tutoring, potentially enabling scalable deployment of structured coaching systems like Ivy across courses. The combination of automation with expert oversight is a practical strength for HCI and AI-education applications.
Major comments (3)
- [Abstract] The evaluations of structural validity, semantic alignment, reproducibility, and refinement effort are described at a high level but provide no quantitative metrics, error analysis, measurement protocols, or inter-rater details, making it impossible to assess whether the models meet the thresholds needed for the feasibility claim.
- [Abstract] The central claim of a 50-70% reduction in expert modeling time is stated without baseline measurements, per-model timing data, or description of how time was logged across the 23 models; this directly underpins the scalability and practicality conclusions yet lacks supporting evidence.
- [Abstract and conclusion] The assertion that the TTM methodology makes course-wide deployment practically feasible rests on results from a single graduate AI course; no cross-domain experiments or analysis of how instructional-text structure and terminology density affect alignment and refinement effort are provided, leaving transferability untested.
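Inter-rater details of the kind requested in the first comment are often reported as a chance-corrected agreement statistic. The sketch below computes Cohen's kappa for two raters; the choice of kappa, the label set, and the ratings are assumptions for illustration, and nothing here reflects the paper's actual protocol.

```python
from collections import Counter
from typing import List


def cohens_kappa(rater_a: List[str], rater_b: List[str]) -> float:
    """Cohen's kappa for two raters labelling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must label the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: share of items given the same label by both raters.
    observed = sum(x == y for x, y in zip(rater_a, rater_b)) / n
    # Expected agreement under independence, from each rater's label frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical semantic-alignment ratings for six generated models.
ratings_a = ["aligned", "aligned", "partial", "aligned", "misaligned", "partial"]
ratings_b = ["aligned", "partial", "partial", "aligned", "misaligned", "aligned"]
kappa = cohens_kappa(ratings_a, ratings_b)
```

Here the raters agree on four of six items, but because chance alone would produce agreement on 14 of 36 label pairings, kappa comes out at 5/11, roughly 0.45.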
Simulated Author's Rebuttal
We thank the referee for the constructive feedback. We address each major comment below and have revised the manuscript to strengthen the presentation of evidence and qualify claims where appropriate.
- Referee: [Abstract] The evaluations of structural validity, semantic alignment, reproducibility, and refinement effort are described at a high level but provide no quantitative metrics, error analysis, measurement protocols, or inter-rater details, making it impossible to assess whether the models meet the thresholds needed for the feasibility claim.
  Authors: We agree the abstract summarizes at a high level. The full manuscript provides quantitative metrics, error analysis, protocols, and inter-rater details in the Evaluation section. We will revise the abstract to incorporate key quantitative results from the body of the paper to better substantiate the feasibility claim.
  Revision: yes
- Referee: [Abstract] The central claim of a 50-70% reduction in expert modeling time is stated without baseline measurements, per-model timing data, or description of how time was logged across the 23 models; this directly underpins the scalability and practicality conclusions yet lacks supporting evidence.
  Authors: The time savings are based on expert timing logs comparing traditional TMK modeling to the TTM-assisted process, with details provided in the results section of the manuscript. We will revise the abstract to briefly describe the baseline comparison and timing protocol used for the 23 models.
  Revision: yes
- Referee: [Abstract and conclusion] The assertion that the TTM methodology makes course-wide deployment practically feasible rests on results from a single graduate AI course; no cross-domain experiments or analysis of how instructional-text structure and terminology density affect alignment and refinement effort are provided, leaving transferability untested.
  Authors: We acknowledge the work is demonstrated on a single course. The manuscript now includes additional analysis in the discussion of how text structure and terminology density influenced results within this domain. The ontology-constrained approach is designed to be generalizable. We will revise the abstract and conclusion to qualify the feasibility claim as shown for this course while noting the need for further cross-domain validation as future work.
  Revision: partial
Circularity Check
No circularity in TTM methodology or evaluation chain
The paper describes an empirical AI-assisted text-to-model process for generating TMK representations from course materials, applies it to 23 models in one graduate AI course, and reports measured outcomes (time savings, structural validity, semantic alignment, reproducibility) against independent expert review and external criteria. No equations, fitted parameters, or predictions reduce to the method's own inputs by construction; the central feasibility claim rests on observable results from an external domain rather than self-referential definitions or self-citation chains.
Axiom & Free-Parameter Ledger
Axioms (2)
- Domain assumption: TMK models are suitable and sufficient representations for procedural skills in AI tutoring systems.
- Domain assumption: Ontology-constrained prompting and templates will produce structurally complete and semantically aligned models.
Publication details
L@S '26, June 29–July 3, 2026, Seoul, Republic of Korea. Rahul K. Dass, Shubham Puri, Arpit Khandelwal, Xiao Jin, and Ashok K. Goel.