
arxiv: 2604.17624 · v3 · submitted 2026-04-19 · 💻 cs.HC

Developing Models of Procedural Skills using an AI-assisted Text-to-Model Approach

Pith reviewed 2026-05-10 05:02 UTC · model grok-4.3

classification 💻 cs.HC
keywords procedural skill modeling · text-to-model · TMK models · AI tutoring · LLM-assisted authoring · knowledge representation · educational technology · ontology-constrained generation

The pith

AI-assisted text-to-model conversion builds valid procedural skill models in 50-70 percent less expert time.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents a text-to-model methodology that uses large language models to turn instructional materials into complete Task-Method-Knowledge models for procedural skills. Ontology constraints and templates handle the structural scaffolding while experts focus on content refinement. Tested on a full graduate AI course, the method generated 23 models that support an existing AI coach called Ivy. Expert modeling time dropped by 50 to 70 percent, and the models stayed structurally valid and reproducible. This reduction in effort makes it realistic to equip entire courses with detailed, model-based AI tutoring instead of limiting such systems to small topics.

Core claim

The TTM methodology transforms instructional materials into schema-complete TMK models through ontology-constrained prompting and template-based generation, automating structural scaffolding while preserving expert oversight. Applied to a graduate-level online AI course, it produced 23 TMK models that enable full-course coverage for the Ivy AI coach. Expert modeling time fell by 50-70 percent, and the resulting models were structurally valid and highly reproducible.

What carries the argument

Ontology-constrained LLM prompting combined with template-based generation in the text-to-model (TTM) approach, which automates creation of Task-Method-Knowledge (TMK) models from raw instructional text.
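
To make the idea of ontology-constrained prompting with templates more concrete, the following is a minimal sketch of how such a prompt might be assembled. The ontology contents, the field names, and the `build_prompt` helper are all assumptions made for illustration; the paper's actual ontology and templates are not reproduced on this page.

```python
import json

# Hypothetical, simplified TMK-style ontology: element type -> required fields.
# These names are assumptions for illustration, not the paper's schema.
TMK_ONTOLOGY = {
    "task": ["name", "goal", "methods"],
    "method": ["name", "steps"],
    "knowledge": ["concept", "description"],
}


def build_prompt(source_text: str, ontology: dict[str, list[str]]) -> str:
    """Assemble a prompt that restricts the LLM to the ontology's fields."""
    # An empty template the model is asked to fill, one entry per element type.
    template = {kind: [{field: "" for field in fields}]
                for kind, fields in ontology.items()}
    rules = "\n".join(
        f"- Every {kind} must define: {', '.join(fields)}"
        for kind, fields in ontology.items()
    )
    return (
        "Convert the instructional text into a model.\n"
        "Use only these element types and fields:\n"
        f"{rules}\n"
        "Return JSON matching this template exactly:\n"
        f"{json.dumps(template, indent=2)}\n"
        "Instructional text:\n"
        f"{source_text}"
    )


prompt = build_prompt(
    "To run A* search, expand the lowest-cost node first.", TMK_ONTOLOGY
)
```

The design point the sketch tries to capture is that the structure comes from the ontology and template rather than from the model, so an expert reviewing the output would mainly check content.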

If this is right

  • Full-course coverage with structured TMK models becomes feasible for AI tutoring systems.
  • The generated models remain structurally valid and highly reproducible across repeated runs.
  • The overall cost of building procedural skill representations drops enough to support widespread use.
  • Expert oversight stays limited to content checks rather than building structures from scratch.
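
One way reproducibility across repeated runs could be quantified is by comparing the sets of elements each run produces. The sketch below uses mean pairwise Jaccard overlap; this metric, the function names, and the sample element names are assumptions for illustration, and the paper may measure reproducibility differently.

```python
from itertools import combinations


def jaccard(a: set[str], b: set[str]) -> float:
    """Overlap between two sets of element names (1.0 means identical)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def mean_pairwise_agreement(runs: list[set[str]]) -> float:
    """Average Jaccard overlap over every pair of generation runs."""
    pairs = list(combinations(runs, 2))
    if not pairs:
        raise ValueError("need at least two runs to compare")
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


# Made-up element names from three hypothetical runs on the same lesson.
runs = [
    {"task:search", "method:expand", "knowledge:frontier"},
    {"task:search", "method:expand", "knowledge:frontier"},
    {"task:search", "method:expand", "knowledge:heuristic"},
]
score = mean_pairwise_agreement(runs)
```

A score near 1.0 would indicate that repeated runs yield nearly the same structure; set overlap ignores semantic differences inside elements, so it is at best a coarse first check.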

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same scaffolding could be tested on procedural training outside computer science, such as medical or engineering procedures.
  • Automatic updates to models from new course materials could become possible with repeated application.
  • Lower modeling cost might allow more institutions to combine these representations with learner performance data for personalized coaching.

Load-bearing premise

That ontology-constrained prompting plus templates will keep the remaining expert refinement effort low enough to preserve the reported time savings in new courses and domains.

What would settle it

Applying the method to a new course and finding that average expert refinement time exceeds 50 percent of traditional modeling time (that is, savings below the reported 50 percent floor), or that many models require major semantic corrections, would show the scalability does not hold.
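
The link between a reported saving and the remaining refinement time is simple arithmetic: a reduction of r leaves 1 - r of the baseline, so a 50-70 percent reduction corresponds to assisted time of 30-50 percent of manual modeling time. The sketch below expresses that check; the function names and the hour values are hypothetical, not measurements from the paper.

```python
def time_reduction(baseline_hours: float, assisted_hours: float) -> float:
    """Fraction of expert time saved relative to manual modeling."""
    if baseline_hours <= 0:
        raise ValueError("baseline_hours must be positive")
    return 1.0 - assisted_hours / baseline_hours


def supports_claim(baseline_hours: float, assisted_hours: float,
                   min_reduction: float = 0.50) -> bool:
    """True when the saving meets the lower end of the reported range."""
    return time_reduction(baseline_hours, assisted_hours) >= min_reduction


# Hypothetical per-model timings in hours.
holds = supports_claim(baseline_hours=10.0, assisted_hours=4.0)   # 60% saved
fails = supports_claim(baseline_hours=10.0, assisted_hours=6.0)   # 40% saved
```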

Figures

Figures reproduced from arXiv: 2604.17624 by Arpit Khandelwal, Ashok K. Goel, Rahul K. Dass, Shubham Puri, Xiao Jin.

Figure 1: Human-in-the-loop text-to-model architecture illustrating schema-constrained generation and iterative refinement.

Abstract

Scalable AI tutoring for procedural skill learning requires structured knowledge representations, yet constructing these representations remains a labor-intensive bottleneck. This paper introduces a new LLM-assisted text-to-model (TTM) methodology that transforms instructional materials into schema-complete Task-Method-Knowledge (TMK) models through ontology-constrained prompting and template-based generation, automating structural scaffolding while preserving expert oversight. Applied to a graduate-level online AI course, the methodology produced 23 TMK models - enabling full-course coverage for Ivy, a deployed AI coach that relies on TMK models to support learners' procedural understanding, for the first time. AI-assisted authoring reduced expert modeling time by 50-70% while producing structurally valid and highly reproducible models. We evaluate structural validity, semantic alignment, reproducibility, and refinement effort to characterize authoring scalability. Results indicate that the TTM methodology substantially lowers the cost of constructing structured procedural representations, making course-wide deployment of structured AI tutoring systems practically feasible.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 0 minor

Summary. The paper introduces a text-to-model (TTM) methodology that employs ontology-constrained LLM prompting combined with template-based generation to automatically produce schema-complete Task-Method-Knowledge (TMK) models from instructional text. The method is demonstrated on a graduate-level online AI course, yielding 23 TMK models that provide full-course coverage for the deployed Ivy AI coach. The authors claim that AI-assisted authoring reduces expert modeling time by 50-70% while achieving structural validity, semantic alignment, high reproducibility, and low refinement effort, with evaluations of these properties supporting the conclusion that the approach makes course-wide structured AI tutoring systems practically feasible.

Significance. If the time savings, model quality, and low refinement effort hold, the work would meaningfully address the knowledge-representation bottleneck in procedural AI tutoring, potentially enabling scalable deployment of structured coaching systems like Ivy across courses. The combination of automation with expert oversight is a practical strength for HCI and AI-education applications.

major comments (3)
  1. [Abstract] Abstract: The evaluations of structural validity, semantic alignment, reproducibility, and refinement effort are described at a high level but provide no quantitative metrics, error analysis, measurement protocols, or inter-rater details, making it impossible to assess whether the models meet the thresholds needed for the feasibility claim.
  2. [Abstract] Abstract: The central claim of a 50-70% reduction in expert modeling time is stated without baseline measurements, per-model timing data, or description of how time was logged across the 23 models; this directly underpins the scalability and practicality conclusions yet lacks supporting evidence.
  3. [Abstract] Abstract and conclusion: The assertion that the TTM methodology makes course-wide deployment practically feasible rests on results from a single graduate AI course; no cross-domain experiments or analysis of how instructional-text structure and terminology density affect alignment and refinement effort are provided, leaving transferability untested.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive feedback. We address each major comment below and have revised the manuscript to strengthen the presentation of evidence and qualify claims where appropriate.

  1. Referee: [Abstract] Abstract: The evaluations of structural validity, semantic alignment, reproducibility, and refinement effort are described at a high level but provide no quantitative metrics, error analysis, measurement protocols, or inter-rater details, making it impossible to assess whether the models meet the thresholds needed for the feasibility claim.

    Authors: We agree the abstract summarizes at a high level. The full manuscript provides quantitative metrics, error analysis, protocols, and inter-rater details in the Evaluation section. We will revise the abstract to incorporate key quantitative results from the body of the paper to better substantiate the feasibility claim.
    revision: yes

  2. Referee: [Abstract] Abstract: The central claim of a 50-70% reduction in expert modeling time is stated without baseline measurements, per-model timing data, or description of how time was logged across the 23 models; this directly underpins the scalability and practicality conclusions yet lacks supporting evidence.

    Authors: The time savings are based on expert timing logs comparing traditional TMK modeling to the TTM-assisted process, with details provided in the results section of the manuscript. We will revise the abstract to briefly describe the baseline comparison and timing protocol used for the 23 models.
    revision: yes

  3. Referee: [Abstract] Abstract and conclusion: The assertion that the TTM methodology makes course-wide deployment practically feasible rests on results from a single graduate AI course; no cross-domain experiments or analysis of how instructional-text structure and terminology density affect alignment and refinement effort are provided, leaving transferability untested.

    Authors: We acknowledge the work is demonstrated on a single course. The manuscript now includes additional analysis in the discussion of how text structure and terminology density influenced results within this domain. The ontology-constrained approach is designed to be generalizable. We will revise the abstract and conclusion to qualify the feasibility claim as shown for this course while noting the need for further cross-domain validation as future work.
    revision: partial

Circularity Check

0 steps flagged

No circularity in TTM methodology or evaluation chain


The paper describes an empirical AI-assisted text-to-model process for generating TMK representations from course materials, applies it to 23 models in one graduate AI course, and reports measured outcomes (time savings, structural validity, semantic alignment, reproducibility) against independent expert review and external criteria. No equations, fitted parameters, or predictions reduce to the method's own inputs by construction; the central feasibility claim rests on observable results from an external domain rather than self-referential definitions or self-citation chains.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The approach rests on the pre-existing TMK ontology and the assumption that LLMs can follow structural constraints reliably; no new free parameters or invented entities are introduced.

axioms (2)
  • domain assumption: TMK models are suitable and sufficient representations for procedural skills in AI tutoring systems
    Invoked throughout as the target output format; drawn from prior work by Goel et al.
  • domain assumption: Ontology-constrained prompting and templates will produce structurally complete and semantically aligned models
    Central premise of the TTM method stated in the abstract.
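
The second assumption, structural completeness, is the kind of property that can be checked mechanically. The following is a minimal sketch of such a check under an assumed, simplified schema; the `REQUIRED` fields, the `structural_errors` helper, and the sample model are hypothetical and do not reproduce the paper's ontology or validation procedure.

```python
# Hypothetical required fields per element type; not the paper's ontology.
REQUIRED = {
    "task": ("name", "goal", "methods"),
    "method": ("name", "steps"),
    "knowledge": ("concept", "description"),
}


def structural_errors(model: dict) -> list[str]:
    """Return a list of schema violations; an empty list means valid."""
    errors = []
    # Every element must carry a non-empty value for each required field.
    for section, fields in REQUIRED.items():
        for i, element in enumerate(model.get(section, [])):
            for field in fields:
                if not element.get(field):
                    errors.append(f"{section}[{i}] missing '{field}'")
    # Every method a task names must be defined in the method section.
    defined = {m.get("name") for m in model.get("method", [])}
    for task in model.get("task", []):
        for ref in task.get("methods") or []:
            if ref not in defined:
                errors.append(
                    f"task '{task.get('name')}' references unknown method '{ref}'"
                )
    return errors


# A made-up model with two deliberate defects.
model = {
    "task": [{"name": "search", "goal": "find a path",
              "methods": ["expand", "prune"]}],
    "method": [{"name": "expand", "steps": ["pop node", "push neighbours"]}],
    "knowledge": [{"concept": "frontier", "description": ""}],
}
errors = structural_errors(model)
```

A check like this covers only structure. Semantic alignment with the course material would still need expert review, which is where the first assumption and the refinement-effort claim come in.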


