Automated Knowledge Component Generation for Interpretable Knowledge Tracing in Coding Problems

Andrew Lan; Arun Balajiee Lekshmi Narayanan; Bita Akram; Mohammad Hassany; Nigel Fernandez; Peter Brusilovsky; Rafaella Sampaio de Alencar; Zhangqi Duan

arXiv: 2502.18632 · v4 · pith:M2P5HBDQnew · submitted 2025-02-25 · 💻 cs.AI · cs.CL · cs.CY · cs.LG · cs.SE

Zhangqi Duan, Nigel Fernandez, Arun Balajiee Lekshmi Narayanan, Mohammad Hassany, Rafaella Sampaio de Alencar, Peter Brusilovsky, Bita Akram, Andrew Lan

Pith reviewed 2026-05-23 01:46 UTC · model grok-4.3

classification 💻 cs.AI · cs.CL · cs.CY · cs.LG · cs.SE

keywords knowledge tracing · knowledge components · LLM · coding problems · automated generation · student modeling · programming education · interpretable models

The pith

LLM-generated knowledge components for coding problems enable more accurate prediction of future student responses than human-written ones in knowledge tracing models.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces an automated pipeline that uses large language models to generate and tag knowledge components for open-ended programming problems, eliminating the need for manual expert labeling. It then applies these components in a knowledge tracing framework called KCGen-KT to track student mastery of specific skills. Experiments on two real student code datasets show KCGen-KT predicts future responses better than prior KT methods and those using human KCs. Learning curve analysis indicates the generated components fit cognitive models more closely than human alternatives. Instructors reviewing the mappings judged them reasonably accurate.

Core claim

KCGen-KT, which relies on LLM-generated knowledge components for programming problems, achieves superior performance in predicting future student responses compared to existing knowledge tracing methods and models built on human-written KCs, while also yielding better model fit under cognitive learning curve analysis.

What carries the argument

Automated LLM-based pipeline for generating and tagging knowledge components, integrated into the KCGen-KT knowledge tracing framework.
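To make the tagging step concrete, the sketch below shows one way problem-KC tags could be stored as a binary problem-by-KC matrix (often called a Q-matrix). It is an illustration only, not the paper's implementation: the function name `build_q_matrix` and the problem IDs are hypothetical, and the two KC names are borrowed from the Figure 3 caption.

```python
def build_q_matrix(problem_kcs):
    """Turn {problem_id: [kc_name, ...]} into a binary problem-by-KC matrix.

    Returns (problems, kcs, q) where q[i][j] == 1 if problem i is tagged
    with KC j. Rows and columns are sorted for a reproducible layout.
    """
    problems = sorted(problem_kcs)
    kcs = sorted({kc for tags in problem_kcs.values() for kc in tags})
    q = [[1 if kc in problem_kcs[p] else 0 for kc in kcs] for p in problems]
    return problems, kcs, q


# Hypothetical tags for two made-up problems.
tags = {
    "p1": ["String Length Determination", "Equality Comparison"],
    "p2": ["Equality Comparison"],
}
problems, kcs, q = build_q_matrix(tags)
for problem, row in zip(problems, q):
    print(problem, row)
```

A matrix like this is the kind of fixed input a KT model can consume, whichever source (LLM or human expert) produced the tags.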

If this is right

KCGen-KT outperforms existing KT methods and human-written KCs on future student response prediction.
LLM-generated KCs produce a better fit than human KCs when evaluated under a cognitive model using learning curves.
The pipeline produces problem-KC mappings that course instructors rate as reasonably accurate.
The approach scales KC creation without requiring domain experts for each new problem set.
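The learning-curve claim can be illustrated with a small fitting sketch. The abstract does not say which cognitive model the authors use, so this example assumes the classic power law of practice, error(t) = a · t^(-b), fitted by least squares in log-log space. The function name `fit_power_law` and the synthetic data are hypothetical.

```python
import math


def fit_power_law(error_rates, eps=1e-6):
    """Fit error(t) = a * t**(-b) by least squares in log-log space.

    error_rates[i] is the observed error rate at practice opportunity i + 1.
    Needs at least two opportunities with differing error rates.
    Returns (a, b, r_squared), with r_squared measured in log-log space.
    """
    # Clamp at eps so that a zero error rate does not break the logarithm.
    xs = [math.log(t) for t in range(1, len(error_rates) + 1)]
    ys = [math.log(max(e, eps)) for e in error_rates]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    r_squared = 1.0 - ss_res / ss_tot
    return math.exp(intercept), -slope, r_squared


# Synthetic error rates for one KC over six practice opportunities.
synthetic = [0.5 * t ** -0.4 for t in range(1, 7)]
a, b, r2 = fit_power_law(synthetic)
print(f"a={a:.3f} b={b:.3f} r2={r2:.3f}")
```

Under this assumption, a KC whose error rates decline smoothly with practice yields a positive `b` and a high `r_squared`, which is one way to read "better fit" when comparing KC sets.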

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

New courses or languages could adopt KT without upfront expert effort to define skills.
Generated KCs might surface skill relationships that human experts overlook in coding education.
The method could support dynamic updates to KCs as more student data arrives over time.

Load-bearing premise

The generated components reflect stable, educationally meaningful skills rather than patterns that only fit the specific datasets used.

What would settle it

KCGen-KT fails to outperform human-KC baselines on a held-out dataset from a different course or programming language.

Figures

Figures reproduced from arXiv: 2502.18632 by Andrew Lan, Arun Balajiee Lekshmi Narayanan, Bita Akram, Mohammad Hassany, Nigel Fernandez, Peter Brusilovsky, Rafaella Sampaio de Alencar, Zhangqi Duan.

**Figure 2.** Overview of our KCGen-KT’s model with the Llama 3 LLM as the backbone. KCGen-KT leverages KC semantics, ...

**Figure 3.** Representative learning curves for three generated KCs (Equality Comparison, String Length Determination, and For ...

**Figure 4.** A section of the generated KC ontology (related to ...

Original abstract

Knowledge components (KCs) mapped to problems help model student learning, tracking their mastery levels on fine-grained skills, thereby facilitating personalized learning and feedback in online learning platforms. However, crafting and tagging KCs to problems, traditionally performed by human domain experts, is highly labor-intensive. We present an automated, LLM-based pipeline for KC generation and tagging for open-ended programming problems. We also develop an LLM-based knowledge tracing (KT) framework to leverage these LLM-generated KCs, which we refer to as KCGen-KT. We conduct extensive quantitative and qualitative evaluations on two real-world student code submission datasets in different programming languages. We find that KCGen-KT outperforms existing KT methods and human-written KCs on future student response prediction. We investigate the learning curves of generated KCs and show that LLM-generated KCs result in a better fit than human-written KCs under a cognitive model. We also conduct a human evaluation with course instructors to show that our pipeline generates reasonably accurate problem-KC mappings.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

LLM-generated KCs for coding KT beat human-written KCs on two datasets, but whether those KCs are stable, real skills is untested.

The paper's core contribution is an LLM pipeline that generates and tags knowledge components for open-ended coding problems, paired with an LLM-based KT model (KCGen-KT) that uses them. On two real student submission datasets it reports higher accuracy predicting future responses than prior KT methods and than human-written KCs, plus a better fit to learning curves under a cognitive model and reasonable instructor ratings on the mappings. That automation step addresses a genuine bottleneck in scaling fine-grained tracing for programming courses, and the quantitative plus qualitative checks are a reasonable first pass. The results are empirical rather than circular, which helps. The main uncertainty is whether the generated KCs represent stable, educationally meaningful skills or simply a finer partitioning that happens to align with the response patterns in these particular datasets. The abstract gives no sign of transfer tests to new cohorts, semesters, or held-out problems, so the reported gains could shrink under that check. Human evaluation is only described at a high level. This work is aimed at the AI-for-education community that already uses KT on code. It is coherent enough on its own terms to merit a full referee process rather than a desk reject, though any review will need to press on the generalization question and the exact evaluation protocol.

Referee Report

2 major / 3 minor

Summary. The paper presents an LLM-based pipeline (KCGen) for automatically generating and tagging knowledge components (KCs) for open-ended programming problems, along with an LLM-augmented knowledge tracing model (KCGen-KT) that uses these KCs. On two real-world student code submission datasets, it reports that KCGen-KT outperforms standard KT baselines and human-written KCs on next-response prediction, that the generated KCs yield better fits to learning curves under a cognitive model, and that instructor raters judge the problem-KC mappings as reasonably accurate.

Significance. If the performance gains reflect stable, educationally meaningful skills rather than dataset-specific partitioning, the work would meaningfully lower the barrier to fine-grained KT in programming education by automating what is currently expert labor. The dual quantitative (prediction and learning-curve) plus qualitative (instructor) evaluation is a strength, but the absence of transfer tests limits the strength of the claim that the KCs are “skills” in the intended sense.

major comments (2)

[Evaluation section] The reported outperformance on the two datasets is not accompanied by any transfer evaluation (new student cohorts, later semesters, or held-out problems). Without such tests it remains possible that the accuracy lift and improved cognitive-model fit arise from finer, data-aligned partitioning that matches observed sequences rather than from stable skills, directly undermining the central claim that LLM-generated KCs capture educationally meaningful skills.
[§4, KCGen-KT framework] The description of how LLM-generated KCs are injected into the KT model does not clarify whether the KC embeddings or the tracing parameters are re-fit on the same data used to generate the KCs, raising the risk that reported gains partly reflect leakage or post-hoc alignment rather than genuine predictive improvement.
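One of the transfer checks named above, evaluation on held-out problems, can be sketched as a split in which no test problem ever appears in training. This is an illustrative sketch, not the paper's protocol: the function name `split_by_problem` and the record key `problem_id` are assumptions about how interaction logs might be stored.

```python
import random


def split_by_problem(interactions, test_fraction=0.2, seed=0):
    """Split interaction records so test problems never appear in training.

    Each record is a dict with at least a "problem_id" key (hypothetical
    schema). At least one problem is always held out.
    """
    problems = sorted({r["problem_id"] for r in interactions})
    rng = random.Random(seed)
    rng.shuffle(problems)
    n_test = max(1, int(len(problems) * test_fraction))
    test_problems = set(problems[:n_test])
    train = [r for r in interactions if r["problem_id"] not in test_problems]
    test = [r for r in interactions if r["problem_id"] in test_problems]
    return train, test


# Made-up log: two students, five problems each.
log = [
    {"student_id": s, "problem_id": f"p{p}", "correct": (s + p) % 2}
    for s in range(2)
    for p in range(5)
]
train, test = split_by_problem(log)
print(len(train), len(test))
```

The same pattern extends to cohort or semester transfer by grouping on a student or term identifier instead of the problem identifier.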

minor comments (3)

[Abstract / Introduction] The abstract and introduction use “LLM-based KT framework” without immediately distinguishing which components are LLM-generated versus which are conventional KT parameters; a short clarifying sentence would help.
[Results tables] Table or figure captions for the quantitative results should explicitly state the number of students, problems, and submissions per dataset and whether splits are temporal or random.
[Human evaluation subsection] The human-evaluation protocol (number of instructors, rating scale, inter-rater agreement) is mentioned but not detailed enough to assess reliability; add these statistics.
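For the inter-rater agreement requested in the last comment, one common statistic for two raters is Cohen's kappa, which corrects raw agreement for agreement expected by chance. The sketch below is a generic illustration with made-up ratings; this page does not state which statistic, if any, the paper reports, and the function name `cohens_kappa` is hypothetical.

```python
from collections import Counter


def cohens_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two raters labeling the same items.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is observed agreement and
    p_e is the agreement expected from each rater's label frequencies.
    """
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(ratings_a)
    p_o = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    counts_a, counts_b = Counter(ratings_a), Counter(ratings_b)
    labels = set(counts_a) | set(counts_b)
    p_e = sum(counts_a[k] * counts_b[k] for k in labels) / (n * n)
    if p_e == 1.0:
        # Both raters used a single identical label; agreement is trivial.
        return 1.0
    return (p_o - p_e) / (1.0 - p_e)


# Made-up binary judgments ("is this problem-KC mapping correct?").
rater_1 = [1, 1, 0, 0]
rater_2 = [1, 0, 0, 0]
print(cohens_kappa(rater_1, rater_2))
```

With more than two instructors, a multi-rater statistic such as Fleiss' kappa would be the usual substitute.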

Simulated Author's Rebuttal

2 responses · 0 unresolved

Thank you for the detailed and constructive review. We address each major comment below, clarifying our current evaluations and framework while acknowledging limitations where appropriate.

Referee: [Evaluation section] the reported outperformance on the two datasets is not accompanied by any transfer evaluation (new student cohorts, later semesters, or held-out problems). Without such tests it remains possible that the accuracy lift and improved cognitive-model fit arise from finer, data-aligned partitioning that matches observed sequences rather than from stable skills, directly undermining the central claim that LLM-generated KCs capture educationally meaningful skills.

Authors: We agree that transfer evaluations on new cohorts, semesters, or held-out problems would provide stronger evidence that the KCs represent stable, educationally meaningful skills rather than dataset-specific partitions. Our current results show improved next-response prediction, better cognitive-model fit to learning curves, and instructor-validated mappings, but these are within the two provided datasets. We will add an explicit discussion of this limitation and the value of future transfer tests in the revised manuscript.
revision: partial

Referee: §4 (KCGen-KT framework): the description of how LLM-generated KCs are injected into the KT model does not clarify whether the KC embeddings or the tracing parameters are re-fit on the same data used to generate the KCs, raising the risk that reported gains partly reflect leakage or post-hoc alignment rather than genuine predictive improvement.

Authors: KC generation is performed exclusively on problem statements via the LLM pipeline and does not use any student response data. The resulting KCs are fixed inputs to KCGen-KT, with model parameters trained separately on the interaction sequences. We will revise §4 to explicitly state this separation and confirm the absence of leakage between generation and fitting steps.
revision: yes

Circularity Check

0 steps flagged

No significant circularity; empirical evaluations on held-out data and external benchmarks

full rationale

The paper presents an LLM-based pipeline for KC generation and tagging, followed by KCGen-KT evaluation on two real-world student code datasets. Claims rest on quantitative outperformance versus existing KT methods and human-written KCs for future response prediction, plus learning-curve fit and instructor human evaluation. No equations, derivations, fitted parameters renamed as predictions, or self-citation chains appear in the abstract or described methods. Results use held-out student data and external comparators, making the work self-contained against benchmarks rather than reducing to its own inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review provides no equations or modeling details, so no free parameters, axioms, or invented entities can be identified.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Explainable Knowledge Tracing via Probabilistic Embeddings and Pattern-based Reasoning
cs.AI · 2026-05 · unverdicted · novelty 6.0

PLKT models student knowledge with Beta probabilistic embeddings and performs explicit logical reasoning over historical interactions to deliver both accurate predictions and interpretable explanations in knowledge tracing.

Reference graph

Works this paper leans on

62 extracted references · 62 canonical work pages · cited by 1 Pith paper · 3 internal anchors


[1] [1]

Tiffany Barnes. 2005. The Q-matrix method: Mining student response data for knowledge. InAmerican association for artificial intelligence 2005 educational data mining workshop. AAAI Press, Pittsburgh, PA, USA, 1–8

work page 2005

[2] [2]

Norman Bier, Sean Lip, Ross Strader, Candace Thille, and Dawn Zimmaro. 2014. An approach to knowledge component/skill modeling in online courses.Open Learning(2014), 1–14

work page 2014

[3] [3]

Challenge Organizers. 2021. The 2nd CSEDM Data Challenge. Online: https: //sites.google.com/ncsu.edu/csedm-dc-2021/

work page 2021

[4] [4]

Albert Corbett and John Anderson. 1994. Knowledge tracing: Modeling the acquisition of procedural knowledge.User Model. User-adapted Interact.4, 4 (Dec. 1994), 253–278

work page 1994

[5] [5]

DataShop. 2021. Dataset: CodeWorkout data Spring 2019. Online: https:// pslcdatashop.web.cmu.edu/Files?datasetId=3458

work page 2021

[6] [6]

Adrian de Freitas, Joel Coffman, Michelle de Freitas, Justin Wilson, and Troy Wein- gart. 2023. FalconCode: A Multiyear Dataset of Python Code Samples from an Introductory Computer Science Course. InProceedings of the 54th ACM Technical Symposium on Computer Science Education V. 1(Toronto ON, Canada)(SIGCSE 2023). Association for Computing Machinery, New ...

work page doi:10.1145/3545945.3569822 2023

[7] [7]

Zhangqi Duan, Nigel Fernandez, Alexander Hicks, and Andrew Lan. 2025. Test Case-Informed Knowledge Tracing for Open-ended Coding Tasks. InProceedings of the 15th Learning Analytics and Knowledge Conference, LAK 2025, Dublin, Ireland, March 3-7, 2025. ACM

work page 2025

[8] [8]

Jing Fan, Tsvetomila Mihaylova, Bita Akram, Narges Norouzi, Peter Brusilovsky, Arto Hellas, and Juho Leinonen. 2025. Adaptive Learning Curve Analytics with LLM-KC Identifiers for Knowledge Component Refinement. InProceedings of the 2025 Conference on UK and Ireland Computing Education Research (UKICER ’25). Association for Computing Machinery, New York, N...

work page doi:10.1145/3754508.3754514 2025

[9] Zhangyin Feng, Daya Guo, Duyu Tang, Nan Duan, Xiaocheng Feng, Ming Gong, Linjun Shou, Bing Qin, Ting Liu, Daxin Jiang, and Ming Zhou. 2020. CodeBERT: A Pre-Trained Model for Programming and Natural Languages. In Findings of the Association for Computational Linguistics: EMNLP 2020, Trevor Cohn, Yulan He, and Yang Liu (Eds.). Association for Computational L... doi:10.18653/v1/2020.findings-emnlp.139

[10] Nigel Fernandez and Andrew Lan. 2024. Interpreting Latent Student Knowledge Representations in Programming Assignments. In Proceedings of the 17th International Conference on Educational Data Mining, Benjamin Paaßen and Carrie Demmans Epp (Eds.). International Educational Data Mining Society, Atlanta, Georgia, USA, 933–940. doi:10.5281/zenodo.12730003

[11] Nigel Fernandez, Alexander Scarlatos, Wanyong Feng, Simon Woodhead, and Andrew Lan. 2024. DiVERT: Distractor Generation with Variational Errors Represented as Text for Math Multiple-choice Questions. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, Yaser Al-Onaizan, Mohit Bansal, and Yun-Nung Chen (Eds.). Assoc... doi:10.18653/v1/2024.emnlp-main.512

[12] Aritra Ghosh, Neil Heffernan, and Andrew S Lan. 2020. Context-Aware Attentive Knowledge Tracing. In Proc. ACM SIGKDD. 2330–2339.

[13] Sepp Hochreiter and Jürgen Schmidhuber. 1997. Long short-term memory. Neural Comput. 9, 8 (Nov. 1997), 1735–1780.

[14] Muntasir Hoq, Sushanth Reddy Chilla, Melika Ahmadi Ranjbar, Peter Brusilovsky, and Bita Akram. 2023. SANN: programming code representation using attention neural network with optimized subtree extraction. In Proceedings of the 32nd ACM International Conference on Information and Knowledge Management. 783–792.

[15] Muntasir Hoq, Jessica Vandenberg, Bradford Mott, James Lester, Narges Norouzi, and Bita Akram. 2024. Towards Attention-Based Automatic Misconception Identification in Introductory Programming Courses. In Proceedings of the 55th ACM Technical Symposium on Computer Science Education V. 2. 1680–1681.

[16] Roya Hosseini and Peter Brusilovsky. 2013. Javaparser: A fine-grain concept indexing tool for java problems. In CEUR Workshop Proceedings, Vol. 1009. University of Pittsburgh, 60–63.

[17] Edward J Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, and Weizhu Chen. 2022. LoRA: Low-Rank Adaptation of Large Language Models. In International Conference on Learning Representations. https://openreview.net/forum?id=nZeVKeeFYf9

[18] Yun Huang, Vincent Aleven, Elizabeth McLaughlin, and Kenneth Koedinger. 2020. A general multi-method approach to design-loop adaptivity in intelligent tutoring systems. In Artificial Intelligence in Education: 21st International Conference, AIED 2020, Ifrane, Morocco, July 6–10, 2020, Proceedings, Part II 21. Springer, 124–129.

[19] Guimei Liu, Huijing Zhan, and Jung-jae Kim. 2024. Question Difficulty Consistent Knowledge Tracing. In Proceedings of the ACM Web Conference 2024 (Singapore, Singapore) (WWW ’24). Association for Computing Machinery, New York, NY, USA, 4239–4248. doi:10.1145/3589334.3645582

[20] Naiming Liu, Zichao Wang, Richard Baraniuk, and Andrew Lan. 2022. Open-ended Knowledge Tracing for Computer Science Education. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, Yoav Goldberg, Zornitsa Kozareva, and Yue Zhang (Eds.). Association for Computational Linguistics, Abu Dhabi, United Arab Emirates, 3849–... doi:10.18653/v1/2022.emnlp-main.254

[21] Xin Liu, Muhammad Khalifa, and Lu Wang. 2023. BOLT: Fast Energy-based Controlled Text Generation with Tunable Biases. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers). 186–200.

[22] AI @ Meta Llama Team. 2024. The Llama 3 Herd of Models. arXiv:2407.21783 [cs.AI] https://arxiv.org/abs/2407.21783

[23] Ilya Loshchilov and Frank Hutter. 2019. Decoupled Weight Decay Regularization. In International Conference on Learning Representations.

[24] Cristina Maier, Ryan Baker, and Steve Stalzer. 2021. Challenges to Applying Performance Factor Analysis to Existing Learning Systems.

[25] Steven Moore, Robin Schmucker, Tom Mitchell, and John Stamper. 2024. Automated generation and tagging of knowledge components from multiple-choice questions. In Proceedings of the eleventh ACM conference on learning@scale. 122–133.

[26] Allen Newell and Paul S Rosenbloom. 2013. Mechanisms of skill acquisition and the law of practice. In Cognitive skills and their acquisition. Psychology Press, 1–55.

[27] OpenAI. 2024. Hello GPT-4o. https://openai.com/index/hello-gpt-4o/

[28] Yilmazcan Ozyurt, Stefan Feuerriegel, and Mrinmaya Sachan. 2024. Automated Knowledge Concept Annotation and Question Representation Learning for Knowledge Tracing. arXiv preprint arXiv:2410.01727 (2024).

[29] Shalini Pandey and George Karypis. 2019. A self attentive model for knowledge tracing. In Proc. Int. Conf. Educ. Data Mining. 384–389.

[30] Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. 2002. Bleu: a method for automatic evaluation of machine translation. In Proceedings of the 40th annual meeting of the Association for Computational Linguistics. 311–318.

[31] Zachary A Pardos and Anant Dadu. 2017. Imputing KCs with representations of problem content and context. In Proceedings of the 25th Conference on User Modeling, Adaptation and Personalization. 148–155.

[32] Zach A Pardos and Neil T Heffernan. 2010. Modeling individualization in a Bayesian networks implementation of knowledge tracing. In Proc. Int. Conf. User Model. Adaptation Personalization. 255–266.

[33] Philip I Pavlik, Hao Cen, and Kenneth R Koedinger. 2009. Performance factors analysis–a new alternative to knowledge tracing. In Artificial intelligence in education. Ios Press, 531–538.

[34] Chris Piech, Jonathan Bassen, Jonathan Huang, Surya Ganguli, Mehran Sahami, Leonidas J Guibas, and Jascha Sohl-Dickstein. 2015. Deep knowledge tracing. Advances in neural information processing systems 28 (2015).

[35] Nils Reimers and Iryna Gurevych. 2019. Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics. https://arxiv.org/abs/1908.10084

[36] Shuo Ren, Daya Guo, Shuai Lu, Long Zhou, Shujie Liu, Duyu Tang, Neel Sundaresan, Ming Zhou, Ambrosio Blanco, and Shuai Ma. 2020. CodeBLEU: a Method for Automatic Evaluation of Code Synthesis. arXiv:2009.10297 [cs.SE] https://arxiv.org/abs/2009.10297

[37] Kelly Rivers, Erik Harpstead, and Kenneth R Koedinger. 2016. Learning curve analysis for programming: Which concepts do students struggle with? In ICER, Vol. 16. ACM, 143–151.

[38] Alexander Scarlatos, Ryan S. Baker, and Andrew Lan. 2025. Exploring Knowledge Tracing in Tutor-Student Dialogues using LLMs. In Proceedings of the 15th Learning Analytics and Knowledge Conference, LAK 2025, Dublin, Ireland, March 3-7, 2025. ACM.

[39] Yang Shi, Min Chi, Tiffany Barnes, and Thomas Price. 2022. Code-DKT: A Code-based Knowledge Tracing Model for Programming Tasks. In Proceedings of the 15th International Conference on Educational Data Mining, Antonija Mitrovic and Nigel Bosch (Eds.). International Educational Data Mining Society, Durham, United Kingdom, 50–61. doi:10.5281/zenodo.6853105

[40] Yang Shi, Robin Schmucker, Min Chi, Tiffany Barnes, and Thomas Price. 2023. KC-Finder: Automated Knowledge Component Discovery for Programming Problems. International Educational Data Mining Society (2023).

[41] Yang Shi, Robin Schmucker, Keith Tran, John Bacher, Kenneth Koedinger, Thomas Price, Min Chi, and Tiffany Barnes. 2024. The Knowledge Component Attribution Problem for Programming: Methods and Tradeoffs with Limited Labeled Data. Journal of Educational Data Mining 16, 1 (2024), 1–33.

[42] Dongmin Shin, Yugeun Shim, Hangyeol Yu, Seewoo Lee, Byungsoo Kim, and Youngduck Choi. 2021. Saint+: Integrating temporal features for ednet correctness prediction. In 11th Int. Learn. Analytics Knowl. Conf. 490–496.

[43] George S Snoddy. 1926. Learning and stability: a psychophysiological analysis of a case of motor learning with clinical applications. Journal of Applied Psychology 10, 1 (1926), 1.

[44] Jianwen Sun, Fenghua Yu, Qian Wan, Qing Li, Sannyuya Liu, and Xiaoxuan Shen. 2024. Interpretable Knowledge Tracing with Multiscale State Representation. In Proceedings of the ACM Web Conference 2024 (Singapore, Singapore) (WWW ’24). Association for Computing Machinery, New York, NY, USA, 3265–3276. doi:10.1145/3589334.3645373

[46] Xinjie Sun, Qi Liu, Kai Zhang, Shen Shuanghong, Lina Yang, and Hui Li. 2025. Harnessing code domain insights: Enhancing programming Knowledge Tracing with Large Language Models. Knowledge-Based Systems 317 (04 2025), 113396. doi:10.1016/j.knosys.2025.113396

[47] Thomas Wolf, Lysandre Debut, Victor Sanh, Julien Chaumond, Clement Delangue, Anthony Moi, Pierric Cistac, Tim Rault, Remi Louf, Morgan Funtowicz, Joe Davison, Sam Shleifer, Patrick von Platen, Clara Ma, Yacine Jernite, Julien Plu, Canwen Xu, Teven Le Scao, Sylvain Gugger, Mariama Drame, Quentin Lhoest, and Alexander Rush. 2020. Transformers: State-of-th... doi:10.18653/v1/2020.emnlp-demos.6

[48] Yang Yang, Jian Shen, Yanru Qu, Yunfei Liu, Kerong Wang, Yaoming Zhu, Weinan Zhang, and Yong Yu. 2020. GIKT: A Graph-based Interaction Model for Knowledge Tracing. In Proc. Joint Eur. Conf. Mach. Learn. Knowl. Discovery Databases.

[49] Michael V Yudelson, Kenneth R Koedinger, and Geoffrey J Gordon. 2013. Individualized bayesian knowledge tracing models. In Int. Conf. artif. intell. educ. Springer, 171–180.

[50] Jiani Zhang, Xingjian Shi, Irwin King, and Dit-Yan Yeung. 2017. Dynamic key-value memory networks for knowledge tracing. In Proc. Int. Conf. World Wide Web. 765–774.

[51] Jian Zhang, Xu Wang, Hongyu Zhang, Hailong Sun, Kaixuan Wang, and Xudong Liu. 2019. A Novel Neural Source Code Representation Based on Abstract Syntax Tree. (2019), 783–794.

[52] R. Zhu, D. Zhang, C. Han, M. Gao, X. Lu, W. Qian, and A. Zhou. 2022. Programming Knowledge Tracing: A Comprehensive Dataset and A New Model. In 2022 IEEE International Conference on Data Mining Workshops (ICDMW). 298–307.

A Human Evaluation Details

We conduct a human evaluation to assess (1) the interpretability of generated KCs as a measure of their in...

If both sets contain unique KCs that the other is missing, which means neither list clearly dominates, select equal-or-greater coverage.

B Prompt

B.1 Prompt for KC Generation Pipeline

We show the prompt used for the KC generation in Table 8 and the prompt used for cluster summarization in Table 9.

B.2 Prompt for KC Correctness Labeling

We show the prompt ...

Analyze each solution carefully, noting critical constructs.

Reflect step by step on how each solution maps to distinct programming KCs that are independent and reusable.

For each KC, generate a concise name and provide a one-sentence reasoning explaining why this KC is necessary based on the provided solutions. Use the provided examples as reference for the appropriate level of detail. Make sure KCs are generalizable and applicable to a wide range of similar programming problems without referencing problem-specific details.

Ensure each KC is atomic and not bundled with others. Your final response must strictly follow this JSON template: {"KC 1": {"reasoning": "Reasoning for this KC (exactly 1 sentence)", "name": "Knowledge component name"}, "KC 2": {"reasoning": "Reasoning for this KC (exactly 1 sentence)", "name": "Another specific knowledge component name"}, ...}

User prompt:...
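A response that follows the JSON template of the KC generation prompt can be checked mechanically before it is used downstream. The following is a minimal sketch, not the authors' implementation. It assumes each "KC n" key maps to a nested object with "reasoning" and "name" fields. The names `KnowledgeComponent` and `parse_kc_response`, as well as the example KC names, are hypothetical and introduced only for illustration.

```python
import json
from typing import NamedTuple


class KnowledgeComponent(NamedTuple):
    """One generated KC: a short name plus its one-sentence reasoning."""
    name: str
    reasoning: str


def parse_kc_response(raw: str) -> list[KnowledgeComponent]:
    """Parse a response shaped like {"KC 1": {"reasoning": ..., "name": ...}, ...}.

    The nested layout is an assumption about the template, not a confirmed schema.
    """
    payload = json.loads(raw)
    if not isinstance(payload, dict):
        raise ValueError("expected a JSON object keyed by 'KC <n>'")
    kcs = []
    for key, entry in payload.items():
        # Reject entries that do not carry both required fields.
        if not isinstance(entry, dict) or not {"reasoning", "name"} <= set(entry):
            raise ValueError(f"{key!r} is missing 'reasoning' or 'name'")
        kcs.append(KnowledgeComponent(entry["name"].strip(), entry["reasoning"].strip()))
    return kcs


# Made-up response used only to exercise the parser.
example = """{
  "KC 1": {"reasoning": "The solution iterates over every element.", "name": "For loop iteration"},
  "KC 2": {"reasoning": "The solution branches on a comparison.", "name": "If statement"}
}"""
kcs = parse_kc_response(example)
print([kc.name for kc in kcs])
```

Raising on a malformed entry, rather than silently skipping it, makes it easier to notice when a model response drifts from the requested template.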

Carefully examine all the KCs in the list to ensure none are overlooked.

Reason explicitly whether the KCs collectively refer to the same underlying concept or skill, or if they are related but represent distinct or complementary aspects of a broader theme.

Based on your reasoning:
- If the KCs refer to the same concept or skill, select one KC from the list that best represents the group; choose the one that is most clearly worded, generalizable, and inclusive of the others.
- If the KCs are related but too distinct to be represented by a single KC, create a concise and meaningful summary name that captures...

Identify all key errors in the student’s code, and describe each error in exactly one sentence.

Assess the student’s mastery of each provided KC in the list based on the incorrect submission.
- Reflect on the student’s original incorrect code.
- For each KC, return a binary label which equals 1 if the student makes an error on this KC, and equals 0 if not.

Your final response must strictly follow this JSON template: {"error reasoning": [ "First erro...