
arxiv: 2502.18632 · v4 · pith:M2P5HBDQnew · submitted 2025-02-25 · 💻 cs.AI · cs.CL · cs.CY · cs.LG · cs.SE

Automated Knowledge Component Generation for Interpretable Knowledge Tracing in Coding Problems

Pith reviewed 2026-05-23 01:46 UTC · model grok-4.3

classification 💻 cs.AI · cs.CL · cs.CY · cs.LG · cs.SE
keywords knowledge tracing, knowledge components, LLM, coding problems, automated generation, student modeling, programming education, interpretable models

The pith

LLM-generated knowledge components for coding problems enable more accurate prediction of future student responses than human-written ones in knowledge tracing models.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces an automated pipeline that uses large language models to generate and tag knowledge components for open-ended programming problems, eliminating the need for manual expert labeling. It then applies these components in a knowledge tracing framework called KCGen-KT to track student mastery of specific skills. Experiments on two real student code datasets show KCGen-KT predicts future responses better than prior KT methods and those using human KCs. Learning curve analysis indicates the generated components fit cognitive models more closely than human alternatives. Instructors reviewing the mappings judged them reasonably accurate.

Core claim

KCGen-KT, which relies on LLM-generated knowledge components for programming problems, achieves superior performance in predicting future student responses compared to existing knowledge tracing methods and models built on human-written KCs, while also yielding better model fit under cognitive learning curve analysis.

What carries the argument

Automated LLM-based pipeline for generating and tagging knowledge components, integrated into the KCGen-KT knowledge tracing framework.

If this is right

  • KCGen-KT outperforms existing KT methods, including models built on human-written KCs, on future student response prediction.
  • LLM-generated KCs produce a better fit than human KCs when evaluated under a cognitive model using learning curves.
  • The pipeline produces problem-KC mappings that course instructors rate as reasonably accurate.
  • The approach scales KC creation without requiring domain experts for each new problem set.
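One way to make the learning-curve claim concrete is to fit a curve per KC and compare goodness of fit across KC sets. The sketch below assumes the power law of practice, error = a · t^(-b), as a stand-in cognitive model; this summary does not say which model the paper fits, and `fit_power_law` and its input format are hypothetical names chosen for illustration.

```python
import math


def fit_power_law(error_rates):
    """Fit error = a * t**(-b) by least squares in log-log space.

    error_rates[i] is the observed error rate on one KC at practice
    opportunity i + 1 (hypothetical input format). Opportunities with a
    zero error rate are skipped because log(0) is undefined.
    Returns (a, b, r_squared), where r_squared is computed in log space.
    """
    points = [
        (math.log(t), math.log(e))
        for t, e in enumerate(error_rates, start=1)
        if e > 0
    ]
    n = len(points)
    if n < 2:
        raise ValueError("need at least two opportunities with nonzero error")

    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    ss_tot = sum((y - mean_y) ** 2 for _, y in points)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in points)
    r_squared = 1.0 - ss_res / ss_tot if ss_tot > 0 else 1.0

    # A positive b means the error rate falls with practice.
    return math.exp(intercept), -slope, r_squared
```

Under this stand-in model, a KC set whose per-KC curves show a positive decay rate and a higher fit score would be read as fitting better than the alternative set.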

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • New courses or languages could adopt KT without upfront expert effort to define skills.
  • Generated KCs might surface skill relationships that human experts overlook in coding education.
  • The method could support dynamic updates to KCs as more student data arrives over time.

Load-bearing premise

The generated components reflect stable, educationally meaningful skills rather than patterns that only fit the specific datasets used.

What would settle it

KCGen-KT fails to outperform human-KC baselines on a held-out dataset from a different course or programming language.
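The settling test above can be sketched as a direct comparison of predictive quality on held-out data. The example assumes AUC as the metric, which is a common choice for next-response prediction but is not confirmed by this summary; `auc`, `fails_to_outperform`, and the `margin` parameter are hypothetical names for illustration.

```python
def auc(labels, scores):
    """Probability that a random positive outscores a random negative.

    labels are 0/1 correctness outcomes, scores are predicted
    probabilities of a correct response. Ties count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def fails_to_outperform(labels, generated_kc_scores, human_kc_scores, margin=0.0):
    """True if the generated-KC model does not beat the human-KC baseline
    by more than `margin` on the held-out responses."""
    gain = auc(labels, generated_kc_scores) - auc(labels, human_kc_scores)
    return gain <= margin
```

The labels and scores would need to come from a course or programming language that neither model saw during training for the comparison to count as a transfer test.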

Figures

Figures reproduced from arXiv: 2502.18632 by Andrew Lan, Arun Balajiee Lekshmi Narayanan, Bita Akram, Mohammad Hassany, Nigel Fernandez, Peter Brusilovsky, Rafaella Sampaio de Alencar, Zhangqi Duan.

Figure 1: Illustration of our three-step automated KC generation and tagging pipeline.
Figure 2: Overview of our KCGen-KT’s model with the Llama 3 LLM as the backbone. KCGen-KT leverages KC semantics, ...
Figure 3: Representative learning curves for three generated KCs (Equality Comparison, String Length Determination, and For ...
Figure 4: A section of the generated KC ontology (related to ...
Original abstract

Knowledge components (KCs) mapped to problems help model student learning, tracking their mastery levels on fine-grained skills, thereby facilitating personalized learning and feedback in online learning platforms. However, crafting and tagging KCs to problems, traditionally performed by human domain experts, is highly labor-intensive. We present an automated, LLM-based pipeline for KC generation and tagging for open-ended programming problems. We also develop an LLM-based knowledge tracing (KT) framework to leverage these LLM-generated KCs, which we refer to as KCGen-KT. We conduct extensive quantitative and qualitative evaluations on two real-world student code submission datasets in different programming languages. We find that KCGen-KT outperforms existing KT methods and human-written KCs on future student response prediction. We investigate the learning curves of generated KCs and show that LLM-generated KCs result in a better fit than human-written KCs under a cognitive model. We also conduct a human evaluation with course instructors to show that our pipeline generates reasonably accurate problem-KC mappings.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 3 minor

Summary. The paper presents an LLM-based pipeline (KCGen) for automatically generating and tagging knowledge components (KCs) for open-ended programming problems, along with an LLM-augmented knowledge tracing model (KCGen-KT) that uses these KCs. On two real-world student code submission datasets, it reports that KCGen-KT outperforms standard KT baselines and human-written KCs on next-response prediction, that the generated KCs yield better fits to learning curves under a cognitive model, and that instructor raters judge the problem-KC mappings as reasonably accurate.

Significance. If the performance gains reflect stable, educationally meaningful skills rather than dataset-specific partitioning, the work would meaningfully lower the barrier to fine-grained KT in programming education by automating what is currently expert labor. The dual quantitative (prediction and learning-curve) plus qualitative (instructor) evaluation is a strength, but the absence of transfer tests limits the strength of the claim that the KCs are “skills” in the intended sense.

major comments (2)
  1. [Evaluation section] The reported outperformance on the two datasets is not accompanied by any transfer evaluation (new student cohorts, later semesters, or held-out problems). Without such tests it remains possible that the accuracy lift and improved cognitive-model fit arise from finer, data-aligned partitioning that matches observed sequences rather than from stable skills, directly undermining the central claim that LLM-generated KCs capture educationally meaningful skills.
  2. [§4, KCGen-KT framework] The description of how LLM-generated KCs are injected into the KT model does not clarify whether the KC embeddings or the tracing parameters are re-fit on the same data used to generate the KCs, raising the risk that reported gains partly reflect leakage or post-hoc alignment rather than genuine predictive improvement.
minor comments (3)
  1. [Abstract / Introduction] The abstract and introduction use “LLM-based KT framework” without immediately distinguishing which components are LLM-generated versus which are conventional KT parameters; a short clarifying sentence would help.
  2. [Results tables] Table or figure captions for the quantitative results should explicitly state the number of students, problems, and submissions per dataset and whether splits are temporal or random.
  3. [Human evaluation subsection] The human-evaluation protocol (number of instructors, rating scale, inter-rater agreement) is mentioned but not detailed enough to assess reliability; add these statistics.
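The transfer tests in major comment 1 and the split reporting in minor comment 2 can be sketched as follows. This is a minimal illustration, not the paper's protocol: the record fields (`student`, `problem`, `timestamp`) and the function names are assumptions, and each interaction is taken to be a plain dictionary.

```python
import random


def split_by_student(interactions, test_fraction=0.2, seed=0):
    """Hold out whole students so no student appears in both splits."""
    students = sorted({r["student"] for r in interactions})
    rng = random.Random(seed)
    rng.shuffle(students)
    n_test = max(1, int(len(students) * test_fraction))
    test_students = set(students[:n_test])
    train = [r for r in interactions if r["student"] not in test_students]
    test = [r for r in interactions if r["student"] in test_students]
    return train, test


def split_by_time(interactions, cutoff):
    """Temporal split: train on earlier submissions, test on later ones."""
    train = [r for r in interactions if r["timestamp"] < cutoff]
    test = [r for r in interactions if r["timestamp"] >= cutoff]
    return train, test


def summarize(interactions):
    """Counts a results caption could report for each dataset or split."""
    return {
        "students": len({r["student"] for r in interactions}),
        "problems": len({r["problem"] for r in interactions}),
        "submissions": len(interactions),
    }
```

Holding out whole students approximates a new cohort, while the temporal split approximates a later semester; reporting `summarize` for each split would cover the counts the referee asks for.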

Simulated Author's Rebuttal

2 responses · 0 unresolved

Thank you for the detailed and constructive review. We address each major comment below, clarifying our current evaluations and framework while acknowledging limitations where appropriate.

Point-by-point responses
  1. Referee: [Evaluation section] The reported outperformance on the two datasets is not accompanied by any transfer evaluation (new student cohorts, later semesters, or held-out problems). Without such tests it remains possible that the accuracy lift and improved cognitive-model fit arise from finer, data-aligned partitioning that matches observed sequences rather than from stable skills, directly undermining the central claim that LLM-generated KCs capture educationally meaningful skills.

    Authors: We agree that transfer evaluations on new cohorts, semesters, or held-out problems would provide stronger evidence that the KCs represent stable, educationally meaningful skills rather than dataset-specific partitions. Our current results show improved next-response prediction, better cognitive-model fit to learning curves, and instructor-validated mappings, but these are within the two provided datasets. We will add an explicit discussion of this limitation and the value of future transfer tests in the revised manuscript.

    Revision: partial

  2. Referee: §4 (KCGen-KT framework): the description of how LLM-generated KCs are injected into the KT model does not clarify whether the KC embeddings or the tracing parameters are re-fit on the same data used to generate the KCs, raising the risk that reported gains partly reflect leakage or post-hoc alignment rather than genuine predictive improvement.

    Authors: KC generation is performed exclusively on problem statements via the LLM pipeline and does not use any student response data. The resulting KCs are fixed inputs to KCGen-KT, with model parameters trained separately on the interaction sequences. We will revise §4 to explicitly state this separation and confirm the absence of leakage between generation and fitting steps.

    Revision: yes

Circularity Check

0 steps flagged

No significant circularity; empirical evaluations on held-out data and external benchmarks

The paper presents an LLM-based pipeline for KC generation and tagging, followed by KCGen-KT evaluation on two real-world student code datasets. Claims rest on quantitative outperformance versus existing KT methods and human-written KCs for future response prediction, plus learning-curve fit and instructor human evaluation. No equations, derivations, fitted parameters renamed as predictions, or self-citation chains appear in the abstract or described methods. Results use held-out student data and external comparators, making the work self-contained against benchmarks rather than reducing to its own inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review provides no equations or modeling details, so no free parameters, axioms, or invented entities can be identified.

pith-pipeline@v0.9.0 · 5744 in / 1096 out tokens · 47580 ms · 2026-05-23T01:46:23.226488+00:00 · methodology

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Explainable Knowledge Tracing via Probabilistic Embeddings and Pattern-based Reasoning

    cs.AI · 2026-05 · unverdicted · novelty 6.0

    PLKT models student knowledge with Beta probabilistic embeddings and performs explicit logical reasoning over historical interactions to deliver both accurate predictions and interpretable explanations in knowledge tracing.
