pith. machine review for the scientific record.

arXiv: 2604.24758 · v2 · submitted 2026-04-27 · 💻 cs.HC · cs.AI · cs.CY · cs.ET · cs.LG

Recognition: unknown

Personalized Worked Example Generation from Student Code Submissions Using Pattern-based Knowledge Components

Authors on Pith: no claims yet

Pith reviewed 2026-05-08 02:00 UTC · model grok-4.3

classification 💻 cs.HC · cs.AI · cs.CY · cs.ET · cs.LG
keywords knowledge components · worked example generation · student code submissions · AST analysis · generative models · personalized learning · programming education · adaptive content

The pith

Extracting pattern-based knowledge components from student code allows generative models to create worked examples focused on students' specific logical errors.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper develops a pipeline that extracts recurring structural knowledge component patterns from student programming submissions via abstract syntax tree analysis and feeds those patterns into a generative model to produce worked examples. The method aims to replace fixed, pre-authored content libraries with examples that directly address the concepts and errors evident in a given student's code. A sympathetic reader would care because current systems often deliver mismatched materials that ignore partial solutions and logical mistakes, forcing instructors to choose between high authoring costs and low personalization. Expert evaluation of the outputs indicates that the conditioned examples achieve stronger topical focus and better alignment with underlying errors than baseline generations. The result is presented as evidence that knowledge-component steering can enable scalable personalization in programming education.
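To make the extraction step concrete: one simple notion of a recurring structural pattern is a parent/child pair of AST node types counted across submissions. The sketch below is illustrative only; the paper's pipeline builds on representation learning over AST patterns and is not reproduced here.

```python
import ast
from collections import Counter

def structural_patterns(source: str) -> Counter:
    """Count parent/child node-type pairs in one submission's AST.

    Illustrative stand-in for pattern-based KC extraction; the paper's
    actual method is richer than raw node-type pairs.
    """
    counts = Counter()
    for node in ast.walk(ast.parse(source)):
        for child in ast.iter_child_nodes(node):
            counts[(type(node).__name__, type(child).__name__)] += 1
    return counts

# Two hypothetical student submissions for the same accumulation task.
submissions = [
    "for i in range(n):\n    total = total + i",
    "for x in items:\n    s = s + x",
]

# Patterns recurring across submissions become candidate KCs.
shared = set(structural_patterns(submissions[0])) & set(structural_patterns(submissions[1]))
```

Patterns such as `('For', 'Assign')` (assignment inside a loop body) would then serve as candidate conditioning signals for the generator.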

Core claim

The central claim is that pattern-based knowledge components extracted from student code submissions through AST-based analysis can condition a generative model to produce worked examples. These conditioned examples exhibit improved topical focus and greater relevance to students' underlying logical errors compared with unconditioned baseline outputs, according to expert evaluation. The approach is positioned as a way to reduce reliance on fixed content libraries while supporting personalized learning at scale.

What carries the argument

Pattern-based knowledge components extracted via AST analysis of student code submissions, which identify recurring structural patterns used to condition a generative model for worked example production.
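How such patterns might condition generation is easiest to see as prompt assembly. Everything below is hypothetical (the authors' actual prompt templates are hosted on OSF, per the references); it only shows the shape of KC-conditioning:

```python
def build_prompt(problem: str, kc_patterns: list[str]) -> str:
    """Assemble a KC-conditioned generation prompt (hypothetical template)."""
    kc_lines = "\n".join(f"- {p}" for p in kc_patterns)
    return (
        f"Problem statement:\n{problem}\n\n"
        "The student's submissions show these recurring structural patterns "
        f"(candidate knowledge components):\n{kc_lines}\n\n"
        "Write a worked example that targets these patterns and the logical "
        "errors they suggest."
    )

prompt = build_prompt(
    "Sum the values in a list.",
    ["accumulation into an uninitialized variable inside a for-loop"],
)
```

A baseline generation would receive only the problem statement, which is the comparison the paper's expert evaluation makes.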

Load-bearing premise

The recurring structural patterns identified by AST analysis of student code accurately capture the logical errors and concepts students are working to understand, without the conditioning step introducing new misconceptions.
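This premise could be probed directly by comparing extracted pattern labels against expert annotations of the same submissions. No such validation appears in the paper (see the referee report below), so the following is purely a hypothetical sketch of what such a check might measure:

```python
def overlap_scores(extracted: list[set], annotated: list[set]) -> list[float]:
    """Per-submission Jaccard overlap between extracted KC labels and
    expert-annotated misconception labels (all labels hypothetical)."""
    scores = []
    for ext, ann in zip(extracted, annotated):
        union = ext | ann
        scores.append(len(ext & ann) / len(union) if union else 1.0)
    return scores

# Hypothetical labels for two submissions.
extracted = [{"loop-accumulator", "off-by-one"}, {"unreachable-return"}]
annotated = [{"off-by-one"}, {"unreachable-return", "shadowed-variable"}]
scores = overlap_scores(extracted, annotated)
```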

What would settle it

A controlled comparison in which expert raters judge KC-conditioned worked examples as no more relevant or topically focused than baseline outputs, or in which students show no measurable gain in addressing the targeted errors after viewing the generated examples.

Figures

Figures reproduced from arXiv: 2604.24758 by Arto Hellas, Bita Akram, Griffin Pitts, Juho Leinonen, Muntasir Hoq, Narges Norouzi, Peter Brusilovsky.

Figure 1. Example of a high-level pattern-based KC extracted from …
original abstract

Adaptive programming practice often relies on fixed libraries of worked examples and practice problems, which require substantial authoring effort and may not correspond well to the logical errors and partial solutions students produce while writing code. As a result, students may receive learning content that does not directly address the concepts they are working to understand, while instructors must either invest additional effort in expanding content libraries or accept a coarse level of personalization. We present an approach for knowledge-component (KC) guided educational content generation using pattern-based KCs extracted from student code. Given a problem statement and student submissions, our pipeline extracts recurring structural KC patterns from students' code through AST-based analysis and uses them to condition a generative model. In this study, we apply this approach to worked example generation, and compare baseline and KC-conditioned outputs through expert evaluation. Results suggest that KC-conditioned generation improves topical focus and relevance to students' underlying logical errors, providing evidence that KC-based steering of generative models can support personalized learning at scale.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper claims to introduce a method for personalized worked example generation in programming education. It extracts pattern-based knowledge components (KCs) from student code submissions using AST analysis to identify recurring structural patterns. These KCs are used to condition a generative model to create worked examples that address students' specific logical errors. The approach is tested by having experts compare KC-conditioned outputs to baseline generations, with results indicating better topical focus and relevance for the KC-conditioned examples.

Significance. If the findings are substantiated, the work offers a promising direction for scaling personalized learning in computer science education by automating the creation of tailored content from student data. It builds on knowledge component modeling and generative AI, potentially reducing the need for static example libraries. The integration of AST-based structural analysis is a notable technical choice that could generalize to other code-related tasks. However, the current evidence base is preliminary due to limited details on the evaluation.

major comments (2)
  1. [Abstract] The abstract asserts that 'expert evaluation found KC-conditioned outputs more relevant', but omits critical details including the number of experts, the specific evaluation protocol, inter-rater agreement metrics, and any statistical tests performed. This absence weakens the support for the central claim of improved relevance to logical errors.
  2. [KC Extraction Pipeline] The extraction of pattern-based KCs relies on AST analysis of student submissions to identify recurring structures, yet the manuscript provides no separate validation step, such as expert annotation of logical misconceptions in the code followed by a comparison to the extracted patterns. This leaves the key assumption—that these patterns accurately capture the concepts students need—untested and potentially vulnerable to the alternative explanation that improvements stem from generic topic cues.
minor comments (1)
  1. [Terminology] The acronym 'KC' for knowledge components is used extensively without an initial definition or reference to standard KC literature in the introduction, which could hinder accessibility for readers outside educational technology.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. We address each major comment point by point below, indicating the revisions we plan to make.

point-by-point responses
  1. Referee: [Abstract] The abstract asserts that 'expert evaluation found KC-conditioned outputs more relevant', but omits critical details including the number of experts, the specific evaluation protocol, inter-rater agreement metrics, and any statistical tests performed. This absence weakens the support for the central claim of improved relevance to logical errors.

    Authors: We agree that the abstract should provide these details to better substantiate the central claim. In the revised version, we will expand the abstract to state that the evaluation involved three computer science education experts performing pairwise comparisons of KC-conditioned versus baseline examples on topical focus and relevance to logical errors. Inter-rater agreement will be reported via Fleiss' kappa, and we will note that a Wilcoxon signed-rank test showed statistically significant preference for the KC-conditioned outputs. These specifics are already detailed in the evaluation section and will now be summarized in the abstract. revision: yes
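For readers outside educational measurement, the two statistics named in this response can be computed as below. All rating data here is fabricated for illustration; none of it comes from the paper.

```python
import numpy as np
from scipy.stats import wilcoxon

def fleiss_kappa(counts) -> float:
    """Fleiss' kappa for an (items x categories) matrix of rating counts,
    assuming the same number of raters per item."""
    counts = np.asarray(counts, dtype=float)
    n = counts.sum(axis=1)[0]                      # raters per item
    p_j = counts.sum(axis=0) / counts.sum()        # category proportions
    p_i = (np.square(counts).sum(axis=1) - n) / (n * (n - 1))
    p_bar, p_e = p_i.mean(), float(np.square(p_j).sum())
    return float((p_bar - p_e) / (1 - p_e))

# Hypothetical: 3 raters judge 8 example pairs; categories are
# (prefers KC-conditioned, prefers baseline).
ratings = [[3, 0], [3, 0], [2, 1], [3, 0], [0, 3], [3, 0], [2, 1], [3, 0]]
kappa = fleiss_kappa(ratings)

# Hypothetical paired relevance scores for the Wilcoxon signed-rank test.
kc_scores = [4, 5, 4, 5, 3, 4, 5, 4]
baseline_scores = [3, 3, 3, 2, 2, 3, 4, 3]
stat, p_value = wilcoxon(kc_scores, baseline_scores)
```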

  2. Referee: [KC Extraction Pipeline] The extraction of pattern-based KCs relies on AST analysis of student submissions to identify recurring structures, yet the manuscript provides no separate validation step, such as expert annotation of logical misconceptions in the code followed by a comparison to the extracted patterns. This leaves the key assumption—that these patterns accurately capture the concepts students need—untested and potentially vulnerable to the alternative explanation that improvements stem from generic topic cues.

    Authors: This comment correctly notes the absence of a direct validation step comparing extracted patterns to expert-labeled misconceptions. Our current evidence is indirect, relying on expert judgments that KC-conditioned examples better address logical errors than baselines that share the same problem statement (thus controlling for generic topic information). In revision, we will add a dedicated subsection with concrete examples illustrating how specific AST patterns map to recurring logical errors, plus a limitations paragraph acknowledging the lack of direct KC validation and outlining it as future work. We will not claim the patterns are exhaustively validated but will clarify their structural rationale. revision: partial

Circularity Check

0 steps flagged

No circularity; empirical evaluation relies on external expert judgment

full rationale

The paper describes an extraction pipeline for pattern-based KCs via AST analysis of student submissions, uses those to condition a generative model for worked examples, and reports results from expert preference comparisons between baseline and KC-conditioned outputs. No equations, fitted parameters renamed as predictions, self-citations as load-bearing premises, or uniqueness theorems appear in the provided text. The central claim rests on an independent expert evaluation step rather than any reduction of outputs to the inputs by construction, satisfying the criteria for a self-contained empirical study.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 1 invented entity

The central claim depends on two domain assumptions: that AST structural patterns constitute valid knowledge components for programming concepts, and that conditioning a generative model on these patterns produces educationally superior output. No free parameters or new physical entities are introduced.

axioms (2)
  • domain assumption AST-based structural patterns extracted from student code correspond to the logical errors and concepts students need to learn
    Invoked when the pipeline uses these patterns to condition the generative model.
  • domain assumption Expert judgment of topical focus and relevance is a sufficient proxy for educational effectiveness
    Used to interpret the comparison between baseline and KC-conditioned outputs.
invented entities (1)
  • pattern-based knowledge components no independent evidence
    purpose: To label recurring structural patterns in student code for use as conditioning signals
    Introduced as the novel representation extracted via AST analysis; no independent falsifiable evidence supplied in the abstract.

pith-pipeline@v0.9.0 · 5501 in / 1415 out tokens · 46026 ms · 2026-05-08T02:00:48.316903+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

17 extracted references · 2 canonical work pages

  1. [1] Paul Chandler and John Sweller. 1991. Cognitive load theory and the format of instruction. Cognition and Instruction 8, 4 (1991), 293–332.
  2. [2] Andre del Carpio Gutierrez, Paul Denny, and Andrew Luxton-Reilly. 2024. Automating personalized Parsons problems with customized contexts and concepts. In Proceedings of the 2024 on Innovation and Technology in Computer Science Education V. 1. 688–694.
  3. [3] Stephen H Edwards and Krishnan Panamalai Murali. 2017. CodeWorkout: short programming exercises with built-in data collection. In Proceedings of the 2017 ACM Conference on Innovation and Technology in Computer Science Education. 188–193.
  4. [4] Muntasir Hoq, Atharva Patil, Kamil Akhuseyinoglu, Peter Brusilovsky, and Bita Akram. 2025. An automated approach to recommending relevant worked examples for programming problems. In Proceedings of the 56th ACM Technical Symposium on Computer Science Education V. 1. 527–533.
  5. [5] Muntasir Hoq, Griffin Pitts, Tirth Bhatt, Aum Pandya, Andrew Lan, Peter Brusilovsky, and Bita Akram. 2025. Pattern-based Knowledge Component Extraction from Student Code Using Representation Learning. arXiv preprint arXiv:2508.09281 (2025).
  6. [6] Muntasir Hoq, Ananya Rao, Reisha Jaishankar, Krish Piryani, Nithya Janapati, Jessica Vandenberg, et al. 2025. Automated Identification of Logical Errors in Programs: Advancing Scalable Analysis of Student Misconceptions. In Proceedings of the 18th International Conference on Educational Data Mining. 90–103.
  7. [7] Xinying Hou, Zihan Wu, Xu Wang, and Barbara J Ericson. 2024. CodeTailor: LLM-powered personalized Parsons puzzles for engaging support while learning programming. In Proceedings of the Eleventh ACM Conference on Learning@Scale. 51–62.
  8. [8] Breanna Jury, Angela Lorusso, Juho Leinonen, Paul Denny, and Andrew Luxton-Reilly. 2024. Evaluating LLM-generated worked examples in an introductory programming course. In Proceedings of the 26th Australasian Computing Education Conference. 77–86.
  9. [9] Slava Kalyuga, Paul Ayres, Paul Chandler, and John Sweller. 2003. The Expertise Reversal Effect. Educational Psychologist 38, 1 (2003), 23–31. doi:10.1207/S15326985EP3801_4.
  10. [10] Kenneth R Koedinger, Albert T Corbett, and Charles Perfetti. 2012. The Knowledge-Learning-Instruction framework: Bridging the science-practice chasm to enhance robust student learning. Cognitive Science 36, 5 (2012), 757–798.
  11. [11] J Richard Landis and Gary G Koch. 1977. The measurement of observer agreement for categorical data. Biometrics (1977), 159–174.
  12. [12] Vincent Lindvall and Sofia Marcus. 2024. Using Large Language Models to Generate Worked Examples of CS2-Level Programming Questions.
  13. [13] Kasia Muldner, Jay Jennings, and Veronica Chiarelli. 2022. A review of worked examples in programming activities. ACM Transactions on Computing Education 23, 1 (2022), 1–35.
  14. [14] Griffin Pitts, Anurata Prabha Hridi, and Arun Balajiee Lekshmi Narayanan. 2025. A Survey of LLM-Based Applications in Programming Education: Balancing Automation and Human Oversight. In Proceedings of the Fourth Workshop on Bridging Human-Computer Interaction and Natural Language Processing (HCI+NLP). 255–262.
  15. [15] Ben Skudder and Andrew Luxton-Reilly. 2014. Worked examples in computer science. In Proceedings of the Sixteenth Australasian Computing Education Conference, Volume 148. 59–64.
  16. [16] Yfke Smit. 2025. Personalising LLM-Generated Worked Examples based on Skill Level in Introductory Programming. Master's thesis.
  17. [17] John Sweller. 1988. Cognitive load during problem solving: Effects on learning. Cognitive Science 12, 2 (1988), 257–285.

A Prompt Templates: https://osf.io/4h9dn/overview?view_only=97189a73e56b4254bd2298669b6eabc4