pith. machine review for the scientific record. sign in

arxiv: 2605.09369 · v1 · submitted 2026-05-10 · 💻 cs.AI

Recognition: unknown

Explainable Knowledge Tracing via Probabilistic Embeddings and Pattern-based Reasoning

Authors on Pith no claims yet

Pith reviewed 2026-05-12 03:02 UTC · model grok-4.3

classification 💻 cs.AI
keywords knowledge tracingexplainable AIprobabilistic embeddingsBeta distributionlogical reasoningstudent performance predictioneducational data mining
0
0 comments X

The pith

PLKT represents student knowledge states as Beta-distributed probabilistic embeddings and applies explicit logical operations on historical interactions to produce accurate predictions with transparent reasoning paths.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Knowledge tracing models use past student interactions to forecast future performance on learning tasks. Deep models have improved accuracy but typically rely on opaque vector embeddings that obscure how specific behaviors drive each prediction. PLKT instead encodes knowledge states with Beta distributions to capture uncertainty and treats prediction as goal-conditioned evidence reasoning that applies logical operations such as conjunction directly to past behaviors. This construction yields explicit reasoning paths that link individual historical actions to the output while experiments report gains over prior KT methods in both accuracy and interpretability. Educators could therefore examine the paths to understand the basis for a predicted success or failure.

Core claim

PLKT formulates prediction as a goal-conditioned evidence reasoning process over historical learning behaviors, replacing deterministic vector embeddings with robust Beta-distributed probabilistic embeddings that support explicit logical operations such as conjunction and thereby construct transparent reasoning paths from specific past interactions to the final prediction.

What carries the argument

Beta-distributed probabilistic embeddings that enable direct logical operations on historical interaction sequences within a goal-conditioned reasoning process.

If this is right

  • PLKT reports higher predictive accuracy than state-of-the-art KT methods on standard benchmarks.
  • The model produces explicit reasoning paths that show how particular past interactions contribute to each prediction.
  • Uncertainty in historical behaviors is modeled directly through the Beta distributions rather than hidden in deterministic vectors.
  • Logical operations such as conjunction can be applied transparently to the probabilistic representations.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The explicit paths could let teachers identify recurring patterns of error in a student's record and target specific remediation.
  • The same probabilistic-plus-logical structure might transfer to other sequential prediction domains that need both accuracy and auditability.
  • If the Beta distributions prove stable across datasets, the approach could reduce reliance on post-hoc explanation techniques in educational AI.

Load-bearing premise

Robust Beta-distributed embeddings together with explicit logical operations will simultaneously raise predictive accuracy and deliver genuinely human-interpretable reasoning paths without creating new opacity or overfitting to the chosen patterns.

What would settle it

An experiment in which PLKT matches or exceeds baseline accuracy but human experts rate its generated reasoning paths as no more aligned with actual student learning sequences than the paths of a standard black-box KT model.

Figures

Figures reproduced from arXiv: 2605.09369 by Cong Xu, Siyu Wu, Wei Zhang.

Figure 1
Figure 1. Figure 1: Illustration of interpretable knowledge tracing via point [PITH_FULL_IMAGE:figures/full_fig_p001_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: The overview of the proposed PLKT framework. The right side of the figure shows the level-wise prediction scores ( [PITH_FULL_IMAGE:figures/full_fig_p003_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: AUC performance of PLKT with varying pattern-levels [PITH_FULL_IMAGE:figures/full_fig_p006_3.png] view at source ↗
Figure 4
Figure 4. Figure 4: Interpretable case study: distribution of pattern contribution for [PITH_FULL_IMAGE:figures/full_fig_p007_4.png] view at source ↗
Figure 5
Figure 5. Figure 5: Interpretable case study: distribution of pattern contribution for [PITH_FULL_IMAGE:figures/full_fig_p011_5.png] view at source ↗
Figure 6
Figure 6. Figure 6: AUC performance of PLKT with varying pattern-levels on [PITH_FULL_IMAGE:figures/full_fig_p011_6.png] view at source ↗
Figure 7
Figure 7. Figure 7: Sensitivity analysis of hyperparameter λ across four benchmark datasets. B.4 Case Study of Explainability [PITH_FULL_IMAGE:figures/full_fig_p012_7.png] view at source ↗
read the original abstract

Knowledge Tracing (KT) models students' knowledge states based on learning interactions to predict performance. While deep learning-based KT models have boosted predictive accuracy, most models rely on deterministic vector embeddings and opaque latent state transitions, limiting interpretability regarding how specific past behaviors influence predictions. To address this limitation, we propose Probabilistic Logical Knowledge Tracing (PLKT), an interpretable KT framework that formulates prediction as a goal-conditioned evidence reasoning process over historical learning behaviors. Instead of representing knowledge states as deterministic vector embeddings, PLKT employs robust Beta-distributed probabilistic embeddings to represent student knowledge states. This probabilistic foundation allows us to model the uncertainty of historical behaviors and perform explicit logical operations (e.g., conjunction), constructing transparent reasoning paths that reveal how specific past interactions contribute to the prediction. Extensive experiments show that PLKT outperforms state-of-the-art KT methods while achieving superior interpretability. Our code is available at https://anonymous.4open.science/r/PLKT-D3CE/.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper proposes Probabilistic Logical Knowledge Tracing (PLKT), a KT framework that replaces deterministic embeddings with Beta-distributed probabilistic embeddings for student knowledge states. Prediction is formulated as goal-conditioned evidence reasoning over historical interactions using explicit logical operations (e.g., conjunction) on these distributions to produce transparent reasoning paths. The central claim is that PLKT simultaneously achieves higher predictive accuracy than SOTA KT methods and superior interpretability, supported by extensive experiments; code is released.

Significance. If the empirical claims and interpretability validation hold, the work would meaningfully advance interpretable educational AI by combining probabilistic uncertainty modeling with logical pattern-based reasoning. This addresses a key limitation of deep KT models (opacity of latent transitions) while preserving or improving accuracy. The release of code is a positive step toward reproducibility.

major comments (3)
  1. [Abstract and §4] Abstract and §4 (Experiments): The abstract asserts that 'extensive experiments show that PLKT outperforms state-of-the-art KT methods' yet reports no numerical results, specific baselines, effect sizes, or statistical significance tests. This makes the central performance claim impossible to evaluate from the provided text; the full manuscript must include these details (e.g., AUC, accuracy deltas, and p-values) for the outperformance assertion to be load-bearing.
  2. [§3.2] §3.2 (Logical Operations on Beta Embeddings): The claim that explicit logical operations (conjunction etc.) on Beta distributions yield 'transparent reasoning paths' is not accompanied by any quantitative interpretability metric (explanation fidelity, rule coverage, or inter-rater agreement with educators) or user study. If operations are implemented via t-norms or moment-matching, the propagated Beta parameters may remain non-intuitive, undermining the interpretability advantage over attention weights.
  3. [§3.1] §3.1 (Beta-distributed Embeddings): The framework relies on free Beta shape parameters (as noted in the axiom ledger) yet presents the approach as introducing new components rather than re-expressing fitted quantities. The manuscript must clarify whether performance gains reduce to these parameters by construction and provide ablation results isolating the contribution of the probabilistic embedding versus the logical reasoning layer.
minor comments (2)
  1. [§3] Notation for Beta parameters (α, β) should be explicitly defined at first use and distinguished from any other shape parameters in the model.
  2. [Abstract and §3] The abstract mentions 'pattern-based reasoning' but the full text should include a clear definition or pseudocode for how historical behaviors are mapped to logical patterns.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the thoughtful and constructive comments. We address each major point below and indicate the revisions we will make to strengthen the manuscript.

read point-by-point responses
  1. Referee: [Abstract and §4] Abstract and §4 (Experiments): The abstract asserts that 'extensive experiments show that PLKT outperforms state-of-the-art KT methods' yet reports no numerical results, specific baselines, effect sizes, or statistical significance tests. This makes the central performance claim impossible to evaluate from the provided text; the full manuscript must include these details (e.g., AUC, accuracy deltas, and p-values) for the outperformance assertion to be load-bearing.

    Authors: We agree that the abstract should be more self-contained with concrete performance evidence. In the revised version we will expand the abstract to report key results, including AUC and accuracy improvements over the main baselines (DKT, AKT, and others), the magnitude of the deltas, and confirmation that statistical significance was assessed via paired t-tests (p < 0.05) as detailed in §4. The full experimental tables, baselines, effect sizes, and p-values are already present in §4; the revision will simply surface the most salient numbers in the abstract for immediate evaluability. revision: yes

  2. Referee: [§3.2] §3.2 (Logical Operations on Beta Embeddings): The claim that explicit logical operations (conjunction etc.) on Beta distributions yield 'transparent reasoning paths' is not accompanied by any quantitative interpretability metric (explanation fidelity, rule coverage, or inter-rater agreement with educators) or user study. If operations are implemented via t-norms or moment-matching, the propagated Beta parameters may remain non-intuitive, undermining the interpretability advantage over attention weights.

    Authors: We acknowledge the absence of quantitative interpretability metrics in the original submission. We will add a new paragraph in §4 that quantifies explanation fidelity on synthetic data (where ground-truth logical patterns are known) by measuring how often the constructed reasoning paths recover the injected conjunction/disjunction structure, together with rule-coverage statistics. We also clarify in §3.2 that the logical operations are realized via closed-form Beta-parameter updates (lower-bound min for conjunction, etc.) rather than generic t-norms, so the resulting distributions remain directly interpretable as combined evidence. A full educator user study lies outside the scope of the current work and will be noted as future research; we therefore mark this revision as partial. revision: partial

  3. Referee: [§3.1] §3.1 (Beta-distributed Embeddings): The framework relies on free Beta shape parameters (as noted in the axiom ledger) yet presents the approach as introducing new components rather than re-expressing fitted quantities. The manuscript must clarify whether performance gains reduce to these parameters by construction and provide ablation results isolating the contribution of the probabilistic embedding versus the logical reasoning layer.

    Authors: We appreciate the request for clarification. The Beta shape parameters are indeed learned end-to-end, but the modeling contribution is the use of full Beta distributions that enable the subsequent logical operations; deterministic embeddings cannot support the same explicit reasoning. We will revise §3.1 to state this distinction explicitly. In addition, we will insert two ablation experiments in §4: (1) replacing the logical-reasoning layer with simple aggregation while keeping Beta embeddings, and (2) replacing Beta embeddings with deterministic vectors while retaining the logical layer. These results will demonstrate that both components are necessary for the observed gains and that the improvements are not reducible to the mere presence of extra parameters. revision: yes

Circularity Check

0 steps flagged

No circularity; new probabilistic embedding and reasoning framework is self-contained without reduction to fitted inputs or self-citations

full rationale

The abstract and description introduce PLKT as a novel framework that replaces deterministic embeddings with Beta-distributed probabilistic ones and adds explicit logical operations for goal-conditioned reasoning. No equations, derivation steps, or self-citations are visible that would allow any performance claim or interpretability path to reduce by construction to prior fitted parameters or author-defined uniqueness theorems. The central claims rest on experimental outperformance and the explicitness of the new components rather than re-labeling or self-referential fitting, making the derivation independent and non-circular.

Axiom & Free-Parameter Ledger

1 free parameters · 2 axioms · 1 invented entities

Based solely on the abstract, the central claim rests on the assumption that Beta distributions can faithfully represent uncertain knowledge states and that logical operations over them yield both accuracy and interpretability; no specific numerical free parameters are named.

free parameters (1)
  • Beta distribution shape parameters
    Used to encode uncertainty in student knowledge states; these are learned during training and constitute fitted values whose specific forms are not detailed in the abstract.
axioms (2)
  • domain assumption Beta distributions provide a robust probabilistic representation of knowledge state uncertainty
    Invoked when replacing deterministic vector embeddings with Beta embeddings to model historical behavior uncertainty.
  • domain assumption Explicit logical operations such as conjunction can be performed directly on probabilistic embeddings to construct transparent reasoning paths
    Required for the goal-conditioned evidence reasoning process described in the abstract.
invented entities (1)
  • Probabilistic Logical Knowledge Tracing (PLKT) framework no independent evidence
    purpose: To combine probabilistic embeddings with logical reasoning for interpretable knowledge tracing
    New named framework introduced to address limitations of prior deterministic KT models

pith-pipeline@v0.9.0 · 5461 in / 1554 out tokens · 67338 ms · 2026-05-12T03:02:15.735752+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

43 extracted references · 43 canonical work pages

  1. [1]

    cskt: Addressing cold-start problem in knowledge tracing via kernel bias and cone attention.Expert Systems with Applications, 266:125988,

    [Baiet al., 2025 ] Youheng Bai, Xueyi Li, Zitao Liu, Yaying Huang, Teng Guo, Mingliang Hou, Feng Xia, and Weiqi Luo. cskt: Addressing cold-start problem in knowledge tracing via kernel bias and cone attention.Expert Systems with Applications, 266:125988,

  2. [2]

    Knowledge-enhanced multi-view graph neural networks for session-based recommendation

    [Chenet al., 2023 ] Qian Chen, Zhiqiang Guo, Jianjun Li, and Guohui Li. Knowledge-enhanced multi-view graph neural networks for session-based recommendation. In Proceedings of the 46th international ACM SIGIR con- ference on research and development in information re- trieval, pages 352–361,

  3. [3]

    Uncertainty-aware knowledge tracing

    [Chenget al., 2025 ] Weihua Cheng, Hanwen Du, Chunxiao Li, Ersheng Ni, Liangdi Tan, Tianqi Xu, and Yongxin Ni. Uncertainty-aware knowledge tracing. InProceedings of the AAAI Conference on Artificial Intelligence, volume 39, pages 27905–27913,

  4. [4]

    Towards an appropriate query, key, and value computation for knowledge tracing

    [Choiet al., 2020 ] Youngduck Choi, Youngnam Lee, Junghyun Cho, Jineon Baek, Byungsoo Kim, Yeongmin Cha, Dongmin Shin, Chan Bae, and Jaewe Heo. Towards an appropriate query, key, and value computation for knowledge tracing. InProceedings of the seventh ACM conference on learning@ scale, pages 341–344,

  5. [5]

    Towards more accurate and interpretable model: Fusing multiple knowledge relations into deep knowledge tracing.Expert Systems with Applications, 243:122573,

    [Duanet al., 2024 ] Zhiyi Duan, Xiaoxiao Dong, Hengnian Gu, Xiong Wu, Zhen Li, and Dongdai Zhou. Towards more accurate and interpretable model: Fusing multiple knowledge relations into deep knowledge tracing.Expert Systems with Applications, 243:122573,

  6. [6]

    Automated knowledge compo- nent generation and knowledge tracing for coding prob- lems.arXiv preprint arXiv:2502.18632,

    [Duanet al., 2025 ] Zhangqi Duan, Nigel Fernandez, Arun Balajiee Lekshmi Narayanan, Mohammad Hassany, Rafaella Sampaio de Alencar, Peter Brusilovsky, Bita Akram, and Andrew Lan. Automated knowledge compo- nent generation and knowledge tracing for coding prob- lems.arXiv preprint arXiv:2502.18632,

  7. [7]

    Teachers College Press,

    [Fosnot, 2013] Catherine Twomey Fosnot.Constructivism: Theory, perspectives, and practice. Teachers College Press,

  8. [8]

    Knowledge interaction enhanced knowledge tracing for learner performance prediction

    [Ganet al., 2020 ] Wenbin Gan, Yuan Sun, and Yi Sun. Knowledge interaction enhanced knowledge tracing for learner performance prediction. In2020 7th International conference on behavioural and social computing (BESC), pages 1–6. IEEE,

  9. [9]

    Context-aware attentive knowledge tracing

    [Ghoshet al., 2020 ] Aritra Ghosh, Neil Heffernan, and An- drew S Lan. Context-aware attentive knowledge tracing. InProceedings of the 26th ACM SIGKDD international conference on knowledge discovery & data mining, pages 2330–2339,

  10. [10]

    Question difficulty prediction for reading prob- lems in standard tests

    [Huanget al., 2017 ] Zhenya Huang, Qi Liu, Enhong Chen, Hongke Zhao, Mingyong Gao, Si Wei, Yu Su, and Guop- ing Hu. Question difficulty prediction for reading prob- lems in standard tests. InProceedings of the AAAI confer- ence on artificial intelligence, volume 31,

  11. [11]

    Towards robust knowl- edge tracing models via k-sparse attention

    [Huanget al., 2023 ] Shuyan Huang, Zitao Liu, Xiangyu Zhao, Weiqi Luo, and Jian Weng. Towards robust knowl- edge tracing models via k-sparse attention. InProceed- ings of the 46th international ACM SIGIR conference on research and development in information retrieval, pages 2441–2445,

  12. [12]

    Learning consistent representations with temporal and causal enhancement for knowledge tracing.Expert Sys- tems with Applications, 245:123128,

    [Huanget al., 2024 ] Changqin Huang, Hangjie Wei, Qiong- hao Huang, Fan Jiang, Zhongmei Han, and Xiaodi Huang. Learning consistent representations with temporal and causal enhancement for knowledge tracing.Expert Sys- tems with Applications, 245:123128,

  13. [13]

    A method for stochastic optimization

    [Kingaet al., 2015 ] Diederik Kinga, Jimmy Ba Adam, et al. A method for stochastic optimization. InInternational conference on learning representations (ICLR), volume

  14. [14]

    Probabilistic hash embed- dings for online learning of categorical features.arXiv preprint arXiv:2511.20893,

    [Liet al., 2025a ] Aodong Li, Abishek Sankararaman, and Balakrishnan Narayanaswamy. Probabilistic hash embed- dings for online learning of categorical features.arXiv preprint arXiv:2511.20893,

  15. [15]

    Cikt: A collaborative and iterative knowledge trac- ing framework with large language models.arXiv preprint arXiv:2505.17705,

    [Liet al., 2025b ] Runze Li, Siyu Wu, Jun Wang, and Wei Zhang. Cikt: A collaborative and iterative knowledge trac- ing framework with large language models.arXiv preprint arXiv:2505.17705,

  16. [16]

    pykt: a python library to benchmark deep learning based knowl- edge tracing models.Advances in Neural Information Pro- cessing Systems, 35:18542–18555,

    [Liuet al., 2022 ] Zitao Liu, Qiongqiong Liu, Jiahao Chen, Shuyan Huang, Jiliang Tang, and Weiqi Luo. pykt: a python library to benchmark deep learning based knowl- edge tracing models.Advances in Neural Information Pro- cessing Systems, 35:18542–18555,

  17. [17]

    simplekt: a simple but tough-to-beat baseline for knowledge tracing.arXiv preprint arXiv:2302.06881, 2023

    [Liuet al., 2023 ] Zitao Liu, Qiongqiong Liu, Jiahao Chen, Shuyan Huang, and Weiqi Luo. simplekt: a simple but tough-to-beat baseline for knowledge tracing.arXiv preprint arXiv:2302.06881,

  18. [18]

    Pattern-wise transparent sequential recommenda- tion.IEEE Transactions on Knowledge and Data Engi- neering,

    [Maet al., 2025 ] Kun Ma, Cong Xu, Zeyuan Chen, and Wei Zhang. Pattern-wise transparent sequential recommenda- tion.IEEE Transactions on Knowledge and Data Engi- neering,

  19. [19]

    Interpretable knowledge tracing: Simple and efficient student modeling with causal relations

    [Minnet al., 2022 ] Sein Minn, Jill-J ˆenn Vie, Koh Takeuchi, Hisashi Kashima, and Feida Zhu. Interpretable knowledge tracing: Simple and efficient student modeling with causal relations. InProceedings of the AAAI conference on arti- ficial intelligence, volume 36, pages 12810–12818,

  20. [20]

    Predicting students’ performance on intelli- gent tutoring system—personalized clustered bkt (pc-bkt) model

    [Nedungadi and Remya, 2014] Prema Nedungadi and MS Remya. Predicting students’ performance on intelli- gent tutoring system—personalized clustered bkt (pc-bkt) model. In2014 IEEE frontiers in education conference (FIE) proceedings, pages 1–6. IEEE,

  21. [21]

    A Self-Attentive model for Knowledge Tracing

    [Pandey and Karypis, 2019] Shalini Pandey and George Karypis. A self-attentive model for knowledge tracing. arXiv preprint arXiv:1907.06837,

  22. [22]

    Rkt: relation-aware self-attention for knowl- edge tracing

    [Pandey and Srivastava, 2020] Shalini Pandey and Jaideep Srivastava. Rkt: relation-aware self-attention for knowl- edge tracing. InProceedings of the 29th ACM interna- tional conference on information & knowledge manage- ment, pages 1205–1214,

  23. [23]

    Deep knowledge trac- ing.Advances in neural information processing systems, 28,

    [Piechet al., 2015 ] Chris Piech, Jonathan Bassen, Jonathan Huang, Surya Ganguli, Mehran Sahami, Leonidas J Guibas, and Jascha Sohl-Dickstein. Deep knowledge trac- ing.Advances in neural information processing systems, 28,

  24. [24]

    Beta embeddings for multi-hop logical reasoning in knowledge graphs.Advances in Neural Information Pro- cessing Systems, 33:19716–19726,

    [Ren and Leskovec, 2020] Hongyu Ren and Jure Leskovec. Beta embeddings for multi-hop logical reasoning in knowledge graphs.Advances in Neural Information Pro- cessing Systems, 33:19716–19726,

  25. [25]

    Learning process-consistent knowledge tracing

    [Shenet al., 2021 ] Shuanghong Shen, Qi Liu, Enhong Chen, Zhenya Huang, Wei Huang, Yu Yin, Yu Su, and Shijin Wang. Learning process-consistent knowledge tracing. InProceedings of the 27th ACM SIGKDD conference on knowledge discovery & data mining, pages 1452–1460,

  26. [26]

    A survey of knowledge tracing: Models, variants, and applications.IEEE Transactions on Learn- ing Technologies, 17:1858–1879,

    [Shenet al., 2024 ] Shuanghong Shen, Qi Liu, Zhenya Huang, Yonghe Zheng, Minghao Yin, Minjuan Wang, and Enhong Chen. A survey of knowledge tracing: Models, variants, and applications.IEEE Transactions on Learn- ing Technologies, 17:1858–1879,

  27. [27]

    Enhancing knowledge tracing with question- based contrastive learning.Knowledge-Based Systems, page 113899,

    [Shenet al., 2025 ] Xiaoxuan Shen, Fenghua Yu, Yaqi Liu, Ruxia Liang, Qian Wan, Tianhao Yang, Mengtian Shi, and Jianwen Sun. Enhancing knowledge tracing with question- based contrastive learning.Knowledge-Based Systems, page 113899,

  28. [28]

    Time-and-concept enhanced deep multidimensional item response theory for interpretable knowledge tracing.Knowledge-Based Sys- tems, 218:106819,

    [Suet al., 2021 ] Yu Su, Zeyu Cheng, Pengfei Luo, Jinze Wu, Lei Zhang, Qi Liu, and Shijin Wang. Time-and-concept enhanced deep multidimensional item response theory for interpretable knowledge tracing.Knowledge-Based Sys- tems, 218:106819,

  29. [29]

    Ensemble knowledge tracing: Modeling interac- tions in learning process.Expert Systems with Applica- tions, 207:117680,

    [Sunet al., 2022 ] Jianwen Sun, Rui Zou, Ruxia Liang, Lu Gao, Sannyuya Liu, Qing Li, Kai Zhang, and Lulu Jiang. Ensemble knowledge tracing: Modeling interac- tions in learning process.Expert Systems with Applica- tions, 207:117680,

  30. [30]

    Interpretable knowledge tracing with multiscale state representation

    [Sunet al., 2024b ] Jianwen Sun, Fenghua Yu, Qian Wan, Qing Li, Sannyuya Liu, and Xiaoxuan Shen. Interpretable knowledge tracing with multiscale state representation. In Proceedings of the ACM Web Conference 2024, pages 3265–3276,

  31. [31]

    Exercise recommendation based on knowl- edge concept prediction.Knowledge-Based Systems, 210:106481,

    [Wuet al., 2020 ] Zhengyang Wu, Ming Li, Yong Tang, and Qingyu Liang. Exercise recommendation based on knowl- edge concept prediction.Knowledge-Based Systems, 210:106481,

  32. [32]

    Con- trastive personalized exercise recommendation with rein- forcement learning.IEEE Transactions on Learning Tech- nologies, 17:691–703,

    [Wuet al., 2023 ] Siyu Wu, Jun Wang, and Wei Zhang. Con- trastive personalized exercise recommendation with rein- forcement learning.IEEE Transactions on Learning Tech- nologies, 17:691–703,

  33. [33]

    A comprehensive exploration of personalized learning in smart education: From student modeling to personalized recommendations

    [Wuet al., 2024 ] Siyu Wu, Yang Cao, Jiajun Cui, Runze Li, Hong Qian, Bo Jiang, and Wei Zhang. A comprehensive exploration of personalized learning in smart education: From student modeling to personalized recommendations. arXiv preprint arXiv:2402.01666,

  34. [34]

    Flatformer: A flat transformer knowledge tracing model based on cog- nitive bias injection.arXiv preprint arXiv:2512.06629,

    [Xia and Li, 2025] Xiao-li Xia and Hou-biao Li. Flatformer: A flat transformer knowledge tracing model based on cog- nitive bias injection.arXiv preprint arXiv:2512.06629,

  35. [35]

    Geometric relational embeddings: A survey.arXiv preprint arXiv:2304.11949,

    [Xionget al., 2023 ] Bo Xiong, Mojtaba Nayyeri, Ming Jin, Yunjie He, Michael Cochez, Shirui Pan, and Steffen Staab. Geometric relational embeddings: A survey.arXiv preprint arXiv:2304.11949,

  36. [36]

    Addressing two problems in deep knowledge tracing via prediction-consistent regularization

    [Yeung and Yeung, 2018] Chun-Kit Yeung and Dit-Yan Ye- ung. Addressing two problems in deep knowledge tracing via prediction-consistent regularization. InProceedings of the fifth annual ACM conference on learning at scale, pages 1–10,

  37. [37]

    Deep-irt: Make deep learn- ing based knowledge tracing explainable using item re- sponse theory.arXiv preprint arXiv:1904.11738,

    [Yeung, 2019] Chun-Kit Yeung. Deep-irt: Make deep learn- ing based knowledge tracing explainable using item re- sponse theory.arXiv preprint arXiv:1904.11738,

  38. [38]

    Exploring long- and short-term knowledge state graph representations with adaptive fusion for knowledge tracing.Information Pro- cessing & Management, 62(3):104074,

    [Yuet al., 2025 ] Ganfeng Yu, Zhiwen Xie, Guangyou Zhou, Zhuo Zhao, and Jimmy Xiangji Huang. Exploring long- and short-term knowledge state graph representations with adaptive fusion for knowledge tracing.Information Pro- cessing & Management, 62(3):104074,

  39. [39]

    Dynamic key-value memory networks for knowledge tracing

    [Zhanget al., 2017 ] Jiani Zhang, Xingjian Shi, Irwin King, and Dit-Yan Yeung. Dynamic key-value memory networks for knowledge tracing. InProceedings of the 26th inter- national conference on World Wide Web, pages 765–774,

  40. [40]

    Multi- factors aware dual-attentional knowledge tracing

    [Zhanget al., 2021 ] Moyu Zhang, Xinning Zhu, Chunhong Zhang, Yang Ji, Feng Pan, and Changchuan Yin. Multi- factors aware dual-attentional knowledge tracing. InPro- ceedings of the 30th ACM international conference on in- formation & knowledge management, pages 2588–2597,

  41. [41]

    Disentangled knowledge tracing for alleviating cognitive bias

    [Zhouet al., 2025 ] Yiyun Zhou, Zheqi Lv, Shengyu Zhang, and Jingyuan Chen. Disentangled knowledge tracing for alleviating cognitive bias. InProceedings of the ACM on Web Conference 2025, pages 2633–2645,

  42. [42]

    Correct” status for target questions (α− t+1,β − t+1)∈R 2d distribution of “Incorrect

    Notation Description u∈ U,q∈ Q,c∈ C student, question, knowledge concept r∈ {0,1} response of student (1: Correct, 0: Incorrect) d vector dimension hq emb, hc emb question/concept embedding functions α,β∈R d parameters of the Beta distribution (α+ t+1,β + t+1)∈R 2d distribution of “Correct” status for target questions (α− t+1,β − t+1)∈R 2d distribution of...

  43. [43]

    It consists of 27,145 students, 50,966 questions, 245 knowledge concepts, and 2,627,118 interaction records. Owing to its scale and dense interaction structure, ASSIST12 is frequently used to evaluate the scalability and performance of knowledge tracing models in large learning environ- ments. •Junyi 3: The Junyi dataset is collected from the Junyi Academ...