pith · machine review for the scientific record

arXiv: 2605.07040 · v1 · submitted 2026-05-07 · 💻 cs.CL · cs.AI · cs.CY

Recognition: no theorem link

Cognitive Agent Compilation for Explicit Problem Solver Modeling

Authors on Pith: no claims yet

Pith reviewed 2026-05-11 01:16 UTC · model grok-4.3

classification 💻 cs.CL · cs.AI · cs.CY
keywords cognitive agent compilation · explicit problem solving · educational AI · LLM tutoring · inspectable agents · bounded knowledge · problem-solving policy · knowledge compilation

The pith

Cognitive Agent Compilation compiles a teacher LLM's knowledge into an explicit target agent with separated components for representation, policy, and verification.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Large language models struggle in education because their broad pretraining resists the constraints and transparency that tutors and learners require. The paper proposes Cognitive Agent Compilation as a way to use a strong teacher LLM to produce an explicit agent that models problem solving in a structured, editable form. This structure keeps knowledge representation distinct from the problem-solving policy and from the rules that verify and update knowledge. The goal is to let educators see and change what the system assumes a learner knows and how it reasons. An early implementation with small models illustrates the method but also shows the tension between maintaining explicit control and preserving the ability to handle varied problems.

Core claim

CAC uses a strong teacher LLM to compile problem-solving knowledge into an explicit target agent, separating (i) knowledge representation, (ii) problem-solving policy, and (iii) verification and update rules, with the goal of making bounded problem solving more inspectable and editable in educational settings.

What carries the argument

The Cognitive Agent Compilation (CAC) framework, which separates knowledge representation, problem-solving policy, and verification and update rules to produce an inspectable target agent from a teacher LLM.
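
The three-way separation can be made concrete as plain data structures. The sketch below is an editorial illustration, not the paper's implementation; every class and rule format here is an assumption.

```python
from __future__ import annotations
from dataclasses import dataclass, field

# Hypothetical sketch of CAC's three separated components; names and the
# "condition -> action" rule format are illustrative assumptions.

@dataclass
class KnowledgeBase:
    """(i) Knowledge representation: explicit, human-readable rules."""
    rules: dict[str, str] = field(default_factory=dict)

    def edit(self, rule_id: str, new_text: str) -> None:
        # Educator edits touch only this component.
        self.rules[rule_id] = new_text

@dataclass
class Policy:
    """(ii) Problem-solving policy: chooses which rule to apply next."""
    def next_step(self, kb: KnowledgeBase, state: str) -> str | None:
        for rule_id, rule in kb.rules.items():
            if rule.split("->")[0].strip() in state:
                return rule_id
        return None

@dataclass
class Verifier:
    """(iii) Verification/update rules: gate changes to the KB."""
    def accept(self, rule_text: str) -> bool:
        # Bounded update: only well-formed "condition -> action" rules.
        return "->" in rule_text

kb = KnowledgeBase()
v = Verifier()
candidate = "x + x -> 2x"
if v.accept(candidate):
    kb.edit("combine_like_terms", candidate)

p = Policy()
print(p.next_step(kb, "simplify x + x"))  # → combine_like_terms
```

The point of the sketch is that each concern lives behind one boundary: an educator can read or change `KnowledgeBase.rules` without touching the policy, and the verifier bounds what an update is allowed to introduce.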

If this is right

  • Educators can inspect the knowledge states the system attributes to a learner.
  • The system can justify its actions by reference to explicit skills, misconceptions, and strategies.
  • Problem-solving behavior becomes more bounded and modifiable than in unconstrained pretrained LLMs.
  • Implementations must balance the gain in explicit control against any loss in generalization to new problems.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Hybrid setups could route core curriculum tasks to the explicit CAC agent while routing open-ended dialogue to an LLM component.
  • The same separation of representation, policy, and verification might apply to other domains that need transparent decision records, such as diagnostic tutoring or adaptive practice systems.
  • Scaling the teacher model size could be tested to determine whether the explicit-control versus generalization trade-off shrinks.
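
The hybrid-setup bullet above can be reduced to a small dispatch sketch. This is an editorial illustration: the tag set, the router, and both backends are placeholders, not anything described in the paper.

```python
# Hypothetical dispatcher for the hybrid setup suggested above: core
# curriculum tasks go to the explicit CAC agent, open-ended dialogue to
# an LLM. Both backends here are placeholder functions.

CURRICULUM_TAGS = {"fractions", "equations", "two-digit addition"}

def cac_agent(text):
    return f"CAC:{text}"       # bounded, inspectable path (stand-in)

def llm_dialogue(text):
    return f"LLM:{text}"       # open-ended path (stand-in)

def route(task_tag, text):
    if task_tag in CURRICULUM_TAGS:
        return cac_agent(text)
    return llm_dialogue(text)

print(route("fractions", "1/2 + 1/4"))   # → CAC:1/2 + 1/4
print(route("smalltalk", "why math?"))   # → LLM:why math?
```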

Load-bearing premise

A strong teacher LLM can reliably compile problem-solving knowledge into an explicit target agent that preserves inspectability and editability while managing the observed trade-off with scalable generalization.

What would settle it

A controlled test in which manual edits to the agent's explicit knowledge or policy components produce no measurable change in behavior on held-out problems, or in which the compiled agent fails to solve problems the teacher LLM itself can solve.
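
One way to operationalize that controlled test is to measure how often behavior on held-out problems changes after a manual edit. The harness below is a hypothetical sketch; `solve`, the toy agents, and the dict-of-rules representation are all stand-ins, not an API from the paper.

```python
# Hypothetical edit-intervention test for a compiled agent.

def behavior_changed(solve, agent_before, agent_after, held_out):
    """Fraction of held-out problems whose answer changes after a
    manual edit to the agent's explicit components. A rate of 0.0
    on problems the edit should affect would count against the claim."""
    diffs = sum(
        solve(agent_before, p) != solve(agent_after, p) for p in held_out
    )
    return diffs / len(held_out)

# Toy stand-in: the "agent" is a dict of rules; solving looks one up.
def solve(agent, problem):
    return agent.get(problem, "unsolved")

before = {"2+2": "4", "3+3": "6"}
after = dict(before, **{"3+3": "9"})  # a manual edit to one rule

rate = behavior_changed(solve, before, after, ["2+2", "3+3"])
print(rate)  # → 0.5: the edit measurably changes behavior on half the set
```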

Figures

Figures reproduced from arXiv: 2605.07040 by Carolyn Rosé, Hyeongdon Moon, John Stamper.

Figure 1
Figure 1: Overview of the Cognitive Agent Compilation (CAC) framework. The framework operates through a failure-driven learning cycle where the Cognitive Agent attempts a problem, generating a Problem Solving History. The Teacher Large Language Model analyzes these traces for failures or suboptimal paths and distills corrective knowledge into an explicit Knowledge Base (KB), prompting the agent to retry until su… view at source ↗
Figure 2
Figure 2: Architecture of the Cognitive Agent. The agent utilizes a Small Language Model (SLM) to process an input prompt containing the current Model State, which is composed of the Goal, Working Memory, and Retrieved Declarative Memory. Operating via a Query-Key-Value mechanism interacting with the Knowledge Base, the SLM makes deterministic decisions to set subgoals, apply retrieved knowledge, or generate a fin… view at source ↗
Figure 3
Figure 3: The CAC Training Pipeline. Following a problem-solving attempt, the Teacher LLM analyzes the Cognitive Agent's operational traces and utilizes Helper Model Context Protocols (MCPs) to perform Knowledge Base (KB) inference and similarity score calculations. The Teacher LLM iteratively generates and submits new Declarative Memory (DM) candidates to update the KB until the Cognitive Agent successfully resolv… view at source ↗
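
The failure-driven cycle that the captions for Figures 1 and 3 describe (attempt, analyze the trace, distill corrective knowledge, retry) might be sketched as the loop below. `teacher_distill`, the trace format, and the dict-based KB are assumptions for illustration, not the paper's protocol.

```python
# Hypothetical sketch of the failure-driven compilation loop from the
# figure captions: attempt -> analyze trace -> distill knowledge -> retry.

def compile_agent(problem, attempt, teacher_distill, kb, max_rounds=5):
    """Retry until the agent solves the problem, letting the teacher
    distill corrective declarative-memory entries into the KB."""
    for _ in range(max_rounds):
        solved, trace = attempt(problem, kb)  # Problem Solving History
        if solved:
            return kb
        # Teacher LLM analyzes the failed trace, proposes a KB update.
        kb = kb | teacher_distill(problem, trace)
    return kb

# Toy stand-ins: the agent succeeds once the KB names the problem.
def attempt(problem, kb):
    return (problem in kb), f"failed:{problem}"

def teacher_distill(problem, trace):
    return {problem: f"rule distilled from {trace}"}

kb = compile_agent("two-digit addition", attempt, teacher_distill, {})
print("two-digit addition" in kb)  # → True: one distillation round sufficed
```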
original abstract

Large language models (LLMs) are widely used for tutoring, feedback generation, and content creation, but their broad pretraining makes them hard to constrain and poor substitutes for controllable learners. Educational systems often require inspectable and editable knowledge states: educators want to know what a system assumes the learner knows, and learners benefit when the system can justify actions in terms of explicit skills, misconceptions, and strategies. Inspired by cognitive architectures, we propose Cognitive Agent Compilation (CAC), a framework that uses a strong teacher LLM to compile problem-solving knowledge into an explicit target agent. CAC separates (i) knowledge representation, (ii) problem-solving policy, and (iii) verification and update rules, with the goal of making bounded problem solving more inspectable and editable in educational settings. We present an early proof of concept implemented with Small Language Models that surfaces key design trade-offs, particularly between explicit control and scalable generalization, and positions CAC as an initial step toward bounded-knowledge AI for educational applications.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper proposes Cognitive Agent Compilation (CAC), a framework in which a strong teacher LLM compiles problem-solving knowledge into an explicit target agent. CAC separates (i) knowledge representation, (ii) problem-solving policy, and (iii) verification and update rules to support inspectable and editable bounded problem solving in educational settings. An early proof-of-concept implemented with Small Language Models is described that surfaces design trade-offs, especially between explicit control and scalable generalization.

Significance. If realized, CAC could advance controllable, transparent AI agents for tutoring by drawing on cognitive-architecture principles and enabling educator edits to explicit knowledge states without full re-compilation. The PoC usefully identifies the explicitness-generalization trade-off as a core design tension. The work's main strength is its clean separation of concerns; however, as a preliminary conceptual proposal without quantitative evaluation, its immediate significance is prospective rather than demonstrated.

major comments (2)
  1. [Abstract / framework section] Abstract and framework description: the claim that CAC produces agents whose knowledge representation remains reliably inspectable and editable by educators (while verification rules produce predictable, bounded updates) is load-bearing for the value proposition, yet the PoC provides no concrete example of an educator edit (e.g., to a misconception rule), its effect on policy behavior, or any qualitative trace showing preserved explicitness after the edit.
  2. [Proof-of-concept section] Proof-of-concept section: no implementation details, problem domain, SLM sizes, or before/after edit examples are supplied, and no evaluation metrics (accuracy, edit success rate, readability scores) are reported. As a result, the central claim that the three components stay separable and human-editable without loss of explicitness lacks empirical grounding.
minor comments (2)
  1. [Framework description] Notation for the three separated components could be introduced with consistent symbols or pseudocode to improve readability.
  2. [Introduction] A short related-work subsection contrasting CAC with existing cognitive architectures (e.g., ACT-R, Soar) and LLM-based tutoring systems would help situate the contribution.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback, which identifies key opportunities to strengthen the presentation of the framework and its preliminary proof-of-concept. We address each major comment below and describe the revisions planned for the next version of the manuscript.

point-by-point responses
  1. Referee: [Abstract / framework section] Abstract and framework description: the claim that CAC produces agents whose knowledge representation remains reliably inspectable and editable by educators (while verification rules produce predictable, bounded updates) is load-bearing for the value proposition, yet the PoC provides no concrete example of an educator edit (e.g., to a misconception rule), its effect on policy behavior, or any qualitative trace showing preserved explicitness after the edit.

    Authors: We agree that the inspectability and editability claims are central and would benefit from a concrete illustration. In the revised manuscript we will insert a worked qualitative example in the framework section: an educator edit to a specific misconception rule, the resulting change in policy behavior on a sample problem, and a before/after trace confirming that the knowledge representation remains explicit and human-readable. This addition will make the value proposition more tangible without altering the conceptual scope. revision: yes

  2. Referee: [Proof-of-concept section] Proof-of-concept section: no implementation details, problem domain, SLM sizes, or before/after edit examples are supplied, and no evaluation metrics (accuracy, edit success rate, readability scores) are reported. As a result, the central claim that the three components stay separable and human-editable without loss of explicitness lacks empirical grounding.

    Authors: The PoC is explicitly described as early-stage and intended to surface design trade-offs rather than deliver a full empirical study. We will expand the section to supply the missing details: the chosen problem domain, the specific SLM parameter counts used, and before/after edit examples that demonstrate component separability. We will not add quantitative metrics such as accuracy or edit success rates, as these would require a different experimental design beyond the current conceptual contribution; instead we will clarify the qualitative nature of the evidence and the associated limitations. revision: partial

Circularity Check

0 steps flagged

No circularity: framework proposal defines new separation without reducing to fitted inputs or self-referential derivations.

full rationale

The paper proposes Cognitive Agent Compilation as a methodological framework that explicitly separates knowledge representation, problem-solving policy, and verification/update rules by construction of the proposal itself. No equations, fitted parameters, or predictive claims appear that reduce back to the inputs by definition. The early PoC with Small Language Models is presented only as surfacing design trade-offs rather than as a quantitative prediction derived from prior fitted quantities. External inspiration from cognitive architectures is cited but does not serve as a load-bearing self-citation chain or ansatz smuggling. The derivation chain is therefore self-contained as a definitional framework rather than a closed loop.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 1 invented entity

The central claim rests on assumptions about LLM capabilities as compilers and the educational value of explicit representations, with the CAC framework itself as the primary new construct; no numerical free parameters are introduced.

axioms (2)
  • domain assumption: Strong LLMs can effectively compile problem-solving knowledge into explicit target agents.
    Invoked in the proposal of using a teacher LLM for compilation.
  • domain assumption: Explicit and separated knowledge states improve inspectability and editability in educational AI.
    Stated as the core goal and benefit for educators and learners.
invented entities (1)
  • Cognitive Agent Compilation (CAC) framework (no independent evidence)
    purpose: To compile broad LLM knowledge into bounded, inspectable problem-solving agents.
    Newly proposed construct without external validation or falsifiable predictions in the abstract.

pith-pipeline@v0.9.0 · 5468 in / 1419 out tokens · 52871 ms · 2026-05-11T01:16:46.185342+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

22 extracted references · 22 canonical work pages · 1 internal anchor

  1. Aleven, V., McLaren, B.M., Sewall, J., Koedinger, K.R.: A new paradigm for intelligent tutoring systems: Example-tracing tutors. International Journal of Artificial Intelligence in Education 19(2), 105–154 (2009)

  2. An, M., Stamper, J.: Deceptive overgeneralization in adaptive learning. In: International Conference on Artificial Intelligence in Education. pp. 172–179. Springer (2025)

  3. Anderson, J.R., Bothell, D., Byrne, M.D., Douglass, S., Lebiere, C., Qin, Y.: An integrated theory of the mind. Psychological Review 111(4), 1036 (2004)

  4. Anderson, J.R., Reder, L.M.: The fan effect: New results and new theories. Journal of Experimental Psychology: General 128(2), 186 (1999)

  5. Belcak, P., Heinrich, G., Diao, S., Fu, Y., Dong, X., Muralidharan, S., Lin, Y.C., Molchanov, P.: Small language models are the future of agentic AI. arXiv preprint arXiv:2506.02153 (2025)

  6. Bull, S., Kay, J.: Student models that invite the learner in: The SMILI open learner modelling framework. International Journal of Artificial Intelligence in Education 17(2), 89–120 (2007), https://dblp.org/rec/journals/aiedu/BullK07.html

  7. Cen, H., Koedinger, K.R., Junker, B.: Learning factors analysis: a general method for cognitive model evaluation and improvement. In: Intelligent Tutoring Systems (ITS 2006). Lecture Notes in Computer Science, vol. 4053, pp. 164–175. Springer (2006)

  8. Corbett, A.T., Anderson, J.R.: Knowledge tracing: Modeling the acquisition of procedural knowledge. User Modeling and User-Adapted Interaction 4(4), 253–278 (1994)

  9. Gemma Team: Gemma 3. Tech. rep., Kaggle (2025), https://goo.gle/Gemma3Report

  10. Khosravi, H., Shum, S.B., Chen, G., Conati, C., Tsai, Y.S., Kay, J., Knight, S., Martinez-Maldonado, R., Sadiq, S., Gašević, D.: Explainable artificial intelligence in education. Computers and Education: Artificial Intelligence 3, 100074 (2022)

  11. Koedinger, K.R., McLaughlin, E.A., Stamper, J.C.: Automated student model improvement. In: Proceedings of the 5th International Conference on Educational Data Mining (EDM 2012) (2012)

  12. Lehman, J.F., Laird, J., Rosenbloom, P., et al.: A gentle introduction to Soar, an architecture for human cognition: 2006 update. University of Michigan, pp. 1–37 (2006)

  13. MacLellan, C.J., Koedinger, K.R.: Domain-general tutor authoring with apprentice learner models. International Journal of Artificial Intelligence in Education (2022)

  14. Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Yang, Y., Prabhumoye, S.: Self-refine: Iterative refinement with self-feedback. In: Advances in Neural Information Processing Systems 36 (NeurIPS 2023) (2023)

  15. Mahowald, K., Ivanova, A.A., Blank, I.A., Kanwisher, N., Tenenbaum, J.B., Fedorenko, E.: Dissociating language and thought in large language models. Trends in Cognitive Sciences 28(6), 517–540 (2024)

  16. Matsuda, N., Cohen, W.W., Koedinger, K.R.: Teaching the teacher: Tutoring SimStudent leads to more effective cognitive tutor authoring. In: Artificial Intelligence in Education (AIED 2011). Lecture Notes in Computer Science, Springer (2011)

  17. Moon, H., Yang, Y., Yu, H., Lee, S., Jeong, M., Park, J., Shin, J., Kim, M., Choi, S.: Evaluating the knowledge dependency of questions. In: Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing. pp. 10512–10526 (2022)

  18. Nathan, M.J., Koedinger, K.R., Alibali, M.W., et al.: Expert blind spot: When content knowledge eclipses pedagogical content knowledge. In: Proceedings of the Third International Conference on Cognitive Science. pp. 644–648 (2001)

  19. Pavlik Jr., P.I., Cen, H., Koedinger, K.R.: Performance factors analysis: a new alternative to knowledge tracing. Online submission (2009)

  20. Piech, C., Bassen, J., Huang, J., Ganguli, S., Sahami, M., Guibas, L.J., Sohl-Dickstein, J.: Deep knowledge tracing. In: Advances in Neural Information Processing Systems 28 (NeurIPS 2015) (2015)

  21. Rosé, C.P., McLaughlin, E.A., Liu, R., Koedinger, K.R.: Explanatory learner models: Why machine learning (alone) is not the answer. Br. J. Educ. Technol. 50(6), 2943–2958 (2019). https://doi.org/10.1111/BJET.12858

  22. Sumers, T., Yao, S., Narasimhan, K.R., Griffiths, T.L.: Cognitive architectures for language agents. Transactions on Machine Learning Research (2023)