
arxiv: 2505.17238 · v3 · submitted 2025-05-22 · 💻 cs.CL

Personalizing Student-Agent Interactions Using Log-Contextualized Retrieval-Augmented Generation (RAG)

Pith reviewed 2026-05-22 12:53 UTC · model grok-4.3

classification 💻 cs.CL
keywords retrieval-augmented generation · RAG · log-contextualized RAG · pedagogical agents · collaborative learning · STEM education · personalized guidance · epistemic decision-making

The pith

Log-contextualized retrieval-augmented generation strengthens the match between student dialogue and a curated knowledge base in collaborative STEM settings.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes log-contextualized RAG to address weak semantic links between student dialogue and curated knowledge bases in educational AI systems. Standard retrieval-augmented generation often fails in collaborative talk because direct text matching is imprecise, which can produce ungrounded or irrelevant responses from large language models. By folding in environment activity logs, the method adds context that improves retrieval accuracy. This enables the Copa agent to supply personalized guidance inside the C2STEM computational modeling environment, directly supporting students' critical thinking and epistemic choices.

Core claim

LC-RAG improves retrieval over a discourse-only baseline and enables the collaborative peer agent Copa to deliver relevant, personalized guidance that supports students' critical thinking and epistemic decision-making in the C2STEM environment.

What carries the argument

Log-contextualized RAG (LC-RAG), which augments standard retrieval by injecting environment logs to contextualize student discourse and strengthen matches against the knowledge base.
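
As a rough illustration of the mechanism, the sketch below compares a discourse-only query with a log-contextualized one. Everything in it is an assumption made for illustration: the bag-of-words `embed` function stands in for a real embedding model, `contextualize` simply prepends log text to the utterance (the paper's actual way of using logs may differ), and the knowledge base entries, utterance, and log events are invented.

```python
import math
import re
from collections import Counter

def embed(text):
    """Toy bag-of-words vector (assumption: placeholder for a real embedding model)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(query, knowledge_base, k=1):
    """Return the top-k (score, document) pairs for a query."""
    q = embed(query)
    scored = [(cosine(q, embed(doc)), doc) for doc in knowledge_base]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]

def contextualize(utterance, log_events):
    """Hypothetical contextualization: prepend log text to the utterance."""
    return " ".join(log_events) + " " + utterance

# Invented knowledge base entries, utterance, and log events.
knowledge_base = [
    "Initialize the velocity variable before the simulation loop starts.",
    "Use a conditional block to stop the truck at the stop sign.",
]
utterance = "wait, why is it not working?"
log_events = ["student edited block: set velocity", "student ran simulation"]

baseline = retrieve(utterance, knowledge_base)
lc_rag = retrieve(contextualize(utterance, log_events), knowledge_base)
```

Here the bare utterance shares no vocabulary with either entry, so the baseline score is zero, while the log-contextualized query ranks the variable-initialization entry first.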

If this is right

  • The Copa agent produces guidance that is both relevant to the immediate task and tailored to individual student moves.
  • Large language model outputs become more grounded in the curated knowledge, reducing unhelpful or invented responses.
  • Students gain concrete support for evaluating evidence and making decisions during collaborative modeling.
  • Retrieval quality rises specifically in cases where dialogue alone provides an ambiguous match to stored content.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same log-augmented approach could transfer to other logged learning platforms such as virtual labs or discussion forums where activity traces are already captured.
  • By relying more on existing logs, systems might need less additional human-authored material to maintain retrieval quality.
  • Testing LC-RAG in non-STEM domains could reveal whether the benefit holds when dialogue is less structured around explicit models.

Load-bearing premise

Environment logs contain information that reliably strengthens the link between student dialogue and the knowledge base when ordinary text matching is weak.

What would settle it

A side-by-side test in the C2STEM logs showing that retrieval precision or relevance scores with log context added are no higher than those from student discourse alone.
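
Such a side-by-side test could be scored along the following lines. This is a minimal sketch under stated assumptions: the document identifiers, relevance labels, and rankings are invented, and precision@k and mean reciprocal rank are chosen here as common retrieval metrics, not as the measures the paper itself reports.

```python
def precision_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the top-k retrieved items that are relevant."""
    return sum(1 for d in ranked_ids[:k] if d in relevant_ids) / k

def reciprocal_rank(ranked_ids, relevant_ids):
    """1 / rank of the first relevant item, or 0.0 if none is retrieved."""
    for rank, d in enumerate(ranked_ids, start=1):
        if d in relevant_ids:
            return 1.0 / rank
    return 0.0

def mean(values):
    return sum(values) / len(values)

# Invented example: per-query relevance labels and the rankings each system produced.
queries = [
    {"relevant": {"kb3"},
     "baseline": ["kb1", "kb3", "kb2"],
     "lc_rag": ["kb3", "kb1", "kb2"]},
    {"relevant": {"kb2", "kb5"},
     "baseline": ["kb4", "kb1", "kb2"],
     "lc_rag": ["kb2", "kb4", "kb5"]},
]

def summarize(system, k=2):
    """Return (mean precision@k, mean reciprocal rank) for one system."""
    p_at_k = mean([precision_at_k(q[system], q["relevant"], k) for q in queries])
    mrr = mean([reciprocal_rank(q[system], q["relevant"]) for q in queries])
    return p_at_k, mrr

baseline_scores = summarize("baseline")
lc_rag_scores = summarize("lc_rag")
```

If the log-contextualized scores were no higher than the baseline scores on real C2STEM data, the load-bearing premise would be in doubt.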

Figures

Figures reproduced from arXiv: 2505.17238 by Angela Eeds, Ashwin T S, Caitlin Snyder, Clayton Cohn, Gautam Biswas, Joyce Fonteles, Menton Deweese, Namrata Srivastava, Naveeduddin Mohammed, Sarah K. Burriss, Shruti Jain, Surya Rayala, Umesh Timalsina.

Figure 1. Traditional RAG (top) vs. LC-RAG (bottom).
Figure 2. C2STEM Truck Task example solution with task context categories.
Figure 3. LC-RAG win rates by task context category averaged over all five embedding models. Overall compares performance across all task contexts. Across all four task context categories (and overall), LC-RAG achieved a higher win rate than the baseline when considering results from all five embedding models. LC-RAG's outperformance was more pronounced in Initializing Variables (45% for LC-RAG, 33% for the base…

Abstract

Collaborative dialogue offers rich insights into students' learning and critical thinking, which is essential for personalizing pedagogical agent interactions in STEM+C settings. While large language models (LLMs) facilitate dynamic pedagogical interactions, hallucinations undermine confidence, trust, and instructional value. Retrieval-augmented generation (RAG) grounds LLM outputs in curated knowledge, but requires a clear semantic link between user input and a knowledge base, which is often weak in student dialogue. We propose log-contextualized RAG (LC-RAG), which enhances RAG retrieval by using environment logs to contextualize collaborative discourse. Our findings show that LC-RAG improves retrieval over a discourse-only baseline and enables our collaborative peer agent, Copa, to deliver relevant, personalized guidance that supports students' critical thinking and epistemic decision-making in the collaborative computational modeling environment C2STEM.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper proposes log-contextualized retrieval-augmented generation (LC-RAG) to improve personalization of student-agent interactions in STEM+C settings. By using environment logs to contextualize collaborative student dialogue, LC-RAG aims to strengthen weak semantic links to a curated knowledge base that standard RAG struggles with. The central claim is that this yields better retrieval than a discourse-only baseline and enables the collaborative peer agent Copa to deliver relevant, personalized guidance supporting critical thinking and epistemic decision-making in the C2STEM environment.

Significance. If the empirical improvement is demonstrated, the work could meaningfully advance RAG applications in educational AI by leveraging log data for personalization where pure semantic matching fails. It targets a practical gap in collaborative learning environments and offers a concrete system (Copa) for testing such ideas.

major comments (2)
  1. [Abstract] The manuscript asserts that LC-RAG improves retrieval over a discourse-only baseline and supports critical thinking, yet reports no quantitative retrieval metrics (e.g., precision@K, relevance scores), error bars, dataset size, or statistical tests, rendering the central claim unevaluable.
  2. [Evaluation] No ablation or subset analysis is presented showing that log features specifically improve retrieval on the dialogues where standard semantic similarity is low, which is the load-bearing assumption required for LC-RAG to outperform the baseline as claimed.
minor comments (2)
  1. [Method] The description of the C2STEM environment and the exact log features used for contextualization would benefit from a short table or explicit list for reproducibility.
  2. [Introduction] Consider adding a reference to prior work on log-augmented retrieval or educational RAG systems to better situate the contribution.

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive review and for identifying key areas where our claims can be more rigorously supported. We address each major comment below and commit to revisions that strengthen the empirical grounding of LC-RAG without altering the core contributions.

  1. Referee: [Abstract] The manuscript asserts that LC-RAG improves retrieval over a discourse-only baseline and supports critical thinking, yet reports no quantitative retrieval metrics (e.g., precision@K, relevance scores), error bars, dataset size, or statistical tests, rendering the central claim unevaluable.

    Authors: We acknowledge that the abstract currently summarizes findings at a high level without embedding specific metrics. The evaluation section does contain comparative retrieval results, but we agree these should be quantified more explicitly to allow direct assessment of the central claim. In the revised manuscript we will expand both the abstract and evaluation to report precision@K, mean reciprocal rank, and human relevance scores for LC-RAG versus the discourse-only baseline, along with the number of student dialogues evaluated, standard error bars across retrieval runs, and statistical significance tests (e.g., Wilcoxon signed-rank). revision: yes

  2. Referee: [Evaluation] No ablation or subset analysis is presented showing that log features specifically improve retrieval on the dialogues where standard semantic similarity is low, which is the load-bearing assumption required for LC-RAG to outperform the baseline as claimed.

    Authors: We agree this subset analysis is essential to substantiate the motivating hypothesis that log contextualization helps precisely where semantic similarity is weak. The present evaluation reports aggregate improvements but does not stratify by similarity threshold. We will add a dedicated ablation that partitions the dialogue corpus into low-, medium-, and high-semantic-similarity bins (using cosine similarity to the knowledge base) and demonstrates that the largest retrieval gains from LC-RAG occur in the low-similarity bin. This will be presented with both quantitative metrics and qualitative examples. revision: yes
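
The stratified analysis described in the second response could be sketched as follows. This is an illustration under assumptions, not the authors' procedure: the bin thresholds (0.3 and 0.6), the per-dialogue records, and the choice of a per-dialogue retrieval score as the metric are all hypothetical.

```python
from collections import defaultdict

def similarity_bin(score, low=0.3, high=0.6):
    """Assign a cosine similarity to a bin (thresholds are hypothetical)."""
    if score < low:
        return "low"
    if score < high:
        return "medium"
    return "high"

# Invented records, one per dialogue:
# (cosine similarity of discourse to knowledge base, baseline metric, LC-RAG metric)
records = [
    (0.10, 0.0, 1.0),
    (0.25, 0.5, 1.0),
    (0.45, 0.5, 0.5),
    (0.50, 1.0, 1.0),
    (0.80, 1.0, 1.0),
]

def mean_gain_by_bin(rows):
    """Average (LC-RAG minus baseline) metric difference within each similarity bin."""
    gains = defaultdict(list)
    for sim, base, lc in rows:
        gains[similarity_bin(sim)].append(lc - base)
    return {name: sum(vals) / len(vals) for name, vals in gains.items()}

gain = mean_gain_by_bin(records)
```

The motivating hypothesis would predict the largest mean gain in the low-similarity bin, as in this invented data; real results would also need the significance tests mentioned in the first response.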

Circularity Check

0 steps flagged

No circularity in LC-RAG proposal or evaluation


The paper's chain consists of defining LC-RAG as an augmentation of standard RAG that incorporates environment logs to strengthen semantic links in student dialogue, then reporting empirical improvement over a discourse-only baseline in the C2STEM setting. This improvement is presented as an observed experimental outcome rather than a quantity derived by construction from fitted parameters, self-definitions, or prior self-citations. No equations, uniqueness theorems, or ansatzes are invoked that would reduce the claimed retrieval gains or downstream personalization benefits to the method's own inputs. The evaluation rests on external baseline comparison and is therefore self-contained.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Based solely on the abstract; no explicit free parameters, axioms, or invented entities are stated. The approach implicitly assumes curated knowledge bases and log availability but does not introduce new entities or fitted constants in the provided text.

pith-pipeline@v0.9.0 · 5722 in / 1097 out tokens · 35731 ms · 2026-05-22T12:53:31.889902+00:00 · methodology


