Personalizing Student-Agent Interactions Using Log-Contextualized Retrieval-Augmented Generation (RAG)

Clayton Cohn; Surya Rayala; Caitlin Snyder; Joyce Fonteles; Shruti Jain; Naveeduddin Mohammed; Umesh Timalsina; Sarah K. Burriss; Ashwin T S; Namrata Srivastava; Menton Deweese; Angela Eeds; Gautam Biswas

arxiv: 2505.17238 · v3 · submitted 2025-05-22 · 💻 cs.CL

Pith reviewed 2026-05-22 12:53 UTC · model grok-4.3

classification 💻 cs.CL

keywords: retrieval-augmented generation · RAG · log-contextualized RAG · pedagogical agents · collaborative learning · STEM education · personalized guidance · epistemic decision-making

The pith

Log-contextualized retrieval-augmented generation strengthens student-agent dialogue matching in collaborative STEM settings.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes log-contextualized RAG to address weak semantic links between student dialogue and curated knowledge bases in educational AI systems. Standard retrieval-augmented generation often fails in collaborative talk because direct text matching is imprecise, which can produce ungrounded or irrelevant responses from large language models. By folding in environment activity logs, the method adds context that improves retrieval accuracy. This enables the Copa agent to supply personalized guidance inside the C2STEM computational modeling environment, directly supporting students' critical thinking and epistemic choices.

Core claim

LC-RAG improves retrieval over a discourse-only baseline and enables the collaborative peer agent Copa to deliver relevant, personalized guidance that supports students' critical thinking and epistemic decision-making in the C2STEM environment.

What carries the argument

Log-contextualized RAG (LC-RAG), which augments standard retrieval by injecting environment logs to contextualize student discourse and strengthen matches against the knowledge base.
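
To make the mechanism concrete, here is a minimal sketch of log-contextualized retrieval. It is not the authors' implementation: it assumes a toy bag-of-words embedding in place of a real embedding model, and a hypothetical `contextualize` helper that simply appends recent log actions to the utterance. The knowledge-base passages, log strings, and function names are all invented for illustration, and the paper may combine logs and discourse quite differently.

```python
import math
import re
from collections import Counter

def embed(text):
    """Toy bag-of-words vector; a real system would use a learned embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def contextualize(utterance, log_actions):
    """Hypothetical helper: append recent environment-log actions to the utterance."""
    return utterance + " " + " ".join(log_actions)

def retrieve(query, knowledge_base, k=1):
    """Return the top-k (score, passage) pairs by cosine similarity to the query."""
    q = embed(query)
    scored = [(cosine(q, embed(doc)), doc) for doc in knowledge_base]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]

# Invented knowledge-base passages and log actions (assumptions, not from the paper).
KNOWLEDGE_BASE = [
    "Initialize the velocity variable before the simulation starts.",
    "Use a conditional block to stop the truck at the stop sign.",
    "Update position each step using velocity and delta time.",
]
utterance = "wait, why is it not doing anything?"
log_actions = ["set velocity variable", "edit initialize block"]

baseline = retrieve(utterance, KNOWLEDGE_BASE)
lc_rag = retrieve(contextualize(utterance, log_actions), KNOWLEDGE_BASE)
```

In this toy case the bare utterance shares no vocabulary with any passage, so every baseline score is zero, while the log-augmented query ranks the variable-initialization passage first.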

If this is right

The Copa agent produces guidance that is both relevant to the immediate task and tailored to individual student moves.
Large language model outputs become more grounded in the curated knowledge, reducing unhelpful or invented responses.
Students gain concrete support for evaluating evidence and making decisions during collaborative modeling.
Retrieval quality rises specifically in cases where dialogue alone provides an ambiguous match to stored content.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same log-augmented approach could transfer to other logged learning platforms such as virtual labs or discussion forums where activity traces are already captured.
By relying more on existing logs, systems might need less additional human-authored material to maintain retrieval quality.
Testing LC-RAG in non-STEM domains could reveal whether the benefit holds when dialogue is less structured around explicit models.

Load-bearing premise

Environment logs contain information that reliably strengthens the link between student dialogue and the knowledge base when ordinary text matching is weak.

What would settle it

A side-by-side test in the C2STEM logs showing that retrieval precision or relevance scores with log context added are no higher than those from student discourse alone.
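
One way to run such a side-by-side check is a paired comparison over the same dialogue segments. The sketch below uses an exact one-sided sign test, chosen here for simplicity and not taken from the paper. The function name and the scores are placeholders, not reported results.

```python
from math import comb

def sign_test_p_value(baseline_scores, lc_rag_scores):
    """Exact one-sided sign test: does LC-RAG beat the baseline more often than chance?

    Tied pairs are dropped, as is standard for the sign test.
    """
    pairs = list(zip(baseline_scores, lc_rag_scores))
    wins = sum(lc > base for base, lc in pairs)
    losses = sum(lc < base for base, lc in pairs)
    n = wins + losses
    if n == 0:
        return 1.0
    # P(X >= wins) for X ~ Binomial(n, 0.5)
    return sum(comb(n, i) for i in range(wins, n + 1)) / 2 ** n

# Placeholder relevance scores for the same six dialogue segments (illustrative only).
baseline_scores = [0.2, 0.5, 0.4, 0.3, 0.6, 0.1]
lc_rag_scores = [0.4, 0.5, 0.7, 0.5, 0.5, 0.3]

p_value = sign_test_p_value(baseline_scores, lc_rag_scores)
```

A p-value that stays large across the real segments would be consistent with log context adding nothing over discourse alone.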

Figures

Figures reproduced from arXiv: 2505.17238 by Angela Eeds, Ashwin T S, Caitlin Snyder, Clayton Cohn, Gautam Biswas, Joyce Fonteles, Menton Deweese, Namrata Srivastava, Naveeduddin Mohammed, Sarah K. Burriss, Shruti Jain, Surya Rayala, Umesh Timalsina.

**Figure 2.** C2STEM Truck Task example solution with task context categories.

**Figure 3.** LC-RAG win rates by task context category averaged over all five embedding models. Overall compares performance across all task contexts. Across all four task context categories (and overall), LC-RAG achieved a higher win rate than the baseline when considering results from all five embedding models. LC-RAG’s outperformance was more pronounced in Initializing Variables (45% for LC-RAG, 33% for the base…
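
The two win rates quoted in the caption do not sum to 100%, which suggests that some comparisons end in ties. The sketch below shows one plausible way such win rates could be computed and averaged over embedding models. The judgment labels, model names, and the tie handling are assumptions for illustration, not the paper's procedure or data.

```python
from statistics import mean

SIDES = ("lc_rag", "baseline")

def win_rates(judgments):
    """Fraction of comparisons won by each side; ties count for neither."""
    total = len(judgments)
    return {side: judgments.count(side) / total for side in SIDES}

def average_over_models(judgments_by_model):
    """Average each side's win rate across embedding models."""
    per_model = [win_rates(j) for j in judgments_by_model.values()]
    return {side: mean(rates[side] for rates in per_model) for side in SIDES}

# Hypothetical per-segment judgments for two placeholder embedding models.
judgments_by_model = {
    "model_a": ["lc_rag", "lc_rag", "tie", "baseline"],
    "model_b": ["lc_rag", "tie", "tie", "baseline"],
}
averaged = average_over_models(judgments_by_model)
```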

Original abstract

Collaborative dialogue offers rich insights into students' learning and critical thinking, which is essential for personalizing pedagogical agent interactions in STEM+C settings. While large language models (LLMs) facilitate dynamic pedagogical interactions, hallucinations undermine confidence, trust, and instructional value. Retrieval-augmented generation (RAG) grounds LLM outputs in curated knowledge, but requires a clear semantic link between user input and a knowledge base, which is often weak in student dialogue. We propose log-contextualized RAG (LC-RAG), which enhances RAG retrieval by using environment logs to contextualize collaborative discourse. Our findings show that LC-RAG improves retrieval over a discourse-only baseline and enables our collaborative peer agent, Copa, to deliver relevant, personalized guidance that supports students' critical thinking and epistemic decision-making in the collaborative computational modeling environment C2STEM.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

LC-RAG adds logs to RAG for student-agent grounding in C2STEM but the abstract leaves the claimed improvement unmeasured and unablated.

The core point is that this paper takes a standard RAG setup and layers in environment logs from the C2STEM modeling tool to make retrieval more relevant when student dialogue is indirect or vague. That produces a usable variant they call LC-RAG, which they then plug into their Copa peer agent to give more personalized guidance on critical thinking and epistemic choices.

The idea is straightforward and matches a real pain point in educational dialogue systems where pure text similarity often fails to hit the right knowledge chunks. They get credit for identifying that gap and for testing the approach inside an actual collaborative STEM environment rather than a toy setting. The motivation section is clear about why hallucinations matter for trust in tutoring agents, and the description of how logs contextualize discourse is concrete enough to replicate at a high level. What is actually new is the specific framing and application to collaborative computational modeling; prior RAG work has used auxiliary signals, but the log-contextualization angle for student epistemic talk does not appear in the cited baselines.

On the soft spots, the abstract asserts that LC-RAG beats a discourse-only baseline and supports better student outcomes, yet it supplies no retrieval scores, no precision or relevance numbers, no error bars, and no breakdown on the subset of turns where semantic matching is weak. The stress-test note is on target here: without an ablation or metric that isolates whether the logs are fixing the exact failure cases the authors describe, the improvement could just be noise or the effect of adding any extra text. If the full paper contains those quantitative details and statistical checks, the claim strengthens; if not, the central result stays provisional.

This paper is aimed at people building or evaluating LLM-based agents for STEM education, especially those already using RAG and looking for cheap context sources. A reader working on adaptive tutoring systems would pick up a workable idea and some implementation pointers, even while wanting tighter evaluation. It deserves peer review because the practical framing is sound and the setting is authentic; referees can push for the missing metrics and ablations without needing to rewrite the core contribution.

Referee Report

2 major / 2 minor

Summary. The paper proposes log-contextualized retrieval-augmented generation (LC-RAG) to improve personalization of student-agent interactions in STEM+C settings. By using environment logs to contextualize collaborative student dialogue, LC-RAG aims to strengthen weak semantic links to a curated knowledge base that standard RAG struggles with. The central claim is that this yields better retrieval than a discourse-only baseline and enables the collaborative peer agent Copa to deliver relevant, personalized guidance supporting critical thinking and epistemic decision-making in the C2STEM environment.

Significance. If the empirical improvement is demonstrated, the work could meaningfully advance RAG applications in educational AI by leveraging log data for personalization where pure semantic matching fails. It targets a practical gap in collaborative learning environments and offers a concrete system (Copa) for testing such ideas.

major comments (2)

[Abstract] The manuscript asserts that LC-RAG improves retrieval over a discourse-only baseline and supports critical thinking, yet reports no quantitative retrieval metrics (e.g., precision@K, relevance scores), error bars, dataset size, or statistical tests, rendering the central claim unevaluable.
[Evaluation] No ablation or subset analysis is presented showing that log features specifically improve retrieval on the dialogues where standard semantic similarity is low, which is the load-bearing assumption required for LC-RAG to outperform the baseline as claimed.
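
For readers unfamiliar with the metrics requested in the first comment, the sketch below gives standard definitions of precision@K, plus mean reciprocal rank as a common companion metric. The passage identifiers and relevance labels are invented, and nothing here reflects how the paper itself scores retrieval.

```python
def precision_at_k(retrieved, relevant, k):
    """Share of the top-k retrieved items that are relevant."""
    top_k = retrieved[:k]
    return sum(item in relevant for item in top_k) / k

def reciprocal_rank(retrieved, relevant):
    """1 / rank of the first relevant item, or 0.0 if none is retrieved."""
    for rank, item in enumerate(retrieved, start=1):
        if item in relevant:
            return 1.0 / rank
    return 0.0

def mean_reciprocal_rank(runs):
    """Average reciprocal rank over (retrieved, relevant) pairs."""
    return sum(reciprocal_rank(r, rel) for r, rel in runs) / len(runs)

# Two hypothetical queries: ranked passage ids and the ids judged relevant.
runs = [
    (["kb3", "kb1", "kb7"], {"kb1"}),
    (["kb2", "kb5", "kb4"], {"kb2", "kb4"}),
]
p_at_2 = [precision_at_k(r, rel, 2) for r, rel in runs]
mrr = mean_reciprocal_rank(runs)
```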

minor comments (2)

[Method] The description of the C2STEM environment and the exact log features used for contextualization would benefit from a short table or explicit list for reproducibility.
[Introduction] Consider adding a reference to prior work on log-augmented retrieval or educational RAG systems to better situate the contribution.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive review and for identifying key areas where our claims can be more rigorously supported. We address each major comment below and commit to revisions that strengthen the empirical grounding of LC-RAG without altering the core contributions.

Referee: [Abstract] The manuscript asserts that LC-RAG improves retrieval over a discourse-only baseline and supports critical thinking, yet reports no quantitative retrieval metrics (e.g., precision@K, relevance scores), error bars, dataset size, or statistical tests, rendering the central claim unevaluable.

Authors: We acknowledge that the abstract currently summarizes findings at a high level without embedding specific metrics. The evaluation section does contain comparative retrieval results, but we agree these should be quantified more explicitly to allow direct assessment of the central claim. In the revised manuscript we will expand both the abstract and evaluation to report precision@K, mean reciprocal rank, and human relevance scores for LC-RAG versus the discourse-only baseline, along with the number of student dialogues evaluated, standard error bars across retrieval runs, and statistical significance tests (e.g., Wilcoxon signed-rank).

Revision: yes

Referee: [Evaluation] No ablation or subset analysis is presented showing that log features specifically improve retrieval on the dialogues where standard semantic similarity is low, which is the load-bearing assumption required for LC-RAG to outperform the baseline as claimed.

Authors: We agree this subset analysis is essential to substantiate the motivating hypothesis that log contextualization helps precisely where semantic similarity is weak. The present evaluation reports aggregate improvements but does not stratify by similarity threshold. We will add a dedicated ablation that partitions the dialogue corpus into low-, medium-, and high-semantic-similarity bins (using cosine similarity to the knowledge base) and demonstrates that the largest retrieval gains from LC-RAG occur in the low-similarity bin. This will be presented with both quantitative metrics and qualitative examples.

Revision: yes
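
The stratified ablation described in the second response could be organized roughly as follows. This is only a sketch under stated assumptions: the bin thresholds (0.3 and 0.6), the record layout, and the numbers are all hypothetical, and the simulated rebuttal does not specify any of them.

```python
from collections import defaultdict

def similarity_bin(similarity, low=0.3, high=0.6):
    """Assign a dialogue segment to a bin; the thresholds are illustrative assumptions."""
    if similarity < low:
        return "low"
    if similarity < high:
        return "medium"
    return "high"

def mean_gain_by_bin(records):
    """Average (LC-RAG score - baseline score) within each similarity bin.

    records: (baseline_similarity, baseline_score, lc_rag_score) per segment.
    """
    gains = defaultdict(list)
    for similarity, baseline_score, lc_rag_score in records:
        gains[similarity_bin(similarity)].append(lc_rag_score - baseline_score)
    return {name: sum(values) / len(values) for name, values in gains.items()}

# Placeholder records, not data from the paper.
records = [
    (0.10, 0.0, 1.0),
    (0.20, 0.0, 0.5),
    (0.45, 0.5, 0.5),
    (0.80, 1.0, 1.0),
]
gain = mean_gain_by_bin(records)
```

If the motivating hypothesis holds, the mean gain should be largest in the low-similarity bin and shrink toward zero as baseline similarity rises.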

Circularity Check

0 steps flagged

No circularity in LC-RAG proposal or evaluation

The paper's chain consists of defining LC-RAG as an augmentation of standard RAG that incorporates environment logs to strengthen semantic links in student dialogue, then reporting empirical improvement over a discourse-only baseline in the C2STEM setting. This improvement is presented as an observed experimental outcome rather than a quantity derived by construction from fitted parameters, self-definitions, or prior self-citations. No equations, uniqueness theorems, or ansatzes are invoked that would reduce the claimed retrieval gains or downstream personalization benefits to the method's own inputs. The evaluation rests on external baseline comparison and is therefore self-contained.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Based solely on the abstract; no explicit free parameters, axioms, or invented entities are stated. The approach implicitly assumes curated knowledge bases and log availability but does not introduce new entities or fitted constants in the provided text.

pith-pipeline@v0.9.0 · 5722 in / 1097 out tokens · 35731 ms · 2026-05-22T12:53:31.889902+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

We propose log-contextualized RAG (LC-RAG), which enhances RAG retrieval by using environment logs to contextualize collaborative discourse.
IndisputableMonolith/Foundation/RealityFromDistinction.lean reality_from_one_distinction unclear

LC-RAG improves retrieval over a discourse-only baseline

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

