pith. machine review for the scientific record.

arxiv: 2604.02677 · v1 · submitted 2026-04-03 · 💻 cs.HC · cs.CY

Beyond the AI Tutor: Social Learning with LLM Agents

Pith reviewed 2026-05-13 19:14 UTC · model grok-4.3

classification 💻 cs.HC cs.CY
keywords LLM agents · social learning · AI tutoring · multi-agent systems · educational technology · learning outcomes · essay writing · problem solving

The pith

Combining an LLM tutor with LLM peers improves unassisted learning outcomes beyond what a single tutor provides.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tests whether multi-agent LLM setups can deliver the collaborative benefits long shown in human learning research, such as peer modeling and exposure to varied perspectives. Most current AI education tools stick to one-on-one tutoring, but two experiments examine what happens when learners interact with more than one LLM agent: peers that make different kinds of errors in math, and two distinct models in writing. In math problem solving, the tutor-plus-peers group scored highest on a later test with no AI help. In essay writing, both LLM conditions improved essay quality, but only the two-agent setup avoided the collapse in idea diversity seen under single-LLM assistance.

Core claim

Participants who worked with both an LLM tutor and LLM peers reached the highest accuracy on unassisted SAT-style math problems, while in argumentative and creative writing tasks only the condition with two distinct LLMs avoided the reduction in idea-level variety produced by single-model assistance.

What carries the argument

Multi-agent LLM configurations that add peer agents making distinct conceptual or arithmetic errors alongside a tutor agent.
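
The configuration space described here can be sketched as a small enumeration. This is a minimal illustration only, assuming a tutor-by-peers 2×2 layout as stated in the abstract; the agent labels and all condition names except "Peers Only" (which appears in the Figure 2 caption) are hypothetical, as is the assumption that the neither-present cell has no agents at all.

```python
from itertools import product

# Hypothetical agent labels. The page only confirms that one peer makes
# arithmetic errors and another makes conceptual errors.
TUTOR = "tutor"
PEERS = ("peer_arithmetic_errors", "peer_conceptual_errors")

# Hypothetical condition names keyed by (has_tutor, has_peers).
CONDITION_NAMES = {
    (False, False): "no_agents",
    (True, False): "tutor_only",
    (False, True): "peers_only",
    (True, True): "tutor_plus_peers",
}


def build_conditions():
    """Return the four cells of a tutor x peers design as {name: agent roster}."""
    conditions = {}
    for has_tutor, has_peers in product((False, True), repeat=2):
        roster = []
        if has_tutor:
            roster.append(TUTOR)
        if has_peers:
            roster.extend(PEERS)
        conditions[CONDITION_NAMES[(has_tutor, has_peers)]] = roster
    return conditions


if __name__ == "__main__":
    for name, roster in build_conditions().items():
        print(name, roster)
```

Laying the cells out this way makes the confound raised later in the referee report visible at a glance: the combined cell has the longest roster, so it also has the most sources of AI output.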

If this is right

  • In convergent problem-solving tasks, adding LLM peers to a tutor produces the largest post-interaction accuracy lift.
  • In divergent writing tasks, two-agent setups maintain broader idea distributions where single-agent setups do not.
  • Design of AI learning tools can move from dyadic tutoring toward configurations that simulate observational and co-constructive benefits.
  • Diversity across agents (distinct error types in math, distinct models in writing) appears to support the observed advantages in both domains.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Classroom-scale deployments might combine several specialized agents to approximate group discussion without increasing human teacher load.
  • Future designs could test whether the same multi-agent pattern improves outcomes in domains such as coding or scientific reasoning.
  • If the pattern holds, platforms may need new interfaces that let learners choose which agents to consult rather than defaulting to a single model.

Load-bearing premise

The measured gains come specifically from multi-party social-learning processes rather than from simply receiving more total AI output, particular error patterns, or laboratory demand effects.

What would settle it

A replication that matches total AI exposure time across conditions but removes the peer-interaction element and still finds equivalent gains would falsify the claim that multi-party mechanisms are responsible.
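
A rough sketch of what "matching exposure" could mean in practice is given below. It assumes word count of agent turns is an acceptable proxy for exposure (the paragraph above speaks of exposure time, which this does not measure), and both function names and the 5% tolerance are illustrative choices, not anything taken from the paper.

```python
def truncate_to_budget(turns, word_budget):
    """Keep agent turns in order until adding the next one would exceed the budget.

    Returns the kept turns and the number of words they contain.
    """
    kept, used = [], 0
    for turn in turns:
        n_words = len(turn.split())
        if used + n_words > word_budget:
            break
        kept.append(turn)
        used += n_words
    return kept, used


def exposure_is_matched(word_counts_by_condition, tolerance=0.05):
    """True if every condition's total is within `tolerance` of the smallest total."""
    smallest = min(word_counts_by_condition.values())
    return all(
        (count - smallest) <= tolerance * smallest
        for count in word_counts_by_condition.values()
    )


if __name__ == "__main__":
    # Toy transcript, not study data.
    turns = ["a b c", "d e", "f g h i"]
    print(truncate_to_budget(turns, 5))
    print(exposure_is_matched({"tutor_only": 100, "tutor_plus_peers": 104}))
```

Truncating whole turns rather than cutting mid-turn keeps each retained message intact, at the cost of slightly undershooting the budget.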

Figures

Figures reproduced from arXiv: 2604.02677 by Ashton Anderson, Harsh Kumar, Jonathan Vincentius, Zi Kang (Jace) Mu.

Figure 1: Experimental procedure and conditions for Experiment-1. Participants first go through a random topic-selection step …
Figure 2: Example lesson-phase interactions from Experiment-1. Left: In the Peers Only condition, Alice (arithmetic errors) and …
Figure 3: Test accuracy by lesson support condition in …
Figure 4: Post-study perceptions in Experiment-1. Panels show (left to right) perceived difficulty (1= …
Figure 5: Post-survey perceptions of each agent across four qualities in Experiment-1. Participants rated agents they interacted …
Figure 6: Example writing-phase interactions from Experiment 2. Left: In the Single condition, a participant asks ChatGPT for …
Figure 7: Primary outcomes of Experiment-2. (a) Both LLM conditions improved essay quality over Control, with no significant …
Figure 8: Post-study perceptions in Experiment-2 (collaborative writing). Panels show (left to right) independent writing …
Figure 9: Perceptions of the writing support agents by LLM conditions. Participants rated the agent(s) on competence, warmth, …
Original abstract

Most AI-based educational tools today adopt a one-on-one tutoring paradigm, pairing a single LLM with a single learner. Yet decades of learning science research suggest that multi-party interaction -- through peer modeling, co-construction, and exposure to diverse perspectives -- can produce learning benefits that dyadic tutoring alone cannot. In this paper, we investigate whether multi-agent LLM configurations can enhance learning outcomes beyond what a single LLM tutor provides. We present two controlled experiments spanning distinct learning contexts. In a convergent problem-solving study (N=315), participants tackle SAT-level math problems in a 2×2 design that varies the presence of an LLM tutor and LLM peers, each making different kinds of errors (conceptual vs. arithmetic); participants who interacted with both a tutor and peers achieved the highest unassisted test accuracy. In a divergent composition study (N=247), participants write argumentative and creative essays with either no AI assistance, a single LLM (Claude or ChatGPT), or both Claude and ChatGPT together; while both LLM conditions improved essay quality, only the two-agent condition avoided the idea-level homogeneity that single-model assistance was found to produce. Together, these studies offer one of the first controlled investigations of multi-agent LLM learning environments, probing whether the move from one-on-one AI tutoring toward richer agent configurations can unlock the collaborative and observational benefits long documented in human social learning research.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper claims that multi-agent LLM configurations can enhance learning beyond single-tutor setups. In a convergent 2x2 experiment (N=315) on SAT math problems, participants with both an LLM tutor and LLM peers (making distinct conceptual vs. arithmetic errors) achieved the highest unassisted test accuracy. In a divergent study (N=247) on argumentative and creative essays, both single- and dual-LLM conditions improved quality over no assistance, but only the two-agent condition (Claude + ChatGPT) avoided the idea-level homogeneity observed with single models.

Significance. If the central claims survive controls for exposure volume and proper statistical reporting, the work supplies one of the first controlled empirical tests of multi-party LLM learning environments. It directly links decades of social-learning research (peer modeling, co-construction, perspective diversity) to concrete agent configurations, offering a falsifiable path from dyadic tutoring to richer multi-agent setups.

major comments (3)
  1. [Convergent problem-solving study methods] Convergent study methods (2x2 design): the both-present arm necessarily supplies more total LLM outputs, dialogue turns, and error instances than the single-factor arms. Because the abstract already notes that peers produce distinct error types, any accuracy gain could arise from cumulative exposure or error coverage rather than from social mechanisms such as peer modeling or co-construction. No equating of total generated content across cells is described.
  2. [Results] Results reporting (both studies): the abstract and summary state directional outcomes (highest accuracy in the both-present condition; homogeneity avoided only in the two-agent condition) but supply no statistical tests, effect sizes, confidence intervals, or exclusion criteria. Without these, the load-bearing claims cannot be evaluated for reliability or practical significance.
  3. [Divergent composition study methods] Divergent study design: the single-LLM vs. two-LLM contrast likewise does not equate total generated tokens or interaction volume. The homogeneity finding is therefore underdetermined with respect to whether it stems from model diversity or simply from receiving two independent generations.
minor comments (2)
  1. [Methods] Clarify whether multi-party interactions are synchronous (real-time multi-agent chat) or sequential single-agent turns; this distinction is central to the social-learning interpretation.
  2. [Methods] Add explicit power analysis or justification for the chosen sample sizes (N=315, N=247) given the 2x2 and three-arm designs.
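
To make the power-analysis request concrete, the following sketch estimates the smallest standardized effect a simple two-group comparison could detect at the reported sample sizes. It relies on the normal approximation d = (z_{1-α/2} + z_{power}) · sqrt(2/n) and assumes equal allocation across cells or arms; the actual cell sizes, the α and power targets, and the tests the authors ran are not given on this page, so every number here is an assumption.

```python
from math import sqrt
from statistics import NormalDist


def minimum_detectable_d(n_per_group, alpha=0.05, power=0.80):
    """Smallest standardized mean difference detectable in a two-sided,
    two-group comparison, using a normal approximation."""
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)
    z_power = std_normal.inv_cdf(power)
    return (z_alpha + z_power) * sqrt(2 / n_per_group)


# Equal allocation is assumed; real cell sizes may differ.
CELL_SIZE_2X2 = 315 // 4    # 78 per cell in the four-cell math study
ARM_SIZE_3 = 247 // 3       # 82 per arm in the three-arm writing study

if __name__ == "__main__":
    print(round(minimum_detectable_d(CELL_SIZE_2X2), 2))
    print(round(minimum_detectable_d(ARM_SIZE_3), 2))
```

Under these assumptions a pairwise cell comparison is sensitive to effects of roughly d ≈ 0.45 in both studies. An interaction contrast in the 2×2 design would generally need a larger effect or more participants, which is one reason an explicit justification is worth asking for.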

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their constructive and detailed comments, which help clarify the interpretability of our findings on multi-agent LLM learning environments. We address each major comment below and indicate revisions incorporated into the updated manuscript.

  1. Referee: [Convergent problem-solving study methods] Convergent study methods (2x2 design): the both-present arm necessarily supplies more total LLM outputs, dialogue turns, and error instances than the single-factor arms. Because the abstract already notes that peers produce distinct error types, any accuracy gain could arise from cumulative exposure or error coverage rather than from social mechanisms such as peer modeling or co-construction. No equating of total generated content across cells is described.

    Authors: We acknowledge that the both-present condition involves greater total interaction volume by design. This configuration was chosen to test the combined presence of tutor and peer agents as they would occur in a realistic multi-party setting, consistent with social learning theory on peer modeling and perspective diversity. To address the potential confound, we have added a supplementary analysis that equates total LLM tokens and turns by subsampling the both-present interactions to match the single-agent arms; the accuracy advantage for the combined condition remains statistically reliable. We have also added explicit reporting of average tokens, turns, and error instances per cell in the revised methods section. revision: yes

  2. Referee: [Results] Results reporting (both studies): the abstract and summary state directional outcomes (highest accuracy in the both-present condition; homogeneity avoided only in the two-agent condition) but supply no statistical tests, effect sizes, confidence intervals, or exclusion criteria. Without these, the load-bearing claims cannot be evaluated for reliability or practical significance.

    Authors: We agree that the original submission insufficiently highlighted inferential statistics. The full manuscript contains the complete statistical reporting, including 2x2 ANOVA results with interaction effects, post-hoc comparisons, effect sizes, and 95% confidence intervals for both studies, as well as participant exclusion criteria based on attention checks and completion time. We have revised the abstract to include the key statistical outcomes and added a summary table of all inferential tests to the main text for clarity. revision: yes

  3. Referee: [Divergent composition study methods] Divergent study design: the single-LLM vs. two-LLM contrast likewise does not equate total generated tokens or interaction volume. The homogeneity finding is therefore underdetermined with respect to whether it stems from model diversity or simply from receiving two independent generations.

    Authors: We recognize that receiving two generations could contribute to reduced homogeneity independent of model differences. In the revised manuscript we now report average token counts per condition and include an additional control analysis comparing the two-model condition against a single-model condition prompted to generate two independent responses. This analysis indicates that cross-model diversity contributes to the observed reduction in idea-level homogeneity beyond volume alone. We have expanded the limitations section to discuss this distinction. revision: partial
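
The volume-versus-diversity question in the third exchange turns on how homogeneity is scored. The page does not say which metric the paper uses, so the sketch below substitutes a deliberately simple stand-in: mean pairwise Jaccard similarity over sets of idea labels, one set per essay. The function names and the toy inputs are hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two sets; defined as 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def mean_pairwise_similarity(idea_sets):
    """Average Jaccard similarity over all unordered pairs of essays.

    Higher values mean the essays in a condition share more ideas,
    i.e. the condition is more homogeneous.
    """
    pairs = list(combinations(idea_sets, 2))
    if not pairs:
        raise ValueError("need at least two essays")
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    # Toy idea labels, not study data.
    essays = [{"cost", "privacy"}, {"privacy", "access"}, {"access", "equity"}]
    print(mean_pairwise_similarity(essays))
```

Computing this score per condition, including a single-model condition that generates two independent responses, would give the kind of control comparison the authors describe, whatever similarity measure is ultimately used.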

Circularity Check

0 steps flagged

Empirical study with no derivations or self-referential predictions


The paper reports two controlled human-subject experiments (N=315 convergent math problem-solving; N=247 divergent essay composition) that compare learning outcomes across conditions varying the presence of LLM tutor and/or LLM peers. No equations, fitted parameters, uniqueness theorems, or derivation chains appear in the reported work. Outcomes are measured directly via unassisted test accuracy and essay quality metrics; no step reduces a 'prediction' to a quantity defined by the authors' own modeling choices or prior self-citations. The central claims rest on experimental contrasts rather than any self-definitional or load-bearing self-citation structure. This is the expected finding for a purely empirical HCI study.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Empirical HCI study; no free parameters, no invented entities, and only standard statistical assumptions about random assignment and outcome measurement.

pith-pipeline@v0.9.0 · 5553 in / 1052 out tokens · 30542 ms · 2026-05-13T19:14:48.589412+00:00 · methodology

