Learning to make MISTAKEs: Modeling incorrect student thinking and key errors

Alexis Ross, Jacob Andreas · 2025 · arXiv 2510.11502

3 Pith papers cite this work. Polarity classification is still being indexed.



citation-role summary: background (1)

citation-polarity summary: background (1)

representative citing papers

Simulating Students or Sycophantic Problem Solving? On Misconception Faithfulness of LLM Simulators

cs.CL · 2026-05-12 · conditional · novelty 7.0

LLM simulators exhibit near-zero selective response to targeted misconception feedback and behave sycophantically, but SFT and SFS-aligned RL improve this property.

SalesSim: Benchmarking and Aligning Multimodal Language Models as Retail User Simulators

cs.CL · 2026-05-08 · unverdicted · novelty 7.0

SalesSim benchmarks MLLMs as retail user simulators, finds gaps in persona adherence and over-persuasion, and introduces UserGRPO RL to raise decision alignment by 13.8%.

Reinforcing Human Behavior Simulation via Verbal Feedback

cs.LG · 2026-05-19 · unverdicted · novelty 6.0

DITTO uses RL with verbal feedback to train LLMs for human behavior simulation, reporting 36% average gains over base models and outperforming GPT-5.4 on 6 of 10 SOUL benchmark tasks.

citing papers explorer


Simulating Students or Sycophantic Problem Solving? On Misconception Faithfulness of LLM Simulators cs.CL · 2026-05-12 · conditional · none · ref 25
SalesSim: Benchmarking and Aligning Multimodal Language Models as Retail User Simulators cs.CL · 2026-05-08 · unverdicted · none · ref 32
Reinforcing Human Behavior Simulation via Verbal Feedback cs.LG · 2026-05-19 · unverdicted · none · ref 28
