{"paper":{"title":"Simulating Students or Sycophantic Problem Solving? On Misconception Faithfulness of LLM Simulators","license":"http://creativecommons.org/licenses/by/4.0/","headline":"LLM student simulators correct answers at similar rates whether feedback targets the actual misconception or not.","cross_cats":["cs.AI","cs.CY","cs.LG"],"primary_cat":"cs.CL","authors_text":"Heejin Do, Mrinmaya Sachan, Shashank Sonkar","submitted_at":"2026-05-12T20:55:23Z","abstract_excerpt":"Large language models (LLMs) can fluently generate student-like responses, making them attractive as simulated students for training and evaluating AI tutors and human educators. Yet such simulators are typically evaluated by output similarity to real students, not by whether they behave like students with coherent misconceptions during interaction. We introduce a controlled framework for evaluating misconception faithfulness, whether a simulator maintains a misconception-driven belief state and updates selectively when feedback addresses the underlying misconception. 
Central to our framework "},"claims":{"count":4,"items":[{"kind":"strongest_claim","text":"Across seven LLMs (4B-120B), multiple datasets, and prompting strategies, simulators exhibit near-zero SFS, correcting their answers at similarly high rates regardless of feedback relevance.","source":"verdict.strongest_claim","status":"machine_extracted","claim_id":"C1","attestation":"unclaimed"},{"kind":"weakest_assumption","text":"That the misconception-contrastive feedback protocol (targeted vs misaligned vs generic) cleanly isolates whether a simulator maintains a misconception-driven belief state rather than other response patterns.","source":"verdict.weakest_assumption","status":"machine_extracted","claim_id":"C2","attestation":"unclaimed"},{"kind":"one_line_summary","text":"LLM simulators exhibit near-zero selective response to targeted misconception feedback and behave sycophantically, but SFT and SFS-aligned RL improve this property.","source":"verdict.one_line_summary","status":"machine_extracted","claim_id":"C3","attestation":"unclaimed"},{"kind":"headline","text":"LLM student simulators correct answers at similar rates whether feedback targets the actual misconception or not.","source":"verdict.pith_extraction.headline","status":"machine_extracted","claim_id":"C4","attestation":"unclaimed"}],"snapshot_sha256":"053d62565e936a4a9796d88c1e9f5a7e1a3b1183349fca2ff834f639a4dd5303"},"source":{"id":"2605.12748","kind":"arxiv","version":1},"verdict":{"id":"7fb1b3f2-e3c9-4f06-8f32-6af31bb790a7","model_set":{"reader":"grok-4.3"},"created_at":"2026-05-14T20:42:17.903145Z","strongest_claim":"Across seven LLMs (4B-120B), multiple datasets, and prompting strategies, simulators exhibit near-zero SFS, correcting their answers at similarly high rates regardless of feedback relevance.","one_line_summary":"LLM simulators exhibit near-zero selective response to targeted misconception feedback and behave sycophantically, but SFT and SFS-aligned RL improve this 
property.","pipeline_version":"pith-pipeline@v0.9.0","weakest_assumption":"That the misconception-contrastive feedback protocol (targeted vs misaligned vs generic) cleanly isolates whether a simulator maintains a misconception-driven belief state rather than other response patterns.","pith_extraction_headline":"LLM student simulators correct answers at similar rates whether feedback targets the actual misconception or not."},"references":{"count":36,"sample":[{"doi":"","year":2025,"title":"gpt-oss-120b & gpt-oss-20b Model Card","work_id":"178c1f7e-4f19-4392-a45d-45a6dfa88ead","ref_index":1,"cited_arxiv_id":"2508.10925","is_internal_anchor":true},{"doi":"","year":1995,"title":"Cognitive tutors: Lessons learned. The journal of the learning sciences, 4(2):167–207, 1995","work_id":"d21426f7-587f-41db-9232-efa69c6e9cb7","ref_index":2,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":1984,"title":"Order and equivalence of rational numbers: A clinical teaching experiment. Journal for Research in Mathematics Education, 15(5):323–341, 1984","work_id":"4dde8321-6e10-434c-aacb-17029c882e51","ref_index":3,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":1978,"title":"Diagnostic models for procedural bugs in basic mathematical skills. Cognitive science, 2(2):155–192, 1978","work_id":"486bc8bb-6e5e-4d9a-a1c6-1e36c8687d36","ref_index":4,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2020,"title":"The impact of high school life science teachers’ subject matter knowledge and knowledge of student misconceptions on students’ learning. CBE—Life Sciences Education, 19(1):ar9, 
2020","work_id":"b512f3de-5c2e-49d7-815c-1c714cebda24","ref_index":5,"cited_arxiv_id":"","is_internal_anchor":false}],"resolved_work":36,"snapshot_sha256":"395891a307dfd599042be8e7ed414502b63d187fe9ffca424fa282206174dc13","internal_anchors":6},"formal_canon":{"evidence_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"author_claims":{"count":0,"strong_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"builder_version":"pith-number-builder-2026-05-17-v1"}