"It Talks Like a Patient, But Feels Different": Co-Designing AI Standardized Patients with Medical Learners
Pith reviewed 2026-05-16 06:52 UTC · model grok-4.3
The pith
Instructional usability, not conversational realism, drives the value of AI standardized patients for medical learners.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Our findings position AI-SPs as tools for deliberate practice and show that instructional usability, rather than conversational realism alone, drives learner trust, engagement, and educational value. The six learner-centered needs identified through interviews and workshops were translated into concrete AI-SP design requirements and a synthesized conceptual workflow.
What carries the argument
The six learner-centered needs from co-design sessions, translated into AI-SP design requirements and a conceptual workflow that supports deliberate practice.
If this is right
- AI-SPs become viable for on-demand, scalable practice that supplements traditional standardized patient encounters.
- Design efforts should prioritize instructional features such as structured feedback and goal alignment over perfect conversational realism.
- Learner trust and engagement increase when AI-SPs address the identified needs for instructional support.
- The conceptual workflow provides a practical guide for turning learner needs into deployable AI training tools.
Where Pith is reading between the lines
- The same co-design approach could identify needs for AI simulators in adjacent fields such as nursing or pharmacy training.
- Developers could test the workflow by building prototype AI-SPs and measuring real usage patterns in medical curricula.
- Widespread adoption might lower overall costs of communication skills training by reducing reliance on human standardized patients.
Load-bearing premise
The six learner-centered needs from interviews with 12 students and three workshops will generalize to other medical learners and directly produce effective AI implementations.
What would settle it
A larger controlled study that compares AI-SPs built with the proposed usability features against versions focused only on dialogue realism, measuring differences in learner trust, engagement, and skill improvement.
Original abstract
Standardized patients (SPs) play a central role in clinical communication training but are costly, difficult to scale, and inconsistent. Large language model (LLM) based AI standardized patients (AI-SPs) promise flexible, on-demand practice, yet learners often report that they talk like a patient but feel different. We interviewed 12 clinical-year medical students and conducted three co-design workshops to examine how learners experience constraints of SP encounters and what they expect from AI-SPs. We identified six learner-centered needs, translated them into AI-SP design requirements, and synthesized a conceptual workflow. Our findings position AI-SPs as tools for deliberate practice and show that instructional usability, rather than conversational realism alone, drives learner trust, engagement, and educational value.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper reports a qualitative co-design study with 12 clinical-year medical students, using semi-structured interviews and three workshops, to identify constraints in traditional standardized patient (SP) encounters and learner expectations for LLM-based AI standardized patients (AI-SPs). It extracts six learner-centered needs, translates them into design requirements, and synthesizes a conceptual workflow. The central claim is that AI-SPs function as tools for deliberate practice and that instructional usability (rather than conversational realism alone) drives learner trust, engagement, and educational value.
Significance. If the identified needs and workflow hold under further testing, the work offers a learner-centered foundation for designing scalable AI tools in clinical communication training, addressing cost and consistency limitations of human SPs. The emphasis on usability factors beyond realism provides actionable guidance for HCI researchers building educational AI systems, though the absence of any implemented prototype or outcome data limits immediate applicability.
major comments (2)
- [Abstract and Discussion] The assertion that 'instructional usability, rather than conversational realism alone, drives learner trust, engagement, and educational value' is load-bearing for the central claim yet rests solely on expectations elicited from 12 students and workshops; no AI-SP prototype was built or evaluated with outcome measures, so the data cannot distinguish whether the proposed usability factors would produce the claimed effects when implemented.
- [Methods and Results] The six learner-centered needs are presented as directly informing design requirements, but the manuscript provides no detail on the thematic analysis process, inter-rater reliability, or how needs were validated or saturated across the 12 interviews and three workshops; this weakens the evidential basis for generalizing the needs to broader populations or AI implementations.
minor comments (2)
- [Abstract] The abstract and introduction could more explicitly state the study's limitations regarding sample size and lack of prototype evaluation to better contextualize the strength of the claims.
- [Methods] Clarify the exact structure of the three co-design workshops (e.g., activities, participant roles, outputs) to improve reproducibility of the co-design process.
Simulated Author's Rebuttal
We thank the referee for the thoughtful and constructive review. We agree that the study is exploratory and qualitative in nature, and we will revise the manuscript to more precisely scope our claims, add methodological transparency, and explicitly acknowledge limitations regarding the lack of prototype implementation. These changes will strengthen the paper without altering its core contribution as a co-design study informing future AI-SP development.
Point-by-point responses
- Referee: [Abstract and Discussion] The assertion that 'instructional usability, rather than conversational realism alone, drives learner trust, engagement, and educational value' is load-bearing for the central claim yet rests solely on expectations elicited from 12 students and workshops; no AI-SP prototype was built or evaluated with outcome measures, so the data cannot distinguish whether the proposed usability factors would produce the claimed effects when implemented.
Authors: We agree that the study elicits learner expectations rather than measuring implemented effects. The central claim is presented as learners' reported priorities from the co-design process, not as validated outcomes from an AI-SP system. We will revise the abstract and discussion to explicitly qualify the claim as based on learner expectations for what would drive trust and engagement, add a limitations paragraph acknowledging the absence of prototype evaluation and outcome data, and frame the work as foundational for future empirical studies.
Revision: yes
- Referee: [Methods and Results] The six learner-centered needs are presented as directly informing design requirements, but the manuscript provides no detail on the thematic analysis process, inter-rater reliability, or how needs were validated or saturated across the 12 interviews and three workshops; this weakens the evidential basis for generalizing the needs to broader populations or AI implementations.
Authors: We will expand the Methods section to provide a detailed description of the reflexive thematic analysis process, including how the codebook was iteratively developed through team discussions, how saturation was assessed across interviews and workshops, and the collaborative validation steps used to refine the six needs. While traditional inter-rater reliability statistics are not standard for reflexive approaches, we will clarify our consensus-building procedures. We do not claim broad generalizability and will add language noting the sample scope.
Revision: yes
Circularity Check
No circularity: qualitative co-design study with no derivations or self-referential predictions
Full rationale
The paper reports an empirical qualitative study consisting of interviews with 12 medical students and three co-design workshops. It extracts six learner-centered needs, translates them into design requirements, and synthesizes a conceptual workflow. No equations, fitted parameters, predictions, or mathematical derivations appear in the provided text. The central claims rest on direct participant data rather than any self-referential loop, self-citation chain, or renaming of prior results. The work is therefore self-contained as standard qualitative HCI research with no internal circularity.