arxiv: 2602.05856 · v1 · submitted 2026-02-05 · 💻 cs.HC

"It Talks Like a Patient, But Feels Different": Co-Designing AI Standardized Patients with Medical Learners

Pith reviewed 2026-05-16 06:52 UTC · model grok-4.3

classification 💻 cs.HC
keywords AI standardized patients · medical education · co-design · deliberate practice · clinical communication · instructional usability · human-AI interaction

The pith

Instructional usability, not conversational realism alone, drives the value of AI standardized patients for medical learners.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper argues that AI standardized patients (AI-SPs) work best as tools for deliberate practice when they are built around six specific learner needs rather than around realistic conversation alone. Interviews with 12 clinical-year medical students and three co-design workshops revealed what makes current encounters feel off and which features would make AI versions trustworthy and educationally useful. A sympathetic reader cares because human standardized patients are expensive, hard to schedule, and inconsistent, so scalable AI alternatives could expand practice opportunities if they meet real learner priorities. The work translates the needs into design requirements and a conceptual workflow that prioritizes instructional support.

Core claim

Our findings position AI-SPs as tools for deliberate practice and show that instructional usability, rather than conversational realism alone, drives learner trust, engagement, and educational value. The six learner-centered needs identified through interviews and workshops were translated into concrete AI-SP design requirements and a synthesized conceptual workflow.

What carries the argument

The six learner-centered needs from co-design sessions, translated into AI-SP design requirements and a conceptual workflow that supports deliberate practice.

If this is right

  • AI-SPs become viable for on-demand, scalable practice that supplements traditional standardized patient encounters.
  • Design efforts should prioritize instructional features such as structured feedback and goal alignment over perfect conversational realism.
  • Learner trust and engagement increase when AI-SPs address the identified needs for instructional support.
  • The conceptual workflow provides a practical guide for turning learner needs into deployable AI training tools.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same co-design approach could identify needs for AI simulators in adjacent fields such as nursing or pharmacy training.
  • Developers could test the workflow by building prototype AI-SPs and measuring real usage patterns in medical curricula.
  • Widespread adoption might lower overall costs of communication skills training by reducing reliance on human standardized patients.

Load-bearing premise

The six learner-centered needs from interviews with 12 students and three workshops will generalize to other medical learners and directly produce effective AI implementations.

What would settle it

A larger controlled study that compares AI-SPs built with the proposed usability features against versions focused only on dialogue realism, and measures the differences in learner trust, engagement, and skill improvement.
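
As a rough illustration of how such a two-arm comparison might be analyzed, the sketch below runs a simple permutation test on the difference in mean ratings between two groups. Everything in it is an assumption for illustration only: the function name `permutation_test`, the 1-7 trust scale, and the ratings themselves are hypothetical and do not come from the paper, which reports no outcome data.

```python
import random
from statistics import mean


def permutation_test(group_a, group_b, n_permutations=2000, seed=0):
    """Two-sided permutation test on the difference in group means.

    Returns (observed_difference, estimated_p_value).
    """
    rng = random.Random(seed)
    observed = mean(group_a) - mean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_permutations):
        # Randomly reassign ratings to the two arms.
        rng.shuffle(pooled)
        diff = mean(pooled[:n_a]) - mean(pooled[n_a:])
        # Small tolerance guards against floating-point ties.
        if abs(diff) >= abs(observed) - 1e-12:
            extreme += 1
    # Add-one correction keeps the estimate strictly above zero.
    p_value = (extreme + 1) / (n_permutations + 1)
    return observed, p_value


# Hypothetical trust ratings (1-7 scale), NOT data from the paper.
usability_arm = [6, 5, 6, 7, 5, 6]
realism_arm = [4, 5, 3, 4, 5, 4]

observed, p_value = permutation_test(usability_arm, realism_arm)
print(f"difference in means: {observed:.2f}, p-value: {p_value:.3f}")
```

A permutation test is used here only because it makes few distributional assumptions for small samples; a real study would pre-register its outcome measures and analysis plan.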

Figures

Figures reproduced from arXiv: 2602.05856 by Benyou Wang, Bingquan Zhang, Dongyijie Primo Pan, Guo Zhu, Haoming Tang, Huarui Luo, Jiahuan Pei, Jie Li, Zhiqi Gao.

Figure 1: Conceptual workflow of the co-designed AI standardized patient (AI-SP) system. Learners (1) choose OSCE mode
Original abstract

Standardized patients (SPs) play a central role in clinical communication training but are costly, difficult to scale, and inconsistent. Large language model (LLM) based AI standardized patients (AI-SPs) promise flexible, on-demand practice, yet learners often report that they talk like a patient but feel different. We interviewed 12 clinical-year medical students and conducted three co-design workshops to examine how learners experience constraints of SP encounters and what they expect from AI-SPs. We identified six learner-centered needs, translated them into AI-SP design requirements, and synthesized a conceptual workflow. Our findings position AI-SPs as tools for deliberate practice and show that instructional usability, rather than conversational realism alone, drives learner trust, engagement, and educational value.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper reports a qualitative co-design study with 12 clinical-year medical students, using semi-structured interviews and three workshops, to identify constraints in traditional standardized patient (SP) encounters and learner expectations for LLM-based AI standardized patients (AI-SPs). It extracts six learner-centered needs, translates them into design requirements, and synthesizes a conceptual workflow. The central claim is that AI-SPs function as tools for deliberate practice and that instructional usability (rather than conversational realism alone) drives learner trust, engagement, and educational value.

Significance. If the identified needs and workflow hold under further testing, the work offers a learner-centered foundation for designing scalable AI tools in clinical communication training, addressing cost and consistency limitations of human SPs. The emphasis on usability factors beyond realism provides actionable guidance for HCI researchers building educational AI systems, though the absence of any implemented prototype or outcome data limits immediate applicability.

major comments (2)
  1. [Abstract and Discussion] The assertion that 'instructional usability, rather than conversational realism alone, drives learner trust, engagement, and educational value' is load-bearing for the central claim yet rests solely on expectations elicited from 12 students and workshops; no AI-SP prototype was built or evaluated with outcome measures, so the data cannot distinguish whether the proposed usability factors would produce the claimed effects when implemented.
  2. [Methods and Results] The six learner-centered needs are presented as directly informing design requirements, but the manuscript provides no detail on the thematic analysis process, inter-rater reliability, or how needs were validated or saturated across the 12 interviews and three workshops; this weakens the evidential basis for generalizing the needs to broader populations or AI implementations.
minor comments (2)
  1. [Abstract] The abstract and introduction could more explicitly state the study's limitations regarding sample size and lack of prototype evaluation to better contextualize the strength of the claims.
  2. [Methods] Clarify the exact structure of the three co-design workshops (e.g., activities, participant roles, outputs) to improve reproducibility of the co-design process.
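
For readers unfamiliar with the inter-rater reliability statistic raised in the second major comment, the sketch below computes Cohen's kappa for two coders who each assign one label per excerpt. The theme labels and the function name `cohens_kappa` are hypothetical, chosen only for illustration; they are not the paper's codebook or its six needs, and the paper reports no such statistic.

```python
from collections import Counter


def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa for two coders labeling the same items."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(coder_a)
    # Proportion of items on which the coders agree.
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Agreement expected by chance, from each coder's label frequencies.
    freq_a = Counter(coder_a)
    freq_b = Counter(coder_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        # Both coders used a single identical label throughout.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical theme labels for six interview excerpts (illustration only).
coder_a = ["feedback", "feedback", "realism", "pacing", "realism", "feedback"]
coder_b = ["feedback", "realism", "realism", "pacing", "realism", "feedback"]

kappa = cohens_kappa(coder_a, coder_b)
print(f"kappa = {kappa:.3f}")
```

In this toy case the coders agree on five of six excerpts, and kappa discounts that raw agreement by the agreement expected from label frequencies alone.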

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the thoughtful and constructive review. We agree that the study is exploratory and qualitative in nature, and we will revise the manuscript to more precisely scope our claims, add methodological transparency, and explicitly acknowledge limitations regarding the lack of prototype implementation. These changes will strengthen the paper without altering its core contribution as a co-design study informing future AI-SP development.

Point-by-point responses
  1. Referee: [Abstract and Discussion] The assertion that 'instructional usability, rather than conversational realism alone, drives learner trust, engagement, and educational value' is load-bearing for the central claim yet rests solely on expectations elicited from 12 students and workshops; no AI-SP prototype was built or evaluated with outcome measures, so the data cannot distinguish whether the proposed usability factors would produce the claimed effects when implemented.

    Authors: We agree that the study elicits learner expectations rather than measuring implemented effects. The central claim is presented as learners' reported priorities from the co-design process, not as validated outcomes from an AI-SP system. We will revise the abstract and discussion to explicitly qualify the claim as based on learner expectations for what would drive trust and engagement, add a limitations paragraph acknowledging the absence of prototype evaluation and outcome data, and frame the work as foundational for future empirical studies.

    Revision: yes

  2. Referee: [Methods and Results] The six learner-centered needs are presented as directly informing design requirements, but the manuscript provides no detail on the thematic analysis process, inter-rater reliability, or how needs were validated or saturated across the 12 interviews and three workshops; this weakens the evidential basis for generalizing the needs to broader populations or AI implementations.

    Authors: We will expand the Methods section to provide a detailed description of the reflexive thematic analysis process, including how the codebook was iteratively developed through team discussions, how saturation was assessed across interviews and workshops, and the collaborative validation steps used to refine the six needs. While traditional inter-rater reliability statistics are not standard for reflexive approaches, we will clarify our consensus-building procedures. We do not claim broad generalizability and will add language noting the sample scope.

    Revision: yes

Circularity Check

0 steps flagged

No circularity: qualitative co-design study with no derivations or self-referential predictions

full rationale

The paper reports an empirical qualitative study consisting of interviews with 12 medical students and three co-design workshops. It extracts six learner-centered needs, translates them into design requirements, and synthesizes a conceptual workflow. No equations, fitted parameters, predictions, or mathematical derivations appear in the provided text. The central claims rest on direct participant data rather than any self-referential loop, self-citation chain, or renaming of prior results. The work is therefore self-contained as standard qualitative HCI research with no internal circularity.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

This is a qualitative HCI study with no mathematical derivations, free parameters, axioms, or invented entities; all content derives from participant input and researcher synthesis.

pith-pipeline@v0.9.0 · 5457 in / 988 out tokens · 18167 ms · 2026-05-16T06:52:36.233820+00:00 · methodology
