Simulating Couple Conflict: Designing A Multi-Agent System for Therapy Training and Practice

Angela Chen; Canwen Wang; Catherine Bao; Haiyi Zhu; Holly Swartz; Robert E Kraut; Siwei Jin; Tongshuang Wu

arxiv: 2601.10970 · v2 · submitted 2026-01-16 · 💻 cs.CY · cs.HC


Canwen Wang, Angela Chen, Catherine Bao, Siwei Jin, Holly Swartz, Tongshuang Wu, Robert E Kraut, Haiyi Zhu

Pith reviewed 2026-05-16 14:13 UTC · model grok-4.3


keywords: multi-agent simulation · couples therapy training · demand-withdraw conflict · stateful agents · sense-plan-act architecture · therapist practice · simulation evaluation


The pith

A stateful multi-agent simulation lets therapists practice couple conflict with consistent, theory-based responses.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a multi-agent system that models couples therapy as a controlled dynamical process in which two client agents move through six stages of demand-withdraw conflict. The system uses a sense-plan-act loop: it senses the therapist's words, updates each agent's internal state according to psychotherapy theory and transcript patterns, and then generates verbal and emotional replies. In a study with 21 licensed U.S. therapists, participants identified the hidden state transitions more accurately and rated the simulation higher for realism and responsiveness than a prompt-only baseline. Traditional role-play lacks repeatability and precise control over emotional dynamics, so the stateful approach supplies repeatable practice that stays grounded in observed interaction sequences.

Core claim

The authors claim that representing therapy sessions as a multi-agent dynamical system with six evolving stages, updated via a sense-plan-act architecture that draws on psychotherapy theory and transcript analysis, produces responses that licensed therapists judge more realistic and that allow more accurate detection of state changes than non-stateful prompt baselines.

What carries the argument

The sense-plan-act architecture that detects therapist input, updates client-agent states across six demand-withdraw stages using theory and transcripts, and generates verbal plus emotional outputs.
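
A minimal sketch of how such a loop could be organized, under stated assumptions: keyword cues stand in for the language-model components, and the names `SessionState`, `sense`, `plan`, `act`, and `step`, along with the specific transition logic, are hypothetical and not the authors' implementation. Only the six stage names are taken from the paper's appendix rules.

```python
from dataclasses import dataclass, field
from enum import Enum


class Stage(Enum):
    GREETING = "Greeting"
    PROBLEM_RAISING = "Problem Raising"
    ESCALATION = "Escalation"
    DE_ESCALATION = "De-Escalation"
    ENACTMENT = "Enactment"
    WRAP_UP = "Wrap-up"


@dataclass
class SessionState:
    stage: Stage = Stage.GREETING
    turn: int = 0
    history: list = field(default_factory=list)


def sense(utterance: str) -> dict:
    """Hypothetical sensing step: keyword cues stand in for an LLM classifier."""
    text = utterance.lower()
    return {
        "greeting": any(w in text for w in ("hello", "welcome")),
        "calming": any(w in text for w in ("pause", "slow down", "breath")),
        "closing": any(w in text for w in ("wrap up", "next session")),
    }


def plan(state: SessionState, cues: dict) -> Stage:
    """Hypothetical planning step: pick the next stage from the sensed cues."""
    if cues["closing"]:
        return Stage.WRAP_UP
    if cues["calming"] and state.stage is Stage.ESCALATION:
        return Stage.DE_ESCALATION
    if state.stage is Stage.GREETING and not cues["greeting"]:
        return Stage.PROBLEM_RAISING
    return state.stage


def act(stage: Stage) -> str:
    """Placeholder generation step; a real system would emit text, voice, and emotion."""
    return f"[client reply conditioned on stage: {stage.value}]"


def step(state: SessionState, therapist_utterance: str) -> str:
    """One closed-loop turn: sense the input, update the stage, generate a reply."""
    cues = sense(therapist_utterance)
    state.stage = plan(state, cues)
    state.turn += 1
    reply = act(state.stage)
    state.history.append((therapist_utterance, state.stage, reply))
    return reply
```

Keeping the stage in an explicit state object, rather than implicit in a prompt, is what would make transitions inspectable and repeatable across practice runs.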

If this is right

Therapists obtain repeatable, controllable practice sessions for recognizing and responding to evolving emotional states.
The closed-loop design supplies consistent feedback on how specific interventions shift the interaction.
State transitions become explicit and trackable, supporting deliberate practice on timing and phrasing.
Evaluation results indicate higher accuracy in identifying conflict-stage changes compared with prompt-based alternatives.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same stage-and-update structure could be adapted to simulate other recurring conflict patterns by redefining the state rules from new transcript sets.
Pairing the simulation with automated logging of intervention effects might surface which therapist moves most reliably de-escalate demand-withdraw cycles.
Deploying the system across many training sessions could accumulate data on response patterns that are difficult to isolate in live supervision.

Load-bearing premise

The state updates drawn from psychotherapy theory and transcript analysis correctly capture how real couples react to therapist interventions in demand-withdraw patterns.

What would settle it

A side-by-side check of whether the six-stage transitions and agent replies produced by the simulation match the actual sequence of statements and emotional shifts observed in recorded real therapy sessions with the same conflict pattern.
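
One way such a check could be quantified, sketched under assumptions: if both simulated and recorded sessions were coded with the same six stage labels, the distributions of stage-to-stage transitions could be compared with a total variation distance. The function names and the two example sequences below are hypothetical; the paper reports no such data.

```python
from collections import Counter


def transition_distribution(stages):
    """Return the empirical distribution over consecutive (from, to) stage pairs."""
    pairs = Counter(zip(stages, stages[1:]))
    total = sum(pairs.values())
    return {pair: count / total for pair, count in pairs.items()}


def total_variation(p, q):
    """Total variation distance between two discrete distributions.

    0 means the distributions are identical; 1 means they share no support.
    """
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


# Hypothetical stage sequences; real use would need expert-coded session transcripts.
simulated = ["Greeting", "Problem Raising", "Escalation", "De-Escalation", "Enactment"]
observed = ["Greeting", "Problem Raising", "Escalation", "Escalation", "De-Escalation"]

distance = total_variation(
    transition_distribution(simulated), transition_distribution(observed)
)
```

A small distance would only show that transition frequencies agree; it would not show that individual replies resemble what real partners say.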

Figures

Figures reproduced from arXiv: 2601.10970 by Angela Chen, Canwen Wang, Catherine Bao, Haiyi Zhu, Holly Swartz, Robert E Kraut, Siwei Jin, Tongshuang Wu.

**Figure 1.** System Overview. (1) Sense-Plan-Act Architecture: Detect inputs from therapist and couple agents, follow the designed stage controller rules to determine the interaction stage, and then generate responses appropriate to the stage output. (2) Interface overview with sample conversation from Escalation stage, showing the multimodal therapy session with agent responses (text, voice, and emotion indicator); se…

**Figure 2.** 3.2.2 Agent-to-Agent Interaction. Our interviews with expert therapists also suggest that the demand-withdraw pattern typically emerges during the problem-raising and escalation stages characterized by partner-to-partner interactions. Therefore, we explicitly simulate demand-withdraw conversations in those two stages by promoting agent-to-agent interactions. Our system constantly predicts who the next s…

**Figure 2.** Stage transition rules from the stage-based interaction controller (highlighted in yellow), along with example

**Figure 4.** Interaction plot showing the realism of behavior

**Figure 5.** Interface presented to participants (with agent role labels and stage indicators removed)

Original abstract

Couples therapy requires managing complex, evolving emotional dynamics between partners, but traditional training methods for therapists, like role-play, lack realism, consistency, and control. We present a multi-modal simulation that models therapy as a controlled, multi-agent dynamical system with structured interaction stages. Therapists practice with a pair of client-agents who go through six evolving stages that respond to therapist actions. This simulation enables practice with demand-withdraw conflict patterns in a closed-loop environment. The simulation uses a sense-plan-act architecture: it detects the therapist's input, updates agents' interaction states based on psychotherapy theory and transcript analysis, and generates realistic verbal and emotional responses. In an experiment with 21 licensed U.S. therapists, participants more accurately identified state transitions and rated the system as more realistic and responsive than a prompt-based baseline, demonstrating the value of stateful, interpretable simulation for therapist training.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper gives a six-stage stateful multi-agent simulator for couples therapy training that beats a prompt baseline in a 21-therapist study, but the state rules rest on theory and transcripts without a direct check against real couple response data.


The core contribution is a closed-loop simulation where two client agents move through six evolving stages of demand-withdraw conflict. The system senses therapist input, updates states from psychotherapy theory plus transcript patterns, then produces verbal and emotional replies. In the study, 21 licensed therapists identified state changes more accurately and rated the output more realistic and responsive than a plain prompt baseline. That is the main new piece: a structured, interpretable dynamical system rather than one-shot prompting. The experiment supplies concrete evidence that the added structure helps users track the interaction better. The design choices around sense-plan-act and the six stages are laid out clearly enough to replicate the basic setup.

The soft spot is the missing quantitative check on whether the state transitions actually match how real couples respond. The rules come from theory and the same transcripts used to build the model, but there is no held-out prediction test or comparison of simulated reply distributions against fresh observed data. Therapists are asked to follow the system's own labels, so the ratings do not yet show that the simulation would prepare them for unscripted sessions. Minor issues include the small sample and lack of detail on exact statistical tests in the abstract, but those are fixable.

This work is aimed at researchers building AI tools for clinical training and at therapists who want controlled practice scenarios. It is worth sending to peer review because the evaluation is grounded in actual users and the architecture is reproducible; referees can push for the fidelity tests that would make the training claim stronger.

Referee Report

2 major / 2 minor

Summary. The paper presents a multi-agent simulation for couples therapy training that models demand-withdraw conflict via a six-stage state machine updated from psychotherapy theory and transcript analysis. Using a sense-plan-act architecture, client agents generate verbal and emotional responses in a closed-loop setting. A user study with 21 licensed U.S. therapists reports higher accuracy in identifying state transitions and better ratings for realism and responsiveness compared to a prompt-based baseline.

Significance. If the state transitions prove faithful to real couple dynamics, the work provides a valuable controlled environment for therapist training that improves on inconsistent role-play methods. The direct evidence from the 21-therapist study—showing measurable gains in state identification and realism ratings—is a clear strength and supports the value of stateful, interpretable multi-agent designs over purely prompt-driven baselines.

major comments (2)

[Experiment section] The central claim that the simulation offers realistic training value rests on the six-stage state machine accurately modeling real demand-withdraw responses. However, the evaluation only asks therapists to track the system's own labels; no held-out transcript prediction, inter-rater agreement on state assignments, or comparison of simulated versus observed response distributions is reported (Experiment section).
[Model architecture section] State updates are described as derived from psychotherapy theory and transcript analysis, yet no quantitative fidelity check (e.g., predictive accuracy on unseen data or expert validation of transition rules) is provided. This is load-bearing because therapists' improved identification scores could reflect internal consistency rather than external validity (Model architecture section).
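
For illustration, the inter-rater agreement mentioned in the first comment could be computed with Cohen's kappa over per-turn stage labels from two independent coders. This is a sketch with hypothetical labels; the function name `cohens_kappa` and the example data are assumptions, and the paper reports no such analysis.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two raters labelling the same items.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is observed agreement and
    p_e is the agreement expected by chance from each rater's label frequencies.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    expected = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    if expected == 1.0:
        # Both raters used a single identical label; agreement is trivially perfect.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical per-turn stage labels from two expert coders.
rater_1 = ["Escalation", "Escalation", "De-Escalation", "De-Escalation"]
rater_2 = ["Escalation", "De-Escalation", "De-Escalation", "De-Escalation"]
kappa = cohens_kappa(rater_1, rater_2)
```

Here the raters agree on three of four turns, and chance agreement is 0.5, so the chance-corrected score is lower than the raw 75% agreement would suggest.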

minor comments (2)

[Abstract] The abstract omits exact performance metrics, statistical tests, and details on the prompt-based baseline implementation, making it difficult to assess the strength of the reported improvements.
[System design section] Notation for the sense-plan-act loop and state-transition functions could be clarified with a diagram or pseudocode to aid reproducibility.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. We address each major comment below, clarifying the role of the therapist evaluation while acknowledging limitations in external validation. Revisions have been made to improve transparency on these points.


Referee: [Experiment section] The central claim that the simulation offers realistic training value rests on the six-stage state machine accurately modeling real demand-withdraw responses. However, the evaluation only asks therapists to track the system's own labels; no held-out transcript prediction, inter-rater agreement on state assignments, or comparison of simulated versus observed response distributions is reported (Experiment section).

Authors: We agree that the evaluation centers on therapists identifying the system's predefined state labels rather than independent prediction of real transcripts. This design choice prioritizes demonstrating training utility and interpretability in a controlled setting, where consistent state tracking is essential for skill practice. The 21-therapist study shows statistically higher identification accuracy and realism ratings versus the prompt baseline, providing evidence that the states support effective training. We acknowledge the absence of held-out transcript prediction or direct distribution comparisons as a limitation. In the revised manuscript, we have expanded the Experiment section with a dedicated limitations paragraph outlining these gaps and proposing future work on transcript-based validation. No inter-rater agreement on state assignments was computed, as states are system-defined for training consistency rather than derived from open-ended coding.

revision: partial

Referee: [Model architecture section] State updates are described as derived from psychotherapy theory and transcript analysis, yet no quantitative fidelity check (e.g., predictive accuracy on unseen data or expert validation of transition rules) is provided. This is load-bearing because therapists' improved identification scores could reflect internal consistency rather than external validity (Model architecture section).

Authors: The transition rules were constructed from established demand-withdraw literature (e.g., Christensen & Heavey) combined with qualitative review of therapy transcripts to map observed verbal and emotional patterns to the six stages. We have revised the Model architecture section to include additional detail on this derivation process and the clinical expertise of the research team. We concur that a quantitative fidelity metric, such as predictive accuracy on unseen transcripts, is not reported and would strengthen claims of external validity. The therapist study's superior performance over the prompt-based baseline helps mitigate concerns of mere internal consistency, as the baseline shares the same generative model but lacks structured state updates. We have added explicit discussion of this distinction and noted the lack of formal predictive validation as a limitation for future research.

revision: yes

Circularity Check

0 steps flagged

No significant circularity; evaluation is independent of internal definitions


The paper defines a six-stage state machine from external psychotherapy theory and transcript analysis, then evaluates the resulting simulation through an external experiment with 21 licensed therapists who rate realism, responsiveness, and state-transition identification accuracy against a prompt-based baseline. No parameters are fitted to the evaluation outcomes, no self-citation chain supports the core claims, and the reported results (human preference and identification accuracy) do not reduce by construction to the input rules or definitions. The derivation chain remains self-contained against external human benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The simulation rests on standard psychotherapy concepts applied to agents rather than new fitted parameters or invented entities.

axioms (2)

domain assumption: Psychotherapy theory and transcript analysis provide a valid basis for defining interaction stages and state transitions.
Invoked to update agent states from therapist inputs in the sense-plan-act loop.
domain assumption: The six stages accurately represent evolving demand-withdraw conflict dynamics.
Central to the structured interaction model described.



Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/RealityFromDistinction.lean reality_from_one_distinction unclear


unclear
Relation between the paper passage and the cited Recognition theorem.

The simulation uses a sense-plan-act architecture: it detects the therapist's input, updates agents' interaction states based on psychotherapy theory and transcript analysis, and generates realistic verbal and emotional responses... six evolving stages

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

73 extracted references · 73 canonical work pages

[1] 2025. Counseling and Psychotherapy Transcripts, Client Narratives, and Reference Works. Subscription Database. Accessed on 2025-04-28. URL typically provided via institutional access. General product information: https://alexanderstreet.com/products/counseling-and-psychotherapy-transcripts-client-narratives-and-reference-works

[2] Marwa Abdulhai, Ryan Cheng, Donovan Clay, Tim Althoff, Sergey Levine, and Natasha Jaques. 2025. Consistently simulating human personas with multi-turn reinforcement learning. arXiv preprint arXiv:2511.00222 (2025).

[3] Jess K Alberts and Gillian Driscoll. 1992. Containment versus escalation: The trajectory of couples’ conversational complaints. Western Journal of Communication (includes Communication Reports) 56, 4 (1992), 394–412. doi:10.1080/10570319209374425

[4] Elizabeth S Allen, David C Atkins, Donald H Baucom, Douglas K Snyder, Kristina Coop Gordon, and Shirley P Glass. 2005. Intrapersonal, interpersonal, and contextual factors in engaging in and responding to extramarital involvement. Clinical Psychology: Science and Practice 12, 2 (2005), 101.

[5] Mohammad Almansoori, Komal Kumar, and Hisham Cholakkal. 2025. MedAgentSim: Self-Evolving Multi-Agent Simulations for Realistic Clinical Interactions. In Medical Image Computing and Computer Assisted Intervention – MICCAI 2025, Vol. LNCS 15968. Springer Nature Switzerland, 362–372.

[6] Mina Almasi and Ross Deans Kristensen-McLachlan. 2025. Alignment Drift in CEFR-prompted LLMs for Interactive Spanish Tutoring. In Proceedings of the 20th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2025). Association for Computational Linguistics, Vienna, Austria, 70–88. doi:10.18653/v1/2025.bea-1.6

[7] Lotta G Andersson, Mark H Butler, and Ryan B Seedall. 2006. Couples’ experience of enactments and softening in marital therapy. The American Journal of Family Therapy 34, 4 (2006), 301–315.

[8] Lisa A Benson, Meghan M McGinn, and Andrew Christensen. 2012. Common principles of couple therapy. Behavior therapy 43, 1 (2012), 25–35. doi:10.1016/j.beth.2010.12.009

[9] Guanqun Bi, Zhuang Chen, Zhoufu Liu, Hongkai Wang, Xiyao Xiao, Yuqiang Xie, Wen Zhang, Yongkang Huang, Yuxuan Chen, Libiao Peng, and Minlie Huang. 2025. MAGI: Multi-Agent Guided Interview for Psychiatric Assessment. In Findings of the Association for Computational Linguistics: ACL 2025. Association for Computational Linguistics, Vienna, Austria, 24898–24921. doi:10.18653/v1/2025.findings-acl.1278

[11] Mark H Butler and Brandt C Gardner. 2003. Adapting enactments to couple reactivity: Five developmental stages. Journal of Marital and Family therapy 29, 3 (2003), 311–327.
[12] Prafulla Kumar Choubey, Xiangyu Peng, Shilpa Bhagavath, Caiming Xiong, Shiva Kumar Pentyala, and Chien-Sheng Wu. 2025. Turning Conversations into Workflows: A Framework to Extract and Evaluate Dialog Workflows for Service AI Agents. In Findings of the Association for Computational Linguistics: ACL 2025. Association for Computational Linguistics, Vienna, Au... doi:10.18653/v1/2025.findings-acl.203

[13] Andrew Christensen and Christopher L. Heavey. 1990. Gender and Social Structure in the Demand/Withdraw Pattern of Marital Conflict. Journal of Personality and Social Psychology 59, 1 (1990), 73–81. doi:10.1037/0022-3514.59.1.73

[14] David A. Cook, Patricia J. Erwin, and Marc M. Triola. 2010. Computerized Virtual Patients in Health Professions Education: A Systematic Review and Meta-Analysis. Academic Medicine 85, 10 (2010), 1589–1602. doi:10.1097/ACM.0b013e3181edfe13

[15] Alexander O Crenshaw, Andrew Christensen, Donald H Baucom, Norman B Epstein, and Brian RW Baucom. 2017. Revised scoring and improved reliability for the Communication Patterns Questionnaire. Psychological assessment 29, 7 (2017), 913.

[16] William J Doherty, Steven M Harris, and Kadija Mussa. 2024. Relationship undermining in couple therapy. Contemporary Family Therapy 46, 3 (2024), 243–248.

[17] Brian D Doss, McKenzie K Roddy, Stephanie A Wiebe, and Susan M Johnson. 2022. A review of the research during 2010–2019 on evidence-based treatments for couple relationship distress. Journal of marital and family therapy 48, 1 (2022), 283–306. doi:10.1111/jmft.12552

[19] Brian D Doss, Lorelei E Simpson, and Andrew Christensen. 2004. Why do couples seek marital therapy? Professional psychology: Research and practice 35, 6 (2004), 608.

[20] Kathleen A Eldridge, Andrew Christensen, et al. 2002. Demand-withdraw communication during couple conflict: A review and analysis. Understanding marriage: Developments in the study of couple interaction (2002), 289–322.

[21] Daniel J Fischer and Brandi C Fink. 2014. Clinical processes in behavioral couples therapy. Psychotherapy 51, 1 (2014), 11. doi:10.1037/a0033823

[22] Ananya Ganesh, Martha Palmer, and Katharina Kann. 2023. A Survey of Challenges and Methods in the Computational Modeling of Multi-Party Dialog. In Proceedings of the 5th Workshop on NLP for Conversational AI (NLP4ConvAI 2023). Association for Computational Linguistics, Toronto, Canada, 140–154. doi:10.18653/v1/2023.nlp4convai-1.12

[23] John M Gottman. 2017. The roles of conflict engagement, escalation, and avoidance in marital interaction: A longitudinal view of five types of couples. In Interpersonal development. Routledge, 359–368.

[24] Paul S. Greenman and Susan M. Johnson. 2013. Process Research on Emotionally Focused Therapy (EFT) for Couples: Linking Theory to Practice. Family Process 52, 1 (2013), 46–61. doi:10.1111/famp.12015

[25] Alan S Gurman. 2008. Clinical handbook of couple therapy. Guilford Press.

[26] W Halford and Howard J Markman. 1997. Clinical handbook of marriage and couples interventions. John Wiley & Sons, Inc.
[27] Patricia Huerta, Caitlin Edwards, Ronald Asiimwe, Morgan PettyJohn, Jennifer VanBoxel, Preston Morgan, and Andrea K Wittenborn. 2023. Exploratory analysis of pursue-withdraw patterns, attachment, and gender among couples in emotionally focused therapy. The American Journal of Family Therapy 51, 1 (2023), 57–75. doi:10.1080/01926187.2022.2129521

[28] Susan M. Johnson and Paul S. Greenman. 2006. The Path to a Secure Bond: Emotionally Focused Couple Therapy. Journal of Clinical Psychology 62, 5 (2006), 597–609. doi:10.1002/jclp.20251

[29] Ronald C Kessler, Katherine A McGonagle, Shanyang Zhao, Christopher B Nelson, Michael Hughes, Suzann Eshleman, Hans-Ulrich Wittchen, and Kenneth S Kendler. 1994. Lifetime and 12-month prevalence of DSM-III-R psychiatric disorders in the United States: results from the National Comorbidity Survey. Archives of general psychiatry 51, 1 (1994), 8–19.

[30] Claire Lane and Stephen Rollnick. 2007. The use of simulated patients and role-play in communication skills training: a review of the literature to August 2005. Patient education and counseling 67, 1-2 (2007), 13–20.

[31] Zhigen Li, Jianxiang Peng, Yanmeng Wang, Yong Cao, Tianhao Shen, Minghui Zhang, Linxi Su, Shang Wu, Yihang Wu, YuQian Wang, Ye Wang, Wei Hu, Jianfeng Li, Shaojun Wang, Jing Xiao, and Deyi Xiong. 2025. ChatSOP: An SOP-Guided MCTS Planning Framework for Controllable LLM Dialogue Agents. In Proceedings of the 63rd Annual Meeting of the Association for Computa... doi:10.18653/v1/2025.acl-long.863

[32] Yusheng Liao, Yutong Meng, Yuhao Wang, Hongcheng Liu, Yanfeng Wang, and Yu Wang. 2024. Automatic Interactive Evaluation for Large Language Models with State Aware Patient Simulator. arXiv preprint arXiv:2403.08495 (2024). https://arxiv.org/abs/2403.08495

[33] Yifan Liu, Wei Wei, Jiayi Liu, Xianling Mao, Rui Fang, and Dangyang Chen. 2022. Improving personality consistency in conversation by persona extending. In Proceedings of the 31st ACM International Conference on Information & Knowledge Management. 1350–1359.

[34] Ryan Louie, Ananjan Nandi, William Fang, Cheng Chang, Emma Brunskill, and Diyi Yang. 2024. Roleplay-doh: Enabling domain-experts to create llm-simulated patients via eliciting and adhering to principles. arXiv preprint arXiv:2407.00870 (2024).

[35] Dongxu Lu, Johan Jeuring, and Albert Gatt. 2025. Evaluating LLM-Generated Versus Human-Authored Responses in Role-Play Dialogues. In Proceedings of the 18th International Natural Language Generation Conference. 20–40.

[36] Pedro Henrique Luz de Araujo, Michael A. Hedderich, Ali Modarressi, Hinrich Schuetze, and Benjamin Roth. 2026. Persistent Personas? Role-Playing, Instruction Following, and Safety in Extended Interactions. In Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers). Association ... doi:10.18653/v1/2026.eacl-long.246

[37] Andrew Maxim, Roshan Venkatakrishnan, and Benjamin Lok. 2025. Perceived Realism and Voice Naturalness of Virtual Humans: Structural Equation Modeling of Behavioral Intentions among Black American Adults. ACM Transactions on Applied Perception 22, 4 (2025), 1–16.

[38] Meghan M McGinn, Pamela T McFarland, and Andrew Christensen. 2009. Antecedents and consequences of demand/withdraw. Journal of Family Psychology 23, 5 (2009), 749.

[39] Richard B Miller, Jeremy B Yorgason, Jonathan G Sandberg, and Mark B White. 2003. Problems that couples bring to therapy: A view across the family life cycle. The American Journal of Family Therapy 31, 5 (2003), 395–407.

[41] Rajeswari Natrajan-Tyagi, Nicole Sabatini Gutierrez, Marlene Elizalde, Vivian Mai Phan, Jacquelyn F Christensen, Linda Crossley, Kimberly Drollinger, Maya Faulk, Rebeca Garcia, and Crystal Meng. 2016. Using a constant family in role-plays for training MFTs. The American Journal of Family Therapy 44, 5 (2016), 221–233.

[42] Lauren M. Papp, Chrystyna D. Kouros, and E. Mark Cummings. 2009. Demand-Withdraw Patterns in Marital Conflict in the Home. Personal Relationships 16, 2 (2009), 285–300. doi:10.1111/j.1475-6811.2009.01223.x
[43]

Lauren M Papp, Chrystyna D Kouros, and E Mark Cummings. 2009. Demand- withdraw patterns in marital conflict in the home.Personal Relationships16, 2 (2009), 285–300

work page 2009
[44]

Marie-Aude Piot, Agnès Dechartres, Chris Attoe, Fabrice Jollant, Cédric Lemogne, Carine Layat Burn, Jan-Joost Rethans, Daphne Michelet, Sean Cross, Gregoire Billon, et al. 2020. Simulation in psychiatry for medical doctors: a systematic review and meta-analysis.Medical education54, 8 (2020), 696–708

work page 2020
[45]

Fredric E Rabinowitz. 1997. Teaching counseling through a semester-long role play.Counselor Education and Supervision36, 3 (1997), 216–223

work page 1997
[46]

Karen H Rosen, Jennifer L Matheson, Sandra M Stith, Eric E McCollum, and Lisa D Locke. 2003. Negotiated time-out: a de-escalation tool for couples.Journal of Marital and Family Therapy29, 3 (2003), 291–298

work page 2003
[47]

Sujin Shin, Jin-Hwa Park, and Jung-Hee Kim. 2015. Effectiveness of patient simulation in nursing education: meta-analysis.Nurse education today35, 1 (2015), 176–182

work page 2015
[48]

W Matthew Shurts, Craig S Cashwell, Shawn L Spurgeon, Suzanne Degges-White, Casey A Barrio, and Kerrie N Kardatzke. 2006. Preparing counselors-in-training Wang et al. to work with couples: Using role-plays and reflecting teams.The Family Journal 14, 2 (2006), 151–157

work page 2006
[49]

Eric Smith, Orion Hsu, Rebecca Qian, Stephen Roller, Y-Lan Boureau, and Jason Weston. 2022. Human evaluation of conversations is an open problem: comparing the sensitivity of various methods for evaluating dialogue agents. InProceedings of the 4th Workshop on NLP for Conversational AI. 77–97

work page 2022
[50]

Douglas K Snyder, Angela M Castellani, and Mark A Whisman. 2006. Current status and future directions in couple therapy.Annu. Rev. Psychol.57, 1 (2006), 317–344. doi:10.1146/annurev.psych.56.091103.070154

work page doi:10.1146/annurev.psych.56.091103.070154 2006
[51]

Ian Steenstra, Farnaz Nouraei, and Timothy Bickmore. 2025. Scaffolding Empathy: Training Counselors with Simulated Patients and Utterance-level Performance Visualizations. InProceedings of the 2025 CHI Conference on Human Factors in Computing Systems. Association for Computing Machinery, New York, NY, USA, Article 593, 22 pages. doi:10.1145/3706598.3714014

work page doi:10.1145/3706598.3714014 2025
[52]

Thomas Yu Chow Tam, Sonish Sivarajkumar, Sumit Kapoor, Alisa V Stolyar, Katelyn Polanska, Karleigh R McCarthy, Hunter Osterhoudt, Xizhi Wu, Shyam Visweswaran, Sunyang Fu, et al. 2024. A framework for human evaluation of large language models in healthcare derived from literature review.NPJ digital medicine7, 1 (2024), 258

work page 2024
[53]

Ruiyi Wang, Stephanie Milani, Jamie C Chiu, Jiayin Zhi, Shaun M Eack, Travis Labrum, Samuel M Murphy, Nev Jones, Kate Hardy, Hong Shen, et al . 2024. Patient-{\Psi}: Using large language models to simulate patients for training mental health professionals.arXiv preprint arXiv:2405.19660(2024)

work page arXiv 2024
[54]

Mark A Whisman, Amy E Dixon, and Benjamin Johnson. 1997. Therapists’ perspectives of couple problems and treatment issues in couple therapy.Journal of family psychology11, 3 (1997), 361

work page 1997
[55]

Scott R Woolley, Karen S Wampler, and Sean D Davis. 2012. Enactments in couple therapy: Identifying therapist interventions associated with positive change.Journal of Family Therapy34, 3 (2012), 284–305

work page 2012
[56]

Yizhe Yang, Palakorn Achananuparp, Heyan Huang, Jing Jiang, Nicholas Gabriel Lim, Cameron Tan Shi Ern, Phey Ling Kit, Jenny Giam Xiuhui, John Pinto, and Ee-Peng Lim. 2025. Consistent Client Simulation for Motivational Interviewing-based Counseling. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long P...

[57]

Steve Young, Milica Gasic, Blaise Thomson, and Jason D. Williams. 2013. POMDP-based Statistical Spoken Dialogue Systems: A Review. Proc. IEEE 101, 5 (2013), 1160–1179. doi:10.1109/JPROC.2012.2225812

[58]

Benjamin Zendejas, Ryan Brydges, Amy T Wang, and David A Cook. 2013. Patient outcomes in simulation-based medical education: a systematic review. Journal of general internal medicine 28, 8 (2013), 1078–1089

[59]

Lianmin Zheng, Wei-Lin Chiang, Ying Sheng, Siyuan Zhuang, Zhanghao Wu, Yonghao Zhuang, Zi Lin, Zhuohan Li, Dacheng Li, Eric Xing, et al. 2023. Judging llm-as-a-judge with mt-bench and chatbot arena. Advances in neural information processing systems 36 (2023), 46595–46623.

Only greetings -> Greeting
Introducing issues -> Problem Raising
Ongoing anger/blame/defensiveness -> Escalation
Therapist calming, no vulnerability -> De-Escalation
Vulnerable emotions expressed -> Enactment
Session closing -> Wrap-up

C.2 Rule-Based Stage Transition Constraints

Table 3: Rule-Based Stage Transition Constraints

Condition | Result | Rationale
Turn ≤ 5 | No Escalation allowed yet | Ensures trainees have sufficient context about the couple’s issues
Turn 7 AND no prior Escalation AND stage = Problem Raising | Force Escalation | Guarantees conflict exposure for a...
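The two Table 3 rows that are legible here (Turn ≤ 5 blocks Escalation; Turn 7 with no prior Escalation while in Problem Raising forces Escalation) can be sketched as a small post-processing step on a proposed stage. This is a minimal illustration, not the authors' implementation: the names `Stage` and `apply_transition_constraints` are hypothetical, "Turn 7" is read as an exact match on turn 7, and falling back to the current stage when Escalation is blocked is an assumption. Any further rows of Table 3 are truncated in this copy and are not modeled.

```python
from enum import Enum


class Stage(Enum):
    # The six stage labels listed in the classification rules above.
    GREETING = "Greeting"
    PROBLEM_RAISING = "Problem Raising"
    ESCALATION = "Escalation"
    DE_ESCALATION = "De-Escalation"
    ENACTMENT = "Enactment"
    WRAP_UP = "Wrap-up"


def apply_transition_constraints(
    proposed: Stage, current: Stage, turn: int, escalated_before: bool
) -> Stage:
    """Apply only the two Table 3 rules that are visible in this copy."""
    # Rule 1: Turn <= 5 -> no Escalation allowed yet.
    # Assumption: a blocked Escalation keeps the current stage.
    if turn <= 5 and proposed is Stage.ESCALATION:
        return current
    # Rule 2: Turn 7 AND no prior Escalation AND stage = Problem Raising
    # -> force Escalation. Assumption: "Turn 7" means turn == 7.
    if turn == 7 and not escalated_before and current is Stage.PROBLEM_RAISING:
        return Stage.ESCALATION
    # No visible rule applies: accept the proposed stage unchanged.
    return proposed
```

For example, a proposed Escalation at turn 3 would be held at the current stage, while a session still in Problem Raising at turn 7 with no earlier Escalation would be moved to Escalation regardless of the proposal.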

Therapist sends a message and is not being ignored -> "therapist"
Therapist directly addresses one patient by name (e.g., "Alex," or "Jordan.") -> that patient
Therapist message is not directed at anyone -> "both"
Alex says "you" referring to Jordan's actions -> "Jordan"
Jordan says "you" referring to Alex's actions -> "Alex"
Alex speaks directly to Jordan -> "Jordan"
Jordan speaks directly to Alex -> "Alex"
Alex or Jordan speak without addressing the other -> "therapist"

Constraint: The therapist never replies to themselves. If the last message is from the therapist, only "Alex", "Jordan", or "both" are valid.

D.4 Text-to-speech (TTS) Prompts for Alex and Jordan

Alex (Demander)
Neutral: Serious, subdued tone; gentle sadness, slight heaviness or sigh at sentence ...
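Returning to the routing constraint stated before D.4 (the therapist never replies to themselves), a minimal sketch of how such a constraint could be enforced on a router's output is given here. The function name `enforce_no_self_reply`, the `VALID_LABELS` tuple, and the choice of "both" as the fallback label are assumptions made for illustration; the source only states which labels are valid after a therapist message.

```python
# Routing labels that appear in the rules above.
VALID_LABELS = ("therapist", "Alex", "Jordan", "both")


def enforce_no_self_reply(
    last_speaker: str, proposed: str, fallback: str = "both"
) -> str:
    """Reject a "therapist" label when the therapist spoke last."""
    if proposed not in VALID_LABELS:
        raise ValueError(f"unknown routing label: {proposed!r}")
    # Constraint from the source: if the last message is from the
    # therapist, only "Alex", "Jordan", or "both" are valid.
    # Assumption: an invalid "therapist" label is replaced by `fallback`.
    if last_speaker == "therapist" and proposed == "therapist":
        return fallback
    return proposed
```

A router that proposes "therapist" right after a therapist message would be corrected to "both" under this sketch, while every other proposal passes through unchanged.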

