
arxiv: 2601.10970 · v2 · submitted 2026-01-16 · 💻 cs.CY · cs.HC

Simulating Couple Conflict: Designing A Multi-Agent System for Therapy Training and Practice

Pith reviewed 2026-05-16 14:13 UTC · model grok-4.3

classification 💻 cs.CY cs.HC
keywords multi-agent simulation · couples therapy training · demand-withdraw conflict · stateful agents · sense-plan-act architecture · therapist practice · simulation evaluation

The pith

A stateful multi-agent simulation lets therapists practice couple conflict with consistent, theory-based responses.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a multi-agent system that models couples therapy as a controlled dynamical process in which two client agents move through six stages of demand-withdraw conflict. The system uses a sense-plan-act loop: it senses the therapist's words, updates each agent's internal state according to psychotherapy theory and transcript patterns, and then generates verbal and emotional replies. In a study with 21 licensed U.S. therapists, participants identified the hidden state transitions more accurately and rated the simulation higher for realism and responsiveness than a prompt-only baseline. Traditional role-play lacks repeatability and precise control over emotional dynamics, so the stateful approach supplies repeatable practice that stays grounded in observed interaction sequences.

Core claim

The authors claim that representing therapy sessions as a multi-agent dynamical system with six evolving stages, updated via a sense-plan-act architecture that draws on psychotherapy theory and transcript analysis, produces responses that licensed therapists judge more realistic and that allow more accurate detection of state changes than non-stateful prompt baselines.

What carries the argument

The sense-plan-act architecture that detects therapist input, updates client-agent states across six demand-withdraw stages using theory and transcripts, and generates verbal plus emotional outputs.
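
A minimal sketch of how such a loop could be wired together. The names (`SessionState`, `sense`, `plan`, `act`, `step`) and the keyword-based cue detection are hypothetical stand-ins for whatever the authors actually use. The six stage names come from the stage list extracted from the paper's appendix; the transition logic here is illustrative only and is not the paper's rule set.

```python
from dataclasses import dataclass, field
from enum import Enum


class Stage(Enum):
    GREETING = "Greeting"
    PROBLEM_RAISING = "Problem Raising"
    ESCALATION = "Escalation"
    DE_ESCALATION = "De-Escalation"
    ENACTMENT = "Enactment"
    WRAP_UP = "Wrap-up"


@dataclass
class SessionState:
    stage: Stage = Stage.GREETING
    turn: int = 0
    history: list = field(default_factory=list)


def sense(utterance: str) -> dict:
    """Sense: extract coarse cues from the therapist's words.

    Assumption: keyword matching stands in for a real classifier or LLM call.
    """
    text = utterance.lower()
    return {
        "calming": any(w in text for w in ("slow down", "pause", "breath")),
        "closing": any(w in text for w in ("wrap up", "next week")),
    }


def plan(state: SessionState, cues: dict) -> Stage:
    """Plan: choose the next stage. Toy rules, not the paper's controller."""
    if cues["closing"]:
        return Stage.WRAP_UP
    if state.stage is Stage.GREETING:
        return Stage.PROBLEM_RAISING
    if state.stage is Stage.ESCALATION and cues["calming"]:
        return Stage.DE_ESCALATION
    return state.stage


def act(stage: Stage) -> dict:
    """Act: return a placeholder verbal reply plus an emotion label."""
    emotions = {Stage.ESCALATION: "angry", Stage.DE_ESCALATION: "guarded"}
    return {"text": f"[{stage.value} reply]", "emotion": emotions.get(stage, "neutral")}


def step(state: SessionState, utterance: str) -> dict:
    """Run one closed-loop turn: sense, then plan, then act."""
    state.turn += 1
    cues = sense(utterance)
    state.stage = plan(state, cues)
    reply = act(state.stage)
    state.history.append((utterance, state.stage, reply))
    return reply
```

Keeping the stage in an explicit `SessionState` object, rather than leaving it implicit in a prompt, is what makes transitions inspectable and repeatable across runs.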

If this is right

  • Therapists obtain repeatable, controllable practice sessions for recognizing and responding to evolving emotional states.
  • The closed-loop design supplies consistent feedback on how specific interventions shift the interaction.
  • State transitions become explicit and trackable, supporting deliberate practice on timing and phrasing.
  • Evaluation results indicate higher accuracy in identifying conflict-stage changes compared with prompt-based alternatives.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The same stage-and-update structure could be adapted to simulate other recurring conflict patterns by redefining the state rules from new transcript sets.
  • Pairing the simulation with automated logging of intervention effects might surface which therapist moves most reliably de-escalate demand-withdraw cycles.
  • Deploying the system across many training sessions could accumulate data on response patterns that are difficult to isolate in live supervision.

Load-bearing premise

The state updates drawn from psychotherapy theory and transcript analysis correctly capture how real couples react to therapist interventions in demand-withdraw patterns.

What would settle it

A side-by-side check of whether the six-stage transitions and agent replies produced by the simulation match the actual sequence of statements and emotional shifts observed in recorded real therapy sessions with the same conflict pattern.
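
One way such a side-by-side check could be scored is to compare how often each stage-to-stage transition occurs in simulated sessions versus hand-coded real sessions. The sketch below uses total variation distance between the two empirical transition distributions. The metric, the function names, and the toy sequences are assumptions for illustration; the paper does not report this analysis.

```python
from collections import Counter


def transition_distribution(sequences):
    """Return the relative frequency of each (from_stage, to_stage) pair."""
    counts = Counter()
    for seq in sequences:
        counts.update(zip(seq, seq[1:]))
    total = sum(counts.values())
    if total == 0:
        return {}
    return {pair: n / total for pair, n in counts.items()}


def total_variation(p, q):
    """Total variation distance: 0.0 means identical, 1.0 means disjoint."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


# Hypothetical stage sequences, one list per session.
simulated = [["Greeting", "Problem Raising", "Escalation"]]
observed = [["Greeting", "Problem Raising", "De-Escalation"]]

distance = total_variation(
    transition_distribution(simulated),
    transition_distribution(observed),
)
```

In this toy case the two sessions share one of their two transitions, so `distance` comes out at 0.5. A real comparison would need stage labels on recorded sessions from independent coders.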

Figures

Figures reproduced from arXiv: 2601.10970 by Angela Chen, Canwen Wang, Catherine Bao, Haiyi Zhu, Holly Swartz, Robert E Kraut, Siwei Jin, Tongshuang Wu.

Figure 1: System Overview. (1) Sense-Plan-Act Architecture: Detect inputs from therapist and couple agents, follow the designed stage controller rules to determine the interaction stage, and then generate responses appropriate to the stage output. (2) Interface overview with sample conversation from Escalation stage, showing the multimodal therapy session with agent responses (text, voice, and emotion indicator); se…
3.2.2 Agent-to-Agent Interaction. Our interviews with expert therapists also suggest that the demand-withdraw pattern typically emerges during the problem-raising and escalation stages characterized by partner-to-partner interactions. Therefore, we explicitly simulate demand-withdraw conversations in those two stages by promoting agent-to-agent interactions. Our system constantly predicts who the next s…
Figure 2: Stage transition rules from the stage-based interaction controller (highlighted in yellow), along with example
Figure 4: Interaction plot showing the realism of behavior
Figure 5: Interface presented to participants (with agent role labels and stage indicators removed)
Original abstract

Couples therapy requires managing complex, evolving emotional dynamics between partners, but traditional training methods for therapists, like role-play, lack realism, consistency, and control. We present a multi-modal simulation that models therapy as a controlled, multi-agent dynamical system with structured interaction stages. Therapists practice with a pair of client-agents who go through six evolving stages that respond to therapist actions. This simulation enables practice with demand-withdraw conflict patterns in a closed-loop environment. The simulation uses a sense-plan-act architecture: it detects the therapist's input, updates agents' interaction states based on psychotherapy theory and transcript analysis, and generates realistic verbal and emotional responses. In an experiment with 21 licensed U.S. therapists, participants more accurately identified state transitions and rated the system as more realistic and responsive than a prompt-based baseline, demonstrating the value of stateful, interpretable simulation for therapist training.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper presents a multi-agent simulation for couples therapy training that models demand-withdraw conflict via a six-stage state machine updated from psychotherapy theory and transcript analysis. Using a sense-plan-act architecture, client agents generate verbal and emotional responses in a closed-loop setting. A user study with 21 licensed U.S. therapists reports higher accuracy in identifying state transitions and better ratings for realism and responsiveness compared to a prompt-based baseline.

Significance. If the state transitions prove faithful to real couple dynamics, the work provides a valuable controlled environment for therapist training that improves on inconsistent role-play methods. The direct evidence from the 21-therapist study—showing measurable gains in state identification and realism ratings—is a clear strength and supports the value of stateful, interpretable multi-agent designs over purely prompt-driven baselines.

major comments (2)
  1. [Experiment section] The central claim that the simulation offers realistic training value rests on the six-stage state machine accurately modeling real demand-withdraw responses. However, the evaluation only asks therapists to track the system's own labels; no held-out transcript prediction, inter-rater agreement on state assignments, or comparison of simulated versus observed response distributions is reported (Experiment section).
  2. [Model architecture section] State updates are described as derived from psychotherapy theory and transcript analysis, yet no quantitative fidelity check (e.g., predictive accuracy on unseen data or expert validation of transition rules) is provided. This is load-bearing because therapists' improved identification scores could reflect internal consistency rather than external validity (Model architecture section).
minor comments (2)
  1. [Abstract] The abstract omits exact performance metrics, statistical tests, and details on the prompt-based baseline implementation, making it difficult to assess the strength of the reported improvements.
  2. [System design section] Notation for the sense-plan-act loop and state-transition functions could be clarified with a diagram or pseudocode to aid reproducibility.
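
The inter-rater agreement requested in major comment 1 is commonly reported as Cohen's kappa, which corrects raw agreement for the agreement expected by chance. A small worked sketch, assuming two hypothetical raters who each assign one stage label per turn. The labels are invented for illustration and are not data from the paper.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must label the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: share of items with identical labels.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical per-turn stage labels from two coders.
coder_1 = ["Escalation", "Escalation", "De-Escalation", "De-Escalation"]
coder_2 = ["Escalation", "Escalation", "De-Escalation", "Escalation"]
kappa = cohens_kappa(coder_1, coder_2)
```

Here observed agreement is 0.75 and chance agreement is 0.5, so `kappa` is 0.5.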

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. We address each major comment below, clarifying the role of the therapist evaluation while acknowledging limitations in external validation. Revisions have been made to improve transparency on these points.

Point-by-point responses
  1. Referee: [Experiment section] The central claim that the simulation offers realistic training value rests on the six-stage state machine accurately modeling real demand-withdraw responses. However, the evaluation only asks therapists to track the system's own labels; no held-out transcript prediction, inter-rater agreement on state assignments, or comparison of simulated versus observed response distributions is reported (Experiment section).

    Authors: We agree that the evaluation centers on therapists identifying the system's predefined state labels rather than independent prediction of real transcripts. This design choice prioritizes demonstrating training utility and interpretability in a controlled setting, where consistent state tracking is essential for skill practice. The 21-therapist study shows statistically higher identification accuracy and realism ratings versus the prompt baseline, providing evidence that the states support effective training. We acknowledge the absence of held-out transcript prediction or direct distribution comparisons as a limitation. In the revised manuscript, we have expanded the Experiment section with a dedicated limitations paragraph outlining these gaps and proposing future work on transcript-based validation. No inter-rater agreement on state assignments was computed, as states are system-defined for training consistency rather than derived from open-ended coding. revision: partial

  2. Referee: [Model architecture section] State updates are described as derived from psychotherapy theory and transcript analysis, yet no quantitative fidelity check (e.g., predictive accuracy on unseen data or expert validation of transition rules) is provided. This is load-bearing because therapists' improved identification scores could reflect internal consistency rather than external validity (Model architecture section).

    Authors: The transition rules were constructed from established demand-withdraw literature (e.g., Christensen & Heavey) combined with qualitative review of therapy transcripts to map observed verbal and emotional patterns to the six stages. We have revised the Model architecture section to include additional detail on this derivation process and the clinical expertise of the research team. We concur that a quantitative fidelity metric, such as predictive accuracy on unseen transcripts, is not reported and would strengthen claims of external validity. The therapist study's superior performance over the prompt-based baseline helps mitigate concerns of mere internal consistency, as the baseline shares the same generative model but lacks structured state updates. We have added explicit discussion of this distinction and noted the lack of formal predictive validation as a limitation for future research. revision: yes

Circularity Check

0 steps flagged

No significant circularity; evaluation is independent of internal definitions

full rationale

The paper defines a six-stage state machine from external psychotherapy theory and transcript analysis, then evaluates the resulting simulation through an external experiment with 21 licensed therapists who rate realism, responsiveness, and state-transition identification accuracy against a prompt-based baseline. No parameters are fitted to the evaluation outcomes, no self-citation chain supports the core claims, and the reported results (human preference and identification accuracy) do not reduce by construction to the input rules or definitions. The derivation chain remains self-contained against external human benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The simulation rests on standard psychotherapy concepts applied to agents rather than new fitted parameters or invented entities.

axioms (2)
  • domain assumption: Psychotherapy theory and transcript analysis provide a valid basis for defining interaction stages and state transitions.
    Invoked to update agent states from therapist inputs in the sense-plan-act loop.
  • domain assumption: The six stages accurately represent evolving demand-withdraw conflict dynamics.
    Central to the structured interaction model described.

pith-pipeline@v0.9.0 · 5472 in / 1320 out tokens · 38301 ms · 2026-05-16T14:13:13.362799+00:00 · methodology



Reference graph

Works this paper leans on

73 extracted references · 73 canonical work pages

  1. [1]

    Counseling and Psychotherapy Transcripts, Client Narratives, and Reference Works

    2025. Counseling and Psychotherapy Transcripts, Client Narratives, and Reference Works. Subscription Database. Accessed on 2025-04-28. URL typically provided via institutional access. General product information: https://alexanderstreet.com/products/counseling-and-psychotherapy-transcripts-client-narratives-and-reference-works

  2. [2]

    Marwa Abdulhai, Ryan Cheng, Donovan Clay, Tim Althoff, Sergey Levine, and Natasha Jaques. 2025. Consistently simulating human personas with multi-turn reinforcement learning. arXiv preprint arXiv:2511.00222 (2025).

  3. [3]

    Jess K Alberts and Gillian Driscoll. 1992. Containment versus escalation: The trajectory of couples’ conversational complaints. Western Journal of Communication (includes Communication Reports) 56, 4 (1992), 394–412. doi:10.1080/10570319209374425

  4. [4]

    Elizabeth S Allen, David C Atkins, Donald H Baucom, Douglas K Snyder, Kristina Coop Gordon, and Shirley P Glass. 2005. Intrapersonal, interpersonal, and contextual factors in engaging in and responding to extramarital involvement. Clinical Psychology: Science and Practice 12, 2 (2005), 101

  5. [5]

    Mohammad Almansoori, Komal Kumar, and Hisham Cholakkal. 2025. MedAgentSim: Self-Evolving Multi-Agent Simulations for Realistic Clinical Interactions. In Medical Image Computing and Computer Assisted Intervention – MICCAI 2025, Vol. LNCS 15968. Springer Nature Switzerland, 362–372

  6. [6]

    Mina Almasi and Ross Deans Kristensen-McLachlan. 2025. Alignment Drift in CEFR-prompted LLMs for Interactive Spanish Tutoring. In Proceedings of the 20th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2025). Association for Computational Linguistics, Vienna, Austria, 70–88. doi:10.18653/v1/2025.bea-1.6

  7. [7]

    Lotta G Andersson, Mark H Butler, and Ryan B Seedall. 2006. Couples’ experience of enactments and softening in marital therapy. The American Journal of Family Therapy 34, 4 (2006), 301–315

  8. [8]

    Lisa A Benson, Meghan M McGinn, and Andrew Christensen. 2012. Common principles of couple therapy. Behavior therapy 43, 1 (2012), 25–35. doi:10.1016/j.beth.2010.12.009

  9. [9]

    Guanqun Bi, Zhuang Chen, Zhoufu Liu, Hongkai Wang, Xiyao Xiao, Yuqiang Xie, Wen Zhang, Yongkang Huang, Yuxuan Chen, Libiao Peng, and Minlie Huang

  10. [10]

    MAGI: Multi-Agent Guided Interview for Psychiatric Assessment. In Findings of the Association for Computational Linguistics: ACL 2025. Association for Computational Linguistics, Vienna, Austria, 24898–24921. doi:10.18653/v1/2025.findings-acl.1278

  11. [11]

    Mark H Butler and Brandt C Gardner. 2003. Adapting enactments to couple reactivity: Five developmental stages. Journal of Marital and Family therapy 29, 3 (2003), 311–327

  12. [12]

    Prafulla Kumar Choubey, Xiangyu Peng, Shilpa Bhagavath, Caiming Xiong, Shiva Kumar Pentyala, and Chien-Sheng Wu. 2025. Turning Conversations into Workflows: A Framework to Extract and Evaluate Dialog Workflows for Service AI Agents. In Findings of the Association for Computational Linguistics: ACL 2025. Association for Computational Linguistics, Vienna, Au...

  13. [13]

    Andrew Christensen and Christopher L. Heavey. 1990. Gender and Social Structure in the Demand/Withdraw Pattern of Marital Conflict. Journal of Personality and Social Psychology 59, 1 (1990), 73–81. doi:10.1037/0022-3514.59.1.73

  14. [14]

    David A. Cook, Patricia J. Erwin, and Marc M. Triola. 2010. Computerized Virtual Patients in Health Professions Education: A Systematic Review and Meta-Analysis. Academic Medicine 85, 10 (2010), 1589–1602. doi:10.1097/ACM.0b013e3181edfe13

  15. [15]

    Alexander O Crenshaw, Andrew Christensen, Donald H Baucom, Norman B Epstein, and Brian RW Baucom. 2017. Revised scoring and improved reliability for the Communication Patterns Questionnaire. Psychological assessment 29, 7 (2017), 913

  16. [16]

    William J Doherty, Steven M Harris, and Kadija Mussa. 2024. Relationship undermining in couple therapy. Contemporary Family Therapy 46, 3 (2024), 243–248

  17. [17]

    Brian D Doss, McKenzie K Roddy, Stephanie A Wiebe, and Susan M Johnson

  18. [18]

    A review of the research during 2010–2019 on evidence-based treatments for couple relationship distress. Journal of marital and family therapy 48, 1 (2022), 283–306. doi:10.1111/jmft.12552

  19. [19]

    Brian D Doss, Lorelei E Simpson, and Andrew Christensen. 2004. Why do couples seek marital therapy? Professional psychology: Research and practice 35, 6 (2004), 608

  20. [20]

    Kathleen A Eldridge, Andrew Christensen, et al. 2002. Demand-withdraw communication during couple conflict: A review and analysis. Understanding marriage: Developments in the study of couple interaction (2002), 289–322

  21. [21]

    Daniel J Fischer and Brandi C Fink. 2014. Clinical processes in behavioral couples therapy. Psychotherapy 51, 1 (2014), 11. doi:10.1037/a0033823

  22. [22]

    Ananya Ganesh, Martha Palmer, and Katharina Kann. 2023. A Survey of Challenges and Methods in the Computational Modeling of Multi-Party Dialog. In Proceedings of the 5th Workshop on NLP for Conversational AI (NLP4ConvAI 2023). Association for Computational Linguistics, Toronto, Canada, 140–154. doi:10.18653/v1/2023.nlp4convai-1.12

  23. [23]

    John M Gottman. 2017. The roles of conflict engagement, escalation, and avoidance in marital interaction: A longitudinal view of five types of couples. In Interpersonal development. Routledge, 359–368

  24. [24]

    Paul S. Greenman and Susan M. Johnson. 2013. Process Research on Emotionally Focused Therapy (EFT) for Couples: Linking Theory to Practice. Family Process 52, 1 (2013), 46–61. doi:10.1111/famp.12015

  25. [25]

    Clinical handbook of couple therapy

    Alan S Gurman. 2008. Clinical handbook of couple therapy. Guilford Press

  26. [26]

    Clinical handbook of marriage and couples interventions

    W Halford and Howard J Markman. 1997. Clinical handbook of marriage and couples interventions. John Wiley & Sons, Inc

  27. [27]

    Patricia Huerta, Caitlin Edwards, Ronald Asiimwe, Morgan PettyJohn, Jennifer VanBoxel, Preston Morgan, and Andrea K Wittenborn. 2023. Exploratory analysis of pursue-withdraw patterns, attachment, and gender among couples in emotionally focused therapy. The American Journal of Family Therapy 51, 1 (2023), 57–75. doi:10.1080/01926187.2022.2129521

  28. [28]

    Susan M. Johnson and Paul S. Greenman. 2006. The Path to a Secure Bond: Emotionally Focused Couple Therapy. Journal of Clinical Psychology 62, 5 (2006), 597–609. doi:10.1002/jclp.20251

  29. [29]

    Ronald C Kessler, Katherine A McGonagle, Shanyang Zhao, Christopher B Nelson, Michael Hughes, Suzann Eshleman, Hans-Ulrich Wittchen, and Kenneth S Kendler. 1994. Lifetime and 12-month prevalence of DSM-III-R psychiatric disorders in the United States: results from the National Comorbidity Survey. Archives of general psychiatry 51, 1 (1994), 8–19

  30. [30]

    Claire Lane and Stephen Rollnick. 2007. The use of simulated patients and role-play in communication skills training: a review of the literature to August 2005. Patient education and counseling 67, 1-2 (2007), 13–20

  31. [31]

    Zhigen Li, Jianxiang Peng, Yanmeng Wang, Yong Cao, Tianhao Shen, Minghui Zhang, Linxi Su, Shang Wu, Yihang Wu, YuQian Wang, Ye Wang, Wei Hu, Jianfeng Li, Shaojun Wang, Jing Xiao, and Deyi Xiong. 2025. ChatSOP: An SOP-Guided MCTS Planning Framework for Controllable LLM Dialogue Agents. In Proceedings of the 63rd Annual Meeting of the Association for Computa...

  32. [32]

    Yusheng Liao, Yutong Meng, Yuhao Wang, Hongcheng Liu, Yanfeng Wang, and Yu Wang. 2024. Automatic Interactive Evaluation for Large Language Models with State Aware Patient Simulator. arXiv preprint arXiv:2403.08495 (2024). https://arxiv.org/abs/2403.08495

  33. [33]

    Yifan Liu, Wei Wei, Jiayi Liu, Xianling Mao, Rui Fang, and Dangyang Chen. 2022. Improving personality consistency in conversation by persona extending. In Proceedings of the 31st ACM International Conference on Information & Knowledge Management. 1350–1359

  34. [34]

    Ryan Louie, Ananjan Nandi, William Fang, Cheng Chang, Emma Brunskill, and Diyi Yang. 2024. Roleplay-doh: Enabling domain-experts to create llm-simulated patients via eliciting and adhering to principles. arXiv preprint arXiv:2407.00870 (2024)

  35. [35]

    Dongxu Lu, Johan Jeuring, and Albert Gatt. 2025. Evaluating LLM-Generated Versus Human-Authored Responses in Role-Play Dialogues. In Proceedings of the 18th International Natural Language Generation Conference. 20–40

  36. [36]

    Pedro Henrique Luz de Araujo, Michael A. Hedderich, Ali Modarressi, Hinrich Schuetze, and Benjamin Roth. 2026. Persistent Personas? Role-Playing, Instruction Following, and Safety in Extended Interactions. In Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers). Association ...

  37. [37]

    Andrew Maxim, Roshan Venkatakrishnan, and Benjamin Lok. 2025. Perceived Realism and Voice Naturalness of Virtual Humans: Structural Equation Modeling of Behavioral Intentions among Black American Adults. ACM Transactions on Applied Perception 22, 4 (2025), 1–16

  38. [38]

    Meghan M McGinn, Pamela T McFarland, and Andrew Christensen. 2009. Antecedents and consequences of demand/withdraw. Journal of Family Psychology 23, 5 (2009), 749

  39. [39]

    Richard B Miller, Jeremy B Yorgason, Jonathan G Sandberg, and Mark B White

  40. [40]

    Problems that couples bring to therapy: A view across the family life cycle. The American Journal of Family Therapy 31, 5 (2003), 395–407

  41. [41]

    Rajeswari Natrajan-Tyagi, Nicole Sabatini Gutierrez, Marlene Elizalde, Vivian Mai Phan, Jacquelyn F Christensen, Linda Crossley, Kimberly Drollinger, Maya Faulk, Rebeca Garcia, and Crystal Meng. 2016. Using a constant family in role-plays for training MFTs. The American Journal of Family Therapy 44, 5 (2016), 221–233

  42. [42]

    Lauren M. Papp, Chrystyna D. Kouros, and E. Mark Cummings. 2009. Demand-Withdraw Patterns in Marital Conflict in the Home. Personal Relationships 16, 2 (2009), 285–300. doi:10.1111/j.1475-6811.2009.01223.x

  43. [43]

    Lauren M Papp, Chrystyna D Kouros, and E Mark Cummings. 2009. Demand-withdraw patterns in marital conflict in the home. Personal Relationships 16, 2 (2009), 285–300

  44. [44]

    Marie-Aude Piot, Agnès Dechartres, Chris Attoe, Fabrice Jollant, Cédric Lemogne, Carine Layat Burn, Jan-Joost Rethans, Daphne Michelet, Sean Cross, Gregoire Billon, et al. 2020. Simulation in psychiatry for medical doctors: a systematic review and meta-analysis. Medical education 54, 8 (2020), 696–708

  45. [45]

    Fredric E Rabinowitz. 1997. Teaching counseling through a semester-long role play. Counselor Education and Supervision 36, 3 (1997), 216–223

  46. [46]

    Karen H Rosen, Jennifer L Matheson, Sandra M Stith, Eric E McCollum, and Lisa D Locke. 2003. Negotiated time-out: a de-escalation tool for couples. Journal of Marital and Family Therapy 29, 3 (2003), 291–298

  47. [47]

    Sujin Shin, Jin-Hwa Park, and Jung-Hee Kim. 2015. Effectiveness of patient simulation in nursing education: meta-analysis. Nurse education today 35, 1 (2015), 176–182

  48. [48]

    W Matthew Shurts, Craig S Cashwell, Shawn L Spurgeon, Suzanne Degges-White, Casey A Barrio, and Kerrie N Kardatzke. 2006. Preparing counselors-in-training to work with couples: Using role-plays and reflecting teams. The Family Journal 14, 2 (2006), 151–157

  49. [49]

    Eric Smith, Orion Hsu, Rebecca Qian, Stephen Roller, Y-Lan Boureau, and Jason Weston. 2022. Human evaluation of conversations is an open problem: comparing the sensitivity of various methods for evaluating dialogue agents. In Proceedings of the 4th Workshop on NLP for Conversational AI. 77–97

  50. [50]

    Douglas K Snyder, Angela M Castellani, and Mark A Whisman. 2006. Current status and future directions in couple therapy. Annu. Rev. Psychol. 57, 1 (2006), 317–344. doi:10.1146/annurev.psych.56.091103.070154

  51. [51]

    Ian Steenstra, Farnaz Nouraei, and Timothy Bickmore. 2025. Scaffolding Empathy: Training Counselors with Simulated Patients and Utterance-level Performance Visualizations. In Proceedings of the 2025 CHI Conference on Human Factors in Computing Systems. Association for Computing Machinery, New York, NY, USA, Article 593, 22 pages. doi:10.1145/3706598.3714014

  52. [52]

    Thomas Yu Chow Tam, Sonish Sivarajkumar, Sumit Kapoor, Alisa V Stolyar, Katelyn Polanska, Karleigh R McCarthy, Hunter Osterhoudt, Xizhi Wu, Shyam Visweswaran, Sunyang Fu, et al. 2024. A framework for human evaluation of large language models in healthcare derived from literature review. NPJ digital medicine 7, 1 (2024), 258

  53. [53]

    Ruiyi Wang, Stephanie Milani, Jamie C Chiu, Jiayin Zhi, Shaun M Eack, Travis Labrum, Samuel M Murphy, Nev Jones, Kate Hardy, Hong Shen, et al. 2024. Patient-Ψ: Using large language models to simulate patients for training mental health professionals. arXiv preprint arXiv:2405.19660 (2024)

  54. [54]

    Mark A Whisman, Amy E Dixon, and Benjamin Johnson. 1997. Therapists’ perspectives of couple problems and treatment issues in couple therapy. Journal of family psychology 11, 3 (1997), 361

  55. [55]

    Scott R Woolley, Karen S Wampler, and Sean D Davis. 2012. Enactments in couple therapy: Identifying therapist interventions associated with positive change. Journal of Family Therapy 34, 3 (2012), 284–305

  56. [56]

    Yizhe Yang, Palakorn Achananuparp, Heyan Huang, Jing Jiang, Nicholas Gabriel Lim, Cameron Tan Shi Ern, Phey Ling Kit, Jenny Giam Xiuhui, John Pinto, and Ee-Peng Lim. 2025. Consistent Client Simulation for Motivational Interviewing-based Counseling. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long P...

  57. [57]

    Steve Young, Milica Gasic, Blaise Thomson, and Jason D. Williams. 2013. POMDP-based Statistical Spoken Dialogue Systems: A Review. Proc. IEEE 101, 5 (2013), 1160–1179. doi:10.1109/JPROC.2012.2225812

  58. [58]

    Benjamin Zendejas, Ryan Brydges, Amy T Wang, and David A Cook. 2013. Patient outcomes in simulation-based medical education: a systematic review. Journal of general internal medicine 28, 8 (2013), 1078–1089

  59. [59]

    Lianmin Zheng, Wei-Lin Chiang, Ying Sheng, Siyuan Zhuang, Zhanghao Wu, Yonghao Zhuang, Zi Lin, Zhuohan Li, Dacheng Li, Eric Xing, et al. 2023. Judging llm-as-a-judge with mt-bench and chatbot arena. Advances in neural information processing systems 36 (2023), 46595–46623.

  60. [60]

    Only greetings -> Greeting

  61. [61]

    Introducing issues -> Problem Raising

  62. [62]

    Ongoing anger/blame/defensiveness -> Escalation

  63. [63]

    Therapist calming, no vulnerability -> De-Escalation

  64. [64]

    Vulnerable emotions expressed -> Enactment

  65. [65]

    Session closing -> Wrap-up

    C.2 Rule-Based Stage Transition Constraints

    Table 3: Rule-Based Stage Transition Constraints

    Condition | Result | Rationale
    Turn ≤ 5 | No Escalation allowed yet | Ensures trainees have sufficient context about the couple’s issues
    Turn 7 AND no prior Escalation AND stage = Problem Raising | Force Escalation | Guarantees conflict exposure for a...

  66. [66]

    Therapist sends a message and is not being ignored -> "therapist"

  67. [67]

    Therapist directly addresses one patient by name (e.g., "Alex," or "Jordan.") -> that patient

  68. [68]

    Therapist message is not directed at anyone -> "both"

  69. [69]

    Alex says "you" referring to Jordan's actions -> "Jordan"

  70. [70]

    Jordan says "you" referring to Alex's actions -> "Alex"

  71. [71]

    Alex speaks directly to Jordan -> "Jordan"

  72. [72]

    Jordan speaks directly to Alex -> "Alex"

  73. [73]

    Alex or Jordan speak without addressing the other -> "therapist"

    Constraint: The therapist never replies to themselves. If the last message is from the therapist, only "Alex", "Jordan", or "both" are valid.

    D.4 Text-to-speech (TTS) Prompts for Alex and Jordan

    Alex (Demander)
    Neutral: Serious, subdued tone; gentle sadness, slight heaviness or sigh at sentence ...
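
The extracted appendix rules above (the two surviving Table 3 constraints and the next-speaker rules in entries 67 to 73) can be read as a small rule engine. The sketch below is one hedged interpretation. The function names, the `addresses_partner` flag (standing in for detecting "you" or direct speech to the partner), and the reading of "Turn 7" as `turn == 7` are assumptions. Entry 66 is left out because its context is unclear in the extracted text.

```python
import re

PATIENTS = ("Alex", "Jordan")


def apply_stage_constraints(proposed, current, turn, escalated_before):
    """Post-process a proposed stage using the two Table 3 rows that survive."""
    # Row 1: Escalation is not allowed through turn 5.
    if turn <= 5 and proposed == "Escalation":
        return current
    # Row 2: force Escalation at turn 7 if none has happened yet.
    # Assumption: "Turn 7" in the flattened table means turn == 7.
    if turn == 7 and not escalated_before and current == "Problem Raising":
        return "Escalation"
    return proposed


def next_speaker(last_speaker, message, addresses_partner=False):
    """Pick who responds next, following the extracted next-speaker rules.

    `addresses_partner` is a hypothetical, pre-computed flag that is True when
    a patient says "you" about the partner or speaks to the partner directly.
    """
    if last_speaker == "therapist":
        # Constraint: the therapist never replies to themselves.
        named = [p for p in PATIENTS
                 if re.search(rf"\b{re.escape(p)}\b", message)]
        if len(named) == 1:
            return named[0]  # therapist addressed one patient by name
        return "both"        # message not directed at a single patient
    partner = "Jordan" if last_speaker == "Alex" else "Alex"
    if addresses_partner:
        return partner       # partner-to-partner exchange continues
    return "therapist"       # patient spoke without addressing the partner
```

For example, `next_speaker("therapist", "Alex, what came up for you?")` selects `"Alex"`, while a patient turn with `addresses_partner=True` hands the floor to the partner, which is how a demand-withdraw exchange could continue without therapist input.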