Interactive Agents: Simulating Counselor-Client Psychological Counseling via Role-Playing LLM-to-LLM Interactions
2 Pith papers cite this work.
abstract
Creating effective dialogue systems for mental health support requires high-quality multi-turn counseling dialogue data, yet collecting real counselor-client conversations presents significant challenges, including privacy concerns, high costs, and limited scalability. We present Interactive Agents, a novel framework that simulates naturalistic counseling dialogues through controlled LLM-to-LLM interactions. The framework introduces two key innovations: (1) a personalized client agent that maintains consistent psychological characteristics throughout a session, and (2) a counselor agent that implements a theoretically grounded three-stage therapeutic model comprising the exploration, insight, and action phases. Through rigorous evaluation using both automatic metrics and professional-counselor assessments based on the Working Alliance Inventory, we demonstrate that our framework generates therapeutically valid dialogues that are comparable in quality to human-generated sessions. Models fine-tuned on our proposed synthetic dataset (SimPsyDial) achieve state-of-the-art performance in a standard pairwise chatbot-arena evaluation of LLM-based counselors. Our framework provides a scalable, privacy-preserving method for generating high-quality counseling dialogue data while maintaining professional therapeutic standards.
years: 2026 (2)
verdicts: UNVERDICTED (2)
representative citing papers
GenPT applies generative projective testing to LLM agents and reports lower directional bias plus greater longitudinal sensitivity than self-report questionnaires.
citing papers explorer
When Seekers Are Hard to Help: Evaluating Emotional Support Dialogue Systems in Worst-Case Interactions
Worst-case seeker simulations show that emotional support dialogue systems suffer substantial performance drops, with large general LLMs more robust than specialized models but still limited in sustaining engagement.