arxiv: 2604.20382 · v1 · submitted 2026-04-22 · 💻 cs.CL

Graph2Counsel: Clinically Grounded Synthetic Counseling Dialogue Generation from Client Psychological Graphs

Aishik Mandal , Hiba Arnaout , Clarissa W. Ong , Juliet Bockhorst , Kate Sheehan , Rachael Moldow , Tanmoy Chakraborty , Iryna Gurevych

Pith reviewed 2026-05-10 00:19 UTC · model grok-4.3

keywords: synthetic data, counseling dialogues, psychological graphs, LLM fine-tuning, mental health, dialogue generation, safety evaluation

The pith

Structuring client psychological states into graphs produces more realistic and safer synthetic counseling dialogues.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a framework that builds graphs connecting a client's thoughts, emotions, and behaviors to guide the creation of synthetic counseling sessions. These graphs drive a structured prompting pipeline that incorporates professional counseling strategies, yielding 760 sessions from 76 graphs across diverse client profiles. Mental health experts rated the sessions above earlier synthetic collections on specificity, counselor competence, authenticity, conversational flow, and safety, with substantial inter-annotator agreement (Krippendorff's α = 0.70). Fine-tuning an open-source language model on this collection improves its performance on two counseling benchmarks, CounselingBench and CounselBench. The work targets the shortage of usable real counseling data caused by privacy constraints by supplying structured synthetic alternatives.

Core claim

Graph2Counsel generates synthetic counseling sessions from Client Psychological Graphs encoding relationships among thoughts, emotions, and behaviors, employing structured prompting with counselor strategies to create 760 sessions. Expert evaluation demonstrates outperformance over prior datasets in specificity, counselor competence, authenticity, conversational flow, and safety with Krippendorff's α of 0.70. Fine-tuning an open-source model on the dataset yields improvements on CounselingBench and CounselBench.
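As a rough illustration of the agreement statistic cited above, the following sketch computes Krippendorff's α for nominal labels via the coincidence-matrix formulation, α = 1 − D_o/D_e. The function name and input layout are assumptions made for this example. The rating scale and distance metric used in the paper are not given in this summary, so the authors may well have used an ordinal or interval variant instead.

```python
from collections import Counter
from itertools import permutations


def krippendorff_alpha_nominal(units):
    """Krippendorff's alpha for nominal labels (illustrative sketch).

    `units` is a list of lists: each inner list holds the labels that the
    raters assigned to one item. Missing ratings are simply left out.
    """
    coincidences = Counter()  # weighted counts of ordered label pairs within an item
    for labels in units:
        m = len(labels)
        if m < 2:
            continue  # an item rated by fewer than two raters carries no agreement information
        for c, k in permutations(labels, 2):  # ordered pairs of ratings from different raters
            coincidences[(c, k)] += 1.0 / (m - 1)

    totals = Counter()  # marginal totals n_c per label
    for (c, _k), weight in coincidences.items():
        totals[c] += weight
    n = sum(totals.values())

    observed = sum(w for (c, k), w in coincidences.items() if c != k)
    expected = sum(totals[c] * totals[k] for c in totals for k in totals if c != k)
    if expected == 0:
        # Only one label is in use, so alpha is formally undefined;
        # returning 1.0 is a convention chosen for this sketch.
        return 1.0
    return 1.0 - (n - 1) * observed / expected


# Hypothetical ratings: three items, labels invented for the example.
example_ratings = [["safe", "safe", "safe"], ["safe", "unsafe", "safe"], ["unsafe", "unsafe"]]
example_alpha = krippendorff_alpha_nominal(example_ratings)
```

A value of 1 indicates perfect agreement, 0 indicates agreement no better than chance, and negative values indicate systematic disagreement.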

What carries the argument

Client Psychological Graphs (CPGs) that encode relationships among clients' thoughts, emotions, and behaviors, directing a structured prompting pipeline for dialogue generation.
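To make the idea of a CPG concrete, here is a minimal sketch of how such a graph might be represented and flattened into a node and edge list for a prompt. Everything in it is hypothetical: the class names, the three node kinds as a fixed vocabulary, the directed edges, the text layout, and the example client content are assumptions for illustration, not the authors' schema.

```python
from dataclasses import dataclass, field

# Assumed vocabulary, taken from the "thoughts, emotions, and behaviors" wording.
ALLOWED_KINDS = {"thought", "emotion", "behavior"}


@dataclass(frozen=True)
class Node:
    node_id: str
    kind: str
    label: str


@dataclass
class ClientGraph:
    nodes: dict = field(default_factory=dict)  # node_id -> Node
    edges: list = field(default_factory=list)  # (source_id, target_id); direction is an assumption

    def add_node(self, node_id, kind, label):
        if kind not in ALLOWED_KINDS:
            raise ValueError(f"unknown node kind: {kind!r}")
        self.nodes[node_id] = Node(node_id, kind, label)

    def add_edge(self, source_id, target_id):
        # Reject edges that point at nodes the graph does not contain.
        for node_id in (source_id, target_id):
            if node_id not in self.nodes:
                raise KeyError(f"unknown node: {node_id!r}")
        self.edges.append((source_id, target_id))

    def to_prompt_text(self):
        """Flatten the graph into a plain-text node list followed by an edge list."""
        lines = ["Nodes:"]
        for node in self.nodes.values():
            lines.append(f"- {node.node_id} ({node.kind}): {node.label}")
        lines.append("Edges:")
        for source_id, target_id in self.edges:
            lines.append(f"- {source_id} -> {target_id}")
        return "\n".join(lines)


# Invented example content, not taken from the dataset.
graph = ClientGraph()
graph.add_node("n1", "thought", "I will fail the exam")
graph.add_node("n2", "emotion", "anxiety")
graph.add_node("n3", "behavior", "avoids studying")
graph.add_edge("n1", "n2")
graph.add_edge("n2", "n3")
prompt_block = graph.to_prompt_text()
```

Keeping the graph as explicit nodes and edges, rather than free text, is what would let a generation prompt refer to specific links between a thought, the emotion it triggers, and the resulting behavior.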

If this is right

The dataset provides higher quality training material for adapting LLMs to counseling tasks.
Fine-tuned models show enhanced results on established counseling benchmarks.
Generated dialogues exhibit better psychological consistency and safety.
The approach supports generation across diverse client profiles using varied graphs.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

This graph-based method may help overcome data scarcity in other privacy-sensitive conversational AI applications.
The graphs could potentially be used to simulate sessions for training new human counselors.
If the quality gains hold, it might accelerate the creation of reliable AI-assisted mental health tools.

Load-bearing premise

Expert evaluators can correctly judge the psychological consistency, safety, and authenticity of the generated dialogues, and the Client Psychological Graphs accurately represent real client states without adding artifacts.

What would settle it

Either of two outcomes would indicate that the method does not deliver the claimed benefits: an expert re-evaluation that finds no advantage, or a disadvantage, for the new dataset on the rated dimensions; or fine-tuned models that fail to improve, or get worse, on the benchmarks.

Figures

Figures reproduced from arXiv: 2604.20382 by Aishik Mandal, Clarissa W. Ong, Hiba Arnaout, Iryna Gurevych, Juliet Bockhorst, Kate Sheehan, Rachael Moldow, Tanmoy Chakraborty.

**Figure 2.** Structured knowledge inferred from real counseling sessions: CPGs, CPG-derived client profiles, and …

**Figure 3.** Prompt used to generate a single CPG-grounded client profile.

**Figure 4.** System Prompt used to generate diverse CPG-grounded client profiles.

**Figure 5.** User prompt used to generate diverse CPG-grounded client profiles.

**Figure 6.** Global constraints for counseling dialogue generation.

**Figure 7.** Counselor guidelines (designed with direct input from clinicians) for counseling dialogue generation.

**Figure 8.** Client guidelines (designed with direct input from clinicians) for counseling dialogue generation.

**Figure 9.** Common Pitfalls (designed with direct input from clinicians) for counseling dialogue generation to avoid.

**Figure 10.** System Prompt used to generate synthetic counseling sessions using base prompting technique.

**Figure 11.** User prompt used to generate synthetic counseling sessions with CPG input and base prompting …

**Figure 12.** User prompt used to generate synthetic counseling sessions with CPG-grounded client profile input and …

**Figure 13.** User prompt used to generate synthetic counseling sessions with both CPG and CPG-grounded client …

**Figure 14.** System prompt used to generate synthetic counseling sessions with Guided Counseling prompting …

**Figure 15.** User prompt used to generate synthetic counseling sessions with CPG as input and Guided Counseling …

**Figure 16.** User prompt used to generate synthetic counseling sessions with CPG-grounded client profile as input …

**Figure 17.** User prompt used to generate synthetic counseling sessions with CPG and CPG-grounded client profile …

**Figure 18.** System prompt used to generate synthetic counseling sessions with GC+CoT prompting technique.

**Figure 19.** User prompt used to generate synthetic counseling sessions with CPG as input and GC+CoT prompting …

**Figure 20.** User prompt used to generate synthetic counseling sessions with CPG-grounded client profile as input …

**Figure 21.** User prompt used to generate synthetic counseling sessions with CPG and CPG-grounded client profile …

**Figure 22.** System prompt used to generate feedback for sessions generated with GC+MA prompting technique.

**Figure 23.** System prompt used to regenerate sessions with GC+MA prompting technique.

**Figure 24.** User prompt used to generate feedback for sessions generated with CPG as input with GC+MA prompting …

**Figure 25.** User prompt used to generate revised sessions with CPG as input with GC+MA prompting technique.

**Figure 26.** User prompt used to generate feedback for sessions generated with CPG-grounded client profile as input …

**Figure 27.** User prompt used to generate revised sessions with CPG-grounded client profile as input with GC+MA …

**Figure 28.** User prompt used to generate feedback for sessions generated with CPG and CPG-grounded client profile …

**Figure 29.** User prompt used to generate revised sessions with CPG and CPG-grounded client profile as input with …

**Figure 30.** Prompt used to extract counselor strategies used from real counseling session.

**Figure 31.** Prompt used to QLoRA fine-tune a Llama3-8B-Instruct model using data from SQPsychConv dataset.

**Figure 32.** Prompt used to QLoRA fine-tune a Llama3-8B-Instruct model using data from MAGneT dataset.

**Figure 33.** Prompt used to QLoRA fine-tune a Llama3-8B-Instruct model using data from Graph2Counsel dataset.

**Figure 34.** Prompt used to evaluate the generated counseling sessions on CTRS.

**Figure 35.** Prompt used to evaluate the generated counseling sessions on WAI.

**Figure 36.** Prompt used to extract client issues from counseling session dialogues in SQPsychConv dataset.

**Figure 37.** A sample dialogue excerpt from Graph2Counsel: Client Alex, with feedback from experts …

**Figure 38.** A sample dialogue excerpt from Graph2Counsel: Client David.

**Figure 39.** Prompt used to evaluate the faithfulness of the generated counseling sessions to the input CPG.

**Figure 40.** Prompt used to evaluate the faithfulness of the generated counseling sessions to the input CPG-grounded …

**Figure 41.** Prompt used to generate model responses to questions in CounselBench-Eval and CounselBench-Adv.

**Figure 42.** Prompt used to evaluate model responses on CounselBench-Eval.

**Figure 43.** Prompt used to evaluate model responses on CounselBench-Adv.

**Figure 44.** Prompt used to generate model responses to questions in CounselingBench using Zero-Shot (ZS) …

**Figure 45.** Prompt used to generate model responses to questions in CounselingBench using Few-Shot (FS) …

**Figure 46.** Prompt used to generate model responses to questions in CounselingBench using Few-Shot Chain-of …

Original abstract

Rising demand for mental health support has increased interest in using Large Language Models (LLMs) for counseling. However, adapting LLMs to this high-risk safety-critical domain is hindered by the scarcity of real-world counseling data due to privacy constraints. Synthetic datasets provide a promising alternative, but existing approaches often rely on unstructured or semi-structured text inputs and overlook structural dependencies between a client's cognitive, emotional, and behavioral states, often producing psychologically inconsistent interactions and reducing data realism and quality. We introduce Graph2Counsel, a framework for generating synthetic counseling sessions grounded in Client Psychological Graphs (CPGs) that encode relationships among clients' thoughts, emotions, and behaviors. Graph2Counsel employs a structured prompting pipeline guided by counselor strategies and CPG, and explores prompting strategies including CoT (Wei et al., 2022) and Multi-Agent Feedback (Li et al., 2025a). Graph2Counsel produces 760 sessions from 76 CPGs across diverse client profiles. In expert evaluation, our dataset outperforms prior datasets on specificity, counselor competence, authenticity, conversational flow, and safety, with substantial inter-annotator agreement (Krippendorff's $\alpha$ = 0.70). Fine-tuning an open-source model on this dataset improves performance on CounselingBench (Nguyen et al., 2025) and CounselBench (Li et al., 2025b), showing downstream utility. We also make our code and data public.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

Graph2Counsel adds a graph layer to synthetic counseling data generation that could help consistency, but the quality claims hinge on expert ratings without clear blinding or clinical validation.

The punchline is that this work gives a structured way to build synthetic counseling dialogues by encoding client states as graphs, which is a clear step past the usual free-text or template-based approaches. They produce 760 sessions from 76 graphs, incorporate counselor strategies in prompting, and test CoT plus multi-agent feedback. The dataset is released with code, and fine-tuning on it lifts results on CounselingBench and CounselBench. That combination of structure and public release is the useful part for anyone trying to train safer mental-health models without real session transcripts.

The expert ratings (Krippendorff's α = 0.70) show higher marks on specificity, authenticity, flow, and safety than prior sets, which is at least a starting signal. The paper does a straightforward job laying out the data-scarcity problem and showing a pipeline that tries to respect cognitive-emotional-behavioral links. The downstream utility numbers are the kind of evidence that matters for this domain.

The soft spots sit in the evaluation. The superiority claim rests on human raters, yet the abstract gives no sign the evaluation was blinded or that raters were licensed clinicians who could catch clinically meaningful inconsistencies. The graphs themselves are presented as faithful, but without details on how they were built or checked against real client data, any artifacts could simply be reproduced downstream. That single assumption carries a lot of weight for both the rating results and the fine-tuning gains. Minor gaps include missing statistical significance tests on the benchmark improvements and limited discussion of how diverse the 76 client profiles actually are.

This paper is for groups working on synthetic data pipelines or LLM safety in counseling. A reader who needs concrete prompting examples or a new dataset to experiment with will get value from the released material.

It is coherent enough on its own terms to deserve referee time, even if the human-evaluation protocol needs tightening. I would send it to review rather than desk-reject.

Referee Report

3 major / 2 minor

Summary. The manuscript introduces Graph2Counsel, a framework for generating synthetic counseling dialogues grounded in Client Psychological Graphs (CPGs) that encode relationships among clients' cognitive, emotional, and behavioral states. Using structured prompting with counselor strategies, Chain-of-Thought, and multi-agent feedback, it produces 760 sessions from 76 CPGs across diverse profiles. Expert evaluation indicates superiority over prior datasets in specificity, counselor competence, authenticity, conversational flow, and safety, with Krippendorff's α = 0.70. Fine-tuning an open-source LLM on the dataset yields improvements on CounselingBench and CounselBench, and the authors make code and data publicly available.

Significance. Should the expert evaluations and downstream improvements prove robust, this approach could significantly advance the creation of high-quality, privacy-preserving synthetic data for LLM training in counseling, addressing key challenges in safety-critical mental health applications. The emphasis on structural dependencies in client states and the public release of resources are notable strengths for reproducibility and further research.

major comments (3)

[Expert Evaluation] The claim that the generated dataset outperforms prior ones on specificity, counselor competence, authenticity, conversational flow, and safety rests entirely on expert human ratings (Krippendorff’s α = 0.70). The manuscript provides no information on whether raters are licensed clinicians, whether evaluation was blinded to dataset origin, or whether ratings correlate with external criteria such as real-session transcripts or client outcomes. This is load-bearing for the central claim of improved psychological consistency and safety.
[CPG Construction and Dataset Generation] The construction of the Client Psychological Graphs (CPGs) is described at a high level but lacks details on whether it is manual, LLM-assisted, or hybrid, and no validation against actual client data or assessment of introduced artifacts is reported. Any such artifacts could be reproduced in the generated dialogues without detection by raters, undermining the grounding and realism claims.
[Downstream Evaluation] The abstract and results report that fine-tuning improves performance on CounselingBench and CounselBench but supply no quantitative details, effect sizes, baseline comparisons, or statistical significance tests. This prevents assessment of the practical magnitude of the downstream utility.
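One way the significance testing requested in the Downstream Evaluation comment could be carried out is a paired bootstrap over per-item benchmark outcomes. The sketch below is an illustration only: the function name, the 0/1 correctness encoding, the resample count, and the seed are assumptions, and the paper may report a different test.

```python
import random


def paired_bootstrap(baseline_correct, finetuned_correct, n_resamples=2000, seed=0):
    """Paired bootstrap on per-item correctness (1 = correct, 0 = incorrect).

    Returns the observed accuracy gain and the fraction of resamples in which
    the fine-tuned model does not come out ahead of the baseline.
    """
    if len(baseline_correct) != len(finetuned_correct):
        raise ValueError("both models must be scored on the same items")
    if not baseline_correct:
        raise ValueError("need at least one item")

    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(baseline_correct)
    # Per-item difference keeps the pairing between the two models intact.
    diffs = [f - b for b, f in zip(baseline_correct, finetuned_correct)]
    observed_gain = sum(diffs) / n

    not_better = 0
    for _ in range(n_resamples):
        resample_total = sum(diffs[rng.randrange(n)] for _ in range(n))
        if resample_total <= 0:
            not_better += 1
    return observed_gain, not_better / n_resamples
```

A small returned fraction suggests the gain is unlikely to be an artifact of which benchmark items happened to be sampled; it does not say anything about clinical relevance.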

minor comments (2)

[Abstract] The abstract could briefly note the number of expert annotators and the exact rating scales used, in addition to the reported α value, to strengthen the inter-annotator agreement claim.
[Method] Even with the public code release, including one or two example prompt templates from the structured prompting pipeline in the main text or appendix would improve immediate readability and reproducibility.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their detailed and constructive review of our manuscript on Graph2Counsel. Their comments are valuable for improving the clarity and robustness of our claims. We address each major comment below, indicating the revisions we intend to make to the manuscript.

Referee: [Expert Evaluation] The claim that the generated dataset outperforms prior ones on specificity, counselor competence, authenticity, conversational flow, and safety rests entirely on expert human ratings (Krippendorff’s α = 0.70). The manuscript provides no information on whether raters are licensed clinicians, whether evaluation was blinded to dataset origin, or whether ratings correlate with external criteria such as real-session transcripts or client outcomes. This is load-bearing for the central claim of improved psychological consistency and safety.

Authors: We agree that more details on the expert evaluation are necessary to support our claims. The manuscript will be revised to include information on the raters' qualifications and expertise in counseling psychology, the blinding procedure employed during rating, and any efforts or limitations in correlating ratings with external criteria like real transcripts or outcomes. We will also highlight the role of the structured prompting and multi-agent feedback in promoting psychological consistency and safety. These additions will address the load-bearing nature of the evaluation for our central claims.
revision: yes
Referee: [CPG Construction and Dataset Generation] The construction of the Client Psychological Graphs (CPGs) is described at a high level but lacks details on whether it is manual, LLM-assisted, or hybrid, and no validation against actual client data or assessment of introduced artifacts is reported. Any such artifacts could be reproduced in the generated dialogues without detection by raters, undermining the grounding and realism claims.

Authors: We acknowledge the high-level description of CPG construction in the current manuscript. We will expand this section to detail the construction methodology, including the extent to which it is manual, LLM-assisted, or hybrid, and describe the validation processes used, such as expert review for consistency with psychological principles. We will also include an analysis of potential artifacts and how the generation pipeline, including Chain-of-Thought and multi-agent feedback, helps to detect and mitigate them. While direct validation against real client data is precluded by privacy considerations, we will discuss this as a limitation and explain the grounding mechanisms employed.
revision: yes
Referee: [Downstream Evaluation] The abstract and results report that fine-tuning improves performance on CounselingBench and CounselBench but supply no quantitative details, effect sizes, baseline comparisons, or statistical significance tests. This prevents assessment of the practical magnitude of the downstream utility.

Authors: We concur that quantitative details are essential for evaluating the downstream utility. The revised manuscript will include specific performance numbers, effect sizes, baseline comparisons (e.g., against models fine-tuned on other synthetic datasets), and statistical tests for significance on both CounselingBench and CounselBench. These will be added to the results section to demonstrate the practical improvements achieved by fine-tuning on the Graph2Counsel dataset.
revision: yes

Circularity Check

0 steps flagged

No circularity: empirical generation and evaluation pipeline

The paper presents an empirical framework that constructs Client Psychological Graphs, applies structured prompting (including external techniques like Chain-of-Thought) to generate dialogues, evaluates them via expert ratings, and tests downstream utility on separate benchmarks. No mathematical derivations, equations, fitted parameters, or predictions appear in the provided text. All load-bearing claims rest on external expert judgments and benchmark results rather than any self-referential reduction, self-citation chain, or ansatz smuggled through prior work by the same authors. The derivation chain is therefore self-contained and non-circular.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 1 invented entity

The framework rests on domain assumptions about LLM controllability and evaluator reliability rather than new mathematical axioms or fitted parameters.

axioms (2)

domain assumption: Structured graphs plus counselor strategies can guide LLMs to produce psychologically consistent counseling dialogues.
Core premise of the prompting pipeline described in the abstract.
domain assumption: Human experts can reliably assess psychological consistency, safety, and authenticity in generated dialogues.
Required for the quality and safety claims.

invented entities (1)

Client Psychological Graphs (CPGs): no independent evidence
purpose: To encode relationships among clients' thoughts, emotions, and behaviors for grounding dialogue generation
New representational structure introduced by the paper.

pith-pipeline@v0.9.0 · 5594 in / 1447 out tokens · 45691 ms · 2026-05-10T00:19:07.374615+00:00 · methodology

Reference graph

Works this paper leans on

58 extracted references · 6 canonical work pages · 1 internal anchor

[1]

Julian Burger, Christina Ralph-Nearman, and Cheri A

A novel approach for constructing personalized networks from longitudinal perceived causal relations. Behaviour Research and Therapy, 173:104456. Julian Burger, Christina Ralph-Nearman, and Cheri A. Levinson. 2022. Integrating clinician and patient case conceptualization with momentary assessment data to construct idiographic networks: Moving toward pe...

[2]

In The Twelfth International Conference on Learning Representations, ICLR 2024, Vienna, Austria, May 7-11, 2024

Talk like a graph: Encoding graphs for large language models. In The Twelfth International Conference on Learning Representations, ICLR 2024, Vienna, Austria, May 7-11, 2024. OpenReview.net. Aaron J. Fisher, Hannah G. Bosley, Katya C. Fernandez, Jonathan W. Reeves, Peter D. Soyster, Allison E. Diamond, and Jonathan Barkin. 2019. Open trial of a personali...

2024
[3]

Zhijun Guo, Alvina Lai, Johan H Thygesen, Joseph Farrington, Thomas Keen, Kezhi Li, and 1 others

OpenReview.net. Zhijun Guo, Alvina Lai, Johan H Thygesen, Joseph Farrington, Thomas Keen, Kezhi Li, and 1 others
[4]

Stephen N

Large language models for mental health applications: systematic review. JMIR mental health, 11(1):e57400. Stephen N. Haynes, William H. O’Brien, and Antonio Godoy. 2020. A proposed model for the psychometric evaluation of clinical case formulations with quantified causal diagrams. Psychological Assessment, 32(6):541–552. Place: US Publisher: American...

2020
[5]

arXiv preprint arXiv:2506.08584 (2025)

Personalizing eating disorder treatment using idiographic models: An open series trial. Journal of Consulting and Clinical Psychology, 91(1):14–28. Place: US Publisher: American Psychological Association. Xiang Li, Duyi Pan, Hongru Xiao, Jiale Han, Jing Tang, Jiabao Ma, Wei Wang, and Bo Cheng. 2025a. Dialogueagents: A hybrid agent-based speech synthe...

[6]

In Findings of the Association for Computational Linguistics: ACL 2025, pages 13750–13770, Vienna, Austria

Eeyore: Realistic depression simulation via expert-in-the-loop supervised and preference optimization. In Findings of the Association for Computational Linguistics: ACL 2025, pages 13750–13770, Vienna, Austria. Association for Computational Linguistics. Yang Liu, Dan Iter, Yichong Xu, Shuohang Wang, Ruochen Xu, and Chenguang Zhu. 2023b. G-eval: NLG e...

[7]

Clarissa W

Association for Computational Linguistics. Clarissa W. Ong, Hiba Arnaout, Kate Sheehan, Estella Fox, Eugen Owtscharow, and Iryna Gurevych. 2025a. Using large language models to create personalized networks from therapy sessions. CoRR, abs/2512.05836. Clarissa W. Ong, Kate Sheehan, Adam J.D. Mann, and Estella Fox. 2025b. Examining the effects of process...

[8]

In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, pages 2438–2459, Abu Dhabi, United Arab Emirates

D4: a Chinese dialogue dataset for depression-diagnosis-oriented chat. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, pages 2438–2459, Abu Dhabi, United Arab Emirates. Association for Computational Linguistics. Congchi Yin, Feng Li, Shu Zhang, Zike Wang, Jun Shao, Piji Li, Jianhua Chen, and Xun Jiang. 2025. ...

[9]

I am not sure

and the Working Alliance Inventory (WAI) (Horvath and Greenberg, 1989). Both metrics are scored by GPT-4o (OpenAI, 2024) in an LLM-as-a-judge setup evaluating the generated counseling sessions. 8DeepSpeed 9Hugging Face 15 Strategy Evidence Therapy Modality Alternative Perspective Counselor: So, now that you have a bit this stance, what are some other a...

[11]

Presenting Problem - What issue/symptoms do you want to discuss? (If there are multiple issues, discuss with the counselor to determine the most important or first issue to address) - When did the problem/symptoms start? - What was the stress level when the problem/symptoms first occurred? (What do you think might be the cause?) - How has the problem/symp...
[15]

24 CPG-grounded diverse Client Profile Generation System Prompt Your task is to generate diverse synthetic client intake forms for mental health counseling sessions

Is there anyone you can talk to or get help from when you encounter difficulties or problems? ##Example output 1: {example_output_1} ##Example output 2: {example_output_2} Client Intake Form: Figure 3: Prompt used to generate a single CPG-grounded client profile. 24 CPG-grounded diverse Client Profile Generation System Prompt Your task is to generate dive...
[16]

You must generate exactly 10 distinct client intake forms
[17]

Each profile must be unique in: -name, age, gender, background -symptom expression and wording -life history, stressors, and coping attempts
[18]

DO NOT copy, paraphrase, or structurally reuse the example profiles
[19]

DO NOT reuse sentence templates, phrasing, or paragraph structure from the examples
[20]

DO NOT repeat any example content, even partially
[21]

DO NOT mention clinical models, diagnoses, or technical psychological terminology
[22]

Write strictly from the client’s perspective, using everyday language
[23]

The graph reflects expert knowledge, but the client is unaware of the graph and should not sound clinically insightful
[24]

Infer content from the graph implicitly, not by naming nodes or edges
[25]

No profile may resemble another in tone, life stage, or narrative arc
[26]

Output must be valid JSON only, with no surrounding text or commentary
[27]

## Task Infer 10 diverse client intake forms based on a client graph

Any violation of formatting or repetition invalidates the output. ## Task Infer 10 diverse client intake forms based on a client graph. The **client graph** is given as a list of nodes representing recurring psychological and behavioral patterns of the client, and edges representing connections between them. **Each client intake form must include the foll...
[28]

Basic Information - name, age, gender, occupation, education, marital status, family details
[29]

Presenting Problem - What symptoms do you want to discuss? - When did the problem/symptoms start? - What was the stress level when the problem/symptoms first occurred? (What do you think might be the cause?) - How has the problem/symptoms progressed? (Changes over time, aggravating factors, alleviating factors, etc.) - Currently, in what situations, how o...
[30]

Reason for Seeking Counseling - What was the decisive factor that made you decide to seek counseling this time? (If the problem has been long standing, what made you decide to seek counseling now?)
[31]

Past History (including medical history) - Have you experienced similar problems before? Under what circumstances or stress did the problem occur, and what were the patterns? How did you cope? - Have you received treatment/counseling for other psychological problems/symptoms? (When, for how long, any medication use, reasons for stopping - improved? stoppe...
[32]

Academic/occupational functioning level (attendance, grades/job performance, etc.) - Interpersonal relationships - Daily life (including sleep, eating, self-care, etc.) - Social Support System
[33]

profile":

Is there anyone you can talk to or get help from when you encounter difficulties or problems? ## Output Format - Output only valid JSON - Do not include any explanation or comments. Just output the profiles. ## Example Output The following is an example output. Do not copy any profiles directly. [ {{"profile": "Example profile number 1"}}, {{"profile": "E...
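The output constraints in this prompt (valid JSON only, exactly 10 distinct intake forms, each wrapped in a "profile" field) lend themselves to an automated check before a batch is accepted. The sketch below shows one way such a check might look. The function name and the exact-duplicate test are assumptions for illustration; nothing here is taken from the authors' released code, and exact string matching is only a weak proxy for the prompt's stronger "no profile may resemble another" rule.

```python
import json


def parse_profiles(raw, expected_count=10):
    """Parse a model response that should be a JSON list of {"profile": "..."} objects.

    Raises ValueError when the response breaks any of the checked constraints.
    """
    try:
        data = json.loads(raw)  # fails on surrounding commentary or malformed JSON
    except json.JSONDecodeError as exc:
        raise ValueError(f"response is not valid JSON: {exc}") from exc

    if not isinstance(data, list):
        raise ValueError("expected a JSON list of profiles")
    if len(data) != expected_count:
        raise ValueError(f"expected {expected_count} profiles, got {len(data)}")

    profiles = []
    for index, item in enumerate(data):
        text = item.get("profile") if isinstance(item, dict) else None
        if not isinstance(text, str) or not text.strip():
            raise ValueError(f"entry {index} has no non-empty 'profile' string")
        profiles.append(text.strip())

    # Exact-duplicate check only; near-duplicates would need a similarity measure.
    if len(set(profiles)) != len(profiles):
        raise ValueError("profiles are not all distinct")
    return profiles
```

A response that fails the check would typically be regenerated rather than repaired, since the prompt states that any formatting violation invalidates the output.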
[34]
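A minimal sketch of how the output of the intake-form prompt could be checked before it is used downstream. It assumes the model returns a JSON array of objects that each hold a single `"profile"` string, as in the example output above. The function name `validate_profiles`, the `expected` parameter, and the duplicate check are illustrative assumptions and are not taken from the paper.

```python
import json


def validate_profiles(raw: str, expected: int = 10) -> list[str]:
    """Parse a model response and return its profile strings.

    Raises ValueError if the response is not valid JSON, is not a list of
    `expected` objects of the form {"profile": "<text>"}, or repeats a profile.
    """
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"output is not valid JSON: {exc}") from exc

    if not isinstance(data, list) or len(data) != expected:
        raise ValueError(f"expected a JSON list of {expected} profiles")

    profiles = []
    for index, item in enumerate(data):
        # Each entry must be an object with exactly one key, "profile".
        if not isinstance(item, dict) or set(item) != {"profile"}:
            raise ValueError(f"entry {index} is not a {{'profile': ...}} object")
        if not isinstance(item["profile"], str) or not item["profile"].strip():
            raise ValueError(f"entry {index} has an empty or non-string profile")
        profiles.append(item["profile"].strip())

    # The prompt asks for diverse forms, so reject verbatim repeats (assumed check).
    if len(set(profiles)) != len(profiles):
        raise ValueError("duplicate profiles found")
    return profiles
```

The doubled braces in the example output are consistent with a template that is filled through Python's `str.format`, where `{{` and `}}` render as literal braces; this is an inference from the prompt text, not something the source states.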

The dialogue must be consistent with the client intake form {not used in CPG} and client graph. {Not used in Profile}
[35]

Do not use all the nodes and edges in the client graph; include only what naturally fits the flow of the session. {Not used in Profile}
[36]

Use natural conversational signals whenever appropriate (e.g., "mm-hm", "um", "yeah", "right", "...")
[37]

When explaining experiences, emotions, reflections, or psychoeducation, **both counselor and client must use multi-sentence utterances (3–5 sentences)**
[38]

The session should progress through ideas gradually. Do **not** advance to new topics or conclusions in consecutive turns. Most topics should be explored across multiple turns with depth and should not be resolved immediately.

Figure 6: Global constraints for counseling dialogue generation.

Counselor guidelines for counseling dialogue generation
[39]

For counselor turns, encourage natural elaboration rather than brevity. In each counselor utterance, explicitly use at least one counseling technique, such as reflection, open-ended questioning, summarizing, or gentle reframing, without sounding mechanical or repetitive
[40]

Maintain a nonjudgmental, collaborative stance; avoid jumping to conclusions or positioning yourself as the authority. Do not dismiss the client’s experience
[41]

The counselor should support the client in examining, questioning, and reshaping their own thoughts and experiences at their own pace through acknowledging pauses, hesitation, or silence (e.g., “take your time”, “we can sit with that for a moment”)
[42]

**The counselor must not end every utterance with a question. The dialogue should not feel like an interview**
[43]

The counselor should not introduce new information randomly. Rather they should build towards the information they introduce
[44]

The counselor should encourage the client to apply concepts to their real life, specific scenarios and/or review past week and upcoming week assignments, focusing on specific ways to connect session content with real-life applications
[45]

The counselor should prioritize understanding, emotional safety, and rapport before offering interventions or insights. When appropriate, the counselor should check in to ensure shared understanding (e.g., “does this sound useful to you?”, “does this make sense?”, “sounds like you’re going through [client’s issue] — is that right?”). **These check-ins s...
[46]

I want to check in about something, just to make sure I understand how you’re doing

The counselor should do assessment/follow-up on client comments that could be indicative of a larger issue (e.g., hopelessness = assess for suicidality, weight loss = assess for eating disorder/appetite changes, difficult relationship = assess for safety at home, etc). The counselor should frame these questions as curiosity and care, not assumptions (e....
[47]

The counselor should offer psychoeducation when it directly supports the client’s understanding or client expresses misunderstanding of treatment concepts. Psychoeducation should be preceded by a brief reflection or summary that connects it directly to what the client just shared. The counselor should use clear, everyday language for psychoeducation and c...
[48]

The counselor must respect pacing and readiness; invite exploration without rushing
[49]

**Repeating exact phrasing is disallowed**; repeating therapeutic functions (e.g., reflection, validation) using varied language is expected
[50]

**The counselor should be the one to suggest psychological techniques to the client**. It shouldn’t be the other way around where the counselor asks the client if some techniques comes to mind. It is fine for the counselor to ask the client if they have tried anything already.

Figure 7: Counselor guidelines (designed with direct input from clinicians) for...
[51]
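Two of the counselor guidelines are mechanical enough to be checked automatically after generation: the counselor must not end every utterance with a question, and exact phrasing must not be repeated. The sketch below is a hypothetical post-generation check. It assumes each turn is a dict with `speaker` and `utterance` keys; this schema and the function name `lint_counselor_turns` are assumptions for illustration and are not taken from the paper.

```python
from collections import Counter


def lint_counselor_turns(dialogue: list[dict]) -> list[str]:
    """Return human-readable issues for the counselor turns of one session.

    Checks only two surface-level guidelines:
    1. the counselor must not end every utterance with a question;
    2. the counselor must not repeat an utterance verbatim.
    """
    counselor = [
        turn["utterance"].strip()
        for turn in dialogue
        if turn["speaker"].lower() == "counselor"
    ]
    issues = []

    # Guideline: the dialogue should not feel like an interview.
    if counselor and all(utterance.endswith("?") for utterance in counselor):
        issues.append("every counselor utterance ends with a question")

    # Guideline: repeating exact phrasing is disallowed (case-insensitive match).
    counts = Counter(utterance.lower() for utterance in counselor)
    repeated = sorted(text for text, n in counts.items() if n > 1)
    if repeated:
        issues.append(f"repeated counselor phrasing: {repeated}")

    return issues
```

A check like this only catches verbatim repeats and a fully question-ended session; the remaining guidelines (pacing, nonjudgmental stance, building toward new information) call for human or model-based review.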

The client graph influences responses implicitly and must not be named directly. {Not used in Profile}
[52]

The client should express their experiences in everyday, non-clinical language and should not self-diagnose or use professional terminology (e.g., “attachment issues,” “cognitive distortions”) as a trained counselor would. The client may, however, use informal or popular mental health terms commonly encountered on social media, as well as terms that have ...
[53]

Early in the session, the client should respond with **brief, surface-level descriptions** of emotions and experiences. The client should not provide detailed concrete descriptions or mini-narratives unless the counselor explicitly asks for elaboration or invites reflection. Detailed emotional descriptions and mini-narratives (what happened, what was not...
[54]

**The client can express ambivalence, confusion, or difficulty naming emotions when appropriate**. Ambivalence should be expressed as simultaneous pull in opposing directions, not passive agreement (e.g., “I hear what you’re saying, but at the same time it doesn’t really feel true for me.”)
[55]

When the counselor offers an interpretation, suggestion, coping strategy, or reframing, the client must first respond with at least one of the following before any agreement: Confusion (“I’m not really sure what that means...”), Skepticism (“I don’t see how that would help...”), Partial resistance (“I get what you’re saying, but...”), Difficulty ...
[56]

speaker":

A structured rationale explaining why that utterance was generated at that moment, grounded in:
- The current dialogue history,
- Appropriate counseling techniques for counselor turns,
- The client intake form {not used in CPG} and client graph {Not used in Profile} for client turns.

The rationale should justify intent and alignment, not reveal hidden interna...
[59]

Assign a score based on the criteria, grading very strictly and uptight. If there is any deficiency, no matter how minor, assign a score of 4 or lower
[60]

Output the score and the explanation, separated by a comma. Do not add any prefix.

Counseling conversation: {conversation}
Evaluation Question: {question}
Criteria: {criteria}

Figure 34: Prompt used to evaluate the generated counseling sessions on CTRS.

WAI LLM-as-a-judge Evaluation Prompt

The following is a psychological counseling session between a cou...
[61]

Read the counseling session transcript carefully
[62]

Review the evaluation questions and criteria provided below
[63]

Assign a score based on the criteria, grading very strictly
[64]

utterance 1

Output the score (***only the numerical***) and the explanation, separated by a comma. ***Do not add any prefix.***

Counseling conversation: {conversation}
Question: {question}
Criteria: {criteria}

Figure 35: Prompt used to evaluate the generated counseling sessions on WAI.

Client Issue Extraction Prompt for SQPsychConv

Extract the presenting problem ...