Graph2Counsel: Clinically Grounded Synthetic Counseling Dialogue Generation from Client Psychological Graphs
Pith reviewed 2026-05-10 00:19 UTC · model grok-4.3
The pith
Structuring client psychological states into graphs produces more realistic and safer synthetic counseling dialogues.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Graph2Counsel generates synthetic counseling sessions from Client Psychological Graphs (CPGs), which encode relationships among a client's thoughts, emotions, and behaviors. A structured prompting pipeline guided by counselor strategies turns 76 CPGs into 760 sessions. In expert evaluation, the dataset outperforms prior datasets on specificity, counselor competence, authenticity, conversational flow, and safety, with inter-annotator agreement of Krippendorff's α = 0.70. Fine-tuning an open-source model on the dataset improves results on CounselingBench and CounselBench.
What carries the argument
Client Psychological Graphs (CPGs) that encode relationships among clients' thoughts, emotions, and behaviors, directing a structured prompting pipeline for dialogue generation.
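To make the idea concrete, here is a minimal sketch of how such a graph might be held in memory and flattened into text for a prompt. It assumes only what the paper's prompts state, namely that a client graph is a list of nodes plus edges connecting them. The class names, the three node kinds as an enum, the `to_prompt_text` format, and the example client content are all hypothetical illustrations, not the authors' schema.

```python
from dataclasses import dataclass, field
from enum import Enum


class NodeKind(Enum):
    # Hypothetical categories mirroring "thoughts, emotions, and behaviors".
    THOUGHT = "thought"
    EMOTION = "emotion"
    BEHAVIOR = "behavior"


@dataclass(frozen=True)
class Node:
    node_id: str
    kind: NodeKind
    label: str


@dataclass
class ClientGraph:
    nodes: dict[str, Node] = field(default_factory=dict)
    edges: list[tuple[str, str]] = field(default_factory=list)  # (source, target)

    def add_node(self, node: Node) -> None:
        self.nodes[node.node_id] = node

    def add_edge(self, source: str, target: str) -> None:
        # Reject edges that point at nodes the graph does not contain.
        if source not in self.nodes or target not in self.nodes:
            raise KeyError("both endpoints must be added before the edge")
        self.edges.append((source, target))

    def to_prompt_text(self) -> str:
        # Flatten the graph into plain text (assumed format, for illustration).
        lines = ["Nodes:"]
        lines += [f"- {n.node_id} [{n.kind.value}]: {n.label}" for n in self.nodes.values()]
        lines.append("Edges:")
        lines += [f"- {s} -> {t}" for s, t in self.edges]
        return "\n".join(lines)


# Made-up example client, not taken from the dataset.
graph = ClientGraph()
graph.add_node(Node("n1", NodeKind.THOUGHT, "I will fail at work"))
graph.add_node(Node("n2", NodeKind.EMOTION, "anxiety"))
graph.add_node(Node("n3", NodeKind.BEHAVIOR, "avoids starting tasks"))
graph.add_edge("n1", "n2")
graph.add_edge("n2", "n3")
prompt_text = graph.to_prompt_text()
```

Keeping edges as explicit pairs is what lets a generator check that a dialogue respects a dependency (for example, that avoidance follows from anxiety) rather than treating client attributes as an unordered list.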
If this is right
- The dataset provides higher quality training material for adapting LLMs to counseling tasks.
- Fine-tuned models show enhanced results on established counseling benchmarks.
- Generated dialogues exhibit better psychological consistency and safety.
- The approach supports generation across diverse client profiles using varied graphs.
Where Pith is reading between the lines
- This graph-based method may help overcome data scarcity in other privacy-sensitive conversational AI applications.
- The graphs could potentially be used to simulate sessions for training new human counselors.
- If the quality gains hold, it might accelerate the creation of reliable AI-assisted mental health tools.
Load-bearing premise
Expert evaluators can correctly judge the psychological consistency, safety, and authenticity of the generated dialogues, and the Client Psychological Graphs accurately represent real client states without adding artifacts.
What would settle it
Two outcomes would indicate that the method does not deliver the claimed benefits: an expert re-evaluation that finds no advantage (or a disadvantage) for the new dataset on the rated dimensions, or fine-tuned models that fail to improve, or get worse, on the benchmarks.
Original abstract
Rising demand for mental health support has increased interest in using Large Language Models (LLMs) for counseling. However, adapting LLMs to this high-risk safety-critical domain is hindered by the scarcity of real-world counseling data due to privacy constraints. Synthetic datasets provide a promising alternative, but existing approaches often rely on unstructured or semi-structured text inputs and overlook structural dependencies between a client's cognitive, emotional, and behavioral states, often producing psychologically inconsistent interactions and reducing data realism and quality. We introduce Graph2Counsel, a framework for generating synthetic counseling sessions grounded in Client Psychological Graphs (CPGs) that encode relationships among clients' thoughts, emotions, and behaviors. Graph2Counsel employs a structured prompting pipeline guided by counselor strategies and CPG, and explores prompting strategies including CoT (Wei et al., 2022) and Multi-Agent Feedback (Li et al., 2025a). Graph2Counsel produces 760 sessions from 76 CPGs across diverse client profiles. In expert evaluation, our dataset outperforms prior datasets on specificity, counselor competence, authenticity, conversational flow, and safety, with substantial inter-annotator agreement (Krippendorff's $\alpha$ = 0.70). Fine-tuning an open-source model on this dataset improves performance on CounselingBench (Nguyen et al., 2025) and CounselBench (Li et al., 2025b), showing downstream utility. We also make our code and data public.
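For readers unfamiliar with the agreement statistic, the sketch below computes Krippendorff's α as 1 minus the ratio of observed to expected disagreement. It assumes an interval-level (squared-difference) distance; the abstract does not say which metric, how many raters, or what rating scale was used, so the function name, the metric choice, and the toy ratings are illustrative assumptions only.

```python
from itertools import permutations


def krippendorff_alpha_interval(units):
    """Krippendorff's alpha with squared-difference distance.

    `units` is a list of lists; each inner list holds the ratings one item
    received. Items with fewer than two ratings are not pairable and are skipped.
    """
    pairable = [u for u in units if len(u) >= 2]
    values = [v for u in pairable for v in u]
    n = len(values)
    if n < 2:
        raise ValueError("need at least two pairable ratings")

    # Observed disagreement: within-item ordered pairs, weighted by 1 / (m_u - 1).
    observed = sum(
        sum((a - b) ** 2 for a, b in permutations(u, 2)) / (len(u) - 1)
        for u in pairable
    ) / n

    # Expected disagreement: ordered pairs over all pairable ratings.
    expected = sum((a - b) ** 2 for a, b in permutations(values, 2)) / (n * (n - 1))
    if expected == 0:
        raise ValueError("ratings show no variation; alpha is undefined")
    return 1.0 - observed / expected


# Toy ratings (made up): two raters, three items, perfect agreement.
alpha_perfect = krippendorff_alpha_interval([[1, 1], [2, 2], [3, 3]])
# Toy ratings (made up): raters disagree systematically on every item.
alpha_systematic = krippendorff_alpha_interval([[1, 2], [1, 2]])
```

A value of 1 means perfect agreement, 0 means agreement no better than chance, and negative values indicate systematic disagreement, as in the second toy case.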
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces Graph2Counsel, a framework for generating synthetic counseling dialogues grounded in Client Psychological Graphs (CPGs) that encode relationships among clients' cognitive, emotional, and behavioral states. Using structured prompting with counselor strategies, Chain-of-Thought, and multi-agent feedback, it produces 760 sessions from 76 CPGs across diverse profiles. Expert evaluation indicates superiority over prior datasets in specificity, counselor competence, authenticity, conversational flow, and safety, with Krippendorff's α = 0.70. Fine-tuning an open-source LLM on the dataset yields improvements on CounselingBench and CounselBench, and the authors make code and data publicly available.
Significance. Should the expert evaluations and downstream improvements prove robust, this approach could significantly advance the creation of high-quality, privacy-preserving synthetic data for LLM training in counseling, addressing key challenges in safety-critical mental health applications. The emphasis on structural dependencies in client states and the public release of resources are notable strengths for reproducibility and further research.
major comments (3)
- [Expert Evaluation] The claim that the generated dataset outperforms prior ones on specificity, counselor competence, authenticity, conversational flow, and safety rests entirely on expert human ratings (Krippendorff’s α = 0.70). The manuscript provides no information on whether raters are licensed clinicians, whether evaluation was blinded to dataset origin, or whether ratings correlate with external criteria such as real-session transcripts or client outcomes. This is load-bearing for the central claim of improved psychological consistency and safety.
- [CPG Construction and Dataset Generation] The construction of the Client Psychological Graphs (CPGs) is described at a high level but lacks details on whether it is manual, LLM-assisted, or hybrid, and no validation against actual client data or assessment of introduced artifacts is reported. Any such artifacts could be reproduced in the generated dialogues without detection by raters, undermining the grounding and realism claims.
- [Downstream Evaluation] The abstract and results report that fine-tuning improves performance on CounselingBench and CounselBench but supply no quantitative details, effect sizes, baseline comparisons, or statistical significance tests. This prevents assessment of the practical magnitude of the downstream utility.
minor comments (2)
- [Abstract] The abstract could briefly note the number of expert annotators and the exact rating scales used, in addition to the reported α value, to strengthen the inter-annotator agreement claim.
- [Method] Even with the public code release, including one or two example prompt templates from the structured prompting pipeline in the main text or appendix would improve immediate readability and reproducibility.
Simulated Author's Rebuttal
We thank the referee for their detailed and constructive review of our manuscript on Graph2Counsel. Their comments are valuable for improving the clarity and robustness of our claims. We address each major comment below, indicating the revisions we intend to make to the manuscript.
- Referee: [Expert Evaluation] The claim that the generated dataset outperforms prior ones on specificity, counselor competence, authenticity, conversational flow, and safety rests entirely on expert human ratings (Krippendorff’s α = 0.70). The manuscript provides no information on whether raters are licensed clinicians, whether evaluation was blinded to dataset origin, or whether ratings correlate with external criteria such as real-session transcripts or client outcomes. This is load-bearing for the central claim of improved psychological consistency and safety.
  Authors: We agree that more details on the expert evaluation are necessary to support our claims. The manuscript will be revised to include information on the raters' qualifications and expertise in counseling psychology, the blinding procedure employed during rating, and any efforts or limitations in correlating ratings with external criteria like real transcripts or outcomes. We will also highlight the role of the structured prompting and multi-agent feedback in promoting psychological consistency and safety. These additions will address the load-bearing nature of the evaluation for our central claims.
  Revision: yes
- Referee: [CPG Construction and Dataset Generation] The construction of the Client Psychological Graphs (CPGs) is described at a high level but lacks details on whether it is manual, LLM-assisted, or hybrid, and no validation against actual client data or assessment of introduced artifacts is reported. Any such artifacts could be reproduced in the generated dialogues without detection by raters, undermining the grounding and realism claims.
  Authors: We acknowledge the high-level description of CPG construction in the current manuscript. We will expand this section to detail the construction methodology, including the extent to which it is manual, LLM-assisted, or hybrid, and describe the validation processes used, such as expert review for consistency with psychological principles. We will also include an analysis of potential artifacts and how the generation pipeline, including Chain-of-Thought and multi-agent feedback, helps to detect and mitigate them. While direct validation against real client data is precluded by privacy considerations, we will discuss this as a limitation and explain the grounding mechanisms employed.
  Revision: yes
- Referee: [Downstream Evaluation] The abstract and results report that fine-tuning improves performance on CounselingBench and CounselBench but supply no quantitative details, effect sizes, baseline comparisons, or statistical significance tests. This prevents assessment of the practical magnitude of the downstream utility.
  Authors: We concur that quantitative details are essential for evaluating the downstream utility. The revised manuscript will include specific performance numbers, effect sizes, baseline comparisons (e.g., against models fine-tuned on other synthetic datasets), and statistical tests for significance on both CounselingBench and CounselBench. These will be added to the results section to demonstrate the practical improvements achieved by fine-tuning on the Graph2Counsel dataset.
  Revision: yes
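As an illustration of the kind of test the referee asks for, the sketch below runs a paired bootstrap over per-item scores from a baseline and a fine-tuned model. This is one common option, not a method the paper or rebuttal names. The function name, the resample count, and the toy scores are hypothetical placeholders, not reported results.

```python
import random


def paired_bootstrap(baseline, finetuned, n_resamples=10_000, seed=0):
    """Return the mean per-item gain and a 95% percentile interval.

    `baseline` and `finetuned` hold per-item scores on the same benchmark
    items, in the same order.
    """
    if len(baseline) != len(finetuned) or not baseline:
        raise ValueError("score lists must be non-empty and equally long")
    rng = random.Random(seed)
    diffs = [f - b for b, f in zip(baseline, finetuned)]
    n = len(diffs)
    observed = sum(diffs) / n

    # Resample items with replacement and record the mean gain each time.
    means = []
    for _ in range(n_resamples):
        sample = [diffs[rng.randrange(n)] for _ in range(n)]
        means.append(sum(sample) / n)
    means.sort()
    lower = means[int(0.025 * n_resamples)]
    upper = means[int(0.975 * n_resamples) - 1]
    return observed, (lower, upper)


# Placeholder correctness flags (1 = correct), invented for illustration only.
baseline_scores = [0, 1, 0, 1, 1, 0, 0, 1]
finetuned_scores = [1, 1, 0, 1, 1, 1, 0, 1]
mean_gain, (ci_low, ci_high) = paired_bootstrap(baseline_scores, finetuned_scores)
```

An interval that excludes zero would support a real gain. Pairing by item matters because both models answer the same questions, so item difficulty cancels out of the difference.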
Circularity Check
No circularity: empirical generation and evaluation pipeline
The paper presents an empirical framework that constructs Client Psychological Graphs, applies structured prompting (including external techniques like Chain-of-Thought) to generate dialogues, evaluates them via expert ratings, and tests downstream utility on separate benchmarks. No mathematical derivations, equations, fitted parameters, or predictions appear in the provided text. All load-bearing claims rest on external expert judgments and benchmark results rather than any self-referential reduction, self-citation chain, or ansatz smuggled through prior work by the same authors. The derivation chain is therefore self-contained and non-circular.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption: Structured graphs plus counselor strategies can guide LLMs to produce psychologically consistent counseling dialogues.
- domain assumption: Human experts can reliably assess psychological consistency, safety, and authenticity in generated dialogues.
invented entities (1)
- Client Psychological Graphs (CPGs): no independent evidence
Reference graph
Works this paper leans on
-
[1]
Julian Burger, Christina Ralph-Nearman, and Cheri A
A novel approach for constructing personalized networks from longitudinal perceived causal relations. Behaviour Research and Therapy, 173:104456. Julian Burger, Christina Ralph-Nearman, and Cheri A. Levinson. 2022. Integrating clinician and patient case conceptualization with momentary assessment data to construct idiographic networks: Moving toward pe...
-
[2]
In The Twelfth International Conference on Learning Representations, ICLR 2024, Vienna, Austria, May 7-11, 2024
Talk like a graph: Encoding graphs for large language models. In The Twelfth International Conference on Learning Representations, ICLR 2024, Vienna, Austria, May 7-11, 2024. OpenReview.net. Aaron J. Fisher, Hannah G. Bosley, Katya C. Fernandez, Jonathan W. Reeves, Peter D. Soyster, Allison E. Diamond, and Jonathan Barkin. 2019. Open trial of a personali...
2024
-
[3]
Zhijun Guo, Alvina Lai, Johan H Thygesen, Joseph Farrington, Thomas Keen, Kezhi Li, and 1 others
OpenReview.net. Zhijun Guo, Alvina Lai, Johan H Thygesen, Joseph Farrington, Thomas Keen, Kezhi Li, and 1 others
-
[4]
Stephen N
Large language models for mental health applications: systematic review. JMIR mental health, 11(1):e57400. Stephen N. Haynes, William H. O’Brien, and Antonio Godoy. 2020. A proposed model for the psychometric evaluation of clinical case formulations with quantified causal diagrams. Psychological Assessment, 32(6):541–552. Place: US Publisher: American...
2020
-
[5]
arXiv preprint arXiv:2506.08584 (2025)
Personalizing eating disorder treatment using idiographic models: An open series trial. Journal of Consulting and Clinical Psychology, 91(1):14–28. Place: US Publisher: American Psychological Association. Xiang Li, Duyi Pan, Hongru Xiao, Jiale Han, Jing Tang, Jiabao Ma, Wei Wang, and Bo Cheng. 2025a. Dialogueagents: A hybrid agent-based speech synthe...
work page internal anchor Pith review arXiv 2025
-
[6]
Eeyore: Realistic depression simulation via expert-in-the-loop supervised and preference optimization. In Findings of the Association for Computational Linguistics: ACL 2025, pages 13750–13770, Vienna, Austria. Association for Computational Linguistics. Yang Liu, Dan Iter, Yichong Xu, Shuohang Wang, Ruochen Xu, and Chenguang Zhu. 2023b. G-eval: NLG e...
-
[7]
Association for Computational Linguistics. Clarissa W. Ong, Hiba Arnaout, Kate Sheehan, Estella Fox, Eugen Owtscharow, and Iryna Gurevych. 2025a. Using large language models to create personalized networks from therapy sessions. CoRR, abs/2512.05836. Clarissa W. Ong, Kate Sheehan, Adam J.D. Mann, and Estella Fox. 2025b. Examining the effects of process...
-
[8]
D4: a Chinese dialogue dataset for depression-diagnosis-oriented chat. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, pages 2438–2459, Abu Dhabi, United Arab Emirates. Association for Computational Linguistics. Congchi Yin, Feng Li, Shu Zhang, Zike Wang, Jun Shao, Piji Li, Jianhua Chen, and Xun Jiang. 2025. ...
-
[9]
and the Working Alliance Inventory (WAI) (Horvath and Greenberg, 1989). Both metrics are scored by GPT-4o (OpenAI, 2024) in an LLM-as-a-judge setup evaluating the generated counseling sessions. 8DeepSpeed 9Hugging Face 15 Strategy Evidence Therapy Modality Alternative Perspective Counselor: So, now that you have a bit this stance, what are some other a...
-
[11]
Presenting Problem - What issue/symptoms do you want to discuss? (If there are multiple issues, discuss with the counselor to determine the most important or first issue to address) - When did the problem/symptoms start? - What was the stress level when the problem/symptoms first occurred? (What do you think might be the cause?) - How has the problem/symp...
-
[15]
24 CPG-grounded diverse Client Profile Generation System Prompt Your task is to generate diverse synthetic client intake forms for mental health counseling sessions
Is there anyone you can talk to or get help from when you encounter difficulties or problems? ##Example output 1: {example_output_1} ##Example output 2: {example_output_2} Client Intake Form: Figure 3: Prompt used to generate a single CPG-grounded client profile. 24 CPG-grounded diverse Client Profile Generation System Prompt Your task is to generate dive...
-
[16]
You must generate exactly 10 distinct client intake forms
-
[17]
Each profile must be unique in: -name, age, gender, background -symptom expression and wording -life history, stressors, and coping attempts
-
[18]
DO NOT copy, paraphrase, or structurally reuse the example profiles
-
[19]
DO NOT reuse sentence templates, phrasing, or paragraph structure from the examples
-
[20]
DO NOT repeat any example content, even partially
-
[21]
DO NOT mention clinical models, diagnoses, or technical psychological terminology
-
[22]
Write strictly from the client’s perspective, using everyday language
-
[23]
The graph reflects expert knowledge, but the client is unaware of the graph and should not sound clinically insightful
-
[24]
Infer content from the graph implicitly, not by naming nodes or edges
-
[25]
No profile may resemble another in tone, life stage, or narrative arc
-
[26]
Output must be valid JSON only, with no surrounding text or commentary
-
[27]
## Task Infer 10 diverse client intake forms based on a client graph
Any violation of formatting or repetition invalidates the output. ## Task Infer 10 diverse client intake forms based on a client graph. The **client graph** is given as a list of nodes representing recurring psychological and behavioral patterns of the client, and edges representing connections between them. **Each client intake form must include the foll...
-
[28]
Basic Information - name, age, gender, occupation, education, marital status, family details
-
[29]
Presenting Problem - What symptoms do you want to discuss? - When did the problem/symptoms start? - What was the stress level when the problem/symptoms first occurred? (What do you think might be the cause?) - How has the problem/symptoms progressed? (Changes over time, aggravating factors, alleviating factors, etc.) - Currently, in what situations, how o...
-
[30]
Reason for Seeking Counseling - What was the decisive factor that made you decide to seek counseling this time? (If the problem has been long standing, what made you decide to seek counseling now?)
-
[31]
Past History (including medical history) - Have you experienced similar problems before? Under what circumstances or stress did the problem occur, and what were the patterns? How did you cope? - Have you received treatment/counseling for other psychological problems/symptoms? (When, for how long, any medication use, reasons for stopping - improved? stoppe...
-
[32]
Academic/occupational functioning level (attendance, grades/job performance, etc.) - Interpersonal relationships - Daily life (including sleep, eating, self-care, etc.) - Social Support System
-
[33]
profile":
Is there anyone you can talk to or get help from when you encounter difficulties or problems? ## Output Format - Output only valid JSON - Do not include any explanation or comments. Just output the profiles. ## Example Output The following is an example output. Do not copy any profiles directly. [ {{"profile": "Example profile number 1"}}, {{"profile": "E...
-
[34]
{Not used in Profile}
The dialogue must be consistent with the client intake form {not used in CPG} and client graph. {Not used in Profile}
-
[35]
{Not used in Profile}
Do not use all the nodes and edges in the client graph; include only what naturally fits the flow of the session. {Not used in Profile}
-
[36]
mm-hm",
Use natural conversational signals whenever appropriate (e.g., "mm-hm", "um", "yeah","right","...")
-
[37]
When explaining experiences, emotions, reflections, or psychoeducation, **both counselor and client must use multi-sentence utterances (3–5 sentences)**
-
[38]
Do **not** advance to new topics or conclusions in consecutive turns
The session should progress through ideas gradually. Do **not** advance to new topics or conclusions in consecutive turns. Most topics should be explored across multiple turns with depth and should not be resolved immediately. Figure 6: Global constraints for counseling dialogue generation. 26 Counselor guidelines for counseling dialogue generation
-
[39]
For counselor turns, encourage natural elaboration rather than brevity. In each counselor utterance, explicitly use at least one counseling technique, such as reflection, open-ended questioning, summarizing, or gentle reframing, without sounding mechanical or repetitive
-
[40]
Do not dismiss the client’s experience
Maintain a nonjudgmental, collaborative stance; avoid jumping to conclusions or positioning yourself as the authority. Do not dismiss the client’s experience
-
[41]
take your time
The counselor should support the client in examining, questioning, and reshaping their own thoughts and experiences at their own pace through acknowledging pauses, hesitation, or silence (e.g., “take your time”, “we can sit with that for a moment”)
-
[42]
The dialogue should not feel like an interview**
**The counselor must not end every utterance with a question. The dialogue should not feel like an interview**
-
[43]
Rather they should build towards the information they introduce
The counselor should not introduce new information randomly. Rather they should build towards the information they introduce
-
[44]
The counselor should encourage the client to apply concepts to their real life, specific scenarios and/or review past week and upcoming week assignments, focusing on specific ways to connect session content with real-life applications
-
[45]
does this sound useful to you?
The counselor should prioritize understanding, emotional safety, and rapport before offering interventions or insights. When appropriate, the counselor should check in to ensure shared under- standing (e.g., “does this sound useful to you?”, “does this make sense?”, “sounds like you’re going through [client’s issue] — is that right?”). **These check-ins s...
-
[46]
I want to check in about something, just to make sure I understand how you’re doing
The counselor should do assessment/follow-up on client comments that could be indicative of a larger issue (e.g., hopelessness = assess for suicidality, weight loss = assess for eating disorder/appetite changes, difficult relationship = assess for safety at home, etc). The counselor should frame these questions as curiosity and care, not assumptions (e....
-
[47]
Psychoeducation should be preceded by a brief reflection or summary that connects it directly to what the client just shared
The counselor should offer psychoeducation when it directly supports the client’s understanding or client expresses misunderstanding of treatment concepts. Psychoeducation should be preceded by a brief reflection or summary that connects it directly to what the client just shared. The counselor should use clear, everyday language for psychoeducation and c...
-
[48]
The counselor must respect pacing and readiness; invite exploration without rushing
-
[49]
**Repeating exact phrasing is disallowed**; repeating therapeutic functions (e.g., reflection, validation) using varied language is expected
-
[50]
It shouldn’t be the other way around where the counselor asks the client if some techniques comes to mind
**The counselor should be the one to suggest psychological techniques to the client**. It shouldn’t be the other way around where the counselor asks the client if some techniques comes to mind. It is fine for the counselor to ask the client if they have tried anything already. Figure 7: Counselor guidelines (designed with direct input from clinicians) for...
-
[51]
{Not used in Profile}
The client graph influences responses implicitly and must not be named directly. {Not used in Profile}
-
[52]
attachment issues,
The client should express their experiences in everyday, non-clinical language and should not self-diagnose or use professional terminology (e.g., “attachment issues,” “cognitive distortions”) as a trained counselor would. The client may, however, use informal or popular mental health terms commonly encountered on social media, as well as terms that have ...
-
[53]
The client should not provide detailed concrete descriptions or mini-narratives unless the counselor explicitly asks for elaboration or invites reflection
Early in the session, the client should respond with **brief, surface-level descriptions** of emotions and experiences. The client should not provide detailed concrete descriptions or mini-narratives unless the counselor explicitly asks for elaboration or invites reflection. Detailed emotional descriptions and mini-narratives (what happened, what was not...
-
[54]
I hear what you’re saying, but at the same time it doesn’t really feel true for me
**The client can express ambivalence, confusion, or difficulty naming emotions when appropriate**. Ambivalence should be expressed as simultaneous pull in opposing directions, not passive agreement (e.g., “I hear what you’re saying, but at the same time it doesn’t really feel true for me.”)
-
[55]
I’m not really sure what that means
When the counselor offers an interpretation, suggestion, coping strategy, or reframing, the client must first respond with at least one of the following before any agreement: Confusion (“I’m not really sure what that means. . . ”), Skepticism (“I don’t see how that would help. . . ”), Partial resistance (“I get what you’re saying, but. . . ”), Difficulty ...
-
[56]
speaker":
A structured rationale explaining why that utterance was generated at that moment, grounded in: - The current dialogue history, - Appropriate counseling techniques for counselor turns, - The client intake form Not used in CPG and client graph Not used in Profile for client turns. The rationale should justify intent and alignment, not reveal hidden interna...
-
[59]
If there is any deficiency, no matter how minor, assign a score of 4 or lower
Assign a score based on the criteria, grading very strictly and uptight. If there is any deficiency, no matter how minor, assign a score of 4 or lower
-
[60]
Do not add any prefix
Output the score and the explanation, separated by a comma. Do not add any prefix. Counseling conversation: {conversation} Evaluation Question: {question} Criteria: {criteria} Figure 34: Prompt used to evaluate the generated counseling sessions on CTRS. WAI LLM-as-a-judge Evaluation Prompt The following is a psychological counseling session between a cou...
-
[61]
Read the counseling session transcript carefully
-
[62]
Review the evaluation questions and criteria provided below
-
[63]
Assign a score based on the criteria, grading very strictly
-
[64]
utterance 1
Output the score (***only the numerical***) and the explanation, separated by a comma. ***Do not add any prefix.*** Counseling conversation: {conversation} Question: {question} Criteria: {criteria} Figure 35: Prompt used to evaluate the generated counseling sessions on WAI. Client Issue Extraction Prompt for SQPsychConv Extract the presenting problem ...
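The profile-generation prompt quoted in the extract above asks for exactly 10 distinct intake forms, returned as valid JSON with no surrounding text, in a list of objects that each carry a "profile" key. A downstream check for that contract might look like the sketch below. The function name and the duplicate check are assumptions. The doubled braces in the quoted example are read here as template escaping for single braces, which is also an assumption.

```python
import json


def parse_profiles(raw: str, expected_count: int = 10) -> list[str]:
    """Validate a model response against the assumed output contract."""
    data = json.loads(raw)  # raises json.JSONDecodeError on surrounding prose
    if not isinstance(data, list):
        raise ValueError("top-level JSON value must be a list")
    if len(data) != expected_count:
        raise ValueError(f"expected {expected_count} profiles, got {len(data)}")

    profiles = []
    for index, item in enumerate(data):
        if not isinstance(item, dict) or not isinstance(item.get("profile"), str):
            raise ValueError(f"entry {index} lacks a string 'profile' field")
        profiles.append(item["profile"].strip())

    # Cheap guard against verbatim repeats; it cannot detect paraphrases.
    if len(set(profiles)) != len(profiles):
        raise ValueError("duplicate profiles found")
    return profiles


# Placeholder response text, invented for illustration.
example_raw = json.dumps([{"profile": f"Example profile number {i}"} for i in range(1, 11)])
example_profiles = parse_profiles(example_raw)
```

Rejecting malformed responses at this step keeps a single bad generation from silently shrinking the number of sessions derived from a graph.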