
arxiv: 2605.22140 · v1 · pith:QCT3VDODnew · submitted 2026-05-21 · 💻 cs.CL

Psy-Chronicle: A Structured Pipeline for Synthesizing Long-Horizon Campus Psychological Counseling Dialogues

Pith reviewed 2026-05-22 06:27 UTC · model grok-4.3

classification 💻 cs.CL
keywords long-horizon dialogues · psychological counseling · data synthesis · campus stress events · multi-agent simulation · memory integration · language model evaluation

The pith

Psy-Chronicle creates long-horizon counseling dialogues by simulating student and counselor interactions across a semester of campus events.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes a framework called Psy-Chronicle to generate extended psychological counseling dialogues that span multiple sessions over a college semester. It builds a graph of stress events in chronological order and uses two AI agents to simulate ongoing conversations while maintaining memory of prior exchanges. This produces a large open-source dataset of 90,000 Chinese dialogues from 100 student profiles. Evaluation on a new benchmark reveals gains in session responses and memory recall but highlights ongoing difficulties with causal connections between events.

Core claim

Psy-Chronicle is a structured pipeline that first constructs a semester-spanning temporal stress event graph to capture the order and dependencies of campus life events, then employs interactive simulation between a student agent and a counselor agent augmented by a structured memory integration mechanism to produce continuous long-horizon dialogues that model the accumulation and evolution of psychological distress.
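One way to picture a temporal stress event graph is as a set of week-stamped events with directed "evolves from" edges. The sketch below is illustrative only: the class names, field names, and the demo events are assumptions made for this example, not the authors' implementation.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass(frozen=True)
class StressEvent:
    event_id: str  # hypothetical identifier, e.g. "E1"
    week: int      # semester week in which the event occurs
    domain: str    # illustrative label, e.g. "academic"
    content: str   # short description of the event


@dataclass
class StressEventGraph:
    # event_id -> StressEvent
    events: dict = field(default_factory=dict)
    # event_id -> ids of later events that evolve from it
    successors: dict = field(default_factory=lambda: defaultdict(list))

    def add_event(self, event: StressEvent) -> None:
        self.events[event.event_id] = event

    def add_dependency(self, earlier_id: str, later_id: str) -> None:
        """Record that `later_id` evolves from `earlier_id`."""
        earlier, later = self.events[earlier_id], self.events[later_id]
        if earlier.week > later.week:
            raise ValueError("a dependency must not point backwards in time")
        self.successors[earlier_id].append(later_id)

    def chronological(self) -> list:
        """Events sorted by week, ties broken by event id."""
        return sorted(self.events.values(), key=lambda e: (e.week, e.event_id))

    def ancestors(self, event_id: str) -> set:
        """Ids of all events that `event_id` transitively depends on."""
        parents = defaultdict(list)
        for src, dsts in self.successors.items():
            for dst in dsts:
                parents[dst].append(src)
        seen, stack = set(), [event_id]
        while stack:
            for parent in parents[stack.pop()]:
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        return seen


def build_demo_graph() -> StressEventGraph:
    """Three made-up events, inserted out of order on purpose."""
    graph = StressEventGraph()
    graph.add_event(StressEvent("E3", 9, "academic", "considers dropping a course"))
    graph.add_event(StressEvent("E1", 3, "academic", "fails a midterm quiz"))
    graph.add_event(StressEvent("E2", 5, "health", "insomnia before exams"))
    graph.add_dependency("E1", "E2")
    graph.add_dependency("E2", "E3")
    return graph
```

Calling `build_demo_graph().chronological()` returns the events in week order regardless of insertion order, and `ancestors("E3")` returns the chain of earlier events that a counseling session about `E3` might need to recall.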

What carries the argument

The temporal stress event graph combined with agent-based interactive simulation and memory integration mechanism, which ensures continuity and evolution across multiple counseling sessions.
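The interplay of agent simulation and memory can be sketched as a loop in which each session sees summaries of the sessions before it. This is a minimal sketch under stated assumptions: the `Agent` and `Summarizer` signatures, the prompt layout, and the plain-text summaries are all hypothetical stand-ins, and the paper's structured memory integration mechanism may work quite differently.

```python
from typing import Callable, List, Tuple

# Hypothetical signatures: an agent maps a prompt to a reply, and a
# summarizer condenses a transcript into one memory entry. In practice both
# would wrap language-model calls; any callable works for this sketch.
Agent = Callable[[str], str]
Summarizer = Callable[[List[str]], str]


def run_session(student: Agent, counselor: Agent, event: str,
                memory: List[str], turns: int = 2) -> List[str]:
    """Simulate one counseling session about `event`; return its transcript."""
    if memory:
        context = "Prior sessions: " + " | ".join(memory)
    else:
        context = "Prior sessions: none"
    transcript: List[str] = []
    last_reply = "(session start)"
    for _ in range(turns):
        student_msg = student(
            f"{context}\nEvent: {event}\nCounselor: {last_reply}")
        transcript.append("student: " + student_msg)
        last_reply = counselor(
            f"{context}\nEvent: {event}\nStudent: {student_msg}")
        transcript.append("counselor: " + last_reply)
    return transcript


def simulate_semester(student: Agent, counselor: Agent, events: List[str],
                      summarize: Summarizer,
                      turns: int = 2) -> Tuple[List[List[str]], List[str]]:
    """Run one session per event, carrying a growing memory across sessions."""
    memory: List[str] = []
    sessions: List[List[str]] = []
    for index, event in enumerate(events, start=1):
        transcript = run_session(student, counselor, event, memory, turns)
        sessions.append(transcript)
        # Memory integration step: store a summary, not the full transcript.
        memory.append(f"session {index}: {summarize(transcript)}")
    return sessions, memory
```

Storing summaries rather than full transcripts keeps the prompt size bounded as the semester progresses, which is one plausible reason to use a memory mechanism at all; whether the authors make the same trade-off is not stated here.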

If this is right

  • Training models on the resulting CPCD dataset leads to better session-level response generation.
  • Models exhibit improved long-horizon memory recall across counseling sessions.
  • Performance on temporal-causal reasoning shows limited gains, pointing to event-chain organization as a remaining challenge.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Similar structured pipelines could extend to modeling long-term dynamics in other domains such as chronic health management or educational tutoring.
  • The dataset opens opportunities for testing how well models handle accumulating context in real psychological support applications.
  • Addressing the causal reasoning gap may require integrating explicit reasoning modules or causal inference techniques into the dialogue models.

Load-bearing premise

That the agent simulations and memory mechanism produce dialogues that genuinely reflect how students' psychological distress builds and changes through real campus life events over time.

What would settle it

An experiment comparing model performance on actual recorded long-term student counseling sessions versus those generated by Psy-Chronicle to check if improvements transfer to real data.
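Such a comparison could be summarized by measuring how much of the synthetic-benchmark gain survives on real sessions. The sketch below assumes per-item scores are already available for a base model and a CPCD-tuned model on both a synthetic and a real evaluation set; the function names and the "transfer ratio" are illustrative choices for this example, not metrics from the paper.

```python
from statistics import mean
from typing import Dict, Optional, Sequence, Tuple

Scores = Tuple[Sequence[float], Sequence[float]]  # (base scores, tuned scores)


def mean_gain(base: Sequence[float], tuned: Sequence[float]) -> float:
    """Average per-item improvement of the tuned model over the base model."""
    if len(base) != len(tuned) or not base:
        raise ValueError("score lists must be non-empty and of equal length")
    return mean(t - b for b, t in zip(base, tuned))


def transfer_report(synthetic: Scores, real: Scores) -> Dict[str, Optional[float]]:
    """Compare the gain on synthetic items with the gain on real items."""
    synthetic_gain = mean_gain(*synthetic)
    real_gain = mean_gain(*real)
    # A ratio near 1 would suggest the gain transfers; near 0, that it does not.
    ratio = real_gain / synthetic_gain if synthetic_gain != 0 else None
    return {
        "synthetic_gain": synthetic_gain,
        "real_gain": real_gain,
        "transfer_ratio": ratio,
    }
```

For instance, a mean gain of 2.0 on synthetic items and 1.0 on real items would give a transfer ratio of 0.5, meaning half of the measured improvement carried over.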

Figures

Figures reproduced from arXiv: 2605.22140 by Chaogui Gou, Jiarui Liang.

Figure 1: Comparison of three psychological counseling data paradigms. Compared with single-turn …
Figure 2: The Psy-Chronicle framework for long-horizon campus counseling data synthesis, post …
Figure 3: CPCD Data Distribution. (a) Distribution of campus stress event domains, covering …
Figure 4: Profile diversity analysis of CPCD. (a) Pairwise TF-IDF cosine similarity distribution among …
Figure 5: Semantic similarity analysis of counseling dialogues. (a) Average semantic similarity of …
Original abstract

In recent years, large language models have shown substantial potential in psychological support tasks. However, existing psychological counseling data mostly rely on single-turn question answering or short multi-turn dialogues, making it difficult to characterize how college students' psychological distress accumulates, interacts, and gradually evolves over long periods within campus life events. To address this issue, this paper proposes Psy-Chronicle, a structured data-generation framework for synthesizing long-horizon campus psychological counseling dialogues. We generate a semester-spanning temporal stress event graph to model the chronological order and evolutionary dependencies among campus stress events. Through interactive simulation between a student agent and a counselor agent, together with a structured memory integration mechanism, Psy-Chronicle generates long-horizon dialogues with continuity across counseling sessions. Based on Psy-Chronicle, we construct and open-source CPCD, a Chinese long-horizon dialogue dataset for college psychological counseling, containing 100 student profiles and 90,000 counseling dialogues. We further build CPCD-Bench to evaluate models' long-horizon campus counseling capabilities from three dimensions: session-level response, long-horizon memory recall, and temporal-causal reasoning. Experimental results show that CPCD effectively improves session-level response generation and long-horizon memory recall for models with the same base architecture. Meanwhile, improvements in temporal-causal reasoning remain limited, indicating that event-chain organization and causal explanation are key challenges in long-horizon psychological counseling modeling. The related code and data are available at: https://github.com/EdwinUSTB/Psy-Chronicle

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 2 minor

Summary. The paper proposes Psy-Chronicle, a structured pipeline that generates a semester-spanning temporal stress event graph and uses interactive student-counselor agent simulations with structured memory integration to synthesize long-horizon campus psychological counseling dialogues. From this process the authors release the CPCD dataset (100 student profiles, 90,000 dialogues) and the CPCD-Bench evaluation suite, which measures session-level response generation, long-horizon memory recall, and temporal-causal reasoning. Experiments on models of the same base architecture report gains in response quality and memory recall but only limited improvement in causal reasoning, which the authors interpret as evidence that event-chain organization remains a key challenge.

Significance. If the synthetic dialogues and event chains can be shown to align with real campus counseling processes, the released dataset and benchmark would address a genuine scarcity of long-horizon resources for psychological-support modeling. The open-sourcing of code and data is a clear positive contribution. However, the absence of external anchors (real transcripts, clinical ratings, or human-authored long-horizon dialogues) and the shared generation pipeline between training data and benchmark limit the strength of claims about generalization.

major comments (1)
  1. [Abstract / Experimental results] The central experimental claim (Abstract and Experimental results) that CPCD improves session-level response generation and long-horizon memory recall rests on CPCD-Bench, which is constructed from the identical Psy-Chronicle pipeline (temporal stress event graph + student/counselor agent simulation + memory integration). Because the benchmark and training data share the same generative process, measured gains may largely reflect consistency with the generator’s own patterns rather than independent generalization to authentic student-counselor interactions. No comparison to real counseling transcripts or expert clinical ratings is reported, making the effectiveness claim difficult to interpret.
minor comments (2)
  1. [Abstract] The abstract states that CPCD “effectively improves” session-level response generation and long-horizon memory recall, yet provides no details on baseline training procedures, hyperparameter settings, or statistical significance tests for the reported gains.
  2. [Methodology] Notation for the memory integration mechanism and the precise definition of “temporal-causal reasoning” metrics could be clarified with an explicit equation or pseudocode example.
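The significance testing requested in the first minor comment could take the form of a paired bootstrap over per-item benchmark scores. This is a minimal sketch, assuming each benchmark item has one score from the base model and one from the CPCD-tuned model; the function name, defaults, and any scores passed in are hypothetical and not taken from the paper.

```python
import random
from typing import Sequence, Tuple


def paired_bootstrap(base: Sequence[float], tuned: Sequence[float],
                     n_resamples: int = 2000, seed: int = 0) -> Tuple[float, float]:
    """Paired bootstrap on per-item score differences.

    Returns the observed mean difference (tuned minus base) and the fraction
    of resamples in which the resampled mean difference is <= 0. A small
    fraction suggests the gain is unlikely to be a sampling artifact.
    """
    if len(base) != len(tuned) or not base:
        raise ValueError("score lists must be non-empty and of equal length")
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    diffs = [t - b for b, t in zip(base, tuned)]
    n = len(diffs)
    observed = sum(diffs) / n
    non_positive = 0
    for _ in range(n_resamples):
        # Resample items with replacement, keeping each base/tuned pair together.
        sample = rng.choices(diffs, k=n)
        if sum(sample) <= 0:
            non_positive += 1
    return observed, non_positive / n_resamples
```

Resampling the differences rather than the two score lists independently preserves the pairing between models on each item, which matters when item difficulty varies widely.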

Simulated Author's Rebuttal

1 response · 1 unresolved

We thank the referee for the thoughtful and detailed review. We address the major comment on the experimental claims and evaluation design below, acknowledging the shared pipeline between data generation and benchmarking as a substantive limitation.

point-by-point responses
  1. Referee: [Abstract / Experimental results] The central experimental claim (Abstract and Experimental results) that CPCD improves session-level response generation and long-horizon memory recall rests on CPCD-Bench, which is constructed from the identical Psy-Chronicle pipeline (temporal stress event graph + student/counselor agent simulation + memory integration). Because the benchmark and training data share the same generative process, measured gains may largely reflect consistency with the generator’s own patterns rather than independent generalization to authentic student-counselor interactions. No comparison to real counseling transcripts or expert clinical ratings is reported, making the effectiveness claim difficult to interpret.

    Authors: We agree that CPCD-Bench shares the same generative process as the training data, which means the reported gains primarily demonstrate improved consistency with the temporal stress event graphs and memory mechanisms of Psy-Chronicle rather than broad generalization to real counseling. This controlled synthetic setup was chosen to isolate and measure long-horizon capabilities that are difficult to evaluate without such structure. We did not report comparisons to real transcripts or clinical ratings because publicly available long-horizon campus counseling data are scarce and real student records are protected by strict privacy regulations and ethical review requirements. In the revised version we will add an expanded Limitations section that explicitly discusses this point, qualifies the interpretation of the experimental results, and outlines future work involving human evaluation or external anchors where feasible.
    revision: partial

standing simulated objections not resolved
  • Direct empirical comparison against real counseling transcripts or expert clinical ratings, which cannot be performed without access to protected data

Circularity Check

0 steps flagged

Psy-Chronicle synthetic pipeline and CPCD-Bench evaluation are self-contained with no load-bearing reduction to inputs

full rationale

The paper describes a data-generation pipeline that produces a temporal stress graph, runs student/counselor agent simulations with memory integration, and emits CPCD dialogues; it then builds an internal CPCD-Bench from the same pipeline and reports that fine-tuning on CPCD improves session-level and memory-recall metrics on that bench for same-architecture models. This is a standard synthetic-data train/test split on a held-out portion of the generated distribution; the abstract explicitly limits the claim to the synthetic setting and notes that temporal-causal gains remain limited. No equation, parameter fit, or self-citation is shown to be definitionally equivalent to the reported improvement, and the evaluation dimensions are distinct from the generation procedure itself.
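A held-out split of generated dialogues is cleanest when no student profile appears on both sides, since dialogues from one profile share events and memory. The sketch below shows one possible grouping strategy; the `profile_id` key and the function itself are assumptions for illustration, and this page does not say how the authors actually split CPCD.

```python
import random
from collections import defaultdict
from typing import Dict, List, Tuple

Dialogue = Dict[str, object]  # hypothetical record with a "profile_id" key


def split_by_profile(dialogues: List[Dialogue], test_fraction: float = 0.2,
                     seed: int = 0) -> Tuple[List[Dialogue], List[Dialogue]]:
    """Split dialogues into train and test so that no profile is in both."""
    by_profile = defaultdict(list)
    for dialogue in dialogues:
        by_profile[dialogue["profile_id"]].append(dialogue)
    profile_ids = sorted(by_profile)            # stable order before shuffling
    random.Random(seed).shuffle(profile_ids)    # seeded for reproducibility
    n_test = max(1, round(len(profile_ids) * test_fraction))
    test_ids = set(profile_ids[:n_test])
    train = [d for d in dialogues if d["profile_id"] not in test_ids]
    test = [d for d in dialogues if d["profile_id"] in test_ids]
    return train, test
```

Even with profile-level separation, both halves still come from the same generator, so a split like this guards against leakage of specific events but not against the shared-pipeline concern raised in the referee report.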

Axiom & Free-Parameter Ledger

2 free parameters · 1 axiom · 1 invented entity

The framework depends on the untested premise that LLM agents can faithfully simulate evolving psychological states and on the utility of the newly introduced temporal stress graph; no external validation or real-world grounding for either is supplied in the abstract.

free parameters (2)
  • Number of student profiles
    Chosen to produce a dataset of 100 profiles for diversity in the generated dialogues.
  • Total number of counseling dialogues
    Set at 90,000 to create a large-scale training resource.
axioms (1)
  • [domain assumption] Large language models can realistically simulate student psychological distress accumulation and counselor responses across multiple sessions when equipped with memory.
    Invoked to justify the interactive simulation and memory integration mechanism that produces the long-horizon dialogues.
invented entities (1)
  • Semester-spanning temporal stress event graph [no independent evidence]
    purpose: Models chronological order and evolutionary dependencies among campus stress events to structure dialogue generation.
    Newly constructed structure that organizes the long-horizon synthesis process.

pith-pipeline@v0.9.0 · 5811 in / 1537 out tokens · 79334 ms · 2026-05-22T06:27:49.240343+00:00 · methodology
