Multi-Level Narrative Evaluation Outperforms Lexical Features for Mental Health

Chi Zhang; Jieming Cui; Muyang Li; Ye Zhao; Yinyin Zang; Yixin Zhu; Yixuan Wang; Yu Li; Yuxi Ma

arXiv: 2604.27846 · v1 · submitted 2026-04-30 · 💻 cs.CL

Pith reviewed 2026-05-07 04:53 UTC · model grok-4.3

keywords: narrative evaluation, mental health prediction, large language models, therapeutic writing, discourse structure, depression, anxiety, multi-level framework

The pith

Macro-level LLM evaluation of narrative structure outperforms lexical features and embeddings for predicting mental health from therapeutic writing.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

People narrate experiences in ways that reflect how their minds organize them, offering clues to conditions like depression, anxiety, or trauma. The paper introduces a three-level framework that separates micro-level word features, meso-level semantic embeddings, and macro-level assessment of overall story organization using large language models. Applied to 830 Chinese therapeutic texts, the macro level substantially outperforms the others in mental health prediction. This matters because it challenges the field's focus on counting words and shows that global discourse structure carries stronger clinical signal. The framework draws on discourse theory to map analysis levels onto how narratives are actually built.

Core claim

Across 830 Chinese therapeutic texts spanning depression, anxiety, and trauma, macro-level LLM narrative evaluation substantially outperforms micro-level lexical features and meso-level semantic embeddings for mental health prediction. Formal structural features such as Labov's story grammar, RST coherence, and propositional composition show that narrative organization per se carries predictive signal, while clinically-grounded narrative dimensions capture how psychological states are expressed through discourse. Semantic embeddings add minimal independent value but yield incremental gains in multi-level classification.

What carries the argument

A three-level framework that maps micro-level lexical features, meso-level semantic embeddings, and macro-level LLM narrative evaluation onto the hierarchical processes of narrative construction, with the macro level assessing global organization via story grammar, coherence relations, and clinically relevant dimensions.
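As a rough illustration of how the three levels could be kept separate and then combined for layer-wise comparisons, the sketch below stores each level's features in its own field and concatenates only the requested levels. It assumes Python 3.9 or later. The class `NarrativeFeatures`, the helper names, and the toy numbers are all hypothetical; the paper's actual feature definitions and pipeline are not described in this summary.

```python
from dataclasses import dataclass, field

# Level names follow the framework; everything else here is hypothetical.
LEVELS = ("micro", "meso", "macro")


@dataclass
class NarrativeFeatures:
    """Per-text features, stored separately for each analysis level."""
    micro: list[float] = field(default_factory=list)  # lexical features
    meso: list[float] = field(default_factory=list)   # embedding-derived features
    macro: list[float] = field(default_factory=list)  # LLM narrative-evaluation scores


def feature_vector(features: NarrativeFeatures, levels: tuple[str, ...]) -> list[float]:
    """Concatenate the requested levels in a fixed micro -> meso -> macro order."""
    unknown = set(levels) - set(LEVELS)
    if unknown:
        raise ValueError(f"unknown levels: {sorted(unknown)}")
    vector: list[float] = []
    for level in LEVELS:
        if level in levels:
            vector.extend(getattr(features, level))
    return vector


def cumulative_level_sets() -> list[tuple[str, ...]]:
    """Return level sets that add one level at a time."""
    return [LEVELS[:i] for i in range(1, len(LEVELS) + 1)]


# Toy values for illustration only; they are not data from the paper.
text = NarrativeFeatures(micro=[0.12, 0.03], meso=[0.71], macro=[4.0, 2.0, 5.0])
for level_set in cumulative_level_sets():
    print(level_set, feature_vector(text, level_set))
```

Keeping the levels in separate fields makes it straightforward to train one model per level set and compare them, which is the kind of comparison the core claim depends on.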

If this is right

Narrative organization at the macro level carries predictive signal independent of lexical counts.
Semantic embeddings contribute only incremental value when added to other levels in classification.
Clinically-grounded dimensions of narratives express psychological states through discourse structure.
The framework generates testable hypotheses for intervention design based on narrative patterns.
Longitudinal studies can track changes in macro-level organization during therapy.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The macro-level advantage could be tested by applying the framework to English-language or non-Chinese therapeutic texts to check cross-linguistic stability.
Integration into digital therapy tools might enable real-time feedback on global story coherence to support patient progress.
Direct comparison of LLM macro scores against human clinician ratings on the same texts would clarify clinical validity.
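One way the LLM-versus-clinician comparison could be quantified, assuming both raters assign each text to the same set of discrete categories, is Cohen's kappa, which corrects raw agreement for agreement expected by chance. The sketch below is a plain-Python version; the function name and the toy ratings are hypothetical, and for ordinal scales a weighted kappa or a rank correlation may be more appropriate.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters labelling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must label the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: share of items with identical labels.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the raters' marginal label frequencies.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    expected = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical category throughout
    return (observed - expected) / (1.0 - expected)


# Toy ratings for illustration only (e.g. a 1-3 coherence scale).
llm_ratings = [1, 2, 3, 3, 2, 1, 2, 3]
clinician_ratings = [1, 2, 3, 2, 2, 1, 3, 3]
print(round(cohens_kappa(llm_ratings, clinician_ratings), 3))
```

A kappa near zero would mean the LLM agrees with clinicians no better than chance, whatever its predictive performance.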

Load-bearing premise

That large language model assessments of narrative structure accurately reflect clinically relevant discourse organization without bias from the model or prompt design.

What would settle it

A replication on a fresh dataset in which macro-level LLM narrative scores show no predictive advantage over lexical feature baselines for mental health outcomes.
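For the classification tasks, one way a replication could check whether a macro-level model's advantage over a lexical baseline is distinguishable from chance is an exact McNemar test on paired predictions. This is an assumed analysis choice, not a procedure described in the paper, and the function name and toy labels below are hypothetical.

```python
from math import comb


def mcnemar_exact(y_true, pred_a, pred_b):
    """Exact two-sided McNemar test on paired predictions.

    Returns (b, c, p_value), where b counts items only model A gets right
    and c counts items only model B gets right.
    """
    if not (len(y_true) == len(pred_a) == len(pred_b)):
        raise ValueError("all inputs must have the same length")
    b = sum(pa == t and pb != t for t, pa, pb in zip(y_true, pred_a, pred_b))
    c = sum(pa != t and pb == t for t, pa, pb in zip(y_true, pred_a, pred_b))
    n = b + c
    if n == 0:
        return b, c, 1.0  # the models never disagree in correctness
    # Under the null, discordant pairs split like fair coin flips: Binomial(n, 0.5).
    tail = sum(comb(n, k) for k in range(min(b, c) + 1)) / 2 ** n
    return b, c, min(1.0, 2 * tail)


# Toy labels for illustration only; they are not results from the paper.
labels = [1, 0, 1, 1, 0, 1, 0, 1]
macro_pred = [1, 0, 1, 1, 0, 1, 1, 1]
lexical_pred = [1, 1, 0, 1, 0, 0, 1, 1]
print(mcnemar_exact(labels, macro_pred, lexical_pred))
```

Only the discordant items matter here: texts that both models classify correctly, or both misclassify, carry no information about which model is better.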

Figures

Figures reproduced from arXiv: 2604.27846 by Chi Zhang, Jieming Cui, Muyang Li, Ye Zhao, Yinyin Zang, Yixin Zhu, Yixuan Wang, Yu Li, Yuxi Ma.

**Figure 1.** Multi-level analytical framework for therapeutic writing. Three computational layers, grounded in discourse processing theory (Kintsch & Van Dijk, 1978), operationalize hierarchical narrative construction. The micro-level captures lexical patterns via LIWC; the meso-level quantifies semantic coherence through sentence embeddings; the macro-level employs LLMs as structured evaluators, distinguishing formal…

**Figure 2.** Performance across layered feature sets. Radar plot comparing regression (R²) and classification (AUC) performance across five tasks. Each line represents a feature combination, from the baseline alone (B) through sequential layer addition up to B + L1 + L2 + L3. All metrics are normalized to a maximum of 1.00. Layer 3 (macro-level) alone approaches full-model performance, while Layer 2 (meso-level) alon…

**Figure 3.** SHAP summary plots for depression and anxiety score prediction. Features are ranked top-to-bottom by mean absolute SHAP value (global importance). Each dot represents one sample; horizontal position indicates the SHAP value (contribution to model output), and color encodes the original feature magnitude (red = high, blue = low). Feature labels indicate their computational layer: L3a (formal structural), L3…

Original abstract

How people narrate their experiences offers a window into how the mind organizes them. Computational approaches to therapeutic writing have evolved from lexical counting to neural methods, yet remain fragmented: dictionary tools miss discourse structure, while embeddings conflate local coherence with global organization. No existing framework maps these techniques onto the hierarchical processes through which narratives are constructed. Here we introduce a three-level framework - micro-level lexical features, meso-level semantic embeddings, and macro-level LLM narrative evaluation - and show, across 830 Chinese therapeutic texts spanning depression, anxiety, and trauma, that macro-level evaluation substantially outperforms lexical and embedding features for mental health prediction. This challenges the field's emphasis on word-counting: formal structural features (Labov's story grammar, RST coherence, propositional composition) demonstrate that narrative organization per se carries predictive signal, while clinically-grounded narrative dimensions capture how psychological states are expressed through discourse. Semantic embeddings add minimal independent value but yield incremental gains in multi-level classification. By grounding computational levels in discourse processing theory, this framework identifies macro-structural organization as the primary locus of clinical signal and generates testable hypotheses for intervention design and longitudinal research.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

The three-level framework is a clean theoretical move but the macro-level outperformance claim needs explicit validation to separate discourse signal from LLM artifacts.

The paper proposes a three-level hierarchy for therapeutic narratives: micro lexical features, meso embeddings, and macro LLM narrative evaluation. It tests this on 830 Chinese texts and reports that the macro level substantially outperforms the others for mental health prediction.

The main new element is the explicit mapping onto discourse processing theory (Labov story grammar, RST coherence, propositional structure) and the empirical result that macro organization adds value beyond embeddings on non-English clinical writing. That framing is useful because most prior work stays at the lexical or embedding level without testing whether higher discourse structure carries independent clinical information. The Chinese data set is also a practical plus. The authors are right that embeddings alone add little incremental signal here.

The soft spots are concentrated in the macro evaluation itself. The abstract gives no prompt template, no list of scored narrative dimensions, no LLM details, no temperature or few-shot setup, and no human validation of the ratings. Without those, the reported gains could easily come from the LLM re-expressing lexical or stylistic patterns already captured (or better captured) by the lower levels, especially on Chinese text where tokenization and cultural fit are uneven. The claim that formal structural features demonstrate independent signal would be stronger with the actual ablation numbers, statistical tests, and error analysis.

This work is for researchers in computational mental health or clinical NLP who want to move past bag-of-words toward discourse structure. A reader already thinking about narrative psychology or LLM use in therapy transcripts would get a usable starting framework, though they would have to fill in the validation gaps themselves.

I would send it to peer review. The theoretical grounding and data size are enough to justify referee time, but the authors should expect pointed questions on the LLM setup and controls.

Referee Report

3 major / 2 minor

Summary. The manuscript introduces a three-level framework for computational analysis of therapeutic writing: micro-level lexical features, meso-level semantic embeddings, and macro-level LLM narrative evaluation. On a corpus of 830 Chinese texts spanning depression, anxiety, and trauma, it claims that macro-level LLM evaluation substantially outperforms lexical and embedding baselines for mental-health prediction. The work grounds the levels in discourse-processing theory (Labov story grammar, RST coherence, propositional composition) and argues that narrative organization per se supplies clinically relevant signal beyond what lower-level features capture, while also noting that embeddings add only incremental value in multi-level models.

Significance. If the reported superiority is robust and the LLM scores are shown to be independent of lexical leakage, the paper would advance the field by replacing fragmented word-counting or embedding approaches with a theoretically motivated hierarchy that identifies macro-structural organization as the primary locus of clinical signal. The explicit linkage to Labov/RST constructs and the use of Chinese therapeutic data are positive features that could generate testable hypotheses for longitudinal studies and intervention design. However, the current absence of methodological transparency and quantitative results prevents any assessment of whether these benefits are realized.

major comments (3)

[Abstract] The central claim that 'macro-level evaluation substantially outperforms lexical and embedding features' is asserted without any performance metrics, statistical tests, baseline definitions, or error analysis. This absence makes it impossible to evaluate whether the data support the claim or whether the reported gains are additive beyond what embeddings already encode.
[Methods] (inferred from abstract description) No details are supplied on the LLM employed, the precise prompt template, the set of narrative dimensions scored, temperature, few-shot examples, or any human validation of the LLM ratings. Without these, it cannot be determined whether the macro scores reflect independent discourse organization (Labov/RST/propositional structure) or simply re-express lexical or stylistic cues already captured at the micro and meso levels, especially on Chinese text where tokenization and cultural alignment issues are well-documented.
[Results] The statements that 'semantic embeddings add minimal independent value' and that 'formal structural features demonstrate that narrative organization per se carries predictive signal' require explicit ablation studies, feature-importance metrics, or cross-level correlation analyses. The current text provides none, leaving open the possibility that the macro-level advantage is an artifact of prompt leakage rather than genuine hierarchical structure.

minor comments (2)

[Abstract] The abstract would be strengthened by a single sentence reporting the key quantitative result (e.g., macro F1 or AUC) and the exact number of classes or regression targets.
[Dataset] Clarify whether the 830 texts are balanced across the three mental-health categories and whether any demographic or writing-length covariates were controlled.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback, which identifies important opportunities to strengthen the transparency and rigor of our presentation. We address each major comment below and will make substantial revisions to the manuscript to incorporate the suggested improvements.

Referee: [Abstract] The central claim that 'macro-level evaluation substantially outperforms lexical and embedding features' is asserted without any performance metrics, statistical tests, baseline definitions, or error analysis. This absence makes it impossible to evaluate whether the data support the claim or whether the reported gains are additive beyond what embeddings already encode.

Authors: We agree that the abstract should include quantitative evidence to support the central claim. In the revised manuscript, we will add specific performance metrics (accuracy, macro-F1), statistical test results (e.g., paired t-tests or McNemar tests with p-values), brief baseline definitions, and a summary of error patterns. These additions will allow readers to directly assess the magnitude and reliability of the reported outperformance. Revision: yes.

Referee: [Methods] (inferred from abstract description) No details are supplied on the LLM employed, the precise prompt template, the set of narrative dimensions scored, temperature, few-shot examples, or any human validation of the LLM ratings. Without these, it cannot be determined whether the macro scores reflect independent discourse organization (Labov/RST/propositional structure) or simply re-express lexical or stylistic cues already captured at the micro and meso levels, especially on Chinese text where tokenization and cultural alignment issues are well-documented.

Authors: We acknowledge that the current Methods section lacks sufficient implementation detail. In the revision, we will expand it with a new subsection that specifies the exact LLM (model name, version, and provider), the full prompt template (including the narrative dimensions explicitly mapped to Labov story grammar, RST relations, and propositional structure), all hyperparameters (temperature, top-p, few-shot examples if used), and quantitative human validation results (e.g., Pearson correlations or Cohen’s kappa between LLM scores and expert annotations on a held-out subset). We will also add a paragraph addressing Chinese-language tokenization and cultural alignment, with evidence that the macro scores target discourse-level constructs rather than lexical or stylistic leakage. Revision: yes.

Referee: [Results] The statements that 'semantic embeddings add minimal independent value' and that 'formal structural features demonstrate that narrative organization per se carries predictive signal' require explicit ablation studies, feature-importance metrics, or cross-level correlation analyses. The current text provides none, leaving open the possibility that the macro-level advantage is an artifact of prompt leakage rather than genuine hierarchical structure.

Authors: We agree that the Results section requires additional quantitative support for the hierarchical claims. In the revised manuscript, we will insert new analyses including: (1) systematic ablation experiments that remove each level in turn and report the resulting performance drops, (2) feature-importance rankings (e.g., from logistic regression or random-forest coefficients) across the multi-level model, and (3) cross-level correlation matrices together with partial-correlation controls to quantify independence. We will also include a targeted check for prompt leakage by correlating macro scores against micro- and meso-level features and by reporting performance on lexical-only controls. These additions will directly address the concern that the macro advantage may be artifactual. Revision: yes.
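The partial-correlation control mentioned in the last response could take roughly the following form: correlate a macro-level score with the symptom outcome after removing what both share with a lexical feature. This is a minimal sketch with a single control variable, not the authors' planned analysis; the function names and toy values are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / sqrt(sxx * syy)


def partial_correlation(x, y, z):
    """First-order partial correlation of x and y, controlling for z."""
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    denominator = sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))
    if denominator == 0:
        raise ValueError("x or y is perfectly correlated with the control z")
    return (r_xy - r_xz * r_yz) / denominator


# Toy values for illustration only; they are not data from the paper.
macro_score = [1.0, 2.0, 3.0, 4.0]
symptom_score = [2.0, 1.0, 4.0, 3.0]
lexical_feature = [1.0, -1.0, -1.0, 1.0]
print(partial_correlation(macro_score, symptom_score, lexical_feature))
```

If the macro score's association with the outcome shrinks toward zero once the lexical feature is controlled for, that would point to lexical leakage; if it holds up, the macro score is carrying information the lexical feature does not.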

Circularity Check

0 steps flagged

No significant circularity; empirical framework is self-contained

The paper introduces a three-level framework (micro lexical features, meso semantic embeddings, macro LLM narrative evaluation) and reports empirical results showing macro-level evaluation outperforms baselines on 830 Chinese therapeutic texts for mental health prediction. The central claim rests on direct experimental comparisons against external mental-health labels, with grounding in discourse theory (Labov, RST) but no mathematical derivations, self-definitional equations, fitted parameters renamed as predictions, or load-bearing self-citations that reduce outputs to inputs by construction. No equations or closed loops appear in the provided text; the superiority claim is tested rather than assumed via prior author work. This is the standard case of an honest empirical study with independent validation.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on the premise that LLM judgments of narrative structure validly reflect psychological states and that the 830-text corpus is representative of the target clinical conditions.

axioms (2)

domain assumption: LLM narrative evaluation can reliably assess macro-level discourse features such as Labov's story grammar and RST coherence in a clinically meaningful way.
Invoked when the abstract states that formal structural features demonstrate predictive signal and that narrative organization carries clinical information.
domain assumption: The 830 Chinese therapeutic texts spanning depression, anxiety, and trauma constitute a representative sample for mental-health prediction.
Underlies the reported empirical comparison across conditions.

Reference graph

Works this paper leans on

50 extracted references

[1] Adler, J. M. (2012). Living into the story: Agency and coherence in a longitudinal study of narrative identity development and mental health over the course of psychotherapy. Journal of Personality and Social Psychology, 102(2), 367 (cit. on pp. 3, 5, 6)

[2] Adler, J. M., Lodi-Smith, J., Philippe, F. L., & Houle, I. (2016). The incremental validity of narrative identity in predicting well-being: A review of the field and recommendations for the future. Personality and Social Psychology Review, 20(2), 142–175 (cit. on pp. 1–3, 5)

[3] Barzilay, R., & Lapata, M. (2008). Modeling local coherence: An entity-based approach. Computational Linguistics, 34(1), 1–34 (cit. on p. 2)

[4] Beck, A. T., Epstein, N., Brown, G., & Steer, R. A. (1988). An inventory for measuring clinical anxiety: Psychometric properties. Journal of Consulting and Clinical Psychology, 56(6), 893 (cit. on p. 3)

[5] Beck, A. T., Rush, A. J., Shaw, B. F., Emery, G., DeRubeis, R. J., & Hollon, S. D. (1979). Cognitive therapy of depression. Guilford Press (cit. on pp. 3, 6)

[6] Beck, A. T., Steer, R. A., Ball, R., & Ranieri, W. F. (1996). Comparison of Beck Depression Inventories-IA and -II in psychiatric outpatients. Journal of Personality Assessment, 67(3), 588–597 (cit. on p. 2)

[7] Boyd, R. L., & Schwartz, H. A. (2021). Natural language analysis and the psychology of verbal behavior: The past, present, and future states of the field. Journal of Language and Social Psychology, 40(1), 21–41 (cit. on p. 1)

[8] Brewin, C. R., Gregory, J. D., Lipton, M., & Burgess, N. (2010). Intrusive images in psychological disorders: Characteristics, neural mechanisms, and treatment implications. Psychological Review, 117(1), 210 (cit. on p. 2)

[9] Bruner, J. (1991). The narrative construction of reality. Critical Inquiry, 18(1), 1–21 (cit. on p. 3)

[10] Francis, S. E. (2000). Assessment of symptoms of DSM-IV anxiety and depression in children: A revised child anxiety and depression scale. Behaviour Research and Therapy, 38(8), 835–855 (cit. on p. 3)

[11] Cohen, J., Mannarino, A., Deblinger, E., et al. (2006). Treating trauma and traumatic grief in children and adolescents. Guilford Publications (cit. on p. 3)

[12] Young, J., Higa-McMillan, C., & Weisz, J. R. (2012). The revised child anxiety and depression scale-short version: Scale reduction via exploratory bifactor modeling of the broad anxiety factor. Psychological Assessment, 24(4), 833 (cit. on p. 3)

[13] Egan, S. J., Wade, T. D., & Shafran, R. (2011). Perfectionism as a transdiagnostic process: A clinical review. Clinical Psychology Review, 31(2), 203–212 (cit. on p. 3)

[14] Eid, M. E., & Diener, E. E. (2006). Handbook of multimethod measurement in psychology. American Psychological Association (cit. on p. 1)

[15] Foa, E. B., Asnaani, A., Zang, Y., Capaldi, S., & Yeh, R. (2018). Psychometrics of the Child PTSD Symptom Scale for DSM-5 for trauma-exposed children and adolescents. Journal of Clinical Child & Adolescent Psychology, 47(1), 38–46 (cit. on p. 3)

[16] Porter, K., Knowles, K., Powers, M. B., & Kauffman, B. Y. (2016). Psychometric properties of the Posttraumatic Stress Disorder Symptom Scale Interview for DSM–5 (PSSI–5). Psychological Assessment, 28(10), 1159 (cit. on p. 3)

[17] Frattaroli, J. (2006). Experimental disclosure and its moderators: A meta-analysis. Psychological Bulletin, 132(6), 823 (cit. on p. 2)

[18] Gao, R., Hao, B., Li, H., Gao, Y., & Zhu, T. (2013). Developing simplified Chinese psychological linguistic analysis dictionary for microblog. International Conference on Brain and Health Informatics (cit. on p. 3)

[19] Graesser, A. C., McNamara, D. S., Louwerse, M. M., & Cai, Z. (2004). Coh-Metrix: Analysis of text on cohesion and language. Behavior Research Methods, Instruments, & Computers, 36(2), 193–202 (cit. on p. 1)

[20] Guo, Z., Lai, A., Thygesen, J. H., Farrington, J., Keen, T., Li, K., et al. (2024). Large language models for mental health applications: Systematic review. JMIR Mental Health, 11(1), e57400 (cit. on p. 2)

[21] Halliday, M. A. K., & Hasan, R. (1976). Cohesion in English. Routledge (cit. on p. 2)

[22] Abdullah, H. R., Ting, D. S. W., & Liu, N. (2024). Mitigating cognitive biases in clinical decision-making through multi-agent conversations using large language models: Simulation study. Journal of Medical Internet Research, 26, e59439 (cit. on p. 2)

[23] Kintsch, W., & Van Dijk, T. A. (1978). Toward a model of text comprehension and production. Psychological Review, 85(5), 363 (cit. on pp. 1, 5)

[24] Kroenke, K., Spitzer, R. L., & Williams, J. B. (2001). The PHQ-9: Validity of a brief depression severity measure. Journal of General Internal Medicine, 16(9), 606–613 (cit. on p. 3)

[25] Kroenke, K., Spitzer, R. L., Williams, J. B., & Löwe, B. (2009). An ultra-brief screening scale for anxiety and depression: The PHQ–4. Psychosomatics, 50(6), 613–621 (cit. on p. 3)

[26] Labov, W. (1972). Language in the inner city: Studies in the Black English Vernacular (Vol. 3). University of Pennsylvania Press (cit. on p. 3)

[27] Li, J., & Hovy, E. (2014). A model of coherence based on distributed sentence representation. Annual Conference on Empirical Methods in Natural Language Processing (EMNLP) (cit. on p. 3)

[28] Li, M., Zhao, Y., Guo, Z., Wei, M., Fan, S., Chen, Q., Li, Y., & Zang, Y. (2025). Written exposure therapy for posttraumatic stress disorder and integration of a mindfulness based app in China: A pilot randomized controlled trial. Behavior Therapy (cit. on p. 2)

[29] Li, M., Zhao, Y., Rosenfield, D., Guo, Z., Wei, M., Fan, S., Li, Y., & Zang, Y. (2025). An online guided written exposure therapy for symptoms of posttraumatic stress disorder: A randomized controlled trial. Psychotherapy and Psychosomatics (cit. on p. 2)

[30] Mann, W. C., & Thompson, S. A. (1988). Rhetorical structure theory: Toward a functional theory of text organization. Text-Interdisciplinary Journal for the Study of Discourse, 8(3), 243–281 (cit. on p. 3)

[31] Mikolov, T., Sutskever, I., Chen, K., Corrado, G. S., & Dean, J. (2013). Distributed representations of words and phrases and their compositionality. Proceedings of Advances in Neural Information Processing Systems (NeurIPS) (cit. on p. 1)

[32] Mizumoto, A., & Eguchi, M. (2023). Exploring the potential of using an AI language model for automated essay scoring. Research Methods in Applied Linguistics, 2(2), 100050 (cit. on p. 2)

[33] Nolen-Hoeksema, S. (1991). Responses to depression and their effects on the duration of depressive episodes. Journal of Abnormal Psychology, 100(4), 569 (cit. on pp. 2, 3, 6)

[34] Pennebaker, J. W. (1997). Writing about emotional experiences as a therapeutic process. Psychological Science, 8(3), 162–166 (cit. on p. 3)

[35] Pennebaker, J. W. (2016). Opening up by writing it down: How expressive writing improves health and eases emotional pain. Guilford Publications (cit. on pp. 1, 2)

[36] Pennebaker, J. W., Boyd, R. L., Jordan, K., & Blackburn, K. (2015). The development and psychometric properties of LIWC2015 (cit. on pp. 1–3)

[37] Prins, A., Bovin, M. J., Smolenski, D. J., Marx, B. P., Kimerling, R., Jenkins-Guarnieri, M. A., Kaloupek, D. G., Schnurr, P. P., Kaiser, A. P., Leyva, Y. E., et al. (2016). The Primary Care PTSD Screen for DSM-5 (PC-PTSD-5): Development and evaluation within a veteran primary care sample. Journal of General Internal Medicine, 31(10), 1206–1211 (cit. on p. 3)

[38] Pyszczynski, T., & Greenberg, J. (1987). Self-regulatory perseveration and the depressive self-focusing style: A self-awareness theory of reactive depression. Psychological Bulletin, 102(1), 122 (cit. on p. 3)

[39] Rude, S., Gortner, E.-M., & Pennebaker, J. (2004). Language use of depressed and depression-vulnerable college students. Cognition & Emotion, 18(8), 1121–1133 (cit. on pp. 1, 2)

[40] Sharma, A., Lin, I. W., Miner, A. S., Atkins, D. C., & Althoff, T. (2023). Human–AI collaboration enables more empathic conversations in text-based peer-to-peer mental health support. Nature Machine Intelligence, 5(1), 46–57 (cit. on p. 2)

[41] Smith, P., Perrin, S., Dyregrov, A., & Yule, W. (2003). Principal components analysis of the Impact of Event Scale with children in war. Personality and Individual Differences, 34(2), 315–322 (cit. on p. 3)

[42] Song, I., Pendse, S. R., Kumar, N., & De Choudhury, M. (2025). The typing cure: Experiences with large language model chatbots for mental health support. ACM Conference on Human Factors in Computing Systems (CHI) (cit. on p. 2)

[43] Spitzer, R. L., Kroenke, K., Williams, J. B., & Löwe, B. (2006). A brief measure for assessing generalized anxiety disorder: The GAD-7. Archives of Internal Medicine, 166(10), 1092–1097 (cit. on p. 3)

[44] Willer, R., & Eichstaedt, J. C. (2024). Large language models could change the future of behavioral healthcare: A proposal for responsible development and evaluation. NPJ Mental Health Research, 3(1), 12 (cit. on p. 2)

[45] Taraban, R., & Abusal, K. (2019). Analyzing topic differences, writing quality, and rhetorical context in college students’ essays using Linguistic Inquiry and Word Count (LIWC). East European Journal of Psycholinguistics (cit. on p. 2)

[46] Teng, Q., Liu, Z., Song, Y., Han, K., & Lu, Y. (2022). A survey on the interpretability of deep learning in medical diagnosis. Multimedia Systems, 28(6), 2335–2355 (cit. on p. 1)

[47] Teo, A. R., Choi, H., & Valenstein, M. (2013). Social relationships and depression: Ten-year follow-up from a nationally representative study. PLoS ONE, 8(4), e62396 (cit. on p. 3). Van Dijk, T. A. (2019). Macrostructures: An interdisciplinary study of global structures in discourse, interaction, and cognition. Routledge (cit. on p. 3). Van Dijk, T. A., K...

[48] Williams, J. M. G., Barnhofer, T., Crane, C., Herman, D., Raes, F., Watkins, E., & Dalgleish, T. (2007). Autobiographical memory specificity and emotional disorder. Psychological Bulletin, 133(1), 122 (cit. on p. 3)

[49] Yang, K., Ji, S., Zhang, T., Xie, Q., Kuang, Z., & Ananiadou, S. (2023). Towards interpretable mental health analysis with large language models. Annual Conference on Empirical Methods in Natural Language Processing (EMNLP) (cit. on p. 2)

[50] Zirikly, A., Resnik, P., Uzuner, O., & Hollingshead, K. (2019). CLPsych 2019 shared task: Predicting the degree of suicide risk in Reddit posts. The Sixth Workshop on Computational Linguistics and Clinical Psychology (cit. on p. 2)

[1] [1]

Adler, J. M. (2012). Living into the story: Agency and coher- ence in a longitudinal study of narrative identity develop- ment and mental health over the course of psychotherapy. Journal of Personality and Social Psychology,102(2), 367 (cit. on pp. 3, 5, 6)

2012

[2] [2]

M., Lodi-Smith, J., Philippe, F

Adler, J. M., Lodi-Smith, J., Philippe, F. L., & Houle, I. (2016). The incremental validity of narrative identity in predicting well-being: A review of the field and recommendations for the future.Personality and Social Psychology Review,20(2), 142–175 (cit. on pp. 1–3, 5)

2016

[3] [3]

Barzilay, R., & Lapata, M. (2008). Modeling local coherence: An entity-based approach.Computational Linguistics,34(1), 1–34 (cit. on p. 2)

2008

[4] [4]

T., Epstein, N., Brown, G., & Steer, R

Beck, A. T., Epstein, N., Brown, G., & Steer, R. A. (1988). An inventory for measuring clinical anxiety: Psychometric properties.Journal of Consulting and Clinical Psychology, 56(6), 893 (cit. on p. 3)

1988

[5] [5]

T., Rush, A

Beck, A. T., Rush, A. J., Shaw, B. F., Emery, G., DeRubeis, R. J., & Hollon, S. D. (1979).Cognitive therapy of depres- sion. Guilford Press. (Cit. on pp. 3, 6)

1979

[6] [6]

T., Steer, R

Beck, A. T., Steer, R. A., Ball, R., & Ranieri, W. F. (1996). Comparison of beck depression inventories-ia and-ii in psychiatric outpatients.Journal of Personality Assessment, 67(3), 588–597 (cit. on p. 2)

1996

[7] [7]

L., & Schwartz, H

Boyd, R. L., & Schwartz, H. A. (2021). Natural language analysis and the psychology of verbal behavior: The past, present, and future states of the field.Journal of Language and Social Psychology,40(1), 21–41 (cit. on p. 1)

2021

Brewin, C. R., Gregory, J. D., Lipton, M., & Burgess, N. (2010). Intrusive images in psychological disorders: Characteristics, neural mechanisms, and treatment implications. Psychological Review, 117(1), 210 (cit. on p. 2)

Bruner, J. (1991). The narrative construction of reality. Critical Inquiry, 18(1), 1–21 (cit. on p. 3)

Francis, S. E. (2000). Assessment of symptoms of DSM-IV anxiety and depression in children: A revised child anxiety and depression scale. Behaviour Research and Therapy, 38(8), 835–855 (cit. on p. 3)

Cohen, J., Mannarino, A., Deblinger, E., et al. (2006). Treating trauma and traumatic grief in children and adolescents. Guilford Publications (cit. on p. 3)

Young, J., Higa-McMillan, C., & Weisz, J. R. (2012). The revised child anxiety and depression scale-short version: Scale reduction via exploratory bifactor modeling of the broad anxiety factor. Psychological Assessment, 24(4), 833 (cit. on p. 3)

Egan, S. J., Wade, T. D., & Shafran, R. (2011). Perfectionism as a transdiagnostic process: A clinical review. Clinical Psychology Review, 31(2), 203–212 (cit. on p. 3)

Eid, M. E., & Diener, E. E. (2006). Handbook of multimethod measurement in psychology. American Psychological Association. (Cit. on p. 1)

Foa, E. B., Asnaani, A., Zang, Y., Capaldi, S., & Yeh, R. (2018). Psychometrics of the child PTSD symptom scale for DSM-5 for trauma-exposed children and adolescents. Journal of Clinical Child & Adolescent Psychology, 47(1), 38–46 (cit. on p. 3)

Porter, K., Knowles, K., Powers, M. B., & Kauffman, B. Y. (2016). Psychometric properties of the posttraumatic stress disorder symptom scale interview for DSM-5 (PSSI-5). Psychological Assessment, 28(10), 1159 (cit. on p. 3)

Frattaroli, J. (2006). Experimental disclosure and its moderators: A meta-analysis. Psychological Bulletin, 132(6), 823 (cit. on p. 2)

Gao, R., Hao, B., Li, H., Gao, Y., & Zhu, T. (2013). Developing simplified Chinese psychological linguistic analysis dictionary for microblog. International Conference on Brain and Health Informatics (cit. on p. 3)

Graesser, A. C., McNamara, D. S., Louwerse, M. M., & Cai, Z. (2004). Coh-Metrix: Analysis of text on cohesion and language. Behavior Research Methods, Instruments, & Computers, 36(2), 193–202 (cit. on p. 1)

Guo, Z., Lai, A., Thygesen, J. H., Farrington, J., Keen, T., Li, K., et al. (2024). Large language models for mental health applications: Systematic review. JMIR Mental Health, 11(1), e57400 (cit. on p. 2)

Halliday, M. A. K., & Hasan, R. (1976). Cohesion in English. Routledge. (Cit. on p. 2)

Abdullah, H. R., Ting, D. S. W., & Liu, N. (2024). Mitigating cognitive biases in clinical decision-making through multi-agent conversations using large language models: Simulation study. Journal of Medical Internet Research, 26, e59439 (cit. on p. 2)

Kintsch, W., & Van Dijk, T. A. (1978). Toward a model of text comprehension and production. Psychological Review, 85(5), 363 (cit. on pp. 1, 5)

Kroenke, K., Spitzer, R. L., & Williams, J. B. (2001). The PHQ-9: Validity of a brief depression severity measure. Journal of General Internal Medicine, 16(9), 606–613 (cit. on p. 3)

Kroenke, K., Spitzer, R. L., Williams, J. B., & Löwe, B. (2009). An ultra-brief screening scale for anxiety and depression: The PHQ-4. Psychosomatics, 50(6), 613–621 (cit. on p. 3)

Labov, W. (1972). Language in the inner city: Studies in the Black English vernacular (Vol. 3). University of Pennsylvania Press. (Cit. on p. 3)

Li, J., & Hovy, E. (2014). A model of coherence based on distributed sentence representation. Annual Conference on Empirical Methods in Natural Language Processing (EMNLP) (cit. on p. 3)

Li, M., Zhao, Y., Guo, Z., Wei, M., Fan, S., Chen, Q., Li, Y., & Zang, Y. (2025). Written exposure therapy for posttraumatic stress disorder and integration of a mindfulness based app in China: A pilot randomized controlled trial. Behavior Therapy (cit. on p. 2)

Li, M., Zhao, Y., Rosenfield, D., Guo, Z., Wei, M., Fan, S., Li, Y., & Zang, Y. (2025). An online guided written exposure therapy for symptoms of posttraumatic stress disorder: A randomized controlled trial. Psychotherapy and Psychosomatics (cit. on p. 2)

Mann, W. C., & Thompson, S. A. (1988). Rhetorical structure theory: Toward a functional theory of text organization. Text-Interdisciplinary Journal for the Study of Discourse, 8(3), 243–281 (cit. on p. 3)

Mikolov, T., Sutskever, I., Chen, K., Corrado, G. S., & Dean, J. (2013). Distributed representations of words and phrases and their compositionality. Proceedings of Advances in Neural Information Processing Systems (NeurIPS) (cit. on p. 1)

Mizumoto, A., & Eguchi, M. (2023). Exploring the potential of using an AI language model for automated essay scoring. Research Methods in Applied Linguistics, 2(2), 100050 (cit. on p. 2)

Nolen-Hoeksema, S. (1991). Responses to depression and their effects on the duration of depressive episodes. Journal of Abnormal Psychology, 100(4), 569 (cit. on pp. 2, 3, 6)

Pennebaker, J. W. (1997). Writing about emotional experiences as a therapeutic process. Psychological Science, 8(3), 162–166 (cit. on p. 3)

Pennebaker, J. W. (2016). Opening up by writing it down: How expressive writing improves health and eases emotional pain. Guilford Publications. (Cit. on pp. 1, 2)

Pennebaker, J. W., Boyd, R. L., Jordan, K., & Blackburn, K. (2015). The development and psychometric properties of LIWC2015 (cit. on pp. 1–3)

Prins, A., Bovin, M. J., Smolenski, D. J., Marx, B. P., Kimerling, R., Jenkins-Guarnieri, M. A., Kaloupek, D. G., Schnurr, P. P., Kaiser, A. P., Leyva, Y. E., et al. (2016). The primary care PTSD screen for DSM-5 (PC-PTSD-5): Development and evaluation within a veteran primary care sample. Journal of General Internal Medicine, 31(10), 1206–1211 (cit. on p. 3)

Pyszczynski, T., & Greenberg, J. (1987). Self-regulatory perseveration and the depressive self-focusing style: A self-awareness theory of reactive depression. Psychological Bulletin, 102(1), 122 (cit. on p. 3)

Rude, S., Gortner, E.-M., & Pennebaker, J. (2004). Language use of depressed and depression-vulnerable college students. Cognition & Emotion, 18(8), 1121–1133 (cit. on pp. 1, 2)

Sharma, A., Lin, I. W., Miner, A. S., Atkins, D. C., & Althoff, T. (2023). Human–AI collaboration enables more empathic conversations in text-based peer-to-peer mental health support. Nature Machine Intelligence, 5(1), 46–57 (cit. on p. 2)

Smith, P., Perrin, S., Dyregrov, A., & Yule, W. (2003). Principal components analysis of the impact of event scale with children in war. Personality and Individual Differences, 34(2), 315–322 (cit. on p. 3)

Song, I., Pendse, S. R., Kumar, N., & De Choudhury, M. (2025). The typing cure: Experiences with large language model chatbots for mental health support. ACM Conference on Human Factors in Computing Systems (CHI) (cit. on p. 2)

Spitzer, R. L., Kroenke, K., Williams, J. B., & Löwe, B. (2006). A brief measure for assessing generalized anxiety disorder: The GAD-7. Archives of Internal Medicine, 166(10), 1092–1097 (cit. on p. 3)

Willer, R., & Eichstaedt, J. C. (2024). Large language models could change the future of behavioral healthcare: A proposal for responsible development and evaluation. NPJ Mental Health Research, 3(1), 12 (cit. on p. 2)

Taraban, R., & Abusal, K. (2019). Analyzing topic differences, writing quality, and rhetorical context in college students’ essays using linguistic inquiry and word count (LIWC). East European Journal of Psycholinguistics (cit. on p. 2)

Teng, Q., Liu, Z., Song, Y., Han, K., & Lu, Y. (2022). A survey on the interpretability of deep learning in medical diagnosis. Multimedia Systems, 28(6), 2335–2355 (cit. on p. 1)

Teo, A. R., Choi, H., & Valenstein, M. (2013). Social relationships and depression: Ten-year follow-up from a nationally representative study. PLoS ONE, 8(4), e62396 (cit. on p. 3)

Van Dijk, T. A. (2019). Macrostructures: An interdisciplinary study of global structures in discourse, interaction, and cognition. Routledge. (Cit. on p. 3)

Van Dijk, T. A., K...

Williams, J. M. G., Barnhofer, T., Crane, C., Herman, D., Raes, F., Watkins, E., & Dalgleish, T. (2007). Autobiographical memory specificity and emotional disorder. Psychological Bulletin, 133(1), 122 (cit. on p. 3)

Yang, K., Ji, S., Zhang, T., Xie, Q., Kuang, Z., & Ananiadou, S. (2023). Towards interpretable mental health analysis with large language models. Annual Conference on Empirical Methods in Natural Language Processing (EMNLP) (cit. on p. 2)

Zirikly, A., Resnik, P., Uzuner, O., & Hollingshead, K. (2019). CLPsych 2019 shared task: Predicting the degree of suicide risk in Reddit posts. The Sixth Workshop on Computational Linguistics and Clinical Psychology (cit. on p. 2)