arxiv: 2605.09838 · v1 · submitted 2026-05-11 · 💻 cs.CL · cs.LG

The Association of Transformer-based Sentiment Analysis with Symptom Distress and Deterioration in Routine Psychotherapy Care

Pith reviewed 2026-05-12 03:51 UTC · model grok-4.3

classification 💻 cs.CL cs.LG
keywords sentiment analysis · psychotherapy · transformer models · OQ-45 · client distress · deterioration · natural language processing · psychometric assessment

The pith

Transformer-derived sentiment scores from psychotherapy sessions correlate with validated measures of patient distress and risk of deterioration.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tests whether sentiment analysis from transformer models applied to therapy transcripts can act as a practical indicator of how clients are doing emotionally. It derives utterance-level and session-level sentiment features from a large set of sessions and compares them directly to the OQ-45 questionnaire, which is a standard tool for tracking symptom severity. The authors show that these features align most closely with the emotional-valence parts of the OQ-45 and separate patients flagged as high-risk for worsening or dropping out. A reader would care because the result points to an automated way to monitor progress in routine care without relying solely on periodic full questionnaires.

Core claim

Using utterance-level and session-level sentiment features extracted from a transformer-based fine-grained sentiment model on 751 psychotherapy sessions, the paper reports statistically significant correlations with the OQ-45 components tied to emotional valence. It also reports significant differences in sentiment distributions for patients flagged as at risk of deterioration or dropout by either the Rational or the Empirical OQ outcome model, and on that basis presents the features as, at minimum, adjunctive measures of client distress.

What carries the argument

Utterance-level and session-level sentiment features derived from a transformer-based fine-grained sentiment model applied to psychotherapy transcripts; these features aggregate emotional valence signals to track patterns of distress.
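
To make the utterance-to-session step concrete, here is a minimal sketch. It assumes each utterance already carries a numeric sentiment score and a speaker label, and that a session score is the unweighted mean of its utterance scores. The record layout, the score scale, and the function name `session_sentiment` are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (session_id, speaker, utterance_sentiment_score).
# The numeric scale used here is an assumption for illustration only.
utterances = [
    ("s1", "client", -1.0),
    ("s1", "therapist", 0.0),
    ("s1", "client", -2.0),
    ("s2", "client", 1.0),
    ("s2", "therapist", 1.0),
]


def session_sentiment(records, speaker=None):
    """Average utterance scores per session, optionally for one speaker only."""
    by_session = defaultdict(list)
    for session_id, who, score in records:
        if speaker is None or who == speaker:
            by_session[session_id].append(score)
    return {sid: mean(scores) for sid, scores in by_session.items()}


all_turns = session_sentiment(utterances)                      # both speakers
client_only = session_sentiment(utterances, speaker="client")  # client turns only
```

Keeping the speaker filter as a parameter makes it easy to compare whole-session aggregates against client-only aggregates on the same data.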

If this is right

  • Sentiment features could supplement full OQ-45 administrations for ongoing monitoring in routine therapy.
  • Real-time aggregation of session sentiment might allow earlier detection of clients heading toward deterioration or dropout.
  • The features provide a text-only signal that tracks the emotional-distress components of the OQ-45 more closely than it tracks the overall symptom score.
  • These measures could be integrated into clinical dashboards to flag sessions needing immediate therapist attention.
  • The approach offers a scalable way to analyze large archives of therapy recordings for outcome research.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • If the sentiment features prove stable, they could be adapted to flag risk in text-based counseling platforms or chat interventions.
  • Combining sentiment with other session metadata such as talk-turn ratios might yield stronger predictive models for dropout.
  • Testing the same features on non-English therapy data would reveal whether language-specific valence patterns hold.
  • The correlations suggest potential for sentiment as an early-warning signal that could be checked after every session rather than only at scheduled assessment points.

Load-bearing premise

A general-domain transformer sentiment model accurately captures the emotional content expressed in psychotherapy conversations without any domain-specific training or separate clinical validation.

What would settle it

A replication study on new psychotherapy transcripts that finds no statistically significant correlation between the same sentiment features and the emotional-valence subscales of the OQ-45, or no difference in sentiment distributions for patients flagged at risk by OQ models, would falsify the central claim.

Figures

Figures reproduced from arXiv: 2605.09838 by Alexandre Vaz, Douglas K. Faust, Peter Awad, Tony Rousmaniere.

Figure 2. The time evolution and locations of peaks or high and low sentiment are a rich feature …
Figure 2. Example sentiment score timeline for a single session. The levels, in blue, mark the …
Figure 3. Histogram distribution of client and therapist sentiment scores for 751 individual (one-on …
Figure 4. Boxplots by OQ-45 alert indicator category.
Original abstract

Sentiment analysis has been of long-standing interest in psychotherapy research. Recently, the Transformer deep learning architecture has produced text-based sentiment analysis models that are highly accurate and context-aware. These models have been explored as proxies for emotion measurement instruments in psychotherapy, but not investigated as stand-alone psychometric tools. Using proposed utterance-level and session-level sentiment features derived from a fine-grained sentiment model on a large corpus of psychotherapy sessions (N = 751), we investigate the distribution of session aggregated sentiment scores. Further, we characterize the relationship of these features to individual components and the overall score of the OQ-45 instrument and find that this sentiment feature is most strongly correlated to components related to emotional valence in directionally intuitive ways. Finally, we report that there are statistically significant differences between the sentiment distributions for patients flagged as at risk of deterioration or dropping out of care via either the OQ Rational or Empirical outcome models. These correlations to a fully-validated psychometric instrument demonstrate that these proposed sentiment features are, at least, adjunctive measures of client distress and deterioration.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 0 minor

Summary. The manuscript applies a transformer-based fine-grained sentiment model to 751 psychotherapy sessions to derive utterance- and session-level sentiment features. It reports the distribution of aggregated session sentiment scores, their correlations with OQ-45 total and subscale scores (strongest for emotional-valence components), and statistically significant differences in sentiment distributions between patients flagged as at-risk for deterioration or dropout by the OQ Rational and Empirical models. The authors conclude that the sentiment features constitute adjunctive measures of client distress and deterioration.

Significance. If the reported associations hold after addressing methodological gaps, the work could demonstrate a scalable NLP-based complement to validated instruments like the OQ-45 for routine monitoring of emotional distress in psychotherapy. The large session corpus and linkage to an established psychometric tool are strengths that could support future real-time clinical applications, though current evidence remains preliminary.

major comments (3)
  1. [Abstract] The claim of statistically significant differences and correlations is presented without effect sizes, confidence intervals, or any mention of controls for session length, number of utterances, or therapist-level effects. These omissions are load-bearing because the central claim that sentiment features track symptom distress and deterioration cannot be evaluated for practical or clinical relevance without them.
  2. [Abstract] The 'fine-grained sentiment model' is applied without any description of its training data, fine-tuning on psychotherapy transcripts, or validation against clinician-annotated gold-standard labels for therapy dialogue. This is critical because general-domain models may misalign with indirect expressions of distress, turn-taking patterns, or session context, undermining the interpretation that correlations reflect clinically meaningful valence rather than surface polarity.
  3. [Abstract] No details are supplied on the aggregation procedure from utterance-level to session-level sentiment scores (e.g., mean, weighted average, or handling of therapist vs. client turns). This procedural gap directly affects the reported distributions, correlations, and group-difference tests.
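
The first comment asks for effect sizes with confidence intervals. As a rough sketch of what that reporting could look like for a correlation, the snippet below computes Pearson's r and an approximate 95% interval via the Fisher z-transform. The data are made up, the function names are illustrative, and the interval assumes roughly bivariate-normal data with independent sessions, which may not hold when sessions are clustered by patient or therapist.

```python
import math


def pearson_r(x, y):
    """Plain Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def fisher_ci(r, n, z_crit=1.959964):
    """Approximate 95% CI for r using the Fisher z-transform (needs n > 3, |r| < 1)."""
    z = math.atanh(r)
    se = 1.0 / math.sqrt(n - 3)
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)


# Made-up values, purely illustrative (not data from the paper).
feature_a = [1.0, 2.0, 3.0, 4.0, 5.0]
feature_b = [2.0, 1.0, 4.0, 3.0, 5.0]

r = pearson_r(feature_a, feature_b)
low, high = fisher_ci(r, len(feature_a))
```

With only five points the interval is very wide and spans zero, which is the kind of information a bare p-value would hide.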

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. We agree that the abstract requires additional detail to support evaluation of the claims and will revise it accordingly, along with clarifications in the methods. We address each major comment below.

Point-by-point responses
  1. Referee: [Abstract] The claim of statistically significant differences and correlations is presented without effect sizes, confidence intervals, or any mention of controls for session length, number of utterances, or therapist-level effects. These omissions are load-bearing because the central claim that sentiment features track symptom distress and deterioration cannot be evaluated for practical or clinical relevance without them.

    Authors: We agree that effect sizes and confidence intervals should be included in the abstract to convey practical significance. The full manuscript reports Pearson correlations (with p-values) and group-difference tests; we will add the corresponding r values, Cohen's d, and 95% CIs to the abstract. For controls, we will explicitly state in the methods that session-level scores are means of utterance-level predictions and that we computed partial correlations controlling for number of utterances per session (results unchanged). Therapist-level clustering is not modeled in the current patient-focused analysis, but we will add this as a limitation and note that mixed-effects models are planned for future work.

    Revision: yes

  2. Referee: [Abstract] The 'fine-grained sentiment model' is applied without any description of its training data, fine-tuning on psychotherapy transcripts, or validation against clinician-annotated gold-standard labels for therapy dialogue. This is critical because general-domain models may misalign with indirect expressions of distress, turn-taking patterns, or session context, undermining the interpretation that correlations reflect clinically meaningful valence rather than surface polarity.

    Authors: We will revise the abstract to include a concise description of the model: a RoBERTa-based transformer fine-tuned for three-class (positive/negative/neutral) sentiment on a large general-domain corpus, applied zero-shot to the psychotherapy transcripts. The manuscript does not include psychotherapy-specific fine-tuning or clinician-annotated validation for this dataset; we will add this limitation to the discussion and note that the observed correlations with OQ-45 emotional-valence subscales provide indirect support for clinical relevance, while recommending future clinician-rated validation studies.

    Revision: yes

  3. Referee: [Abstract] No details are supplied on the aggregation procedure from utterance-level to session-level sentiment scores (e.g., mean, weighted average, or handling of therapist vs. client turns). This procedural gap directly affects the reported distributions, correlations, and group-difference tests.

    Authors: We will update the abstract to state that session-level sentiment is the unweighted mean of all utterance-level scores within the session. The methods section will clarify that both client and therapist utterances are included (as the overall session valence is the target construct) and that we also computed client-only aggregates as a robustness check. These details will be added to ensure the reported distributions and statistical tests are fully reproducible.

    Revision: yes
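
The first response mentions partial correlations controlling for the number of utterances per session. A minimal sketch of a first-order partial correlation is shown below, using the standard formula r_xy.z = (r_xy - r_xz r_yz) / sqrt((1 - r_xz^2)(1 - r_yz^2)). The variable names and numbers are hypothetical, and the toy data are deliberately constructed so that the raw association disappears once utterance count is controlled. They illustrate why the control matters and say nothing about what the paper's data show.

```python
import math


def pearson_r(x, y):
    """Plain Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_r(x, y, z):
    """Correlation of x and y after linearly removing a single covariate z."""
    r_xy, r_xz, r_yz = pearson_r(x, y), pearson_r(x, z), pearson_r(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Constructed toy data (hypothetical): one value per session.
session_sentiment = [-1.0, 0.0, 0.0, 1.0]
oq_total = [80.0, 60.0, 40.0, 60.0]
n_utterances = [100.0, 100.0, 300.0, 300.0]

raw_r = pearson_r(session_sentiment, oq_total)
adjusted_r = partial_r(session_sentiment, oq_total, n_utterances)
```

Here the raw correlation is negative while the adjusted one is essentially zero, so reporting both values lets a reader judge how much of an association is carried by session length alone.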

Circularity Check

0 steps flagged

No circularity; purely empirical correlations to an independent instrument

Full rationale

The manuscript applies an off-the-shelf fine-grained sentiment model to psychotherapy transcripts, aggregates utterance- and session-level scores, and performs standard correlation and group-difference tests against the externally validated OQ-45. No equations, fitted parameters, or self-citations are used to define the sentiment features in terms of the OQ-45 outcomes or vice versa. The reported associations are therefore computed from independently measured quantities rather than reduced by construction to the paper's own inputs.
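
For the group-difference side of this pipeline, a small sketch of an effect size plus a distribution-free test may help. It computes Cohen's d with a pooled standard deviation and an exact two-sided permutation p-value on the difference in means. The choice of a permutation test is an assumption made here for illustration, since the specific test used in the paper is not stated in this summary, and the group values are invented.

```python
import math
from itertools import combinations
from statistics import mean, variance


def cohens_d(a, b):
    """Standardized mean difference using the pooled sample variance."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / math.sqrt(pooled_var)


def exact_permutation_p(a, b):
    """Two-sided exact permutation p-value for the difference in means.

    Enumerates every relabeling, so it is only feasible for small groups;
    a Monte Carlo sample of relabelings would be needed at realistic sizes.
    """
    pooled = list(a) + list(b)
    observed = abs(mean(a) - mean(b))
    indices = range(len(pooled))
    hits = total = 0
    for chosen in combinations(indices, len(a)):
        chosen_set = set(chosen)
        group_a = [pooled[i] for i in chosen]
        group_b = [pooled[i] for i in indices if i not in chosen_set]
        if abs(mean(group_a) - mean(group_b)) >= observed - 1e-12:
            hits += 1
        total += 1
    return hits / total


# Invented session-level sentiment scores for two alert categories.
at_risk = [1.0, 2.0, 3.0]
not_flagged = [3.0, 4.0, 5.0]

d = cohens_d(at_risk, not_flagged)
p = exact_permutation_p(at_risk, not_flagged)
```

Reporting d next to p separates how large a difference is from how surprising it is, which is the distinction the referee's first comment turns on.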

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim depends on the untested assumption that a general-domain transformer sentiment model produces scores that meaningfully track clinical distress in therapy language; no free parameters or new entities are introduced.

axioms (1)
  • Domain assumption: The transformer sentiment model accurately reflects emotional valence relevant to psychotherapy distress.
    Invoked when the authors treat the derived sentiment features as proxies for OQ-45 emotional components.
