Mental-R1: Aligning LLM Reasoning for Mental Health Assessment

Boyan Gao; David A. Clifton; Xin Wang; Yibo Yang

arxiv: 2606.13176 · v1 · pith:TSPKMOK7new · submitted 2026-06-11 · 💻 cs.AI

Mental-R1: Aligning LLM Reasoning for Mental Health Assessment

Xin Wang , Boyan Gao , Yibo Yang , David A. Clifton This is my paper

Pith reviewed 2026-06-27 07:13 UTC · model grok-4.3

classification 💻 cs.AI

keywords mental health assessmentlarge language modelsreinforcement learningcognitive reasoning stagesentropy regularizationpolicy optimization

0 comments

The pith

Cognitive Relative Policy Optimization trains LLMs to assess mental health by shifting from broad exploration to confident conclusions in staged reasoning.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a reinforcement learning method called CRPO to make large language models reason about mental health problems in a way that follows human cognitive patterns. It adds a stage-wise entropy regularization term that keeps early reasoning phases open to many possibilities and later phases more decisive. The method also defines explicit cognitive reasoning stages drawn from appraisal theory. Across eight mental health datasets the resulting model improves weighted F1 score by 10.4 percentage points over the strongest prior reinforcement learning baseline and shows clearer gains on cases that require extended reasoning.

Core claim

CRPO extends group relative policy optimization by integrating stage-dependent uncertainty modeling into the policy optimization process, using stage-wise entropy regularization that encourages broad exploration in early reasoning phases and progressively enforces confident decision-making in later stages, while formalizing cognitive reasoning stages inspired by cognitive appraisal theory to guide theory-grounded interpretable inference.

What carries the argument

Stage-wise entropy regularization inside the policy optimization loop, which modulates exploration confidence according to the current phase of reasoning.

If this is right

Models trained with CRPO produce more reliable reasoning on complex mental health cases.
The stage definitions allow more interpretable inference steps than standard RL training.
Performance gains hold across multiple distinct mental health assessment datasets.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same staged regularization pattern could be tested in other sequential decision domains such as medical diagnosis.
Explicit stage labels may make it easier to audit where an LLM's reasoning diverges from clinical guidelines.
Deployment studies on real patient data would reveal whether the reported F1 gains translate to practical clinical utility.

Load-bearing premise

That adjusting entropy by reasoning stage produces outputs that are both more reliable and closer to human assessment processes than ordinary reinforcement learning.

What would settle it

A controlled ablation that removes only the stage-wise entropy term and measures whether the 10.4-point F1 gain on the eight datasets disappears.

Figures

Figures reproduced from arXiv: 2606.13176 by Boyan Gao, David A. Clifton, Xin Wang, Yibo Yang.

**Figure 1.** Figure 1: Bottom: Overall illustration of the proposed CRPO reinforcement learning framework. Our method extends GRPO, where the total objective is derived [PITH_FULL_IMAGE:figures/full_fig_p004_1.png] view at source ↗

**Figure 2.** Figure 2: Comparison of reasoning entropy across stages. We plot the average [PITH_FULL_IMAGE:figures/full_fig_p008_2.png] view at source ↗

**Figure 3.** Figure 3: Reasoning-focused evaluation results (Weighted F1-score, mean [PITH_FULL_IMAGE:figures/full_fig_p009_3.png] view at source ↗

**Figure 4.** Figure 4: Comparison of reasoning outputs. (a) Standard reasoning produced by a GRPO-trained model. (b) Cognition-aligned reasoning generated by CRPO [PITH_FULL_IMAGE:figures/full_fig_p009_4.png] view at source ↗

**Figure 5.** Figure 5: Analysis of Uncertainty Scheduling. (a) Visualization of uncertainty scheduling strategies and their corresponding performance. Left: different scheduling [PITH_FULL_IMAGE:figures/full_fig_p010_5.png] view at source ↗

read the original abstract

Mental health problems such as anxiety, depression, and suicide remain urgent global challenges, where timely and accurate assessment is critical for effective intervention. Recently, large language models have been explored for mental health assessment. However, existing general-purpose post-training methods do not align with the cognitive processes of human assessment, which may lead to unreliable reasoning outcomes. To bridge this gap, we propose Cognitive Relative Policy Optimization (CRPO), a reinforcement learning framework tailored for the mental health domain. CRPO extends group relative policy optimization by integrating stage-dependent uncertainty modeling into the policy optimization process. Specifically, we introduce a stage-wise entropy regularization mechanism that encourages broad exploration in early reasoning phases and progressively enforces confident decision-making in later stages, mimicking the human cognitive shift from uncertainty to certainty. In addition, inspired by cognitive appraisal theory, we formalize cognitive reasoning stages, thereby guiding theory-grounded interpretable inference. Experiments on 8 mental health datasets show that CRPO achieves an average improvement of 10.4 percentage points in weighted F1-score over the best reinforcement learning baseline. Furthermore, the CRPO-trained model Mental-R1 demonstrates clear advantages compared with existing large language models on reasoning-intensive cases, suggesting that CRPO enhances reasoning capabilities for mental health assessment.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

CRPO adds stage-wise entropy regularization to GRPO for mental health tasks and reports a 10.4 pp F1 gain, but the gains are not isolated to the new mechanism.

read the letter

The main point is that this paper proposes CRPO, which extends group relative policy optimization with stage-dependent entropy regularization and a formalization of cognitive reasoning stages drawn from appraisal theory. They train Mental-R1 on mental health assessment and claim an average 10.4 point weighted F1 improvement over the strongest RL baseline across eight datasets, plus better handling of reasoning-intensive cases.

The new element is the explicit stage-wise schedule that encourages exploration early and tighter decisions later, tied to a cognitive model. That gives a clear rationale for the regularization and makes the inference steps more interpretable in principle. The reported lift on multiple datasets is the concrete result they put forward.

The soft spot is exactly the one in the stress-test note: there is no ablation that holds other factors fixed and shows the stage-dependent entropy is what produces the gain. Without that, or without details on dataset sizes, baseline implementations, and statistical tests, the attribution stays unverified. The abstract gives the headline number but little else to inspect.

This is aimed at people working on domain-specific LLM alignment or RL for structured reasoning in healthcare. A reader who wants a practical example of theory-inspired regularization in a high-stakes setting can extract the method description and the empirical claim.

The work shows clear thinking on the alignment problem and honest engagement with the application area. It deserves peer review so the experimental claims can be checked properly.

Referee Report

2 major / 2 minor

Summary. The paper proposes Cognitive Relative Policy Optimization (CRPO), an extension of group relative policy optimization that adds stage-dependent uncertainty modeling and stage-wise entropy regularization. Drawing on cognitive appraisal theory, it formalizes discrete cognitive reasoning stages to structure LLM inference for mental health assessment. The resulting Mental-R1 model is reported to deliver an average 10.4 percentage-point gain in weighted F1-score over the strongest RL baseline across eight mental-health datasets, with particular gains on reasoning-intensive instances.

Significance. If the empirical gains can be shown to arise specifically from the stage-wise entropy schedule and the cognitive-stage formalization rather than from generic RL tuning, the work would offer a concrete route to theory-aligned, more interpretable reasoning in a high-stakes domain. The explicit linkage to cognitive appraisal theory is a distinctive feature that could be leveraged for future interpretability studies.

major comments (2)

[Experiments] The central empirical claim attributes the 10.4 pp weighted-F1 improvement to the stage-wise entropy regularization and cognitive-stage formalization, yet no ablation isolates these components from other GRPO extensions or hyper-parameter choices. Without such controls (e.g., a direct GRPO baseline with matched entropy schedule), the causal link between the proposed mechanisms and the measured gain remains unverified.
[Method] The formalization of cognitive reasoning stages (inspired by appraisal theory) is presented as guiding interpretable inference, but the manuscript supplies neither the explicit mapping from theory constructs to the RL reward or policy terms nor any verification that this mapping is non-circular. This mapping is load-bearing for the claim of “theory-grounded” alignment.

minor comments (2)

[Experiments] Dataset names, sizes, class distributions, and preprocessing steps for the eight mental-health benchmarks are not summarized in the experimental section; these details are required for reproducibility and to assess whether the reported gains generalize across data regimes.
The abstract and results tables omit error bars, confidence intervals, or statistical significance tests for the 10.4 pp average improvement; inclusion of these would allow readers to judge whether the gains exceed run-to-run variability.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. The two major comments highlight important areas for strengthening the empirical and theoretical claims. We address each point below and will incorporate the suggested revisions to improve the manuscript.

read point-by-point responses

Referee: [Experiments] The central empirical claim attributes the 10.4 pp weighted-F1 improvement to the stage-wise entropy regularization and cognitive-stage formalization, yet no ablation isolates these components from other GRPO extensions or hyper-parameter choices. Without such controls (e.g., a direct GRPO baseline with matched entropy schedule), the causal link between the proposed mechanisms and the measured gain remains unverified.

Authors: We agree that the current experiments do not fully isolate the contributions of stage-wise entropy regularization and cognitive-stage formalization from other design choices. In the revised version, we will add targeted ablations: (i) GRPO with a matched (non-stage-dependent) entropy schedule, (ii) CRPO without the cognitive-stage formalization, and (iii) variants that vary only the stage-dependent uncertainty modeling. These controls will directly test the causal role of the proposed mechanisms in the reported 10.4 pp gain. revision: yes
Referee: [Method] The formalization of cognitive reasoning stages (inspired by appraisal theory) is presented as guiding interpretable inference, but the manuscript supplies neither the explicit mapping from theory constructs to the RL reward or policy terms nor any verification that this mapping is non-circular. This mapping is load-bearing for the claim of “theory-grounded” alignment.

Authors: We acknowledge that the current manuscript does not provide an explicit, non-circular mapping from cognitive appraisal theory constructs to the specific RL reward and policy terms. In the revision, we will add a dedicated subsection that (a) states the precise correspondence between appraisal-theory stages and the stage-dependent uncertainty parameters, (b) shows how these parameters enter the advantage and entropy terms, and (c) includes a short verification that the mapping introduces constraints beyond standard GRPO (e.g., by contrasting with a generic stage-agnostic entropy schedule). revision: yes

Circularity Check

0 steps flagged

No circularity in derivation; empirical claims rest on external dataset evaluations

full rationale

The paper introduces CRPO as an extension of GRPO with added stage-wise entropy regularization and cognitive-stage formalization drawn from appraisal theory. These are presented as modeling choices whose value is measured by downstream F1 improvements on 8 held-out mental health datasets. No equations, fitted parameters, or self-citations are shown that would make the reported 10.4 pp gain equivalent to the input mechanisms by construction. The central claim therefore remains an empirical assertion open to falsification by ablation or replication rather than a self-referential identity.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only; no explicit free parameters, axioms, or invented entities are stated. The method itself is the contribution.

pith-pipeline@v0.9.1-grok · 5751 in / 1173 out tokens · 21193 ms · 2026-06-27T07:13:17.349326+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

80 extracted references · 16 canonical work pages · 9 internal anchors

[1]

Suicide,

World Health Organization, “Suicide,” https://www.who.int/news-room/ fact-sheets/detail/suicide, 2023, accessed: 2025-09-09

2023
[2]

Many ways to be lonely: fine-grained characterization of loneliness and its potential changes in covid-19,

Y . Jiang, Y . Jiang, L. Leqi, and P. Winkielman, “Many ways to be lonely: fine-grained characterization of loneliness and its potential changes in covid-19,” inProceedings of the International AAAI Conference on Web and Social Media, vol. 16, 2022, pp. 405–416

2022
[3]

Decoding loneliness: Can explainable ai help in understanding language differences in lonely older adults?

N. Wang, S. Goel, S. Ibrahim, V . D. Badal, C. Depp, E. Bilal, K. Subbalakshmi, and E. Lee, “Decoding loneliness: Can explainable ai help in understanding language differences in lonely older adults?” Psychiatry research, vol. 339, p. 116078, 2024

2024
[4]

Data set creation and empirical analysis for detecting signs of depression from social media postings,

K. Sampath and T. Durairaj, “Data set creation and empirical analysis for detecting signs of depression from social media postings,” inIn- ternational Conference on Computational Intelligence in Data Science. Springer, 2022, pp. 136–151

2022
[5]

Early identification of depression severity levels on reddit using ordinal classification,

U. Naseem, A. G. Dunn, J. Kim, and M. Khushi, “Early identification of depression severity levels on reddit using ordinal classification,” in Proceedings of the ACM web conference 2022, 2022, pp. 2563–2572

2022
[6]

Leverage social media for personalized stress detection,

X. Wang, H. Zhang, L. Cao, and L. Feng, “Leverage social media for personalized stress detection,” inProceedings of the 28th ACM international conference on multimedia, 2020, pp. 2710–2718

2020
[7]

A meta- learning based stress category detection framework on social media,

X. Wang, L. Cao, H. Zhang, L. Feng, Y . Ding, and N. Li, “A meta- learning based stress category detection framework on social media,” in Proceedings of the ACM Web Conference 2022, 2022, pp. 2925–2935

2022
[8]

Towards preemptive detection of depression and anxiety in twitter,

D. Owen, J. Camacho-Collados, and L. E. Anke, “Towards preemptive detection of depression and anxiety in twitter,” inProceedings of the Fifth Social Media Mining for Health Applications Workshop & Shared Task, 2020, pp. 82–89

2020
[9]

Automatic anxiety recognition method based on microblog text analysis,

Y . Yu, Q. Li, and X. Liu, “Automatic anxiety recognition method based on microblog text analysis,”Frontiers in Public Health, vol. 11, p. 1080013, 2023

2023
[10]

Latent suicide risk detection on microblog via suicide-oriented word embeddings and layered attention,

L. Cao, H. Zhang, L. Feng, Z. Wei, X. Wang, N. Li, and X. He, “Latent suicide risk detection on microblog via suicide-oriented word embeddings and layered attention,” inProceedings of the 2019 Con- ference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), 2019, pp....

2019
[11]

Learning users inner thoughts and emotion changes for social media based suicide risk detection,

L. Cao, H. Zhang, X. Wang, and L. Feng, “Learning users inner thoughts and emotion changes for social media based suicide risk detection,” IEEE Transactions on Affective Computing, vol. 14, no. 2, pp. 1280– 1296, 2021

2021
[12]

Mentalqlm: A lightweight large language model for mental healthcare based on instruction tuning and dual lora modules,

J. Shi, Z. Wang, J. Zhou, C. Liu, P. Z. Sun, E. Zhao, and L. Lu, “Mentalqlm: A lightweight large language model for mental healthcare based on instruction tuning and dual lora modules,”IEEE Journal of Biomedical and Health Informatics, 2025

2025
[13]

Mental-llm: Leveraging large language models for mental health prediction via online text data,

X. Xu, B. Yao, Y . Dong, S. Gabriel, H. Yu, J. Hendler, M. Ghassemi, A. K. Dey, and D. Wang, “Mental-llm: Leveraging large language models for mental health prediction via online text data,”Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, vol. 8, no. 1, pp. 1–32, 2024

2024
[14]

Mentallama: interpretable mental health analysis on social media with large language models,

K. Yang, T. Zhang, Z. Kuang, Q. Xie, J. Huang, and S. Ananiadou, “Mentallama: interpretable mental health analysis on social media with large language models,” inProceedings of the ACM Web Conference 2024, 2024, pp. 4489–4500

2024
[15]

From pattern recognizers to personalized companions: A survey of large language models in mental health,

H. Hu, Y . Zhou, Q. Wang, Y . Zou, C. Ma, J. Si, J. Liu, Z. Yu, L. Cui, and F. Ma, “From pattern recognizers to personalized companions: A survey of large language models in mental health,” 2025

2025
[16]

Are llms effective psychological assessors? leveraging adaptive rag for interpretable mental health screening through psychometric practice,

F. Ravenda, S. A. Bahrainian, A. Raballo, A. Mira, and N. Kando, “Are llms effective psychological assessors? leveraging adaptive rag for interpretable mental health screening through psychometric practice,” inProceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025, pp. 8975– 8991

2025
[17]

Re- view of predictive techniques for detecting mental disorders from user- generated content on social media,

M. S. Rohei, K. D. Varathan, S. Palaiahnakote, and N. B. Anuar, “Re- view of predictive techniques for detecting mental disorders from user- generated content on social media,”PeerJ Computer Science, vol. 12, p. e3559, 2026

2026
[18]

Mentalglm series: Explainable large language models for mental health analysis on chinese social media,

W. Zhai, N. Bai, Q. Zhao, J. Li, F. Wang, H. Qi, M. Jiang, X. Wang, B. X. Yang, and G. Fu, “Mentalglm series: Explainable large language models for mental health analysis on chinese social media,” inProceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, 2025, pp. 13 599–13 614

2025
[19]

A comprehensive overview of large language models,

H. Naveed, A. U. Khan, S. Qiu, M. Saqib, S. Anwar, M. Usman, N. Akhtar, N. Barnes, and A. Mian, “A comprehensive overview of large language models,”ACM Transactions on Intelligent Systems and Technology, vol. 16, no. 5, pp. 1–72, 2025

2025
[20]

Cyberconfucius: An interactive confucian philosophical counseling system for mental health,

L. Zhao, B. Chen, W. Zheng, L. Zhou, and X. Ding, “Cyberconfucius: An interactive confucian philosophical counseling system for mental health,” inCompanion of the 2025 ACM International Joint Conference on Pervasive and Ubiquitous Computing, 2025, pp. 343–347

2025
[21]

Beyond empathy: Integrating diagnostic and therapeutic reasoning with large language models for mental health counseling,

H. Hu, Y . Zhou, J. Si, Q. Wang, H. Zhang, F. Ren, F. Ma, L. Cui, and Q. Tian, “Beyond empathy: Integrating diagnostic and therapeutic reasoning with large language models for mental health counseling,” arXiv preprint arXiv:2505.15715, 2025

work page arXiv 2025
[22]

EmoBench-M: Benchmarking Emotional Intelligence for Multimodal Large Language Models

H. Hu, Y . Zhou, L. You, H. Xu, Q. Wang, Z. Lian, F. R. Yu, F. Ma, and L. Cui, “Emobench-m: Benchmarking emotional intelligence for multimodal large language models,”arXiv preprint arXiv:2502.04424, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025
[23]

Personax: A recommendation agent-oriented user modeling framework for long behavior sequence,

Y . Shi, W. Xu, Z. Zeqi, X. Zi, Q. Wu, and M. Xu, “Personax: A recommendation agent-oriented user modeling framework for long behavior sequence,” inFindings of the Association for Computational Linguistics: ACL 2025, 2025, pp. 5764–5787

2025
[24]

Agentselect: Benchmark for narrative query-to- agent recommendation,

Y . Shi, W. Xu, T. Chen, H. Shang, L. Yang, Y . Wan, Z. Cao, X. Zi, D. N. Metaxas, and M. Xu, “Agentselect: Benchmark for narrative query-to- agent recommendation,”arXiv preprint arXiv:2603.03761, 2026

work page arXiv 2026
[25]

Llm-assisted systematic review of large language models in clinical medicine,

S. F. Chen, A. Alyakin, A. Seas, E. Yang, J. J. Choi, J. V . Lee, A. L. Chen, P. I. Warman, R. T. Bitolas, R. J. Steeleet al., “Llm-assisted systematic review of large language models in clinical medicine,”Nature medicine, pp. 1–8, 2026

2026
[26]

Optimization inspired few-shot adaptation for large language models,

B. Gao, X. Wang, Y . Yang, and D. A. Clifton, “Optimization inspired few-shot adaptation for large language models,” inThe Thirty-ninth Annual Conference on Neural Information Processing Systems, 2025. [Online]. Available: https://openreview.net/forum?id=rZ2nSt1X58

2025
[27]

Llm post-training: A deep dive into reasoning large language models,

K. Kumar, T. Ashraf, O. Thawakar, R. M. Anwer, H. Cholakkal, M. Shah, M.-H. Yang, P. H. Torr, F. S. Khan, and S. Khan, “Llm post-training: A deep dive into reasoning large language models,”arXiv preprint arXiv:2502.21321, 2025

work page arXiv 2025
[28]

J. S. Beck,Cognitive behavior therapy: Basics and beyond. Guilford Publications, 2020

2020
[29]

J. B. Persons,The case formulation approach to cognitive-behavior therapy. Guilford Press, 2012

2012
[30]

Kuyken, C

W. Kuyken, C. A. Padesky, and R. Dudley,Collaborative case con- ceptualization: Working effectively with clients in cognitive-behavioral therapy. Guilford Press, 2011

2011
[31]

T. D. Eells,Handbook of psychotherapy case formulation. Guilford Publications, 2022

2022
[32]

A. S. Elstein, L. S. Shulman, and S. A. Sprafka,Medical problem solving: An analysis of clinical reasoning. Harvard University Press, 1978

1978
[33]

Higgs, G

J. Higgs, G. M. Jensen, S. Loftus, F. V . Trede, and S. Grace,Clinical Reasoning in the Health Professions E-Book: Clinical Reasoning in the Health Professions E-Book. Elsevier Health Sciences, 2024

2024
[34]

H. N. Garb,Studying the clinician. American Psychological Association (APA), 1998

1998
[35]

The neural basis of decision making

J. Gold and M. Shadlen, “The neural basis of decision making.”Annual Review of Neuroscience, vol. 30, pp. 535–574, 2007

2007
[36]

Whatever next? predictive brains, situated agents, and the future of cognitive science,

A. Clark, “Whatever next? predictive brains, situated agents, and the future of cognitive science,”Behavioral and brain sciences, vol. 36, no. 3, pp. 181–204, 2013

2013
[37]

J. H. Wright, G. K. Brown, M. E. Thase, and M. R. Basco,Learning cognitive-behavior therapy: An illustrated guide. American Psychiatric Pub, 2017

2017
[38]

Reason and emotion in psychotherapy

A. Ellis, “Reason and emotion in psychotherapy.” 1962. 12

1962
[39]

A. T. Beck,Cognitive therapy and the emotional disorders. Penguin, 1979

1979
[40]

R. S. Lazarus and S. Folkman,Stress, appraisal, and coping. Springer publishing company, 1984

1984
[41]

Remax: A simple, effective, and efficient reinforcement learning method for aligning large language models,

Z. Li, T. Xu, Y . Zhang, Z. Lin, Y . Yu, R. Sun, and Z. Luo, “Remax: A simple, effective, and efficient reinforcement learning method for aligning large language models,” inForty-first International Conference on Machine Learning, ICML 2024, Vienna, Austria, July 21-27, 2024. OpenReview.net, 2024

2024
[42]

REINFORCE++: Stabilizing Critic-Free Policy Optimization with Global Advantage Normalization

J. Hu, J. K. Liu, H. Xu, and W. Shen, “Reinforce++: An efficient rlhf algorithm with robustness to both prompt and reward models,”arXiv preprint arXiv:2501.03262, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025
[43]

DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

D. Guo, D. Yang, H. Zhang, J. Song, R. Zhang, R. Xu, Q. Zhu, S. Ma, P. Wang, X. Biet al., “Deepseek-r1: Incentivizing reasoning capability in llms via reinforcement learning,”arXiv preprint arXiv:2501.12948, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025
[44]

Some implications of cognitive appraisal theories of emotion,

P. C. Ellsworth, “Some implications of cognitive appraisal theories of emotion,” 1991

1991
[45]

Causes and consequences of emotions on consumer behaviour: A review and integrative cognitive appraisal theory,

L. Watson and M. T. Spence, “Causes and consequences of emotions on consumer behaviour: A review and integrative cognitive appraisal theory,”European Journal of marketing, vol. 41, no. 5/6, pp. 487–511, 2007

2007
[46]

Language-based detection of depression with machine learning: systematic review and meta-analysis,

H. Fisher, N. M. Jaffe, K. Pidvirny, A. O. Tierney, M. S. Vaidean, P. Dongre, and C. A. Webb, “Language-based detection of depression with machine learning: systematic review and meta-analysis,”npj Digital Medicine, 2026

2026
[47]

Contrastive learning of stress-specific word embedding for social media based stress detection,

X. Wang, H. Zhang, L. Cao, K. Zeng, Q. Li, N. Li, and L. Feng, “Contrastive learning of stress-specific word embedding for social media based stress detection,” inProceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2023, pp. 5137– 5149

2023
[48]

Mise: Meta-knowledge inheritance for social media-based stressor estimation,

X. Wang, L. Feng, H. Zhang, L. Cao, K. Zeng, Q. Li, Y . Ding, Y . Dai, and D. Clifton, “Mise: Meta-knowledge inheritance for social media-based stressor estimation,” inProceedings of the ACM on Web Conference 2025, 2025, pp. 1866–1876

2025
[49]

A scoping review of natural language processing for detecting work-related stress among health professionals,

C. Ikae, S. Ben Souissi, J. S. Bieri, T. J. M ¨uller, M. C. Feuz-Schlunegger, and C. Golz, “A scoping review of natural language processing for detecting work-related stress among health professionals,”Discover Computing, vol. 29, no. 1, p. 14, 2026

2026
[50]

Language anxiety detection in english texts using bert and svm,

R. Hidayat, K. M. Lhaksmana, and I. K. Nurhayati, “Language anxiety detection in english texts using bert and svm,” in2025 International Conference on Data Science and Its Applications (ICoDSA). IEEE, 2025, pp. 357–362

2025
[51]

Knowledge-aware assessment of severity of suicide risk for early intervention,

M. Gaur, A. Alambo, J. P. Sain, U. Kursuncu, K. Thirunarayan, R. Kavuluru, A. Sheth, R. Welton, and J. Pathak, “Knowledge-aware assessment of severity of suicide risk for early intervention,” inThe world wide web conference, 2019, pp. 514–525

2019
[52]

Suicide ideation detection using social media data and ensemble machine learning model,

E. KINA, J.-G. Choi, A. Ishaq, R. Shafique, M. G. Villar, E. S. Alvarado, I. d. l. T. Diez, and I. Ashraf, “Suicide ideation detection using social media data and ensemble machine learning model,”International Journal of Computational Intelligence Systems, 2026

2026
[53]

Category-aware chronic stress detection on microblogs,

L. Cao, H. Zhang, N. Li, X. Wang, W. Ri, and L. Feng, “Category-aware chronic stress detection on microblogs,”IEEE journal of biomedical and health informatics, vol. 26, no. 2, pp. 852–864, 2021

2021
[54]

Rsd-15k: A large-scale user-level annotated dataset for suicide risk detection on social media,

S. Zheng, Y . Tao, and T. Zhou, “Rsd-15k: A large-scale user-level annotated dataset for suicide risk detection on social media,” in2025 IEEE 41st International Conference on Data Engineering Workshops (ICDEW). IEEE, 2025, pp. 190–196

2025
[55]

Evaluation of chatgpt for nlp-based mental health applications,

B. Lamichhane, “Evaluation of chatgpt for nlp-based mental health applications,”arXiv preprint arXiv:2303.15727, 2023

work page arXiv 2023
[56]

Feelings behind words: A systematic review on how effective is nlp-based assessment for mental health diagnosis in human studies,

E. P. Yulianti, Y . S. E. Putri, B. A. Keliat, and A. N. Hidayanto, “Feelings behind words: A systematic review on how effective is nlp-based assessment for mental health diagnosis in human studies,”International Journal of Medical Informatics, p. 106129, 2025

2025
[57]

Detecting multiple mental health disorders with large language models,

M. Nanda, D. Inkpen, and A. Dargel, “Detecting multiple mental health disorders with large language models,” in2024 28th International Conference Information Visualisation (IV). IEEE, 2024, pp. 252–257

2024
[58]

The applications of large language models in mental health: scoping review,

Y . Jin, J. Liu, P. Li, B. Wang, Y . Yan, H. Zhang, C. Ni, J. Wang, Y . Li, Y . Buet al., “The applications of large language models in mental health: scoping review,”Journal of Medical Internet Research, vol. 27, no. 1, p. e69284, 2025

2025
[59]

Let’s verify step by step,

H. Lightman, V . Kosaraju, Y . Burda, H. Edwards, B. Baker, T. Lee, J. Leike, J. Schulman, I. Sutskever, and K. Cobbe, “Let’s verify step by step,” inThe twelfth international conference on learning represen- tations, 2023

2023
[60]

DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models

Z. Shao, P. Wang, Q. Zhu, R. Xu, J. Song, X. Bi, H. Zhang, M. Zhang, Y . Li, Y . Wuet al., “Deepseekmath: Pushing the limits of mathematical reasoning in open language models,”arXiv preprint arXiv:2402.03300, 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024
[61]

Consistent paths lead to truth: Self- rewarding reinforcement learning for llm reasoning,

W. Zhang, Y . Liuet al., “Consistent paths lead to truth: Self- rewarding reinforcement learning for llm reasoning,”arXiv preprint arXiv:2506.08745, 2025. [Online]. Available: https://arxiv.org/abs/2506. 08745

work page arXiv 2025
[62]

Does reinforcement learning really incentivize reasoning capacity in LLMs beyond the base model?

Y . Yue, Z. Chen, R. Lu, A. Zhao, Z. Wang, Y . Yue, S. Song, and G. Huang, “Does reinforcement learning really incentivize reasoning capacity in LLMs beyond the base model?” inThe Thirty-ninth Annual Conference on Neural Information Processing Systems, 2025. [Online]. Available: https://openreview.net/forum?id=4OsgYD7em5

2025
[63]

A Survey of Reinforcement Learning for Large Reasoning Models

K. Zhang, Y . Zuo, B. He, Y . Sun, R. Liu, C. Jiang, Y . Fan, K. Tian, G. Jia, P. Liet al., “A survey of reinforcement learning for large reasoning models,”arXiv preprint arXiv:2509.08827, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025
[64]

Training language models to follow instructions with human feedback,

L. Ouyang, J. Wu, X. Jiang, D. Almeida, C. Wainwright, P. Mishkin, C. Zhang, S. Agarwal, K. Slama, A. Rayet al., “Training language models to follow instructions with human feedback,”Advances in neural information processing systems, vol. 35, pp. 27 730–27 744, 2022

2022
[65]

Rlaif: Scaling re- inforcement learning from human feedback with ai feedback

H. Lee, S. Phatale, H. Mansoor, K. R. Lu, T. Mesnard, J. Ferret, C. Bishop, E. Hall, V . Carbune, and A. Rastogi, “Rlaif: Scaling re- inforcement learning from human feedback with ai feedback.”
[66]

Direct preference optimization: Your language model is secretly a reward model,

R. Rafailov, A. Sharma, E. Mitchell, C. D. Manning, S. Ermon, and C. Finn, “Direct preference optimization: Your language model is secretly a reward model,”Advances in neural information processing systems, vol. 36, pp. 53 728–53 741, 2023

2023
[67]

Rest-rl: Achieving accurate code reasoning of llms with optimized self-training and decoding,

Y . Zhou, B. Lianget al., “Rest-rl: Achieving accurate code reasoning of llms with optimized self-training and decoding,” arXiv preprint arXiv:2508.19576, 2025. [Online]. Available: https: //arxiv.org/abs/2508.19576

work page arXiv 2025
[69]

Available: https://arxiv.org/abs/2505.15034

[Online]. Available: https://arxiv.org/abs/2505.15034

work page arXiv
[70]

DAPO: An Open-Source LLM Reinforcement Learning System at Scale

Q. Yu, Z. Zhang, R. Zhu, Y . Yuan, X. Zuo, Y . Yue, W. Dai, T. Fan, G. Liu, L. Liuet al., “Dapo: An open-source llm reinforcement learning system at scale,”arXiv preprint arXiv:2503.14476, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025
[71]

Dreaddit: A reddit dataset for stress analysis in social media,

E. Turcan and K. McKeown, “Dreaddit: A reddit dataset for stress analysis in social media,”EMNLP-IJCNLP 2019, p. 97, 2019

2019
[72]

Deep learning for suicide and depression identification with unsupervised label correction,

A. Haque, V . Reddi, and T. Giallanza, “Deep learning for suicide and depression identification with unsupervised label correction,” in International Conference on Artificial Neural Networks. Springer, 2021, pp. 436–447

2021
[73]

Loneliness-intensity,

Y . K. Yael Katsman, Hilly Segal, “Loneliness-intensity,” https:// huggingface.co/datasets/yael-katsman/Loneliness-Causes-and-Intensity, 2025

2025
[74]

Back to basics: Revisiting reinforce-style optimization for learning from human feedback in llms,

A. Ahmadian, C. Cremer, M. Gall ´e, M. Fadaee, J. Kreutzer, O. Pietquin, A. ¨Ust¨un, and S. Hooker, “Back to basics: Revisiting reinforce-style optimization for learning from human feedback in llms,” inProceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), ACL 2024, Bangkok, Thailand, August 11-16,...

2024
[75]

Qwen3 Technical Report

A. Yang, A. Li, B. Yang, B. Zhang, B. Hui, B. Zheng, B. Yu, C. Gao, C. Huang, C. Lvet al., “Qwen3 technical report,”arXiv preprint arXiv:2505.09388, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025
[76]

Gemma 2: Improving Open Language Models at a Practical Size

G. Team, M. Riviere, S. Pathak, P. G. Sessa, C. Hardin, S. Bhupatiraju, L. Hussenot, T. Mesnard, B. Shahriari, A. Ram ´eet al., “Gemma 2: Improving open language models at a practical size,”arXiv preprint arXiv:2408.00118, 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024
[77]

The llama 3 herd of models,

A. Dubey, A. Jauhri, A. Pandey, A. Kadian, A. Al-Dahle, A. Letman, A. Mathur, A. Schelten, A. Yang, A. Fanet al., “The llama 3 herd of models,”arXiv e-prints, pp. arXiv–2407, 2024

2024
[78]

DeepSeek-V3 Technical Report

A. Liu, B. Feng, B. Xue, B. Wang, B. Wu, C. Lu, C. Zhao, C. Deng, C. Zhang, C. Ruanet al., “Deepseek-v3 technical report,”arXiv preprint arXiv:2412.19437, 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024
[79]

Gpt3.5 turbo fine-tuning and api updates,

OpenAI, “Gpt3.5 turbo fine-tuning and api updates,” https://openai.com/ index/gpt-3-5-turbo-fine-tuning-and-api-updates/, 2023

2023
[80]

Hello gpt-4o,

Openai, “Hello gpt-4o,” https://openai.com/index/hello-gpt-4o/, 2024

2024
[81]

Gpt-5 is here,

OpenAI, “Gpt-5 is here,” https://openai.com/gpt-5/, 2025. 13 APPENDIX A. System Prompt for Mental-R1 The prompt in last line will be replaced with the specific mental health question during usage. “A conversation between User and Assistant. The user asks a question, and the Assistant solves it. The Assistant must explicitly think through the reasoning pro...

2025

[1] [1]

Suicide,

World Health Organization, “Suicide,” https://www.who.int/news-room/ fact-sheets/detail/suicide, 2023, accessed: 2025-09-09

2023

[2] [2]

Many ways to be lonely: fine-grained characterization of loneliness and its potential changes in covid-19,

Y . Jiang, Y . Jiang, L. Leqi, and P. Winkielman, “Many ways to be lonely: fine-grained characterization of loneliness and its potential changes in covid-19,” inProceedings of the International AAAI Conference on Web and Social Media, vol. 16, 2022, pp. 405–416

2022

[3] [3]

Decoding loneliness: Can explainable ai help in understanding language differences in lonely older adults?

N. Wang, S. Goel, S. Ibrahim, V . D. Badal, C. Depp, E. Bilal, K. Subbalakshmi, and E. Lee, “Decoding loneliness: Can explainable ai help in understanding language differences in lonely older adults?” Psychiatry research, vol. 339, p. 116078, 2024

2024

[4] [4]

Data set creation and empirical analysis for detecting signs of depression from social media postings,

K. Sampath and T. Durairaj, “Data set creation and empirical analysis for detecting signs of depression from social media postings,” inIn- ternational Conference on Computational Intelligence in Data Science. Springer, 2022, pp. 136–151

2022

[5] [5]

Early identification of depression severity levels on reddit using ordinal classification,

U. Naseem, A. G. Dunn, J. Kim, and M. Khushi, “Early identification of depression severity levels on reddit using ordinal classification,” in Proceedings of the ACM web conference 2022, 2022, pp. 2563–2572

2022

[6] [6]

Leverage social media for personalized stress detection,

X. Wang, H. Zhang, L. Cao, and L. Feng, “Leverage social media for personalized stress detection,” inProceedings of the 28th ACM international conference on multimedia, 2020, pp. 2710–2718

2020

[7] [7]

A meta- learning based stress category detection framework on social media,

X. Wang, L. Cao, H. Zhang, L. Feng, Y . Ding, and N. Li, “A meta- learning based stress category detection framework on social media,” in Proceedings of the ACM Web Conference 2022, 2022, pp. 2925–2935

2022

[8] [8]

Towards preemptive detection of depression and anxiety in twitter,

D. Owen, J. Camacho-Collados, and L. E. Anke, “Towards preemptive detection of depression and anxiety in twitter,” inProceedings of the Fifth Social Media Mining for Health Applications Workshop & Shared Task, 2020, pp. 82–89

2020

[9] [9]

Automatic anxiety recognition method based on microblog text analysis,

Y . Yu, Q. Li, and X. Liu, “Automatic anxiety recognition method based on microblog text analysis,”Frontiers in Public Health, vol. 11, p. 1080013, 2023

2023

[10] [10]

Latent suicide risk detection on microblog via suicide-oriented word embeddings and layered attention,

L. Cao, H. Zhang, L. Feng, Z. Wei, X. Wang, N. Li, and X. He, “Latent suicide risk detection on microblog via suicide-oriented word embeddings and layered attention,” inProceedings of the 2019 Con- ference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), 2019, pp....

2019

[11] [11]

Learning users inner thoughts and emotion changes for social media based suicide risk detection,

L. Cao, H. Zhang, X. Wang, and L. Feng, “Learning users inner thoughts and emotion changes for social media based suicide risk detection,” IEEE Transactions on Affective Computing, vol. 14, no. 2, pp. 1280– 1296, 2021

2021

[12] [12]

Mentalqlm: A lightweight large language model for mental healthcare based on instruction tuning and dual lora modules,

J. Shi, Z. Wang, J. Zhou, C. Liu, P. Z. Sun, E. Zhao, and L. Lu, “Mentalqlm: A lightweight large language model for mental healthcare based on instruction tuning and dual lora modules,”IEEE Journal of Biomedical and Health Informatics, 2025

2025

[13] [13]

Mental-llm: Leveraging large language models for mental health prediction via online text data,

X. Xu, B. Yao, Y . Dong, S. Gabriel, H. Yu, J. Hendler, M. Ghassemi, A. K. Dey, and D. Wang, “Mental-llm: Leveraging large language models for mental health prediction via online text data,”Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, vol. 8, no. 1, pp. 1–32, 2024

2024

[14] [14]

Mentallama: interpretable mental health analysis on social media with large language models,

K. Yang, T. Zhang, Z. Kuang, Q. Xie, J. Huang, and S. Ananiadou, “Mentallama: interpretable mental health analysis on social media with large language models,” inProceedings of the ACM Web Conference 2024, 2024, pp. 4489–4500

2024

[15] [15]

From pattern recognizers to personalized companions: A survey of large language models in mental health,

H. Hu, Y . Zhou, Q. Wang, Y . Zou, C. Ma, J. Si, J. Liu, Z. Yu, L. Cui, and F. Ma, “From pattern recognizers to personalized companions: A survey of large language models in mental health,” 2025

2025

[16] [16]

Are llms effective psychological assessors? leveraging adaptive rag for interpretable mental health screening through psychometric practice,

F. Ravenda, S. A. Bahrainian, A. Raballo, A. Mira, and N. Kando, “Are llms effective psychological assessors? leveraging adaptive rag for interpretable mental health screening through psychometric practice,” inProceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025, pp. 8975– 8991

2025

[17] [17]

Re- view of predictive techniques for detecting mental disorders from user- generated content on social media,

M. S. Rohei, K. D. Varathan, S. Palaiahnakote, and N. B. Anuar, “Re- view of predictive techniques for detecting mental disorders from user- generated content on social media,”PeerJ Computer Science, vol. 12, p. e3559, 2026

2026

[18] [18]

Mentalglm series: Explainable large language models for mental health analysis on chinese social media,

W. Zhai, N. Bai, Q. Zhao, J. Li, F. Wang, H. Qi, M. Jiang, X. Wang, B. X. Yang, and G. Fu, “Mentalglm series: Explainable large language models for mental health analysis on chinese social media,” inProceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, 2025, pp. 13 599–13 614

2025

[19] [19]

A comprehensive overview of large language models,

H. Naveed, A. U. Khan, S. Qiu, M. Saqib, S. Anwar, M. Usman, N. Akhtar, N. Barnes, and A. Mian, “A comprehensive overview of large language models,”ACM Transactions on Intelligent Systems and Technology, vol. 16, no. 5, pp. 1–72, 2025

2025

[20] [20]

Cyberconfucius: An interactive confucian philosophical counseling system for mental health,

L. Zhao, B. Chen, W. Zheng, L. Zhou, and X. Ding, “Cyberconfucius: An interactive confucian philosophical counseling system for mental health,” inCompanion of the 2025 ACM International Joint Conference on Pervasive and Ubiquitous Computing, 2025, pp. 343–347

2025

[21] [21]

Beyond empathy: Integrating diagnostic and therapeutic reasoning with large language models for mental health counseling,

H. Hu, Y . Zhou, J. Si, Q. Wang, H. Zhang, F. Ren, F. Ma, L. Cui, and Q. Tian, “Beyond empathy: Integrating diagnostic and therapeutic reasoning with large language models for mental health counseling,” arXiv preprint arXiv:2505.15715, 2025

work page arXiv 2025

[22] [22]

EmoBench-M: Benchmarking Emotional Intelligence for Multimodal Large Language Models

H. Hu, Y . Zhou, L. You, H. Xu, Q. Wang, Z. Lian, F. R. Yu, F. Ma, and L. Cui, “Emobench-m: Benchmarking emotional intelligence for multimodal large language models,”arXiv preprint arXiv:2502.04424, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025

[23] [23]

Personax: A recommendation agent-oriented user modeling framework for long behavior sequence,

Y . Shi, W. Xu, Z. Zeqi, X. Zi, Q. Wu, and M. Xu, “Personax: A recommendation agent-oriented user modeling framework for long behavior sequence,” inFindings of the Association for Computational Linguistics: ACL 2025, 2025, pp. 5764–5787

2025

[24] [24]

Agentselect: Benchmark for narrative query-to- agent recommendation,

Y . Shi, W. Xu, T. Chen, H. Shang, L. Yang, Y . Wan, Z. Cao, X. Zi, D. N. Metaxas, and M. Xu, “Agentselect: Benchmark for narrative query-to- agent recommendation,”arXiv preprint arXiv:2603.03761, 2026

work page arXiv 2026

[25] [25]

Llm-assisted systematic review of large language models in clinical medicine,

S. F. Chen, A. Alyakin, A. Seas, E. Yang, J. J. Choi, J. V . Lee, A. L. Chen, P. I. Warman, R. T. Bitolas, R. J. Steeleet al., “Llm-assisted systematic review of large language models in clinical medicine,”Nature medicine, pp. 1–8, 2026

2026

[26] [26]

Optimization inspired few-shot adaptation for large language models,

B. Gao, X. Wang, Y . Yang, and D. A. Clifton, “Optimization inspired few-shot adaptation for large language models,” inThe Thirty-ninth Annual Conference on Neural Information Processing Systems, 2025. [Online]. Available: https://openreview.net/forum?id=rZ2nSt1X58

2025

[27] [27]

Llm post-training: A deep dive into reasoning large language models,

K. Kumar, T. Ashraf, O. Thawakar, R. M. Anwer, H. Cholakkal, M. Shah, M.-H. Yang, P. H. Torr, F. S. Khan, and S. Khan, “Llm post-training: A deep dive into reasoning large language models,”arXiv preprint arXiv:2502.21321, 2025

work page arXiv 2025

[28] [28]

J. S. Beck,Cognitive behavior therapy: Basics and beyond. Guilford Publications, 2020

2020

[29] [29]

J. B. Persons,The case formulation approach to cognitive-behavior therapy. Guilford Press, 2012

2012

[30] [30]

Kuyken, C

W. Kuyken, C. A. Padesky, and R. Dudley,Collaborative case con- ceptualization: Working effectively with clients in cognitive-behavioral therapy. Guilford Press, 2011

2011

[31] [31]

T. D. Eells,Handbook of psychotherapy case formulation. Guilford Publications, 2022

2022

[32] [32]

A. S. Elstein, L. S. Shulman, and S. A. Sprafka,Medical problem solving: An analysis of clinical reasoning. Harvard University Press, 1978

1978

[33] [33]

Higgs, G

J. Higgs, G. M. Jensen, S. Loftus, F. V . Trede, and S. Grace,Clinical Reasoning in the Health Professions E-Book: Clinical Reasoning in the Health Professions E-Book. Elsevier Health Sciences, 2024

2024

[34] [34]

H. N. Garb,Studying the clinician. American Psychological Association (APA), 1998

1998

[35] [35]

The neural basis of decision making

J. Gold and M. Shadlen, “The neural basis of decision making.”Annual Review of Neuroscience, vol. 30, pp. 535–574, 2007

2007

[36] [36]

Whatever next? predictive brains, situated agents, and the future of cognitive science,

A. Clark, “Whatever next? predictive brains, situated agents, and the future of cognitive science,”Behavioral and brain sciences, vol. 36, no. 3, pp. 181–204, 2013

2013

[37] [37]

J. H. Wright, G. K. Brown, M. E. Thase, and M. R. Basco,Learning cognitive-behavior therapy: An illustrated guide. American Psychiatric Pub, 2017

2017

[38] [38]

Reason and emotion in psychotherapy

A. Ellis, “Reason and emotion in psychotherapy.” 1962. 12

1962

[39] [39]

A. T. Beck,Cognitive therapy and the emotional disorders. Penguin, 1979

1979

[40] [40]

R. S. Lazarus and S. Folkman,Stress, appraisal, and coping. Springer publishing company, 1984

1984

[41] [41]

Remax: A simple, effective, and efficient reinforcement learning method for aligning large language models,

Z. Li, T. Xu, Y . Zhang, Z. Lin, Y . Yu, R. Sun, and Z. Luo, “Remax: A simple, effective, and efficient reinforcement learning method for aligning large language models,” inForty-first International Conference on Machine Learning, ICML 2024, Vienna, Austria, July 21-27, 2024. OpenReview.net, 2024

2024

[42] [42]

REINFORCE++: Stabilizing Critic-Free Policy Optimization with Global Advantage Normalization

J. Hu, J. K. Liu, H. Xu, and W. Shen, “Reinforce++: An efficient rlhf algorithm with robustness to both prompt and reward models,”arXiv preprint arXiv:2501.03262, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025

[43] [43]

DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

D. Guo, D. Yang, H. Zhang, J. Song, R. Zhang, R. Xu, Q. Zhu, S. Ma, P. Wang, X. Biet al., “Deepseek-r1: Incentivizing reasoning capability in llms via reinforcement learning,”arXiv preprint arXiv:2501.12948, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025

[44] [44]

Some implications of cognitive appraisal theories of emotion,

P. C. Ellsworth, “Some implications of cognitive appraisal theories of emotion,” 1991

1991

[45] [45]

Causes and consequences of emotions on consumer behaviour: A review and integrative cognitive appraisal theory,

L. Watson and M. T. Spence, “Causes and consequences of emotions on consumer behaviour: A review and integrative cognitive appraisal theory,”European Journal of marketing, vol. 41, no. 5/6, pp. 487–511, 2007

2007

[46] [46]

Language-based detection of depression with machine learning: systematic review and meta-analysis,

H. Fisher, N. M. Jaffe, K. Pidvirny, A. O. Tierney, M. S. Vaidean, P. Dongre, and C. A. Webb, “Language-based detection of depression with machine learning: systematic review and meta-analysis,”npj Digital Medicine, 2026

2026

[47] [47]

Contrastive learning of stress-specific word embedding for social media based stress detection,

X. Wang, H. Zhang, L. Cao, K. Zeng, Q. Li, N. Li, and L. Feng, “Contrastive learning of stress-specific word embedding for social media based stress detection,” inProceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2023, pp. 5137– 5149

2023

[48] [48]

Mise: Meta-knowledge inheritance for social media-based stressor estimation,

X. Wang, L. Feng, H. Zhang, L. Cao, K. Zeng, Q. Li, Y . Ding, Y . Dai, and D. Clifton, “Mise: Meta-knowledge inheritance for social media-based stressor estimation,” inProceedings of the ACM on Web Conference 2025, 2025, pp. 1866–1876

2025

[49] [49]

A scoping review of natural language processing for detecting work-related stress among health professionals,

C. Ikae, S. Ben Souissi, J. S. Bieri, T. J. M ¨uller, M. C. Feuz-Schlunegger, and C. Golz, “A scoping review of natural language processing for detecting work-related stress among health professionals,”Discover Computing, vol. 29, no. 1, p. 14, 2026

2026

[50] [50]

Language anxiety detection in english texts using bert and svm,

R. Hidayat, K. M. Lhaksmana, and I. K. Nurhayati, “Language anxiety detection in english texts using bert and svm,” in2025 International Conference on Data Science and Its Applications (ICoDSA). IEEE, 2025, pp. 357–362

2025

[51] [51]

Knowledge-aware assessment of severity of suicide risk for early intervention,

M. Gaur, A. Alambo, J. P. Sain, U. Kursuncu, K. Thirunarayan, R. Kavuluru, A. Sheth, R. Welton, and J. Pathak, “Knowledge-aware assessment of severity of suicide risk for early intervention,” inThe world wide web conference, 2019, pp. 514–525

2019

[52] [52]

Suicide ideation detection using social media data and ensemble machine learning model,

E. KINA, J.-G. Choi, A. Ishaq, R. Shafique, M. G. Villar, E. S. Alvarado, I. d. l. T. Diez, and I. Ashraf, “Suicide ideation detection using social media data and ensemble machine learning model,”International Journal of Computational Intelligence Systems, 2026

2026

[53] [53]

Category-aware chronic stress detection on microblogs,

L. Cao, H. Zhang, N. Li, X. Wang, W. Ri, and L. Feng, “Category-aware chronic stress detection on microblogs,”IEEE journal of biomedical and health informatics, vol. 26, no. 2, pp. 852–864, 2021

2021

[54] [54]

Rsd-15k: A large-scale user-level annotated dataset for suicide risk detection on social media,

S. Zheng, Y . Tao, and T. Zhou, “Rsd-15k: A large-scale user-level annotated dataset for suicide risk detection on social media,” in2025 IEEE 41st International Conference on Data Engineering Workshops (ICDEW). IEEE, 2025, pp. 190–196

2025

[55] [55]

Evaluation of chatgpt for nlp-based mental health applications,

B. Lamichhane, “Evaluation of chatgpt for nlp-based mental health applications,”arXiv preprint arXiv:2303.15727, 2023

work page arXiv 2023

[56] [56]

Feelings behind words: A systematic review on how effective is nlp-based assessment for mental health diagnosis in human studies,

E. P. Yulianti, Y . S. E. Putri, B. A. Keliat, and A. N. Hidayanto, “Feelings behind words: A systematic review on how effective is nlp-based assessment for mental health diagnosis in human studies,”International Journal of Medical Informatics, p. 106129, 2025

2025

[57] [57]

Detecting multiple mental health disorders with large language models,

M. Nanda, D. Inkpen, and A. Dargel, “Detecting multiple mental health disorders with large language models,” in2024 28th International Conference Information Visualisation (IV). IEEE, 2024, pp. 252–257

2024

[58] [58]

The applications of large language models in mental health: scoping review,

Y . Jin, J. Liu, P. Li, B. Wang, Y . Yan, H. Zhang, C. Ni, J. Wang, Y . Li, Y . Buet al., “The applications of large language models in mental health: scoping review,”Journal of Medical Internet Research, vol. 27, no. 1, p. e69284, 2025

2025

[59] [59]

Let’s verify step by step,

H. Lightman, V . Kosaraju, Y . Burda, H. Edwards, B. Baker, T. Lee, J. Leike, J. Schulman, I. Sutskever, and K. Cobbe, “Let’s verify step by step,” inThe twelfth international conference on learning represen- tations, 2023

2023

[60] [60]

DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models

Z. Shao, P. Wang, Q. Zhu, R. Xu, J. Song, X. Bi, H. Zhang, M. Zhang, Y . Li, Y . Wuet al., “Deepseekmath: Pushing the limits of mathematical reasoning in open language models,”arXiv preprint arXiv:2402.03300, 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024

[61] [61]

Consistent paths lead to truth: Self- rewarding reinforcement learning for llm reasoning,

W. Zhang, Y . Liuet al., “Consistent paths lead to truth: Self- rewarding reinforcement learning for llm reasoning,”arXiv preprint arXiv:2506.08745, 2025. [Online]. Available: https://arxiv.org/abs/2506. 08745

work page arXiv 2025

[62] [62]

Does reinforcement learning really incentivize reasoning capacity in LLMs beyond the base model?

Y . Yue, Z. Chen, R. Lu, A. Zhao, Z. Wang, Y . Yue, S. Song, and G. Huang, “Does reinforcement learning really incentivize reasoning capacity in LLMs beyond the base model?” inThe Thirty-ninth Annual Conference on Neural Information Processing Systems, 2025. [Online]. Available: https://openreview.net/forum?id=4OsgYD7em5

2025

[63] [63]

A Survey of Reinforcement Learning for Large Reasoning Models

K. Zhang, Y . Zuo, B. He, Y . Sun, R. Liu, C. Jiang, Y . Fan, K. Tian, G. Jia, P. Liet al., “A survey of reinforcement learning for large reasoning models,”arXiv preprint arXiv:2509.08827, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025

[64] [64]

Training language models to follow instructions with human feedback,

L. Ouyang, J. Wu, X. Jiang, D. Almeida, C. Wainwright, P. Mishkin, C. Zhang, S. Agarwal, K. Slama, A. Rayet al., “Training language models to follow instructions with human feedback,”Advances in neural information processing systems, vol. 35, pp. 27 730–27 744, 2022

2022

[65] [65]

Rlaif: Scaling re- inforcement learning from human feedback with ai feedback

H. Lee, S. Phatale, H. Mansoor, K. R. Lu, T. Mesnard, J. Ferret, C. Bishop, E. Hall, V . Carbune, and A. Rastogi, “Rlaif: Scaling re- inforcement learning from human feedback with ai feedback.”

[66] [66]

Direct preference optimization: Your language model is secretly a reward model,

R. Rafailov, A. Sharma, E. Mitchell, C. D. Manning, S. Ermon, and C. Finn, “Direct preference optimization: Your language model is secretly a reward model,”Advances in neural information processing systems, vol. 36, pp. 53 728–53 741, 2023

2023

[67] [67]

Rest-rl: Achieving accurate code reasoning of llms with optimized self-training and decoding,

Y . Zhou, B. Lianget al., “Rest-rl: Achieving accurate code reasoning of llms with optimized self-training and decoding,” arXiv preprint arXiv:2508.19576, 2025. [Online]. Available: https: //arxiv.org/abs/2508.19576

work page arXiv 2025

[68] [69]

Available: https://arxiv.org/abs/2505.15034

[Online]. Available: https://arxiv.org/abs/2505.15034

work page arXiv

[69] [70]

DAPO: An Open-Source LLM Reinforcement Learning System at Scale

Q. Yu, Z. Zhang, R. Zhu, Y . Yuan, X. Zuo, Y . Yue, W. Dai, T. Fan, G. Liu, L. Liuet al., “Dapo: An open-source llm reinforcement learning system at scale,”arXiv preprint arXiv:2503.14476, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025

[70] [71]

Dreaddit: A reddit dataset for stress analysis in social media,

E. Turcan and K. McKeown, “Dreaddit: A reddit dataset for stress analysis in social media,”EMNLP-IJCNLP 2019, p. 97, 2019

2019

[71] [72]

Deep learning for suicide and depression identification with unsupervised label correction,

A. Haque, V . Reddi, and T. Giallanza, “Deep learning for suicide and depression identification with unsupervised label correction,” in International Conference on Artificial Neural Networks. Springer, 2021, pp. 436–447

2021

[72] [73]

Loneliness-intensity,

Y . K. Yael Katsman, Hilly Segal, “Loneliness-intensity,” https:// huggingface.co/datasets/yael-katsman/Loneliness-Causes-and-Intensity, 2025

2025

[73] [74]

Back to basics: Revisiting reinforce-style optimization for learning from human feedback in llms,

A. Ahmadian, C. Cremer, M. Gall ´e, M. Fadaee, J. Kreutzer, O. Pietquin, A. ¨Ust¨un, and S. Hooker, “Back to basics: Revisiting reinforce-style optimization for learning from human feedback in llms,” inProceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), ACL 2024, Bangkok, Thailand, August 11-16,...

2024

[74] [75]

Qwen3 Technical Report

A. Yang, A. Li, B. Yang, B. Zhang, B. Hui, B. Zheng, B. Yu, C. Gao, C. Huang, C. Lvet al., “Qwen3 technical report,”arXiv preprint arXiv:2505.09388, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025

[75] [76]

Gemma 2: Improving Open Language Models at a Practical Size

G. Team, M. Riviere, S. Pathak, P. G. Sessa, C. Hardin, S. Bhupatiraju, L. Hussenot, T. Mesnard, B. Shahriari, A. Ram ´eet al., “Gemma 2: Improving open language models at a practical size,”arXiv preprint arXiv:2408.00118, 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024

[76] [77]

The llama 3 herd of models,

A. Dubey, A. Jauhri, A. Pandey, A. Kadian, A. Al-Dahle, A. Letman, A. Mathur, A. Schelten, A. Yang, A. Fanet al., “The llama 3 herd of models,”arXiv e-prints, pp. arXiv–2407, 2024

2024

[77] [78]

DeepSeek-V3 Technical Report

A. Liu, B. Feng, B. Xue, B. Wang, B. Wu, C. Lu, C. Zhao, C. Deng, C. Zhang, C. Ruanet al., “Deepseek-v3 technical report,”arXiv preprint arXiv:2412.19437, 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024

[78] [79]

Gpt3.5 turbo fine-tuning and api updates,

OpenAI, “Gpt3.5 turbo fine-tuning and api updates,” https://openai.com/ index/gpt-3-5-turbo-fine-tuning-and-api-updates/, 2023

2023

[79] [80]

Hello gpt-4o,

Openai, “Hello gpt-4o,” https://openai.com/index/hello-gpt-4o/, 2024

2024

[80] [81]

Gpt-5 is here,

OpenAI, “Gpt-5 is here,” https://openai.com/gpt-5/, 2025. 13 APPENDIX A. System Prompt for Mental-R1 The prompt in last line will be replaced with the specific mental health question during usage. “A conversation between User and Assistant. The user asks a question, and the Assistant solves it. The Assistant must explicitly think through the reasoning pro...

2025