pith. machine review for the scientific record.

arxiv: 2605.10804 · v1 · submitted 2026-05-11 · 💻 cs.AI · cs.CY · cs.HC

Recognition: 3 theorem links


New AI-Driven Tools for Enhancing Campus Well-being: A Prevention and Intervention Approach

Authors on Pith: no claims yet

Pith reviewed 2026-05-12 04:20 UTC · model grok-4.3

classification 💻 cs.AI · cs.CY · cs.HC
keywords campus well-being · mental health detection · AI chatbots · reinforcement learning · multi-model reasoning · LLM assessment · PHQ-8 · survey adaptation

The pith

A unified AI framework combines adaptive survey chatbots with stacked mental health detection models to enhance campus well-being.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper develops AI-driven tools to address gaps in monitoring student satisfaction and detecting mental health risks at universities. For prevention, it introduces the TigerGPT survey chatbot and AURA, an adaptive reinforcement learning system that improves survey response quality; for intervention, it presents models such as PsychoGPT and SMMR for accurate, explainable mental health assessments grounded in clinical guidelines. The central claim is that these tools can be unified so that insights from adaptive surveys directly inform specialized detection models, potentially yielding more effective prevention and intervention strategies. A reader would care because better well-being support could improve academic success and reduce risks associated with untreated mental health issues.

Core claim

The dissertation establishes a cohesive framework that unifies prevention tools (improving feedback collection through personalized, adaptive conversations) with intervention tools (advancing mental health detection using expressive narratives and multi-model reasoning), allowing adaptive survey insights to flow directly into specialized mental health detection models.

What carries the argument

The cohesive framework enabling adaptive survey insights to flow directly into specialized mental health detection models, supported by AURA's reinforcement learning adaptation using LSDE quality signals and SMMR's layered expert models for task decomposition and reconciliation.
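The paper does not specify this data flow as code; as a minimal illustrative sketch, where the schema, the quality threshold, and the keyword-cue detector are all hypothetical stand-ins for AURA's quality gating and the PsychoGPT/SMMR stack, the routing might look like:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class SurveyInsight:
    """One adaptive-survey signal (hypothetical schema; the paper defines no API)."""
    student_id: str
    narrative: str   # free-text response elicited by the survey chatbot
    quality: float   # LSDE-style conversation-quality score in [0, 1]

@dataclass
class DetectionResult:
    student_id: str
    risk_flag: bool
    rationale: str

def detect(insight: SurveyInsight, min_quality: float = 0.5) -> Optional[DetectionResult]:
    """Route a survey insight into a (stubbed) detection model.

    Low-quality conversations are filtered out, mirroring the idea that the
    adaptive survey's quality signal gates what reaches the intervention stack.
    """
    if insight.quality < min_quality:
        return None  # not enough signal to assess
    # Stand-in for PsychoGPT/SMMR: flag narratives containing simple distress cues.
    cues = ("hopeless", "can't sleep", "worthless")
    hit = next((c for c in cues if c in insight.narrative.lower()), None)
    return DetectionResult(insight.student_id, hit is not None,
                           f"cue: {hit}" if hit else "no distress cues found")

insights = [
    SurveyInsight("s1", "I feel hopeless about exams", 0.8),
    SurveyInsight("s2", "Dining hall food is fine", 0.2),  # dropped: low quality
]
results = [r for i in insights if (r := detect(i)) is not None]
```

In the dissertation the detection stage is an LLM pipeline, not a keyword matcher; the sketch only captures the gating-and-routing shape of the claimed unified framework.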

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • If the framework holds, real-world deployment could enable proactive identification of at-risk students through routine surveys.
  • Integration with university systems might allow tailored interventions based on collected data.
  • Extensions could test the framework's performance across different cultural or demographic student groups to address potential biases.

Load-bearing premise

The assumption that the LSDE quality signal accurately reflects conversation quality and that clinical guidelines combined with linguistic features can reliably detect mental health risks in campus populations without bias or hallucination.

What would settle it

Observing no significant difference in mental health detection accuracy or survey engagement metrics when comparing the proposed tools to standard methods in a controlled campus trial would challenge the central claims.

Figures

Figures reproduced from arXiv: 2605.10804 by Jinwen Tang.

Figure 3.1: Overview of TigerGPT’s initial role selection process. This flowchart …
Figure 3.2: A screenshot of TigerGPT’s initial interface. A welcome screen prompting …
Figure 3.3: TigerGPT offers users a variety of survey topics and an option of random …
Figure 3.4: Sample Conversation with TigerGPT. This screenshot demonstrates five …
Figure 3.5: Sample Conversation for a Sensitive Topic. This example shows TigerGPT’s …
Figure 3.6: TigerGPT V2 Preview. Building on the original TigerGPT (Section 3.2), this upgraded version broadens the chatbot’s audience to include prospective students and parents …
Figure 3.7: TigerGPT V2 Customized Topic Feature …
Figure 3.8: AURA system architecture showing the reinforcement learning cycle …
Figure 3.9: Two-level learning framework. Offline learning (top) extracts patterns …
Figure 4.1: Illustration of BERT Embeddings [99] …
Figure 4.2: Illustration of Two Sentence Shuffling Manipulations
Figure 4.3: Histogram of ChatGPT token counts of combined participant dialogues in …
Figure 4.4: Three-Stage Mental Health Evaluation Framework: Comprehensive Anal…
Figure 4.5: Overview of the Stacked Multi-Model Reasoning (SMMR) framework.
Figure 4.6: SMMR Prompt for DAIC-WOZ Dataset
Figure 4.7: SMMR Prompt for Case Study Dataset
Figure 5.1: Evaluation results of user opinions towards TigerGPT. These three bar …
Figure 5.2: Evaluation results of participant preference. This pie chart shows how …
Figure 5.3: Word cloud visualization of user feedback. The left cloud highlights …
Figure 5.4: Distributions of positive, neutral, and negative sentiment scores from …
Figure 5.5: Psycho Analyst prediction accuracy on the overall DAIC-WOZ dataset for …
Figure 5.6: Frequency Distribution of Absolute Differences Across Three Stages of …
original abstract

Campus well-being underpins academic success, yet many universities lack effective methods for monitoring satisfaction and detecting mental health risks. This dissertation addresses these gaps through prevention (improving feedback collection) and intervention (advancing mental health detection), unified under an integrated framework. For prevention, we developed TigerGPT, a personalized survey chatbot leveraging LLMs to engage users in context-aware conversations grounded in conversational design and engagement theory, achieving 75% usability and 81% satisfaction. To address its limitations in repetitiveness and response depth, we introduced AURA, a reinforcement-learning framework that adapts follow-up question types (validate, specify, reflect, probe) within a session using an LSDE quality signal (Length, Self-disclosure, Emotion, Specificity), initialized from 96 prior conversations. AURA achieved +0.12 mean quality gain (p=0.044, d=0.66), with 63% fewer specification prompts and 10x more validation behavior. For intervention, we examine Expressive Narrative Stories (ENS) for mental health screening, showing BERT(128) captures nuanced linguistic features without keyword cues, while conventional classifiers depend heavily on explicit mental health terms. We then developed PsychoGPT, an LLM built on DSM-5 and PHQ-8 guidelines that performs initial distress classification, symptom-level scoring, and reconciliation with external ratings for explainable assessment. To reduce hallucinations, we proposed Stacked Multi-Model Reasoning (SMMR), layering expert models where early layers handle localized subtasks and later layers reconcile findings, outperforming single-model solutions on DAIC-WOZ in accuracy, F1, and PHQ-8 scoring. Finally, a cohesive framework unifies these tools, enabling adaptive survey insights to flow directly into specialized mental health detection models.
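AURA's within-session adaptation over follow-up types (validate, specify, reflect, probe) rewarded by an LSDE signal can be caricatured as a bandit loop. The sketch below is a generic epsilon-greedy agent, not the paper's algorithm, and the `lsde_score` heuristic is invented for illustration:

```python
import random

ACTIONS = ["validate", "specify", "reflect", "probe"]

def lsde_score(reply: str) -> float:
    """Toy stand-in for the LSDE signal (Length, Self-disclosure, Emotion,
    Specificity); the paper's actual scoring rubric is not reproduced here."""
    words = reply.lower().split()
    n = max(len(words), 1)
    length = min(len(words) / 50, 1.0)
    self_disclosure = sum(w in {"i", "my", "me"} for w in words) / n
    emotion = sum(w in {"stressed", "happy", "anxious", "proud"} for w in words) / n
    specificity = sum(w.isdigit() for w in words) / n
    return (length + self_disclosure + emotion + specificity) / 4

class FollowUpBandit:
    """Epsilon-greedy choice of follow-up type, rewarded by LSDE gain."""
    def __init__(self, epsilon: float = 0.1, seed: int = 0):
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.value = {a: 0.0 for a in ACTIONS}   # running mean reward per action
        self.count = {a: 0 for a in ACTIONS}

    def choose(self) -> str:
        if self.rng.random() < self.epsilon:
            return self.rng.choice(ACTIONS)      # explore
        return max(ACTIONS, key=self.value.get)  # exploit

    def update(self, action: str, reward: float) -> None:
        self.count[action] += 1
        self.value[action] += (reward - self.value[action]) / self.count[action]
```

A session would call `choose()` before each follow-up, then `update()` with the LSDE change the reply produced; AURA additionally initializes from 96 prior conversations, which this sketch omits.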

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript presents a dissertation developing AI tools for campus well-being: TigerGPT as a personalized LLM survey chatbot (75% usability, 81% satisfaction), AURA as an RL framework adapting question types via an LSDE quality signal on 96 prior conversations (+0.12 mean quality gain, p=0.044, d=0.66), PsychoGPT for DSM-5/PHQ-8 based distress classification and symptom scoring, and SMMR as stacked multi-model reasoning that outperforms single models on DAIC-WOZ in accuracy, F1, and PHQ-8 scoring. These are unified under a cohesive framework in which adaptive survey insights flow directly into specialized mental health detection models.

Significance. If the integration claim holds, the work offers a prevention-intervention pipeline with concrete, statistically supported gains in conversation quality and detection performance on established benchmarks like DAIC-WOZ. The use of RL for adaptive prompting and layered reasoning to mitigate hallucinations are promising directions for applied mental-health AI. However, the absence of end-to-end validation limits the assessed novelty and practical impact.

major comments (2)
  1. [Abstract] Abstract (final paragraph): the central claim that 'a cohesive framework unifies these tools, enabling adaptive survey insights to flow directly into specialized mental health detection models' is unsupported by any reported experiments, ablations, or case studies. AURA (LSDE-driven RL) and SMMR/PsychoGPT are evaluated in isolation on separate data (prior conversations and DAIC-WOZ); no results show that AURA-generated conversations improve downstream distress classification, PHQ-8 scoring, or reduce hallucinations when fed into the detection stack.
  2. [AURA section] AURA description and evaluation: the LSDE quality signal (Length, Self-disclosure, Emotion, Specificity) is used to adapt prompts and claim a +0.12 gain, yet the manuscript provides no validation of this signal against external human quality ratings, clinical outcomes, or inter-rater reliability. Without such grounding, it is unclear whether the reported behavioral changes (63% fewer specification prompts, 10x validation) translate to improved mental-health signals for the downstream models.
minor comments (2)
  1. [Abstract] The abstract reports 75% usability and 81% satisfaction for TigerGPT but does not specify the measurement instruments, sample size, or comparison baselines.
  2. [SMMR evaluation] SMMR is described as outperforming single-model solutions on DAIC-WOZ, but the manuscript does not detail the exact baselines, data splits, or whether the gains are statistically significant after multiple-comparison correction.
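The multiple-comparison concern in the second minor comment is standard: if SMMR is tested against several baselines, per-comparison p-values need a family-wise correction. A Holm-Bonferroni step-down, sketched with made-up p-values, is one way the revised manuscript could report this:

```python
def holm_bonferroni(pvals, alpha=0.05):
    """Holm-Bonferroni step-down: return which hypotheses are rejected
    while controlling the family-wise error rate at alpha."""
    order = sorted(range(len(pvals)), key=lambda i: pvals[i])
    rejected = [False] * len(pvals)
    for step, i in enumerate(order):
        if pvals[i] <= alpha / (len(pvals) - step):
            rejected[i] = True
        else:
            break  # once one test fails, all larger p-values fail too
    return rejected

# Hypothetical p-values for SMMR vs. four single-model baselines.
flags = holm_bonferroni([0.004, 0.03, 0.02, 0.20])
```

With these invented values only the smallest p-value survives correction, which illustrates why uncorrected per-baseline significance can overstate the result.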

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the thoughtful and constructive review. We agree that the manuscript's description of the unified framework requires qualification, as the components were developed and evaluated independently. We will revise the abstract and add explicit discussion of limitations and future integration plans. Our point-by-point responses follow.

point-by-point responses
  1. Referee: [Abstract] Abstract (final paragraph): the central claim that 'a cohesive framework unifies these tools, enabling adaptive survey insights to flow directly into specialized mental health detection models' is unsupported by any reported experiments, ablations, or case studies. AURA (LSDE-driven RL) and SMMR/PsychoGPT are evaluated in isolation on separate data (prior conversations and DAIC-WOZ); no results show that AURA-generated conversations improve downstream distress classification, PHQ-8 scoring, or reduce hallucinations when fed into the detection stack.

    Authors: We acknowledge that the abstract overstates the integration. The manuscript presents the tools as modular components within a proposed prevention-intervention pipeline, with AURA outputs conceptually feeding into PsychoGPT/SMMR, but no end-to-end experiments, ablations, or case studies were conducted. This is a genuine limitation of the dissertation, which compiles separate studies. We will revise the abstract to describe the framework as a conceptual unification based on design modularity rather than demonstrated data flow, and we will add a dedicated limitations paragraph in the discussion section outlining the need for future end-to-end validation on shared datasets. revision: yes

  2. Referee: [AURA section] AURA description and evaluation: the LSDE quality signal (Length, Self-disclosure, Emotion, Specificity) is used to adapt prompts and claim a +0.12 gain, yet the manuscript provides no validation of this signal against external human quality ratings, clinical outcomes, or inter-rater reliability. Without such grounding, it is unclear whether the reported behavioral changes (63% fewer specification prompts, 10x validation) translate to improved mental-health signals for the downstream models.

    Authors: The LSDE dimensions were selected from prior literature on conversational quality in mental-health dialogues as a practical proxy signal for RL training. The reported +0.12 mean quality gain, p-value, effect size, and behavioral shifts (63% fewer specification prompts, 10x validation) were computed directly from this internal signal on the 96 conversations. We agree that external validation against human ratings, clinical outcomes, or inter-rater reliability is absent and that this leaves open whether the improvements enhance downstream mental-health signals. We will add a limitations subsection in the AURA section explicitly stating the proxy nature of LSDE and recommending future human-evaluation studies to ground the signal. revision: partial
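The external validation the authors promise could start as simply as correlating internal LSDE scores with human quality ratings per conversation. A dependency-free Spearman rank correlation sketch, with invented data:

```python
def rank(xs):
    """Average ranks (ties shared) for Spearman correlation."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1          # mean rank across the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(a, b):
    """Spearman rank correlation between proxy scores and human ratings."""
    ra, rb = rank(a), rank(b)
    n = len(a)
    ma, mb = sum(ra) / n, sum(rb) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(ra, rb))
    sa = sum((x - ma) ** 2 for x in ra) ** 0.5
    sb = sum((y - mb) ** 2 for y in rb) ** 0.5
    return cov / (sa * sb)

# Hypothetical per-conversation data: LSDE proxy vs. human 1-5 quality ratings.
proxy = [0.2, 0.5, 0.7, 0.9, 0.4]
human = [1, 3, 4, 5, 2]
rho = spearman(proxy, human)
```

A high rank correlation over the 96 conversations would ground the proxy; a low one would undercut the reported +0.12 gain.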

Circularity Check

0 steps flagged

No significant circularity; empirical claims rest on separate evaluations

full rationale

The paper reports experimental outcomes for TigerGPT (usability/satisfaction), AURA (LSDE-based RL quality gain on initialized conversations), ENS linguistic analysis, PsychoGPT, and SMMR (outperformance on DAIC-WOZ) as independent results. The final unified framework statement simply asserts integration of these components without any derivation, equation, or self-referential reduction that equates outputs to inputs by construction. No self-citations, fitted parameters renamed as predictions, or ansatzes are load-bearing in the provided chain. All performance metrics (p-values, F1, accuracy) are tied to external benchmarks or held-out data rather than tautological re-measurement of the optimization signal.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review; no explicit free parameters, axioms, or invented entities are detailed beyond the named tools themselves. The LSDE signal and DSM-5/PHQ-8 guidelines are treated as inputs from prior work.

pith-pipeline@v0.9.0 · 5623 in / 1261 out tokens · 45768 ms · 2026-05-12T04:20:35.171357+00:00 · methodology

discussion (0)


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

105 extracted references · 105 canonical work pages · 2 internal anchors

  1. [1]

    Assessment of campus climate to enhance student success

S. A. Vogel, J. K. Holt, S. Sligar, and E. Leake, “Assessment of campus climate to enhance student success,” Journal of Postsecondary Education and Disability, vol. 21, no. 1, pp. 15–31, 2008

  2. [2]

D. A. Dillman, J. D. Smyth, and L. M. Christian, Internet, phone, mail, and mixed-mode surveys: The tailored design method. John Wiley & Sons, 2014

  3. [3]

    Effects of questionnaire length on participation and indicators of response quality in a web survey,

M. Galesic and M. Bosnjak, “Effects of questionnaire length on participation and indicators of response quality in a web survey,” Public opinion quarterly, vol. 73, no. 2, pp. 349–360, 2009

  4. [4]

    Effect of questionnaire length, personalisation and reminder type on response rate to a complex postal survey: randomised controlled trial,

S. Sahlqvist, Y. Song, F. Bull, E. Adams, J. Preston, D. Ogilvie, and iConnect Consortium, “Effect of questionnaire length, personalisation and reminder type on response rate to a complex postal survey: randomised controlled trial,” BMC medical research methodology, vol. 11, pp. 1–8, 2011

  5. [5]

    Shortfall in mental health service utilisation,

G. Andrews, C. Issakidis, and G. Carter, “Shortfall in mental health service utilisation,” The British Journal of Psychiatry, vol. 179, no. 5, pp. 417–425, 2001

  6. [6]

    Mental health of college students and their non–college-attending peers: results from the national epidemiologic study on alcohol and related conditions,

C. Blanco, M. Okuda, C. Wright, D. S. Hasin, B. F. Grant, S.-M. Liu, and M. Olfson, “Mental health of college students and their non–college-attending peers: results from the national epidemiologic study on alcohol and related conditions,” Archives of general psychiatry, vol. 65, no. 12, pp. 1429–1437, 2008

  7. [7]

    Perceived barriers and facilitators to mental health help-seeking in young people: a systematic review,

    A. Gulliver, K. M. Griffiths, and H. Christensen, “Perceived barriers and facilitators to mental health help-seeking in young people: a systematic review,” BMC psychiatry, vol. 10, no. 1, pp. 1–9, 2010

  8. [8]

    How do people come to use mental health services? current knowledge and changing perspectives

    B. A. Pescosolido and C. A. Boyer, “How do people come to use mental health services? current knowledge and changing perspectives.” 1999

  9. [9]

    Building citizen trust to enhance satisfaction in digital public services: the role of empathetic chatbot communication,

    M. Zhou, L. Liu, and Y. Feng, “Building citizen trust to enhance satisfaction in digital public services: the role of empathetic chatbot communication,” Behaviour & Information Technology, pp. 1–20, 2025

  10. [10]

    Tell me about yourself: Using an ai-powered chatbot to conduct conversational surveys with open-ended questions,

Z. Xiao, M. X. Zhou, Q. V. Liao, G. Mark, C. Chi, W. Chen, and H. Yang, “Tell me about yourself: Using an ai-powered chatbot to conduct conversational surveys with open-ended questions,” ACM Transactions on Computer-Human Interaction (TOCHI), vol. 27, no. 3, pp. 1–37, 2020

  11. [11]

    Opinebot: Class feedback reimagined using a conversational llm,

H. Tanwar, K. Shrivastva, R. Singh, and D. Kumar, “Opinebot: Class feedback reimagined using a conversational llm,” arXiv preprint arXiv:2401.15589, 2024

  12. [12]

    Decoding linguistic nuances in mental health text classification using expressive narrative stories,

J. Tang, Q. Guo, Y. Zhao, and Y. Shang, “Decoding linguistic nuances in mental health text classification using expressive narrative stories,” in 2024 IEEE 6th International Conference on Cognitive Machine Intelligence (CogMI). IEEE, 2024, pp. 207–216

  13. [13]

    Advancing mental health pre-screening: A new custom gpt for psychological distress assessment,

J. Tang and Y. Shang, “Advancing mental health pre-screening: A new custom gpt for psychological distress assessment,” in 2024 IEEE 6th International Conference on Cognitive Machine Intelligence (CogMI). IEEE, 2024, pp. 162–171

  14. [14]

    Tigergpt: A theory-driven ai chatbot for adaptive campus climate surveys,

J. Tang, S. Chen, and Y. Shang, “Tigergpt: A theory-driven ai chatbot for adaptive campus climate surveys,” in 2025 IEEE International Conference on Future Machine Learning and Data Science (FMLDS). IEEE, 2025, pp. 668–673

  15. [15]

    Aura: A reinforcement learning framework for ai-driven adaptive conversational surveys,

J. Tang and Y. Shang, “Aura: A reinforcement learning framework for ai-driven adaptive conversational surveys,” arXiv preprint arXiv:2510.27126, 2025

  16. [16]

    A layered multi-expert framework for long-context mental health assessments,

J. Tang, Q. Guo, W. Sun, and Y. Shang, “A layered multi-expert framework for long-context mental health assessments,” in 2025 IEEE Conference on Artificial Intelligence (CAI), 2025, pp. 435–440

  17. [17]

    University student surveys using chatbots: artificial intelligence conversational agents,

N. Abbas, T. Pickard, E. Atwell, and A. Walker, “University student surveys using chatbots: artificial intelligence conversational agents,” in International Conference on Human-Computer Interaction. Springer, 2021, pp. 155–169

  18. [18]

    Engaging students to fill surveys using chatbots: University case study,

    N. Belhaj, A. Hamdane, N. E. H. Chaoui, H. Chaoui, and M. El Bekkali, “Engaging students to fill surveys using chatbots: University case study,” Indones. J. Electr. Eng. Comput. Sci, vol. 24, no. 1, pp. 473–483, 2021

  19. [19]

    Online chat and chatbots to enhance mature student engagement in higher education,

N. Abbas, J. Whitfield, E. Atwell, H. Bowman, T. Pickard, and A. Walker, “Online chat and chatbots to enhance mature student engagement in higher education,” International Journal of Lifelong Education, vol. 41, no. 3, pp. 308–326, 2022

  20. [20]

    Ai-driven student assistance: chatbots redefining university support,

S. Martinez-Requejo, E. J. García, S. R. Duarte, J. R. Lázaro, E. P. Sanz, and G. M. Vivas, “Ai-driven student assistance: chatbots redefining university support,” in INTED2024 Proceedings. IATED, 2024, pp. 617–625

  21. [21]

    Teaching cs50 with ai,

R. Liu, C. Zenke, C. Liu, A. Holmes, P. Thornton, and D. J. Malan, “Teaching cs50 with ai,” Portland, OR, US: ACM, 2024

  22. [22]

    Conversational survey chatbot: User experience and perception,

A. Njeguš et al., “Conversational survey chatbot: User experience and perception,” in Sinteza 2021-International Scientific Conference on Information Technology and Data Related Research. Singidunum University, 2021, pp. 322–327

  23. [23]

Comparing chatbots and online surveys for (longitudinal) data collection: an investigation of response characteristics, data quality, and user evaluation,

B. Zarouali, T. Araujo, J. Ohme, and C. de Vreese, “Comparing chatbots and online surveys for (longitudinal) data collection: an investigation of response characteristics, data quality, and user evaluation,” Communication Methods and Measures, vol. 18, no. 1, pp. 72–91, 2024

  24. [24]

    Comparing data from chatbot and web surveys: Effects of platform and conversational style on survey response quality,

    S. Kim, J. Lee, and G. Gweon, “Comparing data from chatbot and web surveys: Effects of platform and conversational style on survey response quality,” in Proceedings of the 2019 CHI conference on human factors in computing systems, 2019, pp. 1–12

  25. [25]

    A review on implementation issues of rule-based chatbot systems,

S. A. Thorat and V. Jadhav, “A review on implementation issues of rule-based chatbot systems,” in Proceedings of the international conference on innovative computing & communications (ICICC), 2020

  26. [26]

    Designing a chatbot for survivors of sexual violence: Exploratory study for hybrid approach combining rule-based chatbot and ml-based chatbot,

W. Maeng and J. Lee, “Designing a chatbot for survivors of sexual violence: Exploratory study for hybrid approach combining rule-based chatbot and ml-based chatbot,” in Proceedings of the Asian CHI Symposium 2021, 2021, pp. 160–166

  27. [27]

    The challenges in designing a prevention chatbot for eating disorders: observational study,

W. W. Chan, E. E. Fitzsimmons-Craft, A. C. Smith, M.-L. Firebaugh, L. A. Fowler, B. DePietro, N. Topooco, D. E. Wilfley, C. B. Taylor, and N. C. Jacobson, “The challenges in designing a prevention chatbot for eating disorders: observational study,” JMIR Formative Research, vol. 6, no. 1, p. e28003, 2022

  28. [28]

    E-mail subject lines and their effect on web survey viewing and response,

S. R. Porter and M. E. Whitcomb, “E-mail subject lines and their effect on web survey viewing and response,” Social Science Computer Review, vol. 23, no. 3, pp. 380–387, 2005

  29. [29]

    The psychological meaning of words: Liwc and computerized text analysis methods,

Y. R. Tausczik and J. W. Pennebaker, “The psychological meaning of words: Liwc and computerized text analysis methods,” Journal of language and social psychology, vol. 29, no. 1, pp. 24–54, 2010

  30. [30]

    Concreteness ratings for 40 thousand generally known english word lemmas,

M. Brysbaert, A. B. Warriner, and V. Kuperman, “Concreteness ratings for 40 thousand generally known english word lemmas,” Behavior research methods, vol. 46, no. 3, pp. 904–911, 2014

  31. [31]

    The measurement of trust and its relationship to self-disclosure,

L. R. Wheeless and J. Grotz, “The measurement of trust and its relationship to self-disclosure,” Human Communication Research, vol. 3, no. 3, pp. 250–257, 1977

  32. [32]

W. B. Gudykunst and S. Ting-Toomey, Culture and interpersonal communication. Sage Publications, 1988

  33. [33]

    Linguistic styles: Language use as an individual difference,

J. W. Pennebaker and L. A. King, “Linguistic styles: Language use as an individual difference,” Journal of Personality and Social Psychology, vol. 77, no. 6, pp. 1296–1312, 1999

  34. [34]

The psychology of survey response

    R. Tourangeau, L. J. Rips, and K. Rasinski, The psychology of survey response. Cambridge University Press, 2000

  35. [35]

    Cognitive burden of survey questions and response times: A psycholinguistic experiment,

T. Lenzner, L. Kaczmirek, and A. Lenzner, “Cognitive burden of survey questions and response times: A psycholinguistic experiment,” Applied cognitive psychology, vol. 24, no. 7, pp. 1003–1020, 2010

  36. [36]

    Conducting sensitive interviews: A review of reflections,

A. Melville and D. Hincks, “Conducting sensitive interviews: A review of reflections,” Law and Method, vol. 1, no. 1, pp. 1–26, 2016

  37. [37]

R. S. Sutton, A. G. Barto et al., Reinforcement learning: An introduction. MIT press Cambridge, 1998, vol. 1, no. 1

  38. [38]

    Dynamic policy networks for task-oriented dialogue with reinforcement learning,

P.-H. Su, M. Gašić, and S. Young, “Dynamic policy networks for task-oriented dialogue with reinforcement learning,” in Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics (ACL). ACL, 2021, pp. 341–352

  39. [39]

    Contextual bandits for adaptive dialogue management,

Y. Sun, X. Li, K. Zhou, and X. Wang, “Contextual bandits for adaptive dialogue management,” IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 31, pp. 1152–1163, 2023

  40. [40]

The de facto us mental and addictive disorders service system: Epidemiologic catchment area prospective 1-year prevalence rates of disorders and services,

    D. A. Regier, W. E. Narrow, D. S. Rae, R. W. Manderscheid, B. Z. Locke, and F. K. Goodwin, “The de facto us mental and addictive disorders service system: Epidemiologic catchment area prospective 1-year prevalence rates of disorders and services,” Archives of general psychiatry, vol. 50, no. 2, pp. 85–94, 1993

  41. [41]

    GPT-4 Technical Report

OpenAI, “GPT-4 technical report,” arXiv preprint arXiv:2303.08774, 2023

  42. [42]

    Advancing mental health pre-screening: A new custom gpt for psychological distress assessment,

    J. Tang and Y. Shang, “Advancing mental health pre-screening: A new custom gpt for psychological distress assessment,” in2024 IEEE 6th International Conference on Cognitive Machine Intelligence (CogMI), 2024, pp. 162–171

  [43] J. Ohse, B. Hadžić, P. Mohammed, N. Peperkorn, M. Danner, A. Yorita, N. Kubota, M. Rätsch, and Y. Shiban, “Zero-shot strike: Testing the generalisation capabilities of out-of-the-box LLM models for depression detection,” Computer Speech & Language, vol. 88, p. 101663, 2024.

  [44] X. Wang, C. Zhang, Y. Ji, L. Sun, L. Wu, and Z. Bao, “A depression detection model based on sentiment analysis in micro-blog social network,” in Trends and Applications in Knowledge Discovery and Data Mining: PAKDD 2013 International Workshops: DMApps, DANTH, QIMIE, BDM, CDA, CloudSD, Gold Coast, QLD, Australia, April 14-17, 2013, Revised Selected Papers 1...

  [45] A. Yates, A. Cohan, and N. Goharian, “Depression and self-harm risk assessment in online forums,” arXiv preprint arXiv:1709.01848, 2017.

  [46] A. H. Yazdavar, H. S. Al-Olimat, M. Ebrahimi, G. Bajaj, T. Banerjee, K. Thirunarayan, J. Pathak, and A. Sheth, “Semi-supervised approach to monitoring clinical depressive symptoms in social media,” in Proceedings of the 2017 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining 2017, 2017, pp. 1191–1198.

  [47] J. Gratch, R. Artstein, G. M. Lucas, G. Stratou, S. Scherer, A. Nazarian, R. Wood, J. Boberg, D. DeVault, S. Marsella et al., “The Distress Analysis Interview Corpus of human and computer interviews,” in LREC. Reykjavik, 2014, pp. 3123–3128.

  [48] D. DeVault, R. Artstein, G. Benn, T. Dey, E. Fast, A. Gainer, K. Georgila, J. Gratch, A. Hartholt, M. Lhommet et al., “SimSensei Kiosk: A virtual human interviewer for healthcare decision support,” in Proceedings of the 2014 International Conference on Autonomous Agents and Multi-agent Systems, 2014, pp. 1061–1068.

  [49] F. Ringeval, B. Schuller, M. Valstar, N. Cummins, R. Cowie, L. Tavabi, M. Schmitt, S. Alisamir, S. Amiriparian, E.-M. Messner et al., “AVEC 2019 workshop and challenge: State-of-mind, detecting depression with AI, and cross-cultural affect recognition,” in Proceedings of the 9th International on Audio/Visual Emotion Challenge and Workshop, 2019, pp. 3–12.

  [50] C. A. Belser, “Comparison of natural language processing models for depression detection in chatbot dialogues,” Ph.D. dissertation, Massachusetts Institute of Technology, 2023.

  [51] U. Yadav and A. K. Sharma, “A novel automated depression detection technique using text transcript,” International Journal of Imaging Systems and Technology, vol. 33, no. 1, pp. 108–122, 2023.

  [52] K. Milintsevich, K. Sirts, and G. Dias, “Towards automatic text-based estimation of depression through symptom prediction,” Brain Informatics, vol. 10, no. 1, pp. 1–14, 2023.

  [53] L. Huang, W. Yu, W. Ma, W. Zhong, Z. Feng, H. Wang, Q. Chen, W. Peng, X. Feng, B. Qin et al., “A survey on hallucination in large language models: Principles, taxonomy, challenges, and open questions,” ACM Transactions on Information Systems, 2023.

  [54] M. Dahl, V. Magesh, M. Suzgun, and D. E. Ho, “Large legal fictions: Profiling legal hallucinations in large language models,” Journal of Legal Analysis, vol. 16, no. 1, pp. 64–93, 2024.

  [55] A. Le Glaz, Y. Haralambous, D.-H. Kim-Dufor, P. Lenca, R. Billot, T. C. Ryan, J. Marsh, J. Devylder, M. Walter, S. Berrouiguet et al., “Machine learning and natural language processing in mental health: Systematic review,” Journal of Medical Internet Research, vol. 23, no. 5, p. e15708, 2021.

  [56] T. Tran and R. Kavuluru, “Predicting mental conditions based on “history of present illness” in psychiatric notes with deep neural networks,” Journal of Biomedical Informatics, vol. 75, pp. S138–S148, 2017.

  [57] T. Zhang, A. M. Schoene, S. Ji, and S. Ananiadou, “Natural language processing applied to mental illness detection: A narrative review,” NPJ Digital Medicine, vol. 5, no. 1, p. 46, 2022.

  [58] S. Graham, C. Depp, E. E. Lee, C. Nebeker, X. Tu, H.-C. Kim, and D. V. Jeste, “Artificial intelligence for mental health and mental illnesses: An overview,” Current Psychiatry Reports, vol. 21, pp. 1–18, 2019.

  [59] A. B. Shatte, D. M. Hutchinson, and S. J. Teague, “Machine learning in mental health: A scoping review of methods and applications,” Psychological Medicine, vol. 49, no. 9, pp. 1426–1448, 2019.

  [60] A. Ng, R. Kornfield, S. M. Schueller, A. K. Zalta, M. Brennan, and M. Reddy, “Provider perspectives on integrating sensor-captured patient-generated data in mental health care,” Proceedings of the ACM on Human-Computer Interaction, vol. 3, no. CSCW, pp. 1–25, 2019.

  [61] M. Marsolek, S. Emert, J. R. Dietch, E. Tucker, and D. Taylor, “0969 Predicting differences between objective and subjective sleep parameters with mental health questionnaires,” SLEEP, 2024. [Online]. Available: https://api.semanticscholar.org/CorpusID:269288613

  [62] P. A. Cavazos-Rehg, M. J. Krauss, S. Sowles, S. Connolly, C. Rosas, M. Bharadwaj, and L. J. Bierut, “A content analysis of depression-related tweets,” Computers in Human Behavior, vol. 54, pp. 351–357, 2016.

  [63] J. W. Pennebaker, M. R. Mehl, and K. G. Niederhoffer, “Psychological aspects of natural language use: Our words, our selves,” Annual Review of Psychology, vol. 54, no. 1, pp. 547–577, 2003.

  [64] R. P. Hart, “Redeveloping DICTION: Theoretical considerations,” Progress in Communication Sciences, pp. 43–60, 2001.

  [65] J. W. Pennebaker and S. K. Beall, “Confronting a traumatic event: Toward an understanding of inhibition and disease,” Journal of Abnormal Psychology, vol. 95, no. 3, p. 274, 1986.

  [66] J. W. Pennebaker, J. K. Kiecolt-Glaser, and R. Glaser, “Disclosure of traumas and immune function: Health implications for psychotherapy,” Journal of Consulting and Clinical Psychology, vol. 56, no. 2, p. 239, 1988.

  [67] K. Davidson, A. R. Schwartz, D. Sheffield, R. S. McCord, S. J. Lepore, and W. Gerin, “Expressive writing and blood pressure,” 2002.

  [68] I. D. Rivkin, J. Gustafson, I. Weingarten, and D. Chin, “The effects of expressive writing on adjustment to HIV,” AIDS and Behavior, vol. 10, pp. 13–26, 2006.

  [69] S. Inamdar, R. Chapekar, S. Gite, and B. Pradhan, “Machine learning driven mental stress detection on Reddit posts using natural language processing,” Human-Centric Intelligent Systems, vol. 3, no. 2, pp. 80–91, 2023.

  [70] R. K. Garg, V. L. Urs, A. A. Agarwal, S. K. Chaudhary, V. Paliwal, and S. K. Kar, “Exploring the role of ChatGPT in patient care (diagnosis and treatment) and medical research: A systematic review,” Health Promotion Perspectives, vol. 13, no. 3, p. 183, 2023.

  [71] K. Singhal, T. Tu, J. Gottweis, R. Sayres, E. Wulczyn, M. Amin, L. Hou, K. Clark, S. R. Pfohl, H. Cole-Lewis et al., “Toward expert-level medical question answering with large language models,” Nature Medicine, pp. 1–8, 2025.

  [72] D. Van Veen, C. Van Uden, L. Blankemeier, J.-B. Delbrouck, A. Aali, C. Bluethgen, A. Pareek, M. Polacin, E. P. Reis, A. Seehofnerova et al., “Clinical text summarization: Adapting large language models can outperform human experts,” Research Square preprint, 2023.

  [73] A. Agrawal, “Illuminate: A novel approach for depression detection with explainable analysis and proactive therapy using prompt engineering,” arXiv preprint arXiv:2402.05127, 2024.

  [74] Q. Guo, J. Tang, W. Sun, H. Tang, Y. Shang, and W. Wang, “SouLLMate: An application enhancing diverse mental health support with adaptive LLMs, prompt engineering, and RAG techniques,” arXiv preprint arXiv:2410.16322, 2024.

  [75] ——, “SouLLMate: An adaptive LLM-driven system for advanced mental health support and assessment, based on a systematic application survey,” arXiv preprint arXiv:2410.11859, 2024.

  [76] Z. Guo, A. Lai, J. H. Thygesen, J. Farrington, T. Keen, K. Li et al., “Large language models for mental health applications: Systematic review,” JMIR Mental Health, vol. 11, no. 1, p. e57400, 2024.

  [77] J. Song, J. Chim, A. Tsakalidis, J. Ive, D. Atzil-Slonim, and M. Liakata, “Combining hierarchical VAEs with LLMs for clinically meaningful timeline summarisation in social media,” arXiv preprint arXiv:2401.16240, 2024.

  [78] G. Soman, M. Judy, and A. M. Abou, “Human guided empathetic AI agent for mental health support leveraging reinforcement learning-enhanced retrieval-augmented generation,” Cognitive Systems Research, vol. 90, p. 101337, 2025.

  [79] X. Xu, B. Yao, Y. Dong, S. Gabriel, H. Yu, J. Hendler, M. Ghassemi, A. K. Dey, and D. Wang, “Mental-LLM: Leveraging large language models for mental health prediction via online text data,” Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, vol. 8, no. 1, pp. 1–32, 2024.

  [80] A. B. Kocaballi, S. Berkovsky, J. C. Quiroz, L. Laranjo, H. L. Tong, D. Rezazadegan, A. Briatore, and E. Coiera, “The personalization of conversational agents in health care: Systematic review,” Journal of Medical Internet Research, vol. 21, no. 11, p. e15360, 2019.

Showing first 80 references.