Enhanced large language models for effective screening of depression and anxiety
2 Pith papers cite this work. Polarity classification is still indexing.
fields: cs.CL (2)
verdicts: UNVERDICTED (2)
roles: background (1)
polarities: background (1)
representative citing papers
Systematic comparison of nine text-only and three multimodal LLMs using in-context learning, reasoning prompts, fine-tuning, and multimodal fusion on DementiaBank speech data finds class-centroid demonstrations and token-level fine-tuning most effective, with adapted open models matching or beating
citing papers explorer
- Can We Trust LLMs for Mental Health Screening? Consistency, ASR Robustness, and Evidence Faithfulness
Phi-4 and Gemma-2-9B maintain high intra-model consistency (ICC > 0.89) and ASR robustness for HADS scoring while Llama-3.1-8B degrades sharply, with all models showing score-evidence dissociation.
- Speech-Based Cognitive Screening: A Systematic Evaluation of LLM Adaptation Strategies
Systematic comparison of nine text-only and three multimodal LLMs using in-context learning, reasoning prompts, fine-tuning, and multimodal fusion on DementiaBank speech data finds class-centroid demonstrations and token-level fine-tuning most effective, with adapted open models matching or beating