Speech-Based Cognitive Screening: A Systematic Evaluation of LLM Adaptation Strategies

Ali Zolnour; Fatemeh Taherinezhad; Hossein AzadMaleki; Maryam Dadkhah; Maryam Zolnoori; Mohamad Javad Momeni Nezhad; Sepehr Karimi; Sina Rashidi; Yasaman Haghbin

arXiv: 2509.03525 · v2 · submitted 2025-08-24 · cs.CL · cs.AI · eess.AS

Pith reviewed 2026-05-18 21:10 UTC · model grok-4.3

classification: cs.CL · cs.AI · eess.AS

keywords: dementia detection · speech screening · LLM adaptation · in-context learning · fine-tuning · DementiaBank · cognitive screening · multimodal models

The pith

Properly adapted open-weight language models can match or exceed commercial systems in detecting dementia from speech.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper evaluates nine text-only and three multimodal LLMs on the DementiaBank speech corpus to find the best ways to adapt them for dementia detection. It tests in-context learning with different example selection methods, reasoning steps in prompts, parameter-efficient fine-tuning, and audio-text combinations. Results show that class-centroid examples work best for in-context learning, reasoning helps smaller models, and fine-tuning with a classification head produces the strongest outcomes overall. Multimodal approaches do not beat the top text-only models. These findings indicate that careful adaptation lets accessible models reach or surpass proprietary commercial performance for speech-based cognitive screening.

Core claim

Through systematic comparison of adaptation strategies on the DementiaBank corpus, the paper shows that open-weight models, after targeted adjustments such as class-centroid demonstration selection in in-context learning, reasoning-augmented prompting, and token-level fine-tuning, reach detection performance that matches or exceeds commercial systems, while adding a classification head substantially lifts weaker models and multimodal audio integration does not provide further gains over the best text-only results.

What carries the argument

Systematic testing of LLM adaptation strategies including class-centroid demonstration selection for in-context learning, reasoning-augmented prompting, and parameter-efficient fine-tuning applied to speech transcripts for dementia classification.
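
One plausible reading of class-centroid demonstration selection can be sketched in Python. This is an illustration under stated assumptions, not the authors' implementation: it assumes each training transcript already has an embedding vector, that similarity is cosine similarity, and that the same number of demonstrations is drawn per class. The function names, the `control` and `dementia` labels, and the toy 2-D embeddings are all hypothetical.

```python
import math
from collections import defaultdict


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def centroid(vectors):
    """Element-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def select_centroid_demos(pool, k_per_class):
    """For each label, keep the k examples closest to that label's centroid.

    pool: list of (text, label, embedding) tuples from the training split.
    Returns (text, label) pairs grouped by label, closest first.
    """
    by_label = defaultdict(list)
    for text, label, emb in pool:
        by_label[label].append((text, emb))

    demos = []
    for label in sorted(by_label):
        items = by_label[label]
        center = centroid([emb for _, emb in items])
        ranked = sorted(items, key=lambda it: cosine(it[1], center), reverse=True)
        demos.extend((text, label) for text, _ in ranked[:k_per_class])
    return demos


# Toy 2-D embeddings (assumption); real ones would come from an embedding model.
pool = [
    ("control transcript A", "control", [1.0, 0.1]),
    ("control transcript B", "control", [0.9, 0.2]),
    ("control transcript C", "control", [0.2, 1.0]),    # atypical for its class
    ("dementia transcript D", "dementia", [0.1, 1.0]),
    ("dementia transcript E", "dementia", [0.2, 0.9]),
    ("dementia transcript F", "dementia", [1.0, 0.0]),  # atypical for its class
]
demos = select_centroid_demos(pool, k_per_class=2)
```

In this toy pool the two atypical transcripts (C and F) are left out, which is the intended effect: the prompt shows the model examples that are representative of each class rather than outliers.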

If this is right

Class-centroid demonstrations produce the highest accuracy among in-context learning policies.
Reasoning steps in prompts improve results most for smaller models.
Token-level fine-tuning combined with a classification head delivers the strongest overall scores.
Fine-tuned multimodal audio-text models fail to surpass the best adapted text-only models.
Open-weight models become viable replacements for commercial systems once adapted with these methods.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

These adaptation techniques could support low-cost screening tools deployed in primary care or mobile apps to reach more undiagnosed individuals.
Testing the same strategies on speech data from non-English speakers or different cultural groups could reveal whether the gains generalize globally.
Longitudinal use of adapted models on repeated recordings might enable tracking of cognitive changes rather than one-time detection.

Load-bearing premise

Performance measured on the DementiaBank speech corpus will translate to useful results in real clinical screening with varied patient speech and demographics.

What would settle it

A new evaluation on an independent set of spontaneous speech recordings from undiagnosed older adults across diverse demographics that yields substantially lower accuracy than the reported benchmark scores.

Figures

Figures reproduced from arXiv: 2509.03525 by Ali Zolnour, Fatemeh Taherinezhad, Hossein AzadMaleki, Maryam Dadkhah, Maryam Zolnoori, Mohamad Javad Momeni Nezhad, Sepehr Karimi, Sina Rashidi, Yasaman Haghbin.

**Figure 4A.** Validation F1-scores for each LLM using 2–12 in-context demonstrations across four selection strategies. Demonstrations selected by Average Similarity to class centroids achieved the highest or joint-highest F1-scores in five models and ranked second in three others. The Most Similar strategy generally produced the next-best performance, with notable results for GPT-4o and Gemini-2.0. Least Simi…

**Figure 4B.** Corresponding results on the test set. Average Similarity achieved the highest F1-scores in five models, including LLaMA 3B (0.73), Ministral 8B (0.73), LLaMA 70B (0.79), GPT-4o (0.81), and DeepSeek-R1 (0.79). Most Similar was optimal for LLaMA-8B (0.72), LLaMA-405B (0.80), and Gemini-2.0 (0.81). Least Similar continued to underperform, while MedAlpaca-7B again performed best with random samples (F…

Original abstract

Over half of US adults with Alzheimer disease and related dementias remain undiagnosed, and speech-based screening offers a scalable detection approach. We compared large language model adaptation strategies for dementia detection using the DementiaBank speech corpus, evaluating nine text-only models and three multimodal audio-text models on its recordings. Adaptations included in-context learning with different demonstration selection policies, reasoning-augmented prompting, parameter-efficient fine-tuning, and multimodal integration. Results showed that class-centroid demonstrations achieved the highest in-context learning performance, reasoning improved smaller models, and token-level fine-tuning generally produced the best scores. Adding a classification head substantially improved underperforming models. Among multimodal models, fine-tuned audio-text systems performed well but did not surpass the top text-only models. These findings highlight that model adaptation strategies, including demonstration selection, reasoning design, and tuning method, critically influence speech-based dementia detection, and that properly adapted open-weight models can match or exceed commercial systems.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

This is a solid head-to-head benchmarking of LLM adaptation strategies on the DementiaBank corpus for dementia detection, with the main practical takeaway that demonstration selection and fine-tuning choices drive most of the gains.

The paper runs a controlled comparison of adaptation methods for turning LLMs into dementia screeners from speech. They fix the DementiaBank corpus and test nine text-only models plus three multimodal ones across in-context learning variants, reasoning prompts, parameter-efficient tuning, and classification heads. Class-centroid demonstrations came out on top for prompting, reasoning helped smaller models, and token-level fine-tuning with a head produced the strongest overall numbers. Multimodal fusion did not beat the best text-only setups. That matrix of results is the actual new piece: prior work tends to pick one adaptation and report it, while this one isolates which policy choices matter on the same data and pipeline.

Referee Report

2 major / 2 minor

Summary. The manuscript evaluates various LLM adaptation strategies for speech-based dementia detection on the DementiaBank corpus. It compares nine text-only and three multimodal models using in-context learning (with different demonstration selection policies including class-centroid), reasoning-augmented prompting, parameter-efficient fine-tuning, token-level fine-tuning, classification heads, and multimodal integration. The central claim is that properly adapted open-weight models can match or exceed commercial systems, with findings that class-centroid demonstrations perform best for ICL, reasoning helps smaller models, and fine-tuning generally yields the highest scores.

Significance. If the empirical results hold under rigorous verification, the work offers practical guidance on effective adaptation techniques for applying LLMs to clinical speech analysis. This could support development of scalable, accessible cognitive screening tools, especially by demonstrating viability of open-weight models over proprietary commercial systems. The systematic comparison of strategies adds value to the growing literature on LLM use in healthcare NLP, though generalizability depends on the representativeness of the single corpus used.

major comments (2)

Abstract: the claim that 'properly adapted open-weight models can match or exceed commercial systems' is not supported by a head-to-head evaluation. The abstract reports results only for the nine text-only and three multimodal models under the listed adaptations but does not indicate that any commercial system was re-run on the identical DementiaBank test partition, ASR transcripts, or metric computation; comparisons appear to rely on literature-reported numbers that may differ in splits, preprocessing, or label definitions.
Results (and abstract): directional performance rankings are presented without statistical tests, confidence intervals, details on data splits, or exclusion criteria. This makes it difficult to assess whether observed differences between adaptation strategies (e.g., class-centroid vs. other ICL policies, or fine-tuning vs. prompting) are reliable or could be due to sampling variability.

minor comments (2)

Abstract: consider adding the specific metrics used (accuracy, F1, AUC, etc.) and naming the top-performing model(s) with their scores to make the directional claims more concrete.
Methods: provide clearer description of the exact prompting templates, demonstration selection algorithms, and how multimodal audio-text fusion is implemented for the three multimodal models.

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive and detailed feedback. We have addressed each major comment point by point below, making revisions where the concerns are valid to improve the manuscript's clarity and rigor.

Referee: Abstract: the claim that 'properly adapted open-weight models can match or exceed commercial systems' is not supported by a head-to-head evaluation. The abstract reports results only for the nine text-only and three multimodal models under the listed adaptations but does not indicate that any commercial system was re-run on the identical DementiaBank test partition, ASR transcripts, or metric computation; comparisons appear to rely on literature-reported numbers that may differ in splits, preprocessing, or label definitions.

Authors: We acknowledge that our comparisons to commercial systems rely on performance figures reported in the prior literature rather than a direct re-implementation on our precise test partition, ASR transcripts, and metric computation pipeline. This approach was chosen because re-running proprietary commercial APIs under identical conditions is often impractical due to access restrictions, cost, and the need to maintain consistency with published benchmarks. In the revised manuscript we will explicitly qualify the abstract claim to state that comparisons are to literature-reported results on the DementiaBank corpus and will add a brief discussion of possible differences in splits, preprocessing, and label definitions. We believe the qualified claim remains informative for readers seeking practical guidance on open-weight model viability. revision: yes
Referee: Results (and abstract): directional performance rankings are presented without statistical tests, confidence intervals, details on data splits, or exclusion criteria. This makes it difficult to assess whether observed differences between adaptation strategies (e.g., class-centroid vs. other ICL policies, or fine-tuning vs. prompting) are reliable or could be due to sampling variability.

Authors: We agree that the absence of statistical tests and confidence intervals limits the ability to judge the reliability of observed differences. In the revision we will add bootstrap-derived 95% confidence intervals for all reported metrics and apply appropriate paired statistical tests (e.g., McNemar’s test for accuracy differences) between the leading adaptation strategies. We will also expand the Methods and Experimental Setup sections to provide complete details on the train/test splits, any exclusion criteria applied to DementiaBank recordings (such as incomplete transcripts or missing labels), and the exact preprocessing steps. These additions will allow readers to better evaluate the robustness of the rankings. revision: yes
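
The two checks named in this response can be sketched with the Python standard library alone. The block is a minimal illustration, assuming binary labels with 1 as the dementia class, a percentile bootstrap over test items, and the exact (binomial) form of McNemar's test. The function names, the toy predictions, and the choice of 2000 resamples are assumptions, not details from the paper.

```python
import random
from math import comb


def f1_score(y_true, y_pred, positive=1):
    """F1 for the positive class; returns 0.0 when it is undefined."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def bootstrap_ci(y_true, y_pred, metric, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval: resample test items with replacement."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(metric([y_true[i] for i in idx], [y_pred[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


def mcnemar_exact(y_true, pred_a, pred_b):
    """Two-sided exact McNemar p-value from the discordant pairs only."""
    a_only = sum(1 for t, a, b in zip(y_true, pred_a, pred_b) if a == t and b != t)
    b_only = sum(1 for t, a, b in zip(y_true, pred_a, pred_b) if a != t and b == t)
    n = a_only + b_only
    if n == 0:
        return 1.0  # the two systems never disagree on correctness
    k = min(a_only, b_only)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Toy test set (assumption): 10 dementia (1) and 10 control (0) recordings.
y_true = [1] * 10 + [0] * 10
pred_a = list(y_true)
pred_a[0], pred_a[10] = 0, 1                   # system A makes two errors
pred_b = list(pred_a)
pred_b[1], pred_b[2], pred_b[11] = 0, 0, 1     # system B makes three more

f1_a = f1_score(y_true, pred_a)
ci_a = bootstrap_ci(y_true, pred_a, f1_score)
p_value = mcnemar_exact(y_true, pred_a, pred_b)
```

On a test set this small the exact test is preferable to the chi-square approximation, and the toy comparison illustrates the referee's point: system A is correct on three items that system B misses and never the reverse, yet three discordant pairs are too few to reach conventional significance.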

Circularity Check

0 steps flagged

No circularity in empirical benchmarking study

This paper is a standard empirical evaluation of LLM adaptation strategies for dementia detection on the external DementiaBank corpus. All reported performance metrics derive from held-out test recordings using conventional train/test splits and evaluation protocols rather than any self-referential definitions, fitted parameters renamed as predictions, or equations that reduce outputs to inputs by construction. The abstract's claim that adapted open-weight models can match or exceed commercial systems rests on literature-reported numbers, but this is an external comparison rather than a circular derivation. No load-bearing steps match the enumerated circularity patterns, and the central results remain independently falsifiable against the benchmark data.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the representativeness of the DementiaBank corpus, standard i.i.d. train/test splits, and the assumption that classification accuracy on this benchmark correlates with clinical screening value. No new entities or ad-hoc constants are introduced.

axioms (1)

Domain assumption: standard machine-learning assumption that train and test recordings are drawn from the same distribution.
Invoked implicitly when reporting generalization from DementiaBank splits to real-world screening.

pith-pipeline@v0.9.0 · 5747 in / 1305 out tokens · 44706 ms · 2026-05-18T21:10:51.300531+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear

Relation between the paper passage and the cited Recognition theorem: unclear.

Class-centroid demonstrations achieved the highest ICL performance... Token-level fine-tuning produced the highest scores (LLaMA 3B: F1=0.83...)

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

