arxiv: 2509.03525 · v2 · submitted 2025-08-24 · 💻 cs.CL · cs.AI · eess.AS

Speech-Based Cognitive Screening: A Systematic Evaluation of LLM Adaptation Strategies

Pith reviewed 2026-05-18 21:10 UTC · model grok-4.3

classification 💻 cs.CL · cs.AI · eess.AS
keywords dementia detection · speech screening · LLM adaptation · in-context learning · fine-tuning · DementiaBank · cognitive screening · multimodal models

The pith

Properly adapted open-weight language models can match or exceed commercial systems in detecting dementia from speech.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper evaluates nine text-only and three multimodal LLMs on the DementiaBank speech corpus to find the best ways to adapt them for dementia detection. It tests in-context learning with different demonstration selection policies, reasoning-augmented prompting, parameter-efficient fine-tuning, and audio-text integration. Results show that class-centroid demonstrations work best for in-context learning, reasoning helps smaller models, and token-level fine-tuning generally produces the best scores, while adding a classification head substantially improves underperforming models. Fine-tuned multimodal models perform well but do not beat the top text-only models. These findings indicate that careful adaptation lets open-weight models match or exceed commercial systems for speech-based cognitive screening.

Core claim

Through systematic comparison of adaptation strategies on the DementiaBank corpus, the paper shows that open-weight models, after targeted adjustments such as class-centroid demonstration selection in in-context learning, reasoning-augmented prompting, and token-level fine-tuning, reach detection performance that matches or exceeds commercial systems, while adding a classification head substantially lifts weaker models and multimodal audio integration does not provide further gains over the best text-only results.

What carries the argument

Systematic testing of LLM adaptation strategies including class-centroid demonstration selection for in-context learning, reasoning-augmented prompting, and parameter-efficient fine-tuning applied to speech transcripts for dementia classification.
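
To make the class-centroid idea concrete, here is a minimal sketch under stated assumptions. It assumes each training transcript already has an embedding vector and that "class-centroid" selection means ranking transcripts by cosine similarity to the mean embedding of their own class; the paper's exact procedure may differ. The function names and the toy 2-D vectors are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def centroid(vectors):
    """Element-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def select_centroid_demos(embeddings, labels, k_per_class):
    """Return, per class, the indices of the k examples nearest the class centroid.

    Assumption: similarity is cosine similarity to the mean embedding of the
    example's own class. This is an illustrative reading, not the paper's code.
    """
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)

    chosen = {}
    for label, idxs in by_class.items():
        center = centroid([embeddings[i] for i in idxs])
        ranked = sorted(idxs, key=lambda i: cosine(embeddings[i], center), reverse=True)
        chosen[label] = ranked[:k_per_class]
    return chosen


# Toy 2-D embeddings standing in for real transcript embeddings.
toy_embeddings = [[1.0, 0.0], [0.9, 0.1], [0.2, 0.8], [0.0, 1.0], [0.1, 0.9], [0.6, 0.6]]
toy_labels = ["control", "control", "dementia", "dementia", "dementia", "control"]
demos = select_centroid_demos(toy_embeddings, toy_labels, k_per_class=1)
print(demos)
```

Picking the same number of demonstrations per class keeps the prompt balanced, which is one plausible reason a centroid-based policy would be less sensitive to outlier transcripts than a random draw.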

If this is right

  • Class-centroid demonstrations produce the highest performance among in-context learning policies.
  • Reasoning steps in prompts improve results most for smaller models.
  • Token-level fine-tuning generally delivers the strongest scores, and adding a classification head substantially improves underperforming models.
  • Fine-tuned multimodal audio-text models fail to surpass the best adapted text-only models.
  • Open-weight models become viable replacements for commercial systems once adapted with these methods.
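
For readers unfamiliar with in-context learning, the sketch below shows how selected demonstrations might be assembled into a few-shot prompt. The instruction wording, field names, and label strings are hypothetical placeholders and are not taken from the paper's prompt templates.

```python
def build_few_shot_prompt(demonstrations, query_transcript):
    """Assemble a few-shot classification prompt.

    demonstrations: list of (transcript, label) pairs chosen beforehand.
    The instruction text and labels below are hypothetical placeholders.
    """
    lines = ["Classify each speech transcript as 'dementia' or 'control'.", ""]
    for transcript, label in demonstrations:
        lines.append(f"Transcript: {transcript}")
        lines.append(f"Label: {label}")
        lines.append("")
    # The query is left unlabeled so the model completes the final label.
    lines.append(f"Transcript: {query_transcript}")
    lines.append("Label:")
    return "\n".join(lines)


example_prompt = build_few_shot_prompt(
    [("the boy is on the stool", "control"), ("uh the the water is", "dementia")],
    "there is a mother washing dishes",
)
print(example_prompt)
```

Only the choice and order of the `demonstrations` list changes between selection policies, so a comparison like the one in the bullets above can hold the rest of the prompt fixed.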

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • These adaptation techniques could support low-cost screening tools deployed in primary care or mobile apps to reach more undiagnosed individuals.
  • Testing the same strategies on speech data from non-English speakers or different cultural groups could reveal whether the gains generalize globally.
  • Longitudinal use of adapted models on repeated recordings might enable tracking of cognitive changes rather than one-time detection.

Load-bearing premise

Performance measured on the DementiaBank speech corpus will translate to useful results in real clinical screening with varied patient speech and demographics.

What would settle it

A new evaluation on an independent set of spontaneous speech recordings from undiagnosed older adults across diverse demographics that yields substantially lower accuracy than the reported benchmark scores.

Figures

Figures reproduced from arXiv: 2509.03525 by Ali Zolnour, Fatemeh Taherinezhad, Hossein AzadMaleki, Maryam Dadkhah, Maryam Zolnoori, Mohamad Javad Momeni Nezhad, Sepehr Karimi, Sina Rashidi, Yasaman Haghbin.

Figure 4A presents validation F1-scores for each LLM using 2–12 in-context demonstrations across four selection strategies. Demonstrations selected by Average Similarity to class centroids achieved the highest or joint-highest F1-scores in five models and ranked second in three others. The Most Similar strategy generally produced the next-best performance, with notable results for GPT-4o and Gemini-2.0. Least Simi…
Figure 4B shows corresponding results on the test set. Average Similarity achieved the highest F1-scores in five models, including LLaMA 3B (0.73), Ministral 8B (0.73), LLaMA 70B (0.79), GPT-4o (0.81), and DeepSeek-R1 (0.79). Most Similar was optimal for LLaMA-8B (0.72), LLaMA-405B (0.80), and Gemini-2.0 (0.81). Least Similar continued to underperform, while MedAlpaca-7B again performed best with random samples (F…

Original abstract

Over half of US adults with Alzheimer disease and related dementias remain undiagnosed, and speech-based screening offers a scalable detection approach. We compared large language model adaptation strategies for dementia detection using the DementiaBank speech corpus, evaluating nine text-only models and three multimodal audio-text models on recordings from DementiaBank speech corpus. Adaptations included in-context learning with different demonstration selection policies, reasoning-augmented prompting, parameter-efficient fine-tuning, and multimodal integration. Results showed that class-centroid demonstrations achieved the highest in-context learning performance, reasoning improved smaller models, and token-level fine-tuning generally produced the best scores. Adding a classification head substantially improved underperforming models. Among multimodal models, fine-tuned audio-text systems performed well but did not surpass the top text-only models. These findings highlight that model adaptation strategies, including demonstration selection, reasoning design, and tuning method, critically influence speech-based dementia detection, and that properly adapted open-weight models can match or exceed commercial systems.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript evaluates various LLM adaptation strategies for speech-based dementia detection on the DementiaBank corpus. It compares nine text-only and three multimodal models using in-context learning (with different demonstration selection policies including class-centroid), reasoning-augmented prompting, parameter-efficient fine-tuning, token-level fine-tuning, classification heads, and multimodal integration. The central claim is that properly adapted open-weight models can match or exceed commercial systems, with findings that class-centroid demonstrations perform best for ICL, reasoning helps smaller models, and fine-tuning generally yields the highest scores.

Significance. If the empirical results hold under rigorous verification, the work offers practical guidance on effective adaptation techniques for applying LLMs to clinical speech analysis. This could support development of scalable, accessible cognitive screening tools, especially by demonstrating viability of open-weight models over proprietary commercial systems. The systematic comparison of strategies adds value to the growing literature on LLM use in healthcare NLP, though generalizability depends on the representativeness of the single corpus used.

major comments (2)
  1. Abstract: the claim that 'properly adapted open-weight models can match or exceed commercial systems' is not supported by a head-to-head evaluation. The abstract reports results only for the nine text-only and three multimodal models under the listed adaptations but does not indicate that any commercial system was re-run on the identical DementiaBank test partition, ASR transcripts, or metric computation; comparisons appear to rely on literature-reported numbers that may differ in splits, preprocessing, or label definitions.
  2. Results (and abstract): directional performance rankings are presented without statistical tests, confidence intervals, details on data splits, or exclusion criteria. This makes it difficult to assess whether observed differences between adaptation strategies (e.g., class-centroid vs. other ICL policies, or fine-tuning vs. prompting) are reliable or could be due to sampling variability.
minor comments (2)
  1. Abstract: consider adding the specific metrics used (accuracy, F1, AUC, etc.) and naming the top-performing model(s) with their scores to make the directional claims more concrete.
  2. Methods: provide clearer description of the exact prompting templates, demonstration selection algorithms, and how multimodal audio-text fusion is implemented for the three multimodal models.
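
As an illustration of what major comment 2 asks for, the following is a minimal sketch of a percentile bootstrap confidence interval for F1 on a fixed test set. It is not the authors' procedure; the function names, the resample count, and the toy labels are assumptions made for the example.

```python
import random


def f1_score(y_true, y_pred, positive=1):
    """F1 for the positive class; returns 0.0 when it is undefined."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def bootstrap_ci(y_true, y_pred, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for F1, resampling test items with replacement."""
    rng = random.Random(seed)
    n = len(y_true)
    scores = []
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]
        scores.append(f1_score([y_true[i] for i in idx], [y_pred[i] for i in idx]))
    scores.sort()
    lower = scores[int((alpha / 2) * n_resamples)]
    upper = scores[max(int((1 - alpha / 2) * n_resamples) - 1, 0)]
    return lower, upper


# Toy labels: 1 = dementia, 0 = control (hypothetical data).
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
print(f1_score(y_true, y_pred), bootstrap_ci(y_true, y_pred))
```

With a test set of modest size, intervals of this kind can be wide enough that F1 differences of a few hundredths between strategies overlap, which is the concern the referee raises.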

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive and detailed feedback. We have addressed each major comment point by point below, making revisions where the concerns are valid to improve the manuscript's clarity and rigor.

  1. Referee: Abstract: the claim that 'properly adapted open-weight models can match or exceed commercial systems' is not supported by a head-to-head evaluation. The abstract reports results only for the nine text-only and three multimodal models under the listed adaptations but does not indicate that any commercial system was re-run on the identical DementiaBank test partition, ASR transcripts, or metric computation; comparisons appear to rely on literature-reported numbers that may differ in splits, preprocessing, or label definitions.

    Authors: We acknowledge that our comparisons to commercial systems rely on performance figures reported in the prior literature rather than a direct re-implementation on our precise test partition, ASR transcripts, and metric computation pipeline. This approach was chosen because re-running proprietary commercial APIs under identical conditions is often impractical due to access restrictions, cost, and the need to maintain consistency with published benchmarks. In the revised manuscript we will explicitly qualify the abstract claim to state that comparisons are to literature-reported results on the DementiaBank corpus and will add a brief discussion of possible differences in splits, preprocessing, and label definitions. We believe the qualified claim remains informative for readers seeking practical guidance on open-weight model viability.
    revision: yes

  2. Referee: Results (and abstract): directional performance rankings are presented without statistical tests, confidence intervals, details on data splits, or exclusion criteria. This makes it difficult to assess whether observed differences between adaptation strategies (e.g., class-centroid vs. other ICL policies, or fine-tuning vs. prompting) are reliable or could be due to sampling variability.

    Authors: We agree that the absence of statistical tests and confidence intervals limits the ability to judge the reliability of observed differences. In the revision we will add bootstrap-derived 95% confidence intervals for all reported metrics and apply appropriate paired statistical tests (e.g., McNemar’s test for accuracy differences) between the leading adaptation strategies. We will also expand the Methods and Experimental Setup sections to provide complete details on the train/test splits, any exclusion criteria applied to DementiaBank recordings (such as incomplete transcripts or missing labels), and the exact preprocessing steps. These additions will allow readers to better evaluate the robustness of the rankings.
    revision: yes
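
The paired test named in the second response can be sketched as follows. This is an exact two-sided McNemar test written from the standard definition, not code from the paper; the function name and the toy predictions are hypothetical.

```python
from math import comb


def mcnemar_exact(y_true, pred_a, pred_b):
    """Exact two-sided McNemar test on paired predictions from two systems.

    b counts items system A gets right and system B gets wrong; c counts the
    reverse. Under the null hypothesis, b follows Binomial(b + c, 0.5).
    """
    b = sum(1 for t, a, o in zip(y_true, pred_a, pred_b) if a == t and o != t)
    c = sum(1 for t, a, o in zip(y_true, pred_a, pred_b) if a != t and o == t)
    n = b + c
    if n == 0:
        return b, c, 1.0  # the two systems never disagree on correctness
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return b, c, min(1.0, 2 * tail)


# Toy example: system A is right on five items where system B is wrong.
y_true = [1, 1, 1, 1, 1, 0, 0, 0]
pred_a = [1, 1, 1, 1, 1, 0, 0, 0]
pred_b = [0, 0, 0, 0, 0, 0, 0, 0]
print(mcnemar_exact(y_true, pred_a, pred_b))
```

The test uses only the items on which the two systems disagree, so it requires both to be scored on the same test partition, which ties back to the first referee comment.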

Circularity Check

0 steps flagged

No circularity in empirical benchmarking study

This paper is a standard empirical evaluation of LLM adaptation strategies for dementia detection on the external DementiaBank corpus. All reported performance metrics derive from held-out test recordings using conventional train/test splits and evaluation protocols rather than any self-referential definitions, fitted parameters renamed as predictions, or equations that reduce outputs to inputs by construction. The abstract's claim that adapted open-weight models can match or exceed commercial systems rests on literature-reported numbers, but this is an external comparison rather than a circular derivation. No load-bearing steps match the enumerated circularity patterns, and the central results remain independently falsifiable against the benchmark data.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the representativeness of the DementiaBank corpus, standard i.i.d. train/test splits, and the assumption that classification accuracy on this benchmark correlates with clinical screening value. No new entities or ad-hoc constants are introduced.

axioms (1)
  • domain assumption: Standard machine-learning assumption that train and test recordings are drawn from the same distribution.
    Invoked implicitly when reporting generalization from DementiaBank splits to real-world screening.

pith-pipeline@v0.9.0 · 5747 in / 1305 out tokens · 44706 ms · 2026-05-18T21:10:51.300531+00:00 · methodology

