pith. sign in

arxiv: 2502.03484 · v2 · submitted 2025-02-04 · 📡 eess.AS · cs.LG· cs.SD

Dementia classification from spontaneous speech using wrapper-based feature selection

Pith reviewed 2026-05-23 04:04 UTC · model grok-4.3

classification 📡 eess.AS cs.LGcs.SD
keywords dementia classificationspontaneous speechacoustic featureswrapper feature selectionminimal learning machinecognitive assessment
0
0 comments X

The pith

Acoustic features from entire speech recordings support competitive dementia classification while lowering computation.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that acoustic features extracted from complete spontaneous speech recordings, rather than only active speech segments, can be used for dementia classification without loss of accuracy. This reduces the volume of data processed and improves efficiency. Wrapper-based feature selection then ranks the acoustic characteristics by importance for distinguishing healthy speech from that of people with dementia. One evaluated model achieves the same accuracy level at substantially lower computational cost due to properties of its formulation. A sympathetic reader would care because dementia diagnosis currently requires extensive clinical work, and a scalable speech-based method could expand access to early assessment.

Core claim

The authors claim that acoustic features taken from full recordings using standard extraction tools, when paired with classifier-based wrapper feature selection, produce dementia classification performance that matches results from speech-segment-only features, while the Extreme Minimal Learning Machine exhibits competitive accuracy at markedly reduced computational cost as an inherent result of its model structure and learning procedure.

What carries the argument

Classifier-based wrapper feature selection applied to acoustic feature vectors drawn from complete recordings to rank and retain diagnostically relevant characteristics.

If this is right

  • Fewer feature vectors need processing, directly lowering computational requirements.
  • Classification accuracy stays competitive despite the inclusion of non-speech material.
  • The Extreme Minimal Learning Machine provides an efficient option among tested models due to its built-in properties.
  • The resulting framework remains interpretable while functioning as a supportive assessment tool.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The approach could remove the separate step of detecting and trimming speech activity in applied systems.
  • Comparable full-recording methods might be examined for detecting other conditions that alter speech patterns.
  • Reduced computation could allow testing on hardware with limited resources or in real-time settings.

Load-bearing premise

Acoustic features extracted from entire recordings, including non-speech segments, contain enough diagnostically relevant information to match the performance obtained from speech-active segments alone.

What would settle it

A side-by-side test on the same recordings showing that accuracy falls when full recordings are used instead of speech-only segments.

Figures

Figures reproduced from arXiv: 2502.03484 by Marko Niemel\"a, Mikaela von Bonsdorff, Sami \"Ayr\"am\"o, Tommi K\"arkk\"ainen.

Figure 1
Figure 1. Figure 1: Classification accuracies up to the 2500 most important features in LOSO vali [PITH_FULL_IMAGE:figures/full_fig_p015_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: Confusion matrix for dementia diagnosis based on Ridge regression. [PITH_FULL_IMAGE:figures/full_fig_p016_2.png] view at source ↗
read the original abstract

Dementia encompasses a group of syndromes that impair cognitive functions such as memory, reasoning, and the ability to perform daily activities. As populations globally age, over 10 million new dementia diagnoses are reported annually. Currently, clinical diagnosis of dementia remains challenging due to overlapping symptoms, the need to exclude alternative conditions and the requirement for a comprehensive clinical evaluation and cognitive assessment. This underscores the growing need to develop feasible and accurate methods for detecting cognitive deficiencies. Recent advances in machine learning have highlighted spontaneous speech as a promising noninvasive, cost-effective, and scalable biomarker for dementia detection. In this study, spontaneous speech recordings from the ADReSS and Pitt Corpus datasets are analyzed, consisting of picture description tasks performed by cognitively healthy individuals and people with Alzheimer's disease. Unlike prior approaches that focus solely on speech-active segments, acoustic features are extracted from entire recordings using the openSMILE toolkit. This representation reduces the number of feature vectors and improves computational efficiency without compromising classification performance. Classification models with classifier-based wrapper feature selection are employed to estimate feature importance and identify diagnostically relevant acoustic characteristics. Among the evaluated models, the Extreme Minimal Learning Machine achieved competitive classification accuracy with substantially lower computational cost, reflecting an inherent property of the model formulation and learning procedure. Overall, the results demonstrate that the proposed framework is computationally efficient, interpretable, and well suited as a supportive tool for speech-based dementia assessment.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript proposes extracting acoustic features from entire spontaneous speech recordings (including non-speech segments) using openSMILE on the ADReSS and Pitt corpora for dementia classification. It applies wrapper-based feature selection across multiple classifiers and identifies the Extreme Minimal Learning Machine (EMLM) as achieving competitive accuracy with substantially lower computational cost due to its formulation, claiming the full-recording approach reduces feature vectors without compromising performance.

Significance. If the performance equivalence holds, the work provides a scalable, efficient alternative to VAD-based pipelines for speech-based dementia assessment, with the EMLM results offering a concrete efficiency advantage. The wrapper selection for interpretability is a secondary strength.

major comments (2)
  1. [Abstract and §3] Abstract and §3 (feature extraction): the central claim that full-recording openSMILE features yield equivalent diagnostic utility 'without compromising classification performance' is asserted but unsupported by any side-by-side ablation against speech-active segments on the same ADReSS/Pitt files; this equivalence is load-bearing for both the efficiency narrative and the EMLM results.
  2. [Results] Results section: the abstract and reader's summary indicate no reported quantitative accuracies, cross-validation protocol details, baseline comparisons to prior ADReSS work, or statistical significance tests; without these, the 'competitive' claim cannot be evaluated.
minor comments (1)
  1. [Abstract] Abstract lacks any numerical performance or cost metrics, which should be added for a self-contained summary.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments. We respond point-by-point to the major concerns below.

read point-by-point responses
  1. Referee: [Abstract and §3] Abstract and §3 (feature extraction): the central claim that full-recording openSMILE features yield equivalent diagnostic utility 'without compromising classification performance' is asserted but unsupported by any side-by-side ablation against speech-active segments on the same ADReSS/Pitt files; this equivalence is load-bearing for both the efficiency narrative and the EMLM results.

    Authors: We acknowledge that the manuscript does not contain a direct ablation comparing full-recording features to VAD-segmented features on the identical files. The claim of no performance compromise is supported by the observed competitive accuracies relative to published VAD-based results on the same corpora, but a within-study comparison is absent. In revision we will add an explicit discussion of this point in §3 and, if feasible within the experimental setup, include a limited comparison using a standard VAD pipeline on the ADReSS and Pitt recordings. revision: yes

  2. Referee: [Results] Results section: the abstract and reader's summary indicate no reported quantitative accuracies, cross-validation protocol details, baseline comparisons to prior ADReSS work, or statistical significance tests; without these, the 'competitive' claim cannot be evaluated.

    Authors: The Results section of the full manuscript reports the quantitative accuracies, the 10-fold cross-validation protocol, direct numerical comparisons to prior ADReSS studies, and statistical significance testing. The abstract, however, omits these figures. We will revise the abstract to include the key performance numbers and ensure the Results section makes the protocol, baselines, and tests fully explicit. revision: yes

Circularity Check

0 steps flagged

No significant circularity; standard empirical ML pipeline

full rationale

The paper reports an experimental pipeline: openSMILE feature extraction from full recordings on ADReSS/Pitt data, wrapper feature selection, and classifier comparison (including EMLM). The efficiency claim and 'without compromising performance' statement are presented as outcomes of their runs rather than definitions or fitted inputs renamed as predictions. No equations, self-citations, uniqueness theorems, or ansatzes appear in the provided text that would reduce any central result to its own inputs by construction. Comparisons are internal to the same datasets and models, which is conventional and externally falsifiable via replication on the public corpora. The derivation chain is self-contained experimental reporting.

Axiom & Free-Parameter Ledger

2 free parameters · 2 axioms · 0 invented entities

The central claim depends on standard assumptions in machine learning for audio classification and the validity of the chosen datasets; no new entities are introduced, but the feature selection process introduces data-dependent choices.

free parameters (2)
  • selected acoustic features
    Chosen via wrapper method based on classification performance on the training data
  • hyperparameters of classification models
    Tuned to achieve optimal performance on the datasets
axioms (2)
  • domain assumption Acoustic features from full audio recordings contain diagnostically relevant information for dementia without requiring speech segmentation
    The abstract states that this representation improves efficiency without compromising performance
  • domain assumption The datasets ADReSS and Pitt Corpus are representative for evaluating dementia classification from speech
    Used as the basis for analysis

pith-pipeline@v0.9.0 · 5797 in / 1497 out tokens · 76483 ms · 2026-05-23T04:04:19.545177+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

81 extracted references · 81 canonical work pages · 1 internal anchor

  1. [1]

    W. H. Organization, et al., Global action plan on the public health response to dementia 2017–2025, World Health Organization, 2017

  2. [2]

    Y. Wang, M. L. Haaksma, I. H. Ramakers, F. R. Verhey, W. M. van de Flier, P. Scheltens, I. van Maurik, M. G. Olde Rikkert, J.-M. S. Le- outsakos, R. J. Melis, Cognitive and functional progression of dementia in two longitudinal studies, International journal of geriatric psychiatry 34 (11) (2019) 1623–1632. 23

  3. [3]

    Ngandu, J

    T. Ngandu, J. Lehtisalo, A. Solomon, E. Lev¨ alahti, S. Ahtiluoto, R. An- tikainen, L. B¨ ackman, T. H¨ anninen, A. Jula, T. Laatikainen, et al., A 2 year multidomain intervention of diet, exercise, cognitive training, and vascular risk monitoring versus control to prevent cognitive decline in at-risk elderly people (FINGER): a randomised controlled trial...

  4. [4]

    Kulmala, T

    J. Kulmala, T. Ngandu, S. Havulinna, E. Lev¨ alahti, J. Lehtisalo, A. Solomon, R. Antikainen, T. Laatikainen, P. Pippola, M. Peltonen, et al., The effect of multidomain lifestyle intervention on daily function- ing in older people, Journal of the American Geriatrics Society 67 (6) (2019) 1138–1144

  5. [5]

    Kivipelto, F

    M. Kivipelto, F. Mangialasche, H. M. Snyder, R. Allegri, S. Andrieu, H. Arai, L. Baker, S. Belleville, H. Brodaty, S. M. Brucki, et al., World- wide fingers network: a global approach to risk reduction and prevention of dementia, Alzheimer’s & dementia 16 (7) (2020) 1078–1094

  6. [6]

    C. R. Jack Jr, D. A. Bennett, K. Blennow, M. C. Carrillo, B. Dunn, S. B. Haeberlein, D. M. Holtzman, W. Jagust, F. Jessen, J. Karlawish, et al., Nia-aa research framework: toward a biological definition of alzheimer’s disease, Alzheimer’s & dementia 14 (4) (2018) 535–562

  7. [7]

    Hanyu, T

    H. Hanyu, T. Asano, T. Iwamoto, M. Takasaki, H. Shindo, K. Abe, Magnetization transfer measurements of the hippocampus in patients with alzheimer’s disease, vascular dementia, and other types of demen- tia, American journal of neuroradiology 21 (7) (2000) 1235–1242

  8. [8]

    F.-Y. Chiu, Y. Yen, Imaging biomarkers for clinical applications in neuro-oncology: current status and future perspectives, Biomarker Re- search 11 (1) (2023) 35

  9. [9]

    P. N. Young, M. Estarellas, E. Coomans, M. Srikrishna, H. Beaumont, A. Maass, A. V. Venkataraman, R. Lissaman, D. Jim´ enez, M. J. Betts, et al., Imaging biomarkers in neurodegeneration: current and future practices, Alzheimer’s research & therapy 12 (2020) 1–17

  10. [10]

    M. A. Ebrahimighahnavieh, S. Luo, R. Chiong, Deep learning to detect alzheimer’s disease from neuroimaging: A systematic literature review, Computer methods and programs in biomedicine 187 (2020) 105242. 24

  11. [11]

    mini-mental state

    M. F. Folstein, S. E. Folstein, P. R. McHugh, “mini-mental state”: a practical method for grading the cognitive state of patients for the clin- ician, Journal of psychiatric research 12 (3) (1975) 189–198

  12. [12]

    Z. S. Nasreddine, N. A. Phillips, V. B´ edirian, S. Charbonneau, V. White- head, I. Collin, J. L. Cummings, H. Chertkow, The montreal cognitive assessment, moca: a brief screening tool for mild cognitive impairment, Journal of the American Geriatrics Society 53 (4) (2005) 695–699

  13. [13]

    J. C. Morris, A. Heyman, R. C. Mohs, J. P. Hughes, G. van Belle, G. Fil- lenbaum, E. D. Mellits, C. Clark, The consortium to establish a registry for alzheimer’s disease (cerad). part i. clinical and neuropsychological assessment of alzheimer’s disease., Neurology 39 (9) (1989) 1159–1165

  14. [14]

    Zorluoglu, M

    G. Zorluoglu, M. E. Kamasak, L. Tavacioglu, P. O. Ozanar, A mobile application for cognitive screening of dementia, Computer methods and programs in biomedicine 118 (2) (2015) 252–262

  15. [15]

    Reilly, J

    J. Reilly, J. E. Peelle, S. M. Antonucci, M. Grossman, Anomia as a marker of distinct semantic memory impairments in alzheimer’s disease and semantic dementia., Neuropsychology 25 (4) (2011) 413

  16. [16]

    Ivanova, I

    O. Ivanova, I. Mart´ ınez-Nicol´ as, E. Garc´ ıa-Pi˜ nuela, J. J. G. Meil´ an, Defying syntactic preservation in alzheimer’s disease: what type of im- pairment predicts syntactic change in dementia (if it does) and why?, Frontiers in Language Sciences 2 (2023) 1199107

  17. [17]

    K. C. Fraser, J. A. Meltzer, F. Rudzicz, Linguistic features identify alzheimer’s disease in narrative speech, Journal of Alzheimer’s disease 49 (2) (2015) 407–422

  18. [18]

    De la Fuente Garcia, C

    S. De la Fuente Garcia, C. W. Ritchie, S. Luz, Artificial intelligence, speech, and language processing approaches to monitoring alzheimer’s disease: a systematic review, Journal of Alzheimer’s Disease 78 (4) (2020) 1547–1574

  19. [19]

    Beltrami, G

    D. Beltrami, G. Gagliardi, R. Rossini Favretti, E. Ghidoni, F. Tam- burini, L. Calz` a, Speech analysis by natural language processing tech- niques: a possible tool for very early detection of cognitive decline?, Frontiers in aging neuroscience 10 (2018) 369. 25

  20. [20]

    Lopez-de Ipina, J

    K. Lopez-de Ipina, J. Alonso-Hernandez, J. Sole-Casals, C. M. Travieso- Gonzalez, A. Ezeiza, M. Faundez-Zanuy, P. M. Calvo, B. Beitia, Fea- ture selection for automatic analysis of emotional response based on nonlinear speech modeling suitable for diagnosis of alzheimer’s disease, Neurocomputing 150 (2015) 392–401

  21. [21]

    Tanaka, H

    H. Tanaka, H. Adachi, N. Ukita, M. Ikeda, H. Kazui, T. Kudo, S. Naka- mura, Detecting dementia through interactive computer avatars, IEEE journal of translational engineering in health and medicine 5 (2017) 1– 11

  22. [22]

    S. Luz, F. Haider, S. de la Fuente Garcia, D. Fromm, B. MacWhinney, Alzheimer’s dementia recognition through spontaneous speech, Frontiers in computer science 3 (2021) 780169

  23. [23]

    J. J. G. Meil´ an, F. Mart´ ınez-S´ anchez, J. Carro, D. E. L´ opez, L. Millian- Morell, J. M. Arana, Speech in alzheimer’s disease: can temporal and acoustic parameters discriminate dementia?, Dementia and geriatric cognitive disorders 37 (5-6) (2014) 327–334

  24. [24]

    K. R. Scherer, T. Johnstone, G. Klasmeyer, Vocal expression of emotion, Handbook of affective sciences (2003) 433–456

  25. [25]

    Mart´ ınez-Nicol´ as, T

    I. Mart´ ınez-Nicol´ as, T. E. Llorente, F. Mart´ ınez-S´ anchez, J. J. G. Meil´ an, Ten years of research on automatic voice and speech analysis of people with alzheimer’s disease and mild cognitive impairment: a systematic review article, Frontiers in Psychology 12 (2021) 620251

  26. [26]

    Pistono, M

    A. Pistono, M. Jucla, E. J. Barbeau, L. Saint-Aubert, B. Lemesle, B. Calvet, B. K¨ opke, M. Puel, J. Pariente, Pauses during autobiograph- ical discourse reflect episodic memory processes in early alzheimer’s dis- ease, Journal of Alzheimer’s disease 50 (3) (2016) 687–698

  27. [27]

    Pappagari, J

    R. Pappagari, J. Cho, L. Moro-Velazquez, N. Dehak, Using state of the art speaker recognition and natural language processing technologies to detect alzheimer’s disease and assess its severity., in: Interspeech, 2020, pp. 2177–2181

  28. [28]

    Meghanani, C

    A. Meghanani, C. S. Anoop, A. Ramakrishnan, An exploration of log- mel spectrogram and mfcc features for alzheimer’s dementia recognition 26 from spontaneous speech, in: 2021 IEEE spoken language technology workshop (SLT), IEEE, 2021, pp. 670–677

  29. [29]

    R. J. Morris, C. Oh, P. Franklin, Second formant transitions for acous- tic analysis to differentiate among dementia types, The Journal of the Acoustical Society of America 154 (4 supplement) (2023) A206–A206

  30. [30]

    M. M. Parlak, G. Saylam, M. A. Babademez, ¨O. B. Munis, S. A. Tokg¨ oz, Voice analysis results in individuals with alzheimer’s disease: How do age and cognitive status affect voice parameters?, Brain and Behavior 13 (11) (2023) e3271

  31. [31]

    Eyben, K

    F. Eyben, K. R. Scherer, B. W. Schuller, J. Sundberg, E. Andr´ e, C. Busso, L. Y. Devillers, J. Epps, P. Laukka, S. S. Narayanan, et al., The geneva minimalistic acoustic parameter set (gemaps) for voice re- search and affective computing, IEEE transactions on affective comput- ing 7 (2) (2015) 190–202

  32. [33]

    Eyben, M

    F. Eyben, M. W¨ ollmer, B. Schuller, Opensmile: the munich versatile and fast open-source audio feature extractor, in: Proceedings of the 18th ACM international conference on Multimedia, 2010, pp. 1459–1462

  33. [34]

    Degottex, J

    G. Degottex, J. Kane, T. Drugman, T. Raitio, S. Scherer, Covarep—a collaborative voice analysis repository for speech technologies, in: 2014 ieee international conference on acoustics, speech and signal processing (icassp), IEEE, 2014, pp. 960–964

  34. [35]

    Snyder, D

    D. Snyder, D. Garcia-Romero, G. Sell, D. Povey, S. Khudanpur, X- vectors: Robust dnn embeddings for speaker recognition, in: 2018 IEEE international conference on acoustics, speech and signal processing (ICASSP), IEEE, 2018, pp. 5329–5333

  35. [36]

    I. T. Jolliffe, Principal component analysis and factor analysis, Principal component analysis (2002) 150–166

  36. [37]

    Cummins, Y

    N. Cummins, Y. Pan, Z. Ren, J. Fritsch, V. S. Nallanthighal, H. Chris- tensen, D. Blackburn, B. W. Schuller, M. Magimai-Doss, H. Strik, et al., A comparison of acoustic and linguistics methodologies for alzheimer’s 27 dementia recognition, in: Interspeech 2020, ISCA-International Speech Communication Association, 2020, pp. 2182–2186

  37. [38]

    Gangamohan, B

    P. Gangamohan, B. Yegnanarayana, A robust and alternative approach to zero frequency filtering method for epoch extraction., in: INTER- SPEECH, 2017, pp. 2297–2300

  38. [39]

    Z. Li, F. Liu, W. Yang, S. Peng, J. Zhou, A survey of convolutional neu- ral networks: analysis, applications, and prospects, IEEE transactions on neural networks and learning systems 33 (12) (2021) 6999–7019

  39. [40]

    J. F. Gemmeke, D. P. Ellis, D. Freedman, A. Jansen, W. Lawrence, R. C. Moore, M. Plakal, M. Ritter, Audio set: An ontology and human- labeled dataset for audio events, in: 2017 IEEE international conference on acoustics, speech and signal processing (ICASSP), IEEE, 2017, pp. 776–780

  40. [41]

    J. Koo, J. H. Lee, J. Pyo, Y. Jo, K. Lee, Exploiting multi-modal features from pre-trained networks for alzheimer’s dementia recognition, arXiv preprint arXiv:2009.04070 (2020)

  41. [42]

    C. M. Bishop, N. M. Nasrabadi, Pattern recognition and machine learn- ing, Vol. 4, Springer, 2006

  42. [43]

    Dehak, P

    N. Dehak, P. J. Kenny, R. Dehak, P. Dumouchel, P. Ouellet, Front-end factor analysis for speaker verification, IEEE Transactions on Audio, Speech, and Language Processing 19 (4) (2010) 788–798

  43. [44]

    Pompili, T

    A. Pompili, T. Rolland, A. Abad, The inesc-id multi-modal system for the adress 2020 challenge, arXiv preprint arXiv:2005.14646 (2020)

  44. [45]

    D. A. Reynolds, T. F. Quatieri, R. B. Dunn, Speaker verification us- ing adapted gaussian mixture models, Digital signal processing 10 (1-3) (2000) 19–41

  45. [46]

    Nagrani, J

    A. Nagrani, J. S. Chung, A. Zisserman, Voxceleb: a large-scale speaker identification dataset, arXiv preprint arXiv:1706.08612 (2017)

  46. [47]

    Schmitt, B

    M. Schmitt, B. Schuller, openxbow–introducing the passau open-source crossmodal bag-of-words toolkit, Journal of Machine Learning Research 18 (96) (2017) 1–5. 28

  47. [48]

    Chopra, R

    S. Chopra, R. Hadsell, Y. LeCun, Learning a similarity metric discrim- inatively, with application to face verification, in: 2005 IEEE com- puter society conference on computer vision and pattern recognition (CVPR’05), Vol. 1, IEEE, 2005, pp. 539–546

  48. [49]

    M. S. S. Syed, Z. S. Syed, M. Lech, E. Pirogova, Automated screening for alzheimer’s dementia through spontaneous speech., in: Interspeech, Vol. 2020, 2020, pp. 2222–6

  49. [50]

    J. Chen, Y. Wang, D. Wang, A feature study for classification-based speech separation at low signal-to-noise ratios, IEEE/ACM Transactions on Audio, Speech, and Language Processing 22 (12) (2014) 1993–2002

  50. [51]

    Kohonen, The self-organizing map, Proceedings of the IEEE 78 (9) (1990) 1464–1480

    T. Kohonen, The self-organizing map, Proceedings of the IEEE 78 (9) (1990) 1464–1480

  51. [52]

    Y. Tian, L. He, Z.-y. Li, W.-l. Wu, W.-Q. Zhang, J. Liu, Speaker ver- ification using fisher vector, in: The 9th International Symposium on Chinese Spoken Language Processing, IEEE, 2014, pp. 419–422

  52. [53]

    A. P. Dempster, N. M. Laird, D. B. Rubin, Maximum likelihood from incomplete data via the em algorithm, Journal of the royal statistical society: series B (methodological) 39 (1) (1977) 1–22

  53. [54]

    Martinc, S

    M. Martinc, S. Pollak, Tackling the adress challenge: A multimodal approach to the automated recognition of alzheimer’s dementia., in: In- terspeech, 2020, pp. 2157–2161

  54. [55]

    Balagopalan, B

    A. Balagopalan, B. Eyre, F. Rudzicz, J. Novikova, To bert or not to bert: comparing speech and language-based approaches for alzheimer’s disease detection, arXiv preprint arXiv:2008.01551 (2020)

  55. [56]

    J. H. Friedman, Greedy function approximation: a gradient boosting machine, Annals of statistics (2001) 1189–1232

  56. [57]

    Edwards, C

    E. Edwards, C. Dognin, B. Bollepalli, M. K. Singh, V. Analytics, Multi- scale system for alzheimer’s dementia recognition through spontaneous speech., in: Interspeech, 2020, pp. 2197–2201

  57. [58]

    Haider, S

    F. Haider, S. De La Fuente, S. Luz, An assessment of paralinguistic acoustic features for detection of alzheimer’s dementia in spontaneous 29 speech, IEEE Journal of Selected Topics in Signal Processing 14 (2) (2019) 272–281

  58. [59]

    Rohanian, J

    M. Rohanian, J. Hough, M. Purver, Multi-modal fusion with gating using audio, lexical and disfluency features for alzheimer’s dementia recognition from spontaneous speech, arXiv preprint arXiv:2106.09668 (2021)

  59. [60]

    D. E. Rumelhart, G. E. Hinton, R. J. Williams, Learning representations by back-propagating errors, nature 323 (6088) (1986) 533–536

  60. [61]

    Gated Multimodal Units for Information Fusion

    J. Arevalo, T. Solorio, M. Montes-y G´ omez, F. A. Gonz´ alez, Gated mul- timodal units for information fusion, arXiv preprint arXiv:1702.01992 (2017)

  61. [62]

    J. T. Becker, F. Boiler, O. L. Lopez, J. Saxton, K. L. McGonigle, The natural history of alzheimer’s disease: description of study cohort and accuracy of diagnosis, Archives of neurology 51 (6) (1994) 585–594

  62. [63]

    M. W. Fong, R. Van Patten, R. P. Fucetola, The factor structure of the boston diagnostic aphasia examination, Journal of the International Neuropsychological Society 25 (7) (2019) 772–776

  63. [64]

    EBU-Recommendation, Loudness normalisation and permitted max- imum level of audio signals, Eur

    R. EBU-Recommendation, Loudness normalisation and permitted max- imum level of audio signals, Eur. Broadcast. Union (2011)

  64. [65]

    Eyben, Real-time speech and music classification by large audio fea- ture space extraction, Springer, 2015

    F. Eyben, Real-time speech and music classification by large audio fea- ture space extraction, Springer, 2015

  65. [66]

    A. E. Hoerl, R. W. Kennard, Ridge regression: Biased estimation for nonorthogonal problems, Technometrics 12 (1) (1970) 55–67

  66. [67]

    K¨ arkk¨ ainen, Extreme minimal learning machine: Ridge regression with distance-based basis, Neurocomputing 342 (2019) 33–48

    T. K¨ arkk¨ ainen, Extreme minimal learning machine: Ridge regression with distance-based basis, Neurocomputing 342 (2019) 33–48

  67. [68]

    H¨ am¨ al¨ ainen, A

    J. H¨ am¨ al¨ ainen, A. S. C. Alencar, T. K¨ arkk¨ ainen, C. L. C. Mattos, H. S. Amauri, J. P. P. Gomes, Minimal learning machine: Theoretical re- sults and clustering-based reference point selection, Journal of Machine Learning Research 21 (239) (2020) 1–29. 30

  68. [69]

    T. K¨ arkk¨ ainen, Assessment of feature saliency of mlp using analytic sensitivity., in: European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning, 2015, pp. 273–278

  69. [70]

    T. K¨ arkk¨ ainen, S.¨Ayr¨ am¨ o, On computation of spatial median for ro- bust data mining, Evolutionary and Deterministic Methods for Design, Optimization and Control with Applications to Industrial and Societal Problems, EUROGEN, Munich (2005) 14

  70. [71]

    Cortes, Support-vector networks, Machine Learning (1995)

    C. Cortes, Support-vector networks, Machine Learning (1995)

  71. [72]

    Ruppert, The elements of statistical learning: data mining, inference, and prediction (2004)

    D. Ruppert, The elements of statistical learning: data mining, inference, and prediction (2004)

  72. [73]

    Wolfowitz, Non-parametric statistical inference, in: Proceedings of the [First] Berkeley Symposium on Mathematical Statistics and Proba- bility, Vol

    J. Wolfowitz, Non-parametric statistical inference, in: Proceedings of the [First] Berkeley Symposium on Mathematical Statistics and Proba- bility, Vol. 1, University of California Press, 1949, pp. 93–114

  73. [74]

    Hastie, J

    T. Hastie, J. Qian, Glmnet vignette, Retrieved June 9 (2016) (2014) 1–30

  74. [75]

    Platt, Sequential minimal optimization: A fast algorithm for train- ing support vector machines, Tech

    J. Platt, Sequential minimal optimization: A fast algorithm for train- ing support vector machines, Tech. rep., Microsoft Research Technical Report (1998)

  75. [76]

    Breiman, Random forests, Machine learning 45 (2001) 5–32

    L. Breiman, Random forests, Machine learning 45 (2001) 5–32

  76. [77]

    M. A. Hall, Correlation-based feature selection for machine learning, Ph.D. thesis, The University of Waikato (1999)

  77. [78]

    Guyon, J

    I. Guyon, J. Weston, S. Barnhill, V. Vapnik, Gene selection for cancer classification using support vector machines, Machine learning 46 (2002) 389–422

  78. [79]

    K. Ahn, M. Cho, S. W. Kim, K. E. Lee, Y. Song, S. Yoo, S. Y. Jeon, J. L. Kim, D. H. Yoon, H.-J. Kong, Deep learning of speech data for early detection of alzheimer’s disease in the elderly, Bioengineering 10 (9) (2023) 1093

  79. [80]

    Huang, H

    L. Huang, H. Yang, Y. Che, J. Yang, Automatic speech analysis for detecting cognitive decline of older adults, Frontiers in Public Health 12 (2024) 1417966. 31

  80. [81]

    Mahajan, V

    P. Mahajan, V. Baths, Acoustic and language based deep learning ap- proaches for alzheimer’s dementia detection from spontaneous speech, Frontiers in Aging Neuroscience 13 (2021) 623607

Showing first 80 references.