arxiv: 2307.02514 · v1 · pith:XDZWY6JNnew · submitted 2023-07-05 · 📡 eess.AS · cs.AI · cs.SD

Exploring Multimodal Approaches for Alzheimer's Disease Detection Using Patient Speech Transcript and Audio Data

classification 📡 eess.AS cs.AI cs.SD
keywords audio, speech, data, features, detection, disease, methods, patient

Alzheimer's disease (AD) is a common form of dementia that severely impacts patient health. Because AD impairs a patient's ability to understand and express language, the speech of AD patients can serve as an indicator of the disease. This study investigates several methods for detecting AD from patients' speech and transcript data in the DementiaBank Pitt database. The proposed approach combines pre-trained language models with a Graph Neural Network (GNN): a graph is constructed from the speech transcript, and the GNN extracts features from it for AD detection. Data augmentation techniques, including synonym replacement and a GPT-based augmenter, among others, were used to address the small dataset size. Audio data was also introduced, with the WavLM model used to extract audio features, which were then fused with the text features using various methods. Finally, a contrastive learning approach was attempted by converting speech transcripts back to audio and contrasting the synthesized audio with the original audio. We conducted extensive experiments and analysis on these methods. Our findings shed light on the challenges and potential solutions in AD detection using speech and audio data.
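The abstract does not say how the graph is built from a transcript. As a rough illustration only, the sketch below assumes a sliding-window word co-occurrence graph, a common choice for text graphs; the function name `build_cooccurrence_graph`, the `window` parameter, and the sample sentence are all hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def build_cooccurrence_graph(transcript: str, window: int = 2):
    """Build an undirected word co-occurrence graph from a transcript.

    ASSUMPTION: sliding-window co-occurrence is an illustrative choice;
    the paper's actual graph construction may differ.

    Returns:
        nodes: sorted list of unique lowercase tokens.
        edges: dict mapping (u, v) with u < v to the number of times the
               two words occur within `window` positions of each other.
    """
    # Naive whitespace tokenization with surrounding punctuation stripped.
    tokens = [t.strip(".,!?;:").lower() for t in transcript.split()]
    tokens = [t for t in tokens if t]

    edges = defaultdict(int)
    for i, u in enumerate(tokens):
        # Link each token to the next `window` tokens that follow it.
        for v in tokens[i + 1 : i + 1 + window]:
            if u != v:  # skip self-loops
                edges[tuple(sorted((u, v)))] += 1

    return sorted(set(tokens)), dict(edges)


# Hypothetical sample sentence, used only to exercise the function.
nodes, edges = build_cooccurrence_graph("the boy is on the stool", window=2)
```

The resulting node list and weighted edge dictionary could then be converted to whatever adjacency format a GNN library expects, with node features taken, for example, from a pre-trained language model's token embeddings.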

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Gated Multi-Graph Fusion via Graph Attention Networks for Alzheimer's Disease Detection

    cs.CL 2026-06 unverdicted novelty 4.0

    A gated fusion of graph attention networks on semantic, dependency, and co-occurrence graphs from speech achieves 90% accuracy for Alzheimer's detection on the ADReSSo dataset.

  2. LLMs-Healthcare : Current Applications and Challenges of Large Language Models in various Medical Specialties

    cs.CL 2023-10 unverdicted novelty 2.0

    A review summarizing LLM applications for diagnostics and treatment in oncology, dermatology, dentistry, neurodegenerative disorders, and mental health, plus integration challenges.