
arxiv: 2604.20302 · v1 · submitted 2026-04-22 · 💻 cs.HC

AktivTalk: Digitizing the Talk Test for Voice-Based Exercise Intensity Self-Assessment and Exploring Automated Classification from Speech

Pith reviewed 2026-05-09 23:50 UTC · model grok-4.3

classification 💻 cs.HC
keywords Talk Test · exercise intensity · voice-based assessment · exertion classification · mobile health · MFCC features · neural classifier · usability study

The pith

A mobile app digitizes the Talk Test to let users assess exercise intensity by speaking and classifies high exertion from those recordings at up to 90 percent accuracy.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces AktivTalk, a mobile prototype that converts the clinical Talk Test into a self-guided voice interaction for checking exertion levels during physical activity. A within-subject study with 20 participants showed the app was rated highly usable and preferred over having a conductor guide the assessment. The authors further demonstrated that MFCC-based features from the speech samples, processed with class balancing and cross-validation through a lightweight neural classifier, can detect high versus non-high exertion at up to 90 percent accuracy. This matters for individuals with cardiovascular disease because heart rate measures can fail due to medication effects or loose wearables, creating a need for accessible voice-based alternatives that support safe exercise without extra hardware.

Core claim

AktivTalk digitizes the Talk Test into a mobile prototype for voice-based in-the-moment self-assessment of exercise exertion. In a within-subject study with 20 participants, the system was rated highly usable and preferred over conductor-guided assessment. Using MFCC-based features with class balancing and cross-validation, a lightweight neural classifier achieved up to 90 percent accuracy for detecting high versus non-high exertion from Talk Test recordings.
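The summary does not say which class-balancing method was used. As a minimal sketch, assuming simple random oversampling of the minority class, the idea looks like this. The function name, the toy feature values, and the labels are all hypothetical and chosen only for illustration.

```python
import random
from collections import Counter


def oversample_minority(features, labels, seed=0):
    """Duplicate random minority-class samples until every class is equally frequent."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_x, out_y = list(features), list(labels)
    for label, count in counts.items():
        # Draw replacements only from the original samples of this class.
        pool = [x for x, y in zip(features, labels) if y == label]
        for _ in range(target - count):
            out_x.append(rng.choice(pool))
            out_y.append(label)
    return out_x, out_y


# Hypothetical toy data: four non-high samples, one high-exertion sample.
X = [[0.1], [0.2], [0.3], [0.4], [0.9]]
y = ["non-high"] * 4 + ["high"]
balanced_X, balanced_y = oversample_minority(X, y)
print(Counter(balanced_y))
```

Balancing of this kind is normally applied to the training portion of each cross-validation fold only. Oversampling before splitting can place copies of the same recording in both training and test data and inflate the reported accuracy.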

What carries the argument

The Talk Test protocol implemented as a mobile app interface that prompts and records user speech during exercise, combined with MFCC feature extraction fed into a lightweight neural network for binary exertion classification.
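To make the feature step concrete, here is a rough standard-library sketch of how MFCCs are typically computed for one audio frame: window, power spectrum, mel filterbank, log, then a discrete cosine transform. It is not the authors' implementation. The frame length, sample rate, filter count, and coefficient count are assumptions, and the synthetic tone stands in for real speech.

```python
import cmath
import math


def hz_to_mel(hz):
    return 2595.0 * math.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def power_spectrum(frame):
    """Hamming-window one frame and return its power spectrum (bins 0..N/2)."""
    n = len(frame)
    windowed = [
        s * (0.54 - 0.46 * math.cos(2.0 * math.pi * i / (n - 1)))
        for i, s in enumerate(frame)
    ]
    spectrum = []
    for k in range(n // 2 + 1):
        # Naive DFT: fine for a sketch, far too slow for real recordings.
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(windowed))
        spectrum.append(abs(acc) ** 2 / n)
    return spectrum


def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters with centers spaced evenly on the mel scale."""
    top = hz_to_mel(sample_rate / 2.0)
    mels = [top * i / (n_filters + 1) for i in range(n_filters + 2)]
    bins = [int((n_fft + 1) * mel_to_hz(m) / sample_rate) for m in mels]
    bank = []
    for j in range(1, n_filters + 1):
        left, center, right = bins[j - 1], bins[j], bins[j + 1]
        filt = [0.0] * (n_fft // 2 + 1)
        for k in range(left, center):
            filt[k] = (k - left) / (center - left)    # rising edge
        for k in range(center, right):
            filt[k] = (right - k) / (right - center)  # falling edge
        bank.append(filt)
    return bank


def mfcc(frame, sample_rate, n_filters=20, n_coeffs=13):
    """Log mel-filterbank energies followed by an unnormalized DCT-II."""
    spectrum = power_spectrum(frame)
    bank = mel_filterbank(n_filters, len(frame), sample_rate)
    log_energies = [
        math.log(sum(f * s for f, s in zip(filt, spectrum)) + 1e-10)
        for filt in bank
    ]
    return [
        sum(e * math.cos(math.pi * c * (i + 0.5) / n_filters)
            for i, e in enumerate(log_energies))
        for c in range(n_coeffs)
    ]


# Assumed settings: 8 kHz audio, one 256-sample frame of a 440 Hz tone.
SAMPLE_RATE = 8000
frame = [math.sin(2.0 * math.pi * 440.0 * t / SAMPLE_RATE) for t in range(256)]
coeffs = mfcc(frame, SAMPLE_RATE)
print(len(coeffs))
```

In practice one such vector is computed per overlapping frame and the vectors are summarized over the recording before classification. A signal-processing library with an FFT would replace the naive DFT used here.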

If this is right

  • Users gain an independent way to monitor exercise intensity without requiring a human conductor or physiological sensors that can be unreliable.
  • Mobile health applications can add structured voice prompts to deliver immediate feedback on whether activity is reaching high exertion.
  • Automated speech analysis creates the possibility of more frequent or on-demand checks during workouts without added devices.
  • The approach supports safer activity guidelines for populations where overexertion carries higher risk due to heart conditions.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same speech recordings could support passive monitoring if natural conversation during exercise replaces the structured test prompts.
  • Voice-based classification might extend to related signals such as breathing strain or early fatigue in daily movement tracking.
  • Integration with other phone sensors could produce combined models that improve robustness beyond speech alone.
  • Field trials with target users in home or outdoor settings would reveal how well the lab accuracies transfer outside controlled conditions.

Load-bearing premise

Speech features recorded during the Talk Test reliably reflect exercise intensity levels in a way that generalizes from the study participants to individuals with cardiovascular disease in real-world conditions.

What would settle it

Testing the same classifier on Talk Test speech samples collected from a separate group of cardiovascular disease patients during unsupervised exercise in everyday settings and checking whether accuracy stays near 90 percent or falls substantially.
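Whether accuracy "stays near 90 percent" depends on how many test samples back the figure. One common way to express that uncertainty is a Wilson score interval for a proportion, sketched below. The counts in the example (18 correct out of 20) are hypothetical and are not taken from the paper.

```python
import math


def wilson_interval(correct, total, z=1.96):
    """Wilson score interval for an accuracy estimate (z=1.96 gives roughly 95%)."""
    p = correct / total
    denom = 1.0 + z * z / total
    center = (p + z * z / (2.0 * total)) / denom
    half_width = z * math.sqrt(
        p * (1.0 - p) / total + z * z / (4.0 * total * total)
    ) / denom
    return center - half_width, center + half_width


# Hypothetical counts: 18 of 20 held-out samples classified correctly.
low, high = wilson_interval(18, 20)
print(f"accuracy 0.90, interval {low:.2f} to {high:.2f}")
```

With so few samples the interval is wide (roughly 0.70 to 0.97 here), so a drop on an external cohort would need to be large, or the cohort much bigger, before it could be called a real difference.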

Figures

Figures reproduced from arXiv: 2604.20302 by Clemens Sauerwein, Daniela Wurhofer, Devender Kumar, Jan David Smeddinck, Laura Geiger, Rania Islambouli.

Figure 1: From the traditional Talk Test (left), AktivTalk enables digital, voice-based exercise intensity self-assessment. Data collected in a user study (middle) is used to train a machine learning model (right) for automatic classification of exercise intensity.
Figure 2: AktivTalk UI workflow, showing user flow from configuration through repeated assessment cycles with audio capture, self-rating, and feedback.
Figure 3: Overview of the user study protocol showing the sequential phases: pre-study questionnaire, 30-minute workout with …
Figure 4: Average participant rankings of the three assess…
Figure 5: Mel-spectrograms of speech samples at different exertion levels. Brighter areas indicate higher energy; breathing …
Figure 6: Model accuracy across 5-fold cross-validation for …
Original abstract

Monitoring exercise intensity is critical for safe and effective physical activity, particularly for individuals with cardiovascular disease, where overexertion can pose serious risks. Although physiological measures such as heart rate are widely used for avoiding overexertion, they can be unreliable in certain cases, such as when affected by medication or when wearables are worn too loosely. We introduce AktivTalk, a mobile prototype that digitizes the clinically validated Talk Test to support voice-based, in-the-moment self-assessment of exertion. In a within-subject study with 20 participants, we collected exertion-labeled voice samples and found that AktivTalk was rated as highly usable and preferred over conductor-guided assessment. We further explored automated exertion classification from Talk Test speech. Using MFCC-based features with class balancing and cross-validation, a lightweight neural classifier achieved up to 90% accuracy for detecting high vs. non-high exertion from Talk Test recordings. This work highlights the potential of structured voice interactions for accessible exertion assessment and motivates future passive exertion monitoring from speech.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper introduces AktivTalk, a mobile prototype that digitizes the clinically validated Talk Test for voice-based, in-the-moment self-assessment of exercise intensity. It reports results from a within-subject study with 20 participants showing high usability ratings and preference over conductor-guided assessment. The work also explores automated classification of high versus non-high exertion from Talk Test speech recordings, using MFCC-based features with class balancing and cross-validation in a lightweight neural classifier that reaches up to 90% accuracy.

Significance. If the usability findings and classification performance hold under broader conditions, the work could contribute to accessible, non-physiological tools for safe exercise monitoring, especially for populations where heart-rate measures are unreliable. The within-subject empirical design and direct comparison to a traditional method provide concrete evidence for the HCI component, while the MFCC-based exploration opens a path toward passive voice monitoring. However, the absence of target-population validation substantially reduces the immediate translational value.

major comments (2)
  1. [Abstract] Abstract and participant description: the 20 participants are not characterized by demographics, cardiovascular disease status, fitness level, or medication use. Given that the introduction and motivation explicitly target individuals with cardiovascular disease (where overexertion risks are highlighted), this omission makes it impossible to evaluate whether the usability preference or 90% classification accuracy can be expected to transfer to the intended users.
  2. [Automated Classification Exploration] Automated classification section: the claim of up to 90% accuracy for high vs. non-high exertion is reported without sample counts per class, the precise cross-validation scheme (e.g., leave-one-speaker-out), baseline comparisons, or statistical tests. MFCC features are known to be sensitive to speaker physiology and recording conditions; without these details or an external validation cohort, the result cannot be assessed as robust for real-world CVD deployment.
minor comments (2)
  1. [Abstract] The abstract states the accuracy figure but does not define the exact exertion labeling protocol or the decision threshold for 'high' exertion in the Talk Test.
  2. A dedicated limitations paragraph discussing the controlled recording environment versus real-world ambient noise, medication effects, and speaker variability would strengthen the manuscript.
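The referee's point about the cross-validation scheme matters because each participant contributes several recordings. A random split can put the same voice in both training and test data. A leave-one-speaker-out split, sketched below with hypothetical participant IDs, holds out all recordings of one speaker per fold.

```python
from collections import defaultdict


def leave_one_speaker_out(speaker_ids):
    """Yield (held_out_speaker, train_indices, test_indices), one fold per speaker."""
    by_speaker = defaultdict(list)
    for index, speaker in enumerate(speaker_ids):
        by_speaker[speaker].append(index)
    for held_out in sorted(by_speaker):
        test = by_speaker[held_out]
        # Every recording from every other speaker goes into training.
        train = sorted(
            i for speaker, indices in by_speaker.items()
            if speaker != held_out
            for i in indices
        )
        yield held_out, train, test


# Hypothetical speaker labels, one per recording.
speakers = ["p01", "p01", "p02", "p02", "p02", "p03"]
folds = list(leave_one_speaker_out(speakers))
for held_out, train, test in folds:
    print(held_out, train, test)
```

Accuracy measured this way estimates performance on a voice the model has never heard, which is the situation a new user of the app would be in.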

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive comments, which highlight key areas for strengthening the manuscript's transparency and contextualization. We address each major comment below, indicating revisions made to the manuscript.

  1. Referee: [Abstract] Abstract and participant description: the 20 participants are not characterized by demographics, cardiovascular disease status, fitness level, or medication use. Given that the introduction and motivation explicitly target individuals with cardiovascular disease (where overexertion risks are highlighted), this omission makes it impossible to evaluate whether the usability preference or 90% classification accuracy can be expected to transfer to the intended users.

    Authors: We agree that the absence of participant characterization limits evaluation of transferability to the target CVD population. The revised manuscript adds a dedicated Participants subsection in Methods, including a table with demographics (age, gender, self-reported fitness via IPAQ), confirmation of no known cardiovascular disease or relevant medications, and other relevant details. The abstract has been updated to reference this characterization. We also added an explicit limitations paragraph stating that the current cohort consists of healthy adults and that future work will validate usability and classification performance with CVD patients to assess generalizability. revision: yes

  2. Referee: [Automated Classification Exploration] Automated classification section: the claim of up to 90% accuracy for high vs. non-high exertion is reported without sample counts per class, the precise cross-validation scheme (e.g., leave-one-speaker-out), baseline comparisons, or statistical tests. MFCC features are known to be sensitive to speaker physiology and recording conditions; without these details or an external validation cohort, the result cannot be assessed as robust for real-world CVD deployment.

    Authors: We have revised the Automated Classification Exploration section to include all requested details: exact sample counts per class before and after balancing, the precise leave-one-speaker-out cross-validation scheme, comparisons to baseline classifiers (e.g., SVM and logistic regression), and statistical tests (e.g., McNemar's test) for performance differences. We also expanded the discussion to address MFCC sensitivity to speaker physiology and recording conditions, describing our normalization steps and noting remaining robustness considerations. While the current study does not include an external validation cohort, we have added this as a clear limitation and outlined plans for such validation in future work targeting CVD deployment. revision: yes
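McNemar's test, which the rebuttal names, compares two classifiers evaluated on the same samples by looking only at the samples where exactly one of them is correct. The sketch below uses the exact binomial form, which suits small studies. The labels and predictions are made up for illustration and do not come from the paper.

```python
from math import comb


def mcnemar_exact(y_true, pred_a, pred_b):
    """Exact two-sided McNemar test on paired predictions.

    Returns (b, c, p_value), where b counts samples only model A got right
    and c counts samples only model B got right.
    """
    b = sum(1 for t, a, o in zip(y_true, pred_a, pred_b) if a == t and o != t)
    c = sum(1 for t, a, o in zip(y_true, pred_a, pred_b) if a != t and o == t)
    n = b + c
    if n == 0:
        return b, c, 1.0  # the models never disagree on correctness
    k = min(b, c)
    # Under the null hypothesis, discordant pairs split like fair coin flips.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return b, c, min(1.0, 2.0 * tail)


# Hypothetical labels: 1 = high exertion, 0 = non-high.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
pred_neural = [1, 1, 1, 1, 0, 0, 0, 0]    # hypothetical neural classifier
pred_baseline = [0, 0, 0, 0, 0, 0, 0, 0]  # hypothetical majority-class baseline
b, c, p_value = mcnemar_exact(y_true, pred_neural, pred_baseline)
print(b, c, p_value)
```

Here the neural model is right on four samples the baseline misses and never the reverse, yet the p-value is 0.125. With few discordant pairs, even a clean win is not statistically significant, which is why per-class sample counts matter for interpreting the 90% figure.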

Circularity Check

0 steps flagged

No significant circularity; empirical user study and standard ML pipeline

The paper's central results derive from a within-subject study collecting labeled voice samples from 20 participants, followed by usability ratings and a standard supervised classification pipeline (MFCC features, class balancing, cross-validation) that produces accuracy figures on held-out data folds. No equations, definitions, or claims reduce by construction to prior outputs or self-citations; the 90% accuracy is an empirical measurement rather than a fitted parameter renamed as prediction. Generalization concerns exist but are validity issues, not circularity.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claims rest on the established validity of the Talk Test as a clinical exertion measure and standard assumptions in speech feature extraction and neural classification. No free parameters, ad-hoc axioms, or invented entities are explicitly introduced beyond routine ML practices.

axioms (1)
  • Domain assumption: The Talk Test is a valid clinical measure of exercise intensity.
    The paper builds directly on the clinically validated Talk Test to digitize it without providing new validation data in the abstract.

pith-pipeline@v0.9.0 · 5500 in / 1384 out tokens · 47546 ms · 2026-05-09T23:50:34.514521+00:00 · methodology

