AktivTalk: Digitizing the Talk Test for Voice-Based Exercise Intensity Self-Assessment and Exploring Automated Classification from Speech
Pith reviewed 2026-05-09 23:50 UTC · model grok-4.3
The pith
A mobile app digitizes the Talk Test to let users assess exercise intensity by speaking and classifies high exertion from those recordings at up to 90 percent accuracy.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
AktivTalk digitizes the Talk Test into a mobile prototype for voice-based in-the-moment self-assessment of exercise exertion. In a within-subject study with 20 participants, the system was rated highly usable and preferred over conductor-guided assessment. Using MFCC-based features with class balancing and cross-validation, a lightweight neural classifier achieved up to 90 percent accuracy for detecting high versus non-high exertion from Talk Test recordings.
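The claim mentions class balancing without naming the method. As one hedged illustration, the sketch below shows random oversampling of the minority class. The function name `oversample_minority` and the toy data are hypothetical and are not taken from the paper. In a cross-validated setup, balancing of this kind would normally be applied to the training portion of each fold only, so that duplicated samples never leak into the evaluation data.

```python
import random
from collections import Counter

def oversample_minority(features, labels, seed=0):
    """Randomly duplicate minority-class samples until every class matches the largest one.

    Illustrative assumption: the paper's actual balancing method is not specified.
    """
    rng = random.Random(seed)
    by_class = {}
    for x, y in zip(features, labels):
        by_class.setdefault(y, []).append(x)
    target = max(len(items) for items in by_class.values())
    out_x, out_y = [], []
    for y, items in by_class.items():
        # Draw extra samples with replacement from the same class.
        extra = [rng.choice(items) for _ in range(target - len(items))]
        for x in items + extra:
            out_x.append(x)
            out_y.append(y)
    return out_x, out_y

# Toy data: 2 "high" vs 6 "non-high" samples (made-up feature vectors).
X = [[0.1], [0.2], [0.9], [1.0], [1.1], [1.2], [1.3], [1.4]]
y = ["high", "high"] + ["non-high"] * 6
Xb, yb = oversample_minority(X, y)
print(Counter(yb))
```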
What carries the argument
The Talk Test protocol implemented as a mobile app interface that prompts and records user speech during exercise, combined with MFCC feature extraction fed into a lightweight neural network for binary exertion classification.
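To make the feature step concrete, here is a minimal pure-Python sketch of how MFCCs are typically computed for a single audio frame: window, power spectrum, mel filterbank, log, then DCT-II. The frame length, filter count, number of coefficients, and the function name `mfcc_frame` are assumptions for illustration. The paper's actual settings and tooling are not given here, and a real pipeline would normally rely on an audio library rather than a naive DFT.

```python
import cmath
import math

def hz_to_mel(hz):
    return 2595.0 * math.log10(1.0 + hz / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mfcc_frame(frame, sample_rate, n_filters=20, n_coeffs=13):
    """MFCCs for one frame: window -> power spectrum -> mel filterbank -> log -> DCT-II."""
    n = len(frame)
    # Hamming window reduces spectral leakage at the frame edges.
    windowed = [s * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
                for i, s in enumerate(frame)]
    # Naive DFT (O(n^2)); acceptable for one short illustrative frame.
    n_bins = n // 2 + 1
    power = []
    for k in range(n_bins):
        acc = sum(windowed[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        power.append(abs(acc) ** 2 / n)
    # Triangular filters spaced evenly on the mel scale from 0 Hz to Nyquist.
    mel_max = hz_to_mel(sample_rate / 2)
    mel_points = [mel_max * i / (n_filters + 1) for i in range(n_filters + 2)]
    bin_points = [mel_to_hz(m) * n / sample_rate for m in mel_points]  # fractional bins
    log_energies = []
    for j in range(1, n_filters + 1):
        left, center, right = bin_points[j - 1], bin_points[j], bin_points[j + 1]
        energy = 0.0
        for k in range(n_bins):
            if left < k <= center:
                energy += power[k] * (k - left) / (center - left)
            elif center < k < right:
                energy += power[k] * (right - k) / (right - center)
        log_energies.append(math.log(energy + 1e-10))  # small offset avoids log(0)
    # DCT-II decorrelates the log filterbank energies.
    return [sum(e * math.cos(math.pi * c * (j + 0.5) / n_filters)
                for j, e in enumerate(log_energies))
            for c in range(n_coeffs)]

# Two synthetic tones stand in for speech frames (not real Talk Test audio).
sr = 8000
low = [math.sin(2 * math.pi * 440 * t / sr) for t in range(256)]
high = [math.sin(2 * math.pi * 1500 * t / sr) for t in range(256)]
mfcc_low = mfcc_frame(low, sr)
mfcc_high = mfcc_frame(high, sr)
print(len(mfcc_low), len(mfcc_high))
```

Per-frame vectors like these are commonly summarized over a recording (for example by mean and variance) before being passed to a classifier. Whether the paper does so is not stated in this summary.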
If this is right
- Users gain an independent way to monitor exercise intensity without requiring a human conductor or physiological sensors that can be unreliable.
- Mobile health applications can add structured voice prompts to deliver immediate feedback on whether activity is reaching high exertion.
- Automated speech analysis creates the possibility of more frequent or on-demand checks during workouts without added devices.
- The approach supports safer activity guidelines for populations where overexertion carries higher risk due to heart conditions.
Where Pith is reading between the lines
- The same speech recordings could support passive monitoring if natural conversation during exercise replaces the structured test prompts.
- Voice-based classification might extend to related signals such as breathing strain or early fatigue in daily movement tracking.
- Integration with other phone sensors could produce combined models that improve robustness beyond speech alone.
- Field trials with target users in home or outdoor settings would reveal how well the lab accuracies transfer outside controlled conditions.
Load-bearing premise
Speech features recorded during the Talk Test reliably reflect exercise intensity levels in a way that generalizes from the study participants to individuals with cardiovascular disease in real-world conditions.
What would settle it
Testing the same classifier on Talk Test speech samples collected from a separate group of cardiovascular disease patients during unsupervised exercise in everyday settings and checking whether accuracy stays near 90 percent or falls substantially.
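Judging whether accuracy "stays near 90 percent" depends on how precisely it can be estimated from the new cohort. The sketch below computes a Wilson score interval for an observed accuracy. The counts are hypothetical and do not come from the paper, and the interval treats samples as independent, which repeated recordings from the same speaker would violate.

```python
import math

def wilson_interval(correct, total, z=1.96):
    """Wilson score interval for a proportion (z=1.96 gives roughly 95% coverage)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = correct / total
    denom = 1 + z * z / total
    center = (p + z * z / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return center - half, center + half

# Hypothetical external-cohort results, not data from the paper.
for correct, total in [(90, 100), (36, 40), (28, 40)]:
    lo, hi = wilson_interval(correct, total)
    print(f"{correct}/{total}: accuracy={correct / total:.2f}, 95% CI=({lo:.2f}, {hi:.2f})")
```

With the same 0.90 point estimate, the smaller cohort yields a visibly wider interval, so a small follow-up sample may be unable to distinguish "near 90 percent" from a substantial drop.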
Original abstract
Monitoring exercise intensity is critical for safe and effective physical activity, particularly for individuals with cardiovascular disease, where overexertion can pose serious risks. Although physiological measures such as heart rate are widely used for avoiding overexertion, they can be unreliable in certain cases, such as when affected by medication or when wearables are worn too loosely. We introduce AktivTalk, a mobile prototype that digitizes the clinically validated Talk Test to support voice-based, in-the-moment self-assessment of exertion. In a within-subject study with 20 participants, we collected exertion-labeled voice samples and found that AktivTalk was rated as highly usable and preferred over conductor-guided assessment. We further explored automated exertion classification from Talk Test speech. Using MFCC-based features with class balancing and cross-validation, a lightweight neural classifier achieved up to 90% accuracy for detecting high vs. non-high exertion from Talk Test recordings. This work highlights the potential of structured voice interactions for accessible exertion assessment and motivates future passive exertion monitoring from speech.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces AktivTalk, a mobile prototype that digitizes the clinically validated Talk Test for voice-based, in-the-moment self-assessment of exercise intensity. It reports results from a within-subject study with 20 participants showing high usability ratings and preference over conductor-guided assessment. The work also explores automated classification of high versus non-high exertion from Talk Test speech recordings, using MFCC-based features with class balancing and cross-validation in a lightweight neural classifier that reaches up to 90% accuracy.
Significance. If the usability findings and classification performance hold under broader conditions, the work could contribute to accessible, non-physiological tools for safe exercise monitoring, especially for populations where heart-rate measures are unreliable. The within-subject empirical design and direct comparison to a traditional method provide concrete evidence for the HCI component, while the MFCC-based exploration opens a path toward passive voice monitoring. However, the absence of target-population validation substantially reduces the immediate translational value.
major comments (2)
- [Abstract] Abstract and participant description: the 20 participants are not characterized by demographics, cardiovascular disease status, fitness level, or medication use. Given that the introduction and motivation explicitly target individuals with cardiovascular disease (where overexertion risks are highlighted), this omission makes it impossible to evaluate whether the usability preference or 90% classification accuracy can be expected to transfer to the intended users.
- [Automated Classification Exploration] Automated classification section: the claim of up to 90% accuracy for high vs. non-high exertion is reported without sample counts per class, the precise cross-validation scheme (e.g., leave-one-speaker-out), baseline comparisons, or statistical tests. MFCC features are known to be sensitive to speaker physiology and recording conditions; without these details or an external validation cohort, the result cannot be assessed as robust for real-world CVD deployment.
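The cross-validation scheme matters because MFCCs carry speaker identity: if recordings from one participant appear in both training and test folds, accuracy can be inflated. The sketch below shows the leave-one-speaker-out split that the referee names as an example. The function name and speaker IDs are hypothetical, and whether the paper used this scheme is not stated in the abstract. Library routines such as scikit-learn's `LeaveOneGroupOut` provide the same behavior.

```python
def leave_one_speaker_out(speaker_ids):
    """Yield (held_out_speaker, train_indices, test_indices), one fold per speaker."""
    for held_out in sorted(set(speaker_ids)):
        train = [i for i, s in enumerate(speaker_ids) if s != held_out]
        test = [i for i, s in enumerate(speaker_ids) if s == held_out]
        yield held_out, train, test

# Toy example: 3 hypothetical speakers with 2 recordings each.
speakers = ["p01", "p01", "p02", "p02", "p03", "p03"]
folds = list(leave_one_speaker_out(speakers))
for held_out, train, test in folds:
    # No speaker appears on both sides of the split.
    assert {speakers[i] for i in train}.isdisjoint({speakers[i] for i in test})
    print(held_out, train, test)
```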
minor comments (2)
- [Abstract] The abstract states the accuracy figure but does not define the exact exertion labeling protocol or the decision threshold for 'high' exertion in the Talk Test.
- A dedicated limitations paragraph discussing the controlled recording environment versus real-world ambient noise, medication effects, and speaker variability would strengthen the manuscript.
Simulated Author's Rebuttal
We thank the referee for their constructive comments, which highlight key areas for strengthening the manuscript's transparency and contextualization. We address each major comment below, indicating revisions made to the manuscript.
- Referee: [Abstract] Abstract and participant description: the 20 participants are not characterized by demographics, cardiovascular disease status, fitness level, or medication use. Given that the introduction and motivation explicitly target individuals with cardiovascular disease (where overexertion risks are highlighted), this omission makes it impossible to evaluate whether the usability preference or 90% classification accuracy can be expected to transfer to the intended users.
Authors: We agree that the absence of participant characterization limits evaluation of transferability to the target CVD population. The revised manuscript adds a dedicated Participants subsection in Methods, including a table with demographics (age, gender, self-reported fitness via IPAQ), confirmation of no known cardiovascular disease or relevant medications, and other relevant details. The abstract has been updated to reference this characterization. We also added an explicit limitations paragraph stating that the current cohort consists of healthy adults and that future work will validate usability and classification performance with CVD patients to assess generalizability.
revision: yes
- Referee: [Automated Classification Exploration] Automated classification section: the claim of up to 90% accuracy for high vs. non-high exertion is reported without sample counts per class, the precise cross-validation scheme (e.g., leave-one-speaker-out), baseline comparisons, or statistical tests. MFCC features are known to be sensitive to speaker physiology and recording conditions; without these details or an external validation cohort, the result cannot be assessed as robust for real-world CVD deployment.
Authors: We have revised the Automated Classification Exploration section to include all requested details: exact sample counts per class before and after balancing, the precise leave-one-speaker-out cross-validation scheme, comparisons to baseline classifiers (e.g., SVM and logistic regression), and statistical tests (e.g., McNemar's test) for performance differences. We also expanded the discussion to address MFCC sensitivity to speaker physiology and recording conditions, describing our normalization steps and noting remaining robustness considerations. While the current study does not include an external validation cohort, we have added this as a clear limitation and outlined plans for such validation in future work targeting CVD deployment.
revision: yes
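For readers unfamiliar with McNemar's test, mentioned in the simulated rebuttal as an example, the following is a minimal sketch of the exact two-sided version for comparing two classifiers on the same test samples. The labels and predictions are invented for illustration and the function name is hypothetical. Nothing here reproduces results from the paper.

```python
from math import comb

def mcnemar_exact(y_true, pred_a, pred_b):
    """Two-sided exact McNemar test on paired predictions from two classifiers."""
    # b: A correct, B wrong; c: A wrong, B correct. Concordant pairs carry no information.
    b = sum(1 for t, a, p in zip(y_true, pred_a, pred_b) if a == t and p != t)
    c = sum(1 for t, a, p in zip(y_true, pred_a, pred_b) if a != t and p == t)
    n = b + c
    if n == 0:
        return b, c, 1.0
    # Under the null hypothesis the discordant pairs split as Binomial(n, 0.5).
    tail = sum(comb(n, k) for k in range(min(b, c) + 1)) / 2 ** n
    return b, c, min(1.0, 2 * tail)

# Invented example: 1 = high exertion, 0 = non-high.
y_true = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
pred_a = [1, 1, 1, 1, 1, 0, 0, 0, 0, 1]  # e.g. a neural classifier
pred_b = [0, 0, 0, 0, 0, 0, 0, 0, 0, 1]  # e.g. a baseline that rarely predicts "high"
b, c, p_value = mcnemar_exact(y_true, pred_a, pred_b)
print(b, c, p_value)
```

The test uses only the samples on which the two classifiers disagree, so with a small test set it has little power even when the accuracy gap looks large.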
Circularity Check
No significant circularity; empirical user study and standard ML pipeline
The paper's central results derive from a within-subject study collecting labeled voice samples from 20 participants, followed by usability ratings and a standard supervised classification pipeline (MFCC features, class balancing, cross-validation) that produces accuracy figures on held-out data folds. No equations, definitions, or claims reduce by construction to prior outputs or self-citations; the 90% accuracy is an empirical measurement rather than a fitted parameter renamed as prediction. Generalization concerns exist but are validity issues, not circularity.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: The Talk Test is a valid clinical measure of exercise intensity