End-To-End Prediction of Emotion From Heartbeat Data Collected by a Consumer Fitness Tracker
Pith reviewed 2026-05-24 21:02 UTC · model grok-4.3
The pith
A Bayesian deep learning model classifies emotional valence from heartbeat time series collected by a consumer fitness tracker.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We present here a Bayesian deep learning model for end-to-end classification of emotional valence, using only the unimodal heartbeat time series collected by a consumer fitness tracker (Garmin Vivosmart 3). We collected a new dataset for this task, and report a peak F1 score of 0.7. This demonstrates a practical relevance of physiology-based emotion detection 'in the wild' today.
What carries the argument
A Bayesian deep learning model that takes the unimodal PPG heartbeat time series as input and classifies emotional valence while quantifying the uncertainty of each prediction.
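To make "quantifying prediction uncertainty" concrete, here is a minimal sketch of Monte Carlo dropout, one common approximation to Bayesian inference in neural networks. The abstract does not say how the paper computes uncertainty, so the technique, the tiny one-hidden-layer network, the random weights, and the 16-value input window are all illustrative assumptions rather than the authors' model.

```python
import numpy as np

def mc_dropout_predict(x, w1, w2, n_samples=100, drop_rate=0.5, seed=0):
    """Run repeated stochastic forward passes with dropout left on.

    Returns the mean predicted probability, its standard deviation across
    passes, and the binary predictive entropy of the mean.
    """
    rng = np.random.default_rng(seed)
    keep = 1.0 - drop_rate
    probs = np.empty(n_samples)
    for i in range(n_samples):
        hidden = np.maximum(0.0, x @ w1)         # ReLU hidden layer
        mask = rng.random(hidden.shape) < keep   # fresh dropout mask per pass
        hidden = hidden * mask / keep            # inverted dropout scaling
        logit = hidden @ w2
        probs[i] = 1.0 / (1.0 + np.exp(-logit))  # sigmoid output
    mean_p = probs.mean()
    eps = 1e-12                                  # guards against log(0)
    entropy = -(mean_p * np.log(mean_p + eps)
                + (1.0 - mean_p) * np.log(1.0 - mean_p + eps))
    return mean_p, probs.std(), entropy

# Hypothetical input: a window of 16 inter-beat features, random weights.
rng = np.random.default_rng(42)
x = rng.normal(size=16)
w1 = rng.normal(size=(16, 8)) * 0.5
w2 = rng.normal(size=8) * 0.5
mean_p, std_p, entropy = mc_dropout_predict(x, w1, w2)
print(f"p(positive valence)={mean_p:.3f}  std={std_p:.3f}  entropy={entropy:.3f}")
```

A large spread across passes, or an entropy near ln 2, would flag a prediction that a downstream application might defer rather than act on.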
If this is right
- Physiology-based emotion detection becomes feasible with devices already owned by many users.
- Models can provide uncertainty measures appropriate for mental health and wellbeing applications.
- Specialized ECG hardware is no longer required for this type of affect prediction.
- Continuous, real-world monitoring of emotional states is now more accessible.
- The reported F1 score of 0.7 indicates usable performance on newly collected data.
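For readers unfamiliar with the metric, the sketch below shows how a binary F1 score is computed from predictions. The labels are toy values chosen so that the result equals 0.7; they are not the paper's data, and the abstract does not state whether its F1 is binary, macro-averaged, or otherwise.

```python
def f1_score_binary(y_true, y_pred, positive=1):
    """F1 = harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Toy labels (assumption): 10 positive and 10 negative valence windows.
y_true = [1] * 10 + [0] * 10
y_pred = [1] * 7 + [0] * 3 + [1] * 3 + [0] * 7   # 7 TP, 3 FN, 3 FP, 7 TN
f1 = f1_score_binary(y_true, y_pred)
print(round(f1, 3))
```

In this toy case precision and recall are both 7/10, so three of every ten positive windows are missed and three of every ten positive predictions are wrong, which is one way to picture what an F1 of 0.7 can mean in practice.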
Where Pith is reading between the lines
- Integration into existing fitness apps could provide users with emotional wellbeing insights alongside physical metrics.
- Testing on larger and more diverse populations would be needed to confirm broad applicability.
- Combining this with other data sources might improve accuracy, though the paper shows unimodal input suffices for baseline results.
- Deployment in clinical settings would require validation against established emotion assessment methods.
Load-bearing premise
The newly collected dataset contains accurately labeled emotional valence examples that are representative enough for the model to learn a generalizable mapping from PPG time series alone.
What would settle it
Running the trained model on a separate test set collected from new participants under different conditions and observing whether the F1 score remains near 0.7 or falls substantially.
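Testing on new participants requires that no person contributes data to both the training and the test set. The sketch below shows one way to build such a split; the record layout (`participant`, `window`, `valence`) and the 25% test fraction are hypothetical and are not taken from the paper.

```python
import random

def split_by_participant(records, test_fraction=0.25, seed=0):
    """Hold out whole participants so the test set contains only unseen people."""
    participants = sorted({r["participant"] for r in records})
    rng = random.Random(seed)
    rng.shuffle(participants)
    n_test = max(1, round(len(participants) * test_fraction))
    test_ids = set(participants[:n_test])
    train = [r for r in records if r["participant"] not in test_ids]
    test = [r for r in records if r["participant"] in test_ids]
    return train, test

# Hypothetical dataset: 8 participants with 5 labeled windows each.
records = [
    {"participant": f"p{i // 5}", "window": i, "valence": i % 2}
    for i in range(40)
]
train, test = split_by_participant(records)
train_ids = {r["participant"] for r in train}
test_ids = {r["participant"] for r in test}
print(len(train), len(test), sorted(test_ids))
```

A random split over individual windows would let the model see each test participant during training, which tends to inflate the score relative to the new-participant scenario described above.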
Original abstract
Automatic detection of emotion has the potential to revolutionize mental health and wellbeing. Recent work has been successful in predicting affect from unimodal electrocardiogram (ECG) data. However, to be immediately relevant for real-world applications, physiology-based emotion detection must make use of ubiquitous photoplethysmogram (PPG) data collected by affordable consumer fitness trackers. Additionally, applications of emotion detection in healthcare settings will require some measure of uncertainty over model predictions. We present here a Bayesian deep learning model for end-to-end classification of emotional valence, using only the unimodal heartbeat time series collected by a consumer fitness tracker (Garmin Vívosmart 3). We collected a new dataset for this task, and report a peak F1 score of 0.7. This demonstrates a practical relevance of physiology-based emotion detection 'in the wild' today.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims to introduce a Bayesian deep learning model for end-to-end classification of emotional valence from unimodal PPG heartbeat time series collected by a consumer fitness tracker (Garmin Vivosmart 3). It reports collecting a new dataset for this purpose and achieving a peak F1 score of 0.7, positioning the work as demonstrating practical relevance for physiology-based emotion detection in real-world settings with uncertainty quantification.
Significance. If the performance result holds under rigorous validation, the work would be significant for enabling emotion detection on ubiquitous, low-cost devices rather than clinical ECG equipment, with the Bayesian component addressing a key requirement for healthcare applications. The use of a newly collected dataset from a consumer device is a strength if properly documented.
major comments (2)
- [Abstract] The central claim of a peak F1 score of 0.7 cannot be assessed because the abstract (and by extension the manuscript) supplies no dataset size, collection or labeling protocol, validation procedure (e.g., train/test split, cross-validation), baseline comparisons, error bars, or details on how the Bayesian uncertainty is computed and utilized. These omissions are load-bearing for the empirical performance claim.
- [Abstract / Dataset section] The weakest assumption, that the newly collected PPG time series are paired with accurate, generalizable valence labels, is unverified. No information is provided on elicitation method, labeling source (self-report, raters), inter-rater reliability, timing alignment, or controls for motion artifacts and demand characteristics typical of wrist PPG, undermining attribution of the F1 score to the model.
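The error bars requested in the first comment could be obtained in several ways. One simple option is a percentile bootstrap over test windows, sketched here with NumPy. The labels are toy values, not the paper's data, and the choice of bootstrap is an assumption, since the abstract names no procedure.

```python
import numpy as np

def f1_binary(y_true, y_pred):
    """Binary F1 from confusion counts: 2TP / (2TP + FP + FN)."""
    tp = np.sum((y_true == 1) & (y_pred == 1))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    denom = 2 * tp + fp + fn
    return float(2 * tp / denom) if denom else 0.0

def bootstrap_f1_interval(y_true, y_pred, n_boot=2000, alpha=0.05, seed=0):
    """Point estimate plus a percentile bootstrap interval for F1."""
    rng = np.random.default_rng(seed)
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    n = len(y_true)
    scores = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)   # resample windows with replacement
        scores[b] = f1_binary(y_true[idx], y_pred[idx])
    low, high = np.quantile(scores, [alpha / 2, 1 - alpha / 2])
    return f1_binary(y_true, y_pred), float(low), float(high)

# Toy labels (assumption): 7 TP, 3 FN, 3 FP, 7 TN.
y_true = [1] * 10 + [0] * 10
y_pred = [1] * 7 + [0] * 3 + [1] * 3 + [0] * 7
point, low, high = bootstrap_f1_interval(y_true, y_pred)
print(f"F1={point:.2f}  95% interval=({low:.2f}, {high:.2f})")
```

Resampling individual windows treats them as independent. If several windows come from the same participant, resampling whole participants would give a more honest and usually wider interval.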
Simulated Author's Rebuttal
We thank the referee for their comments on the abstract and dataset reporting. We agree that additional details are needed to support the performance claims and will revise the manuscript to address these points.
- Referee: [Abstract] The central claim of a peak F1 score of 0.7 cannot be assessed because the abstract (and by extension the manuscript) supplies no dataset size, collection or labeling protocol, validation procedure (e.g., train/test split, cross-validation), baseline comparisons, error bars, or details on how the Bayesian uncertainty is computed and utilized. These omissions are load-bearing for the empirical performance claim.
  Authors: We agree that the abstract lacks these details, making the F1 claim difficult to evaluate from the abstract alone. The full manuscript contains methodological information in later sections, but we accept that the abstract should be expanded for self-containment. In revision we will add dataset size, a concise summary of collection and labeling protocols, the validation procedure (including splits or cross-validation), baseline comparisons, error bars, and a brief description of how Bayesian uncertainty is obtained and used.
  Revision: yes
- Referee: [Abstract / Dataset section] The weakest assumption, that the newly collected PPG time series are paired with accurate, generalizable valence labels, is unverified. No information is provided on elicitation method, labeling source (self-report, raters), inter-rater reliability, timing alignment, or controls for motion artifacts and demand characteristics typical of wrist PPG, undermining attribution of the F1 score to the model.
  Authors: We acknowledge the manuscript currently provides insufficient detail on label elicitation, source, reliability, alignment, and artifact controls. These elements are important for interpreting the results. In the revision we will expand the dataset section (and abstract) with available information on elicitation method, labeling source, timing alignment, and any artifact controls used. Where inter-rater reliability or specific controls were not performed we will explicitly note this and discuss implications for generalizability.
  Revision: yes
Circularity Check
No circularity: empirical F1 on held-out data from new collection
The paper presents a Bayesian deep learning model trained and evaluated on a newly collected PPG dataset, reporting peak F1=0.7 on held-out examples. No equations, derivations, or self-citations are present that reduce the reported performance metric to a fitted parameter or input by construction. The result is an empirical measurement on external data, not a self-referential renaming or prediction forced by the training procedure itself. Dataset labeling quality is a separate validity concern, not a circularity issue.
Axiom & Free-Parameter Ledger
free parameters (1)
- neural network weights and hyperparameters
axioms (1)
- domain assumption: Heartbeat time series from consumer PPG contains information sufficient to classify emotional valence
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean, washburn_uniqueness_aczel
  Tag: unclear. Relation between the paper passage and the cited Recognition theorem.
  Passage: We present here a Bayesian deep learning model for end-to-end classification of emotional valence, using only the unimodal heartbeat time series collected by a consumer fitness tracker (Garmin Vivosmart 3).
- IndisputableMonolith/Foundation/RealityFromDistinction.lean, reality_from_one_distinction
  Tag: unclear. Relation between the paper passage and the cited Recognition theorem.
  Passage: We collected a new dataset for this task, and report a peak F1 score of 0.7.
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.