StreamPPG: Low-Latency rPPG Estimation via Consistent Privileged Learning
Pith reviewed 2026-06-26 09:12 UTC · model grok-4.3
The pith
StreamPPG enables frame-by-frame rPPG estimation from video with accuracy matching clip-wise methods by using ground-truth signals only during training.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
StreamPPG is a unified architecture that enables low-latency frame-wise physiological signal estimation while achieving competitive accuracy compared with clip-wise approaches. It is trained under a consistent privileged learning strategy that leverages ground-truth rPPG signals as privileged information to enhance the model's representation capability.
What carries the argument
The consistent privileged learning (CPL) strategy, which supplies ground-truth rPPG signals only during training to strengthen the representations used for frame-wise inference.
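The training-only role of the privileged signal can be illustrated with a toy objective. This is a minimal sketch, not the authors' formulation, which this summary does not spell out. It assumes a generic privileged-learning setup in which a training-only branch sees the ground-truth signal and a video-only branch is pulled toward it by a consistency term. The names `mse`, `cpl_style_loss`, and `consistency_weight` are hypothetical.

```python
from typing import Sequence


def mse(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean squared error between two equal-length sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def cpl_style_loss(
    video_pred: Sequence[float],       # output of the video-only branch
    privileged_pred: Sequence[float],  # output of the training-only branch
    ground_truth: Sequence[float],     # reference rPPG/BVP samples
    consistency_weight: float = 0.5,   # hypothetical trade-off weight
) -> float:
    """Supervised error plus a term pulling the video-only branch
    toward the privileged branch. Intended for training time only."""
    supervised = mse(video_pred, ground_truth)
    consistency = mse(video_pred, privileged_pred)
    return supervised + consistency_weight * consistency


# Placeholder values, chosen only to exercise the function.
loss = cpl_style_loss([0.1, 0.4], [0.2, 0.5], [0.2, 0.6])
```

At inference time only the video-only branch would run, so neither `privileged_pred` nor `ground_truth` is required; that is the property the load-bearing premise depends on.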
If this is right
- StreamPPG achieves state-of-the-art accuracy across multiple datasets.
- It maintains real-time throughput on edge devices.
- It removes the multi-second delay inherent in clip-wise rPPG while avoiding the accuracy drop of conventional frame-wise methods.
- The same training approach supports continuous contact-free health monitoring on resource-constrained hardware.
Where Pith is reading between the lines
- The CPL pattern could be tested on other periodic video signals such as respiration rate estimation.
- Mobile health applications could incorporate the model to deliver continuous vital-sign feedback without specialized sensors.
- Further experiments on skin-tone diversity would clarify whether the learned representations remain stable across demographic groups.
- The approach might reduce buffer requirements in other real-time video analysis pipelines that currently rely on batch processing.
Load-bearing premise
Ground-truth rPPG signals used only at training time will produce features that generalize to new videos without those signals and without overfitting to the training distribution.
What would settle it
A cross-dataset evaluation in which StreamPPG error exceeds that of a standard clip-wise baseline under changed lighting, camera, or subject demographics.
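The comparison described above reduces to computing an error metric for both methods on a held-out dataset and checking which is larger. The sketch below assumes mean absolute error on heart-rate estimates in beats per minute as the metric; the function names, the `tolerance` parameter, and all numeric inputs are hypothetical placeholders, not results from the paper.

```python
from typing import Sequence


def mean_absolute_error(pred: Sequence[float], truth: Sequence[float]) -> float:
    """Average absolute difference between predictions and references."""
    if len(pred) != len(truth) or not pred:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - t) for p, t in zip(pred, truth)) / len(pred)


def claim_is_challenged(
    frame_wise_mae: float,
    clip_wise_mae: float,
    tolerance: float = 0.0,  # hypothetical margin for run-to-run noise
) -> bool:
    """True when the frame-wise error exceeds the clip-wise error
    by more than the tolerance on the held-out dataset."""
    return frame_wise_mae > clip_wise_mae + tolerance


# Placeholder heart rates (bpm) standing in for a cross-dataset test split.
reference = [60.0, 72.0, 80.0]
frame_wise = [62.0, 70.0, 85.0]
clip_wise = [61.0, 73.0, 82.0]

challenged = claim_is_challenged(
    mean_absolute_error(frame_wise, reference),
    mean_absolute_error(clip_wise, reference),
)
```

A single comparison of this kind would not be conclusive on its own; repeating it across several lighting, camera, and demographic shifts is what the settling experiment calls for.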
Original abstract
Remote photoplethysmography (rPPG) estimates the blood volume pulse (BVP) signal from facial videos, enabling contact-free health monitoring. Conventional clip-wise approaches, which use video clips as input, require capturing over one hundred frames before inference, thus introducing several seconds of delay and hindering real-time use. Meanwhile, frame-wise approaches struggle to capture long-range temporal and periodic features of physiological rhythms, and therefore lead to reduced estimation accuracy. To overcome these issues, we propose StreamPPG, a unified architecture that enables low-latency frame-wise physiological signal estimation while achieving competitive accuracy compared with clip-wise approaches. StreamPPG is trained under a consistent privileged learning (CPL) strategy, which leverages ground-truth rPPG signals as privileged information to enhance the model's representation capability. Extensive experiments demonstrate that StreamPPG achieves state-of-the-art accuracy across multiple datasets while maintaining real-time throughput on edge devices.
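The latency argument in the abstract is simple arithmetic: a clip-wise model cannot emit anything until its input buffer is full. The sketch below works one example, assuming a 30 fps camera and a 160-frame clip. Both numbers are illustrative assumptions; the abstract says only "over one hundred frames" and "several seconds". The function name is hypothetical.

```python
def buffering_delay_seconds(frames_needed: int, fps: float) -> float:
    """Time spent capturing frames before the first inference can run."""
    if frames_needed < 1 or fps <= 0:
        raise ValueError("frames_needed must be >= 1 and fps must be > 0")
    return frames_needed / fps


# Assumed values: 30 fps camera, 160-frame clip for the clip-wise model.
clip_delay = buffering_delay_seconds(160, 30.0)   # roughly 5.3 seconds
frame_delay = buffering_delay_seconds(1, 30.0)    # roughly 0.033 seconds

print(f"clip-wise wait:  {clip_delay:.2f} s")
print(f"frame-wise wait: {frame_delay:.3f} s")
```

This counts capture time only. Per-frame compute time adds to both figures, which is why the throughput claim on edge devices matters alongside the buffering argument.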
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes StreamPPG, a unified frame-wise architecture for remote photoplethysmography (rPPG) that uses consistent privileged learning (CPL) to incorporate ground-truth BVP signals only at training time. This enables low-latency inference from individual frames while claiming to match or exceed the accuracy of conventional clip-wise methods. The abstract states that extensive experiments show state-of-the-art accuracy across multiple datasets together with real-time throughput on edge devices.
Significance. If the central claim holds, StreamPPG would address a practical bottleneck in contact-free physiological monitoring by delivering both low latency and competitive accuracy, potentially enabling real-time applications on resource-constrained devices. The CPL strategy is a standard privileged-information technique; its value here would lie in empirical demonstration that video-only features learned under this regime generalize without overfitting to training-domain video-BVP correlations.
major comments (2)
- [Experiments] Experiments section (and any associated tables/figures): the manuscript claims SOTA accuracy and real-time performance but the provided abstract supplies no quantitative numbers, baseline comparisons, error bars, dataset details, or ablation studies. Without these, the central claim that CPL produces video-only features whose accuracy matches clip-wise baselines cannot be evaluated.
- [Method] § on method / CPL strategy: the assumption that ground-truth rPPG signals used only at training produce representations that generalize to unseen test videos is load-bearing, yet no cross-dataset transfer results or ablation removing the privileged branch are described. If dataset-specific correlations between video and BVP are learned, the reported performance would not transfer.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on our manuscript. We address each major comment point by point below, clarifying the content of the Experiments and Method sections while outlining targeted revisions for improved clarity.
- Referee: [Experiments] Experiments section (and any associated tables/figures): the manuscript claims SOTA accuracy and real-time performance but the provided abstract supplies no quantitative numbers, baseline comparisons, error bars, dataset details, or ablation studies. Without these, the central claim that CPL produces video-only features whose accuracy matches clip-wise baselines cannot be evaluated.
  Authors: We agree that the abstract, as a concise summary, does not include specific quantitative numbers. However, the Experiments section of the manuscript contains detailed tables and figures reporting SOTA comparisons against multiple baselines, error bars across runs, full dataset specifications, and ablation studies on the CPL components. These results directly support the claim that video-only inference matches clip-wise accuracy. In the revision we will add a brief summary of key metrics (e.g., MAE and throughput) to the abstract for immediate accessibility.
  Revision: partial
- Referee: [Method] § on method / CPL strategy: the assumption that ground-truth rPPG signals used only at training produce representations that generalize to unseen test videos is load-bearing, yet no cross-dataset transfer results or ablation removing the privileged branch are described. If dataset-specific correlations between video and BVP are learned, the reported performance would not transfer.
  Authors: The manuscript already reports cross-dataset transfer experiments (training on one dataset and testing on others) as well as ablations that disable the privileged BVP branch at training time. These appear in the Experiments section and confirm that the learned video representations generalize without overfitting to training-domain video-BVP correlations. We will add explicit forward references from the Method section to these results to make the generalization evidence more prominent.
  Revision: no
Circularity Check
No circularity; derivation is self-contained empirical method
The paper describes a standard neural architecture for rPPG estimation trained with a privileged branch that receives ground-truth BVP signals exclusively at training time. The central claim is an empirical performance result (SOTA accuracy + real-time inference) obtained via supervised optimization on labeled datasets; no equation or derivation is shown to reduce by construction to its own fitted parameters or to a self-citation chain. The inference procedure is explicitly defined to operate without the privileged signal, making the generalization claim falsifiable by cross-dataset or ablation experiments rather than tautological.