Illumination-Robust Camera-Based Heart-Rate Estimation for Physiological Sensing in Robots

Torbjörn E. M. Nordling; Zhi Wei Xu

arxiv: 2606.12378 · v1 · pith:JU7BK5C7new · submitted 2026-06-10 · 💻 cs.CV · cs.AI

Pith reviewed 2026-06-27 09:41 UTC · model grok-4.3

classification 💻 cs.CV cs.AI

keywords: remote photoplethysmography · heart rate estimation · illumination robustness · transformer · rPPG · physiological sensing · robot vision

The pith

A spatial-temporal transformer with PRNet-based 3D face alignment, clip-level illumination augmentation, and a hybrid waveform-spectral loss reaches a best-run heart-rate error of 0.79 bpm MAE on video recorded at three illumination levels.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents an end-to-end framework for remote photoplethysmography that keeps heart-rate estimation accurate when lighting changes, a key requirement for cameras mounted on service or assistive robots. It integrates PRNet-based 3D face alignment to handle pose, clip-level illumination augmentation during training, a Residual Temporal Standardization Module, and a hybrid loss that balances a Soft-Shifted Pearson waveform term against a spectral Kullback-Leibler term weighted by a tunable β. On the authors' dataset, under a static all-level mix protocol covering three illumination levels, β=5 gives the best result among the tested settings: a best-run mean absolute error of 0.79 bpm and a correlation of 0.982. Relative to the PhysFormer baseline evaluated on the same data, this is a 93.6 % reduction in error.

Core claim

The paper claims that the spatial-temporal transformer, once it incorporates PRNet-based 3D face alignment, clip-level illumination augmentation, the Residual Temporal Standardization Module, and hybrid supervision with β set to 5, produces heart-rate estimates with a mean absolute error of 0.79 bpm and a correlation of 0.982 on the authors' static all-level mix protocol covering three illumination levels. Relative to the PhysFormer baseline evaluated on the same data, this corresponds to a 93.6 % reduction in error and an increase in correlation from 0.088 to 0.982.
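The relative figures in the claim allow a quick consistency check. The sketch below back-computes the baseline error implied by the 93.6 % reduction. The baseline MAE itself is not quoted on this page, so the result is an inference from rounded numbers and not a reported measurement.

```python
# Back-of-envelope check of the reported relative improvement.
# The baseline MAE is NOT quoted in the text; it is inferred here.
reported_mae_bpm = 0.79      # best-run MAE at beta = 5 (from the text)
reported_reduction = 0.936   # 93.6 % reduction vs. PhysFormer (from the text)

# reduction = 1 - mae / baseline  =>  baseline = mae / (1 - reduction)
implied_baseline_mae_bpm = reported_mae_bpm / (1.0 - reported_reduction)

print(f"implied PhysFormer MAE: {implied_baseline_mae_bpm:.1f} bpm")
```

Under these rounded inputs the implied baseline error is a little over 12 bpm, which fits a baseline whose correlation with the reference is only 0.088.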

What carries the argument

An end-to-end spatial-temporal transformer that combines PRNet-based 3D face alignment, clip-level illumination augmentation, the Residual Temporal Standardization Module, and controlled hybrid temporal-frequency supervision, in which the weight β balances the waveform and spectral losses.
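To make the hybrid supervision concrete, here is a minimal NumPy sketch of a loss of the form waveform term + β × spectral term. Several parts are assumptions and not taken from the paper: a plain negative Pearson loss stands in for the Soft-Shifted Pearson term (whose shift handling is not described here), the spectral target is a Gaussian over 1-bpm bins, and the names `neg_pearson_loss`, `spectral_kl_loss`, and `hybrid_loss` are hypothetical.

```python
import numpy as np

def neg_pearson_loss(pred, target):
    """1 - Pearson r between two waveforms (plain stand-in for the waveform term)."""
    p = pred - pred.mean()
    t = target - target.mean()
    r = float(p @ t) / (np.linalg.norm(p) * np.linalg.norm(t) + 1e-8)
    return 1.0 - r

def spectral_kl_loss(pred, hr_bpm, fs, bpm_grid=np.arange(40, 181), sigma_bpm=1.0):
    """KL(target || predicted) over heart-rate bins; grid and sigma are assumptions."""
    t = np.arange(len(pred)) / fs
    x = pred - pred.mean()
    # Power of the predicted waveform at each candidate heart rate.
    basis = np.exp(-2j * np.pi * (bpm_grid[:, None] / 60.0) * t[None, :])
    power = np.abs(basis @ x) ** 2
    q = power / (power.sum() + 1e-12)            # predicted distribution over bins
    p = np.exp(-0.5 * ((bpm_grid - hr_bpm) / sigma_bpm) ** 2)
    p = p / p.sum()                              # Gaussian target around the reference HR
    eps = 1e-12
    return float(np.sum(p * (np.log(p + eps) - np.log(q + eps))))

def hybrid_loss(pred, target, hr_bpm, fs, beta=5.0):
    """Waveform term plus beta-weighted spectral term (weighting form is an assumption)."""
    return neg_pearson_loss(pred, target) + beta * spectral_kl_loss(pred, hr_bpm, fs)

fs = 30.0
t = np.arange(300) / fs
wave = np.sin(2 * np.pi * 1.2 * t)               # synthetic 72 bpm pulse

print(hybrid_loss(wave, wave, hr_bpm=72.0, fs=fs))    # spectral label agrees with the waveform
print(hybrid_loss(wave, wave, hr_bpm=120.0, fs=fs))   # spectral label disagrees: higher loss
```

With β fixed, the waveform term rewards matching the pulse shape while the spectral term penalizes spectral mass away from the reference heart rate, so raising β shifts the emphasis toward getting the rate right.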

If this is right

Heart-rate estimation from robot-mounted cameras becomes usable across three distinct illumination levels when the hybrid loss is weighted at β=5.
The method reduces mean absolute error by 93.6 % and raises correlation from 0.088 to 0.982 relative to the PhysFormer baseline on the tested dataset.
Among the β values examined, performance is strongest at β=5, that is, when frequency-domain guidance receives five times the weight of the waveform loss.
The combination of 3D face alignment and clip-level illumination augmentation supports the reported robustness on the static protocol.
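As an illustration of what "clip-level" can mean for the illumination augmentation, the sketch below draws one contrast gain and one brightness offset per clip and applies them to every frame, so frame-to-frame variation is rescaled uniformly instead of being perturbed frame by frame. The function name, parameter ranges, and tensor layout are assumptions and are not taken from the paper.

```python
import numpy as np

def augment_clip_illumination(clip, rng, gain_range=(0.6, 1.4), bias_range=(-0.1, 0.1)):
    """Apply ONE random contrast gain and brightness offset to a whole clip.

    clip: float array of shape [T, H, W, C] with values in [0, 1].
    The ranges are illustrative assumptions.
    """
    gain = rng.uniform(*gain_range)      # contrast-like scaling about the clip mean
    bias = rng.uniform(*bias_range)      # brightness offset
    mean = clip.mean()
    out = (clip - mean) * gain + mean + bias
    return np.clip(out, 0.0, 1.0)

rng = np.random.default_rng(0)
T = 90
pulse = 0.01 * np.sin(2 * np.pi * 1.2 * np.arange(T) / 30.0)   # synthetic 72 bpm ripple
clip = 0.5 + pulse[:, None, None, None] + 0.02 * rng.standard_normal((T, 8, 8, 3))
aug = augment_clip_illumination(clip, rng)

# Frame-mean traces before and after augmentation stay linearly related.
corr = np.corrcoef(clip.mean(axis=(1, 2, 3)), aug.mean(axis=(1, 2, 3)))[0, 1]
print(aug.shape, round(float(corr), 3))
```

Because the transform is affine and shared across frames, the temporal trace of the clip is preserved up to scale and offset as long as no values saturate at 0 or 1.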

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

If the static-protocol results hold for moving robots, the same pipeline could support continuous physiological awareness during human-robot interaction in homes or care settings.
The hybrid loss structure might be adapted to estimate additional signals such as breathing rate by swapping the target frequency band.
Deployment on robots would require additional checks for motion blur and subject movement not present in the static test protocol.
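The band-swap idea can be illustrated at the readout stage. The sketch below picks the dominant spectral peak inside a chosen frequency band, so the same routine returns a heart rate or a breathing rate depending on the band. The band limits and the function name are assumptions for illustration; the paper does not report breathing-rate experiments.

```python
import numpy as np

def dominant_rate_per_min(signal, fs, band_hz):
    """Return the strongest frequency inside band_hz, in cycles per minute."""
    x = np.asarray(signal, dtype=float)
    x = x - x.mean()
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    power = np.abs(np.fft.rfft(x)) ** 2
    mask = (freqs >= band_hz[0]) & (freqs <= band_hz[1])
    return 60.0 * freqs[mask][np.argmax(power[mask])]

HEART_BAND_HZ = (0.7, 3.0)     # assumption: roughly 42 to 180 bpm
BREATH_BAND_HZ = (0.1, 0.5)    # assumption: roughly 6 to 30 breaths per minute

fs = 30.0
t = np.arange(1800) / fs       # 60 s of synthetic signal
sig = np.sin(2 * np.pi * 1.2 * t) + 0.5 * np.sin(2 * np.pi * 0.25 * t)

heart_rate = dominant_rate_per_min(sig, fs, HEART_BAND_HZ)
breath_rate = dominant_rate_per_min(sig, fs, BREATH_BAND_HZ)
print(heart_rate, breath_rate)
```

In a trained model the analogous change would be to the frequency bins over which the spectral loss target is defined, which is a hypothetical extension and not something the paper evaluates.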

Load-bearing premise

The assumption that the listed components will together deliver the reported accuracy under real robot deployment, with moving subjects and naturally changing light, and not only on the authors' static all-level mix protocol.

What would settle it

Running the trained estimator on video recorded by a moving robot camera in everyday indoor lighting with non-static human subjects and measuring whether the mean absolute error stays below 2 bpm and the correlation stays above 0.9.
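A minimal sketch of that check is given below: it computes MAE and Pearson correlation between predicted and reference heart rates and compares them against the two thresholds named above. The function names are hypothetical, and the example values are placeholders for illustration, not data from the paper.

```python
import numpy as np

def hr_metrics(pred_bpm, ref_bpm):
    """Mean absolute error (bpm) and Pearson correlation between two HR series."""
    pred = np.asarray(pred_bpm, dtype=float)
    ref = np.asarray(ref_bpm, dtype=float)
    mae = float(np.mean(np.abs(pred - ref)))
    r = float(np.corrcoef(pred, ref)[0, 1])
    return mae, r

def passes_deployment_check(mae, r, max_mae_bpm=2.0, min_r=0.9):
    """Thresholds follow the criterion stated in the text above."""
    return mae < max_mae_bpm and r > min_r

# Placeholder per-clip values, for illustration only.
ref_bpm = [60, 72, 85, 90, 110]
pred_bpm = [61, 71, 86, 92, 108]

mae, r = hr_metrics(pred_bpm, ref_bpm)
print(mae, passes_deployment_check(mae, r))
```

Reporting both numbers matters because a low MAE on a narrow heart-rate range can coexist with a weak correlation, and vice versa.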

Figures

Figures reproduced from arXiv: 2606.12378 by Torbjörn E. M. Nordling, Zhi Wei Xu.

**Figure 1.** Experimental setup for the non-contact physiological signal measurement protocol. Participants sit on the bike facing the camera mounted above the TV under controlled illumination. Reproduced from Wang (2020) [12].

Section 3.2, Preprocessing: Raw videos are first processed by a PRNet-based 3D face alignment pipeline. PRNet predicts dense facial geometry and provides semantically aligned facial regions through UV p…

**Figure 2.** Overall architecture of our estimator. The input is a PRNet-preprocessed facial video clip. RTSM is the key module in this work; it is inserted after the convolutional stem and before tube-token embedding to reduce brightness-induced temporal feature-statistic shifts. The PhysFormer-style temporal-difference Transformer block is repeated N times; in our implementation, N = 12 and the blocks are grouped i…

**Figure 3.** Visualization of the Residual Temporal Standardization Module. (a) It operates on the stem feature tensor X with shape [B, 96, 500, 16, 16]. (b) One temporal feature sequence x = X[b, c, :, h, w] is selected for visualization. (c) The temporal mean μ_T and standard deviation σ_T are computed over T = 500 frames for each channel and spatial location rather than globally over all channels or spatial positions…

**Figure 4.** Learned RTSM residual coefficient α.

Section 5, Discussion: The static all-level mix results show that illumination robustness depends on both data-side and objective-side design. PRNet preprocessing provides stable aligned facial inputs, which helps reduce spatial inconsistency before learning. Clip-level illumination augmentation expands the apparent brightness and contrast distribution while preserving temporal …
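The Figure 3 caption describes standardizing each temporal sequence X[b, c, :, h, w] with its own mean and standard deviation over T frames. A minimal NumPy sketch of that step follows. The residual combination `X + alpha * standardized` and the per-channel shape of `alpha` are assumptions, since the captions do not give the exact formula, and the function names are hypothetical.

```python
import numpy as np

def temporal_standardize(X, eps=1e-5):
    """Standardize along the time axis for every (batch, channel, h, w) position.

    X: array of shape [B, C, T, H, W].
    """
    mu_T = X.mean(axis=2, keepdims=True)       # [B, C, 1, H, W]
    sigma_T = X.std(axis=2, keepdims=True)     # [B, C, 1, H, W]
    return (X - mu_T) / (sigma_T + eps)

def rtsm_sketch(X, alpha, eps=1e-5):
    """Residual form (assumed): keep X and add an alpha-scaled standardized branch."""
    return X + alpha * temporal_standardize(X, eps)

rng = np.random.default_rng(0)
X = rng.standard_normal((2, 4, 50, 3, 3))      # small stand-in for [B, 96, 500, 16, 16]
alpha = np.full((1, 4, 1, 1, 1), 0.5)          # hypothetical per-channel coefficient

Z = temporal_standardize(X)
Z_bright = temporal_standardize(3.0 * X + 2.0) # same features under a gain and offset
print(rtsm_sketch(X, alpha).shape, float(np.abs(Z - Z_bright).max()))
```

The standardized branch is nearly unchanged when the input is multiplied by a gain and shifted by an offset, which is the property that would reduce brightness-induced shifts in temporal feature statistics.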

Original abstract

Physiological awareness is important for service, social, and assistive robots that interact with humans in everyday environments. Remote photoplethysmography (rPPG) enables non-contact heart-rate (HR) estimation from an RGB camera, making it a promising sensing modality for robot-mounted vision systems. However, illumination variation remains a major barrier to robust deployment. This paper presents an end-to-end spatial-temporal transformer framework for remote HR estimation on a new dataset with varied illumination. Our estimator integrates PRNet-based 3D face alignment, clip-level illumination augmentation, the Residual Temporal Standardization Module, and controlled hybrid temporal-frequency supervision. The training objective combines a Soft-Shifted Pearson waveform loss with a spectral Kullback-Leibler divergence loss, where a tuned weight ($\mathbf{\beta}$) controls the contribution of frequency-domain heart-rate guidance. Experiments on a static all-level mix protocol covering three illumination levels show that $\mathbf{\beta}=5$ provides the strongest result among the tested beta settings, achieving a best-run HR mean absolute error (MAE) of 0.79 bpm and an HR correlation of 0.982. Compared with the PhysFormer baseline evaluated on our dataset, our estimator reduces HR MAE by 93.6 %, while increasing HR correlation from 0.088 to 0.982, making it usable when illumination varies.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper gets strong numbers on a new illumination dataset with a hybrid loss and new module, but the robot deployment claim rests on static tests only.

The one thing to know is that this work reports a big drop in heart-rate error on illumination-varying video, down to 0.79 bpm MAE, but all the numbers come from a static protocol with no camera or subject motion.

What is new is the Residual Temporal Standardization Module, the authors' own dataset covering three illumination levels, and the hybrid loss that mixes a waveform term with a spectral KL term weighted by beta. They also use PRNet for 3D alignment and clip-level augmentation. On their static all-level mix protocol the combination beats the PhysFormer baseline by a large margin and reaches 0.982 correlation. That is a concrete data point for the illumination problem.

The soft spot is the gap between the title and the experiments. The abstract and title position the method for robot-mounted cameras in everyday settings, yet the evaluation stays fixed-camera and fixed-subject. There are no results with ego-motion, head translation, rotation, or distance changes that would appear once the camera is on a moving platform. Beta is also chosen after testing values on the same protocol, so the frequency term's contribution is data-dependent rather than fixed in advance. Without motion conditions the claimed usability for robots is not yet shown.

The paper is aimed at people working on non-contact physiological sensing for robots or under variable light. A reader who needs an illumination-focused dataset or wants to try the standardization module could get value from it. The work shows clear engagement with the rPPG literature and ships a new dataset, so it is worth sending for peer review even though the motion gap will need addressing.

Referee Report

2 major / 0 minor

Summary. The paper claims to present an end-to-end spatial-temporal transformer framework for remote photoplethysmography (rPPG) heart-rate estimation that is robust to illumination variation for use in robot physiological sensing. The method integrates PRNet-based 3D face alignment, clip-level illumination augmentation, a Residual Temporal Standardization Module, and controlled hybrid temporal-frequency supervision via a Soft-Shifted Pearson waveform loss combined with spectral Kullback-Leibler divergence, with a tuned weight β controlling the frequency term. On a new dataset under a static all-level mix protocol covering three illumination levels, β=5 yields a best-run MAE of 0.79 bpm and correlation of 0.982, reported as a 93.6% MAE reduction and correlation increase from 0.088 to 0.982 relative to the PhysFormer baseline evaluated on the same data, supporting the claim of usability when illumination varies.

Significance. If the reported gains are shown to hold under robot-mounted camera conditions, the work would provide a useful advance in illumination-robust non-contact HR sensing for service, social, and assistive robots by addressing a practical deployment barrier in dynamic environments.

major comments (2)

[Abstract] β is explicitly chosen as the value (β=5) that produces the strongest result among tested settings on the same static all-level mix evaluation protocol used to report the final MAE of 0.79 bpm and correlation of 0.982; this selection makes the contribution of the frequency-domain term data-dependent rather than independently derived.
[Abstract] The central claim is that the estimator is usable for robot physiological sensing under varying illumination, yet all quantitative results derive exclusively from a static all-level mix protocol; no results are provided under camera ego-motion, subject head translation/rotation, or changing subject-camera distance that would occur with a moving robot platform.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on the abstract and evaluation protocol. We address each major comment below and will incorporate revisions to improve clarity and precision.

Referee: [Abstract] β is explicitly chosen as the value (β=5) that produces the strongest result among tested settings on the same static all-level mix evaluation protocol used to report the final MAE of 0.79 bpm and correlation of 0.982; this selection makes the contribution of the frequency-domain term data-dependent rather than independently derived.

Authors: We acknowledge the validity of this observation. The value β=5 was selected because it produced the strongest result among the tested settings on the reported protocol. In the revised manuscript, we will expand the abstract to report performance for the full range of tested β values and explicitly note that β=5 corresponds to the best configuration observed on this dataset, thereby making the hyperparameter selection process transparent.

Revision: yes

Referee: [Abstract] The central claim is that the estimator is usable for robot physiological sensing under varying illumination, yet all quantitative results derive exclusively from a static all-level mix protocol; no results are provided under camera ego-motion, subject head translation/rotation, or changing subject-camera distance that would occur with a moving robot platform.

Authors: We agree that the reported results are confined to a static all-level mix protocol and do not include camera ego-motion or dynamic subject-camera geometry. The present work isolates illumination variation as the primary variable. We will revise the abstract and add a limitations paragraph to state that the method demonstrates illumination robustness under static conditions and to identify evaluation under robot-mounted dynamic conditions as an important direction for future work.

Revision: yes

Circularity Check

1 step flagged

β hyperparameter selected by performance on evaluation protocol

specific steps

fitted input called prediction [Abstract]
"Experiments on a static all-level mix protocol covering three illumination levels show that β=5 provides the strongest result among the tested beta settings, achieving a best-run HR mean absolute error (MAE) of 0.79 bpm and an HR correlation of 0.982."

The weight β is chosen as the value among tested settings that yields the strongest result on the reported evaluation protocol; the quoted performance numbers are therefore obtained by selecting the hyperparameter that optimizes the reported metrics rather than being an independent outcome of the method.

full rationale

The paper reports performance numbers obtained after selecting β=5 as the value that yields the strongest result on the static all-level mix protocol used for evaluation. This constitutes a fitted_input_called_prediction pattern because the reported MAE and correlation are the outcome of choosing the hyperparameter that optimizes those exact metrics. No other circularity patterns (self-definitional, self-citation load-bearing, etc.) are present in the provided text; the method derivation itself does not reduce to its inputs by construction.
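One common way to avoid this pattern is to choose β on a validation split and report the metric once on a held-out test split. The sketch below shows that separation. The candidate β values, the toy error curves, and the function names are hypothetical and stand in for real training and evaluation runs, which the paper's text here does not describe.

```python
from typing import Callable, Iterable, Tuple

def select_beta(betas: Iterable[float], val_mae: Callable[[float], float]) -> float:
    """Pick the beta with the lowest validation MAE (the test split is never consulted)."""
    return min(betas, key=val_mae)

def select_then_report(
    betas: Iterable[float],
    val_mae: Callable[[float], float],
    test_mae: Callable[[float], float],
) -> Tuple[float, float]:
    """Select on validation, then evaluate the chosen beta once on held-out test data."""
    best = select_beta(betas, val_mae)
    return best, test_mae(best)

# Toy stand-ins for "train with this beta, then evaluate on a split".
toy_val_mae = lambda b: (b - 5.0) ** 2 + 1.0
toy_test_mae = lambda b: (b - 4.0) ** 2 + 1.5

candidate_betas = [0.0, 1.0, 5.0, 10.0]     # hypothetical grid
best_beta, held_out_mae = select_then_report(candidate_betas, toy_val_mae, toy_test_mae)
print(best_beta, held_out_mae)
```

The held-out number can be worse than the best value seen during selection, which is exactly the gap a reader needs in order to judge how much of the reported gain comes from tuning.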

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 1 invented entity

The central claim rests on the effectiveness of the listed preprocessing and loss components plus the tuned β on the authors' dataset; no independent evidence is supplied for generalization beyond that dataset.

free parameters (1)

beta = 5
Tuned scalar that balances the Soft-Shifted Pearson waveform loss against the spectral Kullback-Leibler divergence loss; selected as the value giving the strongest result among tested settings.

axioms (1)

domain assumption: PRNet-based 3D face alignment remains accurate under the three illumination levels used in the static all-level mix protocol
The framework description states that PRNet is integrated for face alignment as the first processing step.

invented entities (1)

Residual Temporal Standardization Module (no independent evidence)
purpose: Standardizes temporal features to improve robustness to illumination variation
Introduced as a core component of the proposed estimator; no external validation of its necessity is provided.

pith-pipeline@v0.9.1-grok · 5778 in / 1653 out tokens · 24754 ms · 2026-06-27T09:41:22.498408+00:00 · methodology

Reference graph

Works this paper leans on

18 extracted references · 14 canonical work pages

[1] Celal Savur and Ferat Sahin. Survey on physiological computing in human-robot collaboration. Machines, 11(5):536, 2023. doi: 10.3390/machines11050536

[2] Wim Verkruysse, Lars O Svaasand, and J Stuart Nelson. Remote plethysmographic imaging using ambient light. Optics Express, 16(26):21434–21445, 2008. doi: 10.1364/OE.16.021434

[3] Ronny Stricker, Steffen Mueller, and Horst-Michael Gross. Non-contact video-based pulse rate measurement on a mobile service robot. In Proceedings of the 23rd IEEE International Symposium on Robot and Human Interactive Communication, pages 1056–1062, 2014. doi: 10.1109/ROMAN.2014.6926392

[4] Zitong Yu, Xiaobai Li, Xuesong Niu, Jingang Shi, and Guoying Zhao. AutoHR: A strong end-to-end baseline for remote heart rate measurement with neural searching. IEEE Signal Processing Letters, 27:1245–1249, 2020. doi: 10.1109/LSP.2020.3007086

[5] Zitong Yu, Yuming Shen, Jingang Shi, Hengshuang Zhao, Philip HS Torr, and Guoying Zhao. PhysFormer: facial video-based physiological measurement with temporal difference transformer. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 4186–4196, 2022. doi: 10.1109/CVPR52688.2022.00415

[6] Jun Seong Lee, Gyutae Hwang, Moonwook Ryu, and Sang Jun Lee. LSTC-rPPG: Long short-term convolutional network for remote photoplethysmography. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2023. doi: 10.1109/CVPRW59228.2023.00640

[7] Kang Cen, Chang-Hong Fu, and Hong Hong. Robust and generalizable heart rate estimation via deep learning for remote photoplethysmography in complex scenarios. arXiv preprint, 2025. doi: 10.48550/arXiv.2507.07795

[8] Wei Qian, Dan Guo, Jinxing Zhou, Bochao Zou, Zitong Yu, and Meng Wang. FreqPhys: Repurposing implicit physiological frequency prior for robust remote photoplethysmography. arXiv preprint, 2026. doi: 10.48550/arXiv.2604.00534

[9] Ming-Zher Poh, Daniel J. McDuff, and Rosalind W. Picard. Non-contact, automated cardiac pulse measurements using video imaging and blind source separation. Optics Express, 18(10):10762–10774, 2010. doi: 10.1364/OE.18.010762

[10] Gerard de Haan and Vincent Jeanne. Robust pulse rate from chrominance-based rPPG. IEEE Transactions on Biomedical Engineering, 60(10):2878–2886, 2013. doi: 10.1109/TBME.2013.2266196

[11] Wenjin Wang, Albertus C. den Brinker, Sander Stuijk, and Gerard de Haan. Algorithmic principles of remote PPG. IEEE Transactions on Biomedical Engineering, 64(7):1479–1491, 2017. doi: 10.1109/TBME.2016.2609282

[12] Chien-Chih Wang. Non-contact heart rate measurement based on facial videos. Master's thesis, National Cheng Kung University, No. 1, Dasyue Rd, East District, Tainan City, 701, 2020

[13] Ze Yang, Haofei Wang, and Feng Lu. Assessment of deep learning-based heart rate estimation using remote photoplethysmography under different illuminations. IEEE Transactions on Human-Machine Systems, 52(6):1236–1246, 2022. doi: 10.1109/THMS.2022.3207755

[14] Yao Feng, Fan Wu, Xiaohu Shao, Yanfeng Wang, and Xi Zhou. Joint 3D face reconstruction and dense alignment with position map regression network. In Proceedings of the European Conference on Computer Vision (ECCV), pages 534–551, 2018. doi: 10.1007/978-3-030-01264-9_32

[15] Yu-Chiao Wang. Comparative analysis of non-end-to-end and end-to-end deep learning models with 2D and 3D face alignment for remote heart rate estimation. Master's thesis, National Cheng Kung University, Tainan, Taiwan, June 2025

[16] Kegang Wang, Jiankai Tang, Yantao Wei, Mingxuan Liu, Xin Liu, and Yuntao Wang. A plug-and-play temporal normalization module for robust remote photoplethysmography. arXiv preprint, 2024. doi: 10.48550/arXiv.2411.15283
