arxiv: 2605.00882 · v1 · submitted 2026-04-26 · 💻 cs.CV

Recognition: unknown

Intervention-Based Self-Supervised Learning: A Causal Probe Paradigm for Remote Photoplethysmography

Zhiyi Niu , Xiaoguang Tu , Bo Zhao , Junzhe Cao , Dan Guo , Zitong Yu

Authors on Pith no claims yet

Pith reviewed 2026-05-09 20:18 UTC · model grok-4.3

classification 💻 cs.CV

keywords remote photoplethysmographyself-supervised learningcausal probingintervention-based learningchrominance editingphysiological signalheart rate estimation

0 comments

The pith

Causal probing via video interventions learns the true rPPG signal by verifying physical hypotheses instead of spurious correlations.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Self-supervised learning for remote photoplethysmography often gets trapped in learning the strongest periodic signals, which are usually motion or illumination noise rather than the weak true pulse signal. The paper introduces Physiological Causal Probing as a new paradigm that actively intervenes on the video using a hypothesized rPPG signal and checks whether the changes follow expected physical laws through nulling and equivariance tests. This matters because accurate non-contact heart rate monitoring requires models that work reliably across varied real-world conditions without expensive labels. The resulting Interv-rPPG framework achieves better generalization by focusing on causal structure.

Core claim

The paper shows that by hypothesizing the rPPG signal with PhysMambaFormer, editing the video's low-frequency chrominance components accordingly with a controllable editor, and validating via Falsifiability via Nulling and Axiomatic Equivariance, the model learns representations that capture the genuine physiological signal and resist artifacts.

What carries the argument

Physiological Causal Probing (PCP) paradigm, implemented through hypothesis-driven intervention on video chrominance to test the physical realism of the extracted rPPG signal.

If this is right

Enhances in-domain and cross-domain performance on challenging datasets such as VIPL-HR and MMPD.
Outperforms supervised baselines in complex cross-dataset scenarios.
Maintains competitiveness on clean datasets despite potential minor residual noise from editing.
Reduces sensitivity to motion and illumination artifacts as confirmed by nuisance diagnostic analysis.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The success in cross-dataset settings implies the method could lower the need for dataset-specific fine-tuning in practical rPPG applications.
By making the learning process falsifiable, it provides a template for improving other self-supervised methods that suffer from dominant noise signals.
Future work might explore combining this with other modalities to further strengthen the causal verification.

Load-bearing premise

The Controllable Physiological Signal Editor can execute precise interventions on low-frequency chrominance that isolate and alter solely the hypothesized rPPG signal without introducing additional noise that invalidates the falsifiability tests.

What would settle it

If nulling the hypothesized rPPG signal in the video does not result in the expected removal of the periodic pulse component when verified against independent measurements, or if the equivariance tests fail to hold under signal-preserving transformations, the central claim would be disproven.

read the original abstract

Remote Photoplethysmography (rPPG) enables convenient non-contact physiological measurement. Existing Self-Supervised Learning (SSL) methods commonly fall into a correlation trap: they tend to learn the most dominant periodic signals in the data, such as high-energy motion or illumination noise, rather than the faint, true rPPG signal, leading to poor model generalization. To address this, we propose a new SSL paradigm, Physiological Causal Probing (PCP), which treats the latent rPPG signal as the underlying physical source and the resulting pixel chrominance variations as its visual manifestation. Its core idea is to shift from passive correlation learning to active, precise intervention: it intervenes on the video based on a proposed rPPG hypothesis, and verifies whether the post-intervention changes match physical expectations. We propose the Interv-rPPG framework to implement PCP: an rPPG extractor named PhysMambaFormer hypothesizes the rPPG signal, while a Controllable Physiological Signal Editor conducts precise chrominance-domain interventions on videos based on this hypothesis. Interv-rPPG validates the physical realism of the hypothesis through `Falsifiability via Nulling' and `Axiomatic Equivariance'. Our editor achieves precise editing of the rPPG signal by intervening in the low-frequency chrominance components of the video. Our method improves both in-domain and cross-domain performance on challenging datasets such as VIPL-HR and MMPD. Furthermore, it surpasses the supervised baseline in complex cross-dataset settings, while remaining competitive on clean datasets where the intervention mechanism may introduce slight residual chrominance noise. Extensive experiments, including diagnostic analysis of nuisance sensitivity, demonstrate that the PCP paradigm effectively resists motion and illumination artifacts.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper shifts SSL for rPPG to active intervention with falsifiability checks via nulling and equivariance, which is a clear step beyond passive correlation methods, but the gains rest on an editor whose precision is not yet strongly evidenced.

read the letter

The main thing to know is that this work reframes self-supervised rPPG as Physiological Causal Probing: an extractor proposes a signal, an editor intervenes on low-frequency chrominance in the video, and the setup checks whether the changes obey physical expectations through nulling and axiomatic equivariance. That move from passive learning to testable intervention is the actual novelty, and it directly targets the problem of models latching onto motion or illumination instead of the faint rPPG source. The abstract reports better in-domain and cross-domain results on VIPL-HR and MMPD, plus outperformance of supervised baselines in harder cross-dataset transfers, which suggests the approach can improve robustness where standard SSL struggles. The diagnostic nuisance analysis is a useful addition for showing reduced sensitivity to artifacts. The framework is implemented with PhysMambaFormer and the Controllable Physiological Signal Editor, and the authors are upfront that clean datasets may see slight residual noise from the edits. The soft spot is exactly where the stress test points: everything hinges on the editor making precise, artifact-free changes that isolate only the hypothesized signal. If the interventions leak new residuals or fail to null cleanly, the falsifiability checks lose power and the observed gains could come from incidental regularization rather than causal verification. The circular dependency between extractor and editor is noted but not fully broken without strong evidence on intervention fidelity. This is for researchers working on video-based physiological sensing or SSL methods that need to handle real-world noise. Readers who want concrete ways to add falsifiability to SSL pipelines will find usable ideas here. The paper deserves a serious referee because the paradigm is new, the claims are falsifiable in principle, and the datasets are standard. I would send it to review with the expectation that the editor's accuracy and the source of the cross-domain gains get the closest look.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes Physiological Causal Probing (PCP), a self-supervised paradigm for remote photoplethysmography (rPPG) that shifts from passive correlation learning to active intervention. The Interv-rPPG framework uses PhysMambaFormer to hypothesize the latent rPPG signal and a Controllable Physiological Signal Editor to perform precise interventions on low-frequency chrominance components of input videos. Validation occurs via 'Falsifiability via Nulling' and 'Axiomatic Equivariance' checks that test whether post-intervention changes match physical expectations. The authors claim this yields improved in-domain and cross-domain performance on challenging datasets (VIPL-HR, MMPD), surpasses supervised baselines in cross-dataset settings, and remains competitive on clean data while resisting motion and illumination artifacts.

Significance. If the editor performs artifact-free interventions that isolate only the hypothesized rPPG signal, the PCP paradigm offers a falsifiable, causal alternative to standard SSL methods that often latch onto dominant noise. This could improve generalization in real-world rPPG applications such as non-contact vital-sign monitoring under motion or varying illumination. The explicit use of intervention-based verification and diagnostic nuisance-sensitivity analysis is a methodological strength that distinguishes the work from purely correlational SSL approaches.

major comments (2)

[Controllable Physiological Signal Editor and Experiments sections] The central performance claims (improved cross-domain results on VIPL-HR/MMPD and surpassing supervised baselines) rest on the Controllable Physiological Signal Editor executing precise, artifact-free interventions on low-frequency chrominance. The manuscript provides no quantitative fidelity metrics (e.g., residual power spectra after nulling, intervention error norms, or before/after chrominance difference statistics) to confirm that only the hypothesized rPPG component is modified. Without such evidence, the 'Falsifiability via Nulling' and 'Axiomatic Equivariance' diagnostics lose diagnostic power and observed gains could arise from incidental regularization rather than causal probing of the true physiological source.
[Method and Ablation studies] The abstract and method description assert that the editor 'achieves precise editing of the rPPG signal by intervening in the low-frequency chrominance components' without introducing residual noise that would undermine verification. This assumption is load-bearing for the entire PCP paradigm, yet the paper does not report ablation or diagnostic results that isolate the editor's contribution from the extractor or from standard data-augmentation effects.

minor comments (2)

[Abstract] The abstract states performance improvements and 'extensive experiments' but contains no numerical results, dataset-specific metrics, or baseline comparisons. Adding at least one key table or figure reference in the abstract would improve immediate readability.
[Introduction] Notation for the new entities (PhysMambaFormer, PCP, Interv-rPPG) is introduced without an explicit comparison table against prior rPPG SSL methods; a small related-work summary table would clarify novelty.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive feedback on our manuscript. We are pleased that the significance of the Physiological Causal Probing (PCP) paradigm as a falsifiable, causal alternative to standard SSL methods is recognized. Below, we provide point-by-point responses to the major comments, committing to revisions that include additional quantitative metrics and ablation studies to address the concerns about the Controllable Physiological Signal Editor.

read point-by-point responses

Referee: [Controllable Physiological Signal Editor and Experiments sections] The central performance claims (improved cross-domain results on VIPL-HR/MMPD and surpassing supervised baselines) rest on the Controllable Physiological Signal Editor executing precise, artifact-free interventions on low-frequency chrominance. The manuscript provides no quantitative fidelity metrics (e.g., residual power spectra after nulling, intervention error norms, or before/after chrominance difference statistics) to confirm that only the hypothesized rPPG component is modified. Without such evidence, the 'Falsifiability via Nulling' and 'Axiomatic Equivariance' diagnostics lose diagnostic power and observed gains could arise from incidental regularization rather than causal probing of the true physiological source.

Authors: We agree that providing quantitative fidelity metrics for the interventions performed by the Controllable Physiological Signal Editor is essential to substantiate the claims and strengthen the diagnostic power of the falsifiability checks. Although the current manuscript relies on the 'Falsifiability via Nulling' and 'Axiomatic Equivariance' validations along with nuisance sensitivity analysis to demonstrate that post-intervention changes align with physical expectations, we acknowledge the absence of explicit metrics such as residual power spectra or intervention error norms. In the revised version, we will compute and report these metrics, including residual power spectra after nulling, intervention error norms, and chrominance difference statistics before and after editing. This will help confirm that modifications are limited to the hypothesized rPPG component and mitigate concerns that gains stem from regularization effects. We will also discuss any observed residual noise, as noted in the manuscript for clean datasets. revision: yes
Referee: [Method and Ablation studies] The abstract and method description assert that the editor 'achieves precise editing of the rPPG signal by intervening in the low-frequency chrominance components' without introducing residual noise that would undermine verification. This assumption is load-bearing for the entire PCP paradigm, yet the paper does not report ablation or diagnostic results that isolate the editor's contribution from the extractor or from standard data-augmentation effects.

Authors: We recognize that isolating the contribution of the Controllable Physiological Signal Editor is important to validate its role in the PCP paradigm beyond the rPPG extractor or generic augmentations. The current work includes extensive experiments and diagnostic analysis of nuisance sensitivity to show resistance to motion and illumination artifacts, but does not present dedicated ablations for the editor. In the revision, we will add ablation studies that disable the editor or replace it with standard data augmentations, comparing performance to the full Interv-rPPG framework. This will clarify the editor's specific impact on learning the true physiological signal and address the load-bearing assumption of precise editing without undermining residual noise. revision: yes

Circularity Check

0 steps flagged

No circularity: PCP derives gains from external benchmarks via independent physical constraints

full rationale

The paper's chain proceeds from PhysMambaFormer hypothesis to editor-based chrominance intervention, followed by falsifiability/nulling and equivariance checks against stated physical expectations, then empirical evaluation on external datasets (VIPL-HR, MMPD). These steps do not reduce the final performance claims to the initial hypothesis by construction; the validation criteria invoke independent physical priors rather than self-referential consistency alone, and no equations, fitted parameters, or self-citations are shown to force the outcome. The framework remains self-contained against held-out benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 2 invented entities

Review based on abstract only; full details on parameters and assumptions unavailable. The core premise treats the latent rPPG signal as the physical source of observed chrominance changes.

axioms (1)

domain assumption The latent rPPG signal is the underlying physical source whose visual manifestation is pixel chrominance variation.
Stated as the core idea of PCP in the abstract.

invented entities (2)

PhysMambaFormer no independent evidence
purpose: rPPG signal extractor that generates the hypothesis for intervention
New model component introduced in the Interv-rPPG framework.
Controllable Physiological Signal Editor no independent evidence
purpose: Performs precise chrominance-domain interventions on video based on the hypothesis
New editing module that enables the causal probing.

pith-pipeline@v0.9.0 · 5630 in / 1349 out tokens · 41533 ms · 2026-05-09T20:18:17.638102+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

69 extracted references · 7 canonical work pages · 3 internal anchors

[1]

Bobbia, S., Macwan, R., Benezeth, Y., Man- souri, A., and Dubois, J. (2019). Unsupervised skin tissue segmentation for remote photo- plethysmography.Pattern recognition letters, 124:82–90

2019
[2]

L., Casado, C

Ca˜ nellas, M. L., Casado, C. ´A., Nguyen, L., and L´ opez, M. B. (2025). A self-supervised multimodal framework for 1d physiological data fusion in remote health monitoring.Informa- tion Fusion, page 103397

2025
[3]

Chan, M., Zhu, L., Vatanparvar, K., Gwak, M., Kuang, J., and Gao, A. (2023). Estimat- ing spo 2 with deep oxygen desaturations from facial video under various lighting conditions: A feasibility study. In2023 45th Annual Inter- national Conference of the IEEE Engineering in Medicine & Biology Society (EMBC), pages 1–5. IEEE

2023
[4]

Chen, M., Liao, X., and Wu, M. (2022). Pulseedit: Editing physiological signals in facial videos for privacy protection.IEEE Transac- tions on Information Forensics and Security, 17:457–471

2022
[5]

Chen, T., Lin, J., Yang, Z., Qing, C., Shi, Y., and Lin, L. (2025). Contrastive decoupled representation learning and regularization for speech-preserving facial expression manipula- tion.International Journal of Computer Vision, 133(7):3822–3838

2025
[6]

and McDuff, D

Chen, W. and McDuff, D. (2018). Deepphys: Video-based physiological measurement using convolutional attention networks. InProceed- ings of the european conference on computer vision (ECCV), pages 349–365

2018
[7]

and Picard, R

Chen, W. and Picard, R. W. (2017). Eliminat- ing physiological information from facial videos. In2017 12th IEEE international conference on automatic face & gesture recognition (FG 2017), pages 48–55. IEEE

2017
[8]

Chen, W., Yi, Z., Lim, L. J. R., Lim, R. Q. R., Zhang, A., Qian, Z., Huang, J., He, J., and Liu, B. (2024). Deep learn- ing and remote photoplethysmography pow- ered advancements in contactless physiological measurement.Frontiers in bioengineering and biotechnology, 12:1420100

2024
[9]

Chung, W.-H., Hsieh, C.-J., Liu, S.-H., and Hsu, C.-T. (2022). Domain generalized rppg network: Disentangled feature learning with domain permutation and domain augmenta- tion. InProceedings of the Asian Conference on Computer Vision, pages 807–823

2022
[10]

and Jeanne, V

De Haan, G. and Jeanne, V. (2013). Robust pulse rate from chrominance-based rppg.IEEE transactions on biomedical engi- neering, 60(10):2878–2886

2013
[11]

Du, J., Liu, S.-Q., Zhang, B., and Yuen, P. C. (2021). Weakly supervised rppg estimation for respiratory rate estimation. InProceedings of the IEEE/CVF International Conference on Computer Vision, pages 2391–2397

2021
[12]

and Stent, S

Gideon, J. and Stent, S. (2021). The way to my heart is through contrastive learning: Remote photoplethysmography from unlabelled video. InProceedings of the IEEE/CVF inter- national conference on computer vision, pages 3995–4004

2021
[13]

K., Bhattacharya, S., and Aishwarya, N

GS, Y. K., Bhattacharya, S., and Aishwarya, N. (2024). Remote photoplethysmography (rppg) for contactless blood oxygen saturation monitoring. In2024 IEEE International Con- ference on Electronics, Computing and Commu- nication Technologies (CONECCT), pages 1–6. IEEE

2024
[14]

and Dao, T

Gu, A. and Dao, T. (2024). Mamba: Linear- time sequence modeling with selective state spaces. InFirst conference on language model- ing

2024
[15]

He, K., Chen, X., Xie, S., Li, Y., Doll´ ar, P., and Girshick, R. (2022). Masked autoencoders are scalable vision learners. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 16000–16009

2022
[16]

He, K., Fan, H., Wu, Y., Xie, S., and Gir- shick, R. (2020). Momentum contrast for unsupervised visual representation learning. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 9729–9738

2020
[17]

Hsieh, C.-J., Chung, W.-H., and Hsu, C.- T. (2022). Augmentation of rppg benchmark datasets: Learning to remove and embed rppg signals via double cycle consistent learning from unpaired facial videos. InEuropean conference on computer vision, pages 372–387. Springer

2022
[18]

Li, L., Chen, C., Pan, L., Tai, Y., Zhang, J., and Xiang, Y. (2023). Hiding your sig- nals: A security analysis of ppg-based biometric authentication. InEuropean Symposium on Research in Computer Security, pages 183–202. Article Title23 Springer

2023
[19]

and Yin, L

Li, Z. and Yin, L. (2023). Contactless pulse estimation leveraging pseudo labels and self- supervision. InProceedings of the IEEE/CVF International Conference on Computer Vision, pages 20588–20597

2023
[20]

Liu, X., Fromm, J., Patel, S., and McDuff, D. (2020). Multi-task temporal shift atten- tion networks for on-device contactless vitals measurement.Advances in Neural Information Processing Systems, 33:19400–19411

2020
[21]

Liu, X., Zhang, Y., Yu, Z., Lu, H., Yue, H., and Yang, J. (2024). rppg-mae: Self- supervised pretraining with masked autoen- coders for remote physiological measurements. IEEE Transactions on Multimedia, 26:7278– 7293

2024
[22]

Decoupled Weight Decay Regularization

Loshchilov, I. and Hutter, F. (2017). Decou- pled weight decay regularization.arXiv preprint arXiv:1711.05101

work page internal anchor Pith review Pith/arXiv arXiv 2017
[23]

Lu, H., Han, H., and Zhou, S. K. (2021). Dual-gan: Joint bvp and noise modeling for remote physiological measurement. InProceed- ings of the IEEE/CVF conference on computer vision and pattern recognition, pages 12404– 12413

2021
[24]

MediaPipe: A Framework for Building Perception Pipelines

Lugaresi, C., Tang, J., Nash, H., McClana- han, C., Uboweja, E., Hays, M., Zhang, F., Chang, C.-L., Yong, M. G., Lee, J., et al. (2019). Mediapipe: A framework for build- ing perception pipelines.arXiv preprint arXiv:1906.08172

work page internal anchor Pith review arXiv 2019
[25]

Luo, C., Xie, Y., and Yu, Z. (2024). Phys- mamba: Efficient remote physiological measure- ment with slowfast temporal difference mamba. InChinese Conference on Biometric Recogni- tion, pages 248–259. Springer

2024
[26]

and Maaten, L

Misra, I. and Maaten, L. v. d. (2020). Self- supervised learning of pretext-invariant repre- sentations. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 6707–6717

2020
[27]

Niu, X., Han, H., Shan, S., and Chen, X. (2017). Continuous heart rate measurement from face: A robust rppg approach with distri- bution learning. In2017 IEEE international joint conference on biometrics (IJCB), pages 642–650. IEEE

2017
[28]

Niu, X., Han, H., Shan, S., and Chen, X. (2018). Vipl-hr: A multi-modal database for pulse estimation from less-constrained face video. InAsian conference on computer vision, pages 562–576. Springer

2018
[29]

Niu, X., Shan, S., Han, H., and Chen, X. (2019). Rhythmnet: End-to-end heart rate estimation from face via spatial-temporal rep- resentation.IEEE Transactions on Image Processing, 29:2409–2423

2019
[30]

C., Elgendi, M., Missale, G., and Menon, C

Ontiveros, R. C., Elgendi, M., Missale, G., and Menon, C. (2023). Evaluating rgb channels in remote photoplethysmography: a compara- tive study with contact-based ppg.Frontiers in Physiology, 14:1296277

2023
[31]

Park, S., Kim, B.-K., and Dong, S.-Y. (2022). Self-supervised rgb-nir fusion video vision transformer framework for rppg estima- tion.IEEE Transactions on Instrumentation and Measurement, 71:1–10

2022
[32]

Perez, E., Strub, F., De Vries, H., Dumoulin, V., and Courville, A. (2018). Film: Visual rea- soning with a general conditioning layer. In Proceedings of the AAAI conference on artificial intelligence, volume 32

2018
[33]

H., and Harris-Birtill, D

Pirzada, P., Wilde, A., Doherty, G. H., and Harris-Birtill, D. (2023). Remote photoplethys- mography (rppg): A state-of-the-art review. MedRxiv, pages 2023–10

2023
[34]

Pirzada, P., Wilde, A., and Harris-Birtill, D. (2024). Remote photoplethysmography for heart rate and blood oxygenation measurement: a review.IEEE Sensors Journal, 24(15):23436– 23453

2024
[35]

J., and Picard, R

Poh, M.-Z., McDuff, D. J., and Picard, R. W. (2010). Advancements in noncontact, mul- tiparameter physiological measurements using a webcam.IEEE transactions on biomedical engineering, 58(1):7–11

2010
[36]

Qian, W., Su, G., Guo, D., Zhou, J., Li, X., Hu, B., Tang, S., and Wang, M. (2025). Physdiff: Physiology-based dynamicity disen- tangled diffusion model for remote physiological measurement. InProceedings of the AAAI Con- ference on Artificial Intelligence, volume 39, pages 6568–6576

2025
[37]

Ronneberger, O., Fischer, P., and Brox, T. (2015). U-net: Convolutional networks for biomedical image segmentation. InInterna- tional Conference on Medical image computing and computer-assisted intervention, pages 234–

2015
[38]

24Article Title

Springer. 24Article Title
[39]

Ru, Y., Li, Q., Liu, Y., Wu, H., He, Z., and Sun, Z. (2026). Unveiling hidden psychologi- cal states: Synergistic fusion of amplified color and motion from video.Machine Intelligence Research, 23(2):396–408

2026
[40]

and Zhao, G

Savic, M. and Zhao, G. (2024). Rs-rppg: Robust self-supervised learning for rppg. In 2024 IEEE 18th International Conference on Automatic Face and Gesture Recognition (FG), pages 1–10. IEEE

2024
[41]

Song, R., Chen, H., Cheng, J., Li, C., Liu, Y., and Chen, X. (2021). Pulsegan: Learn- ing to generate realistic pulse waveforms in remote photoplethysmography.IEEE Jour- nal of Biomedical and Health Informatics, 25(5):1373–1384

2021
[42]

Speth, J., Vance, N., Flynn, P., and Cza- jka, A. (2023). Non-contrastive unsupervised learning of physiological signals from video. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 14464–14474

2023
[43]

Speth, J., Vance, N., Flynn, P., and Czajka, A. (2024). Sinc+: Adaptive camera-based vitals with unsupervised learning of periodic signals. arXiv preprint arXiv:2404.13449

work page arXiv 2024
[44]

Stricker, R., M¨ uller, S., and Gross, H.-M. (2014). Non-contact video-based pulse rate measurement on a mobile service robot. InThe 23rd IEEE International Symposium on Robot and Human Interactive Communication, pages 1056–1062. IEEE

2014
[45]

and Li, X

Sun, Z. and Li, X. (2022a). Contrast-phys: Unsupervised video-based remote physiologi- cal measurement via spatiotemporal contrast. InEuropean Conference on Computer Vision, pages 492–510. Springer
[46]

and Li, X

Sun, Z. and Li, X. (2022b). Privacy-phys: Facial video-based physiological modification for privacy protection.IEEE Signal Processing Letters, 29:1507–1511
[47]

and Li, X

Sun, Z. and Li, X. (2024). Contrast- phys+: Unsupervised and weakly-supervised video-based remote physiological measurement via spatiotemporal contrast.IEEE Transac- tions on Pattern Analysis and Machine Intelli- gence, 46(8):5835–5851

2024
[48]

Takeda, S., Akagi, Y., Okami, K., Isogai, M., and Kimata, H. (2019). Video magnification in the wild using fractional anisotropy in temporal distribution. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 1614–1622

2019
[49]

Tang, J., Chen, K., Wang, Y., Shi, Y., Patel, S., McDuff, D., and Liu, X. (2023). Mmpd: Multi-domain mobile video physiology dataset. In2023 45th Annual International Conference of the IEEE Engineering in Medicine & Biology Society (EMBC), pages 1–5. IEEE

2023
[50]

Tu, X., Niu, Z., Yin, J., Zhang, Y., Yang, M., Wei, L., Wang, Y., Fan, Z., and Zhao, J. (2025). Phys-edigan: A privacy-preserving method for editing physiological signals in facial videos. Pattern Recognition, page 111966

2025
[51]

O., and Nel- son, J

Verkruysse, W., Svaasand, L. O., and Nel- son, J. S. (2008). Remote plethysmographic imaging using ambient light.Optics express, 16(26):21434–21445

2008
[52]

Voelker, A. R. and Eliasmith, C. (2018). Improving spiking dynamical networks: Accu- rate delays, higher-order synapses, and time cells.Neural computation, 30(3):569–609

2018
[53]

Wang, H., Ahn, E., and Kim, J. (2022). Self- supervised representation learning framework for remote physiological measurement using spatiotemporal augmentation loss. InPro- ceedings of the AAAI Conference on Artificial Intelligence, volume 36, pages 2431–2439

2022
[54]

Wang, T., Lin, X., Xu, Y., Ye, Q., Guo, D., Escalera, S., Khoriba, G., and Yu, Z. (2026). Micro-gesture recognition: A comprehensive survey of datasets, methods, and challenges. Machine Intelligence Research, 23(2):308–330

2026
[55]

C., Stuijk, S., and De Haan, G

Wang, W., Den Brinker, A. C., Stuijk, S., and De Haan, G. (2016). Algorithmic principles of remote ppg.IEEE Transactions on Biomedical Engineering, 64(7):1479–1491

2016
[56]

Wang, W., Stuijk, S., and De Haan, G. (2015). A novel algorithm for remote pho- toplethysmography: Spatial subspace rotation. IEEE transactions on biomedical engineering, 63(9):1974–1984

2015
[57]

Wu, B., Yu, Z., Xie, Y., Liu, W., Luo, C., Liu, Y., and Goh, R. S. M. (2025). Semi-rppg: Semi-supervised remote physiological measure- ment with curriculum pseudo-labeling.IEEE Transactions on Instrumentation and Measure- ment

2025
[58]

Wu, H.-Y., Rubinstein, M., Shih, E., Gut- tag, J., Durand, F., and Freeman, W. (2012). Article Title25 Eulerian video magnification for revealing sub- tle changes in the world.ACM transactions on graphics (TOG), 31(4):1–8

2012
[59]

Xie, Y., Yu, Z., Wu, B., Xie, W., and Shen, L. (2024). Sfda-rppg: Source-free domain adap- tive remote physiological measurement with spatio-temporal consistency.arXiv preprint arXiv:2409.12040

work page arXiv 2024
[60]

Yu, Z., Li, X., Niu, X., Shi, J., and Zhao, G. (2020). Autohr: A strong end-to-end baseline for remote heart rate measurement with neu- ral searching.IEEE Signal Processing Letters, 27:1245–1249

2020
[61]

Yu, Z., Li, X., and Zhao, G. (2019a). Remote photoplethysmograph signal measurement from facial videos using spatio-temporal networks. arXiv preprint arXiv:1905.02419

work page arXiv 1905
[62]

Yu, Z., Peng, W., Li, X., Hong, X., and Zhao, G. (2019b). Remote heart rate measure- ment from highly compressed facial videos: an end-to-end deep learning solution with video enhancement. InProceedings of the IEEE/CVF international conference on computer vision, pages 151–160
[63]

Yu, Z., Shen, Y., Shi, J., Zhao, H., Cui, Y., Zhang, J., Torr, P., and Zhao, G. (2023). Physformer++: Facial video-based physiologi- cal measurement with slowfast temporal dif- ference transformer.International Journal of Computer Vision, 131(6):1307–1330

2023
[64]

H., and Zhao, G

Yu, Z., Shen, Y., Shi, J., Zhao, H., Torr, P. H., and Zhao, G. (2022). Physformer: Facial video- based physiological measurement with temporal difference transformer. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 4186–4196

2022
[65]

Zhang, J., Lin, X., Huang, J., Ye, S., Guo, X., Zhu, D., Hu, R., Guo, D., Liang, Y., Yu, Z., et al. (2026). Multimodal deception detec- tion: A survey.Machine Intelligence Research, 23(2):284–307

2026
[66]

A., Shechtman, E., and Wang, O

Zhang, R., Isola, P., Efros, A. A., Shechtman, E., and Wang, O. (2018). The unreasonable effectiveness of deep features as a perceptual metric. InProceedings of the IEEE conference on computer vision and pattern recognition, pages 586–595

2018
[67]

Zhao, B., Guo, D., Cao, J., Xu, Y., Zou, B., Tan, T., Sun, Y., and Yu, Z. (2025). Phase-net: Physics-grounded harmonic attention system for efficient remote photoplethysmography mea- surement.arXiv preprint arXiv:2509.24850

work page arXiv 2025
[68]

Zhou, T., Paruchuri, A., Spjut, J., and Ak¸ sit, K. (2025). Editing physiological signals in videos using latent representations.arXiv preprint arXiv:2509.25348

work page internal anchor Pith review Pith/arXiv arXiv 2025
[69]

Zhu, J.-Y., Park, T., Isola, P., and Efros, A. A. (2017). Unpaired image-to-image transla- tion using cycle-consistent adversarial networks. InProceedings of the IEEE international con- ference on computer vision, pages 2223–2232

2017