pith. machine review for the scientific record. sign in

arxiv: 2605.00882 · v1 · submitted 2026-04-26 · 💻 cs.CV

Recognition: unknown

Intervention-Based Self-Supervised Learning: A Causal Probe Paradigm for Remote Photoplethysmography

Authors on Pith no claims yet

Pith reviewed 2026-05-09 20:18 UTC · model grok-4.3

classification 💻 cs.CV
keywords remote photoplethysmographyself-supervised learningcausal probingintervention-based learningchrominance editingphysiological signalheart rate estimation
0
0 comments X

The pith

Causal probing via video interventions learns the true rPPG signal by verifying physical hypotheses instead of spurious correlations.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Self-supervised learning for remote photoplethysmography often gets trapped in learning the strongest periodic signals, which are usually motion or illumination noise rather than the weak true pulse signal. The paper introduces Physiological Causal Probing as a new paradigm that actively intervenes on the video using a hypothesized rPPG signal and checks whether the changes follow expected physical laws through nulling and equivariance tests. This matters because accurate non-contact heart rate monitoring requires models that work reliably across varied real-world conditions without expensive labels. The resulting Interv-rPPG framework achieves better generalization by focusing on causal structure.

Core claim

The paper shows that by hypothesizing the rPPG signal with PhysMambaFormer, editing the video's low-frequency chrominance components accordingly with a controllable editor, and validating via Falsifiability via Nulling and Axiomatic Equivariance, the model learns representations that capture the genuine physiological signal and resist artifacts.

What carries the argument

Physiological Causal Probing (PCP) paradigm, implemented through hypothesis-driven intervention on video chrominance to test the physical realism of the extracted rPPG signal.

If this is right

  • Enhances in-domain and cross-domain performance on challenging datasets such as VIPL-HR and MMPD.
  • Outperforms supervised baselines in complex cross-dataset scenarios.
  • Maintains competitiveness on clean datasets despite potential minor residual noise from editing.
  • Reduces sensitivity to motion and illumination artifacts as confirmed by nuisance diagnostic analysis.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The success in cross-dataset settings implies the method could lower the need for dataset-specific fine-tuning in practical rPPG applications.
  • By making the learning process falsifiable, it provides a template for improving other self-supervised methods that suffer from dominant noise signals.
  • Future work might explore combining this with other modalities to further strengthen the causal verification.

Load-bearing premise

The Controllable Physiological Signal Editor can execute precise interventions on low-frequency chrominance that isolate and alter solely the hypothesized rPPG signal without introducing additional noise that invalidates the falsifiability tests.

What would settle it

If nulling the hypothesized rPPG signal in the video does not result in the expected removal of the periodic pulse component when verified against independent measurements, or if the equivariance tests fail to hold under signal-preserving transformations, the central claim would be disproven.

read the original abstract

Remote Photoplethysmography (rPPG) enables convenient non-contact physiological measurement. Existing Self-Supervised Learning (SSL) methods commonly fall into a correlation trap: they tend to learn the most dominant periodic signals in the data, such as high-energy motion or illumination noise, rather than the faint, true rPPG signal, leading to poor model generalization. To address this, we propose a new SSL paradigm, Physiological Causal Probing (PCP), which treats the latent rPPG signal as the underlying physical source and the resulting pixel chrominance variations as its visual manifestation. Its core idea is to shift from passive correlation learning to active, precise intervention: it intervenes on the video based on a proposed rPPG hypothesis, and verifies whether the post-intervention changes match physical expectations. We propose the Interv-rPPG framework to implement PCP: an rPPG extractor named PhysMambaFormer hypothesizes the rPPG signal, while a Controllable Physiological Signal Editor conducts precise chrominance-domain interventions on videos based on this hypothesis. Interv-rPPG validates the physical realism of the hypothesis through `Falsifiability via Nulling' and `Axiomatic Equivariance'. Our editor achieves precise editing of the rPPG signal by intervening in the low-frequency chrominance components of the video. Our method improves both in-domain and cross-domain performance on challenging datasets such as VIPL-HR and MMPD. Furthermore, it surpasses the supervised baseline in complex cross-dataset settings, while remaining competitive on clean datasets where the intervention mechanism may introduce slight residual chrominance noise. Extensive experiments, including diagnostic analysis of nuisance sensitivity, demonstrate that the PCP paradigm effectively resists motion and illumination artifacts.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes Physiological Causal Probing (PCP), a self-supervised paradigm for remote photoplethysmography (rPPG) that shifts from passive correlation learning to active intervention. The Interv-rPPG framework uses PhysMambaFormer to hypothesize the latent rPPG signal and a Controllable Physiological Signal Editor to perform precise interventions on low-frequency chrominance components of input videos. Validation occurs via 'Falsifiability via Nulling' and 'Axiomatic Equivariance' checks that test whether post-intervention changes match physical expectations. The authors claim this yields improved in-domain and cross-domain performance on challenging datasets (VIPL-HR, MMPD), surpasses supervised baselines in cross-dataset settings, and remains competitive on clean data while resisting motion and illumination artifacts.

Significance. If the editor performs artifact-free interventions that isolate only the hypothesized rPPG signal, the PCP paradigm offers a falsifiable, causal alternative to standard SSL methods that often latch onto dominant noise. This could improve generalization in real-world rPPG applications such as non-contact vital-sign monitoring under motion or varying illumination. The explicit use of intervention-based verification and diagnostic nuisance-sensitivity analysis is a methodological strength that distinguishes the work from purely correlational SSL approaches.

major comments (2)
  1. [Controllable Physiological Signal Editor and Experiments sections] The central performance claims (improved cross-domain results on VIPL-HR/MMPD and surpassing supervised baselines) rest on the Controllable Physiological Signal Editor executing precise, artifact-free interventions on low-frequency chrominance. The manuscript provides no quantitative fidelity metrics (e.g., residual power spectra after nulling, intervention error norms, or before/after chrominance difference statistics) to confirm that only the hypothesized rPPG component is modified. Without such evidence, the 'Falsifiability via Nulling' and 'Axiomatic Equivariance' diagnostics lose diagnostic power and observed gains could arise from incidental regularization rather than causal probing of the true physiological source.
  2. [Method and Ablation studies] The abstract and method description assert that the editor 'achieves precise editing of the rPPG signal by intervening in the low-frequency chrominance components' without introducing residual noise that would undermine verification. This assumption is load-bearing for the entire PCP paradigm, yet the paper does not report ablation or diagnostic results that isolate the editor's contribution from the extractor or from standard data-augmentation effects.
minor comments (2)
  1. [Abstract] The abstract states performance improvements and 'extensive experiments' but contains no numerical results, dataset-specific metrics, or baseline comparisons. Adding at least one key table or figure reference in the abstract would improve immediate readability.
  2. [Introduction] Notation for the new entities (PhysMambaFormer, PCP, Interv-rPPG) is introduced without an explicit comparison table against prior rPPG SSL methods; a small related-work summary table would clarify novelty.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive feedback on our manuscript. We are pleased that the significance of the Physiological Causal Probing (PCP) paradigm as a falsifiable, causal alternative to standard SSL methods is recognized. Below, we provide point-by-point responses to the major comments, committing to revisions that include additional quantitative metrics and ablation studies to address the concerns about the Controllable Physiological Signal Editor.

read point-by-point responses
  1. Referee: [Controllable Physiological Signal Editor and Experiments sections] The central performance claims (improved cross-domain results on VIPL-HR/MMPD and surpassing supervised baselines) rest on the Controllable Physiological Signal Editor executing precise, artifact-free interventions on low-frequency chrominance. The manuscript provides no quantitative fidelity metrics (e.g., residual power spectra after nulling, intervention error norms, or before/after chrominance difference statistics) to confirm that only the hypothesized rPPG component is modified. Without such evidence, the 'Falsifiability via Nulling' and 'Axiomatic Equivariance' diagnostics lose diagnostic power and observed gains could arise from incidental regularization rather than causal probing of the true physiological source.

    Authors: We agree that providing quantitative fidelity metrics for the interventions performed by the Controllable Physiological Signal Editor is essential to substantiate the claims and strengthen the diagnostic power of the falsifiability checks. Although the current manuscript relies on the 'Falsifiability via Nulling' and 'Axiomatic Equivariance' validations along with nuisance sensitivity analysis to demonstrate that post-intervention changes align with physical expectations, we acknowledge the absence of explicit metrics such as residual power spectra or intervention error norms. In the revised version, we will compute and report these metrics, including residual power spectra after nulling, intervention error norms, and chrominance difference statistics before and after editing. This will help confirm that modifications are limited to the hypothesized rPPG component and mitigate concerns that gains stem from regularization effects. We will also discuss any observed residual noise, as noted in the manuscript for clean datasets. revision: yes

  2. Referee: [Method and Ablation studies] The abstract and method description assert that the editor 'achieves precise editing of the rPPG signal by intervening in the low-frequency chrominance components' without introducing residual noise that would undermine verification. This assumption is load-bearing for the entire PCP paradigm, yet the paper does not report ablation or diagnostic results that isolate the editor's contribution from the extractor or from standard data-augmentation effects.

    Authors: We recognize that isolating the contribution of the Controllable Physiological Signal Editor is important to validate its role in the PCP paradigm beyond the rPPG extractor or generic augmentations. The current work includes extensive experiments and diagnostic analysis of nuisance sensitivity to show resistance to motion and illumination artifacts, but does not present dedicated ablations for the editor. In the revision, we will add ablation studies that disable the editor or replace it with standard data augmentations, comparing performance to the full Interv-rPPG framework. This will clarify the editor's specific impact on learning the true physiological signal and address the load-bearing assumption of precise editing without undermining residual noise. revision: yes

Circularity Check

0 steps flagged

No circularity: PCP derives gains from external benchmarks via independent physical constraints

full rationale

The paper's chain proceeds from PhysMambaFormer hypothesis to editor-based chrominance intervention, followed by falsifiability/nulling and equivariance checks against stated physical expectations, then empirical evaluation on external datasets (VIPL-HR, MMPD). These steps do not reduce the final performance claims to the initial hypothesis by construction; the validation criteria invoke independent physical priors rather than self-referential consistency alone, and no equations, fitted parameters, or self-citations are shown to force the outcome. The framework remains self-contained against held-out benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 2 invented entities

Review based on abstract only; full details on parameters and assumptions unavailable. The core premise treats the latent rPPG signal as the physical source of observed chrominance changes.

axioms (1)
  • domain assumption The latent rPPG signal is the underlying physical source whose visual manifestation is pixel chrominance variation.
    Stated as the core idea of PCP in the abstract.
invented entities (2)
  • PhysMambaFormer no independent evidence
    purpose: rPPG signal extractor that generates the hypothesis for intervention
    New model component introduced in the Interv-rPPG framework.
  • Controllable Physiological Signal Editor no independent evidence
    purpose: Performs precise chrominance-domain interventions on video based on the hypothesis
    New editing module that enables the causal probing.

pith-pipeline@v0.9.0 · 5630 in / 1349 out tokens · 41533 ms · 2026-05-09T20:18:17.638102+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

69 extracted references · 7 canonical work pages · 3 internal anchors

  1. [1]

    Bobbia, S., Macwan, R., Benezeth, Y., Man- souri, A., and Dubois, J. (2019). Unsupervised skin tissue segmentation for remote photo- plethysmography.Pattern recognition letters, 124:82–90

  2. [2]

    L., Casado, C

    Ca˜ nellas, M. L., Casado, C. ´A., Nguyen, L., and L´ opez, M. B. (2025). A self-supervised multimodal framework for 1d physiological data fusion in remote health monitoring.Informa- tion Fusion, page 103397

  3. [3]

    Chan, M., Zhu, L., Vatanparvar, K., Gwak, M., Kuang, J., and Gao, A. (2023). Estimat- ing spo 2 with deep oxygen desaturations from facial video under various lighting conditions: A feasibility study. In2023 45th Annual Inter- national Conference of the IEEE Engineering in Medicine & Biology Society (EMBC), pages 1–5. IEEE

  4. [4]

    Chen, M., Liao, X., and Wu, M. (2022). Pulseedit: Editing physiological signals in facial videos for privacy protection.IEEE Transac- tions on Information Forensics and Security, 17:457–471

  5. [5]

    Chen, T., Lin, J., Yang, Z., Qing, C., Shi, Y., and Lin, L. (2025). Contrastive decoupled representation learning and regularization for speech-preserving facial expression manipula- tion.International Journal of Computer Vision, 133(7):3822–3838

  6. [6]

    and McDuff, D

    Chen, W. and McDuff, D. (2018). Deepphys: Video-based physiological measurement using convolutional attention networks. InProceed- ings of the european conference on computer vision (ECCV), pages 349–365

  7. [7]

    and Picard, R

    Chen, W. and Picard, R. W. (2017). Eliminat- ing physiological information from facial videos. In2017 12th IEEE international conference on automatic face & gesture recognition (FG 2017), pages 48–55. IEEE

  8. [8]

    Chen, W., Yi, Z., Lim, L. J. R., Lim, R. Q. R., Zhang, A., Qian, Z., Huang, J., He, J., and Liu, B. (2024). Deep learn- ing and remote photoplethysmography pow- ered advancements in contactless physiological measurement.Frontiers in bioengineering and biotechnology, 12:1420100

  9. [9]

    Chung, W.-H., Hsieh, C.-J., Liu, S.-H., and Hsu, C.-T. (2022). Domain generalized rppg network: Disentangled feature learning with domain permutation and domain augmenta- tion. InProceedings of the Asian Conference on Computer Vision, pages 807–823

  10. [10]

    and Jeanne, V

    De Haan, G. and Jeanne, V. (2013). Robust pulse rate from chrominance-based rppg.IEEE transactions on biomedical engi- neering, 60(10):2878–2886

  11. [11]

    Du, J., Liu, S.-Q., Zhang, B., and Yuen, P. C. (2021). Weakly supervised rppg estimation for respiratory rate estimation. InProceedings of the IEEE/CVF International Conference on Computer Vision, pages 2391–2397

  12. [12]

    and Stent, S

    Gideon, J. and Stent, S. (2021). The way to my heart is through contrastive learning: Remote photoplethysmography from unlabelled video. InProceedings of the IEEE/CVF inter- national conference on computer vision, pages 3995–4004

  13. [13]

    K., Bhattacharya, S., and Aishwarya, N

    GS, Y. K., Bhattacharya, S., and Aishwarya, N. (2024). Remote photoplethysmography (rppg) for contactless blood oxygen saturation monitoring. In2024 IEEE International Con- ference on Electronics, Computing and Commu- nication Technologies (CONECCT), pages 1–6. IEEE

  14. [14]

    and Dao, T

    Gu, A. and Dao, T. (2024). Mamba: Linear- time sequence modeling with selective state spaces. InFirst conference on language model- ing

  15. [15]

    He, K., Chen, X., Xie, S., Li, Y., Doll´ ar, P., and Girshick, R. (2022). Masked autoencoders are scalable vision learners. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 16000–16009

  16. [16]

    He, K., Fan, H., Wu, Y., Xie, S., and Gir- shick, R. (2020). Momentum contrast for unsupervised visual representation learning. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 9729–9738

  17. [17]

    Hsieh, C.-J., Chung, W.-H., and Hsu, C.- T. (2022). Augmentation of rppg benchmark datasets: Learning to remove and embed rppg signals via double cycle consistent learning from unpaired facial videos. InEuropean conference on computer vision, pages 372–387. Springer

  18. [18]

    Li, L., Chen, C., Pan, L., Tai, Y., Zhang, J., and Xiang, Y. (2023). Hiding your sig- nals: A security analysis of ppg-based biometric authentication. InEuropean Symposium on Research in Computer Security, pages 183–202. Article Title23 Springer

  19. [19]

    and Yin, L

    Li, Z. and Yin, L. (2023). Contactless pulse estimation leveraging pseudo labels and self- supervision. InProceedings of the IEEE/CVF International Conference on Computer Vision, pages 20588–20597

  20. [20]

    Liu, X., Fromm, J., Patel, S., and McDuff, D. (2020). Multi-task temporal shift atten- tion networks for on-device contactless vitals measurement.Advances in Neural Information Processing Systems, 33:19400–19411

  21. [21]

    Liu, X., Zhang, Y., Yu, Z., Lu, H., Yue, H., and Yang, J. (2024). rppg-mae: Self- supervised pretraining with masked autoen- coders for remote physiological measurements. IEEE Transactions on Multimedia, 26:7278– 7293

  22. [22]

    Decoupled Weight Decay Regularization

    Loshchilov, I. and Hutter, F. (2017). Decou- pled weight decay regularization.arXiv preprint arXiv:1711.05101

  23. [23]

    Lu, H., Han, H., and Zhou, S. K. (2021). Dual-gan: Joint bvp and noise modeling for remote physiological measurement. InProceed- ings of the IEEE/CVF conference on computer vision and pattern recognition, pages 12404– 12413

  24. [24]

    MediaPipe: A Framework for Building Perception Pipelines

    Lugaresi, C., Tang, J., Nash, H., McClana- han, C., Uboweja, E., Hays, M., Zhang, F., Chang, C.-L., Yong, M. G., Lee, J., et al. (2019). Mediapipe: A framework for build- ing perception pipelines.arXiv preprint arXiv:1906.08172

  25. [25]

    Luo, C., Xie, Y., and Yu, Z. (2024). Phys- mamba: Efficient remote physiological measure- ment with slowfast temporal difference mamba. InChinese Conference on Biometric Recogni- tion, pages 248–259. Springer

  26. [26]

    and Maaten, L

    Misra, I. and Maaten, L. v. d. (2020). Self- supervised learning of pretext-invariant repre- sentations. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 6707–6717

  27. [27]

    Niu, X., Han, H., Shan, S., and Chen, X. (2017). Continuous heart rate measurement from face: A robust rppg approach with distri- bution learning. In2017 IEEE international joint conference on biometrics (IJCB), pages 642–650. IEEE

  28. [28]

    Niu, X., Han, H., Shan, S., and Chen, X. (2018). Vipl-hr: A multi-modal database for pulse estimation from less-constrained face video. InAsian conference on computer vision, pages 562–576. Springer

  29. [29]

    Niu, X., Shan, S., Han, H., and Chen, X. (2019). Rhythmnet: End-to-end heart rate estimation from face via spatial-temporal rep- resentation.IEEE Transactions on Image Processing, 29:2409–2423

  30. [30]

    C., Elgendi, M., Missale, G., and Menon, C

    Ontiveros, R. C., Elgendi, M., Missale, G., and Menon, C. (2023). Evaluating rgb channels in remote photoplethysmography: a compara- tive study with contact-based ppg.Frontiers in Physiology, 14:1296277

  31. [31]

    Park, S., Kim, B.-K., and Dong, S.-Y. (2022). Self-supervised rgb-nir fusion video vision transformer framework for rppg estima- tion.IEEE Transactions on Instrumentation and Measurement, 71:1–10

  32. [32]

    Perez, E., Strub, F., De Vries, H., Dumoulin, V., and Courville, A. (2018). Film: Visual rea- soning with a general conditioning layer. In Proceedings of the AAAI conference on artificial intelligence, volume 32

  33. [33]

    H., and Harris-Birtill, D

    Pirzada, P., Wilde, A., Doherty, G. H., and Harris-Birtill, D. (2023). Remote photoplethys- mography (rppg): A state-of-the-art review. MedRxiv, pages 2023–10

  34. [34]

    Pirzada, P., Wilde, A., and Harris-Birtill, D. (2024). Remote photoplethysmography for heart rate and blood oxygenation measurement: a review.IEEE Sensors Journal, 24(15):23436– 23453

  35. [35]

    J., and Picard, R

    Poh, M.-Z., McDuff, D. J., and Picard, R. W. (2010). Advancements in noncontact, mul- tiparameter physiological measurements using a webcam.IEEE transactions on biomedical engineering, 58(1):7–11

  36. [36]

    Qian, W., Su, G., Guo, D., Zhou, J., Li, X., Hu, B., Tang, S., and Wang, M. (2025). Physdiff: Physiology-based dynamicity disen- tangled diffusion model for remote physiological measurement. InProceedings of the AAAI Con- ference on Artificial Intelligence, volume 39, pages 6568–6576

  37. [37]

    Ronneberger, O., Fischer, P., and Brox, T. (2015). U-net: Convolutional networks for biomedical image segmentation. InInterna- tional Conference on Medical image computing and computer-assisted intervention, pages 234–

  38. [38]

    24Article Title

    Springer. 24Article Title

  39. [39]

    Ru, Y., Li, Q., Liu, Y., Wu, H., He, Z., and Sun, Z. (2026). Unveiling hidden psychologi- cal states: Synergistic fusion of amplified color and motion from video.Machine Intelligence Research, 23(2):396–408

  40. [40]

    and Zhao, G

    Savic, M. and Zhao, G. (2024). Rs-rppg: Robust self-supervised learning for rppg. In 2024 IEEE 18th International Conference on Automatic Face and Gesture Recognition (FG), pages 1–10. IEEE

  41. [41]

    Song, R., Chen, H., Cheng, J., Li, C., Liu, Y., and Chen, X. (2021). Pulsegan: Learn- ing to generate realistic pulse waveforms in remote photoplethysmography.IEEE Jour- nal of Biomedical and Health Informatics, 25(5):1373–1384

  42. [42]

    Speth, J., Vance, N., Flynn, P., and Cza- jka, A. (2023). Non-contrastive unsupervised learning of physiological signals from video. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 14464–14474

  43. [43]

    Speth, J., Vance, N., Flynn, P., and Czajka, A. (2024). Sinc+: Adaptive camera-based vitals with unsupervised learning of periodic signals. arXiv preprint arXiv:2404.13449

  44. [44]

    Stricker, R., M¨ uller, S., and Gross, H.-M. (2014). Non-contact video-based pulse rate measurement on a mobile service robot. InThe 23rd IEEE International Symposium on Robot and Human Interactive Communication, pages 1056–1062. IEEE

  45. [45]

    and Li, X

    Sun, Z. and Li, X. (2022a). Contrast-phys: Unsupervised video-based remote physiologi- cal measurement via spatiotemporal contrast. InEuropean Conference on Computer Vision, pages 492–510. Springer

  46. [46]

    and Li, X

    Sun, Z. and Li, X. (2022b). Privacy-phys: Facial video-based physiological modification for privacy protection.IEEE Signal Processing Letters, 29:1507–1511

  47. [47]

    and Li, X

    Sun, Z. and Li, X. (2024). Contrast- phys+: Unsupervised and weakly-supervised video-based remote physiological measurement via spatiotemporal contrast.IEEE Transac- tions on Pattern Analysis and Machine Intelli- gence, 46(8):5835–5851

  48. [48]

    Takeda, S., Akagi, Y., Okami, K., Isogai, M., and Kimata, H. (2019). Video magnification in the wild using fractional anisotropy in temporal distribution. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 1614–1622

  49. [49]

    Tang, J., Chen, K., Wang, Y., Shi, Y., Patel, S., McDuff, D., and Liu, X. (2023). Mmpd: Multi-domain mobile video physiology dataset. In2023 45th Annual International Conference of the IEEE Engineering in Medicine & Biology Society (EMBC), pages 1–5. IEEE

  50. [50]

    Tu, X., Niu, Z., Yin, J., Zhang, Y., Yang, M., Wei, L., Wang, Y., Fan, Z., and Zhao, J. (2025). Phys-edigan: A privacy-preserving method for editing physiological signals in facial videos. Pattern Recognition, page 111966

  51. [51]

    O., and Nel- son, J

    Verkruysse, W., Svaasand, L. O., and Nel- son, J. S. (2008). Remote plethysmographic imaging using ambient light.Optics express, 16(26):21434–21445

  52. [52]

    Voelker, A. R. and Eliasmith, C. (2018). Improving spiking dynamical networks: Accu- rate delays, higher-order synapses, and time cells.Neural computation, 30(3):569–609

  53. [53]

    Wang, H., Ahn, E., and Kim, J. (2022). Self- supervised representation learning framework for remote physiological measurement using spatiotemporal augmentation loss. InPro- ceedings of the AAAI Conference on Artificial Intelligence, volume 36, pages 2431–2439

  54. [54]

    Wang, T., Lin, X., Xu, Y., Ye, Q., Guo, D., Escalera, S., Khoriba, G., and Yu, Z. (2026). Micro-gesture recognition: A comprehensive survey of datasets, methods, and challenges. Machine Intelligence Research, 23(2):308–330

  55. [55]

    C., Stuijk, S., and De Haan, G

    Wang, W., Den Brinker, A. C., Stuijk, S., and De Haan, G. (2016). Algorithmic principles of remote ppg.IEEE Transactions on Biomedical Engineering, 64(7):1479–1491

  56. [56]

    Wang, W., Stuijk, S., and De Haan, G. (2015). A novel algorithm for remote pho- toplethysmography: Spatial subspace rotation. IEEE transactions on biomedical engineering, 63(9):1974–1984

  57. [57]

    Wu, B., Yu, Z., Xie, Y., Liu, W., Luo, C., Liu, Y., and Goh, R. S. M. (2025). Semi-rppg: Semi-supervised remote physiological measure- ment with curriculum pseudo-labeling.IEEE Transactions on Instrumentation and Measure- ment

  58. [58]

    Wu, H.-Y., Rubinstein, M., Shih, E., Gut- tag, J., Durand, F., and Freeman, W. (2012). Article Title25 Eulerian video magnification for revealing sub- tle changes in the world.ACM transactions on graphics (TOG), 31(4):1–8

  59. [59]

    Xie, Y., Yu, Z., Wu, B., Xie, W., and Shen, L. (2024). Sfda-rppg: Source-free domain adap- tive remote physiological measurement with spatio-temporal consistency.arXiv preprint arXiv:2409.12040

  60. [60]

    Yu, Z., Li, X., Niu, X., Shi, J., and Zhao, G. (2020). Autohr: A strong end-to-end baseline for remote heart rate measurement with neu- ral searching.IEEE Signal Processing Letters, 27:1245–1249

  61. [61]

    Yu, Z., Li, X., and Zhao, G. (2019a). Remote photoplethysmograph signal measurement from facial videos using spatio-temporal networks. arXiv preprint arXiv:1905.02419

  62. [62]

    Yu, Z., Peng, W., Li, X., Hong, X., and Zhao, G. (2019b). Remote heart rate measure- ment from highly compressed facial videos: an end-to-end deep learning solution with video enhancement. InProceedings of the IEEE/CVF international conference on computer vision, pages 151–160

  63. [63]

    Yu, Z., Shen, Y., Shi, J., Zhao, H., Cui, Y., Zhang, J., Torr, P., and Zhao, G. (2023). Physformer++: Facial video-based physiologi- cal measurement with slowfast temporal dif- ference transformer.International Journal of Computer Vision, 131(6):1307–1330

  64. [64]

    H., and Zhao, G

    Yu, Z., Shen, Y., Shi, J., Zhao, H., Torr, P. H., and Zhao, G. (2022). Physformer: Facial video- based physiological measurement with temporal difference transformer. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 4186–4196

  65. [65]

    Zhang, J., Lin, X., Huang, J., Ye, S., Guo, X., Zhu, D., Hu, R., Guo, D., Liang, Y., Yu, Z., et al. (2026). Multimodal deception detec- tion: A survey.Machine Intelligence Research, 23(2):284–307

  66. [66]

    A., Shechtman, E., and Wang, O

    Zhang, R., Isola, P., Efros, A. A., Shechtman, E., and Wang, O. (2018). The unreasonable effectiveness of deep features as a perceptual metric. InProceedings of the IEEE conference on computer vision and pattern recognition, pages 586–595

  67. [67]

    Zhao, B., Guo, D., Cao, J., Xu, Y., Zou, B., Tan, T., Sun, Y., and Yu, Z. (2025). Phase-net: Physics-grounded harmonic attention system for efficient remote photoplethysmography mea- surement.arXiv preprint arXiv:2509.24850

  68. [68]

    Zhou, T., Paruchuri, A., Spjut, J., and Ak¸ sit, K. (2025). Editing physiological signals in videos using latent representations.arXiv preprint arXiv:2509.25348

  69. [69]

    Zhu, J.-Y., Park, T., Isola, P., and Efros, A. A. (2017). Unpaired image-to-image transla- tion using cycle-consistent adversarial networks. InProceedings of the IEEE international con- ference on computer vision, pages 2223–2232