pith. sign in

arxiv: 2606.23226 · v1 · pith:PBYCMAIZnew · submitted 2026-06-22 · 💻 cs.CV

PhysFlow: Frequency Decoupled with Dual-Field Rectified Flow for Remote Photoplethysmography

Pith reviewed 2026-06-26 08:55 UTC · model grok-4.3

classification 💻 cs.CV
keywords remote photoplethysmographyrPPGrectified flowfrequency decouplingdual velocity fieldsheart rate estimationwaveform reconstruction
0
0 comments X

The pith

PhysFlow uses dual velocity fields to separately model trend and amplitude in rPPG signals for better robustness against disturbances.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper aims to solve the problem of unstable rPPG signals in videos affected by lighting changes, expressions, and movements. It does this by decomposing the target signal into trend and amplitude components and training two separate conditional velocity fields to reconstruct each. This avoids the interference that occurs when everything is modeled together in one field. The rectified flow method also allows fast generation using few integration steps. Tests on standard datasets show gains in both heart rate accuracy and full waveform quality.

Core claim

By decomposing the ground-truth rPPG signal into trend and amplitude components and learning two component-specific conditional velocity fields to model them separately, the framework reduces mutual interference between components and improves reconstruction robustness under complex disturbances, with the rectified flow enabling efficient waveform recovery in few ODE steps.

What carries the argument

Two component-specific conditional velocity fields trained on decomposed trend and amplitude targets within a rectified flow framework.

If this is right

  • rPPG estimation becomes more stable when disturbances like illumination variations dominate.
  • Waveform reconstruction preserves weak pulse signals better than unified modeling approaches.
  • Efficient inference is possible with only a few ODE integration steps.
  • Performance improves on benchmark datasets for both heart rate and waveform metrics in challenging scenarios.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Similar dual-field decomposition could help in other video-based vital sign estimations like respiration rate.
  • The approach might extend to handling additional signal components if more are identified.
  • Real-time deployment could benefit from the reduced number of integration steps.

Load-bearing premise

That splitting the rPPG signal into trend and amplitude parts and supervising separate velocity fields will separate useful physiological content from disturbance effects without creating new problems or missing key details.

What would settle it

Observing no performance gain or even worse results from the dual-field model compared to a single unified field when tested on videos with strong varying illumination and head movements.

Figures

Figures reproduced from arXiv: 2606.23226 by Hang Shao, Jianjun Qian, Jian Yang, Lei Luo, Zixu Li.

Figure 1
Figure 1. Figure 1: Comparison between conventional unified modeling [PITH_FULL_IMAGE:figures/full_fig_p001_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: Overview of the proposed PhysFlow. The ground-truth rPPG signal is decomposed into trend and amplitude components [PITH_FULL_IMAGE:figures/full_fig_p003_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: Illustration of the proposed velocity heads in PhysFlow. [PITH_FULL_IMAGE:figures/full_fig_p004_3.png] view at source ↗
Figure 4
Figure 4. Figure 4: Qualitative comparison of HR estimation results on [PITH_FULL_IMAGE:figures/full_fig_p007_4.png] view at source ↗
Figure 5
Figure 5. Figure 5: Qualitative comparison of reconstructed rPPG wave [PITH_FULL_IMAGE:figures/full_fig_p008_5.png] view at source ↗
Figure 6
Figure 6. Figure 6: Visualization of the generated amplitude component, trend component and reconstructed rPPG signal under different [PITH_FULL_IMAGE:figures/full_fig_p009_6.png] view at source ↗
read the original abstract

Remote Photoplethysmography (rPPG) enables contactless pulse estimation from facial videos, serving as a vital tool for health monitoring. However, current deep learning methods often struggle under complex disturbances, particularly varying illumination, facial expressions, and unconstrained head movements. In such scenarios, subtle physiological signals are easily dominated by external interference, making the recovered rPPG waveform unstable and unreliable. One important reason is that most existing methods directly model the rPPG signal in a unified manner, where different signal components are coupled during reconstruction. This makes it difficult to preserve weak pulse-related variations when strong disturbance-induced changes are present. To address this challenge, we propose PhysFlow, a frequency-decoupled dual-field rectified flow framework tailored for robust rPPG estimation. Specifically, the ground-truth rPPG signal is decomposed into trend and amplitude components, which are used as separate supervisory targets. Based on the extracted facial features, PhysFlow learns two component-specific conditional velocity fields to model the two components separately. This design reduces mutual interference between different components and improves the robustness of rPPG reconstruction under complex disturbances. Moreover, the rectified flow formulation enables efficient waveform reconstruction with only a few ordinary differential equation (ODE) integration steps. Extensive experiments on multiple benchmark datasets demonstrate that PhysFlow outperforms state-of-the-art methods in both heart-rate estimation and rPPG waveform reconstruction across diverse challenging scenarios.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper proposes PhysFlow, a frequency-decoupled dual-field rectified flow framework for remote photoplethysmography (rPPG) estimation from facial videos. It decomposes the ground-truth rPPG signal into trend and amplitude components as separate supervisory targets, learns two component-specific conditional velocity fields from shared extracted facial features, and employs rectified flow to enable efficient few-step ODE integration for waveform reconstruction. The central claim is that this design reduces mutual interference between components, yielding more robust rPPG recovery under disturbances such as varying illumination, expressions, and head motion, with reported outperformance over SOTA methods on multiple benchmarks for both heart-rate estimation and waveform quality.

Significance. If the dual-field design demonstrably isolates physiological content without introducing artifacts from shared conditioning, the approach could meaningfully advance robust contactless vital-sign monitoring in unconstrained settings. The rectified-flow formulation for efficient reconstruction is a clear methodological strength that could transfer to other signal-recovery tasks.

major comments (2)
  1. [Abstract; §3 (architecture)] Abstract and method description: the claim that separate conditional velocity fields 'reduce mutual interference' rests on component-specific supervision of the targets, yet both fields are conditioned on the identical set of extracted facial features. Because the decomposition occurs only on the ground-truth side, entangled features can still produce coupled mappings at inference; this is the load-bearing assumption for the robustness claim and requires either an explicit decoupling mechanism on the conditioning side or an ablation comparing shared vs. component-specific feature extractors.
  2. [§4 (experiments)] Experiments section: the abstract asserts outperformance on 'multiple benchmark datasets' for both HR estimation and waveform reconstruction, but the provided summary contains no quantitative tables, error bars, or ablation results on the dual-field design. Without these, it is impossible to verify whether the reported gains are attributable to the frequency decoupling or to other factors such as the rectified-flow backbone.
minor comments (1)
  1. [Abstract] The abstract would benefit from a single sentence stating the key quantitative improvements (e.g., MAE or Pearson correlation deltas) to allow readers to gauge the magnitude of the claimed gains without reading the full results section.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and insightful comments. We address each major comment below, providing clarifications and indicating planned revisions where appropriate.

read point-by-point responses
  1. Referee: [Abstract; §3 (architecture)] Abstract and method description: the claim that separate conditional velocity fields 'reduce mutual interference' rests on component-specific supervision of the targets, yet both fields are conditioned on the identical set of extracted facial features. Because the decomposition occurs only on the ground-truth side, entangled features can still produce coupled mappings at inference; this is the load-bearing assumption for the robustness claim and requires either an explicit decoupling mechanism on the conditioning side or an ablation comparing shared vs. component-specific feature extractors.

    Authors: We agree that the shared feature extractor represents a point where further validation would strengthen the decoupling claim. The component-specific velocity fields and supervision targets encourage specialization in the learned dynamics even under shared conditioning, as each field optimizes independently for its waveform component during training. At inference, the two fields are integrated separately before recombination. To directly address the concern, we will add an ablation comparing the shared extractor against component-specific extractors in the revised manuscript. revision: partial

  2. Referee: [§4 (experiments)] Experiments section: the abstract asserts outperformance on 'multiple benchmark datasets' for both HR estimation and waveform reconstruction, but the provided summary contains no quantitative tables, error bars, or ablation results on the dual-field design. Without these, it is impossible to verify whether the reported gains are attributable to the frequency decoupling or to other factors such as the rectified-flow backbone.

    Authors: The full manuscript contains quantitative results in Section 4, including tables on multiple datasets (UBFC-rPPG, PURE, COHFACE, and others) reporting HR estimation and waveform metrics with standard deviations, plus ablations in Section 4.3 isolating the dual-field contribution versus single-field and non-rectified baselines. We will revise the abstract and early method sections to explicitly reference these tables and ablations for clarity. revision: partial

Circularity Check

0 steps flagged

No significant circularity

full rationale

The paper presents PhysFlow as an explicit architectural choice: decompose the ground-truth rPPG signal into trend and amplitude components, then train two separate conditional velocity fields on the same facial features. This decomposition and dual supervision is introduced as a modeling decision rather than derived from first principles or reduced to a fitted parameter. No equations, self-citations, or uniqueness theorems are invoked that would make the claimed robustness equivalent to the inputs by construction. The central claim of reduced mutual interference is positioned as an empirical outcome to be validated on benchmark datasets, leaving the derivation self-contained against external evaluation.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

Review performed on abstract only; no explicit free parameters, invented entities, or non-standard axioms are stated. The framework implicitly assumes standard properties of rectified flow models and the validity of component decomposition for rPPG signals.

axioms (1)
  • domain assumption Rectified flow can be conditioned on facial features to generate separate trend and amplitude components of an rPPG signal.
    The method relies on the ability of conditional rectified flow to model the two components independently without cross-talk.

pith-pipeline@v0.9.1-grok · 5793 in / 1279 out tokens · 23170 ms · 2026-06-26T08:55:24.660398+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

66 extracted references · 2 linked inside Pith

  1. [1]

    Remote photoplethysmography in real-world and extreme lighting scenarios,

    H. Shao, L. Luo, J. Qian, M. Yan, S. Chen, and J. Yang, “Remote photoplethysmography in real-world and extreme lighting scenarios,” in Proceedings of the Computer Vision and Pattern Recognition Confer- ence, 2025, pp. 10 858–10 867

  2. [2]

    Continual learning for remote physi- ological measurement: Minimize forgetting and simplify inference,

    Q. Liang, Y . Chen, and Y . Hu, “Continual learning for remote physi- ological measurement: Minimize forgetting and simplify inference,” in European conference on computer vision. Springer, 2024, pp. 126–144

  3. [3]

    Amplitude– time dual-view fused eeg temporal feature learning for automatic sleep staging,

    P. An, J. Zhao, B. Du, W. Zhao, T. Zhang, and Z. Yuan, “Amplitude– time dual-view fused eeg temporal feature learning for automatic sleep staging,”IEEE Transactions on Neural Networks and Learning Systems, vol. 35, no. 5, pp. 6492–6506, 2022

  4. [4]

    A novel algorithm for remote photoplethysmography: Spatial subspace rotation,

    W. Wang, S. Stuijk, and G. De Haan, “A novel algorithm for remote photoplethysmography: Spatial subspace rotation,”IEEE transactions on biomedical engineering, vol. 63, no. 9, pp. 1974–1984, 2015

  5. [5]

    Pulsegan: Learning to generate realistic pulse waveforms in remote photoplethys- mography,

    R. Song, H. Chen, J. Cheng, C. Li, Y . Liu, and X. Chen, “Pulsegan: Learning to generate realistic pulse waveforms in remote photoplethys- mography,”IEEE Journal of Biomedical and Health Informatics, vol. 25, no. 5, pp. 1373–1384, 2021

  6. [6]

    Neuron perception inspired eeg emotion recognition with parallel contrastive learning,

    D. Li, S. Huang, L. Xie, Z. Wang, and J. Xu, “Neuron perception inspired eeg emotion recognition with parallel contrastive learning,” IEEE transactions on neural networks and learning systems, 2025

  7. [7]

    Un- supervised skin tissue segmentation for remote photoplethysmography,

    S. Bobbia, R. Macwan, Y . Benezeth, A. Mansouri, and J. Dubois, “Un- supervised skin tissue segmentation for remote photoplethysmography,” Pattern recognition letters, vol. 124, pp. 82–90, 2019

  8. [8]

    Noninvasive blood glucose monitoring using spatiotemporal ecg and ppg feature fusion and weight-based choquet integral multimodel approach,

    J. Li, J. Ma, O. M. Omisore, Y . Liu, H. Tang, P. Ao, Y . Yan, L. Wang, and Z. Nie, “Noninvasive blood glucose monitoring using spatiotemporal ecg and ppg feature fusion and weight-based choquet integral multimodel approach,”IEEE transactions on neural networks and learning systems, vol. 35, no. 10, pp. 14 491–14 505, 2023

  9. [9]

    Transformer meets gated residual networks to enhance picu’s ppg artifact detection informed by mutual information neural estimation,

    T.-D. Le, C. Macabiau, K. Albert, S. Chatzinotas, P. Jouvet, and R. Noumeir, “Transformer meets gated residual networks to enhance picu’s ppg artifact detection informed by mutual information neural estimation,”IEEE Transactions on Neural Networks and Learning Systems, 2026

  10. [10]

    3d mask face anti- spoofing with remote photoplethysmography,

    S. Liu, P. C. Yuen, S. Zhang, and G. Zhao, “3d mask face anti- spoofing with remote photoplethysmography,” inEuropean Conference on Computer Vision. Springer, 2016, pp. 85–100

  11. [11]

    Adversarial spatiotemporal contrastive learning for electrocardiogram signals,

    N. Wang, P. Feng, Z. Ge, Y . Zhou, B. Zhou, and Z. Wang, “Adversarial spatiotemporal contrastive learning for electrocardiogram signals,”IEEE transactions on neural networks and learning systems, vol. 35, no. 10, pp. 13 845–13 859, 2023

  12. [12]

    A heart rate monitoring frame- work for real-world drivers using remote photoplethysmography,

    P.-W. Huang, B.-J. Wu, and B.-F. Wu, “A heart rate monitoring frame- work for real-world drivers using remote photoplethysmography,”IEEE journal of biomedical and health informatics, vol. 25, no. 5, pp. 1397– 1408, 2020

  13. [13]

    Robust remote photoplethysmography esti- mation with environmental noise disentanglement,

    S.-Q. Liu and P. C. Yuen, “Robust remote photoplethysmography esti- mation with environmental noise disentanglement,”IEEE Transactions on Image Processing, vol. 33, pp. 27–41, 2023

  14. [14]

    Lstc-rppg: Long short-term convolutional network for remote photoplethysmography,

    J. S. Lee, G. Hwang, M. Ryu, and S. J. Lee, “Lstc-rppg: Long short-term convolutional network for remote photoplethysmography,” inProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 6015–6023

  15. [15]

    Automatic region- based heart rate measurement using remote photoplethysmography,

    B. Kossack, E. Wisotzky, A. Hilsmann, and P. Eisert, “Automatic region- based heart rate measurement using remote photoplethysmography,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2021, pp. 2755–2759

  16. [16]

    A novel framework for remote photoplethysmography pulse extraction on compressed videos,

    C. Zhao, C.-L. Lin, W. Chen, and Z. Li, “A novel framework for remote photoplethysmography pulse extraction on compressed videos,” inProceedings of the IEEE conference on computer vision and pattern recognition workshops, 2018, pp. 1299–1308

  17. [17]

    Dual-path tokenlearner for remote photoplethysmography-based phys- iological measurement with facial videos,

    W. Qian, D. Guo, K. Li, X. Zhang, X. Tian, X. Yang, and M. Wang, “Dual-path tokenlearner for remote photoplethysmography-based phys- iological measurement with facial videos,”IEEE Transactions on Com- putational Social Systems, vol. 11, no. 3, pp. 4465–4477, 2024

  18. [18]

    Physllm: Harnessing large language models for cross-modal remote physiological sensing,

    Y . Xie, B. Zhao, M. Dai, J.-P. Zhou, Y . Sun, T. Tan, W. Xie, L. Shen, and Z. Yu, “Physllm: Harnessing large language models for cross-modal remote physiological sensing,”arXiv preprint arXiv:2505.03621, 2025

  19. [19]

    Phase-net: Physics-grounded harmonic attention system for efficient remote photoplethysmography measurement,

    B. Zhao, D. Guo, J. Cao, Y . Xu, B. Zou, T. Tan, Y . Sun, and Z. Yu, “Phase-net: Physics-grounded harmonic attention system for efficient remote photoplethysmography measurement,” inProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2026, pp. 21 198–21 207

  20. [20]

    Efficient remote photoplethysmog- raphy with temporal derivative modules and time-shift invariant loss,

    J. Comas, A. Ruiz, and F. Sukno, “Efficient remote photoplethysmog- raphy with temporal derivative modules and time-shift invariant loss,” inProceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2022, pp. 2182–2191

  21. [21]

    Physdiff: physiology-based dynamicity disentangled diffusion model for remote physiological measurement,

    W. Qian, G. Su, D. Guo, J. Zhou, X. Li, B. Hu, S. Tang, and M. Wang, “Physdiff: physiology-based dynamicity disentangled diffusion model for remote physiological measurement,” inProceedings of the AAAI Conference on Artificial Intelligence, vol. 39, no. 6, 2025, pp. 6568– 6576

  22. [22]

    Spiking-physformer: Camera-based remote photoplethysmography with parallel spike-driven transformer,

    M. Liu, J. Tang, Y . Chen, H. Li, J. Qi, S. Li, K. Wang, J. Gan, Y . Wang, and H. Chen, “Spiking-physformer: Camera-based remote photoplethysmography with parallel spike-driven transformer,”Neural Networks, vol. 185, p. 107128, 2025

  23. [23]

    Tranphys: Spatiotemporal masked transformer steered remote photoplethysmogra- phy estimation,

    H. Shao, L. Luo, J. Qian, S. Chen, C. Hu, and J. Yang, “Tranphys: Spatiotemporal masked transformer steered remote photoplethysmogra- phy estimation,”IEEE Transactions on Circuits and Systems for Video Technology, vol. 34, no. 4, pp. 3030–3042, 2023

  24. [24]

    Deep learning-based image enhancement for robust remote photoplethysmography in various illumination scenarios,

    S. Chen, S. K. Ho, J. W. Chin, K. H. Luo, T. T. Chan, R. H. So, and K. L. Wong, “Deep learning-based image enhancement for robust remote photoplethysmography in various illumination scenarios,” in Proceedings of the ieee/cvf conference on computer vision and pattern recognition, 2023, pp. 6077–6085

  25. [25]

    Video respiratory rate measurement in walking scenarios using multi-strategy adaptive denoising,

    G. Pei, J. Ning, C. Niu, S. Yao, M. Hu, and G. Zhai, “Video respiratory rate measurement in walking scenarios using multi-strategy adaptive denoising,”IEEE Transactions on Circuits and Systems for Video Technology, 2026

  26. [26]

    Video- based multiphysiological disentanglement and remote robust estimation for respiration,

    H. Shao, L. Luo, J. Qian, M. Yan, S. Gao, and J. Yang, “Video- based multiphysiological disentanglement and remote robust estimation for respiration,”IEEE Transactions on Neural Networks and Learning Systems, vol. 36, no. 5, pp. 8360–8371, 2024

  27. [27]

    Satphys: Sandglass transformer for efficient video-based remote physiological measurement,

    S. Chu, J. Shi, M. Yuan, X. Li, Z. Jiang, and G. Zhao, “Satphys: Sandglass transformer for efficient video-based remote physiological measurement,”IEEE Transactions on Circuits and Systems for Video Technology, 2026. JOURNAL OF LATEX CLASS FILES, VOL. 14, NO. 8, AUGUST 2021 12

  28. [28]

    Video-based instanta- neous heart rate measurement with enhanced time-frequency represen- tations,

    J. Cheng, X. Luo, X. Wu, R. Song, and Y . Liu, “Video-based instanta- neous heart rate measurement with enhanced time-frequency represen- tations,”IEEE Transactions on Multimedia, 2025

  29. [29]

    Analyzing participants’ engagement during online meetings using unsupervised remote photoplethysmography with behavioral features,

    A. Vedernikov, Z. Sun, V .-L. Kykyri, M. Pohjola, M. Nokia, and X. Li, “Analyzing participants’ engagement during online meetings using unsupervised remote photoplethysmography with behavioral features,” inProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024, pp. 389–399

  30. [30]

    Tranpulse: Remote photoplethysmography estimation with time-varying supervision to disentangle multiphysiologically interference,

    H. Shao, L. Luo, J. Qian, S. Chen, C. Hu, and J. Yang, “Tranpulse: Remote photoplethysmography estimation with time-varying supervision to disentangle multiphysiologically interference,”IEEE Transactions on Instrumentation and Measurement, vol. 73, pp. 1–11, 2024

  31. [31]

    A compensation network with error mapping for robust remote photoplethysmography in noise-heavy conditions,

    B.-F. Wu, Y .-C. Wu, and Y .-W. Chou, “A compensation network with error mapping for robust remote photoplethysmography in noise-heavy conditions,”IEEE Transactions on Instrumentation and Measurement, vol. 71, pp. 1–11, 2022

  32. [32]

    Robust and remote photoplethysmography based on smartphone imaging of the human palm,

    C. Lian, Y . Yang, X. Yu, H. Sun, Y . Zhao, G. Zhang, and W. J. Li, “Robust and remote photoplethysmography based on smartphone imaging of the human palm,”IEEE Transactions on Instrumentation and Measurement, vol. 72, pp. 1–11, 2023

  33. [33]

    To- ward motion robustness: a masked attention regularization framework in remote photoplethysmography,

    P. Zhao, Q. Sun, X. Tian, Y . Yang, S. Tao, J. Cheng, and J. Chen, “To- ward motion robustness: a masked attention regularization framework in remote photoplethysmography,” inProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024, pp. 7829–7838

  34. [34]

    Realistic pulse waveforms estimation via contrastive learning in remote photoplethysmography,

    B. Dong, Y . Liu, K. Yang, and J. Cao, “Realistic pulse waveforms estimation via contrastive learning in remote photoplethysmography,” IEEE Transactions on Instrumentation and Measurement, vol. 73, pp. 1–15, 2024

  35. [35]

    A lstm-based re- altime signal quality assessment for photoplethysmogram and remote photoplethysmogram,

    H. Gao, X. Wu, C. Shi, Q. Gao, and J. Geng, “A lstm-based re- altime signal quality assessment for photoplethysmogram and remote photoplethysmogram,” inProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021, pp. 3831–3840

  36. [36]

    Motion-resistant remote imaging photoplethysmography based on the optical properties of skin,

    L. Feng, L.-M. Po, X. Xu, Y . Li, and R. Ma, “Motion-resistant remote imaging photoplethysmography based on the optical properties of skin,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 25, no. 5, pp. 879–891, 2014

  37. [37]

    Deep-learning-based remote photoplethysmography measurement in driving scenarios with color and near-infrared images,

    L.-W. Chiu, Y .-R. Chou, Y .-C. Wu, and B.-F. Wu, “Deep-learning-based remote photoplethysmography measurement in driving scenarios with color and near-infrared images,”IEEE Transactions on Instrumentation and Measurement, vol. 72, pp. 1–12, 2023

  38. [38]

    Physformer: Facial video-based physiological measurement with temporal difference transformer,

    Z. Yu, Y . Shen, J. Shi, H. Zhao, P. H. Torr, and G. Zhao, “Physformer: Facial video-based physiological measurement with temporal difference transformer,” inProceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2022, pp. 4186–4196

  39. [39]

    Physfftformer: A frequency domain-based vision transformer for ef- ficient remote physiological measurement,

    F. Liu, S. Zhao, T. Xu, Y . Sun, H. Wang, S. Zhang, and E. Chen, “Physfftformer: A frequency domain-based vision transformer for ef- ficient remote physiological measurement,” in2025 IEEE International Conference on Multimedia and Expo (ICME). IEEE, 2025, pp. 1–6

  40. [40]

    Flow straight and fast: Learning to generate and transfer data with rectified flow,

    X. Liu, C. Gong, and Q. Liu, “Flow straight and fast: Learning to generate and transfer data with rectified flow,”arXiv preprint arXiv:2209.03003, 2022

  41. [41]

    Flowgrad: Controlling the output of generative odes with gradients,

    X. Liu, L. Wu, S. Zhang, C. Gong, W. Ping, and Q. Liu, “Flowgrad: Controlling the output of generative odes with gradients,” inProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recogni- tion, 2023, pp. 24 335–24 344

  42. [42]

    Instaflow: One step is enough for high-quality diffusion-based text-to-image generation,

    X. Liu, X. Zhang, J. Ma, J. Penget al., “Instaflow: One step is enough for high-quality diffusion-based text-to-image generation,” inThe Twelfth International Conference on Learning Representations, 2023

  43. [43]

    One diffusion step to real- world super-resolution via flow trajectory distillation,

    J. Li, J. Cao, Y . Guo, W. Li, and Y . Zhang, “One diffusion step to real- world super-resolution via flow trajectory distillation,”arXiv preprint arXiv:2502.01993, 2025

  44. [44]

    Flowie: Efficient image enhancement via rectified flow,

    Y . Zhu, W. Zhao, A. Li, Y . Tang, J. Zhou, and J. Lu, “Flowie: Efficient image enhancement via rectified flow,” inProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024, pp. 13– 22

  45. [45]

    Rhythm- former: Extracting patterned rppg signals based on periodic sparse attention,

    B. Zou, Z. Guo, J. Chen, J. Zhuo, W. Huang, and H. Ma, “Rhythm- former: Extracting patterned rppg signals based on periodic sparse attention,”Pattern Recognition, vol. 164, p. 111511, 2025

  46. [46]

    Image enhancement for remote photoplethysmography in a low-light environment,

    L. Xi, W. Chen, C. Zhao, X. Wu, and J. Wang, “Image enhancement for remote photoplethysmography in a low-light environment,” in2020 15th IEEE International Conference on Automatic Face and Gesture Recognition (FG 2020). IEEE, 2020, pp. 1–7

  47. [47]

    Vipl-hr: A multi-modal database for pulse estimation from less-constrained face video,

    X. Niu, H. Han, S. Shan, and X. Chen, “Vipl-hr: A multi-modal database for pulse estimation from less-constrained face video,” inAsian conference on computer vision. Springer, 2018, pp. 562–576

  48. [48]

    Rhythmnet: End-to-end heart rate estimation from face via spatial-temporal representation,

    X. Niu, S. Shan, H. Han, and X. Chen, “Rhythmnet: End-to-end heart rate estimation from face via spatial-temporal representation,”IEEE Transactions on Image Processing, vol. 29, pp. 2409–2423, 2019

  49. [49]

    Near-infrared imaging photoplethysmography during driving,

    E. M. Nowara, T. K. Marks, H. Mansour, and A. Veeraraghavan, “Near-infrared imaging photoplethysmography during driving,”IEEE transactions on intelligent transportation systems, vol. 23, no. 4, pp. 3589–3600, 2020

  50. [50]

    Sparseppg: Towards driver monitoring using camera-based vital signs estimation in near-infrared,

    E. Magdalena Nowara, T. K. Marks, H. Mansour, and A. Veeraraghavan, “Sparseppg: Towards driver monitoring using camera-based vital signs estimation in near-infrared,” inProceedings of the IEEE conference on computer vision and pattern recognition workshops, 2018, pp. 1272– 1281

  51. [51]

    Dual-gan: Joint bvp and noise modeling for remote physiological measurement,

    H. Lu, H. Han, and S. K. Zhou, “Dual-gan: Joint bvp and noise modeling for remote physiological measurement,” inProceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2021, pp. 12 404–12 413

  52. [52]

    Single-element remote- ppg,

    W. Wang, A. C. Den Brinker, and G. De Haan, “Single-element remote- ppg,”IEEE Transactions on Biomedical Engineering, vol. 66, no. 7, pp. 2032–2043, 2018

  53. [53]

    Robust pulse rate from chrominance-based rppg,

    G. De Haan and V . Jeanne, “Robust pulse rate from chrominance-based rppg,”IEEE transactions on biomedical engineering, vol. 60, no. 10, pp. 2878–2886, 2013

  54. [54]

    Local group invariance for heart rate estimation from face videos in the wild,

    C. S. Pilz, S. Zaunseder, J. Krajewski, and V . Blazek, “Local group invariance for heart rate estimation from face videos in the wild,” in Proceedings of the IEEE conference on computer vision and pattern recognition workshops, 2018, pp. 1254–1262

  55. [55]

    Deepphys: Video-based physiological mea- surement using convolutional attention networks,

    W. Chen and D. McDuff, “Deepphys: Video-based physiological mea- surement using convolutional attention networks,” inProceedings of the european conference on computer vision (ECCV), 2018, pp. 349–365

  56. [56]

    Multi-task temporal shift attention networks for on-device contactless vitals measurement,

    X. Liu, J. Fromm, S. Patel, and D. McDuff, “Multi-task temporal shift attention networks for on-device contactless vitals measurement,”Ad- vances in Neural Information Processing Systems, vol. 33, pp. 19 400– 19 411, 2020

  57. [57]

    Learning motion-robust remote photoplethys- mography through arbitrary resolution videos,

    J. Li, Z. Yu, and J. Shi, “Learning motion-robust remote photoplethys- mography through arbitrary resolution videos,” inProceedings of the AAAI Conference on Artificial Intelligence, vol. 37, no. 1, 2023, pp. 1334–1342

  58. [58]

    Neuron structure modeling for generalizable remote physiological measurement,

    H. Lu, Z. Yu, X. Niu, and Y .-C. Chen, “Neuron structure modeling for generalizable remote physiological measurement,” inProceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2023, pp. 18 589–18 599

  59. [59]

    Efficientphys: Enabling simple, fast and accurate camera-based cardiac measurement,

    X. Liu, B. Hill, Z. Jiang, S. Patel, and D. McDuff, “Efficientphys: Enabling simple, fast and accurate camera-based cardiac measurement,” inProceedings of the IEEE/CVF winter conference on applications of computer vision, 2023, pp. 5008–5017

  60. [60]

    Physformer++: Facial video-based physiological measurement with slowfast temporal difference transformer,

    Z. Yu, Y . Shen, J. Shi, H. Zhao, Y . Cui, J. Zhang, P. Torr, and G. Zhao, “Physformer++: Facial video-based physiological measurement with slowfast temporal difference transformer,”International Journal of Computer Vision, vol. 131, no. 6, pp. 1307–1330, 2023

  61. [61]

    Rhythmmamba: Fast, lightweight, and accurate remote physiological measurement,

    B. Zou, Z. Guo, X. Hu, and H. Ma, “Rhythmmamba: Fast, lightweight, and accurate remote physiological measurement,” inProceedings of the AAAI Conference on Artificial Intelligence, vol. 39, no. 10, 2025, pp. 11 077–11 085

  62. [62]

    rppg-toolbox: Deep remote ppg toolbox,

    X. Liu, G. Narayanswamy, A. Paruchuri, X. Zhang, J. Tang, Y . Zhang, R. Sengupta, S. Patel, Y . Wang, and D. McDuff, “rppg-toolbox: Deep remote ppg toolbox,”Advances in Neural Information Processing Sys- tems, vol. 36, pp. 68 485–68 510, 2023

  63. [63]

    Robust heart rate from fitness videos,

    W. Wang, A. C. den Brinker, S. Stuijk, and G. de Haan, “Robust heart rate from fitness videos,”Physiological measurement, vol. 38, no. 6, pp. 1023–1044, 2017

  64. [64]

    Remote measurement of heart rate from facial video in different scenarios,

    X. Zheng, C. Zhang, H. Chen, Y . Zhang, and X. Yang, “Remote measurement of heart rate from facial video in different scenarios,” Measurement, vol. 188, p. 110243, 2022

  65. [65]

    Detail-preserving arterial pulse wave measurement based biorthogonal wavelet decomposition from remote rgb observations,

    Y . Tong, Z. Huang, Z. Zhang, M. Yin, G. Shan, J. Wu, and F. Qin, “Detail-preserving arterial pulse wave measurement based biorthogonal wavelet decomposition from remote rgb observations,”Measurement, vol. 222, p. 113605, 2023

  66. [66]

    An image is worth 16x16 words: Transformers for image recognition at scale,

    A. Dosovitskiy, L. Beyer, A. Kolesnikov, D. Weissenborn, X. Zhai, T. Unterthiner, M. Dehghani, M. Minderer, G. Heigold, S. Gellyet al., “An image is worth 16x16 words: Transformers for image recognition at scale,”arXiv preprint arXiv:2010.11929, 2020. JOURNAL OF LATEX CLASS FILES, VOL. 14, NO. 8, AUGUST 2021 13 Zixu Lireceived the B.S. degree in software ...