rPPG-VQA: A Video Quality Assessment Framework for Unsupervised rPPG Training
Pith reviewed 2026-05-10 16:40 UTC · model grok-4.3
The pith
A dual-branch video quality filter using SNR consensus and scene analysis lets unsupervised rPPG models trained on filtered in-the-wild videos reach substantially higher accuracy on standard benchmarks.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Unsupervised rPPG models trained on large-scale in-the-wild videos selected by rPPG-VQA achieve substantial accuracy gains on standard benchmarks compared with models trained on unfiltered data. The selection relies on a dual-branch architecture that combines signal-level SNR estimation via multi-method consensus with scene-level interference detection via an MLLM, followed by two-stage adaptive sampling that curates the training set according to the resulting quality scores.
What carries the argument
The dual-branch rPPG-VQA architecture, in which the signal branch computes robust SNR via multi-method consensus and the scene branch uses MLLM to identify interferences, paired with the two-stage adaptive sampling strategy that ranks and selects videos by the combined quality score.
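A minimal sketch of how the two branch scores might be fused and ranked, using the weighted sum Q(v_i) = α q_sig + (1-α) q_sce quoted from the paper elsewhere on this page. The function names, the default α, and the assumption that both scores are already on a comparable scale are illustrative and not taken from the authors' code.

```python
from typing import Dict, List, Tuple


def fuse_quality(q_sig: float, q_sce: float, alpha: float = 0.5) -> float:
    """Weighted sum of the signal-level and scene-level scores."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * q_sig + (1.0 - alpha) * q_sce


def rank_videos(scores: Dict[str, Tuple[float, float]],
                alpha: float = 0.5) -> List[Tuple[str, float]]:
    """Return (video_id, fused score) pairs, best first.

    `scores` maps a hypothetical video id to (q_sig, q_sce).
    """
    fused = {vid: fuse_quality(q_sig, q_sce, alpha)
             for vid, (q_sig, q_sce) in scores.items()}
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)
```

With a ranking like this in hand, any sampling strategy can consume the fused score; the value of α controls how much the scene-level branch can override a high signal-level score.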
If this is right
- Unsupervised rPPG models trained on the filtered videos outperform those trained on uncurated wild data.
- The quality scores can be used to curate large training sets from existing video collections without manual labeling.
- Signal-level consensus reduces reliance on any single SNR estimator when deciding video suitability.
- Scene-level MLLM checks catch interferences that pure signal metrics miss.
- Two-stage adaptive sampling balances quality and diversity in the final training distribution.
Where Pith is reading between the lines
- The same filtering logic could be applied to other video-based physiological measurement tasks such as respiration or blood-pressure estimation.
- Replacing the fixed MLLM with a fine-tuned or smaller model might lower compute cost while preserving interference detection for rPPG.
- If the quality scores correlate with downstream model performance, they could serve as a reward signal for active data collection or synthetic video generation.
Load-bearing premise
That videos scoring high on the proposed SNR consensus and MLLM scene criteria will actually produce stronger rPPG models after training rather than simply matching human notions of visual quality.
What would settle it
An ablation that trains identical unsupervised rPPG architectures on the same pool of in-the-wild videos but selects the training subset by random sampling or by conventional perceptual VQA scores instead of rPPG-VQA scores and finds no accuracy difference on the held-out benchmarks.
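The control described above needs a random subset that matches the quality-filtered subset in size and comes from the same pool. A minimal sketch follows; the function names are hypothetical, and plain top-k selection stands in for the paper's two-stage adaptive sampling, whose details are not given on this page.

```python
import random
from typing import Dict, List


def top_k_by_score(scores: Dict[str, float], k: int) -> List[str]:
    """Pick the k highest-scoring video ids (stand-in for TAS)."""
    return sorted(scores, key=lambda vid: scores[vid], reverse=True)[:k]


def random_control(scores: Dict[str, float], k: int, seed: int = 0) -> List[str]:
    """Pick k video ids uniformly at random from the same pool."""
    rng = random.Random(seed)  # fixed seed keeps the control reproducible
    return rng.sample(sorted(scores), k)
```

Training the same architecture on both subsets, with everything else held fixed, isolates the effect of the selection criterion from the effect of dataset size.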
Original abstract
Unsupervised remote photoplethysmography (rPPG) promises to leverage unlabeled video data, but its potential is hindered by a critical challenge: training on low-quality "in-the-wild" videos severely degrades model performance. An essential step missing here is to assess the suitability of the videos for rPPG model learning before using them for the task. Existing video quality assessment (VQA) methods are mainly designed for human perception and not directly applicable to the above purpose. In this work, we propose rPPG-VQA, a novel framework for assessing video suitability for rPPG. We integrate signal-level and scene-level analyses and design a dual-branch assessment architecture. The signal-level branch evaluates the physiological signal quality of the videos via robust signal-to-noise ratio (SNR) estimation with a multi-method consensus mechanism, and the scene-level branch uses a multimodal large language model (MLLM) to identify interferences like motion and unstable lighting. Furthermore, we propose a two-stage adaptive sampling (TAS) strategy that utilizes the quality score to curate optimal training datasets. Experiments show that by training on large-scale, "in-the-wild" videos filtered by our framework, we can develop unsupervised rPPG models that achieve a substantial improvement in accuracy on standard benchmarks. Our code is available at https://github.com/Tianyang-Dai/rPPG-VQA.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes rPPG-VQA, a dual-branch video quality assessment framework to select suitable in-the-wild videos for unsupervised remote photoplethysmography (rPPG) training. The signal-level branch computes physiological signal quality via robust SNR estimation with multi-method consensus, while the scene-level branch uses an MLLM to flag interferences such as motion and unstable lighting. A two-stage adaptive sampling (TAS) strategy then curates training datasets based on the combined quality scores. The central claim is that training unsupervised rPPG models on videos filtered by this framework yields substantial accuracy gains on standard benchmarks.
Significance. If the central claim holds after rigorous validation, the work would be significant for unsupervised rPPG, as it directly tackles the practical barrier of low-quality in-the-wild videos degrading model performance. The dual-branch design tailored to rPPG (rather than generic perception) and the public code release are strengths that support reproducibility and potential adoption. The result would enable more effective scaling of unsupervised methods using large unlabeled video corpora.
major comments (3)
- [Abstract] The claim that experiments demonstrate 'substantial improvement in accuracy on standard benchmarks' is unsupported by any quantitative metrics, baseline comparisons, ablation studies, or statistical tests. This is load-bearing for the central claim, as the abstract provides no evidence that the observed gains exceed those from data volume or TAS mechanics alone.
- [§3.1] Signal-level branch: The multi-method SNR consensus for physiological signal quality risks circularity, because the underlying rPPG extraction techniques used to compute SNR are likely the same family of methods employed in the downstream unsupervised training; this could bias quality scores toward videos that are compatible with the specific extraction pipeline rather than generally beneficial for rPPG learning.
- [§4] Experiments: No ablations are described that compare the proposed rPPG-VQA filtering against (a) random subsets of identical size or (b) standard perceptual VQA baselines. Without these controls, it remains unclear whether accuracy gains are causally due to the SNR-consensus + MLLM scores predicting rPPG utility, or merely to generic signal cleanliness and sampling strategy.
minor comments (1)
- [§3] The manuscript would benefit from an explicit diagram or pseudocode for the dual-branch fusion and TAS procedure to clarify how signal-level and scene-level scores are combined into the final quality metric.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed feedback. We have addressed each major comment below with point-by-point responses and indicate the corresponding revisions to the manuscript.
- Referee: [Abstract] The claim that experiments demonstrate 'substantial improvement in accuracy on standard benchmarks' is unsupported by any quantitative metrics, baseline comparisons, ablation studies, or statistical tests. This is load-bearing for the central claim, as the abstract provides no evidence that the observed gains exceed those from data volume or TAS mechanics alone.
  Authors: We agree that the abstract should be self-contained with quantitative evidence. In the revised manuscript we have updated the abstract to report specific benchmark improvements (MAE and RMSE reductions on PURE, UBFC-rPPG, and VIPL-HR), direct comparisons to the unfiltered baseline and to TAS-only sampling, and reference to the ablation and statistical results now detailed in §4. These additions make the central claim traceable to the experimental evidence without altering the original findings.
  Revision: yes
- Referee: [§3.1] Signal-level branch: The multi-method SNR consensus for physiological signal quality risks circularity, because the underlying rPPG extraction techniques used to compute SNR are likely the same family of methods employed in the downstream unsupervised training; this could bias quality scores toward videos that are compatible with the specific extraction pipeline rather than generally beneficial for rPPG learning.
  Authors: We acknowledge the potential for circularity and have clarified the distinction in the revised §3.1. The SNR consensus employs a fixed set of classical, non-learned extractors (POS, CHROM, ICA) solely for quality scoring; these are applied once per video and are independent of the unsupervised training objective, which optimizes a temporal-consistency loss with physiological priors rather than any of the same extraction pipelines. To further mitigate the concern we added an ablation that substitutes an entirely different SNR estimator (e.g., a deep rPPG model not used in training) and show that the filtered dataset and downstream gains remain consistent.
  Revision: partial
- Referee: [§4] Experiments: No ablations are described that compare the proposed rPPG-VQA filtering against (a) random subsets of identical size or (b) standard perceptual VQA baselines. Without these controls, it remains unclear whether accuracy gains are causally due to the SNR-consensus + MLLM scores predicting rPPG utility, or merely to generic signal cleanliness and sampling strategy.
  Authors: We agree that explicit controls are necessary. The revised §4 now includes two new ablation tables: (a) performance when training on random subsets matched in size to the rPPG-VQA filtered set, and (b) performance when the same videos are filtered by standard perceptual VQA metrics (BRISQUE, NIQE, and a generic MLLM prompt). Both controls yield smaller gains than rPPG-VQA, and we report paired statistical tests confirming the differences are significant. These additions directly address the causal attribution question.
  Revision: yes
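The rebuttal does not say which paired test is used. As one illustration under that caveat, a paired sign-flip permutation test on per-video (or per-subject) errors from two training conditions could look like the sketch below. The function name, the defaults, and the choice of this particular test are assumptions, not the authors' procedure.

```python
import random
from typing import Sequence


def paired_permutation_pvalue(errors_a: Sequence[float],
                              errors_b: Sequence[float],
                              n_perm: int = 10000,
                              seed: int = 0) -> float:
    """Two-sided p-value for the mean paired difference being zero."""
    if len(errors_a) != len(errors_b) or not errors_a:
        raise ValueError("inputs must be non-empty and equally long")
    diffs = [a - b for a, b in zip(errors_a, errors_b)]
    n = len(diffs)
    observed = abs(sum(diffs) / n)
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_perm):
        # Under the null, each paired difference is equally likely to flip sign.
        flipped = sum(d if rng.random() < 0.5 else -d for d in diffs)
        if abs(flipped / n) >= observed - 1e-12:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_perm + 1)
```

A permutation test makes no normality assumption about the error differences, which is convenient when the per-video errors are heavy-tailed.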
Circularity Check
No significant circularity detected
The rPPG-VQA framework defines its dual-branch quality scores (signal-level SNR consensus and scene-level MLLM analysis) and TAS sampling independently of downstream rPPG model accuracy. No equations reduce the quality metric to a fitted parameter or self-citation chain that forces the claimed performance gains. The derivation remains self-contained, with the central claim resting on empirical filtering experiments rather than definitional equivalence.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption: Robust SNR estimation with multi-method consensus reliably indicates physiological signal quality suitable for rPPG model training.
- domain assumption: Multimodal LLMs can accurately detect scene-level interferences such as motion and unstable lighting that degrade rPPG signals.
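To make the first assumption concrete, here is a minimal sketch of a spectral SNR for a single extracted pulse trace: power near the dominant in-band frequency and its first harmonic, divided by the remaining in-band power, in dB. The paper's exact SNR definition and consensus weights are not given on this page, so the 0.7-4.0 Hz band, the 0.1 Hz window, and the use of the spectral peak as the pulse frequency are all assumptions.

```python
import cmath
import math
from typing import List, Sequence, Tuple


def power_spectrum(signal: Sequence[float], fs: float) -> Tuple[List[float], List[float]]:
    """One-sided power spectrum via a direct DFT (fine for short traces)."""
    n = len(signal)
    mean = sum(signal) / n
    centered = [v - mean for v in signal]
    freqs, power = [], []
    for k in range(n // 2 + 1):
        coeff = sum(v * cmath.exp(-2j * math.pi * k * t / n)
                    for t, v in enumerate(centered))
        freqs.append(k * fs / n)
        power.append(abs(coeff) ** 2)
    return freqs, power


def snr_db(signal: Sequence[float], fs: float,
           band: Tuple[float, float] = (0.7, 4.0),
           half_width: float = 0.1) -> float:
    """SNR in dB: pulse peak plus first harmonic vs. the rest of the band."""
    freqs, power = power_spectrum(signal, fs)
    in_band = [(f, p) for f, p in zip(freqs, power) if band[0] <= f <= band[1]]
    if not in_band:
        raise ValueError("no frequency bins inside the heart-rate band")
    peak_f = max(in_band, key=lambda fp: fp[1])[0]
    pulse_power, noise_power = 0.0, 0.0
    for f, p in in_band:
        near_peak = abs(f - peak_f) <= half_width
        near_harmonic = abs(f - 2.0 * peak_f) <= half_width
        if near_peak or near_harmonic:
            pulse_power += p
        else:
            noise_power += p
    eps = 1e-12  # guards against division by zero and log of zero
    return 10.0 * math.log10(max(pulse_power, eps) / max(noise_power, eps))
```

Because the pulse frequency is taken from the spectral peak itself, a strong periodic artifact can also score well, which is one reason a single estimator is not trusted on its own.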
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean: washburn_uniqueness_aczel [unclear]
  unclear: Relation between the paper passage and the cited Recognition theorem.
  Paper passage: dual-branch assessment architecture... signal-level branch evaluates... via robust SNR estimation with multi-method consensus... scene-level branch uses MLLM... two-stage adaptive sampling (TAS)
- IndisputableMonolith/Foundation/AlphaCoordinateFixation.lean: J_uniquely_calibrated_via_higher_derivative [unclear]
  unclear: Relation between the paper passage and the cited Recognition theorem.
  Paper passage: $q_{\mathrm{sig}}(v_i) = \sum_{m} w_{i,m}\,\mathrm{SNR}_{i,m}$ with frequency consistency and spectral correlation weights; $Q(v_i) = \alpha\, q_{\mathrm{sig}} + (1-\alpha)\, q_{\mathrm{sce}}$
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
- [1] Shuai Bai, Yuxuan Cai, Ruizhe Chen, Keqin Chen, Xionghui Chen, Zesen Cheng, Lianghao Deng, Wei Ding, Chang Gao, Chunjiang Ge, et al. Qwen3-vl technical report. arXiv preprint arXiv:2511.21631, 2025.
- [2] Serge Bobbia, Richard Macwan, Yannick Benezeth, Alamin Mansouri, and Julien Dubois. Unsupervised skin tissue segmentation for remote photoplethysmography. Pattern recognition letters, 124:82–90, 2019.
- [3] Constantino Alvarez Casado and Miguel Bordallo López. Face2ppg: An unsupervised pipeline for blood volume pulse extraction from faces. IEEE Journal of Biomedical and Health Informatics, 27(11):5530–5541, 2023.
- [4] Xun Chen, Juan Cheng, Rencheng Song, Yu Liu, Rabab Ward, and Z Jane Wang. Video-based heart rate measurement: Recent advances and future prospects. IEEE Transactions on Instrumentation and Measurement, 68(10):3600–3615, 2018.
- [5] Gerard De Haan and Vincent Jeanne. Robust pulse rate from chrominance-based rppg. IEEE transactions on biomedical engineering, 60(10):2878–2886, 2013.
- [6] Gerard De Haan and Arno Van Leest. Improved motion robustness of remote-ppg by using the blood volume pulse signature. Physiological measurement, 35(9):1913–1926, 2014.
- [7] Pavlos S Efraimidis and Paul G Spirakis. Weighted random sampling with a reservoir. Information processing letters, 97(5):181–185, 2006.
- [8] Mohamed Elgendi. Optimal signal quality index for photoplethysmogram signals. Bioengineering, 3(4):21, 2016.
- [9] Sibylle Fallet, Yann Schoenenberger, Lionel Martin, Fabian Braun, Virginie Moser, and Jean-Marc Vesin. Imaging photoplethysmography: A real-time signal quality index. In 2017 Computing in Cardiology (CinC), pages 1–4. IEEE,
- [10] Christoph Fischer, Benno Dömer, Thomas Wibmer, and Thomas Penzel. An algorithm for real-time pulse waveform segmentation and artifact detection in photoplethysmograms. IEEE journal of biomedical and health informatics, 21(2):372–381, 2016.
- [11] Christoph Fischer, Martin Glos, Thomas Penzel, and Ingo Fietze. Extended algorithm for real-time pulse waveform segmentation and artifact detection in photoplethysmograms. Somnologie, 21(2):110–120, 2017.
- [12] Martin A Fischler and Robert C Bolles. Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography. Communications of the ACM, 24(6):381–395, 1981.
- [13] John Gideon and Simon Stent. The way to my heart is through contrastive learning: Remote photoplethysmography from unlabelled video. In Proceedings of the IEEE/CVF international conference on computer vision, pages 3995–4004, 2021.
- [14] Chenlong He, Qi Zheng, Ruoxi Zhu, Xiaoyang Zeng, Yibo Fan, and Zhengzhong Tu. Cover: A comprehensive video quality evaluator. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 5799–5809, 2024.
- [15] Petr Husák, Jan Cech, and Jiří Matas. Spotting facial micro-expressions “in the wild”. In 22nd Computer Vision Winter Workshop (Retz), pages 1–9, 2017.
- [16] Lucjan Janowski and Piotr Romaniak. Qoe as a function of frame rate and resolution changes. In International Workshop on Future Multimedia Networking, pages 34–45. Springer, 2010.
- [17] Chunyi Li, Guo Lu, Donghui Feng, Haoning Wu, Zicheng Zhang, Xiaohong Liu, Guangtao Zhai, Weisi Lin, and Wenjun Zhang. Misc: Ultra-low bitrate image semantic compression driven by large multimodal model. IEEE Transactions on Image Processing, 34:335–349, 2024.
- [18] Chunyi Li, Haoning Wu, Hongkun Hao, Zicheng Zhang, Tengchuan Kou, Chaofeng Chen, Lei Bai, Xiaohong Liu, Weisi Lin, and Guangtao Zhai. G-refine: A general quality refiner for text-to-image generation. In Proceedings of the 32nd ACM International Conference on Multimedia, pages 7375–7384, 2024.
- [19] Chunyi Li, Haoning Wu, Zicheng Zhang, Hongkun Hao, Kaiwei Zhang, Lei Bai, Xiaohong Liu, Xiongkuo Min, Weisi Lin, and Guangtao Zhai. Q-refine: A perceptual quality refiner for ai-generated image. In 2024 IEEE International Conference on Multimedia and Expo (ICME), pages 1–6. IEEE, 2024.
- [20] Chunyi Li, Yuan Tian, Xiaoyue Ling, Zicheng Zhang, Haodong Duan, Haoning Wu, Ziheng Jia, Xiaohong Liu, Xiongkuo Min, Guo Lu, et al. Image quality assessment: From human to machine preference. In Proceedings of the Computer Vision and Pattern Recognition Conference, pages 7570–7581, 2025.
- [21] Yuezun Li, Xin Yang, Pu Sun, Honggang Qi, and Siwei Lyu. Celeb-df: A large-scale challenging dataset for deepfake forensics. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 3207–3216, 2020.
- [22] Xin Liu, Girish Narayanswamy, Akshay Paruchuri, Xiaoyu Zhang, Jiankai Tang, Yuzhe Zhang, Roni Sengupta, Shwetak Patel, Yuntao Wang, and Daniel McDuff. rppg-toolbox: Deep remote ppg toolbox. Advances in Neural Information Processing Systems, 36:68485–68510, 2023.
- [23] Daniel McDuff. Camera measurement of physiological vital signs. ACM Computing Surveys, 55(9):1–40, 2023.
- [24] Anish Mittal, Anush Krishna Moorthy, and Alan Conrad Bovik. No-reference image quality assessment in the spatial domain. IEEE Transactions on image processing, 21(12):4695–4708, 2012.
- [25] Anish Mittal, Rajiv Soundararajan, and Alan C Bovik. Making a “completely blind” image quality analyzer. IEEE Signal processing letters, 20(3):209–212, 2012.
- [26] Xuesong Niu, Hu Han, Shiguang Shan, and Xilin Chen. Vipl-hr: A multi-modal database for pulse estimation from less-constrained face video. In Asian conference on computer vision, pages 562–576. Springer, 2018.
- [27] Xuesong Niu, Shiguang Shan, Hu Han, and Xilin Chen. Rhythmnet: End-to-end heart rate estimation from face via spatial-temporal representation. IEEE Transactions on Image Processing, 29:2409–2423, 2019.
- [28] Yen-Fu Ou, Zhan Ma, and Yao Wang. Modeling the impact of frame rate and quantization stepsizes and their temporal variations on perceptual video quality: A review of recent works. In 2010 44th Annual Conference on Information Sciences and Systems (CISS), pages 1–6. IEEE, 2010.
- [29] Yen-Fu Ou, Yuanyi Xue, and Yao Wang. Q-star: A perceptual video quality model considering impact of spatial, temporal, and amplitude resolutions. IEEE Transactions on Image Processing, 23(6):2473–2486, 2014.
- [30] Christian S Pilz, Sebastian Zaunseder, Jarek Krajewski, and Vladimir Blazek. Local group invariance for heart rate estimation from face videos in the wild. In Proceedings of the IEEE conference on computer vision and pattern recognition workshops, pages 1254–1262, 2018.
- [31] Ming-Zher Poh, Daniel J McDuff, and Rosalind W Picard. Advancements in noncontact, multiparameter physiological measurements using a webcam. IEEE transactions on biomedical engineering, 58(1):7–11, 2010.
- [32] Fangbing Qu, Su-Jing Wang, Wen-Jing Yan, He Li, Shuhang Wu, and Xiaolan Fu. Cas(me)²: a database for spontaneous macro-expression and micro-expression spotting and recognition. IEEE Transactions on Affective Computing, 9(4):424–436, 2017.
- [33] Reza Rassool. Vmaf reproducibility: Validating a perceptual practical video quality metric. In 2017 IEEE international symposium on broadband multimedia systems and broadcasting (BMSB), pages 1–2. IEEE, 2017.
- [34] Jingang Shi, Iman Alikhani, Xiaobai Li, Zitong Yu, Tapio Seppänen, and Guoying Zhao. Atrial fibrillation detection from face videos by fusing subtle variations. IEEE Transactions on Circuits and Systems for Video Technology, 30(8):2781–2795, 2019.
- [35] Rencheng Song, Han Wang, Haojie Xia, Juan Cheng, Chang Li, and Xun Chen. Uncertainty quantification for deep learning-based remote photoplethysmography. IEEE Transactions on Instrumentation and Measurement, 72:1–12, 2023.
- [36] Jeremy Speth, Nathan Vance, Patrick Flynn, and Adam Czajka. Non-contrastive unsupervised learning of physiological signals from video. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 14464–14474, 2023.
- [37] Ronny Stricker, Steffen Müller, and Horst-Michael Gross. Non-contact video-based pulse rate measurement on a mobile service robot. In The 23rd IEEE international symposium on robot and human interactive communication, pages 1056–1062. IEEE, 2014.
- [38] Zhaodong Sun and Xiaobai Li. Contrast-phys: Unsupervised video-based remote physiological measurement via spatiotemporal contrast. In European Conference on Computer Vision, pages 492–510. Springer, 2022.
- [39] Wim Verkruysse, Lars O Svaasand, and J Stuart Nelson. Remote plethysmographic imaging using ambient light. Optics express, 16(26):21434–21445, 2008.
- [40] Wenjin Wang, Benoît Balmaekers, and Gerard De Haan. Quality metric for camera-based pulse rate monitoring in fitness exercise. In 2016 IEEE International Conference on Image Processing (ICIP), pages 2430–2434. IEEE, 2016.
- [41] Wenjin Wang, Albertus C Den Brinker, Sander Stuijk, and Gerard De Haan. Algorithmic principles of remote ppg. IEEE Transactions on Biomedical Engineering, 64(7):1479–1491, 2016.
- [42] Zhou Wang, Eero P Simoncelli, and Alan C Bovik. Multi-scale structural similarity for image quality assessment. In The thrity-seventh asilomar conference on signals, systems & computers, 2003, pages 1398–1402. Ieee, 2003.
- [43] Zhou Wang, Alan C Bovik, Hamid R Sheikh, and Eero P Simoncelli. Image quality assessment: from error visibility to structural similarity. IEEE transactions on image processing, 13(4):600–612, 2004.
- [44] Wen Wen, Mu Li, Yabin Zhang, Yiting Liao, Junlin Li, Li Zhang, and Kede Ma. Modular blind video quality assessment. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 2763–2772,
- [45] Bingjie Wu, Zitong Yu, Yiping Xie, Wei Liu, Chaoqi Luo, Yong Liu, and Rick Siow Mong Goh. Semi-rppg: Semi-supervised remote physiological measurement with curriculum pseudo-labeling. IEEE Transactions on Instrumentation and Measurement, 2025.
- [46] Haoning Wu, Chaofeng Chen, Jingwen Hou, Liang Liao, Annan Wang, Wenxiu Sun, Qiong Yan, and Weisi Lin. Fast-vqa: Efficient end-to-end video quality assessment with fragment sampling. In European conference on computer vision, pages 538–554. Springer, 2022.
- [47] Haoning Wu, Erli Zhang, Liang Liao, Chaofeng Chen, Jingwen Hou, Annan Wang, Wenxiu Sun, Qiong Yan, and Weisi Lin. Exploring video quality assessment on user generated contents from aesthetic and technical perspectives. In Proceedings of the IEEE/CVF international conference on computer vision, pages 20144–20154, 2023.
- [48] Lin Xi, Weihai Chen, Changchen Zhao, Xingming Wu, and Jianhua Wang. Image enhancement for remote photoplethysmography in a low-light environment. In 2020 15th IEEE International Conference on Automatic Face and Gesture Recognition (FG 2020), pages 1–7. IEEE, 2020.
- [49] Xiangyu Xi, Deyang Kong, Jian Yang, Jiawei Yang, Zhengyu Chen, Wei Wang, Jingang Wang, Xunliang Cai, Shikun Zhang, and Wei Ye. Samplemix: A sample-wise pre-training data mixing strategey by coordinating data quality and diversity. arXiv preprint arXiv:2503.01506, 2025.
- [50] Fuwang Yi, Mianyi Chen, Wei Sun, Xiongkuo Min, Yuan Tian, and Guangtao Zhai. Attention based network for no-reference ugc video quality assessment. In 2021 IEEE international conference on image processing (ICIP), pages 1414–1418. IEEE, 2021.
- [51] Zitong Yu, Wei Peng, Xiaobai Li, Xiaopeng Hong, and Guoying Zhao. Remote heart rate measurement from highly compressed facial videos: an end-to-end deep learning solution with video enhancement. In Proceedings of the IEEE/CVF international conference on computer vision, pages 151–160, 2019.
- [52] Zijie Yue, Miaojing Shi, and Shuai Ding. Facial video-based remote physiological measurement via self-supervised learning. IEEE Transactions on Pattern Analysis and Machine Intelligence, 45(11):13844–13859, 2023.
- [53] Xinyu Zhang, Weiyu Sun, Hao Lu, Ying Chen, Yun Ge, Xiaolin Huang, Jie Yuan, and Yingcong Chen. Self-similarity prior distillation for unsupervised remote physiological measurement. IEEE Transactions on Multimedia, 26:10290–10305, 2024.
- [54] Yuting Zhang, Hao Lu, Xin Liu, Yingcong Chen, and Kaishun Wu. Advancing generalizable remote physiological measurement through the integration of explicit and implicit prior knowledge. IEEE Transactions on Image Processing,
- [55] Qi Zheng, Zhengzhong Tu, Yibo Fan, Xiaoyang Zeng, and Alan C Bovik. No-reference quality assessment of variable frame-rate videos using temporal bandpass statistics. In ICASSP 2022-2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 1795–1799. IEEE, 2022.
- [56] Qi Zheng, Yibo Fan, Leilei Huang, Tianyu Zhu, Jiaming Liu, Zhijian Hao, Shuo Xing, Chia-Ju Chen, Xiongkuo Min, Alan C Bovik, et al. Video quality assessment: A comprehensive survey. arXiv preprint arXiv:2412.04508, 2024.
rPPG-VQA: A Video Quality Assessment Framework for Unsupervised rPPG Training, Supplementary Material
- [57] RANSAC Algorithm. The random sample consensus (RANSAC) algorithm is an iterative method for robustly estimating the parameters of a mathematical model from data containing outliers [12]. Its primary strength is its ability to derive a reliable model even when a significant fraction of the dataset consists of erroneous measurements. In this work, we emplo...
- [58] Scene-Level Noise Perception Prompt. The prompt for Qwen3-VL [1] to assess scene-level quality is given in Figure 3.
  Table 8. Ablation study on the fusion size M.
  M | MAE↓ | RMSE↓ | R↑
  1 | 0.91 | 1.30 | 0.99
  3 | 0.78 | 1.17 | 0.99
  5 | 0.57 | 1.12 | 1.00
  7 | 0.47 | 0.74 | 1.00
- [59] WRS Algorithm. Weighted random sampling (WRS) is a class of algorithms for drawing items from a collection, where each item’s probability of being selected is proportional to an assigned weight [7]. To construct the target training set D_tgt by resampling a source dataset D_src, WRS effectively samples items with high quality scores. WRS first calculates th...
- [60] Ablation Studies. 9.1. Impact of Fusion Size M. The results in Table 8 demonstrate a clear correlation between the number of fused rPPG methods (M) and estimation accuracy. Relying on a single method (M = 1) results in the poorest performance. As we increase the fusion size from M = 3 to our chosen configuration of M = 7, we observe a consistent and significa...
- [61] Visualization. Figure 4 illustrates a disparity in the quality score distributions between the CAS(ME)² [32] and Celeb-DF (v2) [21] datasets. Scores for CAS(ME)², a controlled dataset, are concentrated in the high-quality range (0.4-0.8), reflecting its consistent signal fidelity. In contrast, scores for the “in-the-wild” Celeb-DF (v2) are predominant...
- [62] Failure Cases and Mitigation Mechanisms. 11.1. Signal-level Branch Failure. Figure 5(a) illustrates a video from the “in-the-wild” MEVIEW dataset [15]. Existing estimators (GREEN, ICA, LGI, OMIT) produced inflated SNR values ranging from 16.85 to 26.69, leading to an erroneous consensus SNR of 20.72. This error likely stems from misinterpreting flickering...
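The two algorithms excerpted in [57] and [59] can be sketched generically as follows. The first function is RANSAC with a constant-value model over a handful of 1-D estimates; the second is the key-based weighted sampling of Efraimidis and Spirakis [7], which draws without replacement. How the paper applies either one is cut off on this page, so the function names, the tolerance, the iteration count, and the without-replacement choice are assumptions.

```python
import heapq
import random
from typing import Dict, List, Sequence, Tuple


def ransac_consensus(values: Sequence[float], tol: float,
                     n_iter: int = 100, seed: int = 0) -> Tuple[float, List[int]]:
    """Robust location estimate: mean of the largest inlier set found."""
    if not values or tol < 0:
        raise ValueError("need at least one value and a non-negative tol")
    rng = random.Random(seed)
    best: List[int] = []
    for _ in range(n_iter):
        # A constant model is fully determined by one sampled value.
        candidate = values[rng.randrange(len(values))]
        inliers = [i for i, v in enumerate(values) if abs(v - candidate) <= tol]
        if len(inliers) > len(best):
            best = inliers
    estimate = sum(values[i] for i in best) / len(best)
    return estimate, best


def weighted_sample(weights: Dict[str, float], k: int, seed: int = 0) -> List[str]:
    """Draw up to k items without replacement, favoring large weights.

    Each item gets the key u ** (1 / w) with u uniform in [0, 1);
    the k largest keys win. Items with non-positive weight are skipped.
    """
    rng = random.Random(seed)
    keyed = [(rng.random() ** (1.0 / w), item)
             for item, w in weights.items() if w > 0]
    return [item for _, item in heapq.nlargest(k, keyed)]
```

Note that a consensus of this kind only rejects estimates that disagree with the majority; when most estimators are inflated in the same direction, as in the failure case in [62], the consensus is inflated too.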