Cross-View Attention Fusion Net: A Prior-Guided Dual-View Representation Learning for Cardiac Output Estimation from Short-Term PPG Signals
Pith reviewed 2026-05-20 01:55 UTC · model grok-4.3
The pith
A dual-view neural net fuses raw PPG signals with derived feature maps to estimate cardiac output from short segments.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
CVAF-Net processes short, fixed-length PPG segments by running a temporal view on the raw waveform and a prior-guided view on the feature sequence map (FSM), then fuses the two through cross-view attention. On simulated pulse-wave data it reaches a mean absolute error (MAE) of 0.19 L/min. On real-world recordings its best-case MAE is 1.20 L/min, and it needs twelve times fewer floating-point operations (FLOPs) than the strongest Transformer baseline. The resulting estimates show the expected physiological correlations with age, heart rate, and systemic vascular resistance.
What carries the argument
Cross-view attention that fuses a raw temporal PPG view with a feature sequence map prior view
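The following NumPy sketch shows what cross-view attention between the two views could look like. This summary does not state several details, so they are assumptions here: single-head scaled dot-product attention, both views already embedded as token sequences of a common width `d`, attention in both directions (temporal tokens query the FSM and vice versa), a residual addition, and mean pooling. The function names and random weights are hypothetical. This is an illustration, not the authors' implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_view_attention(query_view, context_view, w_q, w_k, w_v):
    """Single-head scaled dot-product attention: one view queries the other.

    query_view:   (T_q, d) token embeddings of the querying view
    context_view: (T_c, d) token embeddings of the attended view
    w_q, w_k, w_v: (d, d) projection matrices (hypothetical, random below)
    """
    q = query_view @ w_q
    k = context_view @ w_k
    v = context_view @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])   # (T_q, T_c)
    attn = softmax(scores, axis=-1)           # each row sums to 1
    return attn @ v, attn

def fuse_views(h_time, h_fsm, params):
    # Bidirectional fusion with residual connections, then mean pooling over
    # tokens; the concatenated vector would feed a CO regression head (not shown).
    t2f, _ = cross_view_attention(h_time, h_fsm, *params["time_queries_fsm"])
    f2t, _ = cross_view_attention(h_fsm, h_time, *params["fsm_queries_time"])
    z_time = (h_time + t2f).mean(axis=0)
    z_fsm = (h_fsm + f2t).mean(axis=0)
    return np.concatenate([z_time, z_fsm])

rng = np.random.default_rng(0)
T, d = 50, 16
h_time = rng.standard_normal((T, d))   # stand-in for the raw-PPG encoder output
h_fsm = rng.standard_normal((T, d))    # stand-in for the FSM encoder output
params = {
    key: tuple(0.1 * rng.standard_normal((d, d)) for _ in range(3))
    for key in ("time_queries_fsm", "fsm_queries_time")
}
z = fuse_views(h_time, h_fsm, params)
print(z.shape)  # (32,)
```

The attention matrix is computed between the two token sequences, so the views do not need the same number of tokens. Only the projection widths must agree.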
Load-bearing premise
The feature sequence map supplies complementary prior information that cross-view attention fuses productively with the raw temporal view without propagating extraction errors or dataset biases.
What would settle it
Test the model on an independent clinical dataset with simultaneous invasive cardiac output reference measurements and report whether the error remains below 1.5 L/min across a wide range of vascular conditions.
Original abstract
Accurate cardiac output (CO) estimation from photoplethysmography (PPG) is promising for unobtrusive hemodynamic monitoring, but remains difficult since CO is jointly determined by cardiac function and vascular tone. Conventional feature-based models use physiologically meaningful PPG descriptors, yet depend on accurate pulse detection and may miss latent temporal relationships. In contrast, fully end-to-end deep learning models learn directly from raw PPG but often underuse established PPG-derived prior information. Here, we introduce the Cross-View Attention Fusion Network (CVAF-Net), a prior-guided dual-view deep learning model for CO estimation from short, fixed-length PPG segments. CVAF-Net processes raw PPG as a temporal view and a feature sequence map (FSM) as a structured prior-guided view, and fuses the two representations through cross-view attention. The model was independently evaluated using 5-, 15-, and 30-s segments from three datasets: simulated pulse waves (3323 subjects), vasoconstriction provocation (79 subjects), and resting/cycling activities (10 subjects), and was compared with multiple machine learning and deep learning benchmarks. CVAF-Net outperformed most benchmark methods and achieved performance comparable to a state-of-the-art Transformer-based model, with a mean absolute error (MAE) of 0.19 L/min (MAPE: 3.95%) on simulated data and high accuracy in real-world settings (minimum MAE: 1.20 L/min). Importantly, CVAF-Net reduced FLOPs by twelvefold compared with the leading Transformer-based model. Plausibility analysis showed physiologically consistent CO estimates, with expected correlations with age ($\rho = -0.274$), heart rate ($\rho = 0.894$), and systemic vascular resistance ($\rho = -0.740$). These findings indicate that CVAF-Net provides an accurate, computationally efficient, and generalizable approach for continuous wearable-based CO monitoring.
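The error and plausibility metrics quoted in the abstract can be computed as below. The toy CO values are invented for illustration. The abstract does not name the correlation coefficient, so treating ρ as a Spearman rank correlation is an assumption.

```python
import numpy as np

def mae(y_true, y_pred):
    # Mean absolute error, in the units of CO (L/min).
    return float(np.mean(np.abs(y_true - y_pred)))

def mape(y_true, y_pred):
    # Mean absolute percentage error; assumes no zero reference values.
    return float(100.0 * np.mean(np.abs(y_true - y_pred) / np.abs(y_true)))

def spearman_rho(x, y):
    # Spearman correlation = Pearson correlation of ranks.
    # Ranks via argsort-of-argsort; ties are NOT averaged in this sketch.
    rx = np.argsort(np.argsort(x))
    ry = np.argsort(np.argsort(y))
    return float(np.corrcoef(rx, ry)[0, 1])

co_ref = np.array([4.0, 5.0, 6.0, 5.5])   # toy reference CO (L/min)
co_est = np.array([4.2, 4.9, 6.3, 5.5])   # toy model estimates (L/min)
print(f"MAE {mae(co_ref, co_est):.2f} L/min, MAPE {mape(co_ref, co_est):.2f} %")
# MAE 0.15 L/min, MAPE 3.00 %
```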
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces CVAF-Net, a prior-guided dual-view deep learning architecture for cardiac output estimation from short fixed-length PPG segments. It processes raw PPG as a temporal view and a feature sequence map (FSM) derived from conventional PPG descriptors as a structured prior view, fusing them via cross-view attention. The model is evaluated independently on 5-, 15-, and 30-second segments from a large simulated dataset (3323 subjects), a vasoconstriction provocation set (79 subjects), and a resting/cycling activity set (10 subjects), reporting MAE of 0.19 L/min (MAPE 3.95%) on simulated data, minimum MAE of 1.20 L/min on real data, outperformance of most ML/DL benchmarks, parity with a Transformer baseline at 12x lower FLOPs, and physiologically plausible correlations with age, heart rate, and systemic vascular resistance.
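Evaluating on 5-, 15-, and 30-second segments requires a segmentation step. The sketch below assumes non-overlapping windows and a hypothetical 125 Hz sampling rate. This report gives neither the window overlap nor the datasets' sampling rates.

```python
import numpy as np

def segment(signal, fs, seg_seconds):
    """Split a 1-D PPG recording into non-overlapping fixed-length segments.

    fs: sampling rate in Hz (hypothetical here). Trailing samples that do not
    fill a whole segment are dropped.
    """
    n = int(round(seg_seconds * fs))
    n_segments = len(signal) // n
    return signal[: n_segments * n].reshape(n_segments, n)

fs = 125                                   # hypothetical sampling rate
t = np.arange(62 * fs) / fs                # 62 s toy recording
recording = np.sin(2 * np.pi * 1.2 * t)    # 72 bpm toy "pulse" waveform
for seconds in (5, 15, 30):
    print(seconds, segment(recording, fs, seconds).shape)
# 5 (12, 625)
# 15 (4, 1875)
# 30 (2, 3750)
```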
Significance. If the dual-view fusion proves robust, the work offers a computationally efficient, prior-informed alternative to pure end-to-end or Transformer models for continuous wearable CO monitoring, addressing the tension between physiological interpretability and latent temporal learning. The reported efficiency gain and multi-dataset evaluation are concrete strengths.
major comments (2)
- [Experimental evaluation] Experimental evaluation (results and methods sections): the reported MAE, MAPE, and correlation values across datasets lack any description of training/validation/test splits, hyperparameter search procedure, or explicit handling of inter-subject variability. This directly weakens the claim of independent test-set performance and generalizability.
- [Model architecture] Model architecture and ablation analysis: no ablation isolating the FSM prior or cross-view attention fusion is presented, nor are noise-injection or pulse-detection-error robustness tests. Without these, it remains unclear whether the dual-view design supplies complementary information or simply propagates inaccuracies from conventional feature extraction in short (5–15 s) segments, making the central prior-guided claim load-bearing but unverified.
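One way to make the requested split description concrete is a subject-independent partition, in which every segment from a given subject lands in exactly one of the training, validation, or test sets. The sketch below is hypothetical. The fractions, seed, and six segments per subject are arbitrary, and it is a plain random partition by subject rather than the stratified scheme mentioned in the authors' rebuttal.

```python
import numpy as np

def subject_independent_split(subject_ids, test_frac=0.2, val_frac=0.1, seed=0):
    """Assign whole subjects (never individual segments) to train/val/test.

    No subject contributes segments to both training and evaluation,
    which prevents identity leakage across partitions.
    """
    rng = np.random.default_rng(seed)
    subjects = np.unique(subject_ids)
    rng.shuffle(subjects)
    n_test = max(1, int(round(test_frac * len(subjects))))
    n_val = max(1, int(round(val_frac * len(subjects))))
    test_s = set(subjects[:n_test])
    val_s = set(subjects[n_test:n_test + n_val])
    return np.array(["test" if s in test_s else "val" if s in val_s else "train"
                     for s in subject_ids])

# e.g. 79 subjects (size of the vasoconstriction set), 6 segments each (hypothetical)
subject_ids = np.repeat(np.arange(79), 6)
split = subject_independent_split(subject_ids)
print({p: len(np.unique(subject_ids[split == p])) for p in ("train", "val", "test")})
# {'train': 55, 'val': 8, 'test': 16}
```

For the 10-subject resting/cycling set, a 20% hold-out leaves only two test subjects, so leave-one-subject-out cross-validation is a common alternative.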
minor comments (2)
- [Abstract] The abstract states a 'minimum MAE: 1.20 L/min' without specifying the corresponding dataset or segment length; this should be clarified for reproducibility.
- [Figures and methods] Figure captions and notation for the feature sequence map (FSM) construction should explicitly list the PPG descriptors used and any preprocessing steps to avoid ambiguity.
Simulated Author's Rebuttal
We are grateful to the referee for the thorough review and constructive feedback. The comments highlight important aspects that will enhance the manuscript's transparency and robustness. We respond to each major comment below, indicating the planned revisions.
Point-by-point responses
- Referee: [Experimental evaluation] Experimental evaluation (results and methods sections): the reported MAE, MAPE, and correlation values across datasets lack any description of training/validation/test splits, hyperparameter search procedure, or explicit handling of inter-subject variability. This directly weakens the claim of independent test-set performance and generalizability.
Authors: We concur that additional details on the experimental protocol are required to support the generalizability assertions. In the revised manuscript, we will augment the Methods and Results sections with explicit information on the training, validation, and test splits, which were performed in a subject-independent manner to avoid data leakage. We will also describe the hyperparameter search strategy employed and how inter-subject variability was mitigated through stratified partitioning and independent evaluation across datasets. These clarifications will reinforce the validity of the reported performance metrics. revision: yes
- Referee: [Model architecture] Model architecture and ablation analysis: no ablation isolating the FSM prior or cross-view attention fusion is presented, nor are noise-injection or pulse-detection-error robustness tests. Without these, it remains unclear whether the dual-view design supplies complementary information or simply propagates inaccuracies from conventional feature extraction in short (5–15 s) segments, making the central prior-guided claim load-bearing but unverified.
Authors: We recognize the value of ablation studies and robustness evaluations in validating the proposed architecture. Accordingly, we will incorporate ablation experiments in the revised version that systematically remove the FSM prior and the cross-view attention mechanism to quantify their individual contributions. Furthermore, we will add robustness analyses involving controlled noise injection into the PPG signals and simulations of pulse detection inaccuracies to demonstrate the model's stability, particularly for shorter segments. These additions will provide evidence that the dual-view fusion offers complementary benefits beyond conventional features. revision: yes
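The promised noise-injection analysis could take the following form: add white Gaussian noise at a controlled signal-to-noise ratio (SNR) and track MAE as the SNR falls. Every choice here is an illustrative assumption. The noise model is white Gaussian, whereas real PPG artifacts are often motion-related and non-stationary. The SNR grid is arbitrary. `toy_model` is a hypothetical amplitude-based stand-in for a trained CVAF-Net.

```python
import numpy as np

def add_noise_at_snr(ppg, snr_db, rng):
    """Add white Gaussian noise so the segment has the requested SNR (dB).

    Signal power is measured after removing the DC offset.
    """
    signal_power = np.mean((ppg - ppg.mean()) ** 2)
    noise_power = signal_power / (10 ** (snr_db / 10))
    return ppg + rng.normal(0.0, np.sqrt(noise_power), size=ppg.shape)

def robustness_curve(predict, segments, co_ref, snrs_db, seed=0):
    # MAE of a model (any callable: segments -> CO estimates) per noise level.
    rng = np.random.default_rng(seed)
    curve = {}
    for snr in snrs_db:
        noisy = np.stack([add_noise_at_snr(s, snr, rng) for s in segments])
        curve[snr] = float(np.mean(np.abs(predict(noisy) - co_ref)))
    return curve

def toy_model(X):
    # Hypothetical stand-in: CO proportional to pulse amplitude (A = sqrt(2) * std).
    return 5.0 * np.sqrt(2) * X.std(axis=1)

fs = 125
t = np.arange(5 * fs) / fs
amps = np.array([0.8, 1.0, 1.2])
segments = amps[:, None] * np.sin(2 * np.pi * 1.2 * t)   # three 5 s toy segments
co_ref = 5.0 * amps                                      # toy reference CO (L/min)
curve = robustness_curve(toy_model, segments, co_ref, snrs_db=(40, 20, 10, 0))
print({snr: round(err, 3) for snr, err in curve.items()})
```

A pulse-detection-error test would perturb the FSM inputs instead of the raw waveform. For example, it could jitter or drop detected beats before feature extraction.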
Circularity Check
No significant circularity in derivation or performance claims
The paper introduces CVAF-Net as a dual-view architecture that fuses a raw temporal PPG view with a feature sequence map (FSM) prior via cross-view attention, then reports empirical MAE/MAPE on held-out segments from three independent datasets (simulated, vasoconstriction, activity) against external benchmarks. No equations, parameters, or central claims reduce by construction to fitted inputs, self-definitions, or load-bearing self-citations; performance metrics are measured on separate test recordings rather than being statistically forced by the model equations themselves. The architecture description and plausibility correlations are presented as design choices validated externally, not as tautological renamings or ansatzes smuggled via prior author work.
Axiom & Free-Parameter Ledger
free parameters (1)
- attention and fusion hyperparameters
axioms (1)
- domain assumption: PPG signals jointly reflect cardiac output and vascular tone in a manner recoverable by dual-view fusion
Lean theorems connected to this paper
- IndisputableMonolith/Foundation/Breath1024.lean · period8 / 8-tick periodicity · echoes
  echoes: this paper passage has the same mathematical shape or conceptual pattern as the Recognition theorem, but is not a direct formal dependency.
  "We utilized a multi-channel feature sequence map (FSM) comprising eight key features extracted from the PPG signal: first derivative, second derivative, Fast Fourier transform (FFT) full amplitude, Hilbert transform, discrete stationary wavelet transform (DSWT) approximation coefficients (a) as well as first, second, and third detail coefficients (d1, d2, d3)."
- IndisputableMonolith/Foundation/ArithmeticFromLogic.lean · LogicNat / embed_strictMono · unclear
  unclear: relation between the paper passage and the cited Recognition theorem.
  "CVAF-Net processes raw PPG as a temporal view and a feature sequence map (FSM) as a structured prior-guided view, and fuses the two representations through cross-view attention."
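The paper's quoted FSM passage lists eight channels: first and second derivatives, FFT full amplitude, Hilbert transform, and DSWT coefficients a, d1, d2, d3. The sketch below stacks them into an (8, N) map. The passage leaves several details unspecified, so the following are assumptions:

- the Haar wavelet with periodic boundaries;
- the level-3 approximation taken as "(a)";
- the imaginary part of the analytic signal used as the Hilbert channel;
- a 125 Hz sampling rate;
- no per-channel normalization.

The stationary wavelet transform (SWT) is written out by hand to stay dependency-free. Its coefficient alignment may differ from PyWavelets' `pywt.swt`.

```python
import numpy as np

def hilbert_transform(x):
    """Hilbert transform via the FFT (imaginary part of the analytic signal)."""
    n = len(x)
    h = np.zeros(n)
    h[0] = 1.0
    if n % 2 == 0:
        h[n // 2] = 1.0
        h[1:n // 2] = 2.0
    else:
        h[1:(n + 1) // 2] = 2.0
    return np.imag(np.fft.ifft(np.fft.fft(x) * h))

def haar_swt(x, levels=3):
    """Undecimated (stationary) Haar wavelet transform, periodic boundary.

    Returns (a_L, [d1, ..., dL]); every output keeps the length of x.
    """
    a = np.asarray(x, dtype=float)
    details = []
    for j in range(levels):
        shifted = np.roll(a, -(2 ** j))        # a[n + 2**j]: a trous filter dilation
        details.append((a - shifted) / np.sqrt(2))
        a = (a + shifted) / np.sqrt(2)
    return a, details

def feature_sequence_map(ppg, fs):
    """Stack the eight FSM channels into an (8, N) array."""
    x = np.asarray(ppg, dtype=float)
    dt = 1.0 / fs
    first = np.gradient(x, dt)                 # first derivative
    second = np.gradient(first, dt)            # second derivative
    fft_amp = np.abs(np.fft.fft(x))            # full (two-sided) amplitude spectrum
    hil = hilbert_transform(x)
    approx, (d1, d2, d3) = haar_swt(x, levels=3)
    return np.stack([first, second, fft_amp, hil, approx, d1, d2, d3])

fs = 125                                       # hypothetical sampling rate
t = np.arange(5 * fs) / fs                     # one 5 s segment
ppg = np.sin(2 * np.pi * 1.2 * t) + 0.3 * np.sin(2 * np.pi * 2.4 * t)
fsm = feature_sequence_map(ppg, fs)
print(fsm.shape)  # (8, 625)
```

Every channel has the same length as the input, so the map can be fed to a convolutional or attention encoder alongside the raw waveform.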
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.