pith.

arxiv: 2605.19666 · v1 · pith:FH67QKT2 · submitted 2026-05-19 · ⚛️ physics.med-ph · cs.LG

Cross-View Attention Fusion Net: A Prior-Guided Dual-View Representation Learning for Cardiac Output Estimation from Short-Term PPG Signals

Pith reviewed 2026-05-20 01:55 UTC · model grok-4.3

classification ⚛️ physics.med-ph cs.LG
keywords: cardiac output · photoplethysmography · deep learning · cross-view attention · hemodynamic monitoring · wearable sensors · feature sequence map

The pith

A dual-view neural net fuses raw PPG signals with derived feature maps to estimate cardiac output from short segments.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces CVAF-Net to estimate cardiac output from brief photoplethysmography recordings by treating the raw signal as one view and a structured feature sequence map as a second, prior-guided view. It fuses these representations with cross-view attention so the model can draw on both latent temporal patterns and established physiological descriptors. This matters because cardiac output depends on both heart function and vascular tone, so models built only on hand-crafted features or only on the raw signal may miss part of that information for unobtrusive monitoring. The approach achieves low error on simulated data and real recordings while requiring roughly one-twelfth the floating-point operations of the leading Transformer-based baseline.

Core claim

CVAF-Net processes short, fixed-length PPG segments by running a temporal view on the raw waveform and a prior-guided view on the feature sequence map, then fuses the two through cross-view attention. On simulated pulse-wave data it reaches a mean absolute error (MAE) of 0.19 L/min; on real-world recordings its lowest reported MAE is 1.20 L/min, and it needs twelve times fewer floating-point operations than the strongest Transformer baseline. The resulting estimates show the expected physiological correlations with age, heart rate, and systemic vascular resistance.
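For reference, the two headline error metrics follow their standard definitions: MAE in the units of cardiac output (L/min) and MAPE in percent. The sketch below is illustrative only; the function names and toy values are hypothetical and not taken from the paper.

```python
import numpy as np

def mae(y_true, y_pred):
    """Mean absolute error, in the units of the inputs (here L/min)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(y_pred - y_true)))

def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent; assumes no zero reference values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(100.0 * np.mean(np.abs(y_pred - y_true) / np.abs(y_true)))

# Toy reference and predicted cardiac output values (L/min), not from the paper.
co_ref = [5.0, 4.0, 6.0, 5.0]
co_hat = [5.2, 3.8, 6.3, 4.9]
print(round(mae(co_ref, co_hat), 3), round(mape(co_ref, co_hat), 2))
```

Because MAPE divides by the reference value, the same absolute error weighs more heavily for low-output subjects, which is one reason papers tend to report both metrics.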

What carries the argument

Cross-view attention that fuses a raw temporal PPG view with a feature sequence map prior view
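The fusion step can be illustrated with generic single-head scaled dot-product cross-attention: tokens from one view form the queries, and tokens from the other view supply the keys and values. This sketch assumes the temporal view attends to the FSM view. The token counts, embedding width, random projection matrices, and residual connection are hypothetical choices, not CVAF-Net's published configuration.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def cross_view_attention(x_query, x_context, w_q, w_k, w_v):
    """Single-head cross-attention: x_query (n_q, d) attends over x_context (n_k, d)."""
    q = x_query @ w_q                         # (n_q, d_k)
    k = x_context @ w_k                       # (n_k, d_k)
    v = x_context @ w_v                       # (n_k, d_v)
    scores = q @ k.T / np.sqrt(k.shape[-1])   # (n_q, n_k)
    weights = softmax(scores, axis=-1)        # each query row sums to 1
    return weights @ v, weights

rng = np.random.default_rng(0)
d = 32
temporal_tokens = rng.standard_normal((40, d))  # hypothetical raw-PPG view embeddings
fsm_tokens = rng.standard_normal((20, d))       # hypothetical FSM view embeddings
w_q, w_k, w_v = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))

fused, attn = cross_view_attention(temporal_tokens, fsm_tokens, w_q, w_k, w_v)
fused = temporal_tokens + fused  # hypothetical residual connection into the temporal view
print(fused.shape, attn.shape)  # (40, 32) (40, 20)
```

A symmetric variant would also let FSM tokens attend to the temporal view and combine both outputs; which direction, or both, CVAF-Net actually uses should be checked against the paper's architecture description.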

Load-bearing premise

The feature sequence map supplies complementary prior information that cross-view attention fuses productively with the raw temporal view without propagating extraction errors or dataset biases.

What would settle it

Test the model on an independent clinical dataset with simultaneous invasive cardiac output reference measurements and report whether the error remains below 1.5 L/min across a wide range of vascular conditions.
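The proposed test amounts to a stratified error check: bin the test recordings by a vascular-state variable, compute MAE within each bin, and require every bin to stay under the threshold. The following minimal sketch assumes systemic vascular resistance (SVR) is the stratification variable with quantile (tertile) bins. The binning scheme, function names, and synthetic data are illustrative only.

```python
import numpy as np

def stratified_mae(y_true, y_pred, strata_var, n_bins=3):
    """MAE within quantile bins of strata_var (e.g. SVR); returns one value per bin."""
    y_true, y_pred, s = (np.asarray(a, dtype=float) for a in (y_true, y_pred, strata_var))
    edges = np.quantile(s, np.linspace(0.0, 1.0, n_bins + 1))
    # Use interior edges only, so every sample gets a bin index in 0..n_bins-1.
    bin_idx = np.digitize(s, edges[1:-1])
    return [float(np.mean(np.abs(y_pred[bin_idx == b] - y_true[bin_idx == b])))
            for b in range(n_bins)]

def passes_threshold(per_bin_mae, threshold=1.5):
    """True if the error stays below the threshold (L/min) in every stratum."""
    return all(m < threshold for m in per_bin_mae)

rng = np.random.default_rng(1)
svr = rng.uniform(800, 1800, size=300)          # synthetic SVR values
co_ref = 4.0 + rng.normal(0, 0.8, size=300)     # synthetic reference CO (L/min)
co_hat = co_ref + rng.normal(0, 1.0, size=300)  # synthetic estimates
per_bin = stratified_mae(co_ref, co_hat, svr)
print([round(m, 2) for m in per_bin], passes_threshold(per_bin))
```

Reporting per-bin MAE rather than a single pooled value prevents a good fit in the common, mid-range vascular states from masking large errors at the extremes.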

Figures

Figures reproduced from arXiv: 2605.19666 by Bo Cui, Dirk W. Donker, Libera Fresiello, Peter H. Veltink, Yaowen Zhang, Ying Wang.

Figure 1: Schematic diagram of the CVAF-Net architecture. The framework consists of three modules: a temporal …

Figure 2: Boxplots of predicted cardiac output versus age on the SPW dataset. Subplots from top to bottom represent …

Figure 3: Scatter plots of predicted cardiac output versus heart rate on the RC dataset. Subplots from left to right …

Figure 4: Scatter plots of predicted cardiac output versus systemic vascular resistance on the RC dataset. Subplots from …
Original abstract

Accurate cardiac output (CO) estimation from photoplethysmography (PPG) is promising for unobtrusive hemodynamic monitoring, but remains difficult since CO is jointly determined by cardiac function and vascular tone. Conventional feature-based models use physiologically meaningful PPG descriptors, yet depend on accurate pulse detection and may miss latent temporal relationships. In contrast, fully end-to-end deep learning models learn directly from raw PPG but often underuse established PPG-derived prior information. Here, we introduce the Cross-View Attention Fusion Network (CVAF-Net), a prior-guided dual-view deep learning model for CO estimation from short, fixed-length PPG segments. CVAF-Net processes raw PPG as a temporal view and a feature sequence map (FSM) as a structured prior-guided view, and fuses the two representations through cross-view attention. The model was independently evaluated using 5-, 15-, and 30-s segments from three datasets: simulated pulse waves (3323 subjects), vasoconstriction provocation (79 subjects), and resting/cycling activities (10 subjects), and was compared with multiple machine learning and deep learning benchmarks. CVAF-Net outperformed most benchmark methods and achieved performance comparable to a state-of-the-art Transformer-based model, with a mean absolute error (MAE) of 0.19 L/min (MAPE: 3.95%) on simulated data and high accuracy in real-world settings (minimum MAE: 1.20 L/min). Importantly, CVAF-Net reduced FLOPs by twelvefold compared with the leading Transformer-based model. Plausibility analysis showed physiologically consistent CO estimates, with expected correlations with age ($\rho = -0.274$), heart rate ($\rho = 0.894$), and systemic vascular resistance ($\rho = -0.740$). These findings indicate that CVAF-Net provides an accurate, computationally efficient, and generalizable approach for continuous wearable-based CO monitoring.
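The plausibility analysis is a correlation between predicted CO and each covariate (age, heart rate, SVR). The abstract does not say which coefficient ρ denotes. This sketch assumes Spearman's rank correlation, computed as the Pearson correlation of average ranks (tied values share the mean rank). The synthetic heart-rate and CO arrays are illustrative and are not the paper's data.

```python
import numpy as np

def average_ranks(x):
    """1-based ranks; tied values receive the mean of the ranks they span."""
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, kind="mergesort")
    ranks = np.empty(len(x), dtype=float)
    ranks[order] = np.arange(1, len(x) + 1)
    _, inverse = np.unique(x, return_inverse=True)
    sums = np.bincount(inverse, weights=ranks)
    counts = np.bincount(inverse)
    return (sums / counts)[inverse]

def spearman_rho(x, y):
    """Spearman's rho as the Pearson correlation between rank vectors."""
    return float(np.corrcoef(average_ranks(x), average_ranks(y))[0, 1])

rng = np.random.default_rng(2)
heart_rate = rng.uniform(55, 150, size=200)                 # synthetic HR (bpm)
co_pred = 0.04 * heart_rate + rng.normal(0, 0.5, size=200)  # synthetic CO (L/min)
print(round(spearman_rho(heart_rate, co_pred), 3))
```

A rank-based coefficient only asks whether CO rises or falls monotonically with the covariate, which suits a plausibility check better than testing for a strictly linear relationship.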

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it: the pith above is the substance, and this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper introduces CVAF-Net, a prior-guided dual-view deep learning architecture for cardiac output estimation from short fixed-length PPG segments. It processes raw PPG as a temporal view and a feature sequence map (FSM) derived from conventional PPG descriptors as a structured prior view, fusing them via cross-view attention. The model is evaluated independently on 5-, 15-, and 30-second segments from a large simulated dataset (3323 subjects), a vasoconstriction provocation set (79 subjects), and a resting/cycling activity set (10 subjects), reporting MAE of 0.19 L/min (MAPE 3.95%) on simulated data, minimum MAE of 1.20 L/min on real data, outperformance of most ML/DL benchmarks, parity with a Transformer baseline at 12x lower FLOPs, and physiologically plausible correlations with age, heart rate, and systemic vascular resistance.
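Fixed-length evaluation means each recording is cut into windows of 5, 15, or 30 s before inference. The sketch below shows minimal non-overlapping segmentation. The 125 Hz sampling rate and the choice to drop a trailing partial window are assumptions, because the summary states neither.

```python
import numpy as np

def segment_signal(ppg, fs_hz, seg_seconds):
    """Split a 1-D PPG record into non-overlapping windows of seg_seconds; drops the remainder."""
    ppg = np.asarray(ppg, dtype=float)
    seg_len = int(round(seg_seconds * fs_hz))
    n_segs = len(ppg) // seg_len
    return ppg[: n_segs * seg_len].reshape(n_segs, seg_len)

fs = 125  # assumed sampling rate (Hz)
record = np.sin(2 * np.pi * 1.2 * np.arange(0, 100 * fs) / fs)  # 100 s synthetic pulse-like wave
for seconds in (5, 15, 30):
    print(seconds, segment_signal(record, fs, seconds).shape)
```

Longer windows yield fewer training examples per recording but give the feature extraction more beats to average over. That trade-off is why results are reported separately for each segment length.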

Significance. If the dual-view fusion proves robust, the work offers a computationally efficient, prior-informed alternative to pure end-to-end or Transformer models for continuous wearable CO monitoring, addressing the tension between physiological interpretability and latent temporal learning. The reported efficiency gain and multi-dataset evaluation are concrete strengths.

major comments (2)
  1. [Experimental evaluation] Experimental evaluation (results and methods sections): the reported MAE, MAPE, and correlation values across datasets lack any description of training/validation/test splits, hyperparameter search procedure, or explicit handling of inter-subject variability. This directly weakens the claim of independent test-set performance and generalizability.
  2. [Model architecture] Model architecture and ablation analysis: no ablation isolating the FSM prior or cross-view attention fusion is presented, nor are noise-injection or pulse-detection-error robustness tests. Without these, it remains unclear whether the dual-view design supplies complementary information or simply propagates inaccuracies from conventional feature extraction in short (5–15 s) segments, making the central prior-guided claim load-bearing but unverified.
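Major comment 1, and the authors' reply to it, turns on subject-independent partitioning. All segments from one subject must land in the same split, so that the test MAE is not inflated by near-duplicate windows from subjects already seen in training. The following minimal sketch implements a grouped split by subject ID. The split fractions, seed, and subject/segment counts are hypothetical, and this is not the authors' protocol.

```python
import numpy as np

def subject_split(subject_ids, frac_train=0.7, frac_val=0.15, seed=0):
    """Return boolean masks (train, val, test) over segments, grouped by subject."""
    subject_ids = np.asarray(subject_ids)
    subjects = np.unique(subject_ids)
    rng = np.random.default_rng(seed)
    rng.shuffle(subjects)  # randomize subject order before assigning splits
    n_train = int(round(frac_train * len(subjects)))
    n_val = int(round(frac_val * len(subjects)))
    train_s = subjects[:n_train]
    val_s = subjects[n_train:n_train + n_val]
    test_s = subjects[n_train + n_val:]
    return (np.isin(subject_ids, train_s),
            np.isin(subject_ids, val_s),
            np.isin(subject_ids, test_s))

# Hypothetical example: 79 subjects with 12 segments each.
ids = np.repeat(np.arange(79), 12)
train, val, test = subject_split(ids)
print(train.sum(), val.sum(), test.sum())
```

Splitting at the segment level instead would let the model memorize subject-specific waveform morphology, which is the leakage the referee is asking the authors to rule out.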
minor comments (2)
  1. [Abstract] The abstract states a 'minimum MAE: 1.20 L/min' without specifying the corresponding dataset or segment length; this should be clarified for reproducibility.
  2. [Figures and methods] Figure captions and notation for the feature sequence map (FSM) construction should explicitly list the PPG descriptors used and any preprocessing steps to avoid ambiguity.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We are grateful to the referee for the thorough review and constructive feedback. The comments highlight important aspects that will enhance the manuscript's transparency and robustness. We respond to each major comment below, indicating the planned revisions.

Point-by-point responses
  1. Referee: [Experimental evaluation] Experimental evaluation (results and methods sections): the reported MAE, MAPE, and correlation values across datasets lack any description of training/validation/test splits, hyperparameter search procedure, or explicit handling of inter-subject variability. This directly weakens the claim of independent test-set performance and generalizability.

    Authors: We concur that additional details on the experimental protocol are required to support the generalizability assertions. In the revised manuscript, we will augment the Methods and Results sections with explicit information on the training, validation, and test splits, which were performed in a subject-independent manner to avoid data leakage. We will also describe the hyperparameter search strategy employed and how inter-subject variability was mitigated through stratified partitioning and independent evaluation across datasets. These clarifications will reinforce the validity of the reported performance metrics. revision: yes

  2. Referee: [Model architecture] Model architecture and ablation analysis: no ablation isolating the FSM prior or cross-view attention fusion is presented, nor are noise-injection or pulse-detection-error robustness tests. Without these, it remains unclear whether the dual-view design supplies complementary information or simply propagates inaccuracies from conventional feature extraction in short (5–15 s) segments, making the central prior-guided claim load-bearing but unverified.

    Authors: We recognize the value of ablation studies and robustness evaluations in validating the proposed architecture. Accordingly, we will incorporate ablation experiments in the revised version that systematically remove the FSM prior and the cross-view attention mechanism to quantify their individual contributions. Furthermore, we will add robustness analyses involving controlled noise injection into the PPG signals and simulations of pulse detection inaccuracies to demonstrate the model's stability, particularly for shorter segments. These additions will provide evidence that the dual-view fusion offers complementary benefits beyond conventional features. revision: yes
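The robustness analyses promised in the reply can be prototyped with two simple perturbations. The first is additive white Gaussian noise at a target signal-to-noise ratio; the second is random jitter applied to detected pulse-onset indices. Both functions below are hypothetical helpers. The SNR definition (mean signal power over noise power, in dB), the uniform integer jitter model, and the 125 Hz sampling rate are assumptions, not the protocol the authors will use.

```python
import numpy as np

def add_noise_at_snr(x, snr_db, rng):
    """Add white Gaussian noise so that mean(x**2) / noise_power equals 10**(snr_db/10)."""
    x = np.asarray(x, dtype=float)
    signal_power = np.mean(x ** 2)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    return x + rng.normal(0.0, np.sqrt(noise_power), size=x.shape)

def jitter_pulse_onsets(onsets, max_shift, n_samples, rng):
    """Shift each detected onset index by a random integer in [-max_shift, max_shift]."""
    onsets = np.asarray(onsets, dtype=int)
    shifts = rng.integers(-max_shift, max_shift + 1, size=onsets.shape)
    return np.clip(onsets + shifts, 0, n_samples - 1)

rng = np.random.default_rng(3)
fs = 125                                     # assumed sampling rate (Hz)
t = np.arange(0, 15 * fs) / fs               # one 15 s segment
clean = np.sin(2 * np.pi * 1.2 * t)          # synthetic stand-in for a PPG waveform
noisy = add_noise_at_snr(clean, snr_db=10, rng=rng)
measured_snr = 10 * np.log10(np.mean(clean ** 2) / np.mean((noisy - clean) ** 2))
onsets = np.arange(0, len(t), int(fs / 1.2))  # idealized onset indices, one per beat
print(round(measured_snr, 1), jitter_pulse_onsets(onsets, 5, len(t), rng)[:5])
```

Sweeping `snr_db` and `max_shift` over a grid, and recomputing the FSM from the perturbed inputs, would show whether errors in the prior view propagate into the fused estimate.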

Circularity Check

0 steps flagged

No significant circularity in derivation or performance claims

full rationale

The paper introduces CVAF-Net as a dual-view architecture that fuses a raw temporal PPG view with a feature sequence map (FSM) prior via cross-view attention, then reports empirical MAE/MAPE on held-out segments from three independent datasets (simulated, vasoconstriction, activity) against external benchmarks. No equations, parameters, or central claims reduce by construction to fitted inputs, self-definitions, or load-bearing self-citations; performance metrics are measured on separate test recordings rather than being statistically forced by the model equations themselves. The architecture description and plausibility correlations are presented as design choices validated externally, not as tautological renamings or ansatzes smuggled via prior author work.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The central claim rests on standard supervised deep-learning assumptions plus the domain premise that PPG waveforms encode usable information about both cardiac function and vascular tone.

free parameters (1)
  • attention and fusion hyperparameters
    Dimensions, learning rates, and weighting factors chosen to optimize validation performance on the reported datasets.
axioms (1)
  • domain assumption: PPG signals jointly reflect cardiac output and vascular tone in a manner recoverable by dual-view fusion
    Invoked in the introduction and model motivation sections.

pith-pipeline@v0.9.0 · 5905 in / 1268 out tokens · 52795 ms · 2026-05-20T01:55:06.381166+00:00 · methodology


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

  • IndisputableMonolith/Foundation/Breath1024.lean period8 / 8-tick periodicity echoes

    ECHOES: this paper passage has the same mathematical shape or conceptual pattern as the Recognition theorem, but is not a direct formal dependency.

    We utilized a multi-channel feature sequence map (FSM) comprising eight key features extracted from the PPG signal: first derivative, second derivative, Fast Fourier transform (FFT) full amplitude, Hilbert transform, discrete stationary wavelet transform (DSWT) approximation coefficients (a) as well as first, second, and third detail coefficients (d1, d2, d3).

  • IndisputableMonolith/Foundation/ArithmeticFromLogic.lean LogicNat / embed_strictMono unclear

    Relation between the paper passage and the cited Recognition theorem.

    CVAF-Net processes raw PPG as a temporal view and a feature sequence map (FSM) as a structured prior-guided view, and fuses the two representations through cross-view attention.
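The eight FSM channels quoted above can be sketched with NumPy alone. Several details are assumptions, because the quoted passage does not fix them:
- "Hilbert transform" is taken as the imaginary part of the FFT-based analytic signal.
- "FFT full amplitude" is taken as the magnitude of the full-length FFT.
- The DSWT is taken as a three-level undecimated Haar transform with periodic boundaries. A library routine such as PyWavelets' `pywt.swt` would normally be used, with its own filter-sign and normalization conventions.

The sampling rate, channel order, and the absence of per-channel normalization are also hypothetical choices.

```python
import numpy as np

def hilbert_transform(x):
    """Imaginary part of the analytic signal, computed via the FFT (same idea as scipy.signal.hilbert)."""
    n = len(x)
    spectrum = np.fft.fft(x)
    h = np.zeros(n)
    h[0] = 1.0
    if n % 2 == 0:
        h[n // 2] = 1.0
        h[1:n // 2] = 2.0
    else:
        h[1:(n + 1) // 2] = 2.0
    return np.imag(np.fft.ifft(spectrum * h))

def haar_swt(x, levels=3):
    """Undecimated (stationary) Haar transform with periodic extension.
    Returns the final approximation and the detail coefficients d1..d_levels."""
    approx = np.asarray(x, dtype=float)
    details = []
    for j in range(levels):
        shifted = np.roll(approx, -(2 ** j))  # filter taps dilated by 2**j at level j+1
        details.append((approx - shifted) / np.sqrt(2.0))
        approx = (approx + shifted) / np.sqrt(2.0)
    return approx, details

def feature_sequence_map(ppg, fs_hz):
    """Stack eight per-sample channels into an (8, N) array."""
    ppg = np.asarray(ppg, dtype=float)
    d1x = np.gradient(ppg, 1.0 / fs_hz)  # first derivative
    d2x = np.gradient(d1x, 1.0 / fs_hz)  # second derivative
    fft_amp = np.abs(np.fft.fft(ppg))    # full-length FFT amplitude
    hil = hilbert_transform(ppg)
    a3, (c1, c2, c3) = haar_swt(ppg, levels=3)
    return np.stack([d1x, d2x, fft_amp, hil, a3, c1, c2, c3])

fs = 125  # assumed sampling rate (Hz)
t = np.arange(0, 5 * fs) / fs  # one 5 s segment
ppg = np.sin(2 * np.pi * 1.2 * t) + 0.3 * np.sin(2 * np.pi * 2.4 * t)
fsm = feature_sequence_map(ppg, fs)
print(fsm.shape)  # (8, 625)
```

Because the stationary transform is undecimated, every channel keeps the segment's full length. This is what allows the eight features to be stacked as a sample-aligned multi-channel map alongside the raw waveform.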

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
