arxiv: 2605.18802 · v1 · pith:2QTFCP6Mnew · submitted 2026-05-11 · eess.SP · cs.AI · cs.LG

A Nonlinear Complexity Index for Wearable PPG Cardiovascular Stability: Multiscale Validation, Systematic Evaluation Correction, and Bayesian Parameter Optimization

Pith reviewed 2026-05-20 22:10 UTC · model grok-4.3

classification eess.SP · cs.AI · cs.LG
keywords PPG · cardiovascular stability · nonlinear index · wearable monitoring · evaluation leakage · Bayesian optimization · tachypnea screening · ICU monitoring

The pith

Correcting three evaluation artifacts and then applying Bayesian optimization produces a generalizable nonlinear index for PPG cardiovascular stability with a held-out pooled AUC of 0.757.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces the Stability-Constrained Cardiovascular Stability Index (SCSI), a nonlinear measure for estimating cardiovascular stability from wearable photoplethysmography (PPG) signals. It identifies and corrects three evaluation artifacts that had inflated the heuristic index's reported AUC from a true baseline of 0.573 to 0.752. After these corrections and Bayesian tuning of 15 parameters, the index achieves a cross-validation AUC of 0.720 and a held-out pooled AUC of 0.757, with high negative predictive value for tachypnea. A sympathetic reader would care because a wearable index with honestly reported performance could improve clinical screening in settings such as ICUs and surgery, where inflated claims would mislead. The work validates the index across multiple datasets and temporal scales and proposes a sparse deployable version.

Core claim

The paper claims that after fixing segment-level cross-validation leakage, test-set normalization leakage, and pooled-AUC overweighting, and then applying Bayesian optimization, the SCSI achieves a cross-validation AUC of 0.720; on 18 held-out records it reaches a pooled AUC of 0.757 and NPV of 0.966 for tachypnea screening, with external validation AUC of 0.621 and the nonlinear complexity module identified as dominant through ablation.
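
The first of these corrections, segment-level cross-validation leakage, can be illustrated with a minimal sketch. It assumes each segment carries a record identifier and assigns folds per record, so that no record contributes segments to both training and test. The helper names (`record_level_folds`, `leaks`) and the toy data are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def record_level_folds(record_ids, n_folds=5):
    """Assign segments to folds so all segments of a record share one fold."""
    unique_records = sorted(set(record_ids))
    fold_of_record = {rec: i % n_folds for i, rec in enumerate(unique_records)}
    folds = defaultdict(list)
    for seg_index, rec in enumerate(record_ids):
        folds[fold_of_record[rec]].append(seg_index)
    return [folds[k] for k in range(n_folds)]


def leaks(record_ids, train_idx, test_idx):
    """True if any record has segments in both the train and the test split."""
    train_records = {record_ids[i] for i in train_idx}
    test_records = {record_ids[i] for i in test_idx}
    return bool(train_records & test_records)


# Hypothetical toy data: 6 records with 4 segments each.
record_ids = [f"rec{r}" for r in range(6) for _ in range(4)]

# Record-level folds: no record appears on both sides of any split.
folds = record_level_folds(record_ids, n_folds=3)
for k, test_idx in enumerate(folds):
    train_idx = [i for j, fold in enumerate(folds) if j != k for i in fold]
    assert not leaks(record_ids, train_idx, test_idx)

# Segment-level split for contrast: every third segment goes to test,
# so the same record ends up in both train and test.
seg_test = list(range(0, len(record_ids), 3))
seg_train = [i for i in range(len(record_ids)) if i % 3 != 0]
print(leaks(record_ids, seg_train, seg_test))  # True
```

Grouping by record is the usual remedy when neighbouring segments of one recording are strongly correlated; whether the paper groups by record, by patient, or by dataset is not stated in the text above.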

What carries the argument

The Stability-Constrained Cardiovascular Stability Index (SCSI) grounded in Cardiac Stability Theory, which uses a nonlinear complexity module optimized over joint parameters to compute stability from PPG segments.

If this is right

  • Cross-dataset Kruskal-Wallis test shows large effect size η² = 0.351 with p < 0.001.
  • Strong cross-scale consistency with kappa > 0.97 across three temporal scales.
  • Significant Spearman correlation r = 0.346 with respiratory rate across 53 ICU records.
  • Per-record AUC of 0.497 ± 0.207 on held-out data discloses patient-level variability.
  • External validation on 42 elective-surgery records yields AUC of 0.621 confirming cross-population generalization.
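
The gap between a pooled AUC and a per-record AUC, as in the 0.757 versus 0.497 figures listed above, can arise when scores separate records from one another but carry no information within a record. The sketch below shows this on hypothetical toy data, using a rank-based AUC written out by hand; the records, scores, and labels are invented for illustration only.

```python
def auc(scores, labels):
    """Rank-based AUC (Mann-Whitney form); tied pairs count as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        return None  # AUC is undefined when a class is missing
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical records: record A scores low and is mostly negative,
# record B scores high and is mostly positive.
records = {
    "A": ([0.1, 0.2, 0.3, 0.4, 0.5], [0, 0, 1, 0, 0]),
    "B": ([0.6, 0.7, 0.8, 0.9, 1.0], [1, 1, 0, 1, 1]),
}

per_record = {name: auc(s, y) for name, (s, y) in records.items()}
mean_per_record = sum(per_record.values()) / len(per_record)

pooled_scores = [v for s, _ in records.values() for v in s]
pooled_labels = [v for _, y in records.values() for v in y]
pooled = auc(pooled_scores, pooled_labels)

print(pooled, mean_per_record)  # 0.8 0.5
```

Here the pooled value of 0.8 comes entirely from the offset between the two records; within each record the score ranks positives no better than chance.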

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • If validated further, SCSI could support real-time alerts in consumer fitness devices for early detection of instability.
  • The proposed correction protocol for evaluation artifacts could standardize reporting in other biomedical signal classification tasks.
  • Testing the sparse three-component version in ambulatory settings would check if ICU-derived parameters hold outside controlled environments.
  • Per-patient performance variability suggests that adaptive or personalized tuning might further improve the index.

Load-bearing premise

That correcting the three evaluation artifacts and performing Bayesian optimization on the given datasets produces a generalizable index free from new selection biases or overfitting.

What would settle it

Observing a pooled AUC below 0.65 or failure to maintain high NPV on a new independent set of PPG records from a different population would falsify the generalizability of the corrected SCSI.

Figures

Figures reproduced from arXiv: 2605.18802 by Farouk Ganiyu Adewumi, Timothy Oladunni.

Figure 1: Artifact Cascade: From Reported to True Performance. Orange: heuristic CSI reported (0.752, both Artifacts 1 and 2 present). Red bars: AUC after removing Artifact 1 (segment-level CV, −0.062) then Artifact 2 (normalization leakage, −0.308); true unbiased CSI = 0.573. Blue: optimized SCSI pooled AUC (0.757); teal: per-record AUC (0.497). Dotted line: chance (= 0.5).

AUC 0.621 [0.585, 0.658] (p < 0.0001 vs. …

Figure 2: Artifact 3: Pooled AUC Masks Per-Patient Failure. Left: CNN pooled AUC (0.804) collapses to per-record mean 0.380 (−0.423); SCSI shows a smaller gap (0.757 → 0.497). Error bars: 95% CI (pooled) or ±1σ (per-record). Dashed line: unbiased baseline (0.573). Right: Per-record strip plot; teal = SCSI, orange = CNN; SCSI outperforms CNN on 7 of 9 records.

TABLE VIII: PERFORMANCE COMPARISON UNDER PROGRESSIVE ARTIF…

Figure 5: Component Ablation Waterfall. Red: critical (|∆AUC| > 10%); orange: moderate; teal: noise (removal improves AUC). CNL (−41.3) and SampEn (−27.6) are the only critical components; five components are noise.

TABLE IX: ABLATION STUDY: COMPONENT CONTRIBUTION TO SCSI

  Component removed        AUC     ∆AUC
  Full SCSI (optimised)    0.720   n/a
  Critical
    Without CNL            0.307   −0.413
    Without SampEn         0.444   −0.276
  Moderate
    Without autonomic…

Figure 4: External Validation on CapnoBase. Left: ROC curves for BIDMC test (blue, 0.757) and CapnoBase (teal, 0.621); both p < 0.0001. Right: AUC comparison with 95% CI; dashed line: unbiased baseline (0.573).

D. Why the Optimal Parameters Depart from Convention
The heuristic m = 2, τ = 1, r = 0.2σ were derived for short RR-interval series [16] and are not validated for PPG at 30 Hz. PPG encodes respiratory, vasomo…
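
To make the roles of m, τ, and r in the excerpt above concrete, here is a minimal textbook-style sketch of sample entropy with the heuristic defaults m = 2, τ = 1, r = 0.2σ. The delay handling, the function name `sample_entropy`, and the test signals are assumptions for illustration; this is not the paper's CNL module and does not use its optimized parameters.

```python
import math
import random
import statistics


def sample_entropy(signal, m=2, tau=1, r_factor=0.2):
    """Sample entropy with embedding dimension m, delay tau, tolerance r_factor * std."""
    n = len(signal)
    tol = r_factor * statistics.pstdev(signal)
    # Use the same number of templates for lengths m and m + 1.
    n_templates = n - m * tau

    def count_matches(length):
        templates = [
            [signal[i + k * tau] for k in range(length)]
            for i in range(n_templates)
        ]
        matches = 0
        for i in range(n_templates):
            for j in range(i + 1, n_templates):  # self-matches are excluded
                dist = max(abs(a - b) for a, b in zip(templates[i], templates[j]))
                if dist <= tol:
                    matches += 1
        return matches

    b = count_matches(m)      # template pairs that match at length m
    a = count_matches(m + 1)  # pairs that still match at length m + 1
    if a == 0 or b == 0:
        return float("inf")   # no matches: entropy is undefined, reported as inf
    return -math.log(a / b)


# Hypothetical test signals: a sine wave (regular) and Gaussian noise (irregular).
rng = random.Random(0)
regular = [math.sin(2 * math.pi * i / 20) for i in range(200)]
noisy = [rng.gauss(0.0, 1.0) for _ in range(200)]

print(sample_entropy(regular) < sample_entropy(noisy))  # True
```

Changing `m`, `tau`, or `r_factor` changes which template pairs count as matches, which is why values tuned for short RR-interval series need not transfer to a densely sampled waveform.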
Original abstract

Cardiovascular stability estimation from wearable photoplethysmography (PPG) requires a principled nonlinear framework, yet major gaps persist in heuristic parameter selection and evaluation protocols that inflate reported performance. We introduce a Stability-Constrained Cardiovascular Stability Index (SCSI) grounded in Cardiac Stability Theory and validate it across 176,742 segments from four heterogeneous PPG datasets at three temporal scales. Cross-dataset analysis demonstrates a large Kruskal-Wallis effect size (eta2 = 0.351, p < 0.001), strong cross-scale consistency (kappa > 0.97), and significant correlation with respiratory rate across 53 ICU records (Spearman r = 0.346, p = 0.011). We identify three evaluation artifacts that inflate heuristic AUC from a true baseline of 0.573 to 0.752: segment-level cross-validation leakage, test-set normalization leakage, and pooled-AUC overweighting that conceals per-patient failure. Correcting these artifacts and applying Bayesian optimization over 15 joint parameters yields SCSI with cross-validation AUC of 0.720. On 18 held-out records, SCSI achieves pooled AUC of 0.757 (95% CI: 0.686-0.828) and negative predictive value of 0.966 for tachypnea screening, while per-record AUC of 0.497 +/- 0.207 is disclosed for transparency. External validation on 42 elective-surgery records yields AUC of 0.621, confirming cross-population generalization. Ablation analysis identifies the nonlinear complexity module as the dominant component. A sparse three-component architecture is proposed as the minimal deployable configuration. The corrected protocol provides a reproducible benchmark for future wearable cardiovascular stability indices.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The manuscript introduces the Stability-Constrained Cardiovascular Stability Index (SCSI) grounded in Cardiac Stability Theory for wearable PPG cardiovascular stability estimation. It validates the index across 176,742 segments from four heterogeneous datasets at multiple scales, identifies three evaluation artifacts (segment-level cross-validation leakage, test-set normalization leakage, and pooled-AUC overweighting) that inflate heuristic AUC from 0.573 to 0.752, corrects them, and applies Bayesian optimization over 15 joint parameters to report cross-validation AUC of 0.720, held-out pooled AUC of 0.757 (95% CI 0.686-0.828) with NPV 0.966 for tachypnea screening, per-record AUC of 0.497 ± 0.207, and external validation AUC of 0.621 on 42 elective-surgery records, claiming cross-population generalization and providing a reproducible benchmark.

Significance. If the central claims hold after addressing evaluation and generalization concerns, the work could establish a corrected protocol for future PPG stability indices and highlight the value of nonlinear complexity modules, with strengths in transparency via per-record metrics and external validation. The large effect size, cross-scale kappa > 0.97, and ablation results add value if reproducible. However, the modest external performance and near-chance per-record AUC limit immediate clinical impact for intra-patient stability detection.

major comments (3)
  1. [Abstract] The disclosure of a per-record AUC of 0.497 ± 0.207 (near chance) on the 18 held-out records undermines the central claim that SCSI detects cardiovascular stability, as the pooled AUC of 0.757 may primarily reflect inter-patient differences rather than intra-patient changes; this requires explicit discussion of within-record performance and whether the index meets its stability-detection objective.
  2. [Abstract] Bayesian optimization over 15 joint parameters on segments from the 53 ICU records (with 18 held out) creates a material risk of overfitting to the training distribution, as evidenced by the 0.136 drop to AUC 0.621 on the separate 42 elective-surgery records and by the per-record variability; the manuscript should detail the exact cross-validation procedure during optimization and test for residual selection bias.
  3. [Abstract] The claim that external validation is 'confirming cross-population generalization' is overstated given the performance gap and near-chance per-record results; a more proportionate interpretation of generalizability, including potential ICU-specific signal characteristics, is needed to support the central claim.
minor comments (2)
  1. [Abstract] The abstract reports eta2 = 0.351 for the Kruskal-Wallis test but does not specify the conventional interpretation thresholds for effect size in this context.
  2. Clarify the exact implementation steps for correcting the three identified artifacts (e.g., how segment-level leakage was prevented) to ensure reproducibility.
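
For the normalization artifact, one concrete form such a description could take is sketched here. It assumes z-score normalization, which the abstract does not confirm: statistics are fitted on training values only and then applied unchanged to the test values. The function names and numbers are hypothetical.

```python
import statistics


def fit_zscore(train_values):
    """Estimate normalization statistics from training data only."""
    mean = statistics.fmean(train_values)
    std = statistics.pstdev(train_values) or 1.0  # guard against zero variance
    return mean, std


def apply_zscore(values, mean, std):
    """Apply previously fitted statistics without re-estimating them."""
    return [(v - mean) / std for v in values]


# Hypothetical feature values for a training split and a held-out split.
train = [1.0, 2.0, 3.0, 4.0]
test = [10.0, 12.0]

# Leak-free: the held-out values never influence mean or std.
mean, std = fit_zscore(train)
train_z = apply_zscore(train, mean, std)
test_z = apply_zscore(test, mean, std)

# Leaky variant for contrast: statistics computed over train and test together.
leaky_mean, leaky_std = fit_zscore(train + test)

print(mean, leaky_mean)  # the leaky mean is pulled toward the test values
```

In the leaky variant the held-out data shift the statistics used to scale every sample, so test information reaches the model before evaluation.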

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their detailed and constructive comments on our manuscript. These observations highlight important aspects of evaluation and interpretation that we will address in the revision. Below we provide point-by-point responses to the major comments.

  1. Referee: [Abstract] The disclosure of a per-record AUC of 0.497 ± 0.207 (near chance) on the 18 held-out records undermines the central claim that SCSI detects cardiovascular stability, as the pooled AUC of 0.757 may primarily reflect inter-patient differences rather than intra-patient changes; this requires explicit discussion of within-record performance and whether the index meets its stability-detection objective.

    Authors: We agree that the near-chance per-record AUC suggests that SCSI's pooled performance is driven primarily by inter-patient differences rather than by detection of intra-patient changes in cardiovascular stability. This is a key limitation for real-time stability monitoring. In the revised version, we will add explicit discussion in the abstract and a dedicated subsection in the results or discussion to address within-record performance, clarify the distinction between inter- and intra-patient detection, and temper the claim that the index meets its stability-detection objective. We will also explore potential reasons for the per-record variability, such as short record lengths or signal quality issues. revision: yes

  2. Referee: [Abstract] Bayesian optimization over 15 joint parameters on segments from the 53 ICU records (with 18 held out) creates a material risk of overfitting to the training distribution, as evidenced by the 0.136 drop to AUC 0.621 on the separate 42 elective-surgery records and by the per-record variability; the manuscript should detail the exact cross-validation procedure during optimization and test for residual selection bias.

    Authors: We recognize the risk of overfitting when 15 parameters are optimized jointly. The search used Bayesian optimization with 5-fold cross-validation strictly on the training segments from the 35 ICU records (53 total minus 18 held out), using the Optuna framework with 100 trials to maximize the mean AUC across folds. To assess and mitigate selection bias, we will include in the methods section a full description of the procedure, including the hyperparameter search space, acquisition function, and convergence criteria. Furthermore, we will add a sensitivity analysis comparing default and optimized parameters, and we will discuss the external validation drop as indicative of both potential mild overfitting and domain differences between ICU and elective-surgery populations. revision: yes
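
    The shape of the procedure described in this response (propose parameters, score them by mean AUC across folds, keep the best of 100 trials) can be sketched as follows. Plain random search stands in for Optuna's Bayesian sampler so that the example needs only the standard library. The single blend weight `w`, the toy folds, and all function names are hypothetical; the real search covers 15 parameters.

```python
import random


def auc(scores, labels):
    """Rank-based AUC; tied pairs count as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def make_toy_folds(n_folds=5, n_segments=40, seed=1):
    """Hypothetical data: each fold stands for the segments of one group of records."""
    rng = random.Random(seed)
    folds = []
    for _ in range(n_folds):
        fold = []
        for i in range(n_segments):
            label = i % 2  # keeps both classes present in every fold
            informative = label + rng.gauss(0.0, 1.0)
            noise = rng.gauss(0.0, 1.0)
            fold.append((informative, noise, label))
        folds.append(fold)
    return folds


def mean_fold_auc(params, folds):
    """Objective: mean AUC across folds for one parameter setting."""
    w = params["w"]
    fold_aucs = []
    for fold in folds:
        scores = [w * a + (1.0 - w) * b for a, b, _ in fold]
        labels = [y for _, _, y in fold]
        fold_aucs.append(auc(scores, labels))
    return sum(fold_aucs) / len(fold_aucs)


def random_search(folds, n_trials=100, seed=0):
    """Stand-in for a Bayesian sampler: draw w uniformly and keep the best trial."""
    rng = random.Random(seed)
    best_params, best_value = None, float("-inf")
    for _ in range(n_trials):
        params = {"w": rng.uniform(0.0, 1.0)}
        value = mean_fold_auc(params, folds)
        if value > best_value:
            best_params, best_value = params, value
    return best_params, best_value


folds = make_toy_folds()
best_params, best_value = random_search(folds)
print(best_params, round(best_value, 3))
```

    Because the toy index has no fitted component beyond `w`, each fold is only scored, not trained on. The best value found this way is itself selected on these folds, which is why a separate held-out set is still needed to estimate performance.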

  3. Referee: [Abstract] The claim that external validation is 'confirming cross-population generalization' is overstated given the performance gap and near-chance per-record results; a more proportionate interpretation of generalizability, including potential ICU-specific signal characteristics, is needed to support the central claim.

    Authors: We accept that the original phrasing overstates the generalizability. The drop in AUC and the per-record results indicate that while there is some transfer, it is not robust across populations. In the revision, we will change the abstract to state that external validation 'provides initial evidence of cross-population applicability' rather than 'confirming cross-population generalization'. We will expand the discussion to include potential ICU-specific signal characteristics (e.g., higher prevalence of artifacts, different demographics, or medication effects) that may limit generalization, and propose future work on domain adaptation to improve transfer performance. revision: yes

Circularity Check

0 steps flagged

No significant circularity in derivation chain

The paper identifies three evaluation artifacts (segment-level CV leakage, test-set normalization leakage, pooled-AUC overweighting), corrects them, and then applies Bayesian optimization over 15 parameters to produce the SCSI index before reporting AUC on 18 held-out records from the 53-record ICU set and on a separate external set of 42 elective-surgery records. No quoted equation or step reduces the final performance metric to the optimization inputs by construction; the optimization is presented as parameter tuning rather than a first-principles derivation. Held-out and external validation sets are explicitly separated, the near-chance per-record AUC is disclosed, and the central claims do not rest on a load-bearing self-citation, a uniqueness theorem, or a smuggled ansatz. The derivation therefore remains self-contained against the external benchmarks provided.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 1 invented entity

The central claim rests on Cardiac Stability Theory as grounding, the validity of the three artifact corrections as the main inflation sources, and the assumption that 15-parameter Bayesian optimization yields a stable minimum without excessive overfitting. No new physical entities are postulated.

free parameters (1)
  • 15 joint parameters
    Optimized via Bayesian methods over the nonlinear complexity module and stability constraints; fitted to maximize AUC on the training segments.
axioms (1)
  • domain assumption: Cardiac Stability Theory provides a valid principled nonlinear framework for PPG-based stability estimation.
    Invoked in the abstract as the grounding for SCSI; no derivation or external validation of the theory is provided in the abstract.
invented entities (1)
  • Stability-Constrained Cardiovascular Stability Index (SCSI) · no independent evidence
    purpose: Nonlinear complexity index for wearable PPG cardiovascular stability estimation.
    New index constructed from complexity measures and stability constraints; no independent evidence outside the reported validations.

Reference graph

Works this paper leans on

  [1] G. S. Collins, J. B. Reitsma, D. G. Altman, and K. G. M. Moons, “TRIPOD: a reporting guideline for clinical prediction models,” Annals of Internal Medicine, vol. 162, pp. 55–63, 2015
  [2] A. L. Goldberger, L. A. N. Amaral, J. M. Hausdorff, P. C. Ivanov, C.-K. Peng, and H. E. Stanley, “Fractal dynamics in physiology: alterations with disease and aging,” Proceedings of the National Academy of Sciences, vol. 99, pp. 2466–2472, 2002
  [3] M. Costa, A. L. Goldberger, and C.-K. Peng, “Multiscale entropy analysis of complex physiologic time series,” Physical Review Letters, vol. 89, p. 068102, 2002
  [4] C.-K. Peng, S. Havlin, H. E. Stanley, and A. L. Goldberger, “Quantification of scaling exponents and crossover phenomena in nonstationary heartbeat time series,” Chaos, vol. 5, pp. 82–87, 1995
  [5] Task Force of the European Society of Cardiology and the North American Society of Pacing and Electrophysiology, “Heart rate variability: standards of measurement, physiological interpretation and clinical use,” Circulation, vol. 93, pp. 1043–1065, 1996
  [6] F. Shaffer and J. P. Ginsberg, “An overview of heart rate variability metrics and norms,” Frontiers in Public Health, vol. 5, p. 258, 2017
  [7] J. Allen, “Photoplethysmography and its application in clinical physiological measurement,” Physiological Measurement, vol. 28, pp. R1–R39, 2007
  [8] M. Elgendi, “On the analysis of fingertip photoplethysmogram signals,” Current Cardiology Reviews, vol. 8, pp. 14–25, 2012
  [9] G. D. Clifford, F. Azuaje, and P. E. McSharry, “ECG statistics, noise, artifacts, and missing data,” in Advanced Methods and Tools for ECG Data Analysis. Artech House, 2007, pp. 55–99
  [10] M. T. Rosenstein, J. J. Collins, and C. J. De Luca, “A practical method for calculating largest Lyapunov exponents from small data sets,” Physica D: Nonlinear Phenomena, vol. 65, pp. 117–134, 1993
  [11] F. Takens, “Detecting strange attractors in turbulence,” in Dynamical Systems and Turbulence. Springer, 1981, pp. 366–381
  [12] A. Y. Hannun, P. Rajpurkar, M. Haghpanahi, G. H. Tison, C. Bourn, M. P. Turakhia, and A. Y. Ng, “Cardiologist-level arrhythmia detection and classification in ambulatory electrocardiograms using a deep neural network,” Nature Medicine, vol. 25, pp. 65–69, 2019
  [13] N. Strödthoff, P. Wagner, T. Schaeffter, and W. Samek, “Deep learning for ECG analysis: benchmarks and insights from PTB-XL,” IEEE Journal of Biomedical and Health Informatics, vol. 25, pp. 1519–1528, 2021
  [14] T. Oladunni and F. G. Adewumi, “Cardiac stability theory: An axiomatically grounded framework for continuous cardiac health monitoring via smartphone photoplethysmography,” 2026. [Online]. Available: https://arxiv.org/abs/2604.23876
  [15] T. Oladunni and E. Aneni, “Explainable deep neural network for multimodal ECG signals: Intermediate versus late fusion,” IEEE Access, vol. 13, pp. 202700–202736, 2025
  [16] J. S. Richman and J. R. Moorman, “Physiological time-series analysis using approximate entropy and sample entropy,” American Journal of Physiology—Heart and Circulatory Physiology, vol. 278, pp. H2039–H2049, 2000
  [17] S. M. Pincus, “Approximate entropy as a measure of system complexity,” Proceedings of the National Academy of Sciences, vol. 88, pp. 2297–2301, 1991
  [18] D. E. Lake, J. S. Richman, M. P. Griffin, and J. R. Moorman, “Sample entropy analysis of neonatal heart rate variability,” American Journal of Physiology—Regulatory, Integrative and Comparative Physiology, vol. 283, pp. R789–R797, 2002
  [19] A. Delgado-Bonal and A. Marshak, “Approximate entropy and sample entropy: a comprehensive tutorial,” Entropy, vol. 21, p. 541, 2019
  [20] T. Higuchi, “Approach to an irregular time series on the basis of the fractal theory,” Physica D: Nonlinear Phenomena, vol. 31, pp. 277–283, 1988
  [21] H. Kantz, “A robust method to estimate the maximal Lyapunov exponent of a time series,” Physics Letters A, vol. 185, pp. 77–87, 1994
  [22] M. A. F. Pimentel, A. E. W. Johnson, P. H. Charlton, D. Birrenkott, G. D. Clifford, L. Tarassenko, and D. A. Clifton, “Toward a robust estimation of respiratory rate from pulse oximeters,” IEEE Transactions on Biomedical Engineering, vol. 64, pp. 1914–1923, 2017
  [23] P. Rajpurkar, J. Irvin, R. L. Ball et al., “Deep learning for chest radiographs,” PLOS Medicine, vol. 15, 2018
  [24] S. Watanabe, “Tree-structured Parzen estimator: understanding its algorithm components and their roles for better empirical performance,” 2023
  [25] J. P. Zbilut and C. L. Webber, “Embeddings and delays as derived from quantification of recurrence plots,” Physics Letters A, vol. 171, no. 3–4, pp. 199–203, 1992
  [26] A. L. Goldberger, L. A. N. Amaral, L. Glass, J. M. Hausdorff, P. C. Ivanov, R. G. Mark, J. E. Mietus, G. B. Moody, C.-K. Peng, and H. E. Stanley, “PhysioBank, PhysioToolkit, and PhysioNet: components of a new research resource for complex physiologic signals,” Circulation, vol. 101, pp. e215–e220, 2000
  [27] D. R. Goldhill and A. F. McNarry, “Physiological abnormalities in early warning scores are related to mortality in adult inpatients,” British Journal of Anaesthesia, vol. 92, pp. 882–884, 2005
  [28] W. Karlen, M. Turner, E. Cooke, G. Dumont, and J. M. Ansermino, “CapnoBase: signal database and tools to collect, share and annotate respiratory signals,” in Proceedings of the Annual Meeting of the Society for Technology in Anesthesia, 2010, IEEE TBME Respiratory Rate Benchmark. Available: https://borealisdata.ca/dataverse/capnobase
  [29] E. R. DeLong, D. M. DeLong, and D. L. Clarke-Pearson, “Comparing the areas under two or more correlated receiver operating characteristic curves: a nonparametric approach,” Biometrics, vol. 44, pp. 837–845, 1988
  [30] S. Sendelbach and M. Funk, “Alarm fatigue: a patient safety concern,” AACN Advanced Critical Care, vol. 24, pp. 378–386, 2013