KindSleep: Knowledge-Informed Diagnosis of Obstructive Sleep Apnea from Oximetry

Benjamin M Smith; Chad A Purnell; Cheng Wan; J. Ben Tamo; May D Wang; Micky C Nnamdi; Wenqi Shi

arxiv: 2603.04755 · v2 · submitted 2026-03-05 · 💻 cs.LG


Micky C Nnamdi, Wenqi Shi, Cheng Wan, J. Ben Tamo, Benjamin M Smith, Chad A Purnell, May D Wang

Pith reviewed 2026-05-15 16:30 UTC · model grok-4.3


keywords: obstructive sleep apnea · oximetry · AHI estimation · deep learning · clinical concept learning · interpretable diagnosis · sleep medicine


The pith

KindSleep learns clinically meaningful concepts from single-channel oximetry to estimate AHI and classify OSA severity with high accuracy.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces KindSleep as a framework that first extracts interpretable clinical concepts such as desaturation indices and respiratory disturbance events from raw oximetry signals, then fuses these with additional clinical data to predict the Apnea-Hypopnea Index. This knowledge-informed approach seeks to replace resource-heavy polysomnography with a simpler, more accessible diagnostic method for a disorder affecting nearly a billion people and raising cardiovascular risks. Evaluation on three large independent datasets totaling over 9,800 subjects shows strong correlation to reference AHI values and superior severity classification compared to prior methods. By anchoring predictions in explicit clinical concepts rather than opaque features, the system aims to increase transparency and trust in automated sleep medicine tools.

Core claim

KindSleep first learns to identify clinically interpretable concepts, such as desaturation indices and respiratory disturbance events, directly from raw oximetry signals. It then fuses these AI-derived concepts with multimodal clinical data to estimate the Apnea-Hypopnea Index, achieving an R² of 0.917 and ICC of 0.957 while delivering weighted F1-scores between 0.827 and 0.941 for OSA severity classification across diverse populations.
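To make the severity metric concrete, the sketch below bins AHI values into severity classes and computes a support-weighted F1 score. The cut-offs (5, 15, 30 events/hour) are the conventional adult thresholds and are an assumption here, since this page does not state which ones the paper uses. The function names and all AHI values other than the two reference values quoted in Figure 1 are hypothetical.

```python
from collections import Counter

def severity(ahi: float) -> str:
    """Map an AHI value to a severity class (assumed conventional cut-offs)."""
    if ahi < 5:
        return "normal"
    if ahi < 15:
        return "mild"
    if ahi < 30:
        return "moderate"
    return "severe"

def weighted_f1(y_true, y_pred):
    """Per-class F1, averaged with weights proportional to true-class support."""
    support = Counter(y_true)
    total = len(y_true)
    score = 0.0
    for label, n_label in support.items():
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = n_label - tp
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        score += (n_label / total) * f1
    return score

# First two reference values come from Figure 1; everything else is made up.
reference_ahi = [0.175, 5.65, 12.0, 22.0, 41.0]
estimated_ahi = [1.0, 4.2, 13.5, 24.0, 38.0]
y_true = [severity(a) for a in reference_ahi]
y_pred = [severity(a) for a in estimated_ahi]
print(weighted_f1(y_true, y_pred))
```

In this toy example the only misclassification is the subject near the normal/mild boundary, which illustrates why small AHI errors close to a cut-off can move the F1 score even when regression agreement is high.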

What carries the argument

An intermediate layer that extracts desaturation indices and respiratory disturbance events from raw oximetry before fusing them with clinical data to predict AHI.
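The two-stage data flow can be sketched in a few lines. This is not the authors' implementation: stage 1 here is a simple rule-based desaturation counter standing in for the learned annotation model, stage 2 is a linear fusion with made-up weights standing in for the regression model, and every name and number is hypothetical.

```python
def desaturation_index(spo2, hours, drop=3.0):
    """Stage 1 stand-in: desaturation episodes per hour of recording.

    An episode starts when SpO2 falls at least `drop` points below the most
    recent peak and ends once the signal recovers to within 1 point of it.
    KindSleep learns this stage; the rule is only an illustrative proxy.
    """
    events, peak, in_event = 0, spo2[0], False
    for value in spo2[1:]:
        if in_event:
            if value >= peak - 1.0:      # recovered close to baseline
                in_event = False
                peak = value
        else:
            peak = max(peak, value)
            if peak - value >= drop:     # new desaturation episode
                events += 1
                in_event = True
    return events / hours

def estimate_ahi(concepts, clinical, weights, bias=0.0):
    """Stage 2 stand-in: fuse named concepts with clinical covariates."""
    features = {**concepts, **clinical}
    raw = bias + sum(weights[name] * value for name, value in features.items())
    return max(raw, 0.0)                 # AHI cannot be negative

spo2_trace = [97, 97, 93, 92, 96, 97, 97, 94, 97]   # toy samples, not real data
concepts = {"odi": desaturation_index(spo2_trace, hours=1.0)}
clinical = {"bmi": 31.0}                             # hypothetical covariate
weights = {"odi": 1.1, "bmi": 0.05}                  # hypothetical weights
print(concepts, estimate_ahi(concepts, clinical, weights))
```

The point of the structure is that the intermediate dictionary of concepts is inspectable: a clinician can look at the estimated desaturation index before it is turned into an AHI estimate.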

If this is right

OSA diagnosis becomes feasible with a single wearable sensor rather than full overnight polysomnography.
Predictions carry explicit clinical concepts that clinicians can inspect for validation.
Severity classification remains reliable across varied demographic groups in the tested datasets.
The method reduces reliance on specialized sleep laboratories for initial screening.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

Integration with consumer-grade pulse oximeters could enable population-level screening programs.
The same concept-learning structure might transfer to other physiological signal tasks where clinical interpretability matters.
If the learned concepts prove robust, regulatory approval pathways for AI diagnostics could become simpler due to built-in transparency.
Real-time deployment on home devices would allow longitudinal tracking of AHI changes over weeks rather than single-night snapshots.

Load-bearing premise

The intermediate concepts extracted from oximetry actually correspond to real clinical events instead of mere statistical patterns that may not hold outside the training data.

What would settle it

A new independent dataset from a different population or oximetry device on which the model's AHI estimates show substantially lower correlation with polysomnography ground truth.
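A check of this kind could be run with the two agreement metrics the paper reports. The sketch below computes the coefficient of determination and a two-way random-effects, absolute-agreement, single-measure intraclass correlation (often written ICC(2,1)). Which ICC variant and which definition of R² the paper uses is not stated on this page, so both choices are assumptions, and the AHI values are invented.

```python
def r_squared(reference, estimate):
    """Coefficient of determination of the estimates against the reference."""
    mean_ref = sum(reference) / len(reference)
    ss_res = sum((r - e) ** 2 for r, e in zip(reference, estimate))
    ss_tot = sum((r - mean_ref) ** 2 for r in reference)
    return 1.0 - ss_res / ss_tot

def icc_2_1(reference, estimate):
    """ICC(2,1) treating the reference and the model as two raters."""
    n, k = len(reference), 2
    rows = list(zip(reference, estimate))
    grand = (sum(reference) + sum(estimate)) / (n * k)
    row_means = [sum(row) / k for row in rows]
    col_means = [sum(reference) / n, sum(estimate) / n]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for row in rows for v in row)
    ss_err = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n)

psg_ahi = [2.0, 7.5, 18.0, 33.0, 51.0]     # hypothetical polysomnography AHI
model_ahi = [3.1, 6.0, 21.5, 29.0, 47.0]   # hypothetical model estimates
print(r_squared(psg_ahi, model_ahi), icc_2_1(psg_ahi, model_ahi))
```

Because this ICC variant penalizes systematic offsets, a model that is consistently biased on a new oximetry device would score lower here even if its estimates remained perfectly correlated with the reference.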

Figures

Figures reproduced from arXiv: 2603.04755 by Benjamin M Smith, Chad A Purnell, Cheng Wan, J. Ben Tamo, May D Wang, Micky C Nnamdi, Wenqi Shi.

**Figure 1.** Overview of KindSleep. KindSleep involved two main components: the sleep annotation model, which extracts clinically relevant metrics from raw oximetry signals, and the regression model, which integrates these metrics with processed clinical data to estimate the AHI. (Right) Example of oximetry signals from a mild OSA patient (top; reference AHI = 5.65) and a healthy control (bottom; reference AHI = 0.175)…

**Figure 2.** (a) Parity plots, (b) Bland–Altman plots, and (c) confusion matrix results for SHHS1, SHHS2, CFS and MrOS.

**Figure 3.** Outcome comparison across varying proportions of knowledge-informed metrics.

**Figure 4.** Radar charts comparing various performance metrics of our…

**Figure 5.** Attention mechanism employed by the SLAM model across oximetry signals, with events (e.g., desaturation, apnea, and artifacts) identified from ground truth annotations. The top section displays the global signal over the full duration (0–25,200 seconds), highlighting areas of high activation that correspond to physiologically relevant events, such as desaturation and apnea, while effectively ignoring art…

**Figure 7.** Relationship between the F1 scores, MAE, and RMSE as errors are intercepted. We observed that identifying and adjusting these errors before passing them to the regression model during training significantly improves the system architecture’s performance.

Original abstract

Obstructive sleep apnea (OSA) is a sleep disorder that affects nearly one billion people globally and significantly elevates cardiovascular risk. Traditional diagnosis through polysomnography is resource-intensive and limits widespread access, creating a critical need for accurate and efficient alternatives. In this paper, we introduce KindSleep, a deep learning framework that integrates clinical knowledge with single-channel patient-specific oximetry signals and clinical data for precise OSA diagnosis. KindSleep first learns to identify clinically interpretable concepts, such as desaturation indices and respiratory disturbance events, directly from raw oximetry signals. It then fuses these AI-derived concepts with multimodal clinical data to estimate the Apnea-Hypopnea Index (AHI). We evaluate KindSleep on three large, independent datasets from the National Sleep Research Resource (SHHS, CFS, MrOS; total n = 9,815). KindSleep demonstrates excellent performance in estimating AHI scores (R² = 0.917, ICC = 0.957) and consistently outperforms existing approaches in classifying OSA severity, achieving weighted F1-scores from 0.827 to 0.941 across diverse populations. By grounding its predictions in a layer of clinically meaningful concepts, KindSleep provides a more transparent and trustworthy diagnostic tool for sleep medicine practices.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

KindSleep's two-stage concept extraction from oximetry before AHI fusion is a clean design choice with strong reported numbers on three big cohorts, but the concepts lack direct clinical validation.


The main thing to know is that this paper puts a named clinical concept layer in front of the final AHI regression. It first trains on raw oximetry to pull out things like desaturation indices and respiratory disturbance events, then fuses those with tabular clinical variables. That explicit intermediate step is not the usual end-to-end setup in the OSA signal literature, and the numbers look solid: R² of 0.917 and ICC of 0.957 on held-out data from SHHS, CFS, and MrOS, plus weighted F1 scores between 0.827 and 0.941 for severity classification. Three independent cohorts totaling nearly ten thousand subjects is a real strength for any claim about broader applicability. The authors also show consistent outperformance over prior approaches on the same data splits.

The soft spot is exactly what the stress-test flagged. Nothing in the write-up shows that the learned concepts line up with expert-annotated event boundaries or durations; the performance could come from the model discovering useful statistical patterns without recovering true clinical events. The abstract also gives no architecture diagram, loss details, hyperparameter search, or imbalance handling, so it is hard to judge how much of the result is robust versus tuned.

This is the kind of paper that belongs in a reading group for people working on interpretable medical signal models. Clinicians or groups focused on low-cost OSA screening would find the scale and the design choice useful even if they want tighter validation of the concepts. It deserves peer review because the datasets are large and public, the central performance numbers are falsifiable, and the architecture idea is straightforward to test or extend.

Referee Report

3 major / 2 minor

Summary. The paper introduces KindSleep, a deep learning framework that first extracts clinically interpretable concepts (desaturation indices and respiratory disturbance events) from raw single-channel oximetry signals before fusing them with multimodal clinical data to regress the Apnea-Hypopnea Index (AHI) and classify OSA severity. It reports R² = 0.917 and ICC = 0.957 for AHI estimation together with weighted F1-scores of 0.827–0.941 on three independent NSRR datasets (SHHS, CFS, MrOS; total n = 9,815), claiming consistent outperformance of existing methods.

Significance. If the intermediate concept layer can be shown to recover clinically validated events rather than dataset-specific statistical correlates, KindSleep would constitute a transparent, single-channel alternative to polysomnography that could meaningfully expand diagnostic access. The multi-cohort evaluation already provides a stronger empirical foundation than most single-site oximetry studies.

major comments (3)

[Abstract / Methods] Abstract and Methods: the manuscript supplies no architecture diagram, loss-function definitions, hyperparameter search protocol, or handling of class imbalance and missing clinical covariates, so the reported R² = 0.917 and F1 scores cannot be independently verified as robust rather than the result of unstated tuning.
[Methods] Methods (concept-extraction module): no quantitative validation is presented that the learned desaturation indices or respiratory-disturbance events align with expert-annotated event boundaries or durations on any held-out set; without this alignment check the “knowledge-informed” claim reduces to an untested architectural choice.
[Results] Results: the performance tables compare against external baselines but contain no ablation that isolates the contribution of the intermediate concept layer versus an otherwise identical end-to-end oximetry regressor, leaving open whether the interpretability component is load-bearing or incidental.

minor comments (2)

[Introduction] The global prevalence figure in the introduction should be cited to a specific reference rather than stated without attribution.
[Figures] Figure captions should explicitly state the number of patients and the train/validation/test split sizes used for each dataset.
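The alignment check requested in the second major comment could take the following shape: match predicted event intervals against expert-annotated intervals and report precision and recall. The any-overlap matching rule is an assumption chosen for brevity (a stricter criterion such as a minimum overlap ratio is equally plausible), and the intervals are invented.

```python
def overlaps(a, b):
    """True when half-open intervals a=(start, end) and b=(start, end) intersect."""
    return a[0] < b[1] and b[0] < a[1]

def event_precision_recall(predicted, annotated):
    """Precision: share of predicted events that hit some annotated event.
    Recall: share of annotated events that are hit by some predicted event."""
    matched_pred = sum(any(overlaps(p, a) for a in annotated) for p in predicted)
    matched_ann = sum(any(overlaps(a, p) for p in predicted) for a in annotated)
    precision = matched_pred / len(predicted) if predicted else 0.0
    recall = matched_ann / len(annotated) if annotated else 0.0
    return precision, recall

# Hypothetical event intervals in seconds from the start of the recording.
predicted = [(10, 25), (40, 55), (90, 100)]
annotated = [(12, 30), (42, 50), (70, 80), (120, 140)]
precision, recall = event_precision_recall(predicted, annotated)
print(precision, recall)
```

Reporting both numbers matters for the interpretability claim: high AHI accuracy with low event-level recall would suggest the concept layer is tracking a correlate of the events rather than the events themselves.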

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the thoughtful and constructive comments. These points identify important gaps in reproducibility, validation of interpretability, and empirical support for the concept layer. We address each below and commit to revisions that strengthen the manuscript without overstating current results.


Referee: [Abstract / Methods] Abstract and Methods: the manuscript supplies no architecture diagram, loss-function definitions, hyperparameter search protocol, or handling of class imbalance and missing clinical covariates, so the reported R² = 0.917 and F1 scores cannot be independently verified as robust rather than the result of unstated tuning.

Authors: We agree that these implementation details are required for independent verification. The revised manuscript will include a full architecture diagram, explicit mathematical definitions of all loss terms (concept supervision, regression, and classification), a description of the hyperparameter search (grid ranges, cross-validation procedure, and final selected values), and the exact strategies used for class imbalance (weighted sampling and loss re-weighting) and missing covariates (multiple imputation with sensitivity checks).

revision: yes

Referee: [Methods] Methods (concept-extraction module): no quantitative validation is presented that the learned desaturation indices or respiratory-disturbance events align with expert-annotated event boundaries or durations on any held-out set; without this alignment check the “knowledge-informed” claim reduces to an untested architectural choice.

Authors: We acknowledge that direct quantitative alignment with expert event boundaries was not reported. The concept layer was trained with clinically derived supervision signals, but we did not compute overlap or duration metrics against held-out expert annotations. In revision we will add such an analysis on the subset of data where event-level annotations exist, reporting precision-recall for event detection and Pearson correlation for durations; if annotation coverage is insufficient we will explicitly note this limitation and treat it as future work.

revision: partial

Referee: [Results] Results: the performance tables compare against external baselines but contain no ablation that isolates the contribution of the intermediate concept layer versus an otherwise identical end-to-end oximetry regressor, leaving open whether the interpretability component is load-bearing or incidental.

Authors: We agree that an ablation isolating the concept layer is necessary. The revised results section will include performance of an otherwise identical end-to-end model (same backbone, same multimodal fusion, same training protocol) trained directly on raw oximetry, allowing direct comparison of R², ICC, and F1 scores with and without the intermediate concept layer.

revision: yes
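The first response names loss re-weighting as one strategy for class imbalance without giving the scheme. One common option, shown here purely as an assumption, is inverse-frequency class weights applied to a negative log-likelihood term. The label counts and probabilities are invented.

```python
import math
from collections import Counter

def inverse_frequency_weights(labels):
    """Weight for class c is N / (K * n_c): rarer classes get larger weights."""
    counts = Counter(labels)
    total, n_classes = len(labels), len(counts)
    return {label: total / (n_classes * n) for label, n in counts.items()}

def weighted_nll(probabilities, true_label, weights):
    """Class-weighted negative log-likelihood for a single sample."""
    return -weights[true_label] * math.log(probabilities[true_label])

# Hypothetical, deliberately imbalanced severity labels.
labels = ["normal"] * 6 + ["mild"] * 3 + ["moderate"] * 2 + ["severe"] * 1
weights = inverse_frequency_weights(labels)
sample_probs = {"normal": 0.1, "mild": 0.2, "moderate": 0.2, "severe": 0.5}
print(weights, weighted_nll(sample_probs, "severe", weights))
```

With these weights a mistake on the rare severe class costs six times as much as the same mistake on the normal class, which is the effect re-weighting is meant to achieve.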

Circularity Check

0 steps flagged

No circularity in derivation chain


The paper describes a deep learning model that extracts intermediate concepts from oximetry signals and regresses AHI using held-out external ground-truth labels from independent datasets (SHHS, CFS, MrOS). No equations, derivations, or self-referential steps are presented; performance (R², ICC, F1) is measured against separate clinical annotations rather than being forced by construction from fitted inputs or self-citations. The approach is self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract supplies no explicit free parameters, axioms, or invented entities; the model is described only at the level of 'deep learning framework' and 'clinically interpretable concepts,' so the ledger remains empty pending full text.




work page 2024

[12] [12]

Danny J Eckert and Atul Malhotra. 2008. Pathophysiology of adult obstructive sleep apnea.Proceedings of the American thoracic society5, 2 (2008), 144–153

work page 2008

[13] [13]

Deema Fattal, Stacy Hester, and Linder Wendt. 2022. Body weight and obstructive sleep apnea: a mathematical relationship between body mass index and apnea- hypopnea index in veterans.Journal of Clinical Sleep Medicine18, 12 (2022), 2723–2729

work page 2022

[14] [14]

Hamed Fayyaz, Niharika S D’Souza, and Rahmatollah Beheshti. 2024. Multimodal sleep apnea detection with missing or noisy modalities.Proceedings of machine learning research252 (2024), https–proceedings

work page 2024

[15] [15]

Felipe Giuste, Wenqi Shi, Yuanda Zhu, Tarun Naren, Monica Isgut, Ying Sha, Li Tong, Mitali Gupte, and May D Wang. 2022. Explainable artificial intelli- gence methods in combating pandemics: A systematic review.IEEE Reviews in Biomedical Engineering16 (2022), 5–21

work page 2022

[16] [16]

Felipe O Giuste, Lawrence L He, Monica Isgut, Wenqi Shi, Blake J Anderson, and May D Wang. 2021. Automated risk assessment of COVID-19 patients at diagnosis using electronic healthcare records. In2021 IEEE EMBS International Conference on Biomedical and Health Informatics (BHI). IEEE, 1–4. , , Nnamdi et al

work page 2021

[17] [17]

Gonzalo C Gutiérrez-Tobal, Daniel Álvarez, Fernando Vaquerizo-Villar, Andrea Crespo, Leila Kheirandish-Gozal, David Gozal, Félix del Campo, and Roberto Hornero. 2021. Ensemble-learning regression to estimate sleep apnea severity using at-home oximetry in adults.Applied Soft Computing111 (2021), 107827

work page 2021

[18] [18]

David W Hudgel. 2016. Sleep apnea severity classification—revisited.Sleep39, 5 (2016), 1165–1166

work page 2016

[19] [19]

Shiroh Isono, David S Warner, and Mark A Warner. 2009. Obstructive sleep apnea of obese adults: pathophysiology and perioperative airway management. Anesthesiology110, 4 (2009), 908–921

work page 2009

[20] [20]

Bong Gyun Kang, Dongjun Lee, HyunGi Kim, and DoHyun Chung. 2024. Introduc- ing Spectral Attention for Long-Range Dependency in Time Series Forecasting. arXiv preprint arXiv:2410.20772(2024)

work page arXiv 2024

[21] [21]

Brendan T Keenan, H Lester Kirchner, Olivia J Veatch, Kenneth M Borthwick, Vicki A Davenport, John C Feemster, Maged Gendy, Thomas R Gossard, Frances M Pack, Laura Sirikulvadhana, et al. 2020. Multisite validation of a simple electronic health record algorithm for identifying diagnosed obstructive sleep apnea.Journal of Clinical Sleep Medicine16, 2 (2020)...

work page 2020

[22] [22]

Pang Wei Koh, Thao Nguyen, Yew Siang Tang, Stephen Mussmann, Emma Pier- son, Been Kim, and Percy Liang. 2020. Concept bottleneck models. InInternational conference on machine learning. PMLR, 5338–5348

work page 2020

[23] [23]

Jeremy Levy, Daniel Álvarez, Félix Del Campo, and Joachim A Behar. 2023. Deep learning for obstructive sleep apnea diagnosis based on single channel oximetry. Nature Communications14, 1 (2023), 4881

work page 2023

[24] [24]

Xilin Li, Frank HF Leung, Steven Su, and Sai Ho Ling. 2022. Sleep apnea detec- tion using multi-error-reduction classification system with multiple bio-signals. Sensors22, 15 (2022), 5560

work page 2022

[25] [25]

Caíque Santos Lima. 2022. OxiTidy: motion artifact detection-reduction in pho- toplethysmographic signals using artificial neural networks. (2022)

work page 2022

[26] [26]

Scott M Lundberg and Su-In Lee. 2017. A unified approach to interpreting model predictions.Advances in neural information processing systems30 (2017)

work page 2017

[27] [27]

M Melanie Lyons, Nitin Y Bhatt, Allan I Pack, and Ulysses J Magalang. 2020. Global burden of sleep-disordered breathing and its implications.Respirology25, 7 (2020), 690–702

work page 2020

[28] [28]

Xiaoping Ming, Minlan Yang, and Xiong Chen. 2021. Metabolic bariatric surgery as a treatment for obstructive sleep apnea hypopnea syndrome: review of the literature and potential mechanisms.Surgery for Obesity and Related Diseases17, 1 (2021), 215–220

work page 2021

[29] [29]

Amal K Mitra, Azad R Bhuiyan, and Elizabeth A Jones. 2021. Association and risk factors for obstructive sleep apnea and cardiovascular diseases: a systematic review.Diseases9, 4 (2021), 88

work page 2021

[30] [30]

Stefano Nardini, Ulisse Corbanese, Alberto Visconti, Jacopo Dalle Mule, Clau- dio M Sanguinetti, and Fernando De Benedetto. 2023. Improving the manage- ment of patients with chronic cardiac and respiratory diseases by extending pulse-oximeter uses: the dynamic pulse-oximetry.Multidisciplinary Respiratory Medicine18, 1 (2023)

work page 2023

[31] [31]

Micky C Nnamdi, Junior Ben Tamo, Sara Stackpole, Wenqi Shi, Benoit Marteau, and May Dongmei Wang. 2023. Model confidence calibration for reliable covid- 19 early screening via audio signal analysis. InProceedings of the 14th ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics. 1–6

work page 2023

[32] [32]

Micky C Nnamdi, Wenqi Shi, J Ben Tamo, Henry J Iwinski, J Michael Wattenbarger, and May D Wang. 2023. Concept Bottleneck Model for Adolescent Idiopathic Scoliosis Patient Reported Outcomes Prediction. In2023 IEEE EMBS International Conference on Biomedical and Health Informatics (BHI). IEEE, 1–4

work page 2023

[33] [33]

Stuart F Quan, Barbara V Howard, Conrad Iber, James P Kiley, F Javier Nieto, George T O’Connor, David M Rapoport, Susan Redline, John Robbins, Jonathan M Samet, et al. 1997. The sleep heart health study: design, rationale, and methods. Sleep20, 12 (1997), 1077–1085

work page 1997

[34] [34]

Asher Qureshi, Robert D Ballard, and Harold S Nelson. 2003. Obstructive sleep apnea.Journal of Allergy and Clinical Immunology112, 4 (2003), 643–651

work page 2003

[35] [35]

Susan Redline, Peter V Tishler, Tor D Tosteson, John Williamson, Kenneth Kump, Ilene Browner, Veronica Ferrette, and Patrick Krejci. 1995. The familial aggrega- tion of obstructive sleep apnea.American journal of respiratory and critical care medicine151, 3 (1995), 682–687

work page 1995

[36] [36]

Abraham Savitzky and Marcel JE Golay. 1964. Smoothing and differentiation of data by simplified least squares procedures.Analytical chemistry36, 8 (1964), 1627–1639

work page 1964

[37] [37]

Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedan- tam, Devi Parikh, and Dhruv Batra. 2017. Grad-cam: Visual explanations from deep networks via gradient-based localization. InProceedings of the IEEE interna- tional conference on computer vision. 618–626

work page 2017

[38] [38]

Chamara V Senaratna, Jennifer L Perret, Caroline J Lodge, Adrian J Lowe, Brit- tany E Campbell, Melanie C Matheson, Garun S Hamilton, and Shyamali C Dharmage. 2017. Prevalence of obstructive sleep apnea in the general population: a systematic review.Sleep medicine reviews34 (2017), 70–81

work page 2017

[39] [39]

Mahmoud Y Shams, Ahmed M Elshewey, El-Sayed M El-kenawy, Abdelhameed Ibrahim, Fatma M Talaat, and Zahraa Tarek. 2024. Water quality prediction using machine learning models based on grid search method.Multimedia Tools and Applications83, 12 (2024), 35307–35334

work page 2024

[40] [40]

Wenqi Shi, Felipe O Giuste, Yuanda Zhu, Ben J Tamo, Micky C Nnamdi, Andrew Hornback, Ashley M Carpenter, Coleman Hilton, Henry J Iwinski, J Michael Wat- tenbarger, et al. 2025. Predicting pediatric patient rehabilitation outcomes after spinal deformity surgery with artificial intelligence.Communications Medicine5, 1 (2025), 1

work page 2025

[41] [41]

Wenqi Shi, Mitali S Gupte, and May D Wang. 2021. Learning from heterogeneous data via contrastive learning: An application in multi-source covid-19 radiog- raphy. In2021 IEEE EMBS International Conference on Biomedical and Health Informatics (BHI). IEEE, 1–4

work page 2021

[42] [42]

Wenqi Shi*, Ran Xu*, Yuchen Zhuang, Yue Yu, Jieyu Zhang, Hang Wu, Yuanda Zhu, Joyce Ho, Carl Yang, and May D Wang. 2024. Ehragent: Code empowers large language models for few-shot complex tabular reasoning on electronic health records. InProceedings of the 2024 Conference on Empirical Methods in Natural Language Processing. Association for Computational L...

work page 2024

[43] [43]

J Ben Tamo, Micky C Nnamdi, Lea Lesbats, Wenqi Shi, Yishan Zhong, and May D Wang. 2023. Uncertainty-aware ensemble learning models for out-of-distribution medical imaging analysis. In2023 IEEE International Conference on Bioinformatics and Biomedicine (BIBM). IEEE, 4243–4250

work page 2023

[44] [44]

Tue T Te, Brendan T Keenan, Olivia J Veatch, Mary Regina Boland, Rebecca A Hubbard, and Allan I Pack. 2024. Identifying clusters of patient comorbidities associated with obstructive sleep apnea using electronic health records.Journal of Clinical Sleep Medicine20, 4 (2024), 521–533

work page 2024

[45] [45]

MB Uddin, CM Chow, and SW Su. 2018. Classification methods to detect sleep apnea in adults based on respiratory and oximetry signals: a systematic review. Physiological measurement39, 3 (2018), 03TR01

work page 2018

[46] [46]

Ahmed Uzair, Muhammad Waseem, Aun Bin Shahid, Nauman I Bhatti, Muham- mad Arshad, Asher Ishaq, Muhammad Sajawal, Zoha Toor, and Osama Ahmad

work page

[47] [47]

Correlation Between Body Mass Index and Apnea-Hypopnea Index or Nadir Oxygen Saturation Levels in Patients With Obstructive Sleep Apnea.Cureus16, 4 (2024)

work page 2024

[48] [48]

Tom Van Steenkiste, Willemijn Groenendaal, Dirk Deschrijver, and Tom Dhaene

work page

[49] [49]

Automated sleep apnea detection in raw respiratory signals using long short-term memory neural networks.IEEE journal of biomedical and health informatics23, 6 (2018), 2354–2364

work page 2018

[50] [50]

Janani Venugopalan, Li Tong, Hamid Reza Hassanzadeh, and May D Wang. 2021. Multimodal deep learning models for early detection of Alzheimer’s disease stage. Scientific reports11, 1 (2021), 3254

work page 2021

[51] [51]

Sofiya Vyshnya, Rachel Epperson, Felipe Giuste, Wenqi Shi, Andrew Hornback, and May D Wang. 2024. Optimized clinical feature analysis for improved cardio- vascular disease risk screening.IEEE Open Journal of Engineering in Medicine and Biology5 (2024), 816–827

work page 2024

[52] [52]

Cheng Wan, Micky C Nnamdi, Wenqi Shi, Benjamin Smith, Chad Purnell, and May D Wang. 2024. Advancing Sleep Disorder Diagnostics: A Transformer-based EEG Model for Sleep Stage Classification and OSA Prediction.IEEE Journal of Biomedical and Health Informatics(2024)

work page 2024

[53] [53]

Cheng Wan, Hongyuan Yu, Zhiqi Li, Yihang Chen, Yajun Zou, Yuqing Liu, Xu- anwu Yin, and Kunlong Zuo. 2023. Swift Parameter-free Attention Network for Efficient Super-Resolution.arXiv preprint arXiv:2311.12770(2023)

work page arXiv 2023

[54] [54]

Hang Wu, Wenqi Shi, Anirudh Choudhary, and May D Wang. 2024. Clinical decision making under uncertainty: a bootstrapped counterfactual inference approach.BMC Medical Informatics and Decision Making24, 1 (2024), 1–15

work page 2024

[55] [55]

Hang Wu, Wenqi Shi, and May D Wang. 2024. Developing a novel causal inference algorithm for personalized biomedical causal graph learning using meta machine learning.BMC Medical Informatics and Decision Making24, 1 (2024), 137

work page 2024

[56] [56]

Junhao Wu and Zhaocai Wang. 2022. A hybrid model for water quality prediction based on an artificial neural network, wavelet transform, and long short-term memory.Water14, 4 (2022), 610

work page 2022

[57] [57]

Ran Xu, Yuchen Zhuang, Yishan Zhong, Yue Yu, Xiangru Tang, Hang Wu, May D Wang, Peifeng Ruan, Donghan Yang, Tao Wang, et al . 2025. MedAgentGym: Training LLM Agents for Code-Based Medical Reasoning at Scale.arXiv preprint arXiv:2506.04405(2025)

work page arXiv 2025

[58] [58]

Terry Young, Paul E Peppard, and Daniel J Gottlieb. 2002. Epidemiology of obstructive sleep apnea: a population health perspective.American journal of respiratory and critical care medicine165, 9 (2002), 1217–1239

work page 2002

[59] [59]

Xin Zan, Di Wang, Changyue Song, Feng Liu, Xiaochen Xian, and Richard Berry

work page

[60] [60]

Weakly Supervised Deep Learning for Monitoring Sleep Apnea Sever- ity Using Coarsegrained Labels.IEEE Transactions on Automation Science and Engineering(2025)

work page 2025

[61] [61]

Guo-Qiang Zhang, Licong Cui, Remo Mueller, Shiqiang Tao, Matthew Kim, Michael Rueschman, Sara Mariani, Daniel Mobley, and Susan Redline. 2018. The National Sleep Research Resource: towards a sleep data commons.Journal of the American Medical Informatics Association25, 10 (2018), 1351–1358

work page 2018

[62] [62]

Ying Y Zhao, Rui Wang, Kevin J Gleason, Eldrin F Lewis, Stuart F Quan, Claudia M Toth, Michael Morrical, Michael Rueschman, Jia Weng, James H Ware, et al. 2017. Effect of continuous positive airway pressure treatment on health-related quality of life and sleepiness in high cardiovascular risk individuals with sleep apnea: KindSleep: Knowledge-Informed Dia...

work page 2017