Staging by the Book: Automatic Sleep Stage Classification Using Scoring Rules

Anna Sigridur Islind; Emil Hardarson; Erna Sif Arnardóttir; Konstantin Popov; María Óskarsdóttir; Sigridur Sigurdardottir

arXiv: 2605.22859 · v1 · pith:ANERXOP3new · submitted 2026-05-19 · eess.SP · cs.AI

Emil Hardarson, Konstantin Popov, Sigridur Sigurdardottir, Anna Sigridur Islind, Erna Sif Arnardóttir, María Óskarsdóttir

Pith reviewed 2026-05-25 06:23 UTC · model grok-4.3

classification: eess.SP, cs.AI

keywords: sleep staging, AASM rules, rule-based classification, polysomnography, explainable methods, automatic sleep scoring, deterministic staging

The pith

A rule-based system encodes AASM sleep scoring guidelines as executable code to produce classifications and explanations.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper develops a deterministic alternative to machine learning for automatic sleep stage classification. It translates the AASM manual's scoring rules into code that processes polysomnography signals and outputs both a stage and a natural language justification for each 30-second epoch. Tested on 50 recordings against a majority vote from ten scorers, the system reaches 60.5 percent agreement overall. The design prioritizes transparency and rule adherence over matching the highest possible accuracy of black-box models. This makes the method suitable for verifying other automated systems and for clinical oversight.

Core claim

The paper introduces a rule-based sleep staging algorithm that directly implements the AASM scoring manual in software, including an explanation trace that converts the decision path into readable text. On a test set of 50 PSG recordings the algorithm agrees with the ten-scorer consensus reference in 60.5 percent of epochs with a kappa of 0.42, performing best on N2 and R stages. The resulting decisions are fully determined by the encoded rules and come with justifications that mirror clinical reasoning.

What carries the argument

An executable encoding of the AASM sleep staging rules together with an explanation trace that generates epoch-level natural-language justifications.
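
To make the idea of an executable rule set with an explanation trace concrete, here is a minimal sketch in Python. It is not the authors' implementation: the `EpochFeatures` fields, the four toy rules, their thresholds, and the fallback to N1 are hypothetical simplifications, and the sketch ignores the contextual rules that link neighbouring epochs. Only the general pattern, checking rules in sequence and recording each step so the decision can be turned into text, is taken from the description above.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class EpochFeatures:
    """Hypothetical per-epoch features; a real system would derive these from PSG signals."""
    alpha_fraction: float       # fraction of the epoch showing alpha rhythm
    slow_wave_fraction: float   # fraction of the epoch showing slow wave activity
    has_spindle_or_k: bool      # sleep spindle or K-complex detected
    rapid_eye_movements: bool   # rapid eye movements detected in the EOG
    low_chin_tone: bool         # chin EMG at a low level


# A rule is a description plus a function that returns a stage if it fires, else None.
Rule = Tuple[str, Callable[[EpochFeatures], Optional[str]]]

# Illustrative rules only; the thresholds are placeholders, not the paper's values.
RULES: List[Rule] = [
    ("Wake: alpha rhythm in more than half the epoch",
     lambda f: "W" if f.alpha_fraction > 0.5 else None),
    ("N3: slow wave activity in at least 20% of the epoch",
     lambda f: "N3" if f.slow_wave_fraction >= 0.2 else None),
    ("R: rapid eye movements with low chin tone",
     lambda f: "R" if f.rapid_eye_movements and f.low_chin_tone else None),
    ("N2: sleep spindle or K-complex present",
     lambda f: "N2" if f.has_spindle_or_k else None),
]


def stage_epoch(features: EpochFeatures) -> Tuple[str, List[str]]:
    """Apply the rules in order and record every rule that was checked."""
    trace: List[str] = []
    for description, rule in RULES:
        result = rule(features)
        if result is not None:
            trace.append(f"MATCHED  {description} -> {result}")
            return result, trace
        trace.append(f"REJECTED {description}")
    trace.append("DEFAULT  no rule matched -> N1")
    return "N1", trace


def explain(stage: str, trace: List[str]) -> str:
    """Turn the trace into a short plain-text justification."""
    return f"Scored {stage} because:\n" + "\n".join(f"  - {step}" for step in trace)


example = EpochFeatures(0.1, 0.05, True, False, False)
example_stage, example_trace = stage_epoch(example)
print(explain(example_stage, example_trace))
```

Because the output depends only on the features and the ordered rule list, running the function twice on the same epoch always yields the same stage and the same trace, which is the property the paper relies on for auditing.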

If this is right

The method supplies verifiable, rule-following decisions that can audit opaque machine learning models.
Natural language explanations allow clinicians to inspect why a particular stage was assigned.
Deterministic behavior eliminates variability from training data or model initialization.
Lower agreement than deep learning models is accepted in exchange for explicit alignment with clinical guidelines.
Performance differences between development and test sets indicate that implementation details affect outcomes.
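
The auditing use case in the first point above can be sketched as a simple comparison of two hypnograms. This is an illustration under assumptions, not a workflow described in the paper: the function names, the `AuditFlag` record, and the toy labels are all hypothetical, and the justifications stand in for whatever text a rule-based scorer would produce.

```python
from collections import Counter
from typing import List, NamedTuple, Sequence, Tuple


class AuditFlag(NamedTuple):
    """One epoch where the black-box model and the rule-based scorer disagree."""
    epoch_index: int
    model_stage: str
    rule_stage: str
    justification: str


def audit_model(model_stages: Sequence[str],
                rule_stages: Sequence[str],
                justifications: Sequence[str]) -> List[AuditFlag]:
    """Flag every epoch where the two stagings differ, keeping the rule-based reasoning."""
    if not (len(model_stages) == len(rule_stages) == len(justifications)):
        raise ValueError("all three inputs must cover the same epochs")
    return [
        AuditFlag(i, m, r, why)
        for i, (m, r, why) in enumerate(zip(model_stages, rule_stages, justifications))
        if m != r
    ]


def disagreement_pairs(flags: Sequence[AuditFlag]) -> List[Tuple[Tuple[str, str], int]]:
    """Count (model stage, rule stage) pairs, most frequent first."""
    return Counter((f.model_stage, f.rule_stage) for f in flags).most_common()


# Toy data, invented for illustration.
model = ["W", "N1", "N2", "N2", "R"]
rules = ["W", "N2", "N2", "N3", "R"]
reasons = ["alpha present", "K-complex found", "spindle found",
           "slow waves dominate", "REMs with low chin tone"]

for flag in audit_model(model, rules, reasons):
    print(f"epoch {flag.epoch_index}: model={flag.model_stage}, "
          f"rules={flag.rule_stage} ({flag.justification})")
```

A reviewer would then inspect only the flagged epochs, using the justification to decide whether the model or the rule encoding is at fault.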

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Disagreements with human consensus may point to specific ambiguities in the AASM guidelines that need clarification.
The rule set could be used to create large volumes of labeled data for training more accurate yet still interpretable models.
Similar rule translations might apply to other standardized medical scoring procedures beyond sleep.
Integration with signal processing pipelines could allow real-time staging during recordings.

Load-bearing premise

The AASM scoring rules are sufficiently precise and complete to be converted into deterministic code without significant loss of the judgment human experts apply to edge cases.

What would settle it

Expert review of the code's output on a new set of epochs that identifies systematic misapplications of the AASM rules arising from unencoded ambiguities.

Figures

Figures reproduced from arXiv: 2605.22859 by Anna Sigridur Islind, Emil Hardarson, Erna Sif Arnardóttir, Konstantin Popov, María Óskarsdóttir, Sigridur Sigurdardottir.

**Figure 2.** One epoch with micro-annotations. The displayed channels include …

**Figure 3.** Example of a sequential elimination trace produced by the rule-based …

**Figure 4.** Example of a natural-language explanation dialogue produced by the …

**Figure 5.** The top panel shows the human consensus hypnogram for the …

**Figure 6.** The top panel shows the human consensus hypnogram for one of …

**Figure 7.** Distribution of human inter-scorer agreement for epochs where the …

Original abstract

Automated sleep staging is commonly approached as a supervised machine learning problem, with deep learning methods dominating recent research. While machine learning models achieve near-human level agreement with human-scored reference sleep stages, their decisions are typically opaque and not designed to follow clinical scoring rules. We propose a transparent alternative: a deterministic, rule-based sleep staging method that explicitly operationalizes the American Academy of Sleep Medicine's (AASM) scoring logic as executable code, coupled with epoch-level natural-language justifications derived from an explanation trace. We evaluate the approach on 50 polysomnography recordings with a 10-scorer majority-vote consensus as reference. Across all recordings, the method agreed with the majority-vote reference in 60.5% of epochs ($\kappa=0.42$), with substantially higher agreement on a dataset used during development (77.1%, $\kappa=0.61$). Agreement with the reference was highest for sleep stage N2 (recall 83.5%) and moderate for sleep stage R (recall 68.7%), while Wake and N1 recall were low. Despite lower agreement with the reference than contemporary deep learning models, the method provides deterministic decisions and natural language explanations aligned with AASM scoring rules, making it a complementary tool for auditing, debugging, and governing deep learning-based sleep staging.
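
The three kinds of figures quoted in the abstract (percent agreement, κ, and per-stage recall against a majority-vote reference) can be illustrated with a short Python sketch. Several things here are assumptions: that κ means Cohen's kappa, which this page does not state; the tie-breaking order used in the majority vote; and the toy labels, which are invented and unrelated to the paper's data.

```python
from collections import Counter
from typing import Dict, List, Sequence

STAGES = ["W", "N1", "N2", "N3", "R"]


def majority_vote(scorer_labels: Sequence[Sequence[str]]) -> List[str]:
    """Consensus stage per epoch; ties go to the earlier entry in STAGES (an assumption)."""
    consensus = []
    for epoch_votes in zip(*scorer_labels):
        counts = Counter(epoch_votes)
        # max() returns the first maximal element, so STAGES order breaks ties
        consensus.append(max(STAGES, key=lambda s: counts[s]))
    return consensus


def percent_agreement(reference: Sequence[str], predicted: Sequence[str]) -> float:
    """Fraction of epochs where the prediction equals the reference."""
    return sum(r == p for r, p in zip(reference, predicted)) / len(reference)


def cohen_kappa(reference: Sequence[str], predicted: Sequence[str]) -> float:
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(reference)
    p_observed = percent_agreement(reference, predicted)
    ref_counts, pred_counts = Counter(reference), Counter(predicted)
    p_chance = sum(ref_counts[s] * pred_counts[s] for s in STAGES) / (n * n)
    return (p_observed - p_chance) / (1 - p_chance)


def recall_per_stage(reference: Sequence[str], predicted: Sequence[str]) -> Dict[str, float]:
    """For each stage present in the reference, the fraction of its epochs recovered."""
    recalls = {}
    for stage in STAGES:
        total = sum(r == stage for r in reference)
        if total:
            hits = sum(r == stage and p == stage for r, p in zip(reference, predicted))
            recalls[stage] = hits / total
    return recalls


# Toy data: three scorers, six epochs (the paper uses ten scorers).
scorers = [
    ["W", "N1", "N2", "N2", "N3", "R"],
    ["W", "N2", "N2", "N2", "N3", "R"],
    ["W", "N1", "N2", "N3", "N3", "R"],
]
method = ["W", "N2", "N2", "N2", "N2", "R"]

reference = majority_vote(scorers)
print("agreement:", round(percent_agreement(reference, method), 3))
print("kappa:", round(cohen_kappa(reference, method), 3))
print("recall:", recall_per_stage(reference, method))
```

The gap between raw agreement and κ in the toy output shows why both are reported: a scorer that over-predicts a common stage such as N2 gets credit for chance matches, and κ discounts them.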

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper turns AASM rules into working code with explanations, which is useful for transparency, but the 60.5% agreement and big dev-set jump are the main things to watch.

The main point is a deterministic implementation of AASM sleep staging rules as executable code, paired with natural-language explanations for each epoch. This stands apart from the usual deep learning work because it aims to follow the clinical manual directly rather than fit data. On 50 recordings against a 10-scorer majority vote it reaches 60.5% agreement overall, with better numbers on N2 and lower on Wake and N1. The explanations are the clearest practical win for auditing or regulatory needs.

The approach is grounded in external rules instead of learned parameters, which is a real difference from supervised models. The evaluation uses a concrete reference and reports kappa values, so the numbers can be checked.

The soft spots sit in the performance gap and the development process. Agreement jumps to 77.1% on the development set, which suggests some rule choices or edge-case handling may have been tuned to that data. AASM guidelines contain qualifiers on amplitude, context, and detection that any code must resolve with fixed thresholds or procedures. Without the released code or a breakdown of those decisions, it is hard to separate faithful translation from implementation artifacts. That does not sink the idea, but it does limit how strongly the transparency claim can be taken right now.

This is for readers who need an explainable baseline in sleep staging or who want to audit black-box models against clinical rules. It is not aimed at people chasing the highest accuracy numbers. The work shows clear engagement with the guidelines and a reproducible setup, so it deserves peer review even if revisions on the implementation details are likely.

Referee Report

3 major / 2 minor

Summary. The manuscript proposes a deterministic, rule-based sleep staging algorithm that translates AASM scoring rules into executable code, generating epoch-level natural-language explanations from an internal trace. Evaluated on 50 PSG recordings against a 10-scorer majority-vote reference, it achieves 60.5% epoch agreement (κ=0.42) overall and 77.1% (κ=0.61) on a development subset, with highest recall for N2 and lower for Wake/N1; the work positions this as a transparent complement to opaque deep-learning models for auditing and governance.

Significance. If the rule translations prove faithful, the approach supplies a reproducible, parameter-free baseline that can serve as an auditing tool for ML sleep-staging systems and as an educational or regulatory reference. The explicit use of an external consensus reference and the generation of human-readable justifications are concrete strengths that address a recognized gap in interpretability. The lower absolute agreement relative to contemporary DL models is expected and does not diminish the potential utility for verification tasks.

major comments (3)

[§2] §2 (Rule Implementation): The description of the executable AASM encoding does not specify the concrete numerical cutoffs, tie-breaking procedures, or edge-case resolutions chosen for inherently ambiguous manual criteria (e.g., amplitude thresholds for slow waves, K-complex detection, or contextual stage-transition rules). Without these details or an external validation against multiple scorers on ambiguous epochs, the central claim that the code constitutes a faithful, lossless operationalization cannot be assessed.
[Abstract and Evaluation] Abstract and Evaluation section: The 16.6-point gap between development-set agreement (77.1%) and overall agreement (60.5%) raises the possibility that implementation choices were tuned to the development recordings. This directly undermines the asserted deterministic, non-data-dependent character of the method and must be resolved by documenting a strict separation with no post-hoc adjustments.
[Results] Results: No per-epoch or per-recording breakdown is provided that isolates performance on epochs where the 10 human scorers themselves disagree; such an analysis is required to determine whether the reported 60.5% agreement reflects intrinsic limits of the AASM rules or artifacts introduced by the deterministic encoding.
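
The breakdown requested in the third major comment could look like the following sketch, which groups epochs by how unanimous the human scorers were and reports the method's agreement with the majority vote in each group. It assumes per-epoch labels from every scorer are available; the bin edges, bin names, and toy votes are hypothetical choices made for illustration.

```python
from collections import Counter
from typing import Dict, Sequence, Tuple


def human_agreement_ratio(votes: Sequence[str]) -> Tuple[str, float]:
    """Return the majority stage and the fraction of scorers who chose it."""
    stage, count = Counter(votes).most_common(1)[0]
    return stage, count / len(votes)


def agreement_by_consensus_strength(
    epoch_votes: Sequence[Sequence[str]],
    method_stages: Sequence[str],
    bin_edges: Tuple[float, float] = (0.5, 0.8),  # hypothetical cut points
) -> Dict[str, float]:
    """Method agreement with the majority vote, grouped by scorer unanimity."""
    labels = ["low", "medium", "high"]
    hits: Counter = Counter()
    totals: Counter = Counter()
    for votes, predicted in zip(epoch_votes, method_stages):
        majority, ratio = human_agreement_ratio(votes)
        # number of edges exceeded selects the bin: 0 -> low, 1 -> medium, 2 -> high
        label = labels[sum(ratio > edge for edge in bin_edges)]
        totals[label] += 1
        hits[label] += int(predicted == majority)
    return {label: hits[label] / totals[label] for label in labels if totals[label]}


# Toy data: five scorers, five epochs (invented for illustration).
votes_per_epoch = [
    ["N2", "N2", "N2", "N2", "N2"],
    ["N2", "N2", "N2", "N2", "N1"],
    ["N1", "N1", "W", "W", "N2"],
    ["R", "R", "R", "N1", "N1"],
    ["W", "W", "W", "W", "W"],
]
method = ["N2", "N2", "W", "N1", "W"]

print(agreement_by_consensus_strength(votes_per_epoch, method))
```

If the method's errors concentrate in the low-unanimity bins, that points toward ambiguity in the epochs themselves; errors in the high-unanimity bin point toward the encoding.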

minor comments (2)

The manuscript should include at least one full worked example of an epoch trace with the generated natural-language justification in the main text or a clearly labeled supplementary figure.
[§2] The version of the AASM manual being operationalized (2012 or later) and any explicit deviations from the printed guidelines should be stated in §2.
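
One way to meet the requests about cutoffs and manual version is to keep every implementation choice in a single immutable configuration object that can be printed into a supplement. The sketch below is hypothetical: the class and field names are invented, the default numbers are the values commonly quoted from the AASM manual (75 µV and 0.5 to 2 Hz for slow waves, 20% of the epoch for N3, 11 to 16 Hz and 0.5 s for spindles, 0.5 s for K-complexes), and nothing on this page confirms that the authors' code uses them. The tie-break order is an arbitrary placeholder.

```python
import json
from dataclasses import asdict, dataclass
from typing import Tuple


@dataclass(frozen=True)
class ScoringConfig:
    """Hypothetical record of every fixed choice a rule encoding has to make."""
    manual_version: str                                   # must be stated explicitly
    epoch_seconds: float = 30.0
    slow_wave_min_amplitude_uv: float = 75.0              # peak-to-peak, microvolts
    slow_wave_band_hz: Tuple[float, float] = (0.5, 2.0)
    n3_min_slow_wave_fraction: float = 0.20
    spindle_band_hz: Tuple[float, float] = (11.0, 16.0)
    spindle_min_seconds: float = 0.5
    k_complex_min_seconds: float = 0.5
    tie_break_order: Tuple[str, ...] = ("W", "R", "N3", "N2", "N1")  # placeholder

    def to_json(self) -> str:
        """Serialize all choices so they can be published alongside results."""
        return json.dumps(asdict(self), indent=2, sort_keys=True)


config = ScoringConfig(manual_version="unspecified")
print(config.to_json())
```

Freezing the dataclass means thresholds cannot be changed silently between development and evaluation runs; any change requires constructing, and therefore documenting, a new configuration.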

Simulated Author's Rebuttal

3 responses · 0 unresolved

Thank you for the opportunity to respond to the referee's report. We address the major comments point by point below, proposing revisions where appropriate to strengthen the manuscript.

Referee: [§2] §2 (Rule Implementation): The description of the executable AASM encoding does not specify the concrete numerical cutoffs, tie-breaking procedures, or edge-case resolutions chosen for inherently ambiguous manual criteria (e.g., amplitude thresholds for slow waves, K-complex detection, or contextual stage-transition rules). Without these details or an external validation against multiple scorers on ambiguous epochs, the central claim that the code constitutes a faithful, lossless operationalization cannot be assessed.

Authors: We agree that additional details on the specific numerical thresholds and handling of edge cases are necessary to allow full assessment of the implementation's fidelity. In the revised manuscript, we will include an expanded section or supplementary material that lists all concrete cutoffs (e.g., for delta wave amplitude, K-complex criteria) and tie-breaking rules used in the code. We will also reference the open-source implementation for complete transparency. While we cannot perform new external validation on ambiguous epochs without additional data, the majority-vote reference already reflects inter-scorer variability, and we will add a note on this limitation.
revision: yes

Referee: [Abstract and Evaluation] Abstract and Evaluation section: The 16.6-point gap between development-set agreement (77.1%) and overall agreement (60.5%) raises the possibility that implementation choices were tuned to the development recordings. This directly undermines the asserted deterministic, non-data-dependent character of the method and must be resolved by documenting a strict separation with no post-hoc adjustments.

Authors: The development subset was employed only during the initial coding phase to verify that the rule translations produced reasonable explanations on a small number of recordings; no quantitative metrics were optimized, and no adjustments were made based on the full evaluation results. The method remains fully deterministic with no learned parameters. To address the concern, we will revise the manuscript to clearly document the development recordings used, confirm that no post-hoc changes were applied after the full evaluation, and emphasize that the performance difference arises from the varying difficulty across recordings rather than data-dependent tuning.
revision: yes

Referee: [Results] Results: No per-epoch or per-recording breakdown is provided that isolates performance on epochs where the 10 human scorers themselves disagree; such an analysis is required to determine whether the reported 60.5% agreement reflects intrinsic limits of the AASM rules or artifacts introduced by the deterministic encoding.

Authors: We concur that dissecting performance on epochs with high inter-scorer disagreement would help isolate the sources of discrepancy. However, the dataset provides only the majority-vote labels and not the individual scorer annotations per epoch, which precludes this specific analysis. We will add a discussion of this limitation in the revised paper and note that the overall agreement with the consensus serves as a conservative estimate. If individual scorer data were available, such a breakdown could be performed in future extensions.
revision: partial

Circularity Check

0 steps flagged

No circularity: derivation from external AASM rules is self-contained

The paper's core method is an explicit translation of external AASM scoring guidelines into deterministic code, with no equations, fitted parameters, or self-citations forming the load-bearing chain. The development-set agreement (77.1%) is reported separately from the primary evaluation on the 50-recording majority-vote set (60.5%), without presenting the development result as an independent prediction or validation. Implementation choices for ambiguities are acknowledged as necessary but do not reduce the central claim to a fit or self-definition; the transparency argument rests on the external rule source rather than internal data tuning.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the assumption that AASM clinical guidelines are sufficiently precise and unambiguous to be fully encoded as deterministic logic without loss of meaning. No free parameters or invented entities are mentioned.

axioms (1)

Domain assumption: AASM scoring rules can be fully and unambiguously translated into deterministic executable code.
The paper assumes the clinical guidelines are precise enough for direct coding without loss of nuance.


work page doi:10.1155/2018/6534041 2018

[1] Maha Alattar, Alok Govind, and Shraddha Mainali. Artificial Intelligence Models for the Automation of Standard Diagnostics in Sleep Medicine—A Systematic Review. Bioengineering, 11(3):206, March 2024. ISSN 2306-5354. doi: 10.3390/bioengineering11030206. URL https://www.mdpi.com/2306-5354/11/3/206.

[3] Hadeel Alsolai, Shahnawaz Qureshi, Syed Muhammad Zeeshan Iqbal, Sirirut Vanichayobon, Lawrence Edward Henesey, Craig Lindley, and Seppo Karrila. A Systematic Review of Literature on Automated Sleep Scoring. IEEE Access, 10:79419–79443, 2022. ISSN 2169-3536. doi: 10.1109/ACCESS.2022.3194145. URL https://ieeexplore.ieee.org/document/9841539.

[4] Julia Amann, Alessandro Blasimme, Effy Vayena, Dietmar Frey, Vince I. Madai, and the Precise4Q consortium. Explainability for artificial intelligence in healthcare: a multidisciplinary perspective. BMC Medical Informatics and Decision Making, 20(1):310, November 2020. ISSN 1472-6947. doi: 10.1186/s12911-020-01332-6.

[Figure: hypnogram, panel "Majority vote", y-axis N3/N2/N1/REM/Wake, x-axis "Time (hour)" ...]

[5] Peter Anderer, Georg Gruber, Silvia Parapatics, Michael Woertz, Tatiana Miazhynskaia, Gerhard Klösch, Bernd Saletu, Josef Zeitlhofer, Manuel J. Barbanoj, Heidi Danker-Hopfe, Sari-Leena Himanen, Bob Kemp, Thomas Penzel, Michael Grözinger, Dieter Kunz, Peter Rappelsberger, Alois Schlögl, and Georg Dorffner. An E-Health Solution for Automatic Sleep Classific... 2005. doi: 10.1159/000085205.

[6] Peter Anderer, Arnaud Moreau, Michael Woertz, Marco Ross, Georg Gruber, Silvia Parapatics, Erna Loretz, Esther Heller, Andrea Schmidt, Marion Boeck, Doris Moser, Gerhard Kloesch, Bernd Saletu, Gerda M. Saletu-Zyhlarz, Heidi Danker-Hopfe, Josef Zeitlhofer, and Georg Dorffner. Computer-assisted sleep classification according to the standard of the American ... 2010. doi: 10.1159/000320864.

[7] Peter Anderer, Marco Ross, Andreas Cerny, Ray Vasko, Edmund Shaw, and Pedro Fonseca. Overview of the hypnodensity approach to scoring sleep for polysomnography and home sleep testing. Frontiers in Sleep, 2, 2023. ISSN 2813-2890. doi: 10.3389/frsle.2023.1163477. URL https://www.frontiersin.org/articles/10.3389/frsle.2023.1163477

[9] Jessie P Bakker, Marco Ross, Andreas Cerny, Ray Vasko, Edmund Shaw, Samuel Kuna, Ulysses J Magalang, Naresh M Punjabi, and Peter Anderer. Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep, 46(2):zsac154, February 2023. ISSN 0161-8105. doi: 10.1093/sleep/zsac154.

[10] Ignacio Boira, Violeta Esteban, José Norberto Sancho-Chust, Esther Pastor, Paula Fernández-Martínez, Anastasiya Torba, and Eusebi Chiner. Validation of the Somnolyzer 24×7 automatic scoring system in children with suspected obstructive sleep apnea. Frontiers in Medicine, 12, June 2025. ISSN 2296-858X. doi: 10.3389/fmed.2025.1617530. URL https://www.frontiersin.org/journals/medicine/articles/10.3389/fmed.2025.1617530/full

[12] M. Braun, M. Stockhoff, M. Tijssen, S. Dietz-Terjung, S. Coughlin, and C. Schöbel. A Systematic Review on the Technical Feasibility of Home-Polysomnography for Diagnosis of Sleep Disorders in Adults. Current Sleep Medicine Reports, 10(2):276–288, June 2024. ISSN 2198-6401. doi: 10.1007/s40675-024-00301-z. URL https://doi.org/10.1007/s40675-024-00301-z

[13] Oliver Faust, Hajar Razaghi, Ragab Barika, Edward J Ciaccio, and U Rajendra Acharya. A review of automated sleep stage scoring based on physiological signals for the new millennia. Computer Methods and Programs in Biomedicine, 176:81–91, July 2019. ISSN 0169-2607. doi: 10.1016/j.cmpb.2019.04.032. URL https://www.sciencedirect.com/science/article/pii/S0...

[14] Luigi Fiorillo, Alessandro Puiatti, Michela Papandrea, Pietro-Luca Ratti, Paolo Favaro, Corinne Roth, Panagiotis Bargiotas, Claudio L. Bassetti, and Francesca D. Faraci. Automated sleep scoring: A review of the latest approaches. Sleep Medicine Reviews, 48:101204, December 2019. ISSN 1087-0792. doi: 10.1016/j.smrv.2019.07.007. URL https://www.sciencedirect...

[15] Luigi Fiorillo, Giuliana Monachino, Julia van der Meer, Marco Pesce, Jan D. Warncke, Markus H. Schmidt, Claudio L. A. Bassetti, Athina Tzovara, Paolo Favaro, and Francesca D. Faraci. U-Sleep’s resilience to AASM guidelines. npj Digital Medicine, 6(1):1–9, March 2023. ISSN 2398-6352. doi: 10.1038/s41746-023-00784-0. URL https://www.nature.com/articles/s41...

[16] Jürgen Fischer, Zoran Dogas, Claudio L. Bassetti, Søren Berg, Ludger Grote, Poul Jennum, Patrick Levy, Stefan Mihaicuta, Lino Nobili, Dieter Riemann, F. Javier Puertas Cuesta, Friedhart Raschke, Debra J. Skene, Neil Stanley, Dirk Pevernagie, Executive Committee (EC) of the Assembly of the National Sleep Societies (ANSS), and Board of the European Sleep ... 2012. doi: 10.1111/j.1365-2869.2011.00987.x.

[17] Maksym Gaiduk, Ángel Serrano Alarcón, Ralf Seepold, and Natividad Martínez Madrid. Current status and prospects of automatic sleep stages scoring: Review. Biomedical Engineering Letters, 13(3):247–272, July 2023. ISSN 2093-9868. doi: 10.1007/s13534-023-00299-3. URL https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10382458/

[19] Kristin M. Gunnarsdottir, Charlene Gamaldo, Rachel Marie Salas, Joshua B. Ewen, Richard P. Allen, Katherine Hu, and Sridevi V. Sarma. A novel sleep stage scoring system: Combining expert-based features with the generalized linear model. Journal of Sleep Research, 29(5):e12991, October 2020. ISSN 0962-1105, 1365-2869. doi: 10.1111/jsr.12991. URL https://o...

[20] Emil Hardarson, Frida Ivarsson, Anna Sigríður Islind, Erna Sif Arnardóttir, and María Óskarsdóttir. Human-AI Collaboration: From Explainable AI to Co-Creating Meaning. ACIS 2024 Proceedings, December 2024. URL https://aisel.aisnet.org/acis2024/148

[21] Emil Hardarson, Luka Biedebach, Ómar Bessi Ómarsson, Teitur Hrólfsson, Anna Sigridur Islind, and María Óskarsdóttir. Data-Local Autonomous LLM-Guided Neural Architecture Search for Multiclass Multimodal Time-Series Classification, March 2026. URL http://arxiv.org/abs/2603.15939. arXiv:2603.15939 [cs]

[23] Joel Hasan. Past and Future of Computer-Assisted Sleep Analysis and Drowsiness Assessment. Journal of Clinical Neurophysiology, 13(4):295–313, July 1996. ISSN 0736-0258. doi: 10.1097/00004691-199607000-00004. URL http://journals.lww.com/00004691-199607000-00004

[24] Ari Holtzman, Jan Buys, Li Du, Maxwell Forbes, and Yejin Choi. The Curious Case of Neural Text Degeneration, February 2020. URL http://arxiv.org/abs/1904.09751. arXiv:1904.09751 [cs]

[25] Tim Hulsen. Explainable Artificial Intelligence (XAI): Concepts and Challenges in Healthcare. AI, 4(3):652–666, September 2023. ISSN 2673-2688. doi: 10.3390/ai4030034. URL https://www.mdpi.com/2673-2688/4/3/

[26] Conrad Iber, Sonia Ancoli-Israel, Andrew Chesson, and Stuart Quan. The AASM Manual for the Scoring of Sleep and Associated Events: Rules, Terminology, and Technical Specifications, 1st ed., 2007.

[27] Steven Labkoff, Bilikis Oladimeji, Joseph Kannry, Anthony Solomonides, Russell Leftwich, Eileen Koski, Amanda L Joseph, Monica Lopez-Gonzalez, Lee A Fleisher, Kimberly Nolen, Sayon Dutta, Deborah R Levy, Amy Price, Paul J Barr, Jonathan D Hron, Baihan Lin, Gyana Srivastava, Nuria Pastor, Unai Sanchez Luque, Tien Thi Thuy Bui, Reva Singh, Tayler William... Toward a responsible future: recommendations for AI-enabled clinical decision support. Journal of the American Medical Informatics Association, 31(11):2730–2739, November 2024. ISSN 1067-5027, 1527-974X. doi: 10.1093/jamia/ocae209. URL https://academic.oup.com/jamia/article/31/11/2730/7776823

[29] Eric Larson, Alexandre Gramfort, Denis A Engemann, Jaakko Leppakangas, Christian Brodbeck, Mainak Jas, Teon L Brooks, Jona Sassenhagen, Daniel McCloy, Martin Luessi, Jean-Rémi King, Richard Höchenberger, Clemens Brunner, Roman Goj, Guillaume Favelier, Marijn van Vliet, Mark Wronkiewicz, Stefan Appelhoff, Alex Rockhill, Chris Holdgraf, Mathieu Scheltienn... MNE-Python, November 2025. doi: 10.5281/zenodo.592483.

[30] Yun Ji Lee, Jae Yong Lee, Jae Hoon Cho, and Ji Ho Choi. Interrater reliability of sleep stage scoring: a meta-analysis. Journal of Clinical Sleep Medicine: JCSM: Official Publication of the American Academy of Sleep Medicine, 18(1):193–202, January 2022. ISSN 1550-9389. doi: 10.5664/jcsm.9538. URL https://pmc.ncbi.nlm.nih.gov/articles/PMC8807917/

[31] Sheng-Fu Liang, Chin-En Kuo, Yu-Han Hu, and Yu-Shian Cheng. A rule-based automatic sleep staging method. Journal of Neuroscience Methods, 205(1):169–176, March 2012. ISSN 0165-0270. doi: 10.1016/j.jneumeth.2011.12.022. URL https://www.sciencedirect.com/science/article/pii/S016502701100759X

[32] Atul Malhotra, Magdy Younes, Samuel T. Kuna, Ruth Benca, Clete A. Kushida, James Walsh, Alexandra Hanlon, Bethany Staley, Allan I. Pack, and Grace W. Pien. Performance of an automated polysomnography scoring system versus computer-assisted manual scoring. Sleep, 36(4):573–582, April 2013. ISSN 1550-9109. doi: 10.5665/sleep.2548

[33] Sami Nikkonen, Pranavan Somaskandhan, Henri Korkalainen, Samu Kainulainen, Philip I. Terrill, Heidur Gretarsdottir, Sigridur Sigurdardottir, Kristin Anna Olafsdottir, Anna Sigridur Islind, María Óskarsdóttir, Erna Sif Arnardóttir, and Timo Leppänen. Multicentre sleep-stage scoring agreement in the Sleep Revolution project. Journal of Sleep Research, 33... 2024. doi: 10.1111/jsr.13956.

[34] Thomas Penzel and Regina Conradt. Computer based sleep recording and analysis. Sleep Medicine Reviews, 4(2):131–148, April 2000. ISSN 1087-0792. doi: 10.1053/smrv.1999.0087. URL https://linkinghub.elsevier.com/retrieve/pii/S1087079299900874

[35] Mathias Perslev, Sune Darkner, Lykke Kempfner, Miki Nikolic, Poul Jørgen Jennum, and Christian Igel. U-Sleep: resilient high-frequency sleep staging. npj Digital Medicine, 4(1):72, April 2021. ISSN 2398-6352. doi: 10.1038/s41746-021-00440-5. URL https://www.nature.com/articles/s41746-021-00440-5

[36] Huy Phan, Kristian P. Lorenzen, Elisabeth Heremans, Oliver Y. Chén, Minh C. Tran, Philipp Koch, Alfred Mertins, Mathias Baumert, Kaare B. Mikkelsen, and Maarten De Vos. L-SeqSleepNet: Whole-cycle Long Sequence Modeling for Automatic Sleep Staging. IEEE Journal of Biomedical and Health Informatics, 27(10):4748–4757, October 2023. ISSN 2168-2208. doi: 10.1109/jbhi.2023.3303197.

[37] A Rechtschaffen and A Kales. A manual of standardized terminology, techniques and scoring system of sleep stages in human subjects. University of California, Brain Information Service/Brain Research Institute, Los Angeles, 1968.

[38] Richard S. Rosenberg and Steven Van Hout. The American Academy of Sleep Medicine inter-scorer reliability program: sleep stage scoring. Journal of clinical sleep medicine: JCSM: official publication of the American Academy of Sleep Medicine, 9(1):81–87, January 2013. ISSN 1550-9397. doi: 10.5664/jcsm.2350

[39] Cynthia Rudin. Stop Explaining Black Box Machine Learning Models for High Stakes Decisions and Use Interpretable Models Instead, September 2019. URL http://arxiv.org/abs/1811.10154. arXiv:1811.10154 [cs, stat]

[41] Neil Stanley. The Future of Sleep Staging, Revisited. Nature and Science of Sleep, 15:313–322, May 2023. doi: 10.2147/NSS.S405663

[42] Akara Supratak, Hao Dong, Chao Wu, and Yike Guo. DeepSleepNet: a Model for Automatic Sleep Stage Scoring based on Raw Single-Channel EEG. IEEE Transactions on Neural Systems and Rehabilitation Engineering, 25(11):1998–2008, November 2017. ISSN 1534-4320, 1558-0210. doi: 10.1109/TNSRE.2017.2721116. URL http://arxiv.org/abs/1703.04046. arXiv:1703.04046 [stat]

[44] Matthew M. Troester, Stuart F. Quan, American Academy of Sleep Medicine, and Richard B. Berry. The AASM Manual for the Scoring of Sleep and Associated Events, Version 3. American Academy Of Sleep Medicine, June 2023. ISBN 978-0-9706137-1-4

[45] Raphael Vallat and Matthew P Walker. An open-source, high-performance tool for automated sleep staging. eLife, 10:e70092, October 2021. ISSN 2050-084X. doi: 10.7554/eLife.70092. URL https://doi.org/10.7554/eLife.70092

[46] P. Welch. The use of fast Fourier transform for the estimation of power spectra: A method based on time averaging over short, modified periodograms. IEEE Transactions on Audio and Electroacoustics, 15(2):70–73, June 1967. ISSN 1558-2582. doi: 10.1109/TAU.1967.1161901. URL https://ieeexplore.ieee.org/document/1161901

[47] Mehran Yazdi, Mahdi Samaee, and Daniel Massicotte. A Review on Automated Sleep Study. Annals of Biomedical Engineering, 52(6):1463–1491, June 2024. ISSN 1573-9686. doi: 10.1007/s10439-024-03486-0. URL https://doi.org/10.1007/s10439-024-03486-0

[48] Bingtao Zhang, Tao Lei, Hong Liu, and Hanshu Cai. EEG-Based Automatic Sleep Staging Using Ontology and Weighting Feature Analysis. Computational and Mathematical Methods in Medicine, 2018:1–16, September 2018. ISSN 1748-670X, 1748-6718. doi: 10.1155/2018/6534041. URL https://www.hindawi.com/journals/cmmm/2018/6534041/

Figure 7: Distribution of human inter-scorer agreement for epochs where the rule-based algorith...