Suitability of an inter-burst detection method for grading hypoxic-ischemic encephalopathy in newborn EEG
Pith reviewed 2026-05-25 02:03 UTC · model grok-4.3
The pith
An inter-burst detection method from preterm infants works without change on term newborn EEG to classify grades of hypoxic-ischemic encephalopathy at 77.8 percent accuracy.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The inter-burst detection method developed for preterm infants born at less than 30 weeks of gestational age accurately identifies inter-bursts in term infants. Features from the temporal organisation of the inter-bursts, in particular the percentage of inter-bursts and the maximum duration of inter-bursts, when combined in a multi-layer perceptron, classify four grades of hypoxic-ischemic encephalopathy with a testing accuracy of 77.8 percent, similar to existing multi-feature approaches.
What carries the argument
The inter-burst detection method (developed for preterm infants) together with the multi-layer perceptron that classifies injury grades from temporal features of the detected intervals.
If this is right
- The preterm inter-burst detector transfers directly to term EEG without retraining or parameter changes.
- Percentage of inter-bursts alone already separates the four injury grades at 59.3 percent accuracy.
- Adding maximum inter-burst duration to the classifier raises performance to 77.8 percent on test data.
- The resulting accuracy is similar to that of existing, more elaborate multi-feature classifiers.
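The two features named above can be illustrated with a minimal sketch. It assumes the detector's output is available as a per-sample binary mask (1 = inter-burst); the function name `inter_burst_features`, the mask representation, and the sampling rate `fs` are illustrative assumptions, not details taken from the paper.

```python
from itertools import groupby


def inter_burst_features(mask, fs):
    """Return (percentage of inter-bursts, maximum inter-burst duration in seconds).

    mask: sequence of 0/1 values, one per sample, where 1 marks an inter-burst
          sample (hypothetical representation of the detector output).
    fs:   sampling rate of the mask in samples per second (assumed).
    """
    if len(mask) == 0:
        raise ValueError("mask must not be empty")
    # Lengths, in samples, of each run of consecutive inter-burst samples.
    runs = [sum(1 for _ in group) for value, group in groupby(mask) if value]
    percentage = 100.0 * sum(runs) / len(mask)
    max_duration_s = max(runs, default=0) / fs
    return percentage, max_duration_s


# Toy mask with two inter-bursts (3 samples and 2 samples) at 1 sample per second.
mask = [0, 1, 1, 1, 0, 0, 1, 1, 0, 0]
pct, max_dur = inter_burst_features(mask, fs=1.0)
```

In a grading pipeline of the kind described, these two numbers per recording would form the input vector to the classifier.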
Where Pith is reading between the lines
- The transfer success suggests that the statistical structure of inter-bursts is sufficiently stable across the term-preterm boundary to allow reuse of the same detector.
- If the same features continue to work on larger or more varied term cohorts, automated grading pipelines could drop the requirement for separate age-specific detectors.
- The approach opens a route to test whether the same two features also track recovery trajectories or predict later neurodevelopmental scores.
Load-bearing premise
The detector trained only on very preterm EEG marks the same inter-burst segments in term EEG that a human expert would mark.
What would settle it
Manual expert annotation of inter-bursts in a new set of term EEG recordings followed by direct comparison of detection overlap or boundary error against the preterm method's output.
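A comparison of this kind could be scored as sketched below. The sketch assumes both the expert annotation and the detector output are per-sample binary masks of equal length (1 = inter-burst); the function name `detection_agreement` and the choice of sensitivity, specificity, and Cohen's kappa as overlap measures are illustrative, not prescribed by the paper.

```python
def detection_agreement(expert, detector):
    """Sample-wise agreement between two binary masks (1 = inter-burst).

    Returns (sensitivity, specificity, Cohen's kappa), treating the expert
    mask as the reference. Both masks are hypothetical inputs.
    """
    if len(expert) != len(detector) or len(expert) == 0:
        raise ValueError("masks must be non-empty and of equal length")
    pairs = list(zip(expert, detector))
    tp = sum(1 for e, d in pairs if e and d)
    tn = sum(1 for e, d in pairs if not e and not d)
    fp = sum(1 for e, d in pairs if not e and d)
    fn = sum(1 for e, d in pairs if e and not d)
    n = len(pairs)
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    observed = (tp + tn) / n
    # Agreement expected by chance from the marginal label frequencies.
    expected = ((tp + fn) * (tp + fp) + (tn + fp) * (tn + fn)) / (n * n)
    kappa = (observed - expected) / (1 - expected) if expected != 1 else float("nan")
    return sensitivity, specificity, kappa


expert = [1, 1, 1, 0, 0, 0, 1, 0]
detector = [1, 1, 0, 0, 0, 1, 1, 0]
sens, spec, kappa = detection_agreement(expert, detector)
```

Boundary error, the other measure mentioned, would need event-level matching of onsets and offsets and is not covered by this sample-wise sketch.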
Original abstract
Electroencephalography (EEG) is an important clinical tool for grading injury caused by lack of oxygen or blood to the brain during birth. Characteristics of low-voltage waveforms, known as inter-bursts, are related to different grades of injury. This study assesses the suitability of an existing inter-burst detection method, developed from preterm infants born <30 weeks of gestational age, to detect inter-bursts in term infants. Different features from the temporal organisation of the inter-bursts are combined using a multi-layer perceptron (MLP) machine learning algorithm to classify four grades of injury in the EEG. We find that the best performing feature, percentage of inter-bursts, has an accuracy of 59.3%. Combining this with the maximum duration of inter-bursts in the MLP produces a testing accuracy of 77.8%, with similar performance to existing multi-feature methods. These results validate the use of the preterm detection method in term EEG and show how simple measures of the inter-burst interval can be used to classify different grades of injury.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper assesses the suitability of an inter-burst detection method originally developed for preterm infants (<30 weeks gestational age) when applied without modification to term infants for grading hypoxic-ischemic encephalopathy (HIE) severity in EEG. Temporal features derived from the detected inter-bursts (e.g., percentage of inter-bursts, maximum duration) are fed to a multi-layer perceptron classifier to distinguish four injury grades, with reported testing accuracies of 59.3% for the single best feature and 77.8% for the combined feature set.
Significance. If the detector transfer holds and the classification results are reliable, the work indicates that straightforward inter-burst interval statistics can achieve performance comparable to more complex multi-feature methods for HIE grading, potentially offering a simpler clinical tool. The empirical validation on new term recordings using standard machine learning is a positive aspect, though the absence of direct detector metrics limits the strength of the suitability claim.
major comments (2)
- [Abstract] The central claim that the preterm inter-burst detector is suitable for term EEG rests on classification accuracy alone; no quantitative detection performance metrics (sensitivity, specificity, or agreement with annotations) are supplied for the term cohort, so maturational differences in burst patterns could invalidate the derived features without being detected.
- [Abstract] Reported accuracies (59.3% single feature, 77.8% combined) are given without dataset size, number of recordings or subjects, cross-validation details, statistical testing, or error bars, preventing assessment of whether the results support the suitability conclusion or are consistent with chance-level performance.
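To show why the sample size matters for the second comment, the sketch below computes a Wilson score interval for a classification accuracy and compares its lower bound with the 25% chance level of a balanced four-class problem. The counts used (21 correct out of 27) are purely hypothetical: they are one pair of values that happens to give 77.8%, and they are not taken from the paper.

```python
import math


def wilson_interval(correct, total, z=1.96):
    """Wilson score interval for a proportion; z = 1.96 gives roughly 95% coverage."""
    if total <= 0 or not 0 <= correct <= total:
        raise ValueError("need 0 <= correct <= total and total > 0")
    p = correct / total
    denom = 1 + z * z / total
    centre = (p + z * z / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return centre - half, centre + half


# Hypothetical counts, for illustration only (not reported in the source).
correct, total = 21, 27
lower, upper = wilson_interval(correct, total)
chance = 1 / 4  # four grades, assuming balanced classes
above_chance = lower > chance
```

With a small test set the interval is wide, which is why the referee asks for the cohort size and error bars alongside the headline accuracy.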
Simulated Author's Rebuttal
We thank the referee for the detailed comments on our manuscript. We respond point-by-point to the major comments, indicating where revisions to the manuscript (including the abstract) will be made to address the concerns raised.
- Referee: [Abstract] The central claim that the preterm inter-burst detector is suitable for term EEG rests on classification accuracy alone; no quantitative detection performance metrics (sensitivity, specificity, or agreement with annotations) are supplied for the term cohort, so maturational differences in burst patterns could invalidate the derived features without being detected.
  Authors: We agree that the suitability claim is supported indirectly via the downstream classification performance (77.8% testing accuracy using inter-burst features) rather than direct detector metrics on term data. The study applied the preterm method without modification to assess transferability through feature utility for HIE grading. Direct metrics (sensitivity/specificity) would require new expert annotations of inter-bursts in the term recordings, which were outside the scope of this work. We will revise the abstract and add a limitations paragraph in the discussion to explicitly note this indirect validation approach and the potential impact of maturational differences.
  Revision: partial
- Referee: [Abstract] Reported accuracies (59.3% single feature, 77.8% combined) are given without dataset size, number of recordings or subjects, cross-validation details, statistical testing, or error bars, preventing assessment of whether the results support the suitability conclusion or are consistent with chance-level performance.
  Authors: The abstract is constrained by length and therefore omits these details, but the full manuscript describes the dataset (number of recordings and subjects), the MLP architecture, leave-one-subject-out cross-validation, and results in the methods and results sections. To address the concern, we will revise the abstract to include the cohort size and cross-validation method, and ensure error bars or confidence intervals are reported if not already present in the results.
  Revision: yes
- Direct quantitative detection performance metrics (sensitivity, specificity, agreement) for the inter-burst detector on the term cohort cannot be supplied, as the study did not generate new annotations of inter-bursts in term EEG.
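The leave-one-subject-out cross-validation mentioned in the rebuttal can be sketched as follows. The function name `leave_one_subject_out` and the toy subject labels are hypothetical; the sketch only shows how recordings are split so that no subject appears in both training and test folds, and it does not reproduce the authors' MLP or data.

```python
def leave_one_subject_out(subject_ids):
    """Yield (held_out_subject, train_indices, test_indices) for each subject.

    subject_ids: one subject label per recording (hypothetical input).
    All recordings from the held-out subject go to the test fold.
    """
    for subject in sorted(set(subject_ids)):
        test_idx = [i for i, s in enumerate(subject_ids) if s == subject]
        train_idx = [i for i, s in enumerate(subject_ids) if s != subject]
        yield subject, train_idx, test_idx


# Toy example: six recordings from three subjects.
subject_ids = ["a", "a", "b", "c", "c", "c"]
folds = list(leave_one_subject_out(subject_ids))
```

Splitting by subject rather than by recording avoids leaking subject-specific EEG characteristics from the training folds into the test fold.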
Circularity Check
Minor self-citation of prior detector; downstream ML classification accuracy is independently measured
The paper applies an existing inter-burst detector (developed on preterm EEG) to a new term-infant cohort, extracts simple interval features, and trains/tests an MLP to predict injury grades, achieving 77.8% test accuracy via standard supervised learning on held-out data. No equations, fitted parameters, or self-citations reduce this accuracy figure to a quantity defined by the inputs themselves. The reliance on the prior detector constitutes a self-citation but is not load-bearing for the reported result, as the classification performance remains an empirical, falsifiable outcome on independent recordings.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: The inter-burst detection algorithm developed on preterm infants (<30 weeks) identifies inter-bursts in term infants without adjustment.