Decoding Stimulus Reconstruction-Based Auditory Attention Robustly in Unbalanced EEG Datasets
Pith reviewed 2026-06-29 19:53 UTC · model grok-4.3
The pith
Stimulus reconstruction DNN decoders overestimate auditory attention accuracy on unbalanced EEG datasets, but LOPEO cross-validation prevents the inflation.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The paper claims that stimulus reconstruction-based DNN decoders tend to produce overestimated decoding performance on unbalanced datasets, and demonstrates through experiments on the KUL, DTU, and NJU cEEGrid collections that the proposed leave-one-paired-envelope-out (LOPEO) cross-validation protocol effectively prevents this inflation while supplying a principled evaluation framework for unbalanced published data.
What carries the argument
The leave-one-paired-envelope-out (LOPEO) cross-validation protocol, which constructs folds by leaving out one paired stimulus envelope at a time to enforce balanced condition representation during decoder assessment.
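As a rough illustration of how such folds could be built, the following Python sketch groups trials by their envelope pair and holds out one pair per fold. The function name `lopeo_folds`, the `(attended_id, unattended_id)` trial representation, and the choice to treat the pair as unordered are assumptions made for illustration; the paper's exact fold construction is not described on this page.

```python
from collections import defaultdict


def lopeo_folds(trials):
    """Yield (pair, train_idx, test_idx), holding out one envelope pair per fold.

    `trials` is a list of (attended_id, unattended_id) tuples (hypothetical
    representation). The pair key is unordered, so (A, B) and (B, A) are
    assigned to the same fold. This is an assumption, not the paper's spec.
    """
    groups = defaultdict(list)
    for idx, (attended, unattended) in enumerate(trials):
        groups[frozenset((attended, unattended))].append(idx)

    # Order folds by the first trial index of each pair for reproducibility.
    for pair, test_idx in sorted(groups.items(), key=lambda kv: kv[1][0]):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(trials)) if i not in held_out]
        yield pair, train_idx, test_idx


# Toy input: five trials built from two envelope pairs.
trials = [("s1", "s2"), ("s2", "s1"), ("s3", "s4"), ("s3", "s4"), ("s3", "s4")]
folds = list(lopeo_folds(trials))
for pair, train_idx, test_idx in folds:
    print(sorted(pair), "train:", train_idx, "test:", test_idx)
```

In this toy input the first two trials share one pair with the attended role swapped, so they are held out together and never split across training and test.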
If this is right
- Existing unbalanced EEG-AAD datasets can be evaluated without inflated accuracy metrics.
- Prior decoding performance numbers obtained on unbalanced data may need re-assessment under LOPEO.
- Experimental designs should still target class balance, yet LOPEO supplies a fallback for already-published collections.
- Decoder comparisons between balanced and unbalanced conditions become feasible without the bias.
Where Pith is reading between the lines
- Class imbalance may produce similar overestimation in other EEG decoding tasks that rely on stimulus reconstruction.
- The protocol could be adapted to additional biomedical signal classification problems where trial counts differ across classes.
- Future stimulus reconstruction studies might combine LOPEO with explicit class-weighting or resampling inside the training loop.
- The finding highlights a general need to verify that reported gains in EEG machine learning reflect signal content rather than dataset statistics.
Load-bearing premise
Constructing balanced and unbalanced versions from the same three public datasets isolates the effect of class balance without introducing other uncontrolled differences in trial statistics or signal quality.
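The simulated rebuttal further down describes the balanced versions as obtained by subsampling the majority class. A minimal Python sketch of that idea follows; the helper name `balance_by_subsampling`, the generic labels "A" and "B", and the fixed seed are illustrative assumptions, and the paper's actual class definition and sampling procedure may differ.

```python
import random
from collections import Counter, defaultdict


def balance_by_subsampling(labels, seed=0):
    """Return sorted trial indices with every class cut down to the minority count.

    Hypothetical helper: random subsampling with a fixed seed is an assumption.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    n_keep = min(len(indices) for indices in by_class.values())
    kept = []
    for indices in by_class.values():
        kept.extend(rng.sample(indices, n_keep))
    return sorted(kept)


# Toy unbalanced label list: six trials of class "A", two of class "B".
labels = ["A"] * 6 + ["B"] * 2
kept = balance_by_subsampling(labels)
counts = Counter(labels[i] for i in kept)
print(kept, dict(counts))
```

Note that subsampling changes which trials are retained, which is why the referee's request to compare trial statistics between the balanced and unbalanced versions matters.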
What would settle it
If standard cross-validation and LOPEO yielded the same accuracy on the unbalanced versions, just as they do on the balanced versions, the overestimation claim would be falsified.
Original abstract
In the past decade, numerous studies have applied deep neural networks (DNNs) to decode auditory attention (AAD) from Electroencephalogram (EEG) signals via stimulus reconstruction. However, the influence of dataset balance on the decoding performance of stimulus reconstruction-based AAD remains unexplored. In this study, three publicly available EEG-AAD datasets - KUL, DTU, and NJU cEEGrid - are used to construct both balanced and unbalanced experimental conditions. We hypothesize and demonstrate that stimulus reconstruction-based DNN decoders tend to produce overestimated decoding performance on unbalanced datasets. To address this issue, we propose a leave-one-paired-envelope-out (LOPEO) cross-validation protocol. Experimental results confirm that LOPEO effectively prevents inflated decoding accuracy on unbalanced datasets. While balanced datasets are generally preferred in experimental design, LOPEO provides a principled evaluation framework for unbalanced datasets that have already been published, filling an important gap in the field.
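For readers unfamiliar with stimulus reconstruction, the decision step is commonly described as correlating the envelope reconstructed from EEG with each candidate speech envelope and selecting the better match. The Python sketch below shows only that comparison, under the assumption of a two-speaker trial; the names `pearson` and `decode_attention` and the toy signals are hypothetical, and the DNN that produces the reconstruction is not modelled.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (non-constant inputs assumed)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def decode_attention(reconstructed, envelope_a, envelope_b):
    """Pick the candidate envelope that correlates more with the reconstruction."""
    r_a = pearson(reconstructed, envelope_a)
    r_b = pearson(reconstructed, envelope_b)
    choice = "a" if r_a > r_b else "b"
    return choice, r_a, r_b


# Toy signals: the reconstruction loosely follows envelope_a.
reconstructed = [0.1, 0.9, 0.2, 0.8, 0.1]
envelope_a = [0.0, 1.0, 0.0, 1.0, 0.0]
envelope_b = [1.0, 0.0, 1.0, 0.0, 1.0]
choice, r_a, r_b = decode_attention(reconstructed, envelope_a, envelope_b)
print(choice, round(r_a, 3), round(r_b, 3))
```

Decoding accuracy is then the fraction of test trials in which the selected envelope is the attended one, which is the quantity the paper argues can be inflated on unbalanced data.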
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims that stimulus reconstruction-based DNN decoders for auditory attention decoding (AAD) from EEG overestimate performance on unbalanced datasets. It constructs balanced and unbalanced conditions from three public datasets (KUL, DTU, NJU cEEGrid), hypothesizes this overestimation effect, proposes a leave-one-paired-envelope-out (LOPEO) cross-validation protocol to mitigate inflation, and states that experimental results confirm both the hypothesis and LOPEO's effectiveness. The work positions LOPEO as a framework for evaluating already-published unbalanced datasets.
Significance. If the central hypothesis and LOPEO results hold after proper controls, the finding would address an important and previously unexplored source of bias in AAD decoding studies, where unbalanced datasets are common. This could improve the reliability of reported accuracies and provide a practical evaluation tool for existing data, strengthening the field's methodological standards.
Major comments (2)
- [Abstract and experimental conditions paragraph] The claim that balanced and unbalanced versions were constructed from the same three datasets to isolate the effect of class balance requires explicit verification that trial counts, durations, SNR distributions, and artifact statistics remain matched after re-sampling or re-partitioning. Without this, any observed performance gap cannot be attributed solely to label imbalance, as other uncontrolled covariates could drive the difference.
- [Abstract] The hypothesis is stated and experimental confirmation is claimed, yet no quantitative results, error bars, statistical tests, decoder architectures, or performance metrics are supplied. This absence makes it impossible to assess the magnitude or reliability of the reported overestimation or LOPEO improvement.
Simulated Author's Rebuttal
We thank the referee for the detailed and constructive comments. We address each major comment below and agree to make revisions that strengthen the clarity and rigor of the manuscript.
Point-by-point responses
- Referee: [Abstract and experimental conditions paragraph] The claim that balanced and unbalanced versions were constructed from the same three datasets to isolate the effect of class balance requires explicit verification that trial counts, durations, SNR distributions, and artifact statistics remain matched after re-sampling or re-partitioning. Without this, any observed performance gap cannot be attributed solely to label imbalance, as other uncontrolled covariates could drive the difference.
  Authors: We agree that explicit verification is necessary to isolate the effect of class balance. The manuscript constructs the conditions by subsampling the majority class within each original recording to achieve balance while preserving the total trial count and durations; SNR distributions and artifact statistics are matched because the re-partitioning occurs within the same EEG segments. To make this verification fully transparent, we will add a table in the experimental conditions section (or supplementary material) reporting mean and standard deviation of trial counts, durations, SNR, and artifact rejection rates for balanced versus unbalanced versions of each dataset.
  Revision: yes
- Referee: [Abstract] The hypothesis is stated and experimental confirmation is claimed, yet no quantitative results, error bars, statistical tests, decoder architectures, or performance metrics are supplied. This absence makes it impossible to assess the magnitude or reliability of the reported overestimation or LOPEO improvement.
  Authors: We acknowledge that the current abstract is too high-level. The full manuscript reports decoder architectures (stimulus-reconstruction DNNs), performance metrics (e.g., attention decoding accuracy), error bars across cross-validation folds, and statistical comparisons. We will revise the abstract to include concise quantitative results, such as the magnitude of overestimation on unbalanced data and the improvement under LOPEO, along with the primary metrics and significance tests used.
  Revision: yes
Circularity Check
No circularity: purely experimental comparison on public datasets
Full rationale
The manuscript contains no derivations, equations, fitted parameters, or self-citation chains. The central claim rests on constructing balanced/unbalanced splits from three external public datasets (KUL, DTU, NJU cEEGrid) and measuring decoder accuracy under LOPEO cross-validation. Because the evaluation is data-driven and the protocol is defined independently of the observed performance numbers, no step reduces to its own inputs by construction. This is the expected non-finding for an empirical methods paper.