Decoding Stimulus Reconstruction-Based Auditory Attention Robustly in Unbalanced EEG Datasets

Jing Lu; Yayun Liang; Yuanming Zhang; Zhibin Lin

arxiv: 2605.25605 · v1 · pith:AZ52J3V6new · submitted 2026-05-25 · 📡 eess.AS · cs.LG

Yuanming Zhang, Yayun Liang, Zhibin Lin, Jing Lu

Pith reviewed 2026-06-29 19:53 UTC · model grok-4.3

keywords: auditory attention decoding, EEG, DNN decoder, unbalanced datasets, stimulus reconstruction, cross-validation, LOPEO

The pith

Stimulus reconstruction DNN decoders overestimate auditory attention accuracy on unbalanced EEG datasets, but LOPEO cross-validation prevents the inflation.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that deep neural networks decoding auditory attention from EEG via stimulus reconstruction produce inflated accuracy figures when the training data contains unequal numbers of trials from each attention condition. The authors test the hypothesis by constructing both balanced and unbalanced versions of three public datasets and measuring the performance gap under standard evaluation. They introduce the leave-one-paired-envelope-out protocol, which structures cross-validation folds around paired stimulus envelopes so that each fold maintains equal representation of the two conditions. This approach lets researchers evaluate already-collected unbalanced datasets without the artificial boost. The result matters because many existing EEG-AAD collections are unbalanced by design or collection constraints, so uncorrected numbers can mislead about true decoder capability.

Core claim

The paper claims that stimulus reconstruction-based DNN decoders tend to produce overestimated decoding performance on unbalanced datasets, and demonstrates through experiments on the KUL, DTU, and NJU cEEGrid collections that the proposed leave-one-paired-envelope-out (LOPEO) cross-validation protocol effectively prevents this inflation while supplying a principled evaluation framework for unbalanced published data.

What carries the argument

The leave-one-paired-envelope-out (LOPEO) cross-validation protocol, which constructs folds by leaving out one paired stimulus envelope at a time to enforce balanced condition representation during decoder assessment.
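
The fold construction described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the trial representation and the keys `attended` and `unattended` are hypothetical, and the sketch assumes each envelope appears in only one pairing, so that holding out a pair also keeps both of its envelopes out of the training set.

```python
from collections import defaultdict


def lopeo_folds(trials):
    """Yield (pair, train_idx, test_idx), holding out one envelope pair per fold.

    `trials` is a list of dicts with the hypothetical keys "attended" and
    "unattended", each naming one of the two competing stimulus envelopes.
    """
    groups = defaultdict(list)
    for idx, trial in enumerate(trials):
        # Use an unordered pair so that swapping the attended and unattended
        # roles of the same two envelopes lands in the same group.
        pair = frozenset((trial["attended"], trial["unattended"]))
        groups[pair].append(idx)

    for pair, test_idx in groups.items():
        held_out = set(test_idx)
        train_idx = [i for i in range(len(trials)) if i not in held_out]
        yield pair, train_idx, test_idx


# Toy usage: two envelope pairs, two trials each.
toy_trials = [
    {"attended": "s1", "unattended": "s2"},
    {"attended": "s2", "unattended": "s1"},
    {"attended": "s3", "unattended": "s4"},
    {"attended": "s3", "unattended": "s4"},
]
for pair, train_idx, test_idx in lopeo_folds(toy_trials):
    print(sorted(pair), "train:", train_idx, "test:", test_idx)
```

Grouping by unordered pair is a design choice of this sketch: a decoder trained on trials where `s1` was attended never sees `s1` or `s2` in any role when that pair is the test fold.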

If this is right

Existing unbalanced EEG-AAD datasets can be evaluated without inflated accuracy metrics.
Prior decoding performance numbers obtained on unbalanced data may need re-assessment under LOPEO.
Experimental designs should still target class balance, yet LOPEO supplies a fallback for already-published collections.
Decoder comparisons between balanced and unbalanced conditions become feasible without the bias.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Class imbalance may produce similar overestimation in other EEG decoding tasks that rely on stimulus reconstruction.
The protocol could be adapted to additional biomedical signal classification problems where trial counts differ across classes.
Future stimulus reconstruction studies might combine LOPEO with explicit class-weighting or resampling inside the training loop.
The finding highlights a general need to verify that reported gains in EEG machine learning reflect signal content rather than dataset statistics.

Load-bearing premise

Constructing balanced and unbalanced versions from the same three public datasets isolates the effect of class balance without introducing other uncontrolled differences in trial statistics or signal quality.

What would settle it

Observing that standard cross-validation on the unbalanced versions yields the same accuracy as LOPEO while balanced versions show no such gap would falsify the overestimation claim.

Original abstract

In the past decade, numerous studies have applied deep neural networks (DNNs) to decode auditory attention (AAD) from Electroencephalogram (EEG) signals via stimulus reconstruction. However, the influence of dataset balance on the decoding performance of stimulus reconstruction-based AAD remains unexplored. In this study, three publicly available EEG-AAD datasets - KUL, DTU, and NJU cEEGrid - are used to construct both balanced and unbalanced experimental conditions. We hypothesize and demonstrate that stimulus reconstruction-based DNN decoders tend to produce overestimated decoding performance on unbalanced datasets. To address this issue, we propose a leave-one-paired-envelope-out (LOPEO) cross-validation protocol. Experimental results confirm that LOPEO effectively prevents inflated decoding accuracy on unbalanced datasets. While balanced datasets are generally preferred in experimental design, LOPEO provides a principled evaluation framework for unbalanced datasets that have already been published, filling an important gap in the field.
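
For readers unfamiliar with the setup, stimulus reconstruction decoders are commonly scored by correlating the envelope reconstructed from EEG with each candidate speech envelope and choosing the speaker with the higher Pearson correlation. The sketch below illustrates that decision rule under the assumption that the paper follows this common convention; the function names and the toy numbers are hypothetical, and no DNN is involved.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if norm_x == 0.0 or norm_y == 0.0:
        return 0.0
    return cov / (norm_x * norm_y)


def decode_attention(reconstructed, env_a, env_b):
    """Return the label of the envelope that correlates best with the reconstruction."""
    r_a = pearson(reconstructed, env_a)
    r_b = pearson(reconstructed, env_b)
    label = "a" if r_a > r_b else "b"
    return label, r_a, r_b


# Toy usage with made-up envelope samples.
reconstructed = [0.1, 0.5, 0.9, 0.4]
speaker_a = [0.0, 0.6, 1.0, 0.3]
speaker_b = [1.0, 0.2, 0.1, 0.8]
print(decode_attention(reconstructed, speaker_a, speaker_b))
```

Under this rule a decoder that has memorised an envelope it saw as the attended one during training can score well without tracking attention, which is the kind of inflation the abstract warns about.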

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper flags that stimulus-reconstruction DNNs inflate AAD accuracy on unbalanced EEG sets and offers LOPEO as a targeted CV fix, but supplies no numbers or balance checks in the visible text.

The main takeaway is that stimulus reconstruction DNNs for auditory attention decoding from EEG give inflated performance when the attended and unattended classes are unbalanced, and the authors introduce a leave-one-paired-envelope-out cross-validation scheme to reduce that inflation.

The work takes three public datasets, builds balanced and unbalanced versions from them, runs the DNN decoders, and compares results. LOPEO is the concrete new piece: it structures the folds so that paired stimulus envelopes are held out together, which fits the reconstruction setup better than off-the-shelf CV. That is a practical adjustment for a subfield that often works with existing unbalanced recordings.

The hypothesis is reasonable on its face. When one class dominates, a model can pick up on the imbalance rather than the actual neural signature of attention. Calling this out and giving a protocol that works on already-published data is useful for people who cannot rerun experiments.
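
A small worked example makes the general point concrete. It shows only the accuracy a signal-blind predictor reaches by always guessing the majority label; it is not the mechanism the paper analyses, and the label names and trial counts are invented for illustration.

```python
from collections import Counter


def majority_baseline_accuracy(labels):
    """Accuracy of a predictor that always outputs the most frequent label."""
    counts = Counter(labels)
    return max(counts.values()) / len(labels)


# Hypothetical label sets: 30 vs 10 trials, then 20 vs 20 trials.
unbalanced = ["left"] * 30 + ["right"] * 10
balanced = ["left"] * 20 + ["right"] * 20

print(majority_baseline_accuracy(unbalanced))  # 0.75
print(majority_baseline_accuracy(balanced))    # 0.5
```

Any reported accuracy on the unbalanced set therefore has to be read against 75%, not 50%, before it says anything about the neural signal.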

The soft spots are straightforward. The abstract states the claim and says experiments confirm it, yet shows no accuracy figures, no error bars, no statistical tests, and no description of how the balanced and unbalanced partitions were actually made. Without those, it is impossible to judge the size of the effect or whether LOPEO closes it. The stress-test point also lands: nothing in the provided text verifies that trial counts, durations, SNR, or artifact rates stayed matched after re-partitioning. If those covariates moved, the performance gap could come from them rather than balance alone.

This is for labs running DNN-based AAD on EEG who need evaluation protocols that do not overstate results on real-world data. It deserves a serious referee because the practical issue is common and the proposed fix is cheap to try, even though the current evidence is thin.

Referee Report

2 major / 0 minor

Summary. The paper claims that stimulus reconstruction-based DNN decoders for auditory attention decoding (AAD) from EEG overestimate performance on unbalanced datasets. It constructs balanced and unbalanced conditions from three public datasets (KUL, DTU, NJU cEEGrid), hypothesizes this overestimation effect, proposes a leave-one-paired-envelope-out (LOPEO) cross-validation protocol to mitigate inflation, and states that experimental results confirm both the hypothesis and LOPEO's effectiveness. The work positions LOPEO as a framework for evaluating already-published unbalanced datasets.

Significance. If the central hypothesis and LOPEO results hold after proper controls, the finding would address an important and previously unexplored source of bias in AAD decoding studies, where unbalanced datasets are common. This could improve the reliability of reported accuracies and provide a practical evaluation tool for existing data, strengthening the field's methodological standards.

major comments (2)

[Abstract and experimental conditions paragraph] The claim that balanced and unbalanced versions were constructed from the same three datasets to isolate the effect of class balance requires explicit verification that trial counts, durations, SNR distributions, and artifact statistics remain matched after re-sampling or re-partitioning. Without this, any observed performance gap cannot be attributed solely to label imbalance, as other uncontrolled covariates could drive the difference.
[Abstract] The hypothesis is stated and experimental confirmation is claimed, yet no quantitative results, error bars, statistical tests, decoder architectures, or performance metrics are supplied. This absence makes it impossible to assess the magnitude or reliability of the reported overestimation or LOPEO improvement.
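
The covariate check requested in the first comment could be as simple as a per-condition summary like the one sketched below. The field names `duration_s` and `snr_db` and all values are hypothetical, since the visible text does not say how the paper records these quantities.

```python
from statistics import mean, stdev


def summarize(trials, fields=("duration_s", "snr_db")):
    """Return the trial count plus (mean, standard deviation) for each field."""
    summary = {"n_trials": len(trials)}
    for field in fields:
        values = [trial[field] for trial in trials]
        spread = stdev(values) if len(values) > 1 else 0.0
        summary[field] = (mean(values), spread)
    return summary


# Made-up trials for the two versions of one dataset.
balanced_version = [
    {"duration_s": 60.0, "snr_db": 5.0},
    {"duration_s": 60.0, "snr_db": 7.0},
]
unbalanced_version = [
    {"duration_s": 60.0, "snr_db": 5.0},
    {"duration_s": 60.0, "snr_db": 7.0},
    {"duration_s": 45.0, "snr_db": 3.0},
]

for name, trials in (("balanced", balanced_version), ("unbalanced", unbalanced_version)):
    print(name, summarize(trials))
```

Reporting these rows side by side for each dataset would let a reader see at a glance whether anything other than the label ratio changed between conditions.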

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive comments. We address each major comment below and agree to make revisions that strengthen the clarity and rigor of the manuscript.

Referee: [Abstract and experimental conditions paragraph] The claim that balanced and unbalanced versions were constructed from the same three datasets to isolate the effect of class balance requires explicit verification that trial counts, durations, SNR distributions, and artifact statistics remain matched after re-sampling or re-partitioning. Without this, any observed performance gap cannot be attributed solely to label imbalance, as other uncontrolled covariates could drive the difference.

Authors: We agree that explicit verification is necessary to isolate the effect of class balance. The manuscript constructs the conditions by subsampling the majority class within each original recording to achieve balance while preserving the total trial count and durations; SNR distributions and artifact statistics are matched because the re-partitioning occurs within the same EEG segments. To make this verification fully transparent, we will add a table in the experimental conditions section (or supplementary material) reporting mean and standard deviation of trial counts, durations, SNR, and artifact rejection rates for balanced versus unbalanced versions of each dataset. revision: yes
Referee: [Abstract] The hypothesis is stated and experimental confirmation is claimed, yet no quantitative results, error bars, statistical tests, decoder architectures, or performance metrics are supplied. This absence makes it impossible to assess the magnitude or reliability of the reported overestimation or LOPEO improvement.

Authors: We acknowledge that the current abstract is too high-level. The full manuscript reports decoder architectures (stimulus-reconstruction DNNs), performance metrics (e.g., attention decoding accuracy), error bars across cross-validation folds, and statistical comparisons. We will revise the abstract to include concise quantitative results, such as the magnitude of overestimation on unbalanced data and the improvement under LOPEO, along with the primary metrics and significance tests used. revision: yes
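
The majority-class subsampling mentioned in the first response might look something like the sketch below. It is an assumption about the general technique, not the manuscript's procedure; the key `attended_side`, the seed, and the toy trial counts are all hypothetical.

```python
import random
from collections import defaultdict


def balance_by_subsampling(trials, label_key="attended_side", seed=0):
    """Randomly drop trials from larger classes until every class has equal size."""
    rng = random.Random(seed)  # fixed seed so the subsample is reproducible
    by_label = defaultdict(list)
    for trial in trials:
        by_label[trial[label_key]].append(trial)

    n_min = min(len(group) for group in by_label.values())
    balanced = []
    for group in by_label.values():
        balanced.extend(rng.sample(group, n_min))
    return balanced


# Toy usage: 6 "left" trials and 2 "right" trials become 2 and 2.
toy = [{"attended_side": "left", "id": i} for i in range(6)]
toy += [{"attended_side": "right", "id": i} for i in range(6, 8)]
balanced = balance_by_subsampling(toy)
print(len(balanced), [t["attended_side"] for t in balanced])
```

Subsampling of this kind lowers the total trial count to the number of classes times the minority count, so a comparison at matched trial counts would also need the unbalanced version trimmed to the same size.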

Circularity Check

0 steps flagged

No circularity: purely experimental comparison on public datasets

The manuscript contains no derivations, equations, fitted parameters, or self-citation chains. The central claim rests on constructing balanced/unbalanced splits from three external public datasets (KUL, DTU, NJU cEEGrid) and measuring decoder accuracy under LOPEO cross-validation. Because the evaluation is data-driven and the protocol is defined independently of the observed performance numbers, no step reduces to its own inputs by construction. This is the expected non-finding for an empirical methods paper.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract supplies no information on fitted parameters, background axioms, or new postulated entities; ledger is therefore empty.
