Class-Dependent Hybrid Data Augmentation for Multiclass Migraine Classification under Severe Class Imbalance

Elvin Somón; Miguel A. Gutiérrez-Naranjo

arxiv: 2605.23453 · v1 · pith:4WOM5EPWnew · submitted 2026-05-22 · 💻 cs.LG

Pith reviewed 2026-05-25 04:37 UTC · model grok-4.3

classification 💻 cs.LG

keywords: data augmentation, class imbalance, migraine classification, hybrid methods, multiclass, medical machine learning, imbalanced learning

The pith

Class-dependent hybrid augmentation with proportional growth improves average macro-F1 to 0.862 across classifiers for seven migraine subtypes after leakage corrections.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper re-evaluates prior migraine classification studies by correcting for data leakage and metric bias, which brings the baseline macro-F1 down to 0.71. It proposes a clinically motivated aggregation of hemiplegic subtypes and a class-dependent hybrid augmentation strategy that selects generation methods according to per-class sample sizes, along with the idea of fidelity asymmetry that favors proportionally constrained growth over full class balancing. On a dataset of 400 patients, the framework raises the average macro-F1 across eight classifiers to 0.862, beating individual augmenters and the no-augmentation baseline of 0.801, while the peak of 0.914 occurs with the FT-Transformer under proportional augmentation. A reader would care because the work shows that tailoring augmentation to class size and fixing problem formulation can yield more reliable performance in severely imbalanced medical multiclass tasks.

Core claim

After correcting methodological flaws in previous work, the class-dependent hybrid data augmentation framework, which assigns different synthetic data generation methods based on per-class sample size and employs proportionally constrained growth motivated by fidelity asymmetry, consistently outperforms both no-augmentation and single-augmenter baselines in macro-F1 averaged across eight classifiers, achieving 0.862 on average and a maximum of 0.914 with FT-Transformer, while demonstrating that clinically motivated subtype aggregation accounts for most of the absolute gains at the per-classifier level.

What carries the argument

The class-dependent hybrid augmentation strategy that assigns generation methods based on per-class sample size, together with the fidelity asymmetry concept that motivates proportionally constrained growth as an alternative to full class balance.
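As a rough illustration of the two ideas above, the following Python sketch picks a generator per class from its sample count and compares proportionally constrained targets with full-balance targets. The thresholds, the generator labels, the `growth` factor, and the function names are all assumptions made for illustration; this summary does not state the paper's actual assignment rule or growth limits.

```python
from collections import Counter

# Hypothetical rule table: (exclusive upper bound on class size, generator label).
# Neither the thresholds nor the pairing of sizes to generators comes from the paper.
SIZE_RULES = [
    (20, "simple_interpolation"),
    (60, "gaussian_copula"),
]
DEFAULT_GENERATOR = "ctgan"


def assign_generators(labels, rules=SIZE_RULES, default=DEFAULT_GENERATOR):
    """Map each class to a generator label chosen from its sample count."""
    counts = Counter(labels)
    return {
        cls: next((name for limit, name in rules if n < limit), default)
        for cls, n in counts.items()
    }


def synthetic_budget(labels, growth=None):
    """Return how many synthetic rows to create per class.

    growth=None  -> full balance: every class is raised to the majority count.
    growth=g > 1 -> proportional growth: each class grows by at most a factor g
                    and never beyond the majority count.
    """
    counts = Counter(labels)
    majority = max(counts.values())
    budget = {}
    for cls, n in counts.items():
        target = majority if growth is None else min(majority, round(n * growth))
        budget[cls] = target - n
    return budget


# Toy label vector with a 10:3:1 imbalance (not the paper's class distribution).
toy_labels = ["A"] * 100 + ["B"] * 30 + ["C"] * 10

plan = assign_generators(toy_labels)
proportional = synthetic_budget(toy_labels, growth=2.0)
balanced = synthetic_budget(toy_labels)
```

In this toy setting, full balance asks for 90 synthetic rows from only 10 real examples of class C, whereas proportional growth asks for 10. That gap is one plausible reading of why proportionally constrained growth might be preferred for the smallest classes.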

If this is right

The proposed framework provides higher average robustness across multiple classifiers than any individual augmentation method.
Clinically motivated aggregation of two hemiplegic subtypes following ICHD-3 accounts for most of the absolute performance improvement when using the best single classifier.
Proportional augmentation under fidelity asymmetry yields better results than aiming for full class balance in this imbalanced setting.
Correcting for data leakage and metric bias substantially lowers the performance estimates reported in earlier migraine classification studies.
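The summary does not spell out which metric bias was corrected. One common form in imbalanced problems is reporting accuracy, which rewards ignoring minority classes. The sketch below, with hypothetical helper names and toy labels, shows how macro-averaged F1 exposes a majority-only predictor that accuracy flatters.

```python
def per_class_f1(y_true, y_pred, cls):
    """F1 for one class, computed as 2*TP / (2*TP + FP + FN)."""
    tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
    fp = sum(t != cls and p == cls for t, p in zip(y_true, y_pred))
    fn = sum(t == cls and p != cls for t, p in zip(y_true, y_pred))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over the classes present in y_true."""
    classes = sorted(set(y_true))
    return sum(per_class_f1(y_true, y_pred, c) for c in classes) / len(classes)


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy data: 90 majority rows, 10 minority rows, and a predictor that
# always outputs the majority class.
y_true = [0] * 90 + [1] * 10
y_pred = [0] * 100

acc = accuracy(y_true, y_pred)
mf1 = macro_f1(y_true, y_pred)
```

Here accuracy is 0.90 while macro-F1 is roughly 0.47, because the minority class contributes an F1 of zero and every class counts equally in the average.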

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Similar class-dependent assignment of augmentation methods could improve robustness in other medical domains with severe class imbalance, such as rare disease diagnosis.
The focus on average performance across classifiers suggests the method may help avoid model-specific overfitting in clinical machine learning applications.
Testing the framework on datasets with varying numbers of classes or different imbalance ratios would reveal how general the per-class assignment rule is.

Load-bearing premise

That the 400-patient dataset, after aggregation of the hemiplegic subtypes, is representative of the seven migraine subtypes, and that the applied corrections have fully removed data leakage and metric bias with no remaining confounding.

What would settle it

Re-running the exact same framework and evaluation protocol on a new, larger independent collection of migraine patient records would determine if the reported macro-F1 improvements hold or if they were specific to the original dataset's characteristics.

Figures

Figures reproduced from arXiv: 2605.23453 by Elvin Somón, Miguel A. Gutiérrez-Naranjo.

Original abstract

We conducted a reproducibility-oriented re-evaluation of prior migraine classification studies, correcting for data leakage and metric bias. We then introduced (i) a clinically motivated aggregation of two hemiplegic subtypes following ICHD-3 {\S}1.2.3, (ii) a class-dependent hybrid augmentation strategy that assigns generation methods based on per-class sample size, and (iii) the concept of fidelity asymmetry, motivating proportionally constrained growth as an alternative to full class balance. Experiments were performed on a dataset of 400 patients across seven migraine subtypes under a two-stage protocol, including the six-class configuration described above. Models were evaluated using stratified 5-fold cross-validation with macro-averaged F1 as the primary metric. Correcting methodological flaws reduces previously inflated performance estimates, with the corrected macro-F1 baseline standing at 0.71. The proposed framework consistently outperformed individual augmenters in macro-F1 averaged across the eight evaluated classifiers (0.862 vs. 0.836 for Gaussian Copula, 0.815 for CTGAN, and 0.801 for the no-augmentation baseline), and achieved its peak result of 0.914 with FT-Transformer under proportional augmentation. The no-augmentation FT-Transformer baseline (0.896) shows that, at the per-classifier ceiling, clinically motivated class aggregation accounts for most of the absolute improvement; the framework's principal measurable contribution is the gain in average robustness across classifiers, highlighting the dominant role of problem formulation.
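Stratified cross-validation combined with augmentation only avoids leakage if synthetic rows are derived from training folds alone. The summary does not describe how the authors implemented this, so the following is a minimal standard-library sketch under stated assumptions: `stratified_folds` and `oversample_training_rows` are hypothetical helpers, and plain resampling of real training rows stands in for a generative augmenter.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k=5, seed=0):
    """Split row indices into k test folds, spreading each class across folds."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for position, idx in enumerate(indices):
            folds[position % k].append(idx)
    return folds


def oversample_training_rows(train_idx, labels, factor=2.0, seed=0):
    """Stand-in augmenter that resamples real TRAINING rows of smaller classes.

    Returns (source_index, is_synthetic) pairs. A real generator would be
    fitted on the same training rows and nothing else.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx in train_idx:
        by_class[labels[idx]].append(idx)
    majority = max(len(v) for v in by_class.values())
    rows = [(idx, False) for idx in train_idx]
    for indices in by_class.values():
        target = min(majority, round(len(indices) * factor))
        extra = target - len(indices)
        rows.extend((rng.choice(indices), True) for _ in range(extra))
    return rows


# Toy labels (not the paper's data): 50 / 15 / 5 rows in three classes.
labels = ["A"] * 50 + ["B"] * 15 + ["C"] * 5
folds = stratified_folds(labels, k=5)

for test_idx in folds:
    held_out = set(test_idx)
    train_idx = [i for i in range(len(labels)) if i not in held_out]
    train_rows = oversample_training_rows(train_idx, labels)
    # No real or synthetic training row may originate from a held-out index.
    assert all(source not in held_out for source, _ in train_rows)
```

The key design point is the order of operations: split first, then augment inside each training partition. Augmenting before splitting would let copies or near-copies of test rows enter training.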

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The main takeaway is that clinical subtype aggregation explains most of the lift here, while the class-dependent hybrid augmentation mainly improves average robustness across models.

The main thing to know is that the biggest lift comes from aggregating the hemiplegic subtypes per ICHD-3, not from the fancy augmentation. The hybrid class-dependent approach does add some average robustness across the eight classifiers, but the per-model ceiling is already high with just the aggregation.

What stands out as new is the idea of assigning different generators (Gaussian Copula, CTGAN) based on per-class sample sizes, plus the fidelity asymmetry framing that favors proportional growth over full balancing. They also did a reproducibility pass on earlier migraine papers, fixing leakage and bias issues, which drops the baseline to 0.71 macro-F1.

The paper does a solid job laying out the two-stage protocol and reporting results on 400 patients with 5-fold stratified CV. Showing that no-augmentation FT-Transformer hits 0.896 while the hybrid gets to 0.914 is useful context, and the average across models (0.862 vs 0.801) supports their claim about robustness.

The soft spot is the dataset itself. Even after corrections, a single 400-patient cohort from one source leaves open questions about representativeness of the seven subtypes and whether any residual selection effects remain. The abstract claims the corrections eliminate leakage, but without the full methods or code it's tough to judge how thorough that was. The gains look real but modest, and external validation would strengthen it.

This is for researchers handling severe imbalance in multiclass medical problems, especially those who value practical templates over theoretical novelty. It is not going to change the field, but it is a careful piece of applied work.

I would send it to peer review. The evidence is grounded enough in the reported experiments to merit referee input, even if revisions are needed on the dataset limitations.

Referee Report

2 major / 2 minor

Summary. The paper conducts a reproducibility-oriented re-evaluation of prior migraine classification studies, correcting for data leakage and metric bias. It introduces (i) clinically motivated aggregation of two hemiplegic subtypes per ICHD-3 §1.2.3, (ii) a class-dependent hybrid augmentation strategy that assigns generation methods based on per-class sample size, and (iii) the concept of fidelity asymmetry motivating proportionally constrained growth. Experiments use a 400-patient dataset across seven migraine subtypes under a two-stage protocol with stratified 5-fold cross-validation and macro-F1 as primary metric. The corrected baseline is 0.71; the framework reports average macro-F1 of 0.862 across eight classifiers (vs. 0.836 Gaussian Copula, 0.815 CTGAN, 0.801 no-augmentation), with peak 0.914 for FT-Transformer under proportional augmentation (no-augmentation FT-Transformer baseline 0.896).

Significance. If the leakage corrections and post-aggregation dataset are free of residual confounding, the work shows that clinically motivated class aggregation accounts for most absolute gains while the hybrid strategy improves average robustness across classifiers. This highlights the value of problem formulation over augmentation alone in severe imbalance settings and supplies concrete, reproducible numbers from a two-stage protocol on a medical dataset.

major comments (2)

[Abstract and Methods] Abstract and Methods (two-stage protocol and leakage corrections): the central claim that the hybrid framework delivers measurable robustness gains (0.862 vs. 0.801 baseline) beyond aggregation rests on the corrected 400-patient dataset being free of residual patient-selection or feature-definition confounding. No patient-level split audit, external cohort, or explicit validation of the corrections is described, which is load-bearing for attributing the delta to the augmentation strategy.
[Results] Results (classifier-averaged macro-F1 and per-classifier baselines): the no-augmentation FT-Transformer result of 0.896 vs. 0.914 peak shows aggregation drives most improvement, yet the average robustness claim (0.862) lacks reported per-fold variance, statistical significance tests, or an ablation isolating each hybrid component from the aggregation step.

minor comments (2)

[Abstract] The LaTeX fragment {\S}1.2.3 in the abstract should be rendered as §1.2.3 for readability.
[Abstract] The term 'fidelity asymmetry' is introduced without a concise formal definition or equation in the provided abstract text.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments on attribution of gains and validation of the corrected dataset. We address each major comment below, with revisions where feasible to improve transparency and rigor while remaining faithful to the conducted experiments.

Referee: [Abstract and Methods] Abstract and Methods (two-stage protocol and leakage corrections): the central claim that the hybrid framework delivers measurable robustness gains (0.862 vs. 0.801 baseline) beyond aggregation rests on the corrected 400-patient dataset being free of residual patient-selection or feature-definition confounding. No patient-level split audit, external cohort, or explicit validation of the corrections is described, which is load-bearing for attributing the delta to the augmentation strategy.

Authors: The manuscript describes the two-stage protocol using stratified 5-fold cross-validation on the 400-patient dataset and specifies the leakage corrections applied to prior studies (data leakage and metric bias). These steps mitigate patient-selection and feature-definition issues within the available data. We agree that an external cohort would provide stronger evidence against residual confounding; no such cohort is available. We will revise the Methods section to expand the explicit description of the patient-level split procedure and the precise correction steps performed, improving transparency without overstating the evidence.

revision: partial

Referee: [Results] Results (classifier-averaged macro-F1 and per-classifier baselines): the no-augmentation FT-Transformer result of 0.896 vs. 0.914 peak shows aggregation drives most improvement, yet the average robustness claim (0.862) lacks reported per-fold variance, statistical significance tests, or an ablation isolating each hybrid component from the aggregation step.

Authors: The manuscript already states that aggregation accounts for most absolute gains (explicitly citing the 0.896 no-augmentation FT-Transformer baseline versus the 0.914 peak) while the hybrid framework's main contribution is improved average robustness across classifiers. To strengthen the robustness claim, we will add per-fold variance, paired statistical significance tests, and an ablation that isolates the hybrid augmentation components from the aggregation step in the revised Results section.

revision: yes
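To make the requested per-fold variance and paired test concrete, here is a small standard-library sketch of an exact sign-flip (paired permutation) test on per-fold score differences. The function names are hypothetical, the choice of test is an assumption rather than the one the authors commit to, and the per-fold scores are placeholders invented for illustration, not values from the paper.

```python
from itertools import product
from statistics import mean, stdev


def fold_summary(scores):
    """Mean and sample standard deviation of per-fold scores."""
    return mean(scores), stdev(scores)


def paired_sign_flip_test(scores_a, scores_b):
    """Exact two-sided paired permutation test on per-fold differences.

    Under the null hypothesis the sign of each fold's difference is arbitrary,
    so every sign pattern is enumerated and the share of patterns whose mean
    difference is at least as extreme as the observed one is returned.
    """
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    observed = abs(mean(diffs))
    patterns = list(product((1, -1), repeat=len(diffs)))
    extreme = sum(
        abs(mean([s * d for s, d in zip(signs, diffs)])) >= observed - 1e-12
        for signs in patterns
    )
    return extreme / len(patterns)


# Placeholder per-fold macro-F1 values for two pipelines evaluated on the
# same five folds. These numbers are illustrative only.
hybrid_scores = [0.80, 0.82, 0.79, 0.81, 0.83]
baseline_scores = [0.78, 0.80, 0.78, 0.79, 0.80]

mean_hybrid, sd_hybrid = fold_summary(hybrid_scores)
p_value = paired_sign_flip_test(hybrid_scores, baseline_scores)
```

With five folds there are only 32 sign patterns, so the smallest attainable two-sided p-value is 2/32 = 0.0625. A single 5-fold run therefore cannot reach the conventional 0.05 threshold with this test, which is worth keeping in mind when reading any significance claim built on five folds.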

Circularity Check

0 steps flagged

No significant circularity; empirical CV results independent of augmentation inputs

The paper reports an empirical ML study: prior-work corrections, ICHD-3-based subtype aggregation, class-dependent hybrid augmentation, and stratified 5-fold CV evaluation on a 400-patient dataset. Macro-F1 values (0.862 average, 0.914 peak) are computed on held-out folds and do not reduce to any fitted parameter or self-defined quantity by construction. No equations, uniqueness theorems, or self-citations appear as load-bearing premises for the central performance claims. The derivation chain consists of standard data-preprocessing and augmentation steps followed by independent cross-validation; the reported deltas are falsifiable against external cohorts and do not collapse to the augmentation strategy itself.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

Based on abstract only; no explicit free parameters, axioms, or invented entities are quantified beyond the clinical aggregation rule and the new fidelity-asymmetry framing.

axioms (1)

domain assumption: ICHD-3 §1.2.3 provides a clinically valid basis for aggregating the two hemiplegic subtypes.
Invoked to reduce the seven-class problem to six classes.
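The aggregation step amounts to a label remapping applied before any splitting or augmentation. A minimal sketch follows, assuming hypothetical label strings: the two hemiplegic names follow the ICHD-3 §1.2.3 subdivisions, while the remaining class names are placeholders because the dataset's actual labels are not given in this summary.

```python
# Hypothetical label strings for the two subtypes being merged.
HEMIPLEGIC_SUBTYPES = {
    "Familial hemiplegic migraine",
    "Sporadic hemiplegic migraine",
}
MERGED_LABEL = "Hemiplegic migraine"


def aggregate_hemiplegic(labels):
    """Collapse both hemiplegic subtypes into one class; leave others as-is."""
    return [
        MERGED_LABEL if label in HEMIPLEGIC_SUBTYPES else label
        for label in labels
    ]


# Placeholder seven-class label vector (one row per class, for illustration).
seven_class = [
    "Familial hemiplegic migraine",
    "Sporadic hemiplegic migraine",
    "other_subtype_1",
    "other_subtype_2",
    "other_subtype_3",
    "other_subtype_4",
    "other_subtype_5",
]
six_class = aggregate_hemiplegic(seven_class)
```

Applying the mapping to the full label column before fold assignment keeps the stratification consistent with the six-class target.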

invented entities (1)

fidelity asymmetry (no independent evidence)
purpose: Motivates proportionally constrained growth instead of full class balance.
Introduced to justify the proportional augmentation regime.


