Data Balancing Strategies: A Systematic Survey of Resampling and Augmentation Methods

Abolfazl Nikahd; Alireza Orouji; Behnam Yousefimehr; Javad Fazli; Mahdi Razi Gandomani; Mehdi Ghatee; Mohammad Amin Seifi; Negin Sadat Mousavi; Ramtin Mahmoudi Kashani; Sajed Tavakoli

arXiv: 2505.13518 · v2 · submitted 2025-05-17 · stat.ML · cs.AI · cs.LG

Behnam Yousefimehr, Mehdi Ghatee, Javad Fazli, Shervin Ghaffari, Zahra Rafei, Mohammad Amin Seifi, Sajed Tavakoli, Abolfazl Nikahd, Mahdi Razi Gandomani, Alireza Orouji, Ramtin Mahmoudi Kashani, Sarina Heshmati, Negin Sadat Mousavi

Pith reviewed 2026-05-22 14:33 UTC · model grok-4.3

classification: stat.ML · cs.AI · cs.LG

keywords: imbalanced datasets, data balancing, SMOTE, oversampling, undersampling, generative models, machine learning, systematic survey

The pith

No single data balancing method works best for every imbalanced dataset.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This survey reviews techniques for fixing class imbalance, where one category dominates the data and skews model predictions. It organizes methods from basic oversampling such as SMOTE and its variants through generative models like GANs, VAEs, and diffusion models to undersampling, hybrids, and ensembles. The review weighs each approach's assumptions, how it operates, and its fit for high-dimensional data, mixed features, overlap, or noise. Its central result is that selection must match the dataset traits, classifier, and metrics rather than relying on any default technique.
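
The interpolation idea behind SMOTE-style oversampling can be sketched in a few lines. This is a minimal illustration under stated assumptions (numeric features, Euclidean distance, uniform choice among the k nearest minority neighbours); the function name `smote_like_oversample` and its parameters are hypothetical and are not taken from the survey or from any library.

```python
import math
import random


def smote_like_oversample(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between minority neighbours.

    Illustrative sketch only: assumes numeric features and Euclidean distance.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority points by distance to the base point.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        # Pick a random point on the segment between base and neighbour.
        gap = rng.random()
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic


# Hypothetical toy minority class with two numeric features.
minority_points = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
new_points = smote_like_oversample(minority_points, n_new=5)
```

Because every synthetic point lies on a segment between two existing minority samples, this kind of oversampling stays inside the region the minority class already occupies, which is the limitation that generative approaches aim to go beyond.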

Core claim

The paper establishes that data balancing strategies vary widely in effectiveness, with no method universally superior. Success instead depends on matching the technique to dataset characteristics such as dimensionality, feature types, class overlap, and noise, together with the classifier and chosen evaluation metrics.
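
A small worked example shows why the choice of evaluation metric matters on imbalanced data. The sketch below uses a made-up label vector (95 majority, 5 minority) and a trivial classifier that always predicts the majority class; the helper names are hypothetical and the numbers are not results from the survey.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for a binary problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    return tp, fp, fn, tn


def accuracy(y_true, y_pred):
    """Fraction of all predictions that are correct."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return (tp + tn) / len(y_true)


def balanced_accuracy(y_true, y_pred):
    """Mean of the recall on the positive class and the recall on the negative class."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    recall_pos = tp / (tp + fn) if (tp + fn) else 0.0
    recall_neg = tn / (tn + fp) if (tn + fp) else 0.0
    return (recall_pos + recall_neg) / 2


# Hypothetical data: 95 majority-class (0) and 5 minority-class (1) labels.
y_true = [0] * 95 + [1] * 5
# A degenerate classifier that always predicts the majority class.
y_pred = [0] * 100

print(accuracy(y_true, y_pred))           # 0.95
print(balanced_accuracy(y_true, y_pred))  # 0.5
```

Plain accuracy rewards the degenerate classifier, while balanced accuracy exposes that it never detects the minority class. A balancing method judged by one metric can therefore look very different under another.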

What carries the argument

Systematic categorization and critical analysis of resampling, augmentation, and generative balancing methods, assessing their assumptions and suitability for varied data conditions.

If this is right

Hybrid combinations like SMOTE with Tomek Links or ENN can reduce both imbalance and noise at once.
Generative models including diffusion approaches enable creation of realistic minority-class samples beyond traditional interpolation.
Ensemble strategies such as SMOTEBoost and Balanced Random Forest add robustness when paired with balancing steps.
Specialized variants are required for multi-label or clustered imbalance settings.
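
The cleaning half of a hybrid such as SMOTE with Tomek Links can be illustrated as follows. A Tomek link is a pair of samples from different classes that are each other's nearest neighbour. The sketch below is illustrative only: it assumes numeric features and Euclidean distance, the function names are hypothetical, and removing only the majority-class member is one possible policy rather than the survey's prescription.

```python
import math


def tomek_links(points, labels):
    """Return index pairs (i, j), i < j, that are mutual nearest neighbours
    with different class labels."""
    def nearest(i):
        candidates = (j for j in range(len(points)) if j != i)
        return min(candidates, key=lambda j: math.dist(points[i], points[j]))

    nn = [nearest(i) for i in range(len(points))]
    return [
        (i, j)
        for i, j in enumerate(nn)
        if i < j and nn[j] == i and labels[i] != labels[j]
    ]


def drop_majority_in_links(points, labels, majority_label):
    """Remove the majority-class member of every Tomek link (one possible policy)."""
    to_drop = {
        idx
        for pair in tomek_links(points, labels)
        for idx in pair
        if labels[idx] == majority_label
    }
    kept = [i for i in range(len(points)) if i not in to_drop]
    return [points[i] for i in kept], [labels[i] for i in kept]


# Hypothetical toy data: label 0 is the majority class, label 1 the minority.
points = [(0.0, 0.0), (0.1, 0.0), (1.0, 1.0), (1.1, 1.0), (3.0, 3.0)]
labels = [0, 1, 0, 0, 1]

links = tomek_links(points, labels)
clean_points, clean_labels = drop_majority_in_links(points, labels, majority_label=0)
```

In this toy data the first two points form the only Tomek link, so the cleaning step removes the majority sample that sits right next to a minority sample. In a hybrid pipeline such a step would typically run after oversampling, to thin out the overlap that synthetic points can create near the class boundary.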

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

Future tools could automate balancing-method selection by profiling dataset properties first.
Benchmarks for imbalanced learning should test methods across diverse data regimes instead of fixed suites.
Foundation models may need balancing adaptations that preserve their pre-trained distributions.

Load-bearing premise

The review assumes its selection and grouping of methods accurately captures the current literature without major omissions or selection bias.

What would settle it

A controlled study in which one specific method, such as a diffusion-based oversampler, outperforms all others across a broad collection of datasets, classifiers, and metrics would challenge the central finding.

Figures

Figures reproduced from arXiv: 2505.13518 by Abolfazl Nikahd, Alireza Orouji, Behnam Yousefimehr, Javad Fazli, Mahdi Razi Gandomani, Mehdi Ghatee, Mohammad Amin Seifi, Negin Sadat Mousavi, Ramtin Mahmoudi Kashani, Sajed Tavakoli, Sarina Heshmati, Shervin Ghaffari, Zahra Rafei.

**Figure 2.** Taxonomy of data resampling and augmentation techniques for imbalanced learning.

**Figure 3.** Decision flow for selecting ensemble strategies based on data characteristics and computational budget.

**Figure 4.** Overview of future research directions in resampling and augmentation for imbalanced learning.

Original abstract

Imbalanced datasets, where one class significantly outnumbers others, remain a persistent challenge in machine learning, often biasing predictions toward the majority class and degrading classifier performance. This paper provides a comprehensive, systematic review of data balancing methods, extending beyond foundational oversampling techniques such as the Synthetic Minority Oversampling Technique (SMOTE) and its variants (e.g., Borderline SMOTE, K-Means SMOTE, and Safe-Level SMOTE) to encompass advanced adaptive methods (MWMOTE, AMDO), deep generative models (generative adversarial networks, variational autoencoders, and diffusion models), undersampling techniques (NearMiss, Tomek Links), combination/hybrid methods (SMOTE-ENN, SMOTE-Tomek, and SMOTE+OCSVM), ensemble strategies (SMOTEBoost, RUSBoost, Balanced Random Forest, and One-Sided Selection), and specialized approaches for multi-label and clustered data. Beyond descriptive categorization, this review critically examines each method's underlying assumptions, operational mechanisms, and suitability for diverse data characteristics, including high dimensionality, mixed feature types, class overlap, and noise. Key findings demonstrate that no single method universally outperforms others; optimal selection depends critically on dataset characteristics, classifier choice, and evaluation metrics. The paper concludes by identifying emerging research directions, including self-supervised learning for imbalance, diffusion-based generative oversampling, distribution-preserving resampling, knowledge distillation for imbalanced deployment, and the adaptation of foundation models to skewed distributions, offering practical guidelines for practitioners and a roadmap for future methodological development.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

A useful survey that organizes balancing methods and rightly stresses that no single technique wins across all datasets.

Hi, this survey pulls together resampling and augmentation approaches for imbalanced data. The core takeaway is that performance depends on dataset traits, the classifier, and the metrics, with no universal best method. That conclusion follows from the reviewed literature and matches what shows up in practice.

The paper covers the expected ground from SMOTE variants through adaptive methods, then adds sections on GANs, VAEs, diffusion models, hybrids, ensembles, and handling for multi-label or clustered cases. It goes further than a simple list by looking at assumptions, mechanisms, and fit for high-dimensional data, overlap, and noise. That analysis gives readers a clearer sense of trade-offs. The forward-looking part on self-supervised learning, diffusion oversampling, and foundation-model adaptations is a reasonable way to close the review.

The synthesis is solid and the scope described in the abstract looks broad enough to be helpful. The main limitation is the standard one for surveys: completeness of coverage and depth of critique can vary, and any selection bias in the papers chosen would affect the strength of the practical guidelines. No new derivations or experiments appear, so the value sits in the organization rather than fresh results.

This is the sort of reference that helps applied people in healthcare or finance pick techniques without starting from scratch. It deserves peer review so referees can check the reference list and suggest places where the critique could be tightened.

Referee Report

0 major / 3 minor

Summary. The manuscript is a systematic survey of data balancing methods for imbalanced datasets in machine learning. It reviews foundational oversampling techniques such as SMOTE and its variants (Borderline SMOTE, K-Means SMOTE, Safe-Level SMOTE), advanced adaptive methods (MWMOTE, AMDO), deep generative models (GANs, VAEs, diffusion models), undersampling (NearMiss, Tomek Links), hybrid/combination methods (SMOTE-ENN, SMOTE-Tomek, SMOTE+OCSVM), ensemble strategies (SMOTEBoost, RUSBoost, Balanced Random Forest), and specialized approaches for multi-label and clustered data. Beyond categorization, the paper critically examines each method's assumptions, operational mechanisms, and suitability for data characteristics including high dimensionality, mixed features, class overlap, and noise. The central claim is that no single method universally outperforms others and that optimal selection depends on dataset characteristics, classifier choice, and evaluation metrics. The paper concludes with emerging research directions and practical guidelines.

Significance. If the synthesis holds, the survey provides a structured, critical overview of the imbalanced data literature that can serve as a reference for practitioners selecting methods and for researchers identifying gaps. The emphasis on context-dependence and the examination of assumptions/mechanisms for each family of techniques adds utility beyond a descriptive list. Explicit discussion of future directions such as diffusion-based oversampling and foundation-model adaptation supplies a clear roadmap.

minor comments (3)

The abstract lists many method families and examples; a shorter version that foregrounds the critical-examination contribution and the context-dependence finding would improve readability while retaining completeness.
Section headings and subsection numbering should be checked for consistency when moving from the SMOTE-variants discussion to the deep-generative-models section; some readers may lose the thread between families.
A small number of cited works on diffusion models for oversampling appear only in the future-directions paragraph; moving one or two representative references into the main generative-models section would strengthen the coverage claim.

Simulated Authors' Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive and detailed summary of our manuscript, as well as the recommendation for minor revision. We appreciate the recognition of the survey's critical examination of assumptions, mechanisms, and context-dependent performance of balancing methods. We will incorporate minor improvements to enhance clarity and completeness in the revised version.

Circularity Check

0 steps flagged

No significant circularity in survey synthesis

This paper is a systematic literature survey that categorizes, describes, and critically examines existing data balancing methods drawn from external sources. It introduces no new mathematical derivations, parameter fittings, or empirical predictions whose outputs reduce by construction to its own inputs. The key finding that no single method universally outperforms others is presented as a qualitative synthesis of the reviewed literature, with explicit dependence on dataset characteristics and classifier choice already flagged in the abstract and scope. No self-definitional steps, fitted inputs relabeled as predictions, load-bearing self-citations forming closed loops, or ansatzes smuggled via prior author work are present. The argument remains self-contained against external benchmarks in the cited studies.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The paper introduces no new free parameters, axioms, or invented entities; it aggregates and evaluates techniques from the existing machine learning literature on class imbalance.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · relation: unclear

Paper passage: "Key findings demonstrate that no single method universally outperforms others; optimal selection depends critically on dataset characteristics, classifier choice, and evaluation metrics."

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · relation: unclear

Paper passage: "The paper provides a comprehensive, systematic review of data balancing methods... extending beyond foundational oversampling techniques such as SMOTE and its variants to encompass... generative models... ensemble strategies..."

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

136 extracted references · 136 canonical work pages · 4 internal anchors

[1]

A survey of methods for addressing imbalance data problems in agriculture applications.Remote Sens.2025,17, 454

Miftahushudur, T.; Sahin, H.M.; Grieve, B.; Yin, H. A survey of methods for addressing imbalance data problems in agriculture applications.Remote Sens.2025,17, 454

work page 2025
[2]

A systematic survey and empirical comparison of hybrid methods for imbalanced fraud detection: Combining resampling and machine learning.AUT J

Yousefimehr, B.; Ghatee, M. A systematic survey and empirical comparison of hybrid methods for imbalanced fraud detection: Combining resampling and machine learning.AUT J. Math. Comput.2026,7, 85–116

work page 2026
[3]

A comprehensive survey on imbalanced data learning.Front

Gao, X.; Xie, D.; Zhang, Y .; Wang, Z.; Chen, C.; He, C.; Yin, H.; Zhang, W. A comprehensive survey on imbalanced data learning.Front. Comput. Sci.2026,20, 2011622

work page 2026
[4]

SMOTE: Synthetic minority over-sampling tech- nique.J

Chawla, N.V .; Bowyer, K.W.; Hall, L.O.; Kegelmeyer, W.P. SMOTE: Synthetic minority over-sampling tech- nique.J. Artif. Intell. Res.2002,16, 321–357

work page 2002
[5]

Learning from imbalanced data.IEEE Trans

He, H.; Garcia, E.A. Learning from imbalanced data.IEEE Trans. Knowl. Data Eng.2009,21, 1263–1284

work page 2009
[6]

Resampling to Classify Rare Attack Tactics in UWF- ZeekData22.Knowledge2024,4, 96–119

Bagui, S.S.; Mink, D.; Bagui, S.C.; Subramaniam, S. Resampling to Classify Rare Attack Tactics in UWF- ZeekData22.Knowledge2024,4, 96–119

work page
[7]

OUCH: Oversampling and undersampling cannot help improve ac- curacy in our bayesian classifiers that predict preeclampsia.Mathematics2024,12, 3351

Parrales-Bravo, F.; Caicedo-Quiroz, R.; Tolozano-Benitez, E.; G ´omez-Rodr´ıguez, V .; Cevallos-Torres, L.; Charco-Aguirre, J.; Vasquez-Cevallos, L. OUCH: Oversampling and undersampling cannot help improve ac- curacy in our bayesian classifiers that predict preeclampsia.Mathematics2024,12, 3351

work page
[8]

Generative adversarial nets.Adv

Goodfellow, I.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; Bengio, Y . Generative adversarial nets.Adv. Neural Inf. Process. Syst.2014,27

work page 2014
[9]

Auto-Encoding Variational Bayes

Kingma, D.P.; Welling, M. Auto-Encoding Variational Bayes.arXiv2022, arXiv:1312.6114

work page internal anchor Pith review Pith/arXiv arXiv
[10]

Improving committee diagnosis with resampling techniques.Adv

Parmanto, B.; Munro, P.; Doyle, H. Improving committee diagnosis with resampling techniques.Adv. Neural Inf. Process. Syst.1995,8. 66 APREPRINT- APRIL30, 2026

work page 1995
[11]

A cluster-based SMOTE both-sampling (CSBBoost) ensemble algorithm for clas- sifying imbalanced data.Sci

Salehi, A.R.; Khedmati, M. A cluster-based SMOTE both-sampling (CSBBoost) ensemble algorithm for clas- sifying imbalanced data.Sci. Rep.2024,14, 5152

work page 2024
[12]

Comparative analysis of resampling techniques for class imbalance in financial distress prediction using XGBOOST.Mathematics2025,13, 2186

Hou, G.; Tong, D.L.; Liew, S.Y .; Choo, P.Y . Comparative analysis of resampling techniques for class imbalance in financial distress prediction using XGBOOST.Mathematics2025,13, 2186

work page
[13]

BMJ372(71), 1–9 (2021) https://doi.org/10.1136/bmj.n71

Page, M.J.; McKenzie, J.E.; Bossuyt, P.M.; Boutron, I.; Hoffmann, T.C.; Mulrow, C.D.; Shamseer, L.; Tetzlaff, J.M.; Akl, E.A.; Brennan, S.E.; et al. The PRISMA 2020 statement: An updated guideline for reporting systematic reviews.bmj2021,372ttps://doi.org/10.1136/bmj.n71

work page doi:10.1136/bmj.n71 2020
[14]

A systematic review for 2019–2025 on deep learning models in the film production industry.Entertain

Yousefimehr, B.; Ghatee, M.; Ghaffari, S.; Arasteh, A.; Ahmadi, P.; Ghane, A.; Esnaasharieh, S. A systematic review for 2019–2025 on deep learning models in the film production industry.Entertain. Comput.2026, 56, 101076

work page 2019
[15]

A new measure of rank correlation.Biometrika1938,30, 81–93

Kendall, M.G. A new measure of rank correlation.Biometrika1938,30, 81–93

work page
[16]

How to support the application of multiple criteria decision analysis? Let us start with a comprehensive taxonomy.Omega2020,96, 102261

Cinelli, M.; Kadzi ´nski, M.; Gonzalez, M.; Słowi ´nski, R. How to support the application of multiple criteria decision analysis? Let us start with a comprehensive taxonomy.Omega2020,96, 102261

work page
[17]

Saltelli, A.; Ratto, M.; Andres, T.; Campolongo, F.; Cariboni, J.; Gatelli, D.; Saisana, M.; Tarantola, S.Global Sensitivity Analysis: The Primer; John Wiley & Sons: Hoboken, NJ, USA, 2008

work page 2008
[18]

An investigation of SMOTE based methods for imbalanced datasets with data complexity analysis.IEEE Trans

Azhar, N.A.; Pozi, M.S.M.; Din, A.M.; Jatowt, A. An investigation of SMOTE based methods for imbalanced datasets with data complexity analysis.IEEE Trans. Knowl. Data Eng.2023,35, 6651–6672.https://doi.org/10.1109/TKDE.2022.3179381

work page doi:10.1109/tkde.2022.3179381 2023
[19]

Do we need rebalancing strategies? A theoretical and empirical study around SMOTE and its variants.arXiv2026, arXiv:2402.03819

Sakho, A.; Malherbe, E.; Scornet, E. Do we need rebalancing strategies? A theoretical and empirical study around SMOTE and its variants.arXiv2026, arXiv:2402.03819

work page arXiv
[20]

A Comparison of Synthetic Oversampling Methods for Multi-class Text Classification.arXiv 2020, arXiv:2008.04636

Glazkova, A. A Comparison of Synthetic Oversampling Methods for Multi-class Text Classification.arXiv 2020, arXiv:2008.04636

work page arXiv 2020
[21]

Simplicial SMOTE: Oversampling Solution to the Imbalanced Learning Problem

Kachan, O.; Savchenko, A.; Gusev, G. Simplicial SMOTE: Oversampling Solution to the Imbalanced Learning Problem. In Proceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Toronto, ON, Canada, 3–7 August 2025; pp. 625–635

work page 2025
[22]

An empirical comparison and evaluation of minority oversampling techniques on a large number of imbalanced datasets.Appl

Kov ´acs, G. An empirical comparison and evaluation of minority oversampling techniques on a large number of imbalanced datasets.Appl. Soft Comput.2019,83, 105662

work page 2019
[23]

Adaptive semi-unsupervised weighted oversampling (A-SUWO) for imbal- anced datasets.Expert Syst

Nekooeimehr, I.; Lai-Yuen, S.K. Adaptive semi-unsupervised weighted oversampling (A-SUWO) for imbal- anced datasets.Expert Syst. Appl.2016,46, 405–416

work page 2016
[24]

ADASYN: Adaptive synthetic sampling approach for imbalanced learning

He, H.; Bai, Y .; Garcia, E.A.; Li, S. ADASYN: Adaptive synthetic sampling approach for imbalanced learning. In Proceedings of the 2008 IEEE International Joint Conference on Neural Networks (IEEE World Congress on Computational Intelligence), Hong Kong, China, 1–6 June 2008; pp. 1322–1328

work page 2008
[25]

Self-adaptive oversampling method based on the complexity of minority data in imbalanced datasets classification.Knowl.-Based Syst.2023,277, 110795

Tao, X.; Guo, X.; Zheng, Y .; Zhang, X.; Chen, Z. Self-adaptive oversampling method based on the complexity of minority data in imbalanced datasets classification.Knowl.-Based Syst.2023,277, 110795

work page 2023
[26]

Local distribution-based adaptive minority oversampling for imbalanced data classification.Neurocomputing2021,422, 200–213

Wang, X.; Xu, J.; Zeng, T.; Jing, L. Local distribution-based adaptive minority oversampling for imbalanced data classification.Neurocomputing2021,422, 200–213

work page
[27]

AWGAN: An adaptive weighting GAN approach for oversampling imbalanced datasets.Inf

Guan, S.; Zhao, X.; Xue, Y .; Pan, H. AWGAN: An adaptive weighting GAN approach for oversampling imbalanced datasets.Inf. Sci.2024,663, 120311

work page 2024
[28]

Diffusion GAN-Based Oversampling for Imbalanced Tabular Data.IEEE Trans

Ren, S.; Ding, J.; Cheung, Y .m. Diffusion GAN-Based Oversampling for Imbalanced Tabular Data.IEEE Trans. Knowl. Data Eng.2026,38, 983–996

work page 2026
[29]

B2BGAN: A Backbone-to-Branches GAN-Based Over- sampling Approach for Class-Imbalanced Tabular Data.IEEE Trans

Wang, X.; Wang, C.; Wang, M.; Liu, J.; Guan, X. B2BGAN: A Backbone-to-Branches GAN-Based Over- sampling Approach for Class-Imbalanced Tabular Data.IEEE Trans. Knowl. Data Eng.2025,37, 5808–5822. https://doi.org/10.1109/TKDE.2025.3593637

work page doi:10.1109/tkde.2025.3593637 2025
[30]

A survey on explainable artificial intel- ligence (xai): Toward medical xai

Dablain, D.; Krawczyk, B.; Chawla, N.V . DeepSMOTE: Fusing Deep Learning and SMOTE for Imbalanced Data.IEEE Trans. Neural Netw. Learn. Syst.2023,34, 6390–6404.https://doi.org/10.1109/TNNLS. 2021.3136503

work page doi:10.1109/tnnls 2023
[31]

Conditional Wasserstein GAN-based oversampling of tabular data for imbalanced learning.Expert Syst

Engelmann, J.; Lessmann, S. Conditional Wasserstein GAN-based oversampling of tabular data for imbalanced learning.Expert Syst. Appl.2021,174, 114582

work page 2021
[32]

CTV AE: Contrastive Tabular Variational Au- toencoder for imbalance data.Knowl

Wang, A.X.; Le, M.Q.; Duong, H.T.; Van, B.N.; Nguyen, B.P. CTV AE: Contrastive Tabular Variational Au- toencoder for imbalance data.Knowl. Inf. Syst.2025,67, 5335–5354

work page 2025
[33]

RUSBoost: A hybrid approach to alleviating class imbalance.IEEE Trans

Seiffert, C.; Khoshgoftaar, T.M.; Van Hulse, J.; Napolitano, A. RUSBoost: A hybrid approach to alleviating class imbalance.IEEE Trans. Syst. Man-Cybern.-Part A Syst. Hum.2009,40, 185–197. 67 APREPRINT- APRIL30, 2026

work page 2009
[34]

Learning from imbalanced data: Integration of advanced resampling techniques and machine learning models for enhanced cancer diagnosis and prognosis.Cancers2024,16, 3417

Gurcan, F.; Soylu, A. Learning from imbalanced data: Integration of advanced resampling techniques and machine learning models for enhanced cancer diagnosis and prognosis.Cancers2024,16, 3417

work page
[35]

A review on ensembles for the class imbalance problem: Bagging-, boosting-, and hybrid-based approaches.IEEE Trans

Galar, M.; Fernandez, A.; Barrenechea, E.; Bustince, H.; Herrera, F. A review on ensembles for the class imbalance problem: Bagging-, boosting-, and hybrid-based approaches.IEEE Trans. Syst. Man Cybern. Part C (Appl. Rev.)2011,42, 463–484

work page 2011
[36]

EUSBoost: Enhancing ensembles for highly imbalanced data-sets by evolutionary undersampling.Pattern Recognit.2013,46, 3460–3472.https: //doi.org/10.1016/j.patcog.2013.05.006

Galar, M.; Fernandez, A.; Barrenechea, E.; Bustince, H.; Herrera, F. EUSBoost: Enhancing ensembles for highly imbalanced data-sets by evolutionary undersampling.Pattern Recognit.2013,46, 3460–3472.https: //doi.org/10.1016/j.patcog.2013.05.006

work page doi:10.1016/j.patcog.2013.05.006 2013
[37]

Measuring agreement in method comparison studies.Stat

Bland, J.M.; Altman, D.G. Measuring agreement in method comparison studies.Stat. Methods Med. Res.1999, 8, 135–160

work page 1999
[38]

Improving imbalanced learning through a heuristic oversampling method based on k-means and SMOTE.Inf

Douzas, G.; Bacao, F.; Last, F. Improving imbalanced learning through a heuristic oversampling method based on k-means and SMOTE.Inf. Sci.2018,465, 1–20.https://doi.org/10.1016/j.ins.2018.06.056

work page doi:10.1016/j.ins.2018.06.056 2018
[39]

Safe-level-smote: Safe-level-synthetic minority over- sampling technique for handling the class imbalanced problem

Bunkhumpornpat, C.; Sinapiromsaran, K.; Lursinsap, C. Safe-level-smote: Safe-level-synthetic minority over- sampling technique for handling the class imbalanced problem. In Proceedings of the Pacific-Asia Conference on Knowledge Discovery and Data Mining, Bangkok, Thailand, 27–30 April 2009; Springer: Berlin/Heidel- berg, Germany, 2009; pp. 475–482

work page 2009
[40]

Surrounding neighborhood-based SMOTE for learning from imbalanced data sets.Prog

Garc ´ıa, V .; S´anchez, J.; Mart ´ın F´elez, R.; Mollineda, R. Surrounding neighborhood-based SMOTE for learning from imbalanced data sets.Prog. Artif. Intell.2012,1, 347–362.https://doi.org/10.1007/ s13748-012-0027-5

work page 2012
[41]

A new definition of neighborhood of a point in multi-dimensional space.Pattern Recognit

Chaudhuri, B. A new definition of neighborhood of a point in multi-dimensional space.Pattern Recognit. Lett. 1996,17, 11–17.https://doi.org/10.1016/0167-8655(95)00093-3

work page doi:10.1016/0167-8655(95)00093-3 1996
[42]

673–702.https://doi.org/10.1007/978-1-4613-0231-5_26

S ´anchez, J.; Marqu ´es, A.Enhanced Neighbourhood Specifications for Pattern Classification; Springer: Berlin/Heidelberg, Germany, 2003; pp. 673–702.https://doi.org/10.1007/978-1-4613-0231-5_26

work page doi:10.1007/978-1-4613-0231-5_26 2003
[43]

Enhanced Multi-Modal Gas Leakage Detection with NSMOTE: A Novel Over-sampling Approach

Azizian, A.; Yousefimehr, B.; Ghatee, M. Enhanced Multi-Modal Gas Leakage Detection with NSMOTE: A Novel Over-sampling Approach. In Proceedings of the 2024 8th International Conference on Smart Cities, Internet of Things and Applications (SCIoT), Mashhad, Iran, 14–15 May 2024; pp. 94–99.https://doi. org/10.1109/SCIoT62588.2024.10570108

work page doi:10.1109/sciot62588.2024.10570108 2024
[44]

MSMOTE: Improving Classification Performance When Training Data is Imbalanced

Hu, S.; Liang, Y .; Ma, L.; He, Y . MSMOTE: Improving Classification Performance When Training Data is Imbalanced. In Proceedings of the 2009 Second International Workshop on Computer Science and Engineering, Qingdao, China, 28–30 October 2009; V olume 2, pp. 13–17.https://doi.org/10.1109/WCSE.2009.756

work page doi:10.1109/wcse.2009.756 2009
[45]

Borderline-SMOTE: A New Over-Sampling Method in Imbalanced Data Sets Learning

Han, H.; Wang, W.; Mao, B. Borderline-SMOTE: A New Over-Sampling Method in Imbalanced Data Sets Learning. In Proceedings of the International Conference on Intelligent Computing, Hefei, China, 23–26 August 2005

work page 2005
[46]

Instance weighted SMOTE by indirectly exploring the data distribution.Knowl.-Based Syst.2022,249, 108919.https://doi.org/10.1016/j.knosys.2022.108919

Zhang, A.; Yu, H.; Zhou, S.; Huan, Z.; Yang, X. Instance weighted SMOTE by indirectly exploring the data distribution.Knowl.-Based Syst.2022,249, 108919.https://doi.org/10.1016/j.knosys.2022.108919

work page doi:10.1016/j.knosys.2022.108919 2022
[47]

AWSMOTE: An SVM-Based Adaptive Weighted SMOTE for Class-Imbalance Learn- ing.Sci

Wang, J.B.; Zou, C.A. AWSMOTE: An SVM-Based Adaptive Weighted SMOTE for Class-Imbalance Learn- ing.Sci. Program.2021,2021, 1–18.https://doi.org/10.1155/2021/9947621

work page doi:10.1155/2021/9947621 2021
[48]

ASN-SMOTE: A synthetic minority oversampling method with adaptive qualified synthesizer selection.Complex Intell

Yi, X.; Xu, Y .; Hu, Q.; Krishnamoorthy, S.; Li, W.; Tang, Z. ASN-SMOTE: A synthetic minority oversampling method with adaptive qualified synthesizer selection.Complex Intell. Syst.2022,8, 2247–2272

work page 2022
[49]

A-SMOTE: A new preprocessing approach for highly imbalanced datasets by improving SMOTE.Int

Hussein, A.S.; Li, T.; Yohannese, C.W.; Bashir, K. A-SMOTE: A new preprocessing approach for highly imbalanced datasets by improving SMOTE.Int. J. Comput. Intell. Syst.2019,12, 1412–1422

work page 2019
[50]

SMOTE–IPF: Addressing the noisy and borderline examples problem in imbalanced classification by a re-sampling method with filtering.Inf

S ´aez, J.A.; Luengo, J.; Stefanowski, J.; Herrera, F. SMOTE–IPF: Addressing the noisy and borderline examples problem in imbalanced classification by a re-sampling method with filtering.Inf. Sci.2015,291, 184–203. https://doi.org/10.1016/j.ins.2014.08.051

work page doi:10.1016/j.ins.2014.08.051 2015
[51]

Learning from Imbalanced Data in Presence of Noisy and Borderline Examples

Napierała, K.; Stefanowski, J.; Wilk, S. Learning from Imbalanced Data in Presence of Noisy and Borderline Examples. InProceedings of the Rough Sets and Current Trends in Computing; Szczuka, M., Kryszkiewicz, M., Ramanna, S., Jensen, R., Hu, Q., Eds.; Springer: Berlin/Heidelberg, Germany, 2010; pp. 158–167

work page 2010
[52]

FW-SMOTE: A feature-weighted oversampling ap- proach for imbalanced classification.Pattern Recognit.2022,124, 108511

Maldonado, S.; Vairetti, C.; Fernandez, A.; Herrera, F. FW-SMOTE: A feature-weighted oversampling ap- proach for imbalanced classification.Pattern Recognit.2022,124, 108511

work page 2022
[53]

IOW A-SVM: A Density-Based Weighting Strategy for SVM Classi- fication via OW A Operators.IEEE Trans

Maldonado, S.; Merig ´o, J.; Miranda, J. IOW A-SVM: A Density-Based Weighting Strategy for SVM Classi- fication via OW A Operators.IEEE Trans. Fuzzy Syst.2020,28, 2143–2150.https://doi.org/10.1109/ TFUZZ.2019.2930942. 68 APREPRINT- APRIL30, 2026

work page arXiv 2020
[54]

An imbalanced learning method by combining SMOTE with Center Offset Factor.Appl

Meng, D.; Li, Y . An imbalanced learning method by combining SMOTE with Center Offset Factor.Appl. Soft Comput.2022,120, 108618

work page 2022
[55]

A hybrid approach for class imbalance problem in customer churn prediction: A novel extension to under-sampling.Int

Salunkhe, U.R.; Mali, S.N. A hybrid approach for class imbalance problem in customer churn prediction: A novel extension to under-sampling.Int. J. Intell. Syst. Appl.2018,10, 71

work page 2018
[56]

Applying support vector machines to imbalanced datasets

Akbani, R.; Kwek, S.; Japkowicz, N. Applying support vector machines to imbalanced datasets. InProceedings of the Machine Learning: ECML 2004: 15th European Conference on Machine Learning, Pisa, Italy, September 20-24, 2004; Proceedings 15; Springer: Berlin/Heidelberg, Germany, 2004; pp. 39–50

work page 2004
[57]

Support-vector networks.Mach

Cortes, C.; Vapnik, V . Support-vector networks.Mach. Learn.1995,20, 273–297.https://doi.org/10. 1007/BF00994018

work page 1995
[58] Ibrahim, M.H. ODBOT: Outlier detection-based oversampling technique for imbalanced datasets learning. Neural Comput. Appl. 2021, 33, 15781–15806. https://doi.org/10.1007/s00521-021-06198-x
[59] Barua, S.; Islam, M.M.; Yao, X.; Murase, K. MWMOTE–majority weighted minority oversampling technique for imbalanced data set learning. IEEE Trans. Knowl. Data Eng. 2012, 26, 405–425
[60] Yang, X.; Kuang, Q.; Zhang, W.; Zhang, G. AMDO: An Over-Sampling Technique for Multi-Class Imbalanced Problems. IEEE Trans. Knowl. Data Eng. 2018, 30, 1672–1685. https://doi.org/10.1109/TKDE.2017.2761347
[61] Ikotun, A.M.; Ezugwu, A.E.; Abualigah, L.; Abuhaija, B.; Heming, J. K-means clustering algorithms: A comprehensive review, variants analysis, and advances in the era of big data. Inf. Sci. 2023, 622, 178–210. https://doi.org/10.1016/j.ins.2022.11.139
[62] Arjovsky, M.; Chintala, S.; Bottou, L. Wasserstein generative adversarial networks. In Proceedings of the International Conference on Machine Learning, PMLR, Sydney, NSW, Australia, 6–11 August 2017; pp. 214–223
[63] Mirza, M.; Osindero, S. Conditional generative adversarial nets. arXiv 2014, arXiv:1411.1784
[64] Ahmadian, R.; Ghatee, M.; Wahlström, J. Discrete wavelet transform for generative adversarial network to identify drivers using gyroscope and accelerometer sensors. IEEE Sens. J. 2022, 22, 6879–6886
[65] Li, Y.; Swersky, K.; Zemel, R. Generative moment matching networks. In Proceedings of the International Conference on Machine Learning, PMLR, Lille, France, 6–11 July 2015; pp. 1718–1727
[66] Patki, N.; Wedge, R.; Veeramachaneni, K. The synthetic data vault. In Proceedings of the 2016 IEEE International Conference on Data Science and Advanced Analytics (DSAA), Montreal, QC, Canada, 17–19 October 2016; pp. 399–410
[67] Naglik, I.; Lango, M. GMMSampling: A new model-based, data difficulty-driven resampling method for multi-class imbalanced data. Mach. Learn. 2024, 113, 5183–5202. https://doi.org/10.1007/s10994-023-06416-8
[68] Ho, J.; Jain, A.; Abbeel, P. Denoising diffusion probabilistic models. Adv. Neural Inf. Process. Syst. 2020, 33, 6840–6851
[69] Suh, S.; Lee, H.; Lukowicz, P.; Lee, Y.O. CEGAN: Classification Enhancement Generative Adversarial Networks for unraveling data imbalance problems. Neural Netw. 2021, 133, 69–86
[70] Wu, J.; Li, W.; Wu, Y.; Qiu, S. Wasserstein generative adversarial network with gradient penalty for handwritten digit generation. In Proceedings of the 2024 International Conference on Intelligent Robotics and Automatic Control (IRAC), Guangzhou, China, 29 November–1 December 2024; pp. 375–379
[71] Huang, K.; Wang, X. ADA-INCVAE: Improved data generation using variational autoencoder for imbalanced classification. Appl. Intell. 2022, 52, 2838–2853
[72] Cremer, C.; Li, X.; Duvenaud, D. Inference suboptimality in variational autoencoders. In Proceedings of the International Conference on Machine Learning, PMLR, Stockholm, Sweden, 10–15 July 2018; pp. 1078–1086
[73] Geng, C.; Wang, J.; Chen, L.; Gao, Z. Solving the reconstruction-generation trade-off: Generative model with implicit embedding learning. Neurocomputing 2023, 549, 126428
[74] Gulrajani, I.; Ahmed, F.; Arjovsky, M.; Dumoulin, V.; Courville, A.C. Improved training of Wasserstein GANs. Adv. Neural Inf. Process. Syst. 2017, 30
[75] Xu, Y.; Zhang, X.; Qiu, Z.; Zhang, X.; Qiu, J.; Zhang, H. Oversampling imbalanced data based on convergent WGAN for network threat detection. Secur. Commun. Netw. 2021, 2021, 9206440
[76] Bharti, A.; Naslidnyk, M.; Key, O.; Kaski, S.; Briol, F.X. Optimally-weighted estimators of the maximum mean discrepancy for likelihood-free inference. In Proceedings of the International Conference on Machine Learning, PMLR, Honolulu, HI, USA, 23–29 July 2023; pp. 2289–2312
[77] Yen, S.J.; Lee, Y.S. Cluster-based under-sampling approaches for imbalanced data distributions. Expert Syst. Appl. 2009, 36, 5718–5727. https://doi.org/10.1016/j.eswa.2008.06.108
[78] Zhang, J.; Mani, I.; Lin, K. Nearmiss under sampling for imbalanced dataset classification. In Proceedings of the ICML Workshop on Learning from Imbalanced Datasets II, Washington, DC, USA, 21 July 2003
[79] Yen, S.; Lee, Y. Under-sampling approaches for improving prediction of the minority class in an imbalanced dataset. Lect. Notes Control. Inf. Sci. 2006, 344, 731
[80] Smith, M.R.; Martinez, T.; Giraud-Carrier, C. An instance level analysis of data complexity. Mach. Learn. 2014, 95, 225–256

