Pith

arXiv: 2505.13518 · v2 · submitted 2025-05-17 · 📊 stat.ML · cs.AI · cs.LG

Data Balancing Strategies: A Systematic Survey of Resampling and Augmentation Methods

Pith reviewed 2026-05-22 14:33 UTC · model grok-4.3

classification 📊 stat.ML · cs.AI · cs.LG
keywords imbalanced datasets · data balancing · SMOTE · oversampling · undersampling · generative models · machine learning · systematic survey

The pith

No single data balancing method works best for every imbalanced dataset.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This survey reviews techniques for fixing class imbalance, where one category dominates the data and skews model predictions. It organizes methods from basic oversampling such as SMOTE and its variants through generative models like GANs, VAEs, and diffusion models to undersampling, hybrids, and ensembles. The review weighs each approach's assumptions, how it operates, and its fit for high-dimensional data, mixed features, overlap, or noise. Its central result is that selection must match the dataset traits, classifier, and metrics rather than relying on any default technique.
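The interpolation at the heart of SMOTE [4], the baseline that most of the surveyed oversamplers extend, fits in a few lines of pure Python. This is a minimal sketch under our own assumptions, not code from the paper or from any library: the name `smote_sample` and its parameters are hypothetical, features are assumed numeric, and base points are drawn at random rather than cycled through one by one as in the original formulation.

```python
import math
import random

def smote_sample(minority, n_new, k=5, seed=0):
    """Create n_new synthetic minority points by SMOTE-style interpolation.

    minority: list of numeric feature vectors (lists of floats), one class only.
    Each new point is base + gap * (neighbour - base) with gap ~ U[0, 1),
    where neighbour is one of the k nearest minority points to base.
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    k = min(k, len(minority) - 1)  # cannot use more neighbours than exist
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest minority neighbours of base, excluding base itself
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()
        # the new point lies on the segment between base and nb
        synthetic.append([b + gap * (n - b) for b, n in zip(base, nb)])
    return synthetic

minority = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.2], [1.2, 2.5]]
print(smote_sample(minority, n_new=3, k=2))
```

The brute-force neighbour search is quadratic and only meant to expose the mechanism; for real workloads a maintained implementation such as the one in imbalanced-learn is the safer choice.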

Core claim

The paper establishes that data balancing strategies vary widely in effectiveness, with no method universally superior. Success instead depends on matching the technique to dataset characteristics such as dimensionality, feature types, class overlap, and noise, together with the classifier and chosen evaluation metrics.
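Why the evaluation metric belongs in that list shows up in a toy calculation (our illustration, not an experiment from the paper): on a 99:1 split, a classifier that always predicts the majority class scores well on accuracy and fails completely on minority-aware metrics. The helper names `confusion` and `report` are hypothetical; balanced accuracy is the mean of the two per-class recalls and the G-mean is their geometric mean.

```python
import math

def confusion(y_true, y_pred, positive=1):
    """Return (tp, fn, tn, fp) with `positive` as the minority label."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p != positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    return tp, fn, tn, fp

def report(y_true, y_pred):
    tp, fn, tn, fp = confusion(y_true, y_pred)
    recall = tp / (tp + fn) if tp + fn else 0.0       # minority-class recall
    specificity = tn / (tn + fp) if tn + fp else 0.0  # majority-class recall
    return {
        "accuracy": (tp + tn) / len(y_true),
        "balanced_accuracy": (recall + specificity) / 2,
        "g_mean": math.sqrt(recall * specificity),
    }

# Toy data: 990 majority (0) and 10 minority (1) labels; the model always says 0
y_true = [0] * 990 + [1] * 10
y_pred = [0] * 1000
print(report(y_true, y_pred))
# → {'accuracy': 0.99, 'balanced_accuracy': 0.5, 'g_mean': 0.0}
```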

What carries the argument

Systematic categorization and critical analysis of resampling, augmentation, and generative balancing methods, assessing their assumptions and suitability for varied data conditions.

If this is right

  • Hybrid combinations like SMOTE with Tomek Links or ENN can reduce both imbalance and noise at once.
  • Generative models, including diffusion approaches, can create realistic minority-class samples beyond what traditional interpolation produces.
  • Ensemble strategies such as SMOTEBoost and Balanced Random Forest add robustness when paired with balancing steps.
  • Specialized variants are required for multi-label or clustered imbalance settings.
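The cleaning half of the SMOTE-Tomek hybrid in the first bullet can be sketched as follows. A Tomek link is a pair of opposite-class points that are each other's nearest neighbour; removing such pairs after oversampling thins the class boundary. This is a hypothetical pure-Python illustration (the function names are ours), and it drops both members of each link, which is one common choice; some implementations remove only the majority member.

```python
import math

def nearest_index(X, i):
    """Index of the nearest neighbour of X[i], excluding i itself."""
    best, best_d = None, math.inf
    for j, x in enumerate(X):
        if j == i:
            continue
        d = math.dist(X[i], x)
        if d < best_d:
            best, best_d = j, d
    return best

def tomek_links(X, y):
    """Pairs (i, j), i < j, of opposite-class mutual nearest neighbours."""
    nn = [nearest_index(X, i) for i in range(len(X))]
    return [(i, j) for i, j in enumerate(nn)
            if i < j and nn[j] == i and y[i] != y[j]]

def remove_tomek_links(X, y):
    """Drop both members of every Tomek link."""
    drop = {k for pair in tomek_links(X, y) for k in pair}
    keep = [k for k in range(len(X)) if k not in drop]
    return [X[k] for k in keep], [y[k] for k in keep]

X = [[0.0], [1.0], [1.1], [5.0], [6.0]]
y = [0, 0, 1, 0, 0]
print(tomek_links(X, y))  # → [(1, 2)]
```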

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Future tools could automate balancing-method selection by profiling dataset properties first.
  • Benchmarks for imbalanced learning should test methods across diverse data regimes instead of fixed suites.
  • Foundation models may need balancing adaptations that preserve their pre-trained distributions.
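A first step toward the profiling idea in the first bullet might compute a few of the traits the survey ties to method choice. This sketch is our own and hypothetical: it assumes numeric features only (mixed types would need a different distance), and its overlap proxy, the share of minority points whose nearest neighbour belongs to another class, is an illustrative heuristic rather than a measure defined in the paper.

```python
import math
from collections import Counter

def profile(X, y):
    """Summarise size, dimensionality, imbalance, and a crude overlap proxy."""
    counts = Counter(y)
    minority = min(counts, key=counts.get)
    majority = max(counts, key=counts.get)
    minority_idx = [i for i, label in enumerate(y) if label == minority]
    crossed = 0
    for i in minority_idx:
        # nearest neighbour of a minority point among all other points
        j = min((k for k in range(len(X)) if k != i),
                key=lambda k: math.dist(X[i], X[k]))
        if y[j] != minority:
            crossed += 1
    return {
        "n_samples": len(X),
        "n_features": len(X[0]),
        "imbalance_ratio": counts[majority] / counts[minority],
        "minority_overlap": crossed / len(minority_idx),
    }

X = [[0.0], [0.1], [0.2], [0.3], [5.0], [5.1]]
y = [0, 0, 0, 0, 1, 1]
print(profile(X, y))
# → {'n_samples': 6, 'n_features': 1, 'imbalance_ratio': 2.0, 'minority_overlap': 0.0}
```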

Load-bearing premise

The review assumes its selection and grouping of methods accurately captures the current literature without major omissions or selection bias.

What would settle it

A controlled study in which one specific method, such as a diffusion-based oversampler, outperforms all others across a broad collection of datasets, classifiers, and metrics would challenge the central finding.
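One way to make that test concrete is with average ranks, the quantity behind Friedman-style comparisons of methods over many datasets: rank the methods on each dataset, average the ranks, and check whether any single method ranks first everywhere. The sketch below is hypothetical (function names are ours), assumes every dataset reports a score for the same set of methods with higher being better, and uses made-up scores purely to exercise the code.

```python
def ranks_higher_better(scores):
    """Rank methods on one dataset (1 = best), averaging ranks over ties."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    ranks, i = {}, 0
    while i < len(ordered):
        j = i
        while j + 1 < len(ordered) and scores[ordered[j + 1]] == scores[ordered[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # positions i..j share the mean rank
        for m in ordered[i:j + 1]:
            ranks[m] = avg
        i = j + 1
    return ranks

def summarise(results):
    """results: {dataset: {method: score}} -> (average ranks, universal winner or None)."""
    per_dataset = [ranks_higher_better(s) for s in results.values()]
    methods = per_dataset[0].keys()  # assumes the same methods on every dataset
    avg = {m: sum(r[m] for r in per_dataset) / len(per_dataset) for m in methods}
    winners = [m for m in methods if all(r[m] == 1 for r in per_dataset)]
    return avg, (winners[0] if winners else None)

hypothetical = {  # made-up scores, for illustration only
    "dataset_a": {"method_a": 0.81, "method_b": 0.77, "method_c": 0.84},
    "dataset_b": {"method_a": 0.74, "method_b": 0.79, "method_c": 0.72},
}
avg_ranks, winner = summarise(hypothetical)
print(winner)  # → None: no method ranks first on every dataset
```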

Figures

Figures reproduced from arXiv: 2505.13518 by Abolfazl Nikahd, Alireza Orouji, Behnam Yousefimehr, Javad Fazli, Mahdi Razi Gandomani, Mehdi Ghatee, Mohammad Amin Seifi, Negin Sadat Mousavi, Ramtin Mahmoudi Kashani, Sajed Tavakoli, Sarina Heshmati, Shervin Ghaffari, Zahra Rafei.

Figure 1. PRISMA 2020 flow diagram of the systematic review selection process (QPSF: Quantitative Paper Scoring …)
Figure 2. Taxonomy of data resampling and augmentation techniques for imbalanced learning.
Figure 3. Decision flow for selecting ensemble strategies based on data characteristics and computational budget.
Figure 4. Overview of future research directions in resampling and augmentation for imbalanced learning.
Original abstract

Imbalanced datasets, where one class significantly outnumbers others, remain a persistent challenge in machine learning, often biasing predictions toward the majority class and degrading classifier performance. This paper provides a comprehensive, systematic review of data balancing methods, extending beyond foundational oversampling techniques such as the Synthetic Minority Oversampling Technique (SMOTE) and its variants (e.g., Borderline SMOTE, K-Means SMOTE, and Safe-Level SMOTE) to encompass advanced adaptive methods (MWMOTE, AMDO), deep generative models (generative adversarial networks, variational autoencoders, and diffusion models), undersampling techniques (NearMiss, Tomek Links), combination/hybrid methods (SMOTE-ENN, SMOTE-Tomek, and SMOTE+OCSVM), ensemble strategies (SMOTEBoost, RUSBoost, Balanced Random Forest, and One-Sided Selection), and specialized approaches for multi-label and clustered data. Beyond descriptive categorization, this review critically examines each method's underlying assumptions, operational mechanisms, and suitability for diverse data characteristics, including high dimensionality, mixed feature types, class overlap, and noise. Key findings demonstrate that no single method universally outperforms others; optimal selection depends critically on dataset characteristics, classifier choice, and evaluation metrics. The paper concludes by identifying emerging research directions, including self-supervised learning for imbalance, diffusion-based generative oversampling, distribution-preserving resampling, knowledge distillation for imbalanced deployment, and the adaptation of foundation models to skewed distributions, offering practical guidelines for practitioners and a roadmap for future methodological development.
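To show how a variant such as Borderline SMOTE adapts to class overlap and noise, here is a sketch of its seed-selection rule as described by Han et al. [45]: for each minority point, count the majority points m2 among its m nearest neighbours in the whole training set; m2 = m marks noise, m/2 ≤ m2 < m marks a "danger" (borderline) point, and anything lower is safe. Only danger points are then used as SMOTE seeds. The function name and signature are hypothetical, and the interpolation step itself is omitted.

```python
import math

def borderline_categories(X, y, minority, m=5):
    """Label each minority sample 'safe', 'danger', or 'noise' by its m-NN."""
    m = min(m, len(X) - 1)
    labels = {}
    for i, (xi, yi) in enumerate(zip(X, y)):
        if yi != minority:
            continue
        # m nearest neighbours in the full training set (both classes)
        neighbours = sorted((k for k in range(len(X)) if k != i),
                            key=lambda k: math.dist(xi, X[k]))[:m]
        m2 = sum(1 for k in neighbours if y[k] != minority)
        if m2 == m:
            labels[i] = "noise"   # surrounded by the majority class
        elif m2 >= m / 2:
            labels[i] = "danger"  # on the boundary: used as a SMOTE seed
        else:
            labels[i] = "safe"
    return labels

X = [[0], [1], [2], [10], [11], [12], [20], [3], [17], [21]]
y = [0, 0, 0, 1, 1, 1, 0, 1, 1, 0]
print(borderline_categories(X, y, minority=1, m=3))
# → {3: 'safe', 4: 'safe', 5: 'safe', 7: 'noise', 8: 'danger'}
```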

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it: the pith above is the substance, and this is the friction.

Referee Report

0 major / 3 minor

Summary. The manuscript is a systematic survey of data balancing methods for imbalanced datasets in machine learning. It reviews foundational oversampling techniques such as SMOTE and its variants (Borderline SMOTE, K-Means SMOTE, Safe-Level SMOTE), advanced adaptive methods (MWMOTE, AMDO), deep generative models (GANs, VAEs, diffusion models), undersampling (NearMiss, Tomek Links), hybrid/combination methods (SMOTE-ENN, SMOTE-Tomek, SMOTE+OCSVM), ensemble strategies (SMOTEBoost, RUSBoost, Balanced Random Forest), and specialized approaches for multi-label and clustered data. Beyond categorization, the paper critically examines each method's assumptions, operational mechanisms, and suitability for data characteristics including high dimensionality, mixed features, class overlap, and noise. The central claim is that no single method universally outperforms others and that optimal selection depends on dataset characteristics, classifier choice, and evaluation metrics. The paper concludes with emerging research directions and practical guidelines.

Significance. If the synthesis holds, the survey provides a structured, critical overview of the imbalanced data literature that can serve as a reference for practitioners selecting methods and for researchers identifying gaps. The emphasis on context-dependence and the examination of assumptions and mechanisms for each family of techniques add utility beyond a descriptive list. Explicit discussion of future directions such as diffusion-based oversampling and foundation-model adaptation supplies a clear roadmap.

minor comments (3)
  1. The abstract lists many method families and examples; a shorter version that foregrounds the critical-examination contribution and the context-dependence finding would improve readability while retaining completeness.
  2. Section headings and subsection numbering should be checked for consistency when moving from the SMOTE-variants discussion to the deep-generative-models section; some readers may lose the thread between families.
  3. A small number of cited works on diffusion models for oversampling appear only in the future-directions paragraph; moving one or two representative references into the main generative-models section would strengthen the coverage claim.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive and detailed summary of our manuscript, as well as the recommendation for minor revision. We appreciate the recognition of the survey's critical examination of assumptions, mechanisms, and context-dependent performance of balancing methods. We will incorporate minor improvements to enhance clarity and completeness in the revised version.

Circularity Check

0 steps flagged

No significant circularity in survey synthesis


This paper is a systematic literature survey that categorizes, describes, and critically examines existing data balancing methods drawn from external sources. It introduces no new mathematical derivations, parameter fittings, or empirical predictions whose outputs reduce by construction to its own inputs. The key finding that no single method universally outperforms others is presented as a qualitative synthesis of the reviewed literature, with explicit dependence on dataset characteristics and classifier choice already flagged in the abstract and scope. No self-definitional steps, fitted inputs relabeled as predictions, load-bearing self-citations forming closed loops, or ansatzes smuggled via prior author work are present. The argument rests on external benchmarks reported in the cited studies rather than on its own outputs.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The paper introduces no new free parameters, axioms, or invented entities; it aggregates and evaluates techniques from the existing machine learning literature on class imbalance.

pith-pipeline@v0.9.0 · 5881 in / 1064 out tokens · 58747 ms · 2026-05-22T14:33:57.245887+00:00 · methodology



Reference graph

Works this paper leans on

136 extracted references · 136 canonical work pages · 4 internal anchors

  1. [1]

    Miftahushudur, T.; Sahin, H.M.; Grieve, B.; Yin, H. A survey of methods for addressing imbalance data problems in agriculture applications. Remote Sens. 2025, 17, 454.

  2. [2]

    Yousefimehr, B.; Ghatee, M. A systematic survey and empirical comparison of hybrid methods for imbalanced fraud detection: Combining resampling and machine learning. AUT J. Math. Comput. 2026, 7, 85–116.

  3. [3]

    Gao, X.; Xie, D.; Zhang, Y.; Wang, Z.; Chen, C.; He, C.; Yin, H.; Zhang, W. A comprehensive survey on imbalanced data learning. Front. Comput. Sci. 2026, 20, 2011622.

  4. [4]

    Chawla, N.V.; Bowyer, K.W.; Hall, L.O.; Kegelmeyer, W.P. SMOTE: Synthetic minority over-sampling technique. J. Artif. Intell. Res. 2002, 16, 321–357.

  5. [5]

    He, H.; Garcia, E.A. Learning from imbalanced data. IEEE Trans. Knowl. Data Eng. 2009, 21, 1263–1284.

  6. [6]

    Bagui, S.S.; Mink, D.; Bagui, S.C.; Subramaniam, S. Resampling to Classify Rare Attack Tactics in UWF-ZeekData22. Knowledge 2024, 4, 96–119.

  7. [7]

    Parrales-Bravo, F.; Caicedo-Quiroz, R.; Tolozano-Benitez, E.; Gómez-Rodríguez, V.; Cevallos-Torres, L.; Charco-Aguirre, J.; Vasquez-Cevallos, L. OUCH: Oversampling and undersampling cannot help improve accuracy in our bayesian classifiers that predict preeclampsia. Mathematics 2024, 12, 3351.

  8. [8]

    Goodfellow, I.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; Bengio, Y. Generative adversarial nets. Adv. Neural Inf. Process. Syst. 2014, 27.

  9. [9]

    Kingma, D.P.; Welling, M. Auto-Encoding Variational Bayes. arXiv 2022, arXiv:1312.6114.

  10. [10]

    Parmanto, B.; Munro, P.; Doyle, H. Improving committee diagnosis with resampling techniques. Adv. Neural Inf. Process. Syst. 1995, 8.

    A PREPRINT - APRIL 30, 2026

  11. [11]

    Salehi, A.R.; Khedmati, M. A cluster-based SMOTE both-sampling (CSBBoost) ensemble algorithm for classifying imbalanced data. Sci. Rep. 2024, 14, 5152.

  12. [12]

    Hou, G.; Tong, D.L.; Liew, S.Y.; Choo, P.Y. Comparative analysis of resampling techniques for class imbalance in financial distress prediction using XGBOOST. Mathematics 2025, 13, 2186.

  13. [13]

    Page, M.J.; McKenzie, J.E.; Bossuyt, P.M.; Boutron, I.; Hoffmann, T.C.; Mulrow, C.D.; Shamseer, L.; Tetzlaff, J.M.; Akl, E.A.; Brennan, S.E.; et al. The PRISMA 2020 statement: An updated guideline for reporting systematic reviews. BMJ 2021, 372, n71. https://doi.org/10.1136/bmj.n71

  14. [14]

    Yousefimehr, B.; Ghatee, M.; Ghaffari, S.; Arasteh, A.; Ahmadi, P.; Ghane, A.; Esnaasharieh, S. A systematic review for 2019–2025 on deep learning models in the film production industry. Entertain. Comput. 2026, 56, 101076.

  15. [15]

    Kendall, M.G. A new measure of rank correlation. Biometrika 1938, 30, 81–93.

  16. [16]

    Cinelli, M.; Kadziński, M.; Gonzalez, M.; Słowiński, R. How to support the application of multiple criteria decision analysis? Let us start with a comprehensive taxonomy. Omega 2020, 96, 102261.

  17. [17]

    Saltelli, A.; Ratto, M.; Andres, T.; Campolongo, F.; Cariboni, J.; Gatelli, D.; Saisana, M.; Tarantola, S. Global Sensitivity Analysis: The Primer; John Wiley & Sons: Hoboken, NJ, USA, 2008.

  18. [18]

    Azhar, N.A.; Pozi, M.S.M.; Din, A.M.; Jatowt, A. An investigation of SMOTE based methods for imbalanced datasets with data complexity analysis. IEEE Trans. Knowl. Data Eng. 2023, 35, 6651–6672. https://doi.org/10.1109/TKDE.2022.3179381

  19. [19]

    Sakho, A.; Malherbe, E.; Scornet, E. Do we need rebalancing strategies? A theoretical and empirical study around SMOTE and its variants. arXiv 2026, arXiv:2402.03819.

  20. [20]

    Glazkova, A. A Comparison of Synthetic Oversampling Methods for Multi-class Text Classification. arXiv 2020, arXiv:2008.04636.

  21. [21]

    Kachan, O.; Savchenko, A.; Gusev, G. Simplicial SMOTE: Oversampling Solution to the Imbalanced Learning Problem. In Proceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Toronto, ON, Canada, 3–7 August 2025; pp. 625–635.

  22. [22]

    Kovács, G. An empirical comparison and evaluation of minority oversampling techniques on a large number of imbalanced datasets. Appl. Soft Comput. 2019, 83, 105662.

  23. [23]

    Nekooeimehr, I.; Lai-Yuen, S.K. Adaptive semi-unsupervised weighted oversampling (A-SUWO) for imbalanced datasets. Expert Syst. Appl. 2016, 46, 405–416.

  24. [24]

    He, H.; Bai, Y.; Garcia, E.A.; Li, S. ADASYN: Adaptive synthetic sampling approach for imbalanced learning. In Proceedings of the 2008 IEEE International Joint Conference on Neural Networks (IEEE World Congress on Computational Intelligence), Hong Kong, China, 1–6 June 2008; pp. 1322–1328.

  25. [25]

    Tao, X.; Guo, X.; Zheng, Y.; Zhang, X.; Chen, Z. Self-adaptive oversampling method based on the complexity of minority data in imbalanced datasets classification. Knowl.-Based Syst. 2023, 277, 110795.

  26. [26]

    Wang, X.; Xu, J.; Zeng, T.; Jing, L. Local distribution-based adaptive minority oversampling for imbalanced data classification. Neurocomputing 2021, 422, 200–213.

  27. [27]

    Guan, S.; Zhao, X.; Xue, Y.; Pan, H. AWGAN: An adaptive weighting GAN approach for oversampling imbalanced datasets. Inf. Sci. 2024, 663, 120311.

  28. [28]

    Ren, S.; Ding, J.; Cheung, Y.m. Diffusion GAN-Based Oversampling for Imbalanced Tabular Data. IEEE Trans. Knowl. Data Eng. 2026, 38, 983–996.

  29. [29]

    Wang, X.; Wang, C.; Wang, M.; Liu, J.; Guan, X. B2BGAN: A Backbone-to-Branches GAN-Based Oversampling Approach for Class-Imbalanced Tabular Data. IEEE Trans. Knowl. Data Eng. 2025, 37, 5808–5822. https://doi.org/10.1109/TKDE.2025.3593637

  30. [30]

    Dablain, D.; Krawczyk, B.; Chawla, N.V. DeepSMOTE: Fusing Deep Learning and SMOTE for Imbalanced Data. IEEE Trans. Neural Netw. Learn. Syst. 2023, 34, 6390–6404. https://doi.org/10.1109/TNNLS.2021.3136503

  31. [31]

    Engelmann, J.; Lessmann, S. Conditional Wasserstein GAN-based oversampling of tabular data for imbalanced learning. Expert Syst. Appl. 2021, 174, 114582.

  32. [32]

    Wang, A.X.; Le, M.Q.; Duong, H.T.; Van, B.N.; Nguyen, B.P. CTVAE: Contrastive Tabular Variational Autoencoder for imbalance data. Knowl. Inf. Syst. 2025, 67, 5335–5354.

  33. [33]

    Seiffert, C.; Khoshgoftaar, T.M.; Van Hulse, J.; Napolitano, A. RUSBoost: A hybrid approach to alleviating class imbalance. IEEE Trans. Syst. Man Cybern. Part A Syst. Hum. 2009, 40, 185–197.

  34. [34]

    Gurcan, F.; Soylu, A. Learning from imbalanced data: Integration of advanced resampling techniques and machine learning models for enhanced cancer diagnosis and prognosis. Cancers 2024, 16, 3417.

  35. [35]

    Galar, M.; Fernandez, A.; Barrenechea, E.; Bustince, H.; Herrera, F. A review on ensembles for the class imbalance problem: Bagging-, boosting-, and hybrid-based approaches. IEEE Trans. Syst. Man Cybern. Part C (Appl. Rev.) 2011, 42, 463–484.

  36. [36]

    Galar, M.; Fernandez, A.; Barrenechea, E.; Bustince, H.; Herrera, F. EUSBoost: Enhancing ensembles for highly imbalanced data-sets by evolutionary undersampling. Pattern Recognit. 2013, 46, 3460–3472. https://doi.org/10.1016/j.patcog.2013.05.006

  37. [37]

    Bland, J.M.; Altman, D.G. Measuring agreement in method comparison studies. Stat. Methods Med. Res. 1999, 8, 135–160.

  38. [38]

    Douzas, G.; Bacao, F.; Last, F. Improving imbalanced learning through a heuristic oversampling method based on k-means and SMOTE. Inf. Sci. 2018, 465, 1–20. https://doi.org/10.1016/j.ins.2018.06.056

  39. [39]

    Bunkhumpornpat, C.; Sinapiromsaran, K.; Lursinsap, C. Safe-level-smote: Safe-level-synthetic minority over-sampling technique for handling the class imbalanced problem. In Proceedings of the Pacific-Asia Conference on Knowledge Discovery and Data Mining, Bangkok, Thailand, 27–30 April 2009; Springer: Berlin/Heidelberg, Germany, 2009; pp. 475–482.

  40. [40]

    García, V.; Sánchez, J.; Martín Félez, R.; Mollineda, R. Surrounding neighborhood-based SMOTE for learning from imbalanced data sets. Prog. Artif. Intell. 2012, 1, 347–362. https://doi.org/10.1007/s13748-012-0027-5

  41. [41]

    Chaudhuri, B. A new definition of neighborhood of a point in multi-dimensional space. Pattern Recognit. Lett. 1996, 17, 11–17. https://doi.org/10.1016/0167-8655(95)00093-3

  42. [42]

    Sánchez, J.; Marqués, A. Enhanced Neighbourhood Specifications for Pattern Classification; Springer: Berlin/Heidelberg, Germany, 2003; pp. 673–702. https://doi.org/10.1007/978-1-4613-0231-5_26

  43. [43]

    Azizian, A.; Yousefimehr, B.; Ghatee, M. Enhanced Multi-Modal Gas Leakage Detection with NSMOTE: A Novel Over-sampling Approach. In Proceedings of the 2024 8th International Conference on Smart Cities, Internet of Things and Applications (SCIoT), Mashhad, Iran, 14–15 May 2024; pp. 94–99. https://doi.org/10.1109/SCIoT62588.2024.10570108

  44. [44]

    Hu, S.; Liang, Y.; Ma, L.; He, Y. MSMOTE: Improving Classification Performance When Training Data is Imbalanced. In Proceedings of the 2009 Second International Workshop on Computer Science and Engineering, Qingdao, China, 28–30 October 2009; Volume 2, pp. 13–17. https://doi.org/10.1109/WCSE.2009.756

  45. [45]

    Han, H.; Wang, W.; Mao, B. Borderline-SMOTE: A New Over-Sampling Method in Imbalanced Data Sets Learning. In Proceedings of the International Conference on Intelligent Computing, Hefei, China, 23–26 August 2005.

  46. [46]

    Zhang, A.; Yu, H.; Zhou, S.; Huan, Z.; Yang, X. Instance weighted SMOTE by indirectly exploring the data distribution. Knowl.-Based Syst. 2022, 249, 108919. https://doi.org/10.1016/j.knosys.2022.108919

  47. [47]

    Wang, J.B.; Zou, C.A. AWSMOTE: An SVM-Based Adaptive Weighted SMOTE for Class-Imbalance Learning. Sci. Program. 2021, 2021, 1–18. https://doi.org/10.1155/2021/9947621

  48. [48]

    Yi, X.; Xu, Y.; Hu, Q.; Krishnamoorthy, S.; Li, W.; Tang, Z. ASN-SMOTE: A synthetic minority oversampling method with adaptive qualified synthesizer selection. Complex Intell. Syst. 2022, 8, 2247–2272.

  49. [49]

    Hussein, A.S.; Li, T.; Yohannese, C.W.; Bashir, K. A-SMOTE: A new preprocessing approach for highly imbalanced datasets by improving SMOTE. Int. J. Comput. Intell. Syst. 2019, 12, 1412–1422.

  50. [50]

    Sáez, J.A.; Luengo, J.; Stefanowski, J.; Herrera, F. SMOTE–IPF: Addressing the noisy and borderline examples problem in imbalanced classification by a re-sampling method with filtering. Inf. Sci. 2015, 291, 184–203. https://doi.org/10.1016/j.ins.2014.08.051

  51. [51]

    Napierała, K.; Stefanowski, J.; Wilk, S. Learning from Imbalanced Data in Presence of Noisy and Borderline Examples. In Proceedings of the Rough Sets and Current Trends in Computing; Szczuka, M., Kryszkiewicz, M., Ramanna, S., Jensen, R., Hu, Q., Eds.; Springer: Berlin/Heidelberg, Germany, 2010; pp. 158–167.

  52. [52]

    Maldonado, S.; Vairetti, C.; Fernandez, A.; Herrera, F. FW-SMOTE: A feature-weighted oversampling approach for imbalanced classification. Pattern Recognit. 2022, 124, 108511.

  53. [53]

    Maldonado, S.; Merigó, J.; Miranda, J. IOWA-SVM: A Density-Based Weighting Strategy for SVM Classification via OWA Operators. IEEE Trans. Fuzzy Syst. 2020, 28, 2143–2150. https://doi.org/10.1109/TFUZZ.2019.2930942

  54. [54]

    Meng, D.; Li, Y. An imbalanced learning method by combining SMOTE with Center Offset Factor. Appl. Soft Comput. 2022, 120, 108618.

  55. [55]

    Salunkhe, U.R.; Mali, S.N. A hybrid approach for class imbalance problem in customer churn prediction: A novel extension to under-sampling. Int. J. Intell. Syst. Appl. 2018, 10, 71.

  56. [56]

    Akbani, R.; Kwek, S.; Japkowicz, N. Applying support vector machines to imbalanced datasets. In Proceedings of the Machine Learning: ECML 2004: 15th European Conference on Machine Learning, Pisa, Italy, 20–24 September 2004; Proceedings 15; Springer: Berlin/Heidelberg, Germany, 2004; pp. 39–50.

  57. [57]

    Cortes, C.; Vapnik, V. Support-vector networks. Mach. Learn. 1995, 20, 273–297. https://doi.org/10.1007/BF00994018

  58. [58]

    Ibrahim, M.H. ODBOT: Outlier detection-based oversampling technique for imbalanced datasets learning. Neural Comput. Appl. 2021, 33, 15781–15806. https://doi.org/10.1007/s00521-021-06198-x

  59. [59]

    Barua, S.; Islam, M.M.; Yao, X.; Murase, K. MWMOTE–majority weighted minority oversampling technique for imbalanced data set learning. IEEE Trans. Knowl. Data Eng. 2012, 26, 405–425.

  60. [60]

    Yang, X.; Kuang, Q.; Zhang, W.; Zhang, G. AMDO: An Over-Sampling Technique for Multi-Class Imbalanced Problems. IEEE Trans. Knowl. Data Eng. 2018, 30, 1672–1685. https://doi.org/10.1109/TKDE.2017.2761347

  61. [61]

    Ikotun, A.M.; Ezugwu, A.E.; Abualigah, L.; Abuhaija, B.; Heming, J. K-means clustering algorithms: A comprehensive review, variants analysis, and advances in the era of big data. Inf. Sci. 2023, 622, 178–210. https://doi.org/10.1016/j.ins.2022.11.139

  62. [62]

    Arjovsky, M.; Chintala, S.; Bottou, L. Wasserstein generative adversarial networks. In Proceedings of the International Conference on Machine Learning, PMLR, Sydney, NSW, Australia, 6–11 August 2017; pp. 214–223.

  63. [63]

    Mirza, M.; Osindero, S. Conditional generative adversarial nets. arXiv 2014, arXiv:1411.1784.

  64. [64]

    Ahmadian, R.; Ghatee, M.; Wahlström, J. Discrete wavelet transform for generative adversarial network to identify drivers using gyroscope and accelerometer sensors. IEEE Sens. J. 2022, 22, 6879–6886.

  65. [65]

    Li, Y.; Swersky, K.; Zemel, R. Generative moment matching networks. In Proceedings of the International Conference on Machine Learning, PMLR, Lille, France, 6–11 July 2015; pp. 1718–1727.

  66. [66]

    Patki, N.; Wedge, R.; Veeramachaneni, K. The synthetic data vault. In Proceedings of the 2016 IEEE International Conference on Data Science and Advanced Analytics (DSAA), Montreal, QC, Canada, 17–19 October 2016; pp. 399–410.

  67. [67]

    Naglik, I.; Lango, M. GMMSampling: A new model-based, data difficulty-driven resampling method for multi-class imbalanced data. Mach. Learn. 2024, 113, 5183–5202. https://doi.org/10.1007/s10994-023-06416-8

  68. [68]

    Ho, J.; Jain, A.; Abbeel, P. Denoising diffusion probabilistic models. Adv. Neural Inf. Process. Syst. 2020, 33, 6840–6851.

  69. [69]

    Suh, S.; Lee, H.; Lukowicz, P.; Lee, Y.O. CEGAN: Classification Enhancement Generative Adversarial Networks for unraveling data imbalance problems. Neural Netw. 2021, 133, 69–86.

  70. [70]

    Wu, J.; Li, W.; Wu, Y.; Qiu, S. Wasserstein generative adversarial network with gradient penalty for handwritten digit generation. In Proceedings of the 2024 International Conference on Intelligent Robotics and Automatic Control (IRAC), Guangzhou, China, 29 November–1 December 2024; pp. 375–379.

  71. [71]

    Huang, K.; Wang, X. ADA-INCVAE: Improved data generation using variational autoencoder for imbalanced classification. Appl. Intell. 2022, 52, 2838–2853.

  72. [72]

    Cremer, C.; Li, X.; Duvenaud, D. Inference suboptimality in variational autoencoders. In Proceedings of the International Conference on Machine Learning, PMLR, Stockholm, Sweden, 10–15 July 2018; pp. 1078–1086.

  73. [73]

    Geng, C.; Wang, J.; Chen, L.; Gao, Z. Solving the reconstruction-generation trade-off: Generative model with implicit embedding learning. Neurocomputing 2023, 549, 126428.

  74. [74]

    Gulrajani, I.; Ahmed, F.; Arjovsky, M.; Dumoulin, V.; Courville, A.C. Improved training of wasserstein gans. Adv. Neural Inf. Process. Syst. 2017, 30.

  75. [75]

    Xu, Y.; Zhang, X.; Qiu, Z.; Zhang, X.; Qiu, J.; Zhang, H. Oversampling imbalanced data based on convergent WGAN for network threat detection. Secur. Commun. Netw. 2021, 2021, 9206440.

  76. [76]

    Bharti, A.; Naslidnyk, M.; Key, O.; Kaski, S.; Briol, F.X. Optimally-weighted estimators of the maximum mean discrepancy for likelihood-free inference. In Proceedings of the International Conference on Machine Learning, PMLR, Honolulu, HI, USA, 23–29 July 2023; pp. 2289–2312.

  77. [77]

    Yen, S.J.; Lee, Y.S. Cluster-based under-sampling approaches for imbalanced data distributions. Expert Syst. Appl. 2009, 36, 5718–5727. https://doi.org/10.1016/j.eswa.2008.06.108

  78. [78]

    Zhang, J.; Mani, I.; Lin, K. Nearmiss under sampling for imbalanced dataset classification. In Proceedings of the ICML Workshop on Learning from Imbalanced Datasets II, Washington, DC, USA, 21 July 2003.

  79. [79]

    Yen, S.; Lee, Y. Under-sampling approaches for improving prediction of the minority class in an imbalanced dataset. Lect. Notes Control Inf. Sci. 2006, 344, 731.

  80. [80]

    Smith, M.R.; Martinez, T.; Giraud-Carrier, C. An instance level analysis of data complexity. Mach. Learn. 2014, 95, 225–256.

Showing first 80 references.