Data Balancing Strategies: A Systematic Survey of Resampling and Augmentation Methods
Pith reviewed 2026-05-22 14:33 UTC · model grok-4.3
The pith
No single data balancing method works best for every imbalanced dataset.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The paper establishes that data balancing strategies vary widely in effectiveness, with no method universally superior. Success instead depends on matching the technique to dataset characteristics such as dimensionality, feature types, class overlap, and noise, together with the classifier and chosen evaluation metrics.
What carries the argument
Systematic categorization and critical analysis of resampling, augmentation, and generative balancing methods, assessing their assumptions and suitability for varied data conditions.
If this is right
- Hybrid combinations like SMOTE with Tomek Links or ENN can reduce both imbalance and noise at once.
- Generative models including diffusion approaches enable creation of realistic minority-class samples beyond traditional interpolation.
- Ensemble strategies such as SMOTEBoost and Balanced Random Forest add robustness when paired with balancing steps.
- Specialized variants are required for multi-label or clustered imbalance settings.
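The hybrid idea in the first bullet can be sketched in a few lines. The following is a minimal, standard-library illustration, not the paper's implementation or any library's API: it oversamples the minority class by SMOTE-style interpolation and then removes Tomek links (mutual nearest neighbours with different labels). The function names `smote`, `tomek_links`, and `smote_tomek` are hypothetical, the setting is assumed to be binary with numeric features, and dropping both members of each link is one common choice among several.

```python
import math
import random


def smote(minority, n_new, k=3, rng=None):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = rng or random.Random(0)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(minority[i], minority[j]))
        j = rng.choice(others[:k])
        lam = rng.random()  # interpolation weight in [0, 1)
        synthetic.append(
            tuple(a + lam * (b - a) for a, b in zip(minority[i], minority[j]))
        )
    return synthetic


def tomek_links(X, y):
    """Return index pairs (i, j), i < j, that are mutual nearest
    neighbours and carry different labels."""
    nn = [
        min((j for j in range(len(X)) if j != i),
            key=lambda j: math.dist(X[i], X[j]))
        for i in range(len(X))
    ]
    return [(i, j) for i, j in enumerate(nn)
            if i < j and nn[j] == i and y[i] != y[j]]


def smote_tomek(X, y, minority_label, rng=None):
    """Oversample the minority class to parity (binary setting assumed),
    then drop both members of every Tomek link."""
    minority = [x for x, lab in zip(X, y) if lab == minority_label]
    n_new = max((len(X) - len(minority)) - len(minority), 0)
    new_points = smote(minority, n_new, rng=rng)
    X2 = list(X) + new_points
    y2 = list(y) + [minority_label] * len(new_points)
    drop = {i for pair in tomek_links(X2, y2) for i in pair}
    keep = [i for i in range(len(X2)) if i not in drop]
    return [X2[i] for i in keep], [y2[i] for i in keep]
```

The cleaning step runs once, so new links can in principle appear after removal. The brute-force neighbour search is quadratic in the number of samples and is only meant to make the mechanism visible.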
Where Pith is reading between the lines
- Future tools could automate balancing-method selection by profiling dataset properties first.
- Benchmarks for imbalanced learning should test methods across diverse data regimes instead of fixed suites.
- Foundation models may need balancing adaptations that preserve their pre-trained distributions.
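As a rough illustration of what "profiling dataset properties first" might involve, the sketch below computes three simple descriptors: feature count, imbalance ratio, and the share of minority samples whose nearest neighbours are mostly from other classes, used here as a crude proxy for class overlap. The function name `profile_dataset`, the choice of `k`, and the "fewer than half same-class neighbours" threshold are all assumptions made for this example. The review does not prescribe a profiling procedure.

```python
import math
from collections import Counter


def profile_dataset(X, y, k=5):
    """Summarise a labelled numeric dataset with a few imbalance descriptors.

    The overlap proxy counts minority samples for which fewer than half of
    the k nearest neighbours share the minority label (an assumed heuristic).
    """
    counts = Counter(y)
    minority_label, n_min = min(counts.items(), key=lambda kv: kv[1])
    n_maj = max(counts.values())

    unsafe = 0
    for i, (x, lab) in enumerate(zip(X, y)):
        if lab != minority_label:
            continue
        neighbours = sorted(
            (j for j in range(len(X)) if j != i),
            key=lambda j: math.dist(x, X[j]),
        )[:k]
        same = sum(1 for j in neighbours if y[j] == lab)
        if same < len(neighbours) / 2:
            unsafe += 1

    return {
        "n_samples": len(X),
        "n_features": len(X[0]),
        "minority_label": minority_label,
        "imbalance_ratio": n_maj / n_min,
        "minority_unsafe_fraction": unsafe / n_min,
    }
```

A selection tool of the kind imagined above would still need a mapping from such a profile to a balancing method. That mapping is not sketched here because the source does not supply one.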
Load-bearing premise
The review assumes its selection and grouping of methods accurately captures the current literature without major omissions or selection bias.
What would settle it
A controlled study in which one specific method, such as a diffusion-based oversampler, outperforms all others across a broad collection of datasets, classifiers, and metrics would challenge the central finding.
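The test described above amounts to asking whether one method is best, or tied for best, under every evaluated condition. A minimal sketch of that check follows, assuming results are stored as a nested mapping from method to condition to a higher-is-better score. The helper name `universal_winners` and this data layout are hypothetical. A real study would also need repeated runs and statistical testing, which this sketch omits.

```python
def universal_winners(scores):
    """Return the methods that match the best score on every condition.

    scores: {method: {condition: value}}, higher values are better.
    Only conditions reported for all methods are compared.
    """
    methods = list(scores)
    shared = set.intersection(*(set(scores[m]) for m in methods))
    winners = []
    for m in methods:
        if all(
            scores[m][c] >= max(scores[o][c] for o in methods)
            for c in shared
        ):
            winners.append(m)
    return winners


# Placeholder numbers for illustration only; they are not results
# from the paper or from any experiment.
example = {
    "method_A": {"dataset_1": 0.80, "dataset_2": 0.60},
    "method_B": {"dataset_1": 0.70, "dataset_2": 0.90},
}
print(universal_winners(example))
```

With the placeholder values, each method wins on one condition, so the returned list is empty. That is the pattern the survey's central claim predicts.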
Original abstract
Imbalanced datasets, where one class significantly outnumbers others, remain a persistent challenge in machine learning, often biasing predictions toward the majority class and degrading classifier performance. This paper provides a comprehensive, systematic review of data balancing methods, extending beyond foundational oversampling techniques such as the Synthetic Minority Oversampling Technique (SMOTE) and its variants (e.g., Borderline SMOTE, K-Means SMOTE, and Safe-Level SMOTE) to encompass advanced adaptive methods (MWMOTE, AMDO), deep generative models (generative adversarial networks, variational autoencoders, and diffusion models), undersampling techniques (NearMiss, Tomek Links), combination/hybrid methods (SMOTE-ENN, SMOTE-Tomek, and SMOTE+OCSVM), ensemble strategies (SMOTEBoost, RUSBoost, Balanced Random Forest, and One-Sided Selection), and specialized approaches for multi-label and clustered data. Beyond descriptive categorization, this review critically examines each method's underlying assumptions, operational mechanisms, and suitability for diverse data characteristics, including high dimensionality, mixed feature types, class overlap, and noise. Key findings demonstrate that no single method universally outperforms others; optimal selection depends critically on dataset characteristics, classifier choice, and evaluation metrics. The paper concludes by identifying emerging research directions, including self-supervised learning for imbalance, diffusion-based generative oversampling, distribution-preserving resampling, knowledge distillation for imbalanced deployment, and the adaptation of foundation models to skewed distributions, offering practical guidelines for practitioners and a roadmap for future methodological development.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript is a systematic survey of data balancing methods for imbalanced datasets in machine learning. It reviews foundational oversampling techniques such as SMOTE and its variants (Borderline SMOTE, K-Means SMOTE, Safe-Level SMOTE), advanced adaptive methods (MWMOTE, AMDO), deep generative models (GANs, VAEs, diffusion models), undersampling (NearMiss, Tomek Links), hybrid/combination methods (SMOTE-ENN, SMOTE-Tomek, SMOTE+OCSVM), ensemble strategies (SMOTEBoost, RUSBoost, Balanced Random Forest), and specialized approaches for multi-label and clustered data. Beyond categorization, the paper critically examines each method's assumptions, operational mechanisms, and suitability for data characteristics including high dimensionality, mixed features, class overlap, and noise. The central claim is that no single method universally outperforms others and that optimal selection depends on dataset characteristics, classifier choice, and evaluation metrics. The paper concludes with emerging research directions and practical guidelines.
Significance. If the synthesis holds, the survey provides a structured, critical overview of the imbalanced data literature that can serve as a reference for practitioners selecting methods and for researchers identifying gaps. The emphasis on context-dependence and the examination of assumptions/mechanisms for each family of techniques adds utility beyond a descriptive list. Explicit discussion of future directions such as diffusion-based oversampling and foundation-model adaptation supplies a clear roadmap.
Minor comments (3)
- The abstract lists many method families and examples; a shorter version that foregrounds the critical-examination contribution and the context-dependence finding would improve readability while retaining completeness.
- Section headings and subsection numbering should be checked for consistency when moving from the SMOTE-variants discussion to the deep-generative-models section; some readers may lose the thread between families.
- A small number of cited works on diffusion models for oversampling appear only in the future-directions paragraph; moving one or two representative references into the main generative-models section would strengthen the coverage claim.
Simulated Author's Rebuttal
We thank the referee for the positive and detailed summary of our manuscript, as well as the recommendation for minor revision. We appreciate the recognition of the survey's critical examination of assumptions, mechanisms, and context-dependent performance of balancing methods. We will incorporate minor improvements to enhance clarity and completeness in the revised version.
Circularity Check
No significant circularity in survey synthesis
Full rationale
This paper is a systematic literature survey that categorizes, describes, and critically examines existing data balancing methods drawn from external sources. It introduces no new mathematical derivations, parameter fittings, or empirical predictions whose outputs reduce by construction to its own inputs. The key finding that no single method universally outperforms others is presented as a qualitative synthesis of the reviewed literature, with explicit dependence on dataset characteristics and classifier choice already flagged in the abstract and scope. No self-definitional steps, fitted inputs relabeled as predictions, load-bearing self-citations forming closed loops, or ansatzes smuggled via prior author work are present. The argument remains self-contained against external benchmarks in the cited studies.
Axiom & Free-Parameter Ledger
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean, washburn_uniqueness_aczel (tag: unclear)
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Passage: "Key findings demonstrate that no single method universally outperforms others; optimal selection depends critically on dataset characteristics, classifier choice, and evaluation metrics."
- IndisputableMonolith/Foundation/RealityFromDistinction.lean, reality_from_one_distinction (tag: unclear)
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Passage: "The paper provides a comprehensive, systematic review of data balancing methods... extending beyond foundational oversampling techniques such as SMOTE and its variants to encompass... generative models... ensemble strategies..."
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.