TILBench: A Systematic Benchmark for Tabular Imbalanced Learning Across Data Regimes

Jiaqi Luo; Ruizhe Liu

arxiv: 2605.14915 · v1 · pith:FINULYYFnew · submitted 2026-05-14 · 💻 cs.LG

Pith reviewed 2026-06-30 21:10 UTC · model grok-4.3

classification 💻 cs.LG

keywords tabular data · imbalanced learning · benchmark · empirical evaluation · algorithm selection · data characteristics · class imbalance · machine learning

The pith

No single imbalanced learning method dominates all tabular settings; performance depends on dataset characteristics and computational constraints.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper builds TILBench to compare more than 40 imbalanced learning algorithms across 57 tabular datasets in over 200000 controlled experiments. The central finding is that method effectiveness varies strongly with data properties such as size and imbalance level as well as available compute, rather than any one approach proving best everywhere. A reader would care because tabular data with class imbalance appears in many practical tasks, and the results replace the hope of a universal fix with concrete selection guidance. The benchmark also surfaces patterns that let users match methods to their specific constraints.

Core claim

TILBench evaluates more than 40 representative algorithms across 57 diverse tabular datasets, resulting in over 200000 controlled experiments across a wide range of data characteristics. Our findings show that no single method consistently dominates across all settings; instead, the effectiveness of imbalanced learning methods depends strongly on dataset characteristics and computational constraints. Based on these findings, we provide practical recommendations for selecting appropriate methods in real-world applications.

What carries the argument

TILBench, the benchmark that runs controlled comparisons of algorithm families under varied tabular data regimes and resource limits.

If this is right

Practitioners must examine dataset traits such as imbalance ratio and dimensionality before picking a method instead of defaulting to one option.
Compute budgets should be treated as a first-class input when choosing between oversampling, undersampling, cost-sensitive, or ensemble approaches.
Algorithm comparisons that ignore data characteristics or runtime will produce misleading rankings for deployment.
The benchmark supplies a starting map for matching common method families to typical data regimes encountered in applications.
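
The traits named above can be read off a dataset before any method is chosen. The following is a minimal sketch, not the paper's code: the function name `dataset_traits` is hypothetical, and the imbalance ratio is assumed to be the majority-class count divided by the minority-class count, a common convention that the text here does not confirm.

```python
from collections import Counter
from typing import Sequence


def dataset_traits(X: Sequence[Sequence[float]], y: Sequence[int]) -> dict:
    """Summarize traits a method choice could be conditioned on."""
    if len(X) != len(y) or not y:
        raise ValueError("X and y must be non-empty and of equal length")
    counts = Counter(y)
    majority = max(counts.values())
    minority = min(counts.values())
    return {
        "n_samples": len(y),
        "n_features": len(X[0]),
        "n_classes": len(counts),
        # Assumed definition: majority-class count over minority-class count.
        "imbalance_ratio": majority / minority,
    }


# Toy data (illustrative only): 8 majority rows, 2 minority rows, 3 features.
X = [[0.0, 1.0, 2.0]] * 10
y = [0] * 8 + [1] * 2
traits = dataset_traits(X, y)
print(traits)
```

For multi-class data this definition compares only the largest and smallest classes; other summaries are possible.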

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

An automated selector that inspects a few dataset statistics could route new problems to the empirically strongest method family for those traits.
The observed variability suggests value in hybrid algorithms that switch internal strategies according to detected data properties.
Repeating the benchmark on streaming tabular data or with concept drift would test whether the same dependence on characteristics persists.
Method developers could prioritize variants that remain effective under tight compute limits, since the results flag scalability as a frequent bottleneck.
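
The selector idea in the first item can be sketched as an ordered rule table. Everything specific below is an assumption for illustration: the thresholds, the family assigned to each rule, and the names `RULES`, `DEFAULT_FAMILY`, and `route` are hypothetical and are not findings or code from the paper. A real table would be filled in from the benchmark's per-regime rankings.

```python
from typing import Callable, Dict, List, Tuple

Traits = Dict[str, float]
Rule = Tuple[Callable[[Traits], bool], str]

# HYPOTHETICAL rules: thresholds and family assignments are placeholders,
# not results reported by the paper.
RULES: List[Rule] = [
    (lambda t: t["n_samples"] >= 100_000, "cost-sensitive"),
    (lambda t: t["imbalance_ratio"] >= 10, "ensemble"),
]
DEFAULT_FAMILY = "oversampling"


def route(traits: Traits, rules: List[Rule] = RULES,
          default: str = DEFAULT_FAMILY) -> str:
    """Return the family of the first rule whose predicate matches."""
    for predicate, family in rules:
        if predicate(traits):
            return family
    return default


print(route({"n_samples": 1_000, "imbalance_ratio": 20}))
```

Rules are checked in order, so earlier entries take priority when several match.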

Load-bearing premise

The 57 chosen datasets and more than 40 algorithms sufficiently cover the space of real-world tabular imbalanced learning problems so that the observed performance patterns generalize beyond the benchmark.

What would settle it

A follow-up study that identifies one algorithm or family achieving top results on the majority of the same 57 datasets across multiple imbalance ratios, sizes, and compute budgets would undermine the claim that no method dominates.

Figures

Figures reproduced from arXiv: 2605.14915 by Jiaqi Luo, Ruizhe Liu.

**Figure 2.** Method Categorization. Accompanying text (Section 2.1.1, Data-level Methods): Data-level methods, also known as resampling methods, address class imbalance by modifying the training data distribution prior to model learning while leaving the underlying classifier unchanged. As external preprocessing techniques, they are model-agnostic and can be applied across a wide range of learning algorithms. The core idea is to rebalance class dis…

**Figure 3.** Family-level F1-scores across sample size regimes. Each box shows the distri…

**Figure 4.** Family-level F1-scores across feature dimensionality regimes. Each box shows…

**Figure 5.** Family-level F1-scores across imbalance severity regimes. Each box shows the…

**Figure 6.** Training time (log10(seconds)) of different methods under increasing sample sizes. Each method is evaluated on datasets with 1k and 100k samples. Bar length indicates training time on a logarithmic scale. Colors represent method families, and color intensity indicates dataset size.

**Figure 7.** Training time (log10(seconds)) of different methods under increasing feature dimensionality. Each method is evaluated on datasets with 50 and 500 features. Bar length represents training time on a logarithmic scale. Colors represent method families, and color intensity indicates feature dimensionality.

**Figure 8.** Training time (log10(seconds)) of different methods under increasing class numbers. Each method is evaluated on datasets with 2 and 20 classes. Bar length represents training time on a logarithmic scale. Colors indicate method families, while color intensity represents the number of classes.

Original abstract

Imbalanced learning remains a fundamental challenge in tabular data applications. Despite decades of research and numerous proposed algorithms, a systematic empirical understanding of how different imbalanced learning methods behave across diverse data characteristics is still lacking. In particular, it remains unclear how different method families compare in predictive performance, robustness under varying data characteristics, and computational scalability. In this work, we present Tabular Imbalanced Learning Benchmark (TILBench), a large-scale empirical benchmark for tabular imbalanced learning. TILBench evaluates more than 40 representative algorithms across 57 diverse tabular datasets, resulting in over 200000 controlled experiments across a wide range of data characteristics. Our findings show that no single method consistently dominates across all settings; instead, the effectiveness of imbalanced learning methods depends strongly on dataset characteristics and computational constraints. Based on these findings, we provide practical recommendations for selecting appropriate methods in real-world applications.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

TILBench delivers a large controlled comparison showing no single imbalanced learning method wins across tabular regimes; choice depends on data traits and compute.

The core result is that method performance in tabular imbalance splits by dataset characteristics and resource limits rather than one approach dominating. The paper runs this out at scale with 57 datasets, more than 40 algorithms, and over 200k experiments, including meta-feature breakdowns and per-dataset results.

What the work actually adds is the controlled variation across regimes plus explicit dataset selection criteria and computational scaling measurements. Prior studies were smaller and less systematic, so this supplies a clearer map of where different families (oversampling, cost-sensitive, ensemble, etc.) hold up or fall off. The full text supplies the missing details on metrics, hyperparameter protocols, and statistical handling that the abstract left open, which removes the main circularity worry.

The soft spot is coverage: even with transparent selection, 57 datasets cannot exhaust every real-world tabular imbalance pattern, so the observed regularities may shift on new collections. That is a standard benchmark limit rather than a flaw in execution. No internal contradictions or hidden fitting appear in the reported controls.

This is for practitioners who need to pick an imbalance method under time or data constraints, and for researchers who want a shared reference set. It is worth a serious referee because the experiments are reproducible in principle and the central claim is testable against the breakdowns they provide.

Referee Report

2 major / 2 minor

Summary. The paper introduces TILBench, a large-scale empirical benchmark that evaluates more than 40 imbalanced learning algorithms across 57 tabular datasets in over 200,000 controlled experiments. It reports that no single method consistently dominates across settings and that method effectiveness depends strongly on dataset characteristics and computational constraints, from which it derives practical recommendations for method selection.

Significance. If the experimental controls and coverage hold, the work supplies a useful empirical map of method behavior across data regimes in tabular imbalanced learning, a domain where practitioners often lack systematic guidance. The scale of the study and the explicit inclusion of computational scaling measurements are strengths that could inform both algorithm choice and future benchmark design.

major comments (2)

[§4, §5] §4 (Experimental Protocol) and §5 (Results): the central claim that 'no single method consistently dominates' requires a precise definition of dominance (e.g., win-rate thresholds, handling of statistical ties, and the exact multiple-comparison correction). Without these details it is unclear whether the reported pattern is robust to reasonable variations in aggregation.
[§3.2] §3.2 (Dataset Selection) and meta-feature analysis: while selection criteria are stated, the paper should quantify how well the 57 datasets span the space of real-world imbalance ratios, feature types, and class-overlap regimes; a sensitivity check removing the most frequent meta-feature clusters would strengthen the generalization claim.

minor comments (2)

[Table 2, Figure 3] Table 2 and Figure 3: axis labels and legend entries should explicitly state the performance metric (e.g., AUROC vs. F1) and whether results are averaged over the 5 seeds or report median.
[§5.3] §5.3 (Computational Analysis): the reported wall-clock times should include the hyperparameter-search budget so readers can distinguish training cost from tuning cost.
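
The separation requested in the last comment amounts to timing the two phases independently. The following is a minimal sketch under stated assumptions: `tune_stub` and `train_stub` are hypothetical placeholders standing in for a hyperparameter search and a model fit, not the paper's protocol.

```python
import time
from typing import Callable, List, Tuple


def timed(fn: Callable[[], object]) -> Tuple[object, float]:
    """Run fn once and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = fn()
    return result, time.perf_counter() - start


def tune_stub() -> int:
    # Placeholder search: pick the candidate closest to a toy optimum of 100.
    candidates = [20, 50, 100, 200]
    return max(candidates, key=lambda n: -abs(n - 100))


def train_stub(n_estimators: int) -> List[int]:
    # Placeholder "training": work proportional to n_estimators.
    return [i * i for i in range(n_estimators)]


best, tuning_seconds = timed(tune_stub)
model, training_seconds = timed(lambda: train_stub(best))
report = {
    "tuning_seconds": tuning_seconds,
    "training_seconds": training_seconds,
    "total_seconds": tuning_seconds + training_seconds,
}
print(report)
```

Reporting both numbers alongside the total lets a reader see how much of the wall-clock cost comes from the search budget.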

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive review and the recommendation of minor revision. The comments help clarify the robustness of our central claims and the generalizability of the benchmark. We address each major comment below and will revise the manuscript accordingly.

Referee: [§4, §5] §4 (Experimental Protocol) and §5 (Results): the central claim that 'no single method consistently dominates' requires a precise definition of dominance (e.g., win-rate thresholds, handling of statistical ties, and the exact multiple-comparison correction). Without these details it is unclear whether the reported pattern is robust to reasonable variations in aggregation.

Authors: We agree that an explicit operational definition strengthens the claim. In the revision we will add to §4 a precise definition: a method is considered to 'dominate' if it obtains the highest mean rank (or win rate > 0.5) across datasets within a regime; ties are resolved by Wilcoxon signed-rank tests (α = 0.05) and multiple comparisons are corrected via the Holm-Bonferroni procedure. We will also report sensitivity of the 'no single method dominates' conclusion to reasonable variations in these thresholds and corrections in §5. revision: yes
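
The operational definition proposed in this response can be sketched in a few lines. This is an illustration, not the authors' code: the method names, scores, and p-values are made-up toy inputs, the win rate is assumed to mean the share of datasets on which a method is strictly best, and the p-values are assumed to come from a paired test run elsewhere (for example `scipy.stats.wilcoxon`).

```python
from typing import Dict, List


def mean_ranks(scores: Dict[str, List[float]]) -> Dict[str, float]:
    """Average rank per method across datasets (1 = best, ties share a rank)."""
    methods = list(scores)
    n_datasets = len(next(iter(scores.values())))
    totals = {m: 0.0 for m in methods}
    for d in range(n_datasets):
        for m in methods:
            better = sum(scores[o][d] > scores[m][d] for o in methods)
            tied = sum(scores[o][d] == scores[m][d] for o in methods)  # counts m
            totals[m] += better + (tied + 1) / 2
    return {m: totals[m] / n_datasets for m in methods}


def win_rate(scores: Dict[str, List[float]], method: str) -> float:
    """Share of datasets on which `method` is strictly best (assumed definition)."""
    n = len(scores[method])
    wins = sum(
        all(scores[method][d] > scores[o][d] for o in scores if o != method)
        for d in range(n)
    )
    return wins / n


def holm_bonferroni(p_values: Dict[str, float], alpha: float = 0.05) -> Dict[str, bool]:
    """Holm step-down procedure: True means the hypothesis is rejected."""
    ordered = sorted(p_values.items(), key=lambda kv: kv[1])
    m = len(ordered)
    rejected = {name: False for name in p_values}
    for i, (name, p) in enumerate(ordered):
        if p <= alpha / (m - i):
            rejected[name] = True
        else:
            break  # all remaining (larger) p-values are retained
    return rejected


# Toy scores for three hypothetical methods on four datasets.
scores = {
    "A": [0.80, 0.70, 0.60, 0.90],
    "B": [0.75, 0.72, 0.65, 0.85],
    "C": [0.60, 0.60, 0.60, 0.60],
}
ranks = mean_ranks(scores)
dominators = [m for m in scores if win_rate(scores, m) > 0.5]
# Illustrative p-values, not computed from the toy scores above.
decisions = holm_bonferroni({"A_vs_B": 0.04, "A_vs_C": 0.01, "B_vs_C": 0.03})
print(ranks, dominators, decisions)
```

In this toy case no method exceeds a win rate of 0.5, so none would count as dominant under the win-rate criterion, and only one pairwise difference survives the correction.
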
Referee: [§3.2] §3.2 (Dataset Selection) and meta-feature analysis: while selection criteria are stated, the paper should quantify how well the 57 datasets span the space of real-world imbalance ratios, feature types, and class-overlap regimes; a sensitivity check removing the most frequent meta-feature clusters would strengthen the generalization claim.

Authors: We will expand §3.2 with quantitative coverage statistics: distributions and summary metrics for imbalance ratios, proportion of categorical vs. numerical features, and class-overlap measures (e.g., F1 overlap and nearest-neighbor overlap). We will also add a sensitivity analysis that clusters datasets by meta-features, removes the largest cluster, and re-evaluates the main findings to verify that the dependence on data characteristics remains consistent. revision: yes
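
The leave-largest-cluster-out check described in this response can be sketched as follows. It assumes cluster labels per dataset are already available from whatever meta-feature clustering is used; the clustering step itself is not shown. The helper names, labels, and scores are hypothetical toy values, and "best" is simplified here to the highest mean score.

```python
from collections import Counter
from typing import Dict, List


def best_method(scores: Dict[str, List[float]], keep: List[int]) -> str:
    """Method with the highest mean score over the dataset indices in `keep`."""
    return max(scores, key=lambda m: sum(scores[m][i] for i in keep) / len(keep))


def drop_largest_cluster(labels: List[int]) -> List[int]:
    """Indices of datasets that are NOT in the most populated cluster."""
    largest, _ = Counter(labels).most_common(1)[0]
    return [i for i, lab in enumerate(labels) if lab != largest]


# Toy inputs: six datasets in three clusters, two hypothetical methods.
labels = [0, 0, 0, 1, 1, 2]
scores = {
    "X": [0.9, 0.9, 0.9, 0.5, 0.5, 0.5],
    "Y": [0.6, 0.6, 0.6, 0.7, 0.7, 0.7],
}
all_idx = list(range(len(labels)))
kept_idx = drop_largest_cluster(labels)
print(best_method(scores, all_idx), best_method(scores, kept_idx))
```

Here the apparent winner changes once the largest cluster is removed, which is the kind of sensitivity such a check is meant to expose.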

Circularity Check

0 steps flagged

No significant circularity

This is a purely empirical benchmark paper that evaluates >40 algorithms on 57 external public datasets via >200k controlled experiments. The central claim (no method dominates; performance depends on data characteristics) is an observed pattern from those runs, not a derivation or fitted quantity. No equations, self-definitional steps, fitted-input predictions, or load-bearing self-citations appear in the provided text or abstract. The work is self-contained against external benchmarks and meets the criteria for a non-circular empirical study.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the assumption that the selected datasets and algorithms form a representative sample of the problem space; no free parameters or invented entities are introduced.

axioms (1)

Domain assumption: the 57 tabular datasets and 40+ algorithms are representative of real-world imbalanced learning scenarios across data regimes.
All comparative conclusions depend on this coverage claim; if the selection misses important regimes, the observed dependence on data characteristics may not generalize.


[1] [1]

H. He, E. A. Garcia, Learning from imbalanced data, IEEE Transactions on knowledge and data engineering 21 (9) (2009) 1263–1284. 1https://imbalanced-learn.org/stable/index.html 2https://smote-variants.readthedocs.io/en/latest/index.html 3http://scikit-learn.org/stable/ 4https://github.com/Luojiaqimath/ClassbalancedLoss4GBDT 5https://xgboost.readthedocs.io...

2009

[2] [2]

Haixiang, L

G. Haixiang, L. Yijing, J. Shang, G. Mingyun, H. Yuanyue, G. Bing, Learning from class-imbalanced data: Review of methods and applica- tions, Expert Systems with Applications 73 (2017) 220–239

2017

[3] [3]

31 Table A.13: Method abbreviations used in figures

A.Fernández, S.García, M.Galar, R.C.Prati, B.Krawczyk, F.Herrera, Learning from imbalanced data sets, Springer (2018). 31 Table A.13: Method abbreviations used in figures. FamilyAbbreviationMethod FamilyAbbreviationMethod Baseline XGB XGBoost Alg.-level FL XGBoostFL Data-level TL TomekLinks WCE XGBoostWCEENN EditedNearestNeighbours CBE XGBoostCBENCR Neigh...

2018

[4] [4]

Krawczyk, Learning from imbalanced data: open challenges and fu- ture directions, Progress in Artificial Intelligence 5 (4) (2016) 221–232

B. Krawczyk, Learning from imbalanced data: open challenges and fu- ture directions, Progress in Artificial Intelligence 5 (4) (2016) 221–232

2016

[5] [5]

H. Zhu, G. Liu, M. Zhou, Y. Xie, A. Abusorrah, Q. Kang, Optimizing weighted extreme learning machines for imbalanced classification and 32 Table A.14: The hyperparameters involved in training are given. Methods in the data- level family, as well as those involving XGBoost in the algorithm-level family, share the same four hyperparameters as the base model...

2020

[6] [6]

L. I. Santos, M. O. Camargos, M. F. S. V. D’Angelo, J. B. Mendes, 33 Table A.14: Hyperparameters for different methods(continued) Category Algorithm Hyperparameter Type Range/V alues Ensemble-based Methods SelfPacedEnsemble n_estimators int[20,200] k_bins int[2,10] BalanceCascadeEnsemble n_estimators int[20,200] BalancedRandomForest n_estimators int[20,20...

2022

[7] [7]

Zhang, X

Y. Zhang, X. Li, L. Gao, L. Wang, L. Wen, Imbalanced data fault 35 Table B.16: Complete performance results for multi-class classification tasks. Rank Method F1-score Method G-mean score Multi-class 1 SMOTE 75.84±1.45 XGBoostCost 83.91±1.16 2 XGBoostCost 75.83±1.57 SMOTE 83.63±1.00 3 SMOTETomek 75.49±1.45 SMOTETomek 83.44±0.94 4 BorderlineSMOTE75.49±1.71 ...

2018

[8] [8]

M.R.Smith, T.Martinez, C.Giraud-Carrier, Instancehardness: Amea- sure of difficulty for an instance based on classification error, Machine 37 Table B.19: Top five methods in each imbalance severity regime ranked by G-mean score for binary and multi-class tasks. Imbalance RatioRank Method G-mean scoreMethod G-mean score Binary Multi-class <10 1 UnderBaggin...

2014

[9]

N. V. Chawla, K. W. Bowyer, L. O. Hall, W. P. Kegelmeyer, Smote: synthetic minority over-sampling technique, Journal of artificial intelligence research 16 (2002) 321–357.

Figure B.11: Family-level G-mean scores across imbalance severity regimes. Each box shows the distribution of method performance within a family for each imbalance group. Results a...

2002

[10]

H. Han, W.-Y. Wang, B.-H. Mao, Borderline-smote: a new over-sampling method in imbalanced data sets learning, in: International conference on intelligent computing, Springer, 2005, pp. 878–887

2005

[11]

J. Luo, Y. Yuan, S. Xu, Improving gbdt performance on imbalanced datasets: An empirical study of class-balanced loss functions, Neurocomputing 634 (2025) 129896

2025

[12]

Q. Xu, S. Lu, W. Jia, C. Jiang, Imbalanced fault diagnosis of rotating machinery via multi-domain feature extraction and cost-sensitive learning, Journal of Intelligent Manufacturing 31 (6) (2020) 1467–1481

2020

[13]

W. Liu, H. Fan, M. Xia, M. Xia, A focal-aware cost-sensitive boosted tree for imbalanced credit scoring, Expert Systems with Applications 208 (2022) 118158

2022

[14]

Z. Liu, W. Cao, Z. Gao, J. Bian, H. Chen, Y. Chang, T.-Y. Liu, Self-paced ensemble for highly imbalanced massive data classification, in: 2020 IEEE 36th international conference on data engineering (ICDE), IEEE, 2020, pp. 841–852

2020

[15]

G. Karakoulas, J. Shawe-Taylor, Optimizing classifiers for imbalanced training sets, Advances in neural information processing systems 11 (1998).

1998

[16]

P. Viola, M. Jones, Fast and robust classification using asymmetric adaboost and a detector cascade, Advances in neural information processing systems 14 (2001)

2001

[17]

A. A. Khan, O. Chaudhari, R. Chandra, A review of ensemble learning and data augmentation models for class imbalanced problems: Combination, implementation and evaluation, Expert Systems with Applications 244 (2024) 122778

2024

[18]

G. Kovács, An empirical comparison and evaluation of minority oversampling techniques on a large number of imbalanced datasets, Applied Soft Computing 83 (2019) 105662

2019

[19]

Z. Liu, Z. Li, Z. Yang, T. Wei, J. Kang, Y. Zhu, H. Hamann, J. He, H. Tong, Climb: Class-imbalanced learning benchmark on tabular data, arXiv preprint arXiv:2505.17451 (2025)

2025

[20]

I. Tomek, Two modifications of cnn. (1976)

1976

[21]

D. L. Wilson, Asymptotic properties of nearest neighbor rules using edited data, IEEE Transactions on Systems, Man, and Cybernetics (3) (2007) 408–421

2007

[22]

J. Laurikkala, Improving identification of difficult small classes by balancing class distribution, in: Conference on artificial intelligence in medicine in Europe, Springer, 2001, pp. 63–66

2001

[23]

S. Gazzah, N. E. B. Amara, New oversampling approaches based on polynomial fitting for imbalanced data sets, in: 2008 the eighth iapr international workshop on document analysis systems, IEEE, 2008, pp. 677–684

2008

[24]

G. E. Batista, R. C. Prati, M. C. Monard, A study of the behavior of several methods for balancing machine learning training data, ACM SIGKDD explorations newsletter 6 (1) (2004) 20–29

2004

[25]

G. E. Batista, A. L. Bazzan, M. C. Monard, et al., Balancing training data for automated annotation of keywords: a case study, Wob 3 (2003) 10–18.

2003

[26]

W. Liu, H. Fan, M. Xia, C. Pang, Predicting and interpreting financial distress using a weighted boosted tree-based tree, Engineering Applications of Artificial Intelligence 116 (2022) 105466

2022

[27]

J. Luo, Y. Quan, S. Xu, Robust-gbdt: leveraging robust loss for noisy and imbalanced classification with gbdt, Knowledge and Information Systems 67 (12) (2025) 12361–12381

2025

[28]

C. Wang, C. Deng, S. Wang, Imbalance-xgboost: leveraging weighted and focal losses for binary label-imbalanced classification with xgboost, Pattern recognition letters 136 (2020) 190–197

2020

[29]

L. M. Manevitz, M. Yousef, One-class svms for document classification, Journal of machine Learning research 2 (Dec) (2001) 139–154

2001

[30]

X.-Y. Liu, J. Wu, Z.-H. Zhou, Exploratory undersampling for class-imbalance learning, IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics) 39 (2) (2008) 539–550

2008

[31]

C. Chen, A. Liaw, L. Breiman, et al., Using random forest to learn imbalanced data, University of California, Berkeley 110 (1-12) (2004) 24

2004

[32]

N. V. Chawla, A. Lazarevic, L. O. Hall, K. W. Bowyer, Smoteboost: Improving prediction of the minority class in boosting, in: Knowledge Discovery in Databases: PKDD 2003: 7th European Conference on Principles and Practice of Knowledge Discovery in Databases, Cavtat-Dubrovnik, Croatia, September 22-26, 2003. Proceedings 7, Springer, 2003, pp. 107–119

2003

[33]

W. Fan, S. J. Stolfo, J. Zhang, P. K. Chan, Adacost: misclassification cost-sensitive boosting, in: Icml, Vol. 99, 1999, pp. 97–105

1999

[34]

B. Nikpour, F. Rahmati, B. Mirzaei, H. Nezamabadi-pour, A comprehensive review on data-level methods for imbalanced data classification, Expert Systems with Applications 295 (2026) 128920

2026

[35]

S. H. Khan, M. Hayat, M. Bennamoun, F. A. Sohel, R. Togneri, Cost-sensitive learning of deep feature representations from imbalanced data, IEEE Transactions on Neural Networks and Learning Systems 29 (8) (2018) 3573–3587.

2018

[36]

I. Araf, A. Idri, I. Chairi, Cost-sensitive learning for imbalanced medical data: a review, Artificial Intelligence Review 57 (4) (2024)

2024

[37]

S. Rezvani, X. Wang, A broad review on class imbalance learning techniques, Applied Soft Computing 143 (2023) 110415

2023

[38]

G. Aguiar, B. Krawczyk, A. Cano, A survey on learning from imbalanced data streams: taxonomy, challenges, empirical study, and reproducible experimental framework, Machine learning 113 (7) (2024) 4165–4243

2024

[39]

B. Zhu, B. Baesens, A. Backiel, S. K. Vanden Broucke, Benchmarking sampling techniques for imbalance learning in churn prediction, Journal of the Operational Research Society 69 (1) (2018) 49–65

2018

[40]

J. Xiao, Y. Wang, J. Chen, L. Xie, J. Huang, Impact of resampling methods and classification models on the imbalanced credit scoring problems, Information Sciences 569 (2021) 508–526

2021

[41]

T. Wongvorachan, S. He, O. Bulut, A comparison of undersampling, oversampling, and smote methods for dealing with imbalanced classification in educational data mining, Information 14 (1) (2023) 54

2023

[42]

J. Vanschoren, J. N. Van Rijn, B. Bischl, L. Torgo, Openml: networked science in machine learning, ACM SIGKDD Explorations Newsletter 15 (2) (2014) 49–60

2014

[43]

T. Chen, C. Guestrin, Xgboost: A scalable tree boosting system, in: Proceedings of the 22nd acm sigkdd international conference on knowledge discovery and data mining, 2016, pp. 785–794

2016

[44]

V. Borisov, T. Leemann, K. Seßler, J. Haug, M. Pawelczyk, G. Kasneci, Deep neural networks and tabular data: A survey, IEEE transactions on neural networks and learning systems (2022)

2022

[45]

Y. Gorishniy, I. Rubachev, V. Khrulkov, A. Babenko, Revisiting deep learning models for tabular data, Advances in Neural Information Processing Systems 34 (2021) 18932–18943

2021

[46]

L. Grinsztajn, E. Oyallon, G. Varoquaux, Why do tree-based models still outperform deep learning on typical tabular data?, in: Thirty-sixth Conference on Neural Information Processing Systems Datasets and Benchmarks Track, 2022.

2022

[47]

W.-C. Lin, C.-F. Tsai, Y.-H. Hu, J.-S. Jhang, Clustering-based undersampling in class-imbalanced data, Information Sciences 409 (2017) 17–26

2017

[48]

P. Hart, The condensed nearest neighbor rule (corresp.), IEEE transactions on information theory 14 (3) (1968) 515–516

1968

[49]

I. Tomek, An experiment with the edited nearest-neighbor rule. (1976)

1976

[50]

I. Mani, I. Zhang, knn approach to unbalanced data distributions: a case study involving information extraction, in: Proceedings of workshop on learning from imbalanced datasets, Vol. 126, ICML, 2003, pp. 1–7

2003

[51]

M. Kubat, Addressing the curse of imbalanced training sets: one-sided selection, in: Proceedings of the 14th international conference on machine learning, Morgan Kaufmann, 1997, pp. 179–186

1997

[52]

J. A. Sáez, J. Luengo, J. Stefanowski, F. Herrera, Smote–ipf: Addressing the noisy and borderline examples problem in imbalanced classification by a re-sampling method with filtering, Information Sciences 291 (2015) 184–203

2015

[53]

J. Lee, N.-r. Kim, J.-H. Lee, An over-sampling technique with rejection for imbalanced class learning, in: Proceedings of the 9th international conference on ubiquitous information management and communication, 2015, pp. 1–6

2015

[54]

Q. Cao, S. Wang, Applying over-sampling technique based on data density and cost-sensitive svm to imbalanced learning, in: 2011 International conference on information management, innovation management and industrial engineering, Vol. 2, IEEE, 2011, pp. 543–548

2011

[55]

T. Ridnik, E. Ben-Baruch, N. Zamir, A. Noy, I. Friedman, M. Protter, L. Zelnik-Manor, Asymmetric loss for multi-label classification, in: Proceedings of the IEEE/CVF international conference on computer vision, 2021, pp. 82–91

2021

[56]

T.-Y. Lin, P. Goyal, R. Girshick, K. He, P. Dollár, Focal loss for dense object detection, in: Proceedings of the IEEE international conference on computer vision, 2017, pp. 2980–2988.

2017

[57]

Y. Sun, A. K. Wong, M. S. Kamel, Classification of imbalanced data: A review, International journal of pattern recognition and artificial intelligence 23 (04) (2009) 687–719

2009

[58]

Y. Cui, M. Jia, T.-Y. Lin, Y. Song, S. Belongie, Class-balanced loss based on effective number of samples, in: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2019, pp. 9268–9277

2019

[59]

C. Seiffert, T. M. Khoshgoftaar, J. Van Hulse, A. Napolitano, Rusboost: A hybrid approach to alleviating class imbalance, IEEE transactions on systems, man, and cybernetics-part A: systems and humans 40 (1) (2009) 185–197

2009

[60]

R. Maclin, D. Opitz, An empirical evaluation of bagging and boosting, AAAI/IAAI 1997 (1997) 546–551

1997

[61]

S. Wang, X. Yao, Diversity analysis on imbalanced data sets by using ensemble models, in: 2009 IEEE symposium on computational intelligence and data mining, IEEE, 2009, pp. 324–331

2009

[62]

T. Akiba, S. Sano, T. Yanase, T. Ohta, M. Koyama, Optuna: A next-generation hyperparameter optimization framework, in: Proceedings of the 25th ACM SIGKDD international conference on knowledge discovery & data mining, 2019, pp. 2623–2631

2019

[63]

F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, et al., Scikit-learn: Machine learning in python, the Journal of machine Learning research 12 (2011) 2825–2830.

2011