arxiv: 2605.18147 · v1 · pith:CTSB7GATnew · submitted 2026-05-18 · 💻 cs.LG

Foundation Models for Credit Risk Prediction: A Game Changer?

Pith reviewed 2026-05-20 13:14 UTC · model grok-4.3

classification 💻 cs.LG
keywords: tabular foundation models, credit risk, probability of default, loss given default, machine learning, benchmarking, small data, pretraining

The pith

Tabular foundation models outperform standard machine learning techniques in credit risk prediction and deliver larger gains as the amount of training data shrinks.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tests recently introduced tabular foundation models on two central credit risk tasks: estimating the probability that a borrower defaults and the loss that would occur if default happens. These pretrained models are compared against gradient boosting, other advanced machine learning methods, and simpler baselines across multiple real-world datasets and different dataset sizes. The foundation models rank first overall and widen their lead precisely when data becomes scarce, conditions that match many practical lending portfolios. All comparisons use the models exactly as released, with no hyperparameter search or extra tuning. The pattern suggests that broad pretraining on unrelated tabular data can transfer useful structure to credit problems that suffer from limited labels and class imbalance.

Core claim

Tabular foundation models pretrained on large collections of out-of-domain tabular data achieve the highest predictive accuracy for both probability of default and loss given default across the tested datasets and tasks. Their advantage over gradient boosting and other competitors grows markedly as the number of training examples decreases, providing a direct response to the data scarcity, low default rates, and imbalance that have long complicated credit modeling.
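
The link between dataset size and relative performance can be quantified with a Spearman rank correlation, the statistic named in the captions of Figures 8 and 9. The block below is a minimal pure-Python sketch of that statistic. The dataset sizes and ranks are hypothetical, not taken from the paper, and the sign convention (rank 1 = best) is an assumption.

```python
def average_ranks(values):
    """Rank values ascending (1 = smallest); tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions are 0-based, ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman correlation = Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical example: one method's rank (1 = best) on five datasets.
dataset_sizes = [500, 2000, 8000, 30000, 120000]
method_rank = [1, 1, 2, 3, 4]
rho = spearman(dataset_sizes, method_rank)
print(round(rho, 3))
```

Under this convention a positive correlation means the method's rank number gets worse as datasets grow, so it is relatively strongest on small data.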

What carries the argument

Tabular foundation models that carry knowledge acquired during pretraining on diverse, non-credit tabular datasets into the target tasks of default probability and loss estimation.

Load-bearing premise

Pretraining on data drawn from domains unrelated to lending still supplies useful patterns that improve performance on the structured and often imbalanced tables found in credit risk.

What would settle it

On a new set of small SME or corporate lending datasets, a carefully tuned gradient boosting model would match or exceed the foundation models on standard metrics such as AUC and Brier score.
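
For readers unfamiliar with the two metrics named above, the following is a minimal pure-Python sketch of AUC (computed as the share of defaulter/non-defaulter pairs ranked correctly) and the Brier score (mean squared error of predicted probabilities). The labels and probabilities are made-up illustrations, not results from the paper.

```python
def auc_roc(y_true, y_score):
    """AUC: probability that a random defaulter scores above a random
    non-defaulter, counting ties as one half."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one defaulter and one non-defaulter")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


# Hypothetical example: 1 = default, 0 = no default.
y = [0, 0, 0, 1, 0, 1]
p = [0.05, 0.10, 0.30, 0.40, 0.60, 0.80]
print(auc_roc(y, p))      # 7 of 8 pairs are ordered correctly
print(brier_score(y, p))
```

AUC only measures ranking quality, while the Brier score also penalizes poorly calibrated probabilities, so a comparison that reports both is harder to win on a technicality.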

Figures

Figures reproduced from arXiv: 2605.18147 by Andreas Goethals, Bart Baesens, Christophe Mues, Cristián Bravo, David Martens, Maria Oskarsdóttir, Seppe vanden Broucke, Simon De Vos, Stefan Lessmann, Tim Verdonck, Victor Medina-Olivares, Wouter Verbeke.

Figure 1: Distribution of the target variable in the LGD datasets.
Figure 2 reports the average performance of classification methods in terms of AUC, across the five folds of all PD datasets. A first observation is that TabICL, one of the foundation models considered here, achieves the best performance overall. Although the observed performance differences are small in absolute terms, it is notable that, without any training or hyperparameter optimization, a TFM outperforms widely…
Figure 3: Average…
Figure 4: Probability of maximal AUC (PAMA) analysis for PD datasets.
Figure 5: Probability of maximal R² (PAMA) analysis for LGD datasets.
Figure 6: Win/Loss ratio matrix for the PD benchmark (N = 14 datasets). Cell text gives the W/L count from the row method’s perspective; cell color encodes the net win rate (W − L)/N (blue: row method wins more often, red: row method loses more often). An asterisk (*) denotes a statistically significant difference (Holm-corrected Wilcoxon, p ≤ 0.05).
Figure 7: Win/Loss ratio matrix for the LGD benchmark (N = 7 datasets). Cell text gives the W/L count from the row method’s perspective; cell color encodes the net win rate (W − L)/N (blue: row method wins more often, red: row method loses more often). An asterisk (*) denotes a statistically significant difference (Holm-corrected Wilcoxon, p ≤ 0.05).
Figure 8: Spearman rank correlation between a method’s performance rank and PD dataset size (number of observa…
Figure 9: Spearman rank correlation between a method’s performance rank and LGD dataset size (number of obser…
Figure 10: PD learning curves: average AUC across PD datasets with more than 15,000 observations as the number…
Figure 11: LGD learning curves: average R² across LGD datasets with more than 15,000 observations as the number of randomly sampled training observations increases from 500 to 15,000. Curves are shown for TabPFNv2, Linear Regression, and XGBoost.
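
The win/loss matrices described in the captions of Figures 6 and 7 can be illustrated with a short sketch. It counts, for each ordered pair of methods, on how many datasets the row method beats the column method, and derives the net win rate (W − L)/N. The function name, method names, and scores are hypothetical. Skipping ties is an assumption, since the captions do not say how ties are handled.

```python
def win_loss_matrix(scores, higher_is_better=True):
    """scores maps method name -> list of per-dataset scores (same order).
    Returns {(row, col): (wins, losses, net_win_rate)} from the row's view."""
    methods = list(scores)
    n_datasets = len(scores[methods[0]])
    if any(len(v) != n_datasets for v in scores.values()):
        raise ValueError("every method needs one score per dataset")
    result = {}
    for row in methods:
        for col in methods:
            if row == col:
                continue
            wins = losses = 0
            for a, b in zip(scores[row], scores[col]):
                if a == b:
                    continue  # assumption: ties count as neither win nor loss
                if (a > b) == higher_is_better:
                    wins += 1
                else:
                    losses += 1
            result[(row, col)] = (wins, losses, (wins - losses) / n_datasets)
    return result


# Hypothetical AUC values on four datasets (not from the paper).
scores = {
    "TFM": [0.81, 0.77, 0.90, 0.68],
    "GBM": [0.80, 0.78, 0.88, 0.68],
    "LogReg": [0.76, 0.74, 0.85, 0.66],
}
matrix = win_loss_matrix(scores)
print(matrix[("TFM", "GBM")])  # (wins, losses, net win rate) for TFM vs GBM
```

For error metrics where lower is better, pass `higher_is_better=False`.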
Original abstract

Predictive models play a pivotal role in credit risk management, guiding critical decisions through accurate estimation of default probabilities and losses. Extensive research has introduced new modeling techniques, complemented by large-scale benchmarking studies consolidating the state-of-the-art. Today, quasi-standards such as gradient-boosting models paired with SHAP explainers have emerged, yet continuous improvement of risk models remains a top priority. Concurrently, rapid advancements in AI, most notably large language models, have disrupted predictive modeling paradigms. Foundation models, pretrained on extensive datasets from diverse domains, have demonstrated remarkable performance by leveraging prior knowledge. While prevalent in natural language processing and computer vision, foundation models for tabular data have only recently emerged. We conjecture that pretraining on out-of-domain data is particularly beneficial in small-data settings, such as SME lending or specialized corporate portfolios, and may help address longstanding challenges including low default portfolios and class imbalance. This paper benchmarks recently proposed tabular foundation models against a broad set of competitors, including established and advanced machine learning techniques, across two core tasks: PD and LGD modeling. Our evaluation encompasses various datasets, performance indicators, and experimental conditions. We find that tabular foundation models generally perform best across datasets and tasks. Moreover, they offer significant improvement in predictive performance as dataset size shrinks. These results are remarkable given that the models are tested out-of-the-box, without hyperparameter tuning, ensuring ease of use and mitigating computational costs.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 3 minor

Summary. The manuscript benchmarks recently proposed tabular foundation models against established and advanced machine learning techniques for two core credit-risk tasks: probability of default (PD) and loss given default (LGD) prediction. It reports that tabular foundation models achieve the highest performance across multiple datasets and tasks and deliver larger gains as training-set size decreases, even when applied out-of-the-box without hyperparameter tuning. The authors conjecture that out-of-domain pretraining is especially helpful in small-data regimes such as SME lending and for handling class imbalance and low-default portfolios.

Significance. If the empirical claims are substantiated with complete experimental details, the work could meaningfully influence credit-risk modeling practice by demonstrating that pretrained tabular models can outperform gradient-boosting baselines in data-scarce settings without additional tuning. The study also supplies a timely, broad comparison that consolidates the current state of tabular foundation models on financial tasks.

major comments (3)
  1. [Section 4] Section 4 (Experimental Setup): the manuscript does not specify the exact datasets employed, their sizes, sources, or the precise train/validation/test splits used for each size-reduction experiment. Without these details it is impossible to reproduce the reported performance curves or to judge whether the claimed advantage in small-data regimes is robust.
  2. [Section 5.1 and Table 3] Section 5.1 and Table 3: no statistical significance tests, confidence intervals, or error bars are reported for the performance differences between foundation models and baselines. The central claim that foundation models “generally perform best” therefore rests on point estimates whose variability cannot be assessed.
  3. [Section 4.3] Section 4.3 (Handling of class imbalance): the paper states that class imbalance is a longstanding challenge yet provides no description of the loss functions, sampling strategies, or evaluation metrics (e.g., AUC-PR versus AUC-ROC) used to mitigate or measure its effect. This omission directly affects interpretation of the LGD and low-default results.
minor comments (3)
  1. [Abstract and Section 4] The abstract claims results are “remarkable given that the models are tested out-of-the-box,” but the manuscript never states whether the competing gradient-boosting and neural-network baselines were also run without tuning or with default hyperparameters; this comparison should be clarified.
  2. [Figure 2] Figure 2 (performance vs. dataset size) uses different y-axis scales across panels; consistent scaling or explicit annotation of the metric would improve readability.
  3. [Section 2] A few citations to recent tabular foundation-model papers (e.g., TabPFN, TabTransformer variants) appear to be missing from the related-work section.
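
When many pairwise method comparisons are tested at once, the raw p-values need a multiple-comparison adjustment before they support a claim such as "generally perform best". The sketch below implements the Holm step-down adjustment, the correction named in the captions of Figures 6 and 7. The input p-values are hypothetical and would in practice come from paired tests such as the Wilcoxon signed-rank test; they are not values from the paper.

```python
def holm_adjust(p_values):
    """Holm step-down adjusted p-values, returned in the input order.

    The i-th smallest p-value (i = 0, 1, ...) is multiplied by (m - i),
    capped at 1, and forced to be non-decreasing along the sorted order.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for position, idx in enumerate(order):
        candidate = min(1.0, (m - position) * p_values[idx])
        running_max = max(running_max, candidate)  # enforce monotonicity
        adjusted[idx] = running_max
    return adjusted


# Hypothetical raw p-values for four pairwise comparisons.
raw = [0.010, 0.040, 0.030, 0.005]
adj = holm_adjust(raw)
significant = [a <= 0.05 for a in adj]
print(adj)
print(significant)
```

In this example two of the four comparisons remain significant at the 0.05 level after adjustment, although all four raw p-values are below 0.05.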

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their constructive and detailed feedback, which highlights important aspects of reproducibility and statistical rigor. We address each major comment point by point below and outline the planned revisions.

Point-by-point responses
  1. Referee: [Section 4] Section 4 (Experimental Setup): the manuscript does not specify the exact datasets employed, their sizes, sources, or the precise train/validation/test splits used for each size-reduction experiment. Without these details it is impossible to reproduce the reported performance curves or to judge whether the claimed advantage in small-data regimes is robust.

    Authors: We agree that the current description lacks sufficient detail for full reproducibility. In the revised manuscript we will add a dedicated subsection or appendix table that lists every dataset by name, source (public repositories or anonymized financial sources), original sample size, feature count, and the exact train/validation/test split ratios. For the size-reduction experiments we will explicitly describe the subsampling procedure, including whether stratified sampling was used and the precise percentages or absolute sizes retained at each reduction level. revision: yes

  2. Referee: [Section 5.1 and Table 3] Section 5.1 and Table 3: no statistical significance tests, confidence intervals, or error bars are reported for the performance differences between foundation models and baselines. The central claim that foundation models “generally perform best” therefore rests on point estimates whose variability cannot be assessed.

    Authors: The referee is correct that variability measures would strengthen the claims. Although the experiments used fixed random seeds for reproducibility, we did not report results across multiple independent runs. In the revision we will recompute the main results in Table 3 and Section 5.1 over at least five random seeds, add error bars or standard deviations, and include paired statistical tests (e.g., Wilcoxon signed-rank) for the key comparisons between foundation models and the strongest baselines. revision: yes

  3. Referee: [Section 4.3] Section 4.3 (Handling of class imbalance): the paper states that class imbalance is a longstanding challenge yet provides no description of the loss functions, sampling strategies, or evaluation metrics (e.g., AUC-PR versus AUC-ROC) used to mitigate or measure its effect. This omission directly affects interpretation of the LGD and low-default results.

    Authors: We accept that the handling of class imbalance requires explicit description. In the revised Section 4.3 we will specify that for PD tasks we used class-weighted binary cross-entropy loss together with AUC-PR as the primary evaluation metric, while for LGD (a regression task) we report RMSE and MAE without additional sampling. We will also note any oversampling or threshold-tuning steps applied to low-default portfolios. revision: yes
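
The subsampling procedure promised in the first response can be made concrete with a short sketch. It draws a smaller training set while preserving the class shares, which matters for low-default portfolios where a plain random draw could contain almost no defaults. Whether the paper used stratification is not stated in the material above, so this is an illustration under that assumption; the function name and the numbers are hypothetical.

```python
import random
from collections import defaultdict


def stratified_subsample(labels, n_keep, seed=0):
    """Return sorted row indices of a subsample of roughly n_keep rows
    that keeps each class at about its original share."""
    if not 0 < n_keep <= len(labels):
        raise ValueError("n_keep must be between 1 and len(labels)")
    rng = random.Random(seed)  # fixed seed makes the draw reproducible
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    fraction = n_keep / len(labels)
    chosen = []
    for idxs in by_class.values():
        # Keep at least one row per class; rounding may shift the total slightly.
        k = max(1, round(fraction * len(idxs)))
        chosen.extend(rng.sample(idxs, k))
    return sorted(chosen)


# Hypothetical portfolio: 1,000 loans with a 5% default rate.
labels = [1] * 50 + [0] * 950
subset = stratified_subsample(labels, n_keep=200, seed=42)
defaults_kept = sum(labels[i] for i in subset)
print(len(subset), defaults_kept)
```

Repeating the draw with several seeds at each size level would also give the run-to-run variability that the second response commits to reporting.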

Circularity Check

0 steps flagged

No significant circularity

full rationale

This is an empirical benchmarking study that reports performance of tabular foundation models versus baselines on PD and LGD tasks across datasets. The strongest claims rest on observed metrics from out-of-the-box evaluation rather than any derivation, fitted parameter renamed as prediction, or self-referential equation. No mathematical chain, ansatz, or uniqueness theorem is invoked; the conjecture about out-of-domain pretraining is presented as background motivation, not a load-bearing premise that reduces the reported results to inputs by construction. The paper is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The paper rests on the domain assumption that out-of-domain pretraining transfers usefully to tabular credit data in low-sample regimes; no free parameters or invented entities are described in the abstract.

axioms (1)
  • domain assumption: Pretraining on out-of-domain data is particularly beneficial in small-data settings such as SME lending or specialized corporate portfolios
    Explicitly stated as the central conjecture motivating the benchmark.

pith-pipeline@v0.9.0 · 5836 in / 1212 out tokens · 46054 ms · 2026-05-20T13:14:00.661732+00:00 · methodology

Reference graph

Works this paper leans on

165 extracted references · 165 canonical work pages · 4 internal anchors

  1. [1]

    Foundation models: A new paradigm for artificial intelligence

    Johannes Schneider, Christian Meske, and Pauline Kuss. Foundation models: A new paradigm for artificial intelligence. Business & Information Systems Engineering, 66(2): 221–231, 2024

  2. [3]

    Learning transferable visual models from natural language supervision

    Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, et al. Learning transferable visual models from natural language supervision. In International conference on machine learning, pages 8748–8763. PMLR, 2021

  3. [4]

    Foundation models defining a new era in vision: a survey and outlook

    Muhammad Awais, Muzammal Naseer, Salman Khan, Rao Muhammad Anwer, Hisham Cholakkal, Mubarak Shah, Ming-Hsuan Yang, and Fahad Shahbaz Khan. Foundation models defining a new era in vision: a survey and outlook. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2025

  4. [5]

    Benchmarking state-of-the-art classification algorithms for credit scoring: An update of research

    Stefan Lessmann, Bart Baesens, Hsin-Vonn Seow, and Lyn C Thomas. Benchmarking state-of-the-art classification algorithms for credit scoring: An update of research. European journal of operational research, 247(1): 124–136, 2015

  5. [6]

    Deep learning for credit scoring: Do or don’t?

    Björn Rafn Gunnarsson, Seppe Vanden Broucke, Bart Baesens, María Óskarsdóttir, and Wilfried Lemahieu. Deep learning for credit scoring: Do or don’t? European Journal of Operational Research, 295(1): 292–305, 2021

  6. [7]

    Tabular data: Deep learning is not all you need

    Ravid Shwartz-Ziv and Amitai Armon. Tabular data: Deep learning is not all you need. Information Fusion, 81: 84–90, 2022a

  7. [9]

    Accurate predictions on small data with a tabular foundation model

    Noah Hollmann, Samuel Müller, Lennart Purucker, Arjun Krishnakumar, Max Körfer, Shi Bin Hoo, Robin Tibor Schirrmeister, and Frank Hutter. Accurate predictions on small data with a tabular foundation model. Nature, 637(8045): 319–326, 2025a

  8. [10]

    The value of big data for credit scoring: Enhancing financial inclusion using mobile phone data and social network analytics

    María Óskarsdóttir, Cristián Bravo, Carlos Sarraute, Jan Vanthienen, and Bart Baesens. The value of big data for credit scoring: Enhancing financial inclusion using mobile phone data and social network analytics. Applied Soft Computing, 74: 26–39, 2019

  9. [11]

    Fairness in credit scoring: Assessment, implementation and profit implications

    Nikita Kozodoi, Johannes Jacob, and Stefan Lessmann. Fairness in credit scoring: Assessment, implementation and profit implications. European Journal of Operational Research, 297(3): 1083–1094, 2022

  10. [12]

    Algorithmic decision making methods for fair credit scoring

    Darie Moldovan. Algorithmic decision making methods for fair credit scoring. IEEE Access, 11: 59729–59743, 2023

  11. [13]

    Credit risk analytics: Measurement techniques, applications, and examples in SAS

    Bart Baesens, Daniel Roesch, and Harald Scheule. Credit risk analytics: Measurement techniques, applications, and examples in SAS. John Wiley & Sons, 2016

  12. [14]

    Benchmarking state-of-the-art classification algorithms for credit scoring

    Bart Baesens, Tony Van Gestel, Stijn Viaene, Maria Stepanova, Johan Suykens, and Jan Vanthienen. Benchmarking state-of-the-art classification algorithms for credit scoring. Journal of the operational research society, 54(6): 627–635, 2003

  13. [15]

    Benchmarking regression algorithms for loss given default modeling

    Gert Loterman, Iain Brown, David Martens, Christophe Mues, and Bart Baesens. Benchmarking regression algorithms for loss given default modeling. International Journal of Forecasting, 28(1): 161–170, 2012

  14. [17]

    An experimental comparison of classification algorithms for imbalanced credit scoring data sets

    Iain Brown and Christophe Mues. An experimental comparison of classification algorithms for imbalanced credit scoring data sets. Expert systems with applications, 39(3): 3446–3453, 2012

  15. [18]

    On the suitability of resampling techniques for the class imbalance problem in credit scoring

    Ana Isabel Marqués, Vicente García, and José Salvador Sánchez. On the suitability of resampling techniques for the class imbalance problem in credit scoring. Journal of the Operational Research Society, 64(7): 1060–1070, 2013

  16. [21]

    Reject inference, augmentation, and sample selection

    John Banasik and Jonathan Crook. Reject inference, augmentation, and sample selection. European Journal of Operational Research, 183(3): 1582–1594, 2007

  17. [22]

    Fighting sampling bias: A framework for training and evaluating credit scoring models

    Nikita Kozodoi, Stefan Lessmann, Morteza Alamgir, Luis Moreira-Matias, and Konstantinos Papakonstantinou. Fighting sampling bias: A framework for training and evaluating credit scoring models. European Journal of Operational Research, 324(2): 616–628, 2025

  18. [23]

    Loss given default models incorporating macroeconomic variables for credit cards

    Tony Bellotti and Jonathan Crook. Loss given default models incorporating macroeconomic variables for credit cards. International Journal of Forecasting, 28(1): 171–182, 2012

  19. [24]

    The devil in the details: Dynamic prediction of loan portfolio profitability with macroeconomic drivers through multi-state modelling

    Viani B Djeundje, Jonathan Crook, and Galina Andreeva. The devil in the details: Dynamic prediction of loan portfolio profitability with macroeconomic drivers through multi-state modelling. European Journal of Operational Research, 2025

  20. [27]

    A survey of credit and behavioural scoring: forecasting financial risk of lending to consumers

    Lyn C Thomas. A survey of credit and behavioural scoring: forecasting financial risk of lending to consumers. International journal of forecasting, 16(2): 149–172, 2000

  21. [28]

    Linear and nonlinear credit scoring by combining logistic regression and support vector machines

    Tony Van Gestel, Bart Baesens, Peter Van Dijcke, Johan Suykens, Joao Garcia, and Thomas Alderweireld. Linear and nonlinear credit scoring by combining logistic regression and support vector machines. Journal of credit Risk, 1(4), 2005

  22. [33]

    P2p network lending, loss given default and credit risks

    Guangyou Zhou, Yijia Zhang, and Sumei Luo. P2p network lending, loss given default and credit risks. Sustainability, 10(4): 1010, 2018. ISSN 2071-1050

  23. [46]

    Credit scoring for profitability objectives

    Steven Finlay. Credit scoring for profitability objectives. European Journal of Operational Research, 202(2): 528–537, 2010

  24. [66]

    Shapley values as an interpretability technique in credit scoring

    Hendrik Andries du Toit, Willem Dani \ A , Helgard Raubenheimer, et al. Shapley values as an interpretability technique in credit scoring. Journal of Risk Model Validation, 2023

  25. [76]

    The fairness of credit scoring models

    Christophe Hurlin, Christophe Pérignon, and Sébastien Saurin. The fairness of credit scoring models. Management Science, 2025. doi:10.1287/mnsc.2022.03888

  26. [80]

    Why do tree-based models still outperform deep learning on typical tabular data?

    Léo Grinsztajn, Edouard Oyallon, and Gaël Varoquaux. Why do tree-based models still outperform deep learning on typical tabular data? In Advances in Neural Information Processing Systems, 2022

  27. [82]

    Deep Learning in Banking: Integrating Artificial Intelligence for Next-Generation Financial Services

    Cristian Bravo, Sebastian Maldonado, and Maria Oskarsdottir. Deep Learning in Banking: Integrating Artificial Intelligence for Next-Generation Financial Services. John Wiley & Sons, 2026

  28. [84]

    Revisiting deep learning models for tabular data

    Yury Gorishniy, Ivan Rubachev, Valentin Khrulkov, and Artem Babenko. Revisiting deep learning models for tabular data. Advances in neural information processing systems, 34: 18932–18943, 2021

  29. [89]

    VIME: extending the success of self- and semi-supervised learning to tabular domain

    Jinsung Yoon, Yao Zhang, James Jordon, and Mihaela van der Schaar. VIME: extending the success of self- and semi-supervised learning to tabular domain. In Hugo Larochelle, Marc'Aurelio Ranzato, Raia Hadsell, Maria-Florina Balcan, and Hsuan-Tien Lin, editors, Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information ...

  30. [90]

    Subtab: Subsetting features of tabular data for self-supervised representation learning

    Talip Ucar, Ehsan Hajiramezanali, and Lindsay Edwards. Subtab: Subsetting features of tabular data for self-supervised representation learning. In Marc'Aurelio Ranzato, Alina Beygelzimer, Yann N. Dauphin, Percy Liang, and Jennifer Wortman Vaughan, editors, Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Proces...

  31. [92]

    Tabllm: Few-shot classification of tabular data with large language models

    Stefan Hegselmann, Alejandro Buendia, Hunter Lang, Monica Agrawal, Xiaoyi Jiang, and David A. Sontag. Tabllm: Few-shot classification of tabular data with large language models. In Francisco J. R. Ruiz, Jennifer G. Dy, and Jan-Willem van de Meent, editors, International Conference on Artificial Intelligence and Statistics, 25-27 April 2023, Palau de Con...

  32. [93]

    Tabpfn: A transformer that solves small tabular classification problems in a second

    Noah Hollmann, Samuel Müller, Katharina Eggensperger, and Frank Hutter. Tabpfn: A transformer that solves small tabular classification problems in a second. In The Eleventh International Conference on Learning Representations, ICLR 2023, Kigali, Rwanda, May 1-5, 2023. OpenReview.net, 2023. URL https://openreview.net/forum?id=cp5PvcI6w8_

  33. [94]

    Transformers can do bayesian inference

    Samuel Müller, Noah Hollmann, Sebastian Pineda-Arango, Josif Grabocka, and Frank Hutter. Transformers can do bayesian inference. In The Tenth International Conference on Learning Representations, ICLR 2022, Virtual Event, April 25-29, 2022. OpenReview.net, 2022. URL https://openreview.net/forum?id=KSugKcbNf9

  34. [97]

    TabPFN-2.5: Advancing the State of the Art in Tabular Foundation Models

    Léo Grinsztajn, Klemens Flöge, Oscar Key, Felix Birkel, Philipp Jund, Brendan Roof, Benjamin Jäger, Dominik Safaric, Simone Alessi, Adrian Hayler, Mihir Manium, Rosen Yu, Felix Jablonski, Shi Bin Hoo, Anurag Garg, Jake Robertson, Magnus Bühler, Vladyslav Moroshan, Lennart Purucker, Clara Cornu, Lilly Charlotte Wehrhahn, Alessandro Bonetto, Bernhard Schölk...

  35. [99]

    Credit Scoring and its Applications

    Lyn C. Thomas, David B. Edelman, and Jonathan N. Crook. Credit Scoring and its Applications. Siam, Philadelphia, 2002

  36. [102]

    Statistical comparisons of classifiers over multiple data sets

    Janez Demšar. Statistical comparisons of classifiers over multiple data sets. Journal of Machine Learning Research, 7: 1–30, 2006

  37. [103]

    Individual comparisons by ranking methods

    Frank Wilcoxon. Individual comparisons by ranking methods. Biometrics bulletin, 1(6): 80–83, 1945

  38. [104]

    Do we need hundreds of classifiers to solve real world classification problems?

    Manuel Fernández-Delgado, Eva Cernadas, Senén Barro, and Dinani Amorim. Do we need hundreds of classifiers to solve real world classification problems? Journal of Machine Learning Research, 15(90): 3133–3181, 2014. URL http://jmlr.org/papers/v15/delgado14a.html

  39. [105]

    A comparison of alternative tests of significance for the problem of m rankings

    Milton Friedman. A comparison of alternative tests of significance for the problem of m rankings. The annals of mathematical statistics, 11(1): 86–92, 1940

  40. [106]

    Approximations of the critical region of the Friedman statistic

    Ronald L Iman and James M Davenport. Approximations of the critical region of the Friedman statistic. Communications in Statistics-Theory and Methods, 9(6): 571–595, 1980

  41. [107]

    A simple sequentially rejective multiple test procedure

    Sture Holm. A simple sequentially rejective multiple test procedure. Scandinavian journal of statistics, pages 65–70, 1979

  42. [108]

    An extension on "Statistical comparisons of classifiers over multiple data sets" for all pairwise comparisons

    Salvador Garcia and Francisco Herrera. An extension on "Statistical comparisons of classifiers over multiple data sets" for all pairwise comparisons. Journal of Machine Learning Research, 9(12), 2008

  43. [109]

    2026 , publisher=

    Deep Learning in Banking: Integrating Artificial Intelligence for Next-Generation Financial Services , author=. 2026 , publisher=

  44. [110]

    Applied Soft Computing , volume=

    The value of big data for credit scoring: Enhancing financial inclusion using mobile phone data and social network analytics , author=. Applied Soft Computing , volume=. 2019 , publisher=

  45. [111]

    European Journal of Operational Research , volume=

    Deep learning for credit scoring: Do or don’t? , author=. European Journal of Operational Research , volume=. 2021 , publisher=

  46. [112]

    IEEE Transactions on Pattern Analysis and Machine Intelligence , year=

    Foundation models defining a new era in vision: a survey and outlook , author=. IEEE Transactions on Pattern Analysis and Machine Intelligence , year=

  47. [113]

    Journal of the operational research society , volume=

    Benchmarking state-of-the-art classification algorithms for credit scoring , author=. Journal of the operational research society , volume=. 2003 , publisher=

  48. [114]

    2016 , publisher=

    Credit risk analytics: Measurement techniques, applications, and examples in SAS , author=. 2016 , publisher=

  49. [115]

    Do we Need Hundreds of Classifiers to Solve Real World Classification Problems? , journal =

    Manuel Fern. Do we Need Hundreds of Classifiers to Solve Real World Classification Problems? , journal =. 2014 , volume =

  50. [116]

    Journal of Machine Learning Research , volume =

    Demšar, Janez , title =. Journal of Machine Learning Research , volume =. 2006 , type =

  51. [117]

    TabICL: A Tabular Foundation Model for In-Context Learning on Large Data

    Qu, Jingang and Holzmüller, David and Varoquaux, Gaël and Le Morvan, Marine , title =. arXiv preprint , volume =. doi:10.48550/arXiv.2502.05564 , year =

  52. [118]

    ArXiv preprint , volume =

    Ye, Han-Jia and Liu, Si-Yang and Chao, Wei-Lun , title =. ArXiv preprint , volume =. doi:10.48550/arXiv.2502.17361 , year =

  53. [119]

    Takuya Akiba and Shotaro Sano and Toshihiko Yanase and Takeru Ohta and Masanori Koyama , editor =. Optuna:. Proceedings of the 25th. 2019 , url =. doi:10.1145/3292500.3330701 , timestamp =

  54. [120]

    European Journal of Operational Research , volume =

    Zandi, Sahab and Korangi, Kamesh and Óskarsdóttir, María and Mues, Christophe and Bravo, Cristián , title =. European Journal of Operational Research , volume =. doi:10.1016/j.ejor.2024.09.025 , year =

  55. [121]

    Omega , volume =

    Óskarsdóttir, María and Bravo, Cristián , title =. Omega , volume =. doi:10.1016/j.omega.2021.102520 , year =

  56. [122]

    European Journal of Operational Research , volume =

    Calabrese, Raffaella and Crook, Jonathan , title =. European Journal of Operational Research , volume =. doi:10.1016/j.ejor.2020.04.031 , year =

  57. [123]

    Medina-Olivares, Victor; Lindgren, Finn; Calabrese, Raffaella; Crook, Jonathan. European Journal of Operational Research. doi:10.1016/j.ejor.2025.07.060

  58. [124]

    Shi, Yong; Qu, Yi; Chen, Zhensong; Mi, Yunlong; Wang, Yunong. European Journal of Operational Research. doi:10.1016/j.ejor.2023.12.028

  59. [125]

    Li, Yibei; Wang, Ximei; Djehiche, Boualem; Hu, Xiaoming. European Journal of Operational Research. doi:10.1016/j.ejor.2020.03.078

  60. [126]

    De Cnudde, Sofie; Moeyersoms, Julie; Stankova, Marija; Tobback, Ellen; Javaly, Vinayak; Martens, David. Journal of the Operational Research Society. doi:10.1080/01605682.2018.1434402

  61. [127]

    Djeundje, Viani B.; Crook, Jonathan; Calabrese, Raffaella; Hamid, Mona. Expert Systems with Applications. doi:10.1016/j.eswa.2020.113766

  62. [128]

    Dirick, Lore; Bellotti, Tony; Claeskens, Gerda; Baesens, Bart. Journal of Business and Economic Statistics. doi:10.1080/07350015.2016.1260471

  63. [129]

    Mena, Gary; Coussement, Kristof; De Bock, Koen W.; De Caigny, Arno; Lessmann, Stefan. Annals of Operations Research. doi:10.1007/s10479-023-05259-9

  64. [130]

    Revisiting deep learning models for tabular data. Advances in Neural Information Processing Systems.

  65. [131]

    Jiang, Jun-Peng; Liu, Si-Yang; Cai, Hao-Run; Zhou, Qile; Ye, Han-Jia. arXiv preprint. doi:10.48550/arXiv.2504.16109

  66. [132]

    Korangi, Kamesh; Mues, Christophe; Bravo, Cristián. European Journal of Operational Research. doi:10.1016/j.ejor.2022.10.032

  67. [133]

    Kriebel, Johannes; Stitz, Lennart. European Journal of Operational Research. doi:10.1016/j.ejor.2021.12.024

  68. [134]

    Stevenson, Matthew; Mues, Christophe; Bravo, Cristián. European Journal of Operational Research. doi:10.1016/j.ejor.2021.03.008

  69. [135]

    Wu, Zongxiao; Dong, Yizhe; Li, Yaoyiran; Shi, Baofeng. European Journal of Operational Research. doi:10.1016/j.ejor.2025.04.032

  70. [136]

    Fu, Runshan; Huang, Yan; Singh, Param Vir. Information Systems Research. doi:10.1287/isre.2020.0990

  71. [137]

    Fuster, Andreas; Goldsmith-Pinkham, Paul; Ramadorai, Tarun; Walther, Ansgar. The Journal of Finance. doi:10.1111/jofi.13090

  72. [138]

    Hurlin, Christophe; Pérignon, Christophe; Saurin, Sébastien. Management Science, 2025.

  73. [139]

    Kraus, Mathias; Tschernutter, Daniel; Weinzierl, Sven; Zschech, Patrick. European Journal of Operational Research. doi:10.1016/j.ejor.2023.06.032

  74. [140]

    Carrizosa, Emilio; Kurishchenko, Kseniia; Romero Morales, Dolores. European Journal of Operational Research. doi:10.1016/j.ejor.2025.01.008

  75. [141]

    Borgonovo, Emanuele; Plischke, Elmar; Rabitti, Giovanni. European Journal of Operational Research. doi:10.1016/j.ejor.2024.06.023

  76. [142]

    Zografopoulos, Lazaros; Iannino, Maria Chiara; Psaradellis, Ioannis; Sermpinis, Georgios. European Journal of Operational Research. doi:10.1016/j.ejor.2024.08.032

  77. [143]

    Medina-Olivares, Victor; Lessmann, Stefan; Klein, Nadja. IEEE Transactions on Neural Networks and Learning Systems. doi:10.1109/TNNLS.2024.3398559

  78. [144]

    Tu, Jiancheng; Wu, Zhibin. European Journal of Operational Research. doi:10.1016/j.ejor.2024.10.046

  79. [145]

    De Caigny, Arno; Coussement, Kristof; De Bock, Koen W. European Journal of Operational Research. doi:10.1016/j.ejor.2018.02.009

  80. [146]

    LeCun, Yann; Bengio, Yoshua; Hinton, Geoffrey. Nature, 2015. doi:10.1038/nature14539
