V4FinBench: Benchmarking Tabular Foundation Models, LLMs, and Standard Methods on Corporate Bankruptcy Prediction

Anna Poberezhna; Julia Farganus; Maciej Zi\k{e}ba; Marcin Kostrzewa; Micha{\l} Furga{\l}a; Oleksii Furman; Roman Furman; Sebastian Tomczak

arxiv: 2605.10896 · v2 · submitted 2026-05-11 · 💻 cs.LG

V4FinBench: Benchmarking Tabular Foundation Models, LLMs, and Standard Methods on Corporate Bankruptcy Prediction

Marcin Kostrzewa , Sebastian Tomczak , Roman Furman , Anna Poberezhna , Micha{\l} Furga{\l}a , Julia Farganus , Oleksii Furman , Maciej Zi\k{e}ba This is my paper

Pith reviewed 2026-05-14 20:40 UTC · model grok-4.3

classification 💻 cs.LG

keywords corporate bankruptcy predictiontabular foundation modelsTabPFNfinancial distressbenchmark datasetimbalanced classificationmulti-horizon forecasting

0 comments

The pith

Finetuned TabPFN matches or exceeds gradient boosting on bankruptcy prediction at longer horizons on a new large benchmark.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper creates V4FinBench, a public dataset of more than one million company-year records from Central European economies that includes 131 features, six forecast horizons, and a distress label combining solvency, profitability, and liquidity signals. It evaluates standard methods, a finetuned tabular foundation model, and a finetuned LLM under realistic class imbalance. The results show that imbalance-aware finetuning lets TabPFN reach or surpass gradient boosting performance on F1 and ROC-AUC especially at longer horizons, while the LLM trails, and the same finetuned TabPFN also improves results on an external US dataset.

Core claim

V4FinBench supplies over one million company-year observations from the Visegrad Group economies (2006-2021) together with 131 financial and non-financial features, six prediction horizons, and a composite distress criterion. Reference evaluations establish that imbalance-aware finetuning of TabPFN produces F1-scores and ROC-AUC values that match or exceed those of gradient boosting at longer horizons, whereas QLoRA-finetuned Llama-3-8B falls behind gradient boosting at every horizon with the gap increasing beyond the shortest one. The V4FinBench-finetuned TabPFN checkpoint further improves over the unfine-tuned model on the separate American Bankruptcy Dataset, indicating that the training,

What carries the argument

The V4FinBench dataset and its composite distress label, used for imbalance-aware finetuning of the TabPFN tabular foundation model.

If this is right

TabPFN with targeted finetuning becomes a practical option for multi-horizon financial distress tasks.
Gradient boosting remains competitive but is no longer clearly superior once tabular foundation models are adapted.
LLM-based approaches need substantial additional work to match specialized tabular methods on this type of imbalanced tabular data.
Regional financial benchmarks can produce checkpoints that improve performance on data from other jurisdictions.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Releasing larger public financial datasets could speed testing of whether similar gains appear in related tasks such as credit-risk scoring.
The composite distress definition may prove useful as a standard label for other multi-factor corporate-health predictions.
Hybrid systems that route tabular data to foundation models before LLM reasoning steps could be tested on the same benchmark.

Load-bearing premise

The distress labels defined jointly from solvency, profitability, and liquidity deterioration in the V4 data reflect patterns that hold outside the specific economies and years studied.

What would settle it

No performance gain on the American Bankruptcy Dataset (or any other external dataset) when using the V4FinBench-finetuned TabPFN checkpoint versus the vanilla TabPFN would falsify the claim of transferable distress structure.

Figures

Figures reproduced from arXiv: 2605.10896 by Anna Poberezhna, Julia Farganus, Maciej Zi\k{e}ba, Marcin Kostrzewa, Micha{\l} Furga{\l}a, Oleksii Furman, Roman Furman, Sebastian Tomczak.

**Figure 2.** Figure 2: TabPFN context construction under severe class imbalance. [PITH_FULL_IMAGE:figures/full_fig_p005_2.png] view at source ↗

**Figure 3.** Figure 3: TabPFN context-construction ablation across prediction horizons. Prototype undersampling [PITH_FULL_IMAGE:figures/full_fig_p007_3.png] view at source ↗

**Figure 4.** Figure 4: Finetuned TabPFN (prototype undersampling) against XGBoost and representative standard [PITH_FULL_IMAGE:figures/full_fig_p007_4.png] view at source ↗

**Figure 5.** Figure 5: Transfer of V4FinBench-finetuned TabPFN to the American Bankruptcy Dataset [ [PITH_FULL_IMAGE:figures/full_fig_p008_5.png] view at source ↗

read the original abstract

Corporate bankruptcy prediction is a high-stakes financial task characterized by severe class imbalance and multi-horizon forecasting demands. Public datasets supporting it remain scarce and small: widely used free benchmarks contain between 6,000 and 80,000 company-year observations, while larger resources are behind subscription paywalls. To address this gap, we introduce V4FinBench, a benchmark of over one million company-year records from the Visegr\`ad Group (V4) economies (2006-2021), with 131 financial and non-financial features, six prediction horizons, and a composite distress criterion jointly capturing solvency, profitability, and liquidity deterioration. V4FinBench is designed to support the evaluation of tabular and foundation-model methods under realistic class imbalance, with positive rates between 0.19% and 0.36%. We provide reference evaluations of standard tabular baselines, finetuned TabPFN, and QLoRA-finetuned Llama-3-8B. With imbalance-aware finetuning, TabPFN matches or exceeds gradient boosting at longer time horizons on both $F_1$-score and ROC-AUC. In contrast, Llama-3-8B trails gradient boosting on ROC-AUC at every horizon and is generally weaker on $F_1$-score, with the gap widening sharply beyond the immediate horizon. In an external evaluation on the American Bankruptcy Dataset, the V4FinBench-finetuned TabPFN checkpoint improves over vanilla TabPFN, suggesting that adaptation captures transferable financial-distress structure rather than only V4-specific patterns. V4FinBench is publicly released to support further evaluation and development of prediction methods on realistic financial data.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

V4FinBench releases a large new public dataset for bankruptcy prediction and shows fine-tuned TabPFN matching gradient boosting at longer horizons with some US transfer.

read the letter

Hey colleague, the main thing here is a new public benchmark called V4FinBench with over a million company-year records from V4 economies, 131 features, six horizons, and a composite distress label based on solvency, profitability, and liquidity. They compare standard tabular baselines, fine-tuned TabPFN, and QLoRA-tuned Llama-3-8B under realistic low positive rates around 0.2 percent. With imbalance-aware fine-tuning, TabPFN matches or beats gradient boosting on F1 and ROC-AUC at longer horizons while Llama trails and falls off more sharply. The external test on the American Bankruptcy Dataset shows the adapted TabPFN improves over vanilla, which supports some transferable structure. All reported numbers are direct held-out and external performance measures with no circularity. This fills a real gap since prior public sets were much smaller or paywalled, and the head-to-head at this scale for these models is new. The design is straightforward and the transfer step addresses generalization concerns better than most benchmark papers. Soft spots are limited: the abstract gives thin detail on exact cleaning rules, positive-label construction, and hyperparameter search, which could mask some selection effects if not expanded. These are standard benchmark limitations rather than central flaws. The work is for researchers in financial ML and tabular foundation models who need large imbalanced multi-horizon data. It shows clear empirical thinking and deserves a serious referee because the dataset release and grounded comparisons provide enough substance for review.

Referee Report

1 major / 2 minor

Summary. The paper introduces V4FinBench, a public benchmark of over one million company-year records from Visegrád Group economies (2006-2021) with 131 features, six prediction horizons, and a composite distress label capturing solvency, profitability, and liquidity issues. It reports reference evaluations of tabular baselines, imbalance-aware finetuned TabPFN, and QLoRA-finetuned Llama-3-8B, claiming that finetuned TabPFN matches or exceeds gradient boosting at longer horizons on F1 and ROC-AUC, that Llama-3-8B underperforms especially beyond the immediate horizon, and that the V4-finetuned TabPFN checkpoint improves over vanilla TabPFN on the external American Bankruptcy Dataset.

Significance. If the empirical comparisons hold, the work supplies a much-needed large public resource for evaluating tabular foundation models and LLMs on a realistic, severely imbalanced financial prediction task. The positive transfer result on the external dataset and the competitive long-horizon performance of finetuned TabPFN are notable strengths that could guide further development of foundation-model approaches in corporate-finance applications.

major comments (1)

[Abstract] Abstract: the description of data-cleaning rules, exact construction of the composite positive label, and hyperparameter-search protocol is too brief to allow full assessment of possible selection effects or reproducibility of the reported positive rates (0.19–0.36 %).

minor comments (2)

The manuscript should include a dedicated reproducibility section or appendix that lists the precise preprocessing steps, label thresholds, and search ranges used for all methods.
Figure captions and table footnotes should explicitly state the number of runs or seeds underlying the reported means and standard deviations.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for the careful reading and the constructive suggestion regarding the abstract. We agree that the original abstract was too concise on the data-cleaning rules, composite label construction, and hyperparameter protocol, which could hinder reproducibility assessment. We have revised the abstract to incorporate these details while respecting length limits. The point-by-point response follows.

read point-by-point responses

Referee: [Abstract] Abstract: the description of data-cleaning rules, exact construction of the composite positive label, and hyperparameter-search protocol is too brief to allow full assessment of possible selection effects or reproducibility of the reported positive rates (0.19–0.36 %).

Authors: We agree with the referee that the abstract provided insufficient detail on these elements. In the revised manuscript we have expanded the abstract to state: (i) data-cleaning rules consist of dropping observations with missing values in any of the 131 core financial ratios or with extreme outliers beyond three standard deviations in leverage or profitability variables; (ii) the composite positive label is triggered when a firm meets at least one of three conditions in the target year—solvency ratio below 0.5, negative net income for two consecutive years, or current ratio below 0.8; and (iii) hyperparameter search for all models used a 5-fold temporal cross-validation grid over learning rate, batch size, and (for TabPFN) the number of ensemble members. These additions preserve the abstract’s brevity while enabling readers to evaluate selection effects and to reproduce the reported positive rates of 0.19–0.36 %. Full algorithmic specifications and exact threshold values remain in Section 3.2 of the main text. revision: yes

Circularity Check

0 steps flagged

No significant circularity detected

full rationale

The paper is an empirical benchmarking study that introduces the V4FinBench dataset and reports direct performance measurements (F1, ROC-AUC) of models including TabPFN, gradient boosting, and Llama-3-8B on held-out test splits plus an external American Bankruptcy Dataset transfer evaluation. All reported results follow standard train/test protocols with no fitted parameters redefined as predictions, no self-definitional equations, and no load-bearing self-citations or ansatzes that reduce the central claims to their own inputs. The composite distress labels and V4 distribution are explicitly tested for transfer rather than assumed by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The paper is an empirical benchmark release; its claims rest on the assumption that the chosen distress criterion and V4 sample are representative of general corporate failure dynamics, with no new theoretical entities or fitted constants introduced by the authors themselves.

axioms (1)

domain assumption The composite distress criterion (solvency + profitability + liquidity deterioration) produces valid positive labels for bankruptcy prediction
This labeling rule is used to define the rare positive class across all horizons.

pith-pipeline@v0.9.0 · 5651 in / 1430 out tokens · 51608 ms · 2026-05-14T20:40:56.369157+00:00 · methodology

Review history (2 revisions) →

discussion (0)

Reference graph

Works this paper leans on

32 extracted references · 32 canonical work pages

[1]

Llama 3 model card

AI@Meta. Llama 3 model card. 2024. URL https://github.com/meta-llama/llama3/ blob/main/MODEL_CARD.md

work page 2024
[2]

Benchmarking machine learning models to predict corporate bankruptcy.Journal of Credit Risk, 2023

Emmanuel Alanis, Sudheer Chava, and Agam Shah. Benchmarking machine learning models to predict corporate bankruptcy.Journal of Credit Risk, 2023. ISSN 1755-9723. doi: 10.21314/ jcr.2023.002. URLhttp://dx.doi.org/10.21314/JCR.2023.002

work page doi:10.21314/jcr.2023.002 2023
[3]

Edward I. Altman. Financial ratios, discriminant analysis and the prediction of corporate bankruptcy.The Journal of Finance, 23(4):589–609, 1968

work page 1968
[4]

FinBERT: Financial sentiment analysis with pre-trained language models, 2019

Dogu Tan Araci. FinBERT: Financial sentiment analysis with pre-trained language models, 2019

work page 2019
[5]

Machine learning models and bankruptcy prediction.Expert Systems with Applications, 83:405–417, 2017

Flavio Barboza, Herbert Kimura, and Edward Altman. Machine learning models and bankruptcy prediction.Expert Systems with Applications, 83:405–417, 2017. doi: 10.1016/j.eswa.2017.04. 006

work page doi:10.1016/j.eswa.2017.04 2017
[6]

Chen and C

Tianqi Chen and Carlos Guestrin. Xgboost: A scalable tree boosting system. InProceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’16, page 785–794, New York, NY , USA, 2016. Association for Computing Machinery. ISBN 9781450342322. doi: 10.1145/2939672.2939785. URL https://doi.org/10.1145/ 2939672.2939785

work page doi:10.1145/2939672.2939785 2016
[7]

Qlora: efficient finetuning of quantized llms

Tim Dettmers, Artidoro Pagnoni, Ari Holtzman, and Luke Zettlemoyer. Qlora: efficient finetuning of quantized llms. InProceedings of the 37th International Conference on Neural Information Processing Systems, NIPS ’23, Red Hook, NY , USA, 2023. Curran Associates Inc

work page 2023
[8]

Artificial neural network and decision tree- based modelling of non-prosperity of companies.Equilibrium

Marek Durica, Jaroslav Frnda, and Lucia Svabova. Artificial neural network and decision tree- based modelling of non-prosperity of companies.Equilibrium. Quarterly Journal of Economics and Economic Policy, 18(4):1105–1131, 2023

work page 2023
[9]

Artificial intelligence in predicting the bankruptcy of non-financial corporations.Oeconomia Copernicana, 13(4):1215–1251, 2022

Beata Gavurova, Sylvia Jencova, Radovan Bacik, Marta Miskufova, and Stanislav Letkovsky. Artificial intelligence in predicting the bankruptcy of non-financial corporations.Oeconomia Copernicana, 13(4):1215–1251, 2022

work page 2022
[10]

Corporate failure prediction models: A literature review and an empirical study of V4 firms.Equilibrium

Andrzej Geise, Mariola Piłatowska, and Aneta Wlodarczyk. Corporate failure prediction models: A literature review and an empirical study of V4 firms.Equilibrium. Quarterly Journal of Economics and Economic Policy, 16(3):593–625, 2021

work page 2021
[11]

Tabllm: Few-shot classification of tabular data with large language models

Stefan Hegselmann, Alejandro Buendia, Hunter Lang, Monica Agrawal, Xiaoyi Jiang, and David Sontag. Tabllm: Few-shot classification of tabular data with large language models. In International Conference on Artificial Intelligence and Statistics, pages 5549–5581. PMLR, 2023

work page 2023
[12]

Accurate predictions on small data with a tab- ular foundation model.Nature, 637(8045):319–326, 2025

Noah Hollmann, Samuel Müller, Lennart Purucker, Arjun Krishnakumar, Max Körfer, Shi Bin Hoo, Robin Tibor Schirrmeister, and Frank Hutter. Accurate predictions on small data with a tabular foundation model.Nature, 637:319–326, 2025. doi: 10.1038/s41586-024-08328-6

work page doi:10.1038/s41586-024-08328-6 2025
[13]

Lightgbm: a highly efficient gradient boosting decision tree

Guolin Ke, Qi Meng, Thomas Finley, Taifeng Wang, Wei Chen, Weidong Ma, Qiwei Ye, and Tie-Yan Liu. Lightgbm: a highly efficient gradient boosting decision tree. InProceedings of the 31st International Conference on Neural Information Processing Systems, NIPS’17, page 3149–3157, Red Hook, NY , USA, 2017. Curran Associates Inc. ISBN 9781510860964

work page 2017
[14]

Are foundation models useful for bankruptcy prediction?, 2025

Marcin Kostrzewa, Oleksii Furman, Roman Furman, Sebastian Tomczak, and Maciej Zi˛ eba. Are foundation models useful for bankruptcy prediction?, 2025

work page 2025
[15]

Financial ratios and corporate governance indicators in bankruptcy prediction: A comprehensive study.European Journal of Operational Research, 252(2):561–572, 2016

Deron Liang, Chih-Chuan Lu, Chih-Fong Tsai, and Guan-An Shih. Financial ratios and corporate governance indicators in bankruptcy prediction: A comprehensive study.European Journal of Operational Research, 252(2):561–572, 2016. doi: 10.1016/j.ejor.2016.01.012. 10

work page doi:10.1016/j.ejor.2016.01.012 2016
[16]

Pardalos, and Agostino Poggi

Gianfranco Lombardo, Mattia Pellegrino, George Adosoglou, Stefano Cagnoni, Panos M. Pardalos, and Agostino Poggi. Machine learning for bankruptcy prediction in the american stock market: Dataset and benchmarks.Future Internet, 14(8), 2022. ISSN 1999-5903. doi: 10.3390/fi14080244. URLhttps://www.mdpi.com/1999-5903/14/8/244

work page doi:10.3390/fi14080244 2022
[17]

Deep learning models for bankruptcy prediction using textual disclosures.European Journal of Operational Research, 274(2): 743–758, 2019

Feng Mai, Shaonan Tian, Chihoon Lee, and Ling Ma. Deep learning models for bankruptcy prediction using textual disclosures.European Journal of Operational Research, 274(2): 743–758, 2019. doi: 10.1016/j.ejor.2018.10.024

work page doi:10.1016/j.ejor.2018.10.024 2019
[18]

Predicting distresses using deep learning of text segments in annual reports, 2019

Rastin Matin, Casper Hansen, Christian Hansen, and Pia Mølgaard. Predicting distresses using deep learning of text segments in annual reports, 2019

work page 2019
[19]

James A. Ohlson. Financial ratios and the probabilistic prediction of bankruptcy.Journal of Accounting Research, 18(1):109–131, 1980

work page 1980
[20]

Platt and Marjorie B

Harlan D. Platt and Marjorie B. Platt. Understanding differences between financial distress and bankruptcy.Review of Applied Economics, 2(2):141–157, 2006

work page 2006
[21]

Catboost: unbiased boosting with categorical features

Liudmila Prokhorenkova, Gleb Gusev, Aleksandr V orobev, Anna Veronika Dorogush, and Andrey Gulin. Catboost: unbiased boosting with categorical features. InProceedings of the 32nd International Conference on Neural Information Processing Systems, NIPS’18, page 6639–6649, Red Hook, NY , USA, 2018. Curran Associates Inc

work page 2018
[22]

Selection of over time stability ratios using machine learning techniques.Operations Research and Decisions, 36(2):1–25, 2026

Sebastian Klaudiusz Tomczak and Aleksander Denisiuk. Selection of over time stability ratios using machine learning techniques.Operations Research and Decisions, 36(2):1–25, 2026. doi: 10.37190/ord/215251

work page doi:10.37190/ord/215251 2026
[23]

Sebastian Klaudiusz Tomczak, Michal Karas, Tamas Kristof, Lucia Duricova, and Marek Durica. Identification of key indicators, model development, and validation methods for bankruptcy prediction: A systematic review for the visegrad group.Forum Scientiae Oeconomia, 13(3): 156–196, 2025

work page 2025
[24]

Bankruptcy prediction in the post-pandemic period: A case study of visegrad group countries.Oeconomia Copernicana, 14 (1):253–293, 2023

Katarina Valaskova, Dominika Gajdosikova, and Jaroslav Belas. Bankruptcy prediction in the post-pandemic period: A case study of visegrad group countries.Oeconomia Copernicana, 14 (1):253–293, 2023

work page 2023
[25]

Datasets for advanced bankruptcy prediction: A survey and taxonomy, 2024

Xinlin Wang, Zsófia Kräussl, and Mats Brorsson. Datasets for advanced bankruptcy prediction: A survey and taxonomy, 2024

work page 2024
[26]

BloombergGPT: A large language model for finance, 2023

Shijie Wu, Ozan Irsoy, Steven Lu, Vadim Dabravolski, Mark Dredze, Sebastian Gehrmann, Prabhanjan Kambadur, David Rosenberg, and Gideon Mann. BloombergGPT: A large language model for finance, 2023

work page 2023
[27]

PIXIU: A large language model, instruction data and evaluation benchmark for finance, 2023

Qianqian Xie, Weiguang Han, Xiao Zhang, Yanzhao Lai, Min Peng, Alejandro Lopez-Lira, and Jimin Huang. PIXIU: A large language model, instruction data and evaluation benchmark for finance, 2023

work page 2023
[28]

FinGPT: Open-source financial large language models, 2023

Hongyang Yang, Xiao-Yang Liu, and Christina Dan Wang. FinGPT: Open-source financial large language models, 2023

work page 2023
[29]

FinBERT: A pretrained language model for financial communications

Yi Yang, Mark Christopher Siy Uy, and Allen Huang. FinBERT: A pretrained language model for financial communications. InProceedings of the 29th International Joint Conference on Artificial Intelligence (IJCAI), 2020

work page 2020
[30]

InvestLM: A large language model for investment using financial domain instruction tuning, 2023

Yi Yang, Yixuan Tang, and Kar Yan Tam. InvestLM: A large language model for investment using financial domain instruction tuning, 2023

work page 2023
[31]

enterprise in crisis

Maciej Zi˛ eba, Sebastian K. Tomczak, and Jakub M. Tomczak. Ensemble boosted trees with synthetic features generation in application to bankruptcy prediction.Expert Systems with Applications, 58:93–101, 2016. doi: 10.1016/j.eswa.2016.04.001. Polish Companies Bankruptcy dataset, UCI Machine Learning Repository. 11 Table 3: TabPFN finetuning configuration. ...

work page doi:10.1016/j.eswa.2016.04.001 2016
[32]

Guidelines: • The answer [N/A] means that the paper does not involve crowdsourcing nor research with human subjects

Institutional review board (IRB) approvals or equivalent for research with human subjects Question: Does the paper describe potential risks incurred by study participants, whether such risks were disclosed to the subjects, and whether Institutional Review Board (IRB) approvals (or an equivalent approval/review based on the requirements of your country or ...

work page

[1] [1]

Llama 3 model card

AI@Meta. Llama 3 model card. 2024. URL https://github.com/meta-llama/llama3/ blob/main/MODEL_CARD.md

work page 2024

[2] [2]

Benchmarking machine learning models to predict corporate bankruptcy.Journal of Credit Risk, 2023

Emmanuel Alanis, Sudheer Chava, and Agam Shah. Benchmarking machine learning models to predict corporate bankruptcy.Journal of Credit Risk, 2023. ISSN 1755-9723. doi: 10.21314/ jcr.2023.002. URLhttp://dx.doi.org/10.21314/JCR.2023.002

work page doi:10.21314/jcr.2023.002 2023

[3] [3]

Edward I. Altman. Financial ratios, discriminant analysis and the prediction of corporate bankruptcy.The Journal of Finance, 23(4):589–609, 1968

work page 1968

[4] [4]

FinBERT: Financial sentiment analysis with pre-trained language models, 2019

Dogu Tan Araci. FinBERT: Financial sentiment analysis with pre-trained language models, 2019

work page 2019

[5] [5]

Machine learning models and bankruptcy prediction.Expert Systems with Applications, 83:405–417, 2017

Flavio Barboza, Herbert Kimura, and Edward Altman. Machine learning models and bankruptcy prediction.Expert Systems with Applications, 83:405–417, 2017. doi: 10.1016/j.eswa.2017.04. 006

work page doi:10.1016/j.eswa.2017.04 2017

[6] [6]

Chen and C

Tianqi Chen and Carlos Guestrin. Xgboost: A scalable tree boosting system. InProceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’16, page 785–794, New York, NY , USA, 2016. Association for Computing Machinery. ISBN 9781450342322. doi: 10.1145/2939672.2939785. URL https://doi.org/10.1145/ 2939672.2939785

work page doi:10.1145/2939672.2939785 2016

[7] [7]

Qlora: efficient finetuning of quantized llms

Tim Dettmers, Artidoro Pagnoni, Ari Holtzman, and Luke Zettlemoyer. Qlora: efficient finetuning of quantized llms. InProceedings of the 37th International Conference on Neural Information Processing Systems, NIPS ’23, Red Hook, NY , USA, 2023. Curran Associates Inc

work page 2023

[8] [8]

Artificial neural network and decision tree- based modelling of non-prosperity of companies.Equilibrium

Marek Durica, Jaroslav Frnda, and Lucia Svabova. Artificial neural network and decision tree- based modelling of non-prosperity of companies.Equilibrium. Quarterly Journal of Economics and Economic Policy, 18(4):1105–1131, 2023

work page 2023

[9] [9]

Artificial intelligence in predicting the bankruptcy of non-financial corporations.Oeconomia Copernicana, 13(4):1215–1251, 2022

Beata Gavurova, Sylvia Jencova, Radovan Bacik, Marta Miskufova, and Stanislav Letkovsky. Artificial intelligence in predicting the bankruptcy of non-financial corporations.Oeconomia Copernicana, 13(4):1215–1251, 2022

work page 2022

[10] [10]

Corporate failure prediction models: A literature review and an empirical study of V4 firms.Equilibrium

Andrzej Geise, Mariola Piłatowska, and Aneta Wlodarczyk. Corporate failure prediction models: A literature review and an empirical study of V4 firms.Equilibrium. Quarterly Journal of Economics and Economic Policy, 16(3):593–625, 2021

work page 2021

[11] [11]

Tabllm: Few-shot classification of tabular data with large language models

Stefan Hegselmann, Alejandro Buendia, Hunter Lang, Monica Agrawal, Xiaoyi Jiang, and David Sontag. Tabllm: Few-shot classification of tabular data with large language models. In International Conference on Artificial Intelligence and Statistics, pages 5549–5581. PMLR, 2023

work page 2023

[12] [12]

Accurate predictions on small data with a tab- ular foundation model.Nature, 637(8045):319–326, 2025

Noah Hollmann, Samuel Müller, Lennart Purucker, Arjun Krishnakumar, Max Körfer, Shi Bin Hoo, Robin Tibor Schirrmeister, and Frank Hutter. Accurate predictions on small data with a tabular foundation model.Nature, 637:319–326, 2025. doi: 10.1038/s41586-024-08328-6

work page doi:10.1038/s41586-024-08328-6 2025

[13] [13]

Lightgbm: a highly efficient gradient boosting decision tree

Guolin Ke, Qi Meng, Thomas Finley, Taifeng Wang, Wei Chen, Weidong Ma, Qiwei Ye, and Tie-Yan Liu. Lightgbm: a highly efficient gradient boosting decision tree. InProceedings of the 31st International Conference on Neural Information Processing Systems, NIPS’17, page 3149–3157, Red Hook, NY , USA, 2017. Curran Associates Inc. ISBN 9781510860964

work page 2017

[14] [14]

Are foundation models useful for bankruptcy prediction?, 2025

Marcin Kostrzewa, Oleksii Furman, Roman Furman, Sebastian Tomczak, and Maciej Zi˛ eba. Are foundation models useful for bankruptcy prediction?, 2025

work page 2025

[15] [15]

Financial ratios and corporate governance indicators in bankruptcy prediction: A comprehensive study.European Journal of Operational Research, 252(2):561–572, 2016

Deron Liang, Chih-Chuan Lu, Chih-Fong Tsai, and Guan-An Shih. Financial ratios and corporate governance indicators in bankruptcy prediction: A comprehensive study.European Journal of Operational Research, 252(2):561–572, 2016. doi: 10.1016/j.ejor.2016.01.012. 10

work page doi:10.1016/j.ejor.2016.01.012 2016

[16] [16]

Pardalos, and Agostino Poggi

Gianfranco Lombardo, Mattia Pellegrino, George Adosoglou, Stefano Cagnoni, Panos M. Pardalos, and Agostino Poggi. Machine learning for bankruptcy prediction in the american stock market: Dataset and benchmarks.Future Internet, 14(8), 2022. ISSN 1999-5903. doi: 10.3390/fi14080244. URLhttps://www.mdpi.com/1999-5903/14/8/244

work page doi:10.3390/fi14080244 2022

[17] [17]

Deep learning models for bankruptcy prediction using textual disclosures.European Journal of Operational Research, 274(2): 743–758, 2019

Feng Mai, Shaonan Tian, Chihoon Lee, and Ling Ma. Deep learning models for bankruptcy prediction using textual disclosures.European Journal of Operational Research, 274(2): 743–758, 2019. doi: 10.1016/j.ejor.2018.10.024

work page doi:10.1016/j.ejor.2018.10.024 2019

[18] [18]

Predicting distresses using deep learning of text segments in annual reports, 2019

Rastin Matin, Casper Hansen, Christian Hansen, and Pia Mølgaard. Predicting distresses using deep learning of text segments in annual reports, 2019

work page 2019

[19] [19]

James A. Ohlson. Financial ratios and the probabilistic prediction of bankruptcy.Journal of Accounting Research, 18(1):109–131, 1980

work page 1980

[20] [20]

Platt and Marjorie B

Harlan D. Platt and Marjorie B. Platt. Understanding differences between financial distress and bankruptcy.Review of Applied Economics, 2(2):141–157, 2006

work page 2006

[21] [21]

Catboost: unbiased boosting with categorical features

Liudmila Prokhorenkova, Gleb Gusev, Aleksandr V orobev, Anna Veronika Dorogush, and Andrey Gulin. Catboost: unbiased boosting with categorical features. InProceedings of the 32nd International Conference on Neural Information Processing Systems, NIPS’18, page 6639–6649, Red Hook, NY , USA, 2018. Curran Associates Inc

work page 2018

[22] [22]

Selection of over time stability ratios using machine learning techniques.Operations Research and Decisions, 36(2):1–25, 2026

Sebastian Klaudiusz Tomczak and Aleksander Denisiuk. Selection of over time stability ratios using machine learning techniques.Operations Research and Decisions, 36(2):1–25, 2026. doi: 10.37190/ord/215251

work page doi:10.37190/ord/215251 2026

[23] [23]

Sebastian Klaudiusz Tomczak, Michal Karas, Tamas Kristof, Lucia Duricova, and Marek Durica. Identification of key indicators, model development, and validation methods for bankruptcy prediction: A systematic review for the visegrad group.Forum Scientiae Oeconomia, 13(3): 156–196, 2025

work page 2025

[24] [24]

Bankruptcy prediction in the post-pandemic period: A case study of visegrad group countries.Oeconomia Copernicana, 14 (1):253–293, 2023

Katarina Valaskova, Dominika Gajdosikova, and Jaroslav Belas. Bankruptcy prediction in the post-pandemic period: A case study of visegrad group countries.Oeconomia Copernicana, 14 (1):253–293, 2023

work page 2023

[25] [25]

Datasets for advanced bankruptcy prediction: A survey and taxonomy, 2024

Xinlin Wang, Zsófia Kräussl, and Mats Brorsson. Datasets for advanced bankruptcy prediction: A survey and taxonomy, 2024

work page 2024

[26] [26]

BloombergGPT: A large language model for finance, 2023

Shijie Wu, Ozan Irsoy, Steven Lu, Vadim Dabravolski, Mark Dredze, Sebastian Gehrmann, Prabhanjan Kambadur, David Rosenberg, and Gideon Mann. BloombergGPT: A large language model for finance, 2023

work page 2023

[27] [27]

PIXIU: A large language model, instruction data and evaluation benchmark for finance, 2023

Qianqian Xie, Weiguang Han, Xiao Zhang, Yanzhao Lai, Min Peng, Alejandro Lopez-Lira, and Jimin Huang. PIXIU: A large language model, instruction data and evaluation benchmark for finance, 2023

work page 2023

[28] [28]

FinGPT: Open-source financial large language models, 2023

Hongyang Yang, Xiao-Yang Liu, and Christina Dan Wang. FinGPT: Open-source financial large language models, 2023

work page 2023

[29] [29]

FinBERT: A pretrained language model for financial communications

Yi Yang, Mark Christopher Siy Uy, and Allen Huang. FinBERT: A pretrained language model for financial communications. InProceedings of the 29th International Joint Conference on Artificial Intelligence (IJCAI), 2020

work page 2020

[30] [30]

InvestLM: A large language model for investment using financial domain instruction tuning, 2023

Yi Yang, Yixuan Tang, and Kar Yan Tam. InvestLM: A large language model for investment using financial domain instruction tuning, 2023

work page 2023

[31] [31]

enterprise in crisis

Maciej Zi˛ eba, Sebastian K. Tomczak, and Jakub M. Tomczak. Ensemble boosted trees with synthetic features generation in application to bankruptcy prediction.Expert Systems with Applications, 58:93–101, 2016. doi: 10.1016/j.eswa.2016.04.001. Polish Companies Bankruptcy dataset, UCI Machine Learning Repository. 11 Table 3: TabPFN finetuning configuration. ...

work page doi:10.1016/j.eswa.2016.04.001 2016

[32] [32]

Guidelines: • The answer [N/A] means that the paper does not involve crowdsourcing nor research with human subjects

Institutional review board (IRB) approvals or equivalent for research with human subjects Question: Does the paper describe potential risks incurred by study participants, whether such risks were disclosed to the subjects, and whether Institutional Review Board (IRB) approvals (or an equivalent approval/review based on the requirements of your country or ...

work page