An Experimental Study on the Rashomon Effect of Balancing Methods in Imbalanced Classification

Mustafa Cavus; Przemysław Biecek

arxiv: 2405.01557 · v4 · pith:447ZPBX2new · submitted 2024-03-22 · 💻 cs.LG

Pith reviewed 2026-05-24 03:02 UTC · model grok-4.3

classification 💻 cs.LG

keywords Rashomon effect, predictive multiplicity, imbalanced classification, balancing methods, resampling, ambiguity, discrepancy, obscurity metric

The pith

Balancing methods increase predictive multiplicity among models with similar accuracy on imbalanced data.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tests whether common resampling techniques used to fix class imbalance also raise the Rashomon effect, in which many models achieve nearly identical performance yet disagree on individual predictions. Experiments on real datasets compare models trained on balanced versus original data, tracking three multiplicity measures: ambiguity, discrepancy, and a new obscurity metric. Results indicate that balancing consistently produces more conflicting predictions across candidate models. The authors therefore recommend tracking the performance-multiplicity trade-off with an extended performance-gain plot before choosing a final model.
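For orientation, the two established multiplicity measures can be written formally. The notation below follows the standard definitions of Marx, Calmon, and Ustun (2020) rather than anything reproduced on this page; the symbols $\mathcal{F}$ (model class), $L$ (empirical loss), and $\epsilon$ (performance tolerance) are assumptions introduced here for illustration. Obscurity is left out because its formal definition is not given in the material above.

```latex
% Rashomon set: models whose loss is within \epsilon of the reference model f_R
\mathcal{R}_\epsilon(f_R) = \{\, f \in \mathcal{F} : L(f) \le L(f_R) + \epsilon \,\}

% Ambiguity: share of observations contradicted by at least one model in the set
\alpha_\epsilon(f_R) = \frac{1}{n} \sum_{i=1}^{n}
  \max_{f \in \mathcal{R}_\epsilon(f_R)} \mathbb{1}\!\left[ f(X_i) \ne f_R(X_i) \right]

% Discrepancy: largest share of observations contradicted by any single model
\delta_\epsilon(f_R) = \max_{f \in \mathcal{R}_\epsilon(f_R)}
  \frac{1}{n} \sum_{i=1}^{n} \mathbb{1}\!\left[ f(X_i) \ne f_R(X_i) \right]
```

The only difference between the two is the order of the maximum and the average, so discrepancy can never exceed ambiguity.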

Core claim

Balancing methods inflate predictive multiplicity, measured by higher values of ambiguity, discrepancy, and obscurity, among candidate models that retain comparable predictive performance on imbalanced classification tasks.
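The phrase "candidate models that retain comparable predictive performance" can be made concrete with a small sketch. It assumes that a higher score is better and that "comparable" means within a fixed tolerance of the best candidate; the function name `rashomon_set`, the tolerance `epsilon`, and the scores are all hypothetical and are not taken from the paper.

```python
from typing import Dict, List


def rashomon_set(scores: Dict[str, float], epsilon: float) -> List[str]:
    """Return names of models whose score is within `epsilon` of the best score.

    Assumes higher scores are better (for example AUC or F1). The tolerance
    `epsilon` is a hypothetical parameter chosen for illustration.
    """
    if epsilon < 0:
        raise ValueError("epsilon must be non-negative")
    best = max(scores.values())
    # Keep every model that loses at most `epsilon` relative to the best one.
    return sorted(name for name, s in scores.items() if best - s <= epsilon)


# Hypothetical validation scores for five candidate models.
candidate_scores = {"m1": 0.91, "m2": 0.90, "m3": 0.84, "m4": 0.905, "m5": 0.79}
selected = rashomon_set(candidate_scores, epsilon=0.02)
print(selected)
```

A larger `epsilon` admits more models and therefore tends to raise any multiplicity measure computed on the set, so the tolerance should be reported alongside the metrics.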

What carries the argument

Predictive multiplicity quantified by ambiguity, discrepancy, and obscurity metrics, applied to models trained after balancing versus on the original imbalanced data.
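A toy computation may help fix ideas, in the spirit of the five-sample illustration described in the Figure 1 caption. Ambiguity and discrepancy follow their standard definitions relative to a reference model. The `obscurity` function is only an assumed reading (the mean disagreement rate over all competing models and samples), since the paper's formal definition is not reproduced on this page. All prediction vectors are invented.

```python
from typing import List, Sequence


def _disagreements(reference: Sequence[int], model: Sequence[int]) -> List[int]:
    """Return 1 where the model's label differs from the reference label, else 0."""
    if len(reference) != len(model):
        raise ValueError("prediction vectors must have equal length")
    return [int(r != m) for r, m in zip(reference, model)]


def ambiguity(reference: Sequence[int], models: Sequence[Sequence[int]]) -> float:
    """Share of samples on which at least one model contradicts the reference."""
    flags = [_disagreements(reference, m) for m in models]
    return sum(max(column) for column in zip(*flags)) / len(reference)


def discrepancy(reference: Sequence[int], models: Sequence[Sequence[int]]) -> float:
    """Largest share of samples contradicted by any single model."""
    return max(sum(_disagreements(reference, m)) for m in models) / len(reference)


def obscurity(reference: Sequence[int], models: Sequence[Sequence[int]]) -> float:
    """ASSUMED definition: mean disagreement rate over all models and samples."""
    total = sum(sum(_disagreements(reference, m)) for m in models)
    return total / (len(reference) * len(models))


# Invented labels: one reference model and four competitors on five samples.
reference = [1, 0, 1, 1, 0]
competitors = [
    [1, 0, 1, 1, 0],  # agrees everywhere
    [1, 1, 1, 1, 0],  # flips sample 2
    [0, 1, 1, 1, 0],  # flips samples 1 and 2
    [1, 0, 1, 0, 0],  # flips sample 4
]
a = ambiguity(reference, competitors)
d = discrepancy(reference, competitors)
o = obscurity(reference, competitors)
print(a, d, o)
```

In this toy case three of five samples are contested by some model, the worst single model flips two of five, and four of the twenty model-sample pairs disagree, which shows how the three numbers can differ on the same set of predictions.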

If this is right

Blind selection from a set of equally accurate models becomes more risky after balancing is applied.
Validation and explanation steps must account for the larger set of conflicting predictions.
An extended performance-gain plot can be used to monitor the trade-off between accuracy improvement and increased multiplicity.
Different balancing methods produce different degrees of multiplicity, so method choice affects downstream stability.
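The trade-off monitoring mentioned in the list above can be sketched as a pair of coordinates per balancing method: performance gain on the horizontal axis and change in multiplicity on the vertical axis, as the Figure 6 caption describes. The sketch assumes both coordinates are plain differences (balanced minus original); the function names, quadrant labels, and numbers are hypothetical and do not reproduce the paper's results.

```python
from typing import Tuple


def gain_point(perf_original: float, perf_balanced: float,
               mult_original: float, mult_balanced: float) -> Tuple[float, float]:
    """Return (performance gain, multiplicity change) for one balancing method.

    Both coordinates are simple differences, balanced minus original; this
    differencing is an assumption made for illustration.
    """
    return (perf_balanced - perf_original, mult_balanced - mult_original)


def describe(point: Tuple[float, float]) -> str:
    """Label the quadrant a point falls in (labels are hypothetical)."""
    gain, change = point
    if gain > 0 and change <= 0:
        return "gain without added multiplicity"
    if gain > 0:
        return "gain at the cost of added multiplicity"
    if change > 0:
        return "no gain and added multiplicity"
    return "no gain and no added multiplicity"


# Hypothetical performance scores and discrepancy values for two methods.
results = {
    "method_a": gain_point(0.70, 0.78, 0.10, 0.25),
    "method_b": gain_point(0.70, 0.69, 0.10, 0.30),
}
labels = {name: describe(point) for name, point in results.items()}
for name, point in results.items():
    print(name, point, labels[name])
```

Plotting these points, one per method and resampling ratio, would give a scatter in which the lower-right region is the desirable one.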

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

High-stakes applications may need to prefer balancing techniques that keep multiplicity low rather than those that maximize minority-class recall alone.
The same multiplicity analysis could be applied to other preprocessing steps such as feature scaling or missing-value imputation.
In production, multiplicity-aware selection or ensemble methods could reduce the practical cost of the observed inflation.

Load-bearing premise

The selected real datasets and model families are representative of typical imbalanced classification problems, and the three metrics together capture the forms of multiplicity that matter for downstream decisions.

What would settle it

Repeating the experiments on a fresh collection of imbalanced datasets and model families yields no systematic rise, or even a drop, in the three multiplicity metrics after balancing.

Figures

Figures reproduced from arXiv: 2405.01557 by Mustafa Cavus, Przemysław Biecek.

**Figure 1.** Illustrates the computation of ambiguity, discrepancy, and obscurity. Assume that there are five models in the Rashomon set. To simplify the illustration, we analyzed only five samples; however, it is important to note that these two metrics were calculated on all samples. The first column represents the reference model predictions $\hat{y}_i = f_R(X_i)$ for observations $i = 1, 2, 3, 4, 5$. The following columns show th…

**Figure 2.** The 2D density plot of the Rashomon metrics

**Figure 3.** The distribution plots of the Rashomon metrics

**Figure 4.** The distribution plots of the Rashomon metric

**Figure 5.** The distribution plots of the Rashomon metric

**Figure 6.** The performance gain plots of obscurity, discrepancy, and variable importance order discrepancy for different balancing methods and varying partial resampling ratios. Moving the zones in the positive direction on the horizontal axis indicates an increase in performance gain, and moving in the negative direction on the vertical axis indicates a decrease in multiplicity. The oversampling-based resampling metho…

Original abstract

Predictive models may generate biased predictions when classifying imbalanced datasets. This happens when the model favors the majority class, leading to low performance in accurately predicting the minority class. To address this issue, balancing or resampling methods are critical data-centric AI approaches in the modeling process to improve prediction performance. However, there have been debates and questions about the functionality of these methods in recent years. In particular, many candidate models may exhibit very similar predictive performance, called the Rashomon effect, in model selection, and they may even produce different predictions for the same observations. Selecting one of these models without considering the predictive multiplicity -- which is the case of yielding conflicting models' predictions for any sample -- can result in blind selection. In this paper, the impact of balancing methods on predictive multiplicity is examined using the Rashomon effect. It is crucial because the blind model selection in data-centric AI is risky from a set of approximately equally accurate models. This may lead to severe problems in model selection, validation, and explanation. To tackle this matter, we conducted real dataset experiments to observe the impact of balancing methods on predictive multiplicity through the Rashomon effect by using a newly proposed metric obscurity in addition to the existing ones: ambiguity and discrepancy. Our findings showed that balancing methods inflate the predictive multiplicity and yield varying results. To monitor the trade-off between the prediction performance and predictive multiplicity for conducting the modeling process responsibly, we proposed using the extended version of the performance-gain plot when balancing the training data.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

Balancing methods increase predictive multiplicity in imbalanced classification per this study, but the experiments do not isolate that effect from dataset properties.

The main thing to know is that this paper finds balancing methods inflate predictive multiplicity among near-equally accurate models in imbalanced classification tasks. They measure this with ambiguity and discrepancy plus a new obscurity metric, and they observe higher values after balancing along with varying outcomes across methods. They also suggest an extended performance-gain plot to track the performance versus multiplicity trade-off during modeling.

Referee Report

1 major / 2 minor

Summary. The paper conducts an experimental study on real-world imbalanced classification datasets to assess how balancing/resampling methods affect predictive multiplicity under the Rashomon effect. It measures this via existing metrics (ambiguity, discrepancy) plus a newly proposed obscurity metric, reports that balancing inflates multiplicity and produces varying results across methods, and recommends an extended performance-gain plot to monitor the performance-multiplicity trade-off during model selection.

Significance. If the attribution of multiplicity inflation to balancing holds after proper controls, the work would usefully caution practitioners in data-centric AI against blind application of balancing without multiplicity checks, potentially improving responsible model selection and validation. The multi-metric approach and real-data focus are practical strengths, though the absence of controlled ablations limits immediate generalizability.

major comments (1)

[§4 and §5] §4 (Experimental Setup) and §5 (Results): the central claim that balancing methods inflate predictive multiplicity requires isolating the effect of resampling from dataset properties. The study uses a fixed collection of real datasets without systematic variation of imbalance ratio, controlled synthetic data with varying class overlap/separability, or ablation on these factors; any observed rise in ambiguity, discrepancy, or obscurity could therefore be an artifact of the chosen data distributions rather than a general consequence of balancing.
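One way the controlled design requested in this comment could be set up is sketched below: a synthetic two-class generator in which the minority share and the class separation are varied independently while everything else is held fixed. This is an illustrative assumption, not a procedure from the paper; the function `make_imbalanced`, its parameters, and the grid values are hypothetical.

```python
import random
from typing import List, Tuple


def make_imbalanced(n: int, minority_share: float, separation: float,
                    seed: int = 0) -> List[Tuple[float, int]]:
    """Draw n one-dimensional points from two unit-variance Gaussians.

    Class 1 (minority) is centred at `separation` and class 0 at 0, so a
    smaller `separation` means more class overlap. All parameter names are
    hypothetical.
    """
    if not 0.0 < minority_share < 1.0:
        raise ValueError("minority_share must lie strictly between 0 and 1")
    rng = random.Random(seed)
    n_minority = round(n * minority_share)
    data = [(rng.gauss(separation, 1.0), 1) for _ in range(n_minority)]
    data += [(rng.gauss(0.0, 1.0), 0) for _ in range(n - n_minority)]
    rng.shuffle(data)
    return data


# Grid over imbalance and overlap, with sample size and seed held fixed.
grid = {
    (share, sep): make_imbalanced(1000, share, sep, seed=42)
    for share in (0.05, 0.20, 0.50)
    for sep in (0.5, 2.0)
}
minority_counts = {key: sum(label for _, label in rows) for key, rows in grid.items()}
print(minority_counts)
```

Running the same balancing methods and multiplicity metrics on each cell of such a grid would show whether the inflation persists when imbalance ratio and overlap are changed one at a time.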

minor comments (2)

[§3] The definition and motivation for the new obscurity metric (relative to ambiguity and discrepancy) would benefit from a dedicated subsection with a formal equation and a small illustrative example on a toy dataset.
[§5] Figure captions and axis labels in the performance-gain plots should explicitly state the balancing methods and base learners used in each panel to improve readability.

Simulated Authors' Rebuttal

1 response · 0 unresolved

We thank the referee for the detailed review and constructive feedback on isolating the effect of balancing methods. We address the major comment below and propose targeted revisions to clarify the scope of our claims.

Referee: [§4 and §5] §4 (Experimental Setup) and §5 (Results): the central claim that balancing methods inflate predictive multiplicity requires isolating the effect of resampling from dataset properties. The study uses a fixed collection of real datasets without systematic variation of imbalance ratio, controlled synthetic data with varying class overlap/separability, or ablation on these factors; any observed rise in ambiguity, discrepancy, or obscurity could therefore be an artifact of the chosen data distributions rather than a general consequence of balancing.

Authors: We agree that controlled synthetic experiments or systematic ablations would provide stronger causal isolation between balancing and multiplicity. Our study deliberately focuses on real-world imbalanced datasets to reflect practical data-centric AI scenarios, where datasets exhibit natural variation in imbalance ratios, overlap, and separability. The observed inflation in ambiguity, discrepancy, and obscurity is reported as an empirical finding across these datasets rather than a universal causal claim. To address the concern, we will revise §4 and §5 to explicitly state that results are observational on the chosen real datasets, add a limitations paragraph discussing potential confounding by data properties, and include a recommendation for future controlled studies with synthetic data varying imbalance and overlap. This constitutes a partial revision focused on scope clarification and transparency rather than new experiments.

Revision: partial

Circularity Check

0 steps flagged

No circularity: purely observational experimental study with no derivations or self-referential reductions

The paper conducts real-dataset experiments to measure effects of balancing methods on predictive multiplicity (via ambiguity, discrepancy, and a newly proposed obscurity metric). No load-bearing steps involve derivations, first-principles predictions, fitted parameters renamed as predictions, or self-citation chains that justify the central claims. All results are empirical observations; the proposed performance-gain plot extension is a visualization tool, not a definitional reduction. This matches the default expectation for non-circular experimental work.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review reveals no explicit free parameters, axioms, or invented entities beyond the standard experimental assumption that the chosen metrics and datasets are appropriate; the new obscurity metric definition is not provided so any implicit parameters remain unknown.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Explainable bank failure prediction models: Counterfactual explanations to reduce the failure risk
cs.LG 2024-07 unverdicted novelty 4.0

Compares counterfactual generation methods with balancing strategies on bank failure data, finding NICF with cost-sensitive learning produces the highest quality explanations on validity, proximity, and sparsity.

Reference graph

Works this paper leans on

45 extracted references · 45 canonical work pages · cited by 1 Pith paper

[1] Khoshgoftaar, T. M., Seiffert, C., Van Hulse, J., Napolitano, A., Folleco, A.: Learning with limited minority class data. In: 6th Int. Conf. on Machine Learning and Applications, pp. 348–353 (2007)
[2] Zha, D., Bhat, Z. P., Lai, K. H., Yang, F., Jiang, Z., Zhong, S., Hu, X.: Data-centric AI: A survey. arXiv preprint arXiv:2303.10158 (2023)
[3] Wang, A. X., Chukova, S. S., Nguyen, B. P.: Data-Centric AI to Improve Churn Prediction with Synthetic Data. In: 3rd Int. Conf. on Computer, Control, and Robotics, pp. 409–413 (2023)
[4] Singh, P.: Systematic review of data-centric approaches in AI and ML. Data Sci. Manag., 6(3), pp. 144–157 (2023)
[5] Vargas, W., Aranda, J. A. S., dos Santos Costa, R., da Silva Pereira, P. R., Victória Barbosa, J. L.: Imbalanced data pre-processing techniques for ML: A systematic mapping study. Knowl. Inf. Syst., 65(1), pp. 31–57 (2023)
[6] Moniz, N., Monteiro, H.: No free lunch in imbalanced learning. Knowl.-Based Syst., 227, 107222 (2021)
[7] Stando, A., Cavus, M., Biecek, P.: The effect of balancing methods on model behavior in imbalanced classification. In: Int. Workshop on Learning with Imbalanced Domains, pp. 16–30. PMLR (2024)
[8] Patil, A., Framewala, A., Kazi, F.: Explainability of SMOTE-based oversampling for imbalanced datasets. In: 3rd Int. Conf. on Information and Computer Technologies, pp. 41–45 (2020)
[9] Alarab, I., Prakoonwit, S.: Effect of data resampling on feature importance in imbalanced blockchain data. Data Sci. Manag., 5(2), pp. 66–76 (2022)
[10] Goorbergh, R., Smeden, M., Timmerman, D., Calster, B.: Harm of class imbalance corrections for risk prediction models. J. Am. Med. Inform. Assoc., 29(9), pp. 1525–1534 (2022)
[11] Carriero, A., Luijken, K., Hond, A., Moons, K. G., Calster, B., van Smeden, M.: Harms of class imbalance corrections for ML prediction models: A simulation study. arXiv preprint arXiv:2404.19494 (2024)
[12] Simson, J., Pfisterer, F., Kern, C.: One Model Many Scores: Using Multiverse Analysis to Prevent Fairness Hacking and Evaluate Model Design Decisions. arXiv preprint arXiv:2308.16681 (2024)
[13] Watson-Daniels, J., Barocas, S., Hofman, J. M., Chouldechova, A.: Multi-target multiplicity: Flexibility and fairness in target specification. In: Proc. of the 2023 ACM Conf. on Fairness, Accountability, and Transparency, pp. 297–311 (2023)
[14] Kamalov, F., Atiya, A. F., Elreedy, D.: Partial resampling of imbalanced data. arXiv preprint arXiv:2207.04631 (2022)
[15] Watson-Daniels, J., Parkes, D. C., Ustun, B.: Predictive multiplicity in probabilistic classification. In: Proc. AAAI Conf. Artif. Intell., 37(9), pp. 10306–10314 (2023)
[16] Breiman, L.: Statistical modeling: The two cultures (with comments and a rejoinder). Stat. Sci., 16(3), pp. 199–231 (2001)
[17] Marx, C., Calmon, F., Ustun, B.: Predictive multiplicity in classification. In: Int. Conf. on Machine Learning, pp. 6765–6774. PMLR (2020)
[18] Biecek, P., Baniecki, H., Krzyziński, M., Cook, D.: Performance is not enough: The story told by a Rashomon Quartet. J. Comput. Graph. Stat., pp. 1–4 (2024)
[19] Rudin, C., Chen, C., Chen, Z., Huang, H., Semenova, L., Zhong, C.: Interpretable ML: Fundamental principles and 10 grand challenges. Stat. Surveys, 16, pp. 1–85 (2022)
[20] Hsu, H., Li, G., Hu, S.: Dropout-Based Rashomon Set Exploration for Efficient Predictive Multiplicity Estimation. arXiv preprint arXiv:2402.00728 (2024)
[21] Donnelly, J., Katta, S., Rudin, C., Browne, E.: The Rashomon Importance Distribution: Getting RID of Unstable, Single Model-based Variable Importance. Adv. Neural Inf. Process. Syst., 36 (2024)
[22] Chawla, N. V., Bowyer, K. W., Hall, L. O., Kegelmeyer, W. P.: SMOTE: Synthetic minority over-sampling technique. J. Artif. Intell. Res., 16, pp. 321–357 (2002)
[23] Mani, I., Zhang, I.: kNN approach to unbalanced data distributions: A case study. In: Workshop on Learning from Imbalanced Datasets, pp. 1–7 (2003)
[24] Kozak, A., Ruczyński, H.: Forester: A Novel Approach to Accessible and Interpretable AutoML for Tree-Based Modeling. AutoML Conf. (2023)
[25] Fisher, A., Rudin, C., Dominici, F.: All models are wrong, but many are useful: Learning variable importance by studying a class of prediction models. J. Mach. Learn. Res., 20(177), pp. 1–81 (2019)
[26] Biecek, P., Burzykowski, T.: Explanatory Model Analysis. Chapman and Hall/CRC, New York (2021)
[27] Greenwell, B. M., Boehmke, B. C., Gray, B.: Variable importance plots: An introduction to the VIP package. R J., 12(1), pp. 343–366 (2020)
[28] Kozak, A., Biecek, P.: Vivo: Variable Importance via Oscillations. R package version 0.2.1, https://CRAN.R-project.org/package=vivo (2020)
[29] Zhang, Y., Xu, F., Zou, J., Petrosian, O. L., Krinkin, K. V.: XAI Evaluation: Evaluating Black-Box Model Explanations. In: 2nd Int. Conf. on Neural Networks and Neurotechnologies, pp. 13–16 (2021)
[30] Kobylińska, K., Krzyziński, M., Machowicz, R., Adamek, M., Biecek, P.: Exploration of Rashomon set assists explanations for medical data. arXiv preprint arXiv:2308.11446 (2023)
[31] Kendall, M. G.: A new measure of rank correlation. Biometrika, 30, pp. 81–93 (1938)
[32] Patil, I.: Visualizations with statistical details: The ggstatsplot approach. J. Open Source Softw., 6(61), 3167 (2021)
[33] Kruskal, W. H., Wallis, W. A.: Use of ranks in one-criterion variance analysis. J. Am. Stat. Assoc., 47, pp. 583–621 (1952)
[34] Friedman, M.: The use of ranks to avoid the assumption of normality implicit in the analysis of variance. J. Am. Stat. Assoc., 32, pp. 675–701 (1937)
[35] Dunnett, C. W.: A multiple comparison procedure for comparing several treatments with a control. J. Am. Stat. Assoc., 50, pp. 1096–1121 (1955)
[36] Hsu, H., Calmon, F.: Rashomon capacity: A metric for predictive multiplicity in classification. Adv. Neural Inf. Process. Syst., 35, pp. 28988–29000 (2022)
[37] Poiret, C., Grigis, A., Thomas, J., Noulhiane, M.: Can we Agree? On the Rashomon Effect and the Reliability of Post-Hoc Explainable AI. arXiv preprint arXiv:2308.07247 (2023)
[38] Oh, S., Ustun, B., McAuley, J., Kumar, S.: Rank list sensitivity of recommender systems to interaction perturbations. In: 31st ACM Int. Conf. on Information & Knowledge Management, pp. 1584–1594 (2022)
[39] Elor, Y., Averbuch-Elor, H.: To SMOTE, or not to SMOTE? arXiv preprint arXiv:2201.08528 (2022)
[40] Paes, L. M., Cruz, R., Calmon, F. P., Diaz, M.: On the inevitability of the Rashomon effect. In: IEEE Int. Symp. on Information Theory, pp. 549–554 (2023)
[41] Meyer, A. P., Albarghouthi, A., D'Antoni, L.: The dataset multiplicity problem: How unreliable data impacts predictions. In: Proc. of the 2023 ACM Conf. on Fairness, Accountability, and Transparency, pp. 193–204 (2023)
[42] Komorniczak, J., Ksieniewicz, P., Woźniak, M.: Data complexity and classification accuracy correlation in oversampling algorithms. In: 4th Int. Workshop on Learning with Imbalanced Domains, pp. 175–186. PMLR (2022)
[43] Junior, J. D. S. F., Pisani, P. H.: Performance and model complexity on imbalanced datasets using resampling and cost-sensitive algorithms. In: 4th Int. Workshop on Learning with Imbalanced Domains, pp. 83–97. PMLR (2022)
[44] Garcia, V., Sánchez, J. S., Mollineda, R. A.: On the effectiveness of pre-processing methods for class imbalance. Knowl.-Based Syst., 25(1), pp. 13–21 (2012)
[45] Prati, R. C., Batista, G. E., Silva, D. F.: Class imbalance revisited: A new experimental setup. Knowl. Inf. Syst., 45, pp. 247–270 (2015)
