
arxiv: 2605.22529 · v1 · pith:BQVLKOMLnew · submitted 2026-05-21 · 💻 cs.LG · cs.AI

Stabilising Explainability Fragility in Cybersecurity AI: The Impact and Mitigation of Multicollinearity in Public Benchmark Datasets

Pith reviewed 2026-05-22 07:50 UTC · model grok-4.3

classification 💻 cs.LG cs.AI
keywords multicollinearity · explainability · intrusion detection systems · SHAP · LIME · XAI · UNSW-NB15 · attribution instability

The pith

Multicollinearity inflates attribution variance and renders explanations non-identifiable in AI for intrusion detection.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper investigates how correlated features in public datasets undermine the stability of AI explanations used in cybersecurity for intrusion detection. It introduces a formal theorem showing that multicollinearity increases the variance of attributions produced by tools such as SHAP and LIME, which in turn makes feature importances non-identifiable. A sympathetic reader would care because unreliable explanations reduce trust in AI systems deployed for security tasks. Experiments across different model types on the UNSW-NB15 dataset support the theorem, and the authors propose new methods to mitigate the problem. These methods aim to improve stability while keeping predictive performance intact.

Core claim

The central discovery is a formal theorem that multicollinearity inflates attribution variance, demonstrating that explanations and feature importances are non-identifiable under such conditions. This is validated through comprehensive experiments on the UNSW-NB15 benchmark using linear, tree-based, kernel, and neural models, with full and pruned feature sets. The paper also defines the Explainability Fragility Score and introduces CAA-Filtering and SHARP as mitigation approaches that stabilize explanations across bootstrapped runs.
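The variance-inflation mechanism can be illustrated in the simplest linear case. The sketch below is not taken from the paper: it fits a two-feature least-squares model on synthetic Gaussian data, with hypothetical names (`fit_two_feature_ols`, `simulate_coef_std`, `rho`), and compares the spread of the fitted coefficients when the features are uncorrelated and when they are strongly correlated. For a linear model explained under a feature-independence assumption, the SHAP value of feature j is β_j (x_j − E[x_j]), so coefficient spread carries directly into attribution spread.

```python
import math
import random
import statistics


def fit_two_feature_ols(x1, x2, y):
    """Solve the 2x2 normal equations for y ~ b1*x1 + b2*x2 (no intercept)."""
    s11 = sum(a * a for a in x1)
    s22 = sum(b * b for b in x2)
    s12 = sum(a * b for a, b in zip(x1, x2))
    s1y = sum(a * t for a, t in zip(x1, y))
    s2y = sum(b * t for b, t in zip(x2, y))
    det = s11 * s22 - s12 * s12
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    return b1, b2


def simulate_coef_std(rho, n=200, trials=300, seed=0):
    """Std of fitted b1, and of b1 + b2, over repeated synthetic datasets."""
    rng = random.Random(seed)
    b1_draws, sum_draws = [], []
    for _trial in range(trials):
        x1, x2, y = [], [], []
        for _row in range(n):
            z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
            a = z1
            b = rho * z1 + math.sqrt(1 - rho * rho) * z2  # corr(a, b) = rho
            x1.append(a)
            x2.append(b)
            y.append(a + b + rng.gauss(0, 1))  # true b1 = b2 = 1, noise sd 1
        b1, b2 = fit_two_feature_ols(x1, x2, y)
        b1_draws.append(b1)
        sum_draws.append(b1 + b2)
    return statistics.stdev(b1_draws), statistics.stdev(sum_draws)


std_indep, sum_std_indep = simulate_coef_std(rho=0.0)
std_corr, sum_std_corr = simulate_coef_std(rho=0.99)
print(f"rho=0.00: std(b1)={std_indep:.3f}, std(b1+b2)={sum_std_indep:.3f}")
print(f"rho=0.99: std(b1)={std_corr:.3f}, std(b1+b2)={sum_std_corr:.3f}")
```

In theory the standard deviation of a single coefficient grows by a factor of 1/sqrt(1 − ρ²), about 7.1 for ρ = 0.99, while the spread of the sum b1 + b2 stays small. That contrast is one way to read non-identifiability: the data pin down the joint effect of the correlated pair, but not how it is split between the two features.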

What carries the argument

A formal theorem on multicollinearity-induced inflation of attribution variance carries the argument by linking data correlations directly to explanation instability. It is complemented by the Explainability Fragility Score and the SHARP regularisation method.

Load-bearing premise

The proposed mitigation methods stabilize explanations without reducing predictive performance on the UNSW-NB15 dataset across bootstrapped runs.

What would settle it

A direct measurement on the UNSW-NB15 dataset showing no increase in attribution variance despite high multicollinearity, or a drop in model accuracy when using CAA-Filtering or SHARP.

Figures

Figures reproduced from arXiv: 2605.22529 by Anna Lito Michala, Ioannis J. Vourganas.

Figure 1: Pearson correlation heatmap for the numeric feature space.
Figure 2: 𝜆-Ablation results for values from 0 to 10 increasing by powers of 10 for the LR model.
    Surrounding text: To formally validate the controllability of explainability stability, 𝜆-ablation evaluation was performed for 𝜆 ∈ {0, 0.01, 0.1, 1.0, 10.0}. When 𝜆 = 0, SHARP reduces to standard empirical risk minimisation, ensuring backward compatibility with existing training pipelines.
Figure 3: 𝜆-Ablation results for values from 0 to 10 increasing by powers of 10 for the MLP NN model.
Original abstract

This paper investigates an unexplored yet impactful vulnerability in AI explainability used in intrusion detection (IDS): multicollinearity-induced instability. Despite extensive reliance on post-hoc explainability tools such as SHAP or LIME, the impact of correlated features on explanation robustness is not evaluated. We introduce a formal theorem stating that multicollinearity inflates attribution variance. This demonstrates that explanations and feature importances are non-identifiable under multicollinearity. A suite of comprehensive experiments validates the theorem on a representative benchmark dataset, UNSW-NB15. Four widely used families of models are evaluated, including linear, tree-based, kernel, and neural, across full and pruned feature sets based on VIF and correlation thresholding. We propose the novel metric of Explanability Fragility Score and two novel methods to mitigate it with variable integration complexity. CAA-Filtering focuses on stabilising explanations by grouping attributions of trained models. SHARP is a novel training-time regularisation framework that penalises attribution instability, enabling controllable and monotonic improvement of explainability stability. The findings support stable predictive performance, using Kendall's τ to quantify instability across bootstrapped explanations. This work has direct implications for the trustworthiness and reproducibility of XAI in security-critical contexts, and motivates incorporating multicollinearity mitigations into the IDS pipelines, providing a set of guidelines for practitioners.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper claims that multicollinearity induces instability in post-hoc explainability (SHAP, LIME) for cybersecurity AI models on datasets like UNSW-NB15. It presents a formal theorem that multicollinearity inflates attribution variance and renders explanations non-identifiable. This is validated experimentally across linear, tree-based, kernel, and neural model families using full and pruned feature sets (via VIF and correlation thresholds). The authors introduce the Explainability Fragility Score, CAA-Filtering for attribution grouping, and SHARP training-time regularization to improve stability (measured by Kendall's tau on bootstrapped runs) while maintaining predictive performance, and offer practitioner guidelines.

Significance. If the theorem holds generally and the mitigations preserve performance, the work would address a relevant gap in XAI robustness for security-critical systems. Credit is due for the formal theorem providing independent grounding and for the multi-family experiments with Kendall's tau quantification of instability. The controllable SHARP method offers a practical contribution if the no-degradation claim is substantiated.
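To make the Kendall's tau quantification concrete, here is a minimal sketch of how ranking agreement across bootstrapped explanations could be scored. It is an illustration only and is not the paper's Explainability Fragility Score, whose definition is not given on this page. The function names and the attribution values are hypothetical, and the tau-a variant is used for simplicity.

```python
from itertools import combinations


def kendall_tau(a, b):
    """Kendall's tau-a between two equal-length score lists (ties count as neither)."""
    concordant = discordant = 0
    for i, j in combinations(range(len(a)), 2):
        sign = (a[i] - a[j]) * (b[i] - b[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(a) * (len(a) - 1) // 2
    return (concordant - discordant) / n_pairs


def mean_pairwise_tau(attribution_runs):
    """Average tau over all pairs of runs; 1.0 means identical rankings."""
    taus = [kendall_tau(r, s) for r, s in combinations(attribution_runs, 2)]
    return sum(taus) / len(taus)


# Hypothetical mean |attribution| per feature from three bootstrap runs.
stable_runs = [
    [0.50, 0.30, 0.15, 0.05],
    [0.48, 0.33, 0.14, 0.05],
    [0.52, 0.29, 0.13, 0.06],
]
unstable_runs = [
    [0.50, 0.30, 0.15, 0.05],
    [0.28, 0.51, 0.06, 0.15],
    [0.14, 0.05, 0.52, 0.29],
]
print("stable:", mean_pairwise_tau(stable_runs))
print("unstable:", mean_pairwise_tau(unstable_runs))
```

A value near 1 indicates that feature rankings are reproduced across resamples, while values near or below 0 indicate rankings that reshuffle from run to run.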

major comments (2)
  1. [Formal theorem section] Theorem 1: The derivation correctly identifies variance inflation for linear attribution operators via the ill-conditioned covariance matrix, but provides no extension or additional argument showing that non-identifiability transfers to the sampling-based (KernelSHAP), path-dependent (TreeSHAP), or surrogate (DeepSHAP) attributions used for the tree-based, kernel, and neural models in the UNSW-NB15 experiments.
  2. [Experimental results and mitigation sections] The central practical claim that CAA-Filtering and SHARP stabilize explanations without degrading predictive performance is load-bearing, yet the manuscript does not report explicit before/after comparisons of standard performance metrics (accuracy, F1-score, or AUC) on the original versus VIF-pruned or SHARP-regularized models.
minor comments (2)
  1. [Abstract] 'Explanability Fragility Score' is a typographical error and should read 'Explainability Fragility Score'.
  2. [Methods section] The choices of the VIF and correlation thresholds (listed as free parameters) should be justified with a sensitivity analysis rather than presented as fixed without further detail.
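A threshold sensitivity analysis of the kind requested in the last minor comment might look like the following sketch. It assumes synthetic data with hypothetical column names (not UNSW-NB15 fields) and a simple greedy pruning rule, neither of which comes from the paper. It relies on the standard identity that the VIF of feature j equals the j-th diagonal entry of the inverse correlation matrix.

```python
import math
import random


def pearson(x, y):
    """Sample Pearson correlation of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def invert(matrix):
    """Gauss-Jordan inverse with partial pivoting (fine for small matrices)."""
    n = len(matrix)
    aug = [list(row) + [float(i == j) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def vif_scores(columns):
    """VIF_j is the j-th diagonal entry of the inverse correlation matrix."""
    inverse = invert([[pearson(a, b) for b in columns] for a in columns])
    return [inverse[j][j] for j in range(len(columns))]


def prune_by_vif(features, threshold):
    """Drop the highest-VIF feature until all remaining VIFs are <= threshold."""
    kept = dict(features)
    while len(kept) > 1:
        scores = dict(zip(kept, vif_scores(list(kept.values()))))
        worst = max(scores, key=scores.get)
        if scores[worst] <= threshold:
            break
        del kept[worst]
    return list(kept)


# Synthetic stand-in data; column names are hypothetical, not UNSW-NB15 fields.
rng = random.Random(0)
base = [rng.gauss(0, 1) for _ in range(500)]
features = {
    "bytes_a": base,
    "bytes_b": [v + 0.1 * rng.gauss(0, 1) for v in base],  # near-duplicate
    "duration": [rng.gauss(0, 1) for _ in range(500)],
    "rate": [rng.gauss(0, 1) for _ in range(500)],
}

for threshold in (5.0, 10.0, 1000.0):
    print(threshold, prune_by_vif(features, threshold))
```

Sweeping the threshold and recording which features survive, then repeating the stability and performance measurements on each surviving set, would show how sensitive the conclusions are to the chosen cut-off. Rules of thumb such as 5 or 10 are common, but the thresholds used in the paper are not stated on this page.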

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive and detailed comments. We address each major comment below, indicating the revisions we will make to improve the manuscript.

  1. Referee: [Formal theorem section] Theorem 1: The derivation correctly identifies variance inflation for linear attribution operators via the ill-conditioned covariance matrix, but provides no extension or additional argument showing that non-identifiability transfers to the sampling-based (KernelSHAP), path-dependent (TreeSHAP), or surrogate (DeepSHAP) attributions used for the tree-based, kernel, and neural models in the UNSW-NB15 experiments.

    Authors: We acknowledge that Theorem 1 is formally derived for linear attribution operators using the ill-conditioned covariance matrix. For the non-linear cases (KernelSHAP, TreeSHAP, DeepSHAP), the manuscript relies on empirical validation across model families rather than a direct theoretical extension. In the revised manuscript, we will add a clarifying discussion in the theorem section noting the empirical support for generalization while acknowledging the difficulty of a full non-linear proof; we do not claim the theorem directly transfers but argue the underlying multicollinearity mechanism is shared. revision: partial

  2. Referee: [Experimental results and mitigation sections] The central practical claim that CAA-Filtering and SHARP stabilize explanations without degrading predictive performance is load-bearing, yet the manuscript does not report explicit before/after comparisons of standard performance metrics (accuracy, F1-score, or AUC) on the original versus VIF-pruned or SHARP-regularized models.

    Authors: We agree that explicit before-and-after comparisons of predictive performance metrics are necessary to fully substantiate the no-degradation claim. In the revised manuscript, we will add tables in the experimental results section reporting accuracy, F1-score, and AUC for the original full-feature models, the VIF-pruned and correlation-thresholded variants, and the SHARP-regularized models, confirming that performance is maintained. revision: yes

Circularity Check

0 steps flagged

No significant circularity detected; theorem and metrics stand independently


The paper introduces a formal theorem claiming multicollinearity inflates attribution variance and renders explanations non-identifiable, which for linear models follows from standard properties of ill-conditioned covariance matrices without requiring self-reference. Experiments validate this on UNSW-NB15 across linear, tree-based, kernel, and neural models using VIF pruning and Kendall's τ for instability. The novel Explainability Fragility Score and methods CAA-Filtering and SHARP are defined and applied as independent contributions with stated assumptions about preserving predictive performance; no equations or definitions reduce by construction to fitted thresholds or prior self-citations. The derivation chain remains self-contained against external benchmarks.
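For reference, the standard linear-model property alluded to here can be written out. This is textbook ordinary least squares under homoscedastic noise with an intercept column in X, not a reproduction of the paper's Theorem 1. The symbols σ², R_j² (the coefficient of determination from regressing feature j on the remaining features) and VIF_j follow common usage and are assumptions about notation.

```latex
\operatorname{Var}(\hat{\beta}) = \sigma^2 (X^\top X)^{-1},
\qquad
\operatorname{Var}(\hat{\beta}_j)
  = \frac{\sigma^2}{\sum_{i=1}^{n} (x_{ij} - \bar{x}_j)^2}
    \cdot \frac{1}{1 - R_j^2},
\qquad
\mathrm{VIF}_j = \frac{1}{1 - R_j^2}.
```

As R_j² approaches 1, the variance of the j-th coefficient grows without bound. Any attribution that is linear in the coefficient, such as β̂_j (x_j − x̄_j), inherits that growth.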

Axiom & Free-Parameter Ledger

2 free parameters · 1 axiom · 3 invented entities

Based on abstract only; paper introduces new metric and methods whose mathematical foundations and independence from fitted choices are described at high level.

free parameters (2)
  • VIF threshold
    Used for pruning feature sets in experiments.
  • Correlation threshold
    Applied for feature selection alongside VIF.
axioms (1)
  • ad hoc to paper · Multicollinearity inflates attribution variance in post-hoc explainers such as SHAP and LIME.
    Core statement of the introduced formal theorem.
invented entities (3)
  • Explainability Fragility Score · no independent evidence
    purpose: Quantify instability of explanations due to multicollinearity.
    Newly proposed metric.
  • CAA-Filtering · no independent evidence
    purpose: Stabilize explanations by grouping attributions of trained models.
    Novel post-training mitigation method.
  • SHARP · no independent evidence
    purpose: Training-time regularisation that penalises attribution instability.
    New framework enabling controllable improvement.
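This page does not specify SHARP's penalty term, so the sketch below only illustrates the general shape described for it: an empirical risk plus a λ-weighted instability penalty, which collapses to plain empirical risk minimisation at λ = 0. The penalty used here (the squared gap between the weights of two features assumed to be collinear) is a hypothetical stand-in chosen for brevity, not SHARP's definition, and all function names are illustrative.

```python
def empirical_risk(weights, rows, targets):
    """Mean squared error of a linear model (stand-in for any training loss)."""
    total = 0.0
    for row, target in zip(rows, targets):
        pred = sum(w * x for w, x in zip(weights, row))
        total += (pred - target) ** 2
    return total / len(rows)


def instability_penalty(weights, collinear_pairs):
    """HYPOTHETICAL penalty: squared weight gap within each collinear pair."""
    return sum((weights[i] - weights[j]) ** 2 for i, j in collinear_pairs)


def regularised_objective(weights, rows, targets, collinear_pairs, lam):
    """Empirical risk plus lam times the penalty; lam = 0 gives plain ERM."""
    return empirical_risk(weights, rows, targets) + lam * instability_penalty(
        weights, collinear_pairs
    )


# Toy data: features 0 and 1 are identical copies, so only w0 + w1 is identifiable.
rows = [[1.0, 1.0, 0.5], [2.0, 2.0, -1.0], [3.0, 3.0, 0.0], [4.0, 4.0, 2.0]]
targets = [2.5, 3.0, 6.0, 10.0]
pairs = [(0, 1)]

balanced = [1.0, 1.0, 1.0]   # splits the shared effect evenly
lopsided = [5.0, -3.0, 1.0]  # same predictions, very different attributions

for lam in (0.0, 0.01, 0.1, 1.0, 10.0):  # the lambda grid quoted alongside Figure 2
    print(
        lam,
        regularised_objective(balanced, rows, targets, pairs, lam),
        regularised_objective(lopsided, rows, targets, pairs, lam),
    )
```

Both weight vectors fit the toy data equally well, so the unpenalised objective cannot choose between them. Any positive λ breaks the tie in favour of the balanced split, which is the kind of controllable trade-off a training-time stability regulariser is meant to provide.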

pith-pipeline@v0.9.0 · 5779 in / 1525 out tokens · 59720 ms · 2026-05-22T07:50:37.931108+00:00 · methodology


Appendix A: Theorem Proof

Theorem derivation. Setting. Let 𝑋 ∈ ℝ^(𝑛×𝑝) be the feature matrix, 𝑦 ∈ ℝ^𝑛, and consider the homoscedastic linear...