Stabilising Explainability Fragility in Cybersecurity AI: The Impact and Mitigation of Multicollinearity in Public Benchmark Datasets
Pith reviewed 2026-05-22 07:50 UTC · model grok-4.3
The pith
Multicollinearity inflates attribution variance and renders explanations non-identifiable in AI for intrusion detection.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central discovery is a formal theorem that multicollinearity inflates attribution variance, demonstrating that explanations and feature importances are non-identifiable under such conditions. This is validated through comprehensive experiments on the UNSW-NB15 benchmark using linear, tree-based, kernel, and neural models, with full and pruned feature sets. The paper also defines the Explainability Fragility Score and introduces CAA-Filtering and SHARP as mitigation approaches that stabilize explanations across bootstrapped runs.
What carries the argument
The formal theorem on multicollinearity-induced inflation of attribution variance, which carries the argument by linking data correlations directly to explanation instability; this is complemented by the Explainability Fragility Score and the SHARP regularisation method.
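For intuition, the linear special case can be written with the textbook variance decomposition for ordinary least squares. This is a standard result for a homoscedastic linear model with an intercept, offered as a sketch rather than as the paper's own statement of the theorem. The symbols $\sigma^2$ (noise variance), $x_{ij}$ (value of feature $j$ in row $i$), and $R_j^2$ (the $R^2$ obtained by regressing feature $j$ on the remaining features) are notation assumed here.

```latex
\operatorname{Var}(\hat\beta_j)
  = \frac{\sigma^2}{\sum_{i=1}^{n} (x_{ij} - \bar{x}_j)^2} \cdot \mathrm{VIF}_j,
\qquad
\mathrm{VIF}_j = \frac{1}{1 - R_j^2}.
% Two features with correlation \rho: R_j^2 = \rho^2, so
% \rho = 0.99 gives VIF_j = 1 / (1 - 0.9801) \approx 50.3.
```

As $R_j^2 \to 1$ the variance of the coefficient, and of any attribution that is linear in it, grows without bound.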
Load-bearing premise
The proposed mitigation methods stabilize explanations without reducing predictive performance on the UNSW-NB15 dataset across bootstrapped runs.
What would settle it
A direct measurement on the UNSW-NB15 dataset showing no increase in attribution variance despite high multicollinearity, or a drop in model accuracy when using CAA-Filtering or SHARP.
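The measurement described above can be sketched on synthetic data. The example below is a minimal illustration under stated assumptions, not the paper's protocol: absolute OLS coefficients stand in for SHAP or LIME attributions, the data are simulated rather than drawn from UNSW-NB15, and every name (`make_data`, `bootstrap_stability`, `BETA`) is hypothetical.

```python
import random
from itertools import combinations

def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(m[r][c]))  # partial pivoting
        m[c], m[piv] = m[piv], m[c]
        for r in range(n):
            if r != c:
                f = m[r][c] / m[c][c]
                m[r] = [x - f * y for x, y in zip(m[r], m[c])]
    return [m[i][n] / m[i][i] for i in range(n)]

def ols(X, y):
    """Least-squares coefficients via the normal equations (no intercept:
    the synthetic data are generated with zero mean)."""
    p = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * t for r, t in zip(X, y)) for i in range(p)]
    return solve(xtx, xty)

def kendall_tau(a, b):
    """Kendall's tau between two score vectors (a tie would count as
    discordant; ties are not expected for continuous scores)."""
    pairs = list(combinations(range(len(a)), 2))
    s = sum(1 if (a[i] - a[j]) * (b[i] - b[j]) > 0 else -1 for i, j in pairs)
    return s / len(pairs)

BETA = [1.0, 0.8, 0.6, 0.4, 0.2]  # assumed true coefficients

def make_data(n, collinear, rng):
    """Five features; if collinear, the first three are noisy copies of one latent z."""
    X, y = [], []
    for _ in range(n):
        z = rng.gauss(0, 1)  # unused when collinear is False
        row = [z + 0.1 * rng.gauss(0, 1) if collinear else rng.gauss(0, 1)
               for _ in range(3)]
        row += [rng.gauss(0, 1), rng.gauss(0, 1)]
        X.append(row)
        y.append(sum(b * x for b, x in zip(BETA, row)) + rng.gauss(0, 1))
    return X, y

def bootstrap_stability(X, y, n_boot, rng):
    """Mean pairwise Kendall tau of attribution vectors across bootstrap refits."""
    n = len(y)
    attributions = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample rows with replacement
        coef = ols([X[i] for i in idx], [y[i] for i in idx])
        attributions.append([abs(c) for c in coef])  # stand-in attribution
    taus = [kendall_tau(u, v) for u, v in combinations(attributions, 2)]
    return sum(taus) / len(taus)

rng = random.Random(0)
tau_independent = bootstrap_stability(*make_data(300, False, rng), 30, rng)
tau_collinear = bootstrap_stability(*make_data(300, True, rng), 30, rng)
print(f"mean Kendall tau, independent features: {tau_independent:.2f}")
print(f"mean Kendall tau, collinear features:   {tau_collinear:.2f}")
```

A mean tau close to 1 means the attribution ranking is reproduced across resamples. The claim would be in doubt if the collinear design scored as high as the independent one.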
Original abstract
This paper investigates an unexplored yet impactful vulnerability in AI explainability used in intrusion detection (IDS): multicollinearity-induced instability. Despite extensive reliance on post-hoc explainability tools such as SHAP or LIME, the impact of correlated features on explanation robustness is not evaluated. We introduce a formal theorem stating that multicollinearity inflates attribution variance. This demonstrates that explanations and feature importances are non-identifiable under multicollinearity. A suite of comprehensive experiments validates the theorem on a representative benchmark dataset, UNSW-NB15. Four widely used families of models are evaluated, including linear, tree-based, kernel, and neural, across full and pruned feature sets based on VIF and correlation thresholding. We propose the novel metric of Explanability Fragility Score and two novel methods to mitigate it with variable integration complexity. CAA-Filtering focuses on stabilising explanations by grouping attributions of trained models. SHARP is a novel training-time regularisation framework that penalises attribution instability, enabling controllable and monotonic improvement of explainability stability. The findings support stable predictive performance, using Kendall's τ to quantify instability across bootstrapped explanations. This work has direct implications for the trustworthiness and reproducibility of XAI in security-critical contexts, and motivates incorporating multicollinearity mitigations into IDS pipelines, providing a set of guidelines for practitioners.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims that multicollinearity induces instability in post-hoc explainability (SHAP, LIME) for cybersecurity AI models on datasets like UNSW-NB15. It presents a formal theorem that multicollinearity inflates attribution variance and renders explanations non-identifiable. This is validated experimentally across linear, tree-based, kernel, and neural model families using full and pruned feature sets (via VIF and correlation thresholds). The authors introduce the Explainability Fragility Score, CAA-Filtering for attribution grouping, and SHARP training-time regularization to improve stability (measured by Kendall's tau on bootstrapped runs) while maintaining predictive performance, and offer practitioner guidelines.
Significance. If the theorem holds generally and the mitigations preserve performance, the work would address a relevant gap in XAI robustness for security-critical systems. Credit is due for the formal theorem providing independent grounding and for the multi-family experiments with Kendall's tau quantification of instability. The controllable SHARP method offers a practical contribution if the no-degradation claim is substantiated.
major comments (2)
- [Formal theorem section] Theorem 1: The derivation correctly identifies variance inflation for linear attribution operators via the ill-conditioned covariance matrix, but provides no extension or additional argument showing that non-identifiability transfers to the sampling-based (KernelSHAP), path-dependent (TreeSHAP), or surrogate (DeepSHAP) attributions used for the tree-based, kernel, and neural models in the UNSW-NB15 experiments.
- [Experimental results and mitigation sections] The central practical claim that CAA-Filtering and SHARP stabilize explanations without degrading predictive performance is load-bearing, yet the manuscript does not report explicit before/after comparisons of standard performance metrics (accuracy, F1-score, or AUC) on the original versus VIF-pruned or SHARP-regularized models.
minor comments (2)
- [Abstract] 'Explanability Fragility Score' is a typographical error and should read 'Explainability Fragility Score'.
- [Methods section] The VIF and correlation thresholds (listed as free parameters) should be justified with a sensitivity analysis rather than presented as fixed values without further detail.
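The sensitivity analysis requested in the last comment can be sketched as a sweep over the pruning threshold. The example below is a minimal illustration, assuming a greedy pairwise-correlation filter (the paper's exact pruning rule is not given here). The feature names are hypothetical and the data are synthetic, not UNSW-NB15 columns. A VIF threshold could be swept the same way.

```python
import random

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5

def prune_by_correlation(columns, threshold):
    """Greedily keep a feature only if |r| with every kept feature is <= threshold."""
    kept = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[k])) <= threshold for k in kept):
            kept.append(name)
    return kept

# Synthetic stand-in data with hypothetical feature names.
rng = random.Random(0)
base = [rng.gauss(0, 1) for _ in range(500)]
columns = {
    "f_base": base,
    "f_near_copy": [x + 0.1 * rng.gauss(0, 1) for x in base],  # r close to 1
    "f_related": [x + 1.0 * rng.gauss(0, 1) for x in base],    # r around 0.7
    "f_independent": [rng.gauss(0, 1) for _ in range(500)],    # r close to 0
}

# Sweep the threshold and record which features survive at each setting.
sweep = {t: prune_by_correlation(columns, t) for t in (0.5, 0.8, 0.999)}
for t, kept in sweep.items():
    print(f"threshold {t}: kept {kept}")
```

Reporting the retained feature set, and the resulting stability and accuracy, at each threshold shows whether the conclusions depend on one particular cut-off.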
Simulated Author's Rebuttal
We thank the referee for their constructive and detailed comments. We address each major comment below, indicating the revisions we will make to improve the manuscript.
- Referee: [Formal theorem section] Theorem 1: The derivation correctly identifies variance inflation for linear attribution operators via the ill-conditioned covariance matrix, but provides no extension or additional argument showing that non-identifiability transfers to the sampling-based (KernelSHAP), path-dependent (TreeSHAP), or surrogate (DeepSHAP) attributions used for the tree-based, kernel, and neural models in the UNSW-NB15 experiments.
  Authors: We acknowledge that Theorem 1 is formally derived for linear attribution operators using the ill-conditioned covariance matrix. For the non-linear cases (KernelSHAP, TreeSHAP, DeepSHAP), the manuscript relies on empirical validation across model families rather than a direct theoretical extension. In the revised manuscript, we will add a clarifying discussion in the theorem section noting the empirical support for generalization while acknowledging the difficulty of a full non-linear proof. We do not claim that the theorem transfers directly, but argue that the underlying multicollinearity mechanism is shared.
  Revision: partial
- Referee: [Experimental results and mitigation sections] The central practical claim that CAA-Filtering and SHARP stabilize explanations without degrading predictive performance is load-bearing, yet the manuscript does not report explicit before/after comparisons of standard performance metrics (accuracy, F1-score, or AUC) on the original versus VIF-pruned or SHARP-regularized models.
  Authors: We agree that explicit before-and-after comparisons of predictive performance metrics are necessary to fully substantiate the no-degradation claim. In the revised manuscript, we will add tables in the experimental results section reporting accuracy, F1-score, and AUC for the original full-feature models, the VIF-pruned and correlation-thresholded variants, and the SHARP-regularized models, confirming that performance is maintained.
  Revision: yes
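The before/after comparison promised in the second response can be sketched in a few lines. This is a minimal illustration only: the labels and scores are toy values for two hypothetical models, not results from the paper, and the helper names (`report`, `roc_auc`) are assumptions introduced here.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def f1_score(y_true, y_pred):
    """F1 for the positive class (label 1)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def roc_auc(y_true, scores):
    """AUC as the probability that a positive outscores a negative (ties count half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def report(y_true, scores, threshold=0.5):
    """Accuracy, F1, and AUC for one model's scores."""
    y_pred = [int(s >= threshold) for s in scores]
    return {
        "accuracy": accuracy(y_true, y_pred),
        "f1": f1_score(y_true, y_pred),
        "auc": roc_auc(y_true, scores),
    }

# Toy labels and scores: hypothetical full-feature and pruned models.
y_true = [0, 0, 0, 1, 1, 1, 0, 1]
scores_full = [0.1, 0.4, 0.6, 0.7, 0.9, 0.8, 0.2, 0.3]
scores_pruned = [0.2, 0.3, 0.6, 0.6, 0.8, 0.9, 0.1, 0.4]

full = report(y_true, scores_full)
pruned = report(y_true, scores_pruned)
delta = {k: pruned[k] - full[k] for k in full}  # change after pruning
print(full, pruned, delta)
```

Evaluating both models on the same held-out split and reporting the per-metric difference makes any degradation directly visible.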
Circularity Check
No significant circularity detected; theorem and metrics stand independently
The paper introduces a formal theorem claiming multicollinearity inflates attribution variance and renders explanations non-identifiable, which for linear models follows from standard properties of ill-conditioned covariance matrices without requiring self-reference. Experiments validate this on UNSW-NB15 across linear, tree-based, kernel, and neural models using VIF pruning and Kendall's τ for instability. The novel Explainability Fragility Score and the methods CAA-Filtering and SHARP are defined and applied as independent contributions with stated assumptions about preserving predictive performance; no equations or definitions reduce by construction to fitted thresholds or prior self-citations. The derivation chain remains self-contained against external benchmarks.
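The standard properties invoked for the linear case can be made explicit with the spectral form of the OLS covariance. This is a textbook identity rather than the paper's own derivation. Here $X$ is the $n \times p$ feature matrix, $\sigma^2$ the noise variance, and $\lambda_k$, $v_k$ the eigenvalues and orthonormal eigenvectors of $X^\top X$; this notation is assumed for the sketch.

```latex
\operatorname{Cov}(\hat\beta) = \sigma^2 (X^\top X)^{-1}
  = \sigma^2 \sum_{k=1}^{p} \frac{1}{\lambda_k}\, v_k v_k^\top,
\qquad
\operatorname{Var}(v_k^\top \hat\beta) = \frac{\sigma^2}{\lambda_k}.
% Exact collinearity: \lambda_{\min} = 0 with X v_{\min} = 0, so
% X(\beta + c\, v_{\min}) = X\beta for every scalar c.
```

Near-collinearity drives $\lambda_{\min}$ toward zero, so the estimate is poorly determined along $v_{\min}$. In the exact limit every $\beta + c\,v_{\min}$ yields the same fitted values, and the data cannot distinguish between them.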
Axiom & Free-Parameter Ledger
free parameters (2)
- VIF threshold
- Correlation threshold
axioms (1)
- ad hoc to paper: Multicollinearity inflates attribution variance in post-hoc explainers such as SHAP and LIME.
invented entities (3)
- Explainability Fragility Score: no independent evidence
- CAA-Filtering: no independent evidence
- SHARP: no independent evidence
Appendix A. Theorem Proof
Theorem derivation. Setting. Let $X \in \mathbb{R}^{n \times p}$ be the feature matrix, $y \in \mathbb{R}^{n}$, and consider the homoscedastic linear...