pith. machine review for the scientific record.

arxiv: 2604.09628 · v2 · submitted 2026-03-19 · 💻 cs.CY · cs.AI

Recognition: no theorem link

Assessing Model-Agnostic XAI Methods against EU AI Act Explainability Requirements

Authors on Pith: no claims yet

Pith reviewed 2026-05-15 08:52 UTC · model grok-4.3

classification 💻 cs.CY cs.AI
keywords XAI · EU AI Act · explainability · compliance scoring · model-agnostic methods · interpretability · regulatory requirements

The pith

A scoring framework converts expert judgments on XAI features into compliance scores for the EU AI Act.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper creates a system that takes qualitative expert reviews of model-agnostic XAI methods and turns them into quantitative scores tied to the explainability rules in the EU AI Act. This step addresses the current mismatch between what technical XAI tools can provide and what the regulation demands for AI systems sold or used in Europe. A sympathetic reader would see this as practical help for companies that need to pick or adapt explanation methods to avoid legal risks. The work also flags specific gaps in existing methods that call for more technical development and clearer rules from regulators.

Core claim

The authors propose a qualitative-to-quantitative scoring framework in which expert assessments of XAI properties are aggregated into a regulation-specific compliance score that relates model-agnostic explanation methods directly to the requirements of the EU AI Act.

What carries the argument

The qualitative-to-quantitative scoring framework that aggregates expert assessments of interpretability features into compliance scores aligned with EU AI Act rules.
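
The abstract does not spell out the aggregation formula. As a rough illustration only, the Python sketch below shows one way qualitative ratings could be folded into a weighted [0, 1] score; the rating scale, feature names, and weights are invented for the example and do not come from the paper.

    # Hypothetical sketch of a qualitative-to-quantitative aggregation.
    # The rating scale, feature names, and weights are illustrative;
    # the paper's actual scheme is not specified in the abstract.

    RATING_TO_SCORE = {"absent": 0.0, "partial": 0.5, "full": 1.0}

    def compliance_score(expert_ratings: dict[str, str],
                         weights: dict[str, float]) -> float:
        """Aggregate qualitative expert ratings into a [0, 1] compliance score.

        expert_ratings maps an interpretability feature (tied to an AI Act
        requirement) to a qualitative judgment; weights reflects how strongly
        each feature bears on the regulation.
        """
        total = sum(weights.values())
        weighted = sum(RATING_TO_SCORE[expert_ratings[f]] * w
                       for f, w in weights.items())
        return weighted / total  # normalized to [0, 1]

    # Scoring a hypothetical XAI method against three invented criteria.
    ratings = {"local_fidelity": "full",
               "global_coverage": "partial",
               "user_comprehensibility": "absent"}
    weights = {"local_fidelity": 2.0,
               "global_coverage": 1.0,
               "user_comprehensibility": 1.0}
    print(f"compliance score: {compliance_score(ratings, weights):.2f}")  # 0.62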

If this is right

  • Companies can use the scores to select or adjust XAI methods that better meet EU legal explanation duties.
  • The framework reveals concrete technical shortcomings in current model-agnostic XAI tools that need further research.
  • Practitioners receive guidance on closing the gap between technical capabilities and regulatory demands in the EU market.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The scoring approach could be applied to other emerging AI regulations outside the EU once similar requirements are defined.
  • Automating parts of the expert assessment step would allow faster and more repeatable use of the framework at scale.
  • Validation against actual enforcement outcomes would strengthen the link between the numerical scores and legal risk.

Load-bearing premise

Expert judgments about XAI interpretability features can be turned into reliable numerical scores that match the legal requirements of the EU AI Act.
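
The word "reliable" is doing heavy lifting here, and the referee report below notes that no inter-rater statistics are given. A minimal sketch of the kind of check that would speak to it, using Cohen's kappa on invented ratings from two hypothetical experts (nothing here comes from the paper):

    from sklearn.metrics import cohen_kappa_score

    # Two hypothetical experts rating the same ten XAI features on an
    # invented absent/partial/full scale; the paper reports no such data.
    rater_a = ["full", "partial", "full", "absent", "partial",
               "full", "partial", "absent", "full", "partial"]
    rater_b = ["full", "partial", "partial", "absent", "partial",
               "full", "full", "absent", "full", "absent"]

    # Chance-corrected agreement; values above ~0.6 are often read as
    # substantial. Here: (0.70 - 0.34) / (1 - 0.34) ≈ 0.55.
    print(f"Cohen's kappa: {cohen_kappa_score(rater_a, rater_b):.2f}")  # 0.55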

What would settle it

Empirical tests in which the framework's compliance scores fail to predict whether a given XAI method actually satisfies EU regulators during real compliance reviews or audits.

Figures

Figures reproduced from arXiv: 2604.09628 by Francesco Sovrano, Giulia Vilone, Michael Lognoul.

Figure 1. Methodology overview: starting from a set of XAI methods, the proposed …
Figure 2. Results of the sensitivity analysis over the compliance scores related to …
read the original abstract

Explainable AI (XAI) has evolved in response to expectations and regulations, such as the EU AI Act, which introduces regulatory requirements on AI-powered systems. However, a persistent gap remains between existing XAI methods and society's legal requirements, leaving practitioners without clear guidance on how to approach compliance in the EU market. To bridge this gap, we study model-agnostic XAI methods and relate their interpretability features to the requirements of the AI Act. We then propose a qualitative-to-quantitative scoring framework: qualitative expert assessments of XAI properties are aggregated into a regulation-specific compliance score. This helps practitioners identify when XAI solutions may support legal explanation requirements while highlighting technical issues that require further research and regulatory clarification.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 1 minor

Summary. The manuscript studies model-agnostic XAI methods and maps their interpretability features to the explainability requirements of the EU AI Act. It proposes a qualitative-to-quantitative scoring framework in which expert assessments of XAI properties are aggregated into regulation-specific compliance scores intended to guide practitioners on legal compliance.

Significance. If the aggregation procedure can be shown to be reliable and legally aligned, the framework would supply a concrete tool for selecting XAI methods under the EU AI Act and would usefully flag technical gaps that still require regulatory clarification. The interdisciplinary linkage between XAI properties and specific legal criteria is a timely contribution.

major comments (1)
  1. [Framework description (following the abstract)] The central aggregation step that converts qualitative expert ratings into numeric compliance scores is described only at a high level; no inter-rater reliability statistics, weighting scheme, normalization procedure, or calibration against actual regulatory decisions or case law is supplied. This absence directly undermines the claim that the resulting scores meaningfully indicate support for EU AI Act requirements.
minor comments (1)
  1. The abstract refers to “model-agnostic XAI methods” without enumerating the concrete methods examined or the criteria used to select them.

Simulated Authors' Rebuttal

1 response · 1 unresolved

We thank the referee for the constructive feedback on the framework's aggregation procedure. We address the major comment below and will revise the manuscript to provide greater transparency.

read point-by-point responses
  1. Referee: The central aggregation step that converts qualitative expert ratings into numeric compliance scores is described only at a high level; no inter-rater reliability statistics, weighting scheme, normalization procedure, or calibration against actual regulatory decisions or case law is supplied. This absence directly undermines the claim that the resulting scores meaningfully indicate support for EU AI Act requirements.

    Authors: We agree that the aggregation step is described at a high level and that additional detail is needed to support the framework's claims. In the revised manuscript we will expand the methods section to explicitly describe the aggregation procedure, including the weighting scheme (derived from mapping XAI properties to specific AI Act articles), the normalization steps to produce [0,1] compliance scores, and the rationale for treating the authors' expert assessments as the initial input. We will also add a dedicated limitations subsection that reports the absence of formal inter-rater reliability statistics and discusses the implications. Regarding calibration against regulatory decisions or case law, we will clarify that such calibration is not feasible at present because the AI Act has only recently been adopted and relevant precedents remain limited; this will be framed as an important direction for future work rather than a current capability of the framework. revision: yes
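
If the revision takes the shape the authors describe, the article-derived weighting might look something like the sketch below; the property names, article mappings, and the count-based weighting rule are all assumptions for illustration, not the paper's scheme.

    # Hypothetical article-driven weighting in the spirit of the rebuttal:
    # weight each XAI property by how many AI Act provisions it is mapped
    # to, then normalize so weights sum to 1 (keeping scores in [0, 1]).
    # The mappings below are illustrative, not taken from the paper.

    PROPERTY_TO_ARTICLES = {
        "local_fidelity": ["Art. 13(1)", "Art. 14(4)"],  # transparency, oversight
        "global_coverage": ["Art. 13(1)"],
        "user_comprehensibility": ["Art. 13(3)"],
    }

    def article_weights(mapping: dict[str, list[str]]) -> dict[str, float]:
        counts = {prop: float(len(arts)) for prop, arts in mapping.items()}
        total = sum(counts.values())
        return {prop: c / total for prop, c in counts.items()}

    print(article_weights(PROPERTY_TO_ARTICLES))
    # {'local_fidelity': 0.5, 'global_coverage': 0.25, 'user_comprehensibility': 0.25}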

standing simulated objections (unresolved)
  • Calibration of the compliance scores against actual regulatory decisions or case law, given the recent adoption of the EU AI Act and the current scarcity of relevant precedents.

Circularity Check

0 steps flagged

No circularity: the framework is a proposed aggregation method without reduction to fitted inputs or self-citations.

full rationale

The paper proposes a qualitative-to-quantitative scoring framework that aggregates expert assessments of XAI properties into a compliance score aligned with the EU AI Act. No equations, derivations, or load-bearing steps are presented that reduce by construction to prior fitted parameters, self-defined quantities, or unverified self-citations. The central claim is methodological and independent of any quantitative inputs from the same work; it does not rename known results or smuggle ansatzes via citation chains. This is a standard non-circular proposal of a new assessment approach.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 1 invented entity

This analysis is based on the abstract only; it describes no explicit free parameters, axioms, or independent evidence. The scoring framework itself is treated as the main invented entity.

invented entities (1)
  • qualitative-to-quantitative scoring framework (no independent evidence)
    purpose: Aggregate expert assessments of XAI properties into regulation-specific compliance scores
    Introduced in the abstract as the primary contribution to bridge XAI methods and EU AI Act requirements

pith-pipeline@v0.9.0 · 5419 in / 1047 out tokens · 36018 ms · 2026-05-15T08:52:39.372697+00:00 · methodology

