A Mechanistic Explanatory Strategy for XAI

Marcin Rabiza

arXiv: 2411.01332 · v5 · pith:XMLITNLZnew · submitted 2024-11-02 · cs.LG · cs.AI



keywords: mechanistic, broader, deep, explainable, explanation, explanatory, research, strategy


Abstract

Despite significant advancements in XAI, scholars note a persistent lack of solid conceptual foundations and integration with broader scientific discourse on explanation. In response, emerging research draws on explanatory strategies from various sciences and the philosophy of science literature to fill these gaps. This paper outlines a mechanistic strategy for explaining the functional organization of deep learning systems, situating recent developments in explainable AI within a broader philosophical context. According to the mechanistic approach, the explanation of opaque AI systems involves identifying mechanisms that drive decision-making. For deep neural networks, this means discerning functionally relevant components, such as neurons, layers, circuits, or activation patterns, and understanding their roles through decomposition, localization, and recomposition. Proof-of-principle case studies from image recognition and language modeling align these theoretical approaches with mechanistic interpretability research from OpenAI and Anthropic. The findings suggest that pursuing mechanistic explanations can uncover elements that traditional explainability techniques may overlook, ultimately contributing to more thoroughly explainable AI.



Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Mechanistic Interpretability Needs Philosophy
cs.CL 2025-06 unverdicted novelty 4.0

The paper claims that mechanistic interpretability needs philosophy as a partner to clarify concepts, refine methods, and navigate epistemic and ethical complexities in AI systems.