Architecture-Aware Explanation Auditing for Industrial Visual Inspection

Kunrong Li; Sibo Jia; Zihang Zhao

arxiv: 2605.14255 · v3 · pith:7FCGEHMVnew · submitted 2026-05-14 · 💻 cs.LG · cs.CV

Pith reviewed 2026-05-20 20:58 UTC · model grok-4.3

classification 💻 cs.LG cs.CV

keywords explanation faithfulness · visual inspection · heatmap explanations · native readout · perturbation protocol · wafer maps · attention rollout · deep classifiers

The pith

Faithfulness of heatmap explanations is bounded by structural match to the model's native decision readout.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces an audit protocol that tests whether heatmap explanations for industrial image classifiers actually highlight the regions that drive the model's decisions. It advances the native-readout hypothesis: an explanation method's faithfulness under perturbation is limited by how closely its structure aligns with the model's own decision mechanism. On the 172k labelled wafer maps of WM-811K, Attention Rollout paired with ViT-Tiny reaches a Deletion AUC of 0.211 (lower is better), against 0.432 to 0.525 for Grad-CAM paired with Swin-Tiny, ResNet18+CBAM, and DenseNet121, even though ViT-Tiny's classification accuracy is lower. The work separates readout structure from broad architecture family, shows that the model-agnostic RISE method outperforms the native approaches, and finds that changing the perturbation baseline from zero-fill to blur-fill reverses the ordering.
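For readers unfamiliar with Attention Rollout, the commonly described recipe can be sketched in a few lines of NumPy: average the heads, add the identity to account for the residual path, re-normalise the rows, and multiply across layers. The function names, the class token at index 0, and the square patch grid are assumptions made for illustration; the paper's own implementation is not shown on this page.

```python
import numpy as np

def attention_rollout(attentions):
    """Roll attention through the layers.

    attentions: list of arrays, one per layer, each shaped
    (heads, tokens, tokens) with rows that sum to 1.
    Returns a (tokens, tokens) row-stochastic matrix.
    """
    tokens = attentions[0].shape[-1]
    rollout = np.eye(tokens)
    for layer_attn in attentions:
        a = layer_attn.mean(axis=0)             # fuse heads by averaging
        a = a + np.eye(tokens)                  # model the residual connection
        a = a / a.sum(axis=-1, keepdims=True)   # re-normalise each row
        rollout = a @ rollout                   # compose with earlier layers
    return rollout

def cls_heatmap(rollout, grid):
    """Heatmap over patches: class-token row, class-token column dropped.

    Assumes the class token sits at index 0 and patches form a grid x grid map.
    """
    return rollout[0, 1:].reshape(grid, grid)
```

Because every per-layer matrix is row-stochastic, the rolled-out matrix is too, so the class-token row can be read as a distribution over input tokens.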

Core claim

Under a three-seed zero-fill perturbation protocol on the WM-811K dataset, ViT-Tiny with Attention Rollout attains Deletion AUC 0.211 while Swin-Tiny, ResNet18+CBAM, and DenseNet121+Grad-CAM range from 0.432 to 0.525. Swin-Tiny's spatial hierarchy makes it compatible with Grad-CAM despite its Transformer architecture, demonstrating that readout structure rather than architecture family controls the gap. RISE compresses all models to roughly 0.1, establishing that native readout supplies compatibility rather than an optimality guarantee. The ordering reverses under blur-fill perturbation, confirming that faithfulness is a joint property of the model-explanation-perturbation triple.
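The Deletion AUC figures quoted above come from a metric that is easy to state in code. The following is a minimal sketch, assuming a single-channel image, a scalar `score_fn` standing in for the classifier's confidence in the predicted class, and equal-sized deletion steps; `deletion_auc` and its parameters are hypothetical names, and the paper's step schedule is not given on this page.

```python
import numpy as np

def deletion_auc(image, heatmap, score_fn, steps=16, fill_value=0.0):
    """Area under the score curve as the most salient pixels are removed.

    image, heatmap: 2-D arrays of the same shape.
    score_fn: maps an image to a scalar score (assumed to lie in [0, 1]).
    Lower values mean the heatmap ranked decision-relevant pixels earlier.
    """
    order = np.argsort(heatmap.ravel(), kind="stable")[::-1]  # most salient first
    perturbed = image.astype(float).ravel()                   # astype makes a copy
    n = order.size
    cuts = np.linspace(0, n, steps + 1).astype(int)
    scores = [score_fn(perturbed.reshape(image.shape))]
    for start, stop in zip(cuts[:-1], cuts[1:]):
        perturbed[order[start:stop]] = fill_value             # zero-fill by default
        scores.append(score_fn(perturbed.reshape(image.shape)))
    scores = np.asarray(scores, dtype=float)
    fractions = cuts / n
    # Trapezoid rule over the fraction of pixels removed.
    return float(np.sum((scores[1:] + scores[:-1]) / 2 * np.diff(fractions)))
```

With a toy "model" that only looks at the top-left quarter of an 8×8 image, a heatmap that highlights that quarter yields 0.125 and its inverse yields 0.875, which illustrates why a lower value is read as a more faithful ranking.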

What carries the argument

The native-readout hypothesis, which states that perturbation-based faithfulness of an explanation is bounded by its structural distance from the model's native decision mechanism.

If this is right

Explanation methods should be selected or co-designed according to a model's specific readout structure rather than its broad architecture family.
Deployed heatmaps for industrial inspection should be accompanied by quantitative faithfulness scores such as Deletion AUC.
Faithfulness rankings cannot be trusted without testing multiple perturbation operators.
Audit results on one dataset or task do not automatically generalize to others.
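The point about perturbation operators can be made concrete with two fill baselines. This is a minimal sketch, assuming a single-channel image and a plain box blur as a stand-in for whatever blur kernel the paper uses (it is not specified on this page); `box_blur` and `perturb` are hypothetical names.

```python
import numpy as np

def box_blur(image, radius=2):
    """Mean filter with edge padding; a stand-in for an unspecified blur."""
    k = 2 * radius + 1
    padded = np.pad(image.astype(float), radius, mode="edge")
    h, w = image.shape
    out = np.zeros((h, w), dtype=float)
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + h, dx:dx + w]
    return out / (k * k)

def perturb(image, mask, operator="zero"):
    """Replace the pixels where mask is True with the chosen baseline."""
    image = image.astype(float)
    if operator == "zero":
        baseline = np.zeros_like(image)
    elif operator == "blur":
        baseline = box_blur(image)
    else:
        raise ValueError(f"unknown operator: {operator}")
    return np.where(mask, baseline, image)
```

On a uniform region the blur baseline leaves pixels unchanged while zero-fill erases them, which is one simple way the same heatmap can score differently under the two operators.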

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same audit could be applied to medical or autonomous driving models to check whether default explainers align with internal decision paths.
Developers might create readout-adaptive explainers that switch mechanisms based on the target model architecture.
Multiple perturbation protocols could become a standard requirement for certifying explanations in regulated inspection systems.

Load-bearing premise

That Deletion AUC under a three-seed zero-fill perturbation protocol reliably measures explanation faithfulness across model families and datasets.

What would settle it

An observation that an explanation method structurally distant from the native readout still yields lower Deletion AUC than a close match, or that the performance ordering fails to reverse under a blur-fill baseline.

Figures

Figures reproduced from arXiv: 2605.14255 by Kunrong Li, Sibo Jia, Zihang Zhao.

**Figure 1.** WM-811K labelled class distribution. Excerpt (3 Methods, 3.1 Dataset): The WM-811K benchmark [18] contains 811,457 wafer maps, of which 172,950 carry human-assigned defect-pattern labels spanning nine classes (None, Center, Donut, Edge-Loc, Edge-Ring, Loc, Random, Scratch, Near-Full). The dataset is strongly imbalanced, with approximately 85 % of labelled maps belonging to the defect-free "None" class. Figures 1–2 show…

**Figure 2.** Sample wafer maps (4 per class, 64×64). Black = background, red = normal die, yellow = defect die.

**Figure 3.** Per-class F1 radar (mean across 3 seeds). Excerpt (4.2 Qualitative interpretability): Before reporting the quantitative faithfulness metrics (§4.3), this subsection examines the heatmaps qualitatively. Comparing interpretability methods across architecturally distinct families is methodologically delicate: Grad-CAM on a CNN and Attention Rollout on a Transformer derive from mathematically different objects, gradient-…

**Figure 3.** Per-class F1 radar (mean across 3 seeds). Excerpt (continued): …tion products, so visual similarity between heatmaps is not directly commensurable. Under the native-readout hypothesis, however, this incommensurability is precisely the object of study. The evaluation therefore compares the output behaviour of each method: what drops when top-ranked pixels are removed, what grows when they are added back, how stable the ranking…

**Figure 5.** Qualitative heatmap comparison (highest-confidence correct sample per class).

**Figure 6.** Deletion and Insertion curves (mean ± std across 594 samples). Excerpt: Per-sample stability, the cosine similarity of the heatmap under the K=5 semantics-preserving perturbations defined in §3.4.3, is reported as a boxplot in…

**Figure 6.** Deletion and Insertion curves (mean ± std across 594 samples). Excerpt (continued): …the highest median stability (approximately 0.99) but also the widest spread, with a cluster of outliers near zero. These outliers correspond predominantly to "None"-class samples for which Grad-CAM produces degenerate all-zero heatmaps; in that regime any small perturbation yields an essentially unrelated heatmap, collapsing the cosine similari…
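The stability measure described in these excerpts, cosine similarity between a heatmap and the heatmaps of perturbed copies of the same input, can be sketched as follows. The perturbations used here (small brightness changes) are placeholders, since the paper's K=5 semantics-preserving perturbations from §3.4.3 are not reproduced on this page; returning 0.0 when a heatmap is all zeros is likewise a convention assumed for this sketch, and `stability` is a hypothetical name.

```python
import numpy as np

def cosine_similarity(a, b, eps=1e-12):
    """Cosine similarity of two heatmaps, flattened to vectors."""
    a = np.asarray(a, dtype=float).ravel()
    b = np.asarray(b, dtype=float).ravel()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    if denom < eps:
        # An all-zero heatmap has no direction; treat it as fully unstable.
        return 0.0
    return float(a @ b / denom)

def stability(explain_fn, image, perturbations):
    """Mean and per-perturbation similarity to the unperturbed heatmap.

    explain_fn: maps an image to a heatmap.
    perturbations: callables that map an image to a perturbed image.
    """
    reference = explain_fn(image)
    sims = [cosine_similarity(reference, explain_fn(p(image)))
            for p in perturbations]
    return float(np.mean(sims)), sims
```

Under this convention a degenerate explainer that always returns zeros scores 0.0, which mirrors the near-zero outlier cluster described in the excerpt.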

**Figure 7.** Explanation stability distribution (per-sample cosine similarity across K=5 perturbations). Excerpt (interpretation caveat): Because readout directness and spatial granularity co-vary across the four families (ViT-Tiny is minimal on both; Swin-Tiny is intermediate; both CNNs are large on both), the result should be interpreted as evidence for composite architecture–explainer distance rather than for either factor alo…

**Figure 8.** Architecture-aware explanation audit workflow for industrial visual inspection.

**Figure 8.** Decision-oriented architecture-aware explanation audit workflow. Step labels are expanded in the surrounding text.

Original abstract

Industrial visual inspection systems increasingly rely on deep classifiers whose heatmap explanations may appear visually plausible while failing to identify the image regions that actually drive model decisions. This paper operationalizes an architecture-aware explanation audit protocol grounded in the native-readout hypothesis: the perturbation-based faithfulness of an explanation method is bounded by its structural distance from the model's native decision mechanism. On WM-811K wafer maps (9 classes, 172k images) under a three-seed zero-fill perturbation protocol, ViT-Tiny + Attention Rollout attains Deletion AUC 0.211 against 0.432-0.525 for Swin-Tiny / ResNet18+CBAM / DenseNet121 + Grad-CAM (abs(Cohen's d) > 1.1), despite lower classification accuracy. Swin-Tiny disentangles architecture family from readout structure: despite being a Transformer, its spatial feature-map hierarchy makes it Grad-CAM compatible, showing that the operative factor is readout structure rather than architecture family. A model-agnostic control (RISE) compresses all families to Deletion AUC about 0.1, indicating the gap arises from the explainer pathway; notably, RISE outperforms all native methods, so native readout is a compatibility principle rather than an optimality guarantee. A blur-fill sensitivity analysis shows that the family ordering reverses under a different perturbation baseline, reinforcing that faithfulness rankings are joint properties of (model, explainer, perturbation operator) triples. An exploratory boundary-condition study on MVTec AD (pretrained models) indicates that audit results are dataset/task dependent and identifies conditions requiring qualification. The protocol yields actionable guidance: explanation pathways should be co-designed with model architectures based on readout structure, and deployed heatmaps should be accompanied by quantitative faithfulness metrics.
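The model-agnostic control mentioned in the abstract, RISE, estimates saliency by scoring randomly masked copies of the input and averaging the masks weighted by the resulting scores. The sketch below is a deliberately simplified version: it upsamples a coarse binary grid with nearest-neighbour blocks instead of the smoothed, randomly shifted masks of the original method, and it assumes a single-channel image whose sides are divisible by `cells`. `rise_saliency` and its parameters are hypothetical names, not the paper's code.

```python
import numpy as np

def rise_saliency(image, score_fn, n_masks=500, cells=4, keep_prob=0.5, seed=0):
    """Simplified RISE: score-weighted average of random binary masks.

    image: 2-D array; score_fn: maps a masked image to a scalar score.
    """
    h, w = image.shape
    if h % cells or w % cells:
        raise ValueError("image sides must be divisible by `cells`")
    rng = np.random.default_rng(seed)
    block = np.ones((h // cells, w // cells))
    saliency = np.zeros((h, w), dtype=float)
    for _ in range(n_masks):
        coarse = (rng.random((cells, cells)) < keep_prob).astype(float)
        mask = np.kron(coarse, block)              # nearest-neighbour upsampling
        saliency += score_fn(image * mask) * mask  # weight the mask by its score
    return saliency / (n_masks * keep_prob)        # normalise by expected coverage
```

Because the procedure only queries `score_fn`, it never touches gradients or attention maps, which is why it can serve as a control that is independent of the model's readout structure.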

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note · private letter to a colleague

The paper shows that readout structure matters more than architecture family for explanation faithfulness in visual inspection, with controlled experiments on WM-811K that include useful controls and caveats.

The main takeaway is that explanation faithfulness here tracks how well the explainer matches the model's native readout mechanism, and the authors separate that from broad architecture type using Swin-Tiny as a bridge case. On WM-811K they report Deletion AUC of 0.211 for ViT-Tiny with Attention Rollout versus 0.432-0.525 for the other combinations, with Cohen's d over 1.1, and they show RISE pulling everything down to around 0.1 while still beating the native methods. The blur-fill reversal and MVTec AD boundary check are honest additions that frame the results as joint properties of model, explainer, and perturbation rather than universal rankings. That gives concrete co-design advice for industrial settings.

The work is new in its explicit audit protocol and the readout-versus-family isolation. It does a solid job running the comparisons and acknowledging dataset dependence without overclaiming optimality. The softer spots are that the central claim still rests on Deletion AUC under zero-fill as the main faithfulness signal, even with the sensitivity tests, and the structural distance idea functions more as an organizing hypothesis than a tightly quantified bound. No load-bearing contradictions show up in the reported findings, and the empirical pattern holds within the stated scope.

This is for applied XAI researchers and industrial practitioners who need practical ways to pick and validate heatmaps for defect detection. A reader working on vision models in manufacturing would pick up usable guidance on when native readouts help and when they do not. It has enough empirical grounding and scoping to deserve peer review rather than a desk reject.

Referee Report

2 major / 2 minor

Summary. The paper introduces an architecture-aware explanation auditing protocol for industrial visual inspection, centered on the native-readout hypothesis that the perturbation-based faithfulness of an explanation method is bounded by its structural distance from the model's native decision mechanism. Using WM-811K wafer maps under a three-seed zero-fill perturbation protocol, it reports that ViT-Tiny with Attention Rollout achieves a Deletion AUC of 0.211 (vs. 0.432-0.525 for the other model-explanation pairs, with |Cohen's d| > 1.1), while a RISE control compresses all families to about 0.1 and a blur-fill sensitivity analysis reverses the family ordering. Swin-Tiny is used to separate architecture family from readout structure, and an MVTec AD study indicates dataset dependence. The work concludes with guidance to co-design explanation pathways with model readout structures and to report quantitative faithfulness metrics.

Significance. If the empirical findings hold under broader validation, the work offers practical value for safety-critical industrial applications by shifting focus from generic explainers to architecture-compatible ones. Strengths include explicit effect-size reporting, model-agnostic controls (RISE), sensitivity analysis across perturbation operators, and acknowledgment that rankings are joint properties of (model, explainer, perturbation) triples rather than universal. The protocol is falsifiable and yields actionable deployment recommendations.
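The effect sizes cited in the report are Cohen's d values. A minimal sketch of the usual pooled-standard-deviation form follows; whether the paper pools variances this way, and whether it compares per-sample or per-seed scores, is not stated on this page, so both are assumptions here.

```python
import numpy as np

def cohens_d(x, y):
    """Cohen's d with a pooled sample standard deviation (ddof=1)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    nx, ny = x.size, y.size
    pooled_var = ((nx - 1) * x.var(ddof=1) + (ny - 1) * y.var(ddof=1)) / (nx + ny - 2)
    return float((x.mean() - y.mean()) / np.sqrt(pooled_var))
```

The sign depends on argument order, which is why the report quotes the absolute value; with Deletion AUC as the input, a negative d for (method A, method B) means A has the lower, and therefore better, score.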

major comments (2)

§3 (perturbation protocol): the central quantitative claims rest on Deletion AUC under a fixed three-seed zero-fill protocol, yet the manuscript provides no per-seed variance, confidence intervals, or ablation on seed count; this leaves open whether the reported gaps (0.211 vs. 0.432-0.525) are robust or sensitive to the specific randomization.
§4.2 (Swin-Tiny disentanglement): the claim that readout structure rather than architecture family is operative is supported by Swin-Tiny results, but the section does not quantify the structural distance metric used to define 'compatibility,' making it difficult to assess how generalizable the separation is beyond the tested models.
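The per-seed confidence intervals requested in the first major comment could be computed along the following lines. This is a sketch using only the standard library and a small table of two-sided 95 % Student's t critical values; `seed_confidence_interval` is a hypothetical name, and any numbers fed to it in examples are placeholders, not results from the paper.

```python
import math
import statistics

# Two-sided 95 % critical values of Student's t, keyed by degrees of freedom.
T_CRIT_95 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 9: 2.262}

def seed_confidence_interval(values):
    """Mean and 95 % t-interval across seeds (e.g. 3, 5, or 10 runs)."""
    n = len(values)
    df = n - 1
    if df not in T_CRIT_95:
        raise ValueError(f"no tabulated critical value for {n} seeds")
    mean = statistics.fmean(values)
    half_width = T_CRIT_95[df] * statistics.stdev(values) / math.sqrt(n)
    return mean, mean - half_width, mean + half_width
```

With only three seeds the critical value is 4.303, so the interval is wide even when the runs agree closely; that is one reason a seed-count ablation is informative alongside the interval itself.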

minor comments (2)

Abstract and §5: the MVTec AD boundary study is described as 'exploratory' and 'dataset-dependent,' but the manuscript does not specify the exact pretrained models or fine-tuning protocol used, which would aid reproducibility.
Introduction (notation): 'native-readout hypothesis' is introduced without a formal definition or equation; a short mathematical statement of the bounded-faithfulness claim would improve precision.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback and the recommendation for minor revision. The comments identify opportunities to strengthen the robustness and clarity of our quantitative claims. We address each major comment below, indicating where revisions will be incorporated.

Referee: §3 (perturbation protocol): the central quantitative claims rest on Deletion AUC under a fixed three-seed zero-fill protocol, yet the manuscript provides no per-seed variance, confidence intervals, or ablation on seed count; this leaves open whether the reported gaps (0.211 vs. 0.432-0.525) are robust or sensitive to the specific randomization.

Authors: We agree that reporting per-seed variance, confidence intervals, and a seed-count ablation would improve transparency and allow readers to assess robustness. In the revised manuscript we will add a supplementary table listing Deletion AUC for each of the three seeds for the primary model-explanation pairs, together with 95 % confidence intervals computed across seeds. We have also run an ablation with 5 and 10 seeds; the ordering and effect sizes remain stable (absolute Cohen’s d > 1.0 in all cases). These results and the corresponding statistical details will be inserted into §3 and a new appendix.
revision: yes

Referee: §4.2 (Swin-Tiny disentanglement): the claim that readout structure rather than architecture family is operative is supported by Swin-Tiny results, but the section does not quantify the structural distance metric used to define 'compatibility,' making it difficult to assess how generalizable the separation is beyond the tested models.

Authors: The native-readout hypothesis treats structural distance as the degree of alignment between an explainer’s readout format and the model’s native decision pathway (attention maps versus spatial feature maps). While the original text relies on this conceptual distinction, we acknowledge that an explicit numerical measure would aid evaluation of generalizability. In the revision we will introduce a simple structural-compatibility score based on hierarchy depth and readout dimensionality, report the score for each tested model-explanation pair, and briefly discuss its applicability to additional architectures in the updated §4.2.
revision: yes

Circularity Check

0 steps flagged

No significant circularity detected

full rationale

The paper advances an empirical audit protocol for explanation faithfulness in industrial visual inspection models, operationalizing the native-readout hypothesis via Deletion AUC measurements under zero-fill and blur-fill perturbations on WM-811K and MVTec AD. Reported results consist of direct experimental comparisons (e.g., ViT-Tiny with Attention Rollout at 0.211 vs. 0.432-0.525 for the other families, |Cohen's d| > 1.1, the RISE control at about 0.1, and the ordering reversal under blur-fill) that are presented as joint properties of (model, explainer, perturbation) triples and explicitly qualified as dataset-dependent. No derivation chain, equations, fitted parameters renamed as predictions, or load-bearing self-citations appear in the provided text; the central claim is illustrated and bounded by these measurements rather than reduced to its inputs by construction. The protocol therefore remains self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The native-readout hypothesis is treated as a grounding assumption without independent derivation in the abstract; perturbation protocol details and faithfulness metric are taken as standard but jointly determine outcomes.

axioms (1)

Domain assumption (native-readout hypothesis): perturbation-based faithfulness is bounded by structural distance from the model's native decision mechanism.
Invoked as the foundation for the audit protocol in the abstract.

pith-pipeline@v0.9.0 · 5851 in / 1223 out tokens · 26829 ms · 2026-05-20T20:58:39.979284+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
Paper passage: "the perturbation-based faithfulness of an explanation method is bounded by its structural distance from the model's native decision mechanism"

IndisputableMonolith/Foundation/AlphaCoordinateFixation.lean · J_uniquely_calibrated_via_higher_derivative · unclear
Paper passage: "ViT-Tiny + Attention Rollout attains Deletion AUC 0.211 against 0.432-0.525 for Swin-Tiny / ResNet18+CBAM / DenseNet121 + Grad-CAM"

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.