{"paper":{"title":"Architecture-Aware Explanation Auditing for Industrial Visual Inspection","license":"http://arxiv.org/licenses/nonexclusive-distrib/1.0/","headline":"The faithfulness of heatmap explanations is bounded by structural distance to the model's native decision mechanism.","cross_cats":["cs.CV"],"primary_cat":"cs.LG","authors_text":"Kunrong Li, Sibo Jia, Zihang Zhao","submitted_at":"2026-05-14T01:48:00Z","abstract_excerpt":"Industrial visual inspection systems increasingly rely on deep classifiers whose heatmap explanations may appear visually plausible while failing to identify the image regions that actually drive model decisions. This paper operationalizes an architecture-aware explanation audit protocol grounded in the native-readout hypothesis: the perturbation-based faithfulness of an explanation method is bounded by its structural distance from the model's native decision mechanism. On WM-811K wafer maps (9 classes, 172k images) under a three-seed zero-fill perturbation protocol, ViT-Tiny + Attention Rollo"},"claims":{"count":4,"items":[{"kind":"strongest_claim","text":"the perturbation-based faithfulness of an explanation method is bounded by its structural distance from the model's native decision mechanism","source":"verdict.strongest_claim","status":"machine_extracted","claim_id":"C1","attestation":"unclaimed"},{"kind":"weakest_assumption","text":"that the chosen perturbation protocols (zero-fill and blur-fill) provide an unbiased measure of faithfulness without introducing artifacts that favor certain readout structures over others","source":"verdict.weakest_assumption","status":"machine_extracted","claim_id":"C2","attestation":"unclaimed"},{"kind":"one_line_summary","text":"Explanation faithfulness for deep classifiers on wafer maps is highest when the explainer matches the model's native readout structure, with ViT-Tiny plus Attention Rollout achieving lower Deletion AUC than mismatched methods despite lower accuracy.","source":"verdict.one_line_summary","status":"machine_extracted","claim_id":"C3","attestation":"unclaimed"},{"kind":"headline","text":"The faithfulness of heatmap explanations is bounded by structural distance to the model's native decision mechanism.","source":"verdict.pith_extraction.headline","status":"machine_extracted","claim_id":"C4","attestation":"unclaimed"}],"snapshot_sha256":"c795b659e5ecdd992db2a83a8afaf5a7950091e58d59d59044a31dd654abd569"},"source":{"id":"2605.14255","kind":"arxiv","version":1},"verdict":{"id":"9d33063f-8d93-44be-9c43-cb039d42cf8b","model_set":{"reader":"grok-4.3"},"created_at":"2026-05-15T02:38:15.539309Z","strongest_claim":"the perturbation-based faithfulness of an explanation method is bounded by its structural distance from the model's native decision mechanism","one_line_summary":"Explanation faithfulness for deep classifiers on wafer maps is highest when the explainer matches the model's native readout structure, with ViT-Tiny plus Attention Rollout achieving lower Deletion AUC than mismatched methods despite lower accuracy.","pipeline_version":"pith-pipeline@v0.9.0","weakest_assumption":"that the chosen perturbation protocols (zero-fill and blur-fill) provide an unbiased measure of faithfulness without introducing artifacts that favor certain readout structures over others","pith_extraction_headline":"The faithfulness of heatmap explanations is bounded by structural distance to the model's native decision mechanism."},"references":{"count":23,"sample":[{"doi":"","year":2016,"title":"He, K., Zhang, X., Ren, S., & Sun, J.Deep Residual Learning for Image Recognition. CVPR, 2016.https://www.cv-foundation.org/openaccess/content_cvpr_2016/pap ers/He_Deep_Residual_Learning_CVPR_2016_pap","work_id":"f94e5013-f1fa-48d3-b5e7-c46a36ca1986","ref_index":1,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2018,"title":"CBAM: Convolutional Block Attention Module","work_id":"93e1d222-f117-4eb8-91b3-af67da1aa59d","ref_index":2,"cited_arxiv_id":"1807.06521","is_internal_anchor":true},{"doi":"","year":2017,"title":"Q.Densely Connected Convolutional Networks","work_id":"3a74b10b-f013-44f7-bd18-556db7f58906","ref_index":3,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2021,"title":"et al.An Image Is Worth 16×16 Words: Transformers for Image Recognition at Scale","work_id":"990ba93b-2c58-4677-96fc-b40c7c5eb967","ref_index":4,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2017,"title":"Selvaraju, R. R. et al.Grad-CAM: Visual Explanations from Deep Networks via Gradient-Based Localization. ICCV, 2017.https://openaccess.thecvf.com/content_ICCV_201 7/papers/Selvaraju_Grad-CAM_Visual_E","work_id":"9d88b010-2ccf-4c37-ad56-521173bf6056","ref_index":5,"cited_arxiv_id":"","is_internal_anchor":false}],"resolved_work":23,"snapshot_sha256":"e54b5822545767b9b5b8cc8937c32f114321fa7bc87344fc7be9ec8dfe0225d6","internal_anchors":6},"formal_canon":{"evidence_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"author_claims":{"count":0,"strong_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"builder_version":"pith-number-builder-2026-05-17-v1"}