Interpretable Computer Vision for Defect Detection in X-ray Tomography of Aerospace SiC/SiC Composites
Pith reviewed 2026-05-20 05:28 UTC · model grok-4.3
The pith
Extending ResNet-50 with a prototype layer matches black-box accuracy on SiC/SiC X-ray defect detection while tracing each decision to expert semantic categories.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
p-ResNet-50 couples high-accuracy defect detection with case-based explanations by extending ResNet-50 with a prototype layer whose six learned prototypes are aligned with five expert-defined semantic categories of material features in XCT images. Anchor-based and medoid-based regularizations keep the prototypes stable and representative. On approximately 12,000 patches the model reaches accuracy 0.957 and ROC-AUC 0.994 while producing traceable evidence patches and uncertainty flags for every decision.
What carries the argument
The prototype layer that learns six prototypes aligned to expert semantic categories of SiC/SiC features and applies anchor-based and medoid-based regularizations to maintain their separation and relevance to physical patches.
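To make the two regularizers concrete, here is a minimal sketch in plain Python. It is not the paper's implementation: the exact loss definitions are not given on this page, so the function names, the squared Euclidean distance, the nearest-prototype assignment, and the weights `w_anchor` and `w_medoid` are all assumptions chosen for illustration.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def anchor_penalty(prototypes, anchors):
    """Mean squared distance from each prototype to its expert-selected anchors.

    anchors[k] is the list of anchor embeddings for prototype k
    (hypothetical data layout).
    """
    total, count = 0.0, 0
    for proto, group in zip(prototypes, anchors):
        for a in group:
            total += sq_dist(proto, a)
            count += 1
    return total / count if count else 0.0

def medoid(points):
    """Return the point with the smallest summed distance to the others."""
    return min(points, key=lambda p: sum(sq_dist(p, q) for q in points))

def medoid_penalty(prototypes, embeddings):
    """Mean squared distance from each prototype to the medoid of the
    embeddings assigned to it (assignment = nearest prototype, an assumption)."""
    groups = [[] for _ in prototypes]
    for z in embeddings:
        k = min(range(len(prototypes)), key=lambda i: sq_dist(z, prototypes[i]))
        groups[k].append(z)
    terms = [sq_dist(p, medoid(g)) for p, g in zip(prototypes, groups) if g]
    return sum(terms) / len(terms) if terms else 0.0

def regularizer(prototypes, anchors, embeddings, w_anchor, w_medoid):
    """Weighted sum of both terms; the weights are hypothetical free parameters."""
    return (w_anchor * anchor_penalty(prototypes, anchors)
            + w_medoid * medoid_penalty(prototypes, embeddings))
```

In a training loop a term like `regularizer(...)` would be added to the classification loss. The anchor term pulls each prototype toward patches an expert picked, while the medoid term keeps it close to an actual data point in its neighbourhood, which is one plausible way to discourage prototype collapse.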
Load-bearing premise
The six learned prototypes stay stably aligned with the five expert semantic categories across the full range of real aerospace components without the regularizations introducing bias into accept/reject decisions.
What would settle it
Running the trained model on a new collection of SiC/SiC XCT specimens and checking whether the prototypes remain matched to the expert categories or whether sensitivity and specificity diverge from the reported values.
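The check described above comes down to recomputing the standard detection metrics on the new specimens and comparing them with the reported values. The sketch below uses the usual confusion-matrix definitions; the function names, the tolerance, and the counts in the usage note are hypothetical and are not taken from the paper.

```python
def detection_metrics(tp, fp, tn, fn):
    """Standard binary detection metrics from confusion-matrix counts."""
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),   # recall on defective patches
        "specificity": tn / (tn + fp),   # recall on healthy patches
        "precision": tp / (tp + fp),
    }

def diverges(reported, observed, tol=0.02):
    """Names of metrics whose observed value differs from the reported one
    by more than tol (tol is an illustrative threshold, not a standard)."""
    return sorted(k for k in reported
                  if k in observed and abs(reported[k] - observed[k]) > tol)
```

For example, `detection_metrics(tp=90, fp=5, tn=95, fn=10)` gives a sensitivity of 0.9 and a specificity of 0.95, and `diverges({"accuracy": 0.957}, {"accuracy": 0.925})` returns `["accuracy"]`.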
Original abstract
Non-destructive testing of aerospace SiC/SiC composites via X-ray computed tomography (XCT) relies on expert visual assessment, with current workflows offering limited traceability for accept/reject decisions. Deep convolutional networks can automate defect detection, yet their black-box nature conflicts with the transparency that industrial inspection practice demands. To close this gap, we introduce p-ResNet-50, a convolutional framework extended with a prototype layer that couples high detection accuracy with case-based explanations. Six learned prototypes are explicitly aligned with expert-defined semantic categories (healthy matrix, matrix-air interfaces, pores, line-like defects, and mixed morphologies) so that every classification is traceable to a physically meaningful reference. Two novel regularisation terms, anchor-based and medoid-based, tether prototypes to expert-selected patches and prevent prototype collapse, addressing a known limitation of prototype networks. Latent-space analysis via UMAP delineates semantically coherent sub-domains and maps zones of uncertainty where misclassifications concentrate, giving inspectors an explicit picture of where the model is, and is not, reliable. The framework is validated on an XCT patch dataset of approximately 12,000 patches extracted from four defect-rich SiC/SiC laboratory specimens. Taking a black-box ResNet-50 as a baseline (ROC-AUC = 0.991), the prototype extension achieves comparable performance (accuracy 0.957 vs. 0.959; ROC-AUC 0.994 vs. 0.993) while trading a slight reduction in sensitivity for higher precision and specificity. Each decision is backed by representative evidence patches, and the model explicitly flags its uncertainty regions. Beyond defect mapping, the framework establishes a reusable methodology for embedding domain-expert knowledge into prototype networks, applicable to other XCT inspection scenarios requiring traceable, auditable decisions.
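The abstract locates uncertainty zones through UMAP analysis of the latent space. As a simpler stand-in, and explicitly not the paper's method, the sketch below flags a patch as uncertain when its embedding is almost equally close to its two nearest prototypes. The function name, the relative-margin rule, and the 0.2 threshold are assumptions made for illustration.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def flag_uncertain(embedding, prototypes, margin=0.2):
    """Return (index of nearest prototype, uncertainty flag).

    The flag is True when the relative gap between the nearest and the
    second-nearest prototype distance falls below `margin`.
    Requires at least two prototypes.
    """
    ranked = sorted((sq_dist(embedding, p), i) for i, p in enumerate(prototypes))
    (d1, nearest), (d2, _) = ranked[0], ranked[1]
    # Relative margin: 0 when equidistant, close to 1 when clearly nearest.
    rel = (d2 - d1) / d2 if d2 > 0 else 0.0
    return nearest, rel < margin
```

With prototypes at `[0.0, 0.0]` and `[4.0, 0.0]`, the embedding `[0.5, 0.0]` is assigned to prototype 0 without a flag, while `[1.9, 0.0]` is assigned to prototype 0 and flagged because it sits near the midpoint between the two.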
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces p-ResNet-50, a ResNet-50 extension incorporating a prototype layer for interpretable defect detection in XCT images of aerospace SiC/SiC composites. Six learned prototypes are aligned with expert semantic categories (healthy matrix, matrix-air interfaces, pores, line-like defects, mixed morphologies) via two novel anchor- and medoid-based regularization terms that tether prototypes to expert patches and prevent collapse. Latent-space UMAP analysis maps uncertainty regions. The framework is validated on ~12k patches from four laboratory specimens, achieving accuracy 0.957 and ROC-AUC 0.994 versus a black-box ResNet-50 baseline (0.959 and 0.993), with each decision traceable to representative evidence patches.
Significance. If the reported prototype alignment and regularization effects prove stable, the work provides a concrete methodology for embedding domain-expert knowledge into prototype networks, yielding traceable, auditable decisions in safety-critical NDT. The direct baseline comparison on a concrete 12k-patch dataset and the explicit uncertainty mapping via UMAP are strengths that support the interpretability claim.
major comments (2)
- [Validation section] Validation section (and abstract): The central claim that the six prototypes remain stably aligned with the five expert semantic categories on real aerospace components, with anchor- and medoid-based regularizations preventing collapse or decision bias, is load-bearing for industrial applicability. However, all results are obtained exclusively from ~12k patches extracted from four defect-rich laboratory specimens; no experiments, discussion, or evidence address generalization under manufacturing variability, differing defect distributions, or production imaging conditions.
- [Results and Experimental Setup] Results and Experimental Setup: Performance metrics (accuracy 0.957 vs. 0.959; ROC-AUC 0.994 vs. 0.993) are reported via direct held-out test comparison, but the manuscript provides no information on statistical significance testing, cross-validation strategy, or quantitative validation of prototype-to-category alignment against multiple experts, weakening confidence that the slight sensitivity-precision trade-off is robust.
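One cross-validation strategy that would speak to both major comments is to hold out one whole specimen per fold, so that patches from the same specimen never appear in both training and test data. The page does not say how the authors partitioned their data, so this is an assumption about a possible protocol. The function name and the specimen labels are hypothetical.

```python
def leave_one_specimen_out(specimen_ids):
    """Yield (held_out, train_indices, test_indices), holding out each
    specimen once. specimen_ids[i] is the specimen that patch i came from."""
    for held_out in sorted(set(specimen_ids)):
        train = [i for i, s in enumerate(specimen_ids) if s != held_out]
        test = [i for i, s in enumerate(specimen_ids) if s == held_out]
        yield held_out, train, test
```

With four specimens this yields four folds. Reporting the per-fold spread of accuracy and ROC-AUC would show how strongly the results depend on which specimen is unseen.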
minor comments (2)
- [Abstract] Abstract: The statement that prototypes are 'explicitly aligned with expert-defined semantic categories' would benefit from a brief parenthetical note on the exact mapping (six prototypes to five categories) and how mixed morphologies are handled.
- [Figures] Figure captions and UMAP description: Ensure all panels clearly label the uncertainty zones and prototype assignments so readers can directly trace the claimed interpretability benefits without cross-referencing the main text.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed review. The comments highlight important aspects of validation scope and experimental rigor that we will address in the revised manuscript. Below we respond point by point to the major comments.
-
Referee: [Validation section] Validation section (and abstract): The central claim that the six prototypes remain stably aligned with the five expert semantic categories on real aerospace components, with anchor- and medoid-based regularizations preventing collapse or decision bias, is load-bearing for industrial applicability. However, all results are obtained exclusively from ~12k patches extracted from four defect-rich laboratory specimens; no experiments, discussion, or evidence address generalization under manufacturing variability, differing defect distributions, or production imaging conditions.
Authors: We agree that the current validation is confined to patches from four laboratory specimens and that this limits direct claims about generalization to production-scale manufacturing variability or differing imaging conditions. These specimens were deliberately chosen because they contain a representative distribution of the defect morphologies (pores, line-like defects, matrix-air interfaces) that occur in aerospace SiC/SiC components. In the revised manuscript we will expand the Validation section with an explicit discussion of dataset limitations, including the absence of production-line samples, and we will add a forward-looking statement on the need for future multi-site validation. Because acquiring and labeling new production datasets lies outside the scope of the present study, we cannot add new experimental results on those conditions; the revision will therefore focus on transparent acknowledgment rather than new empirical claims.
Revision: partial
-
Referee: [Results and Experimental Setup] Results and Experimental Setup: Performance metrics (accuracy 0.957 vs. 0.959; ROC-AUC 0.994 vs. 0.993) are reported via direct held-out test comparison, but the manuscript provides no information on statistical significance testing, cross-validation strategy, or quantitative validation of prototype-to-category alignment against multiple experts, weakening confidence that the slight sensitivity-precision trade-off is robust.
Authors: We acknowledge that the original manuscript reports only a single held-out test split and does not include statistical significance tests or quantitative alignment metrics. In the revision we will (i) describe the exact train/validation/test partitioning procedure, (ii) add a statistical comparison (McNemar’s test) between p-ResNet-50 and the baseline ResNet-50 to evaluate whether the observed differences in accuracy and ROC-AUC are significant, and (iii) introduce quantitative prototype-alignment measures such as mean latent-space distance between each learned prototype and its corresponding expert-selected patches. The semantic categories were defined by a single domain expert; we will therefore note the lack of multi-expert quantitative validation as a limitation and indicate that inter-rater agreement studies are planned for future work. These additions will be incorporated into the Results and Experimental Setup sections.
Revision: yes
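For reference, the exact form of McNemar's test needs only the two discordant counts from the paired predictions on the same test patches. The sketch below is a generic textbook version, not code from the paper; the function and argument names are hypothetical. McNemar's test works on paired correct/incorrect outcomes, so it addresses the accuracy difference. Comparing two ROC-AUC values usually calls for a different procedure, such as DeLong's test.

```python
from math import comb

def mcnemar_exact(b, c):
    """Two-sided exact McNemar p-value.

    b: patches the first model classifies correctly and the second incorrectly
    c: patches the second model classifies correctly and the first incorrectly
    Under the null hypothesis the discordant pairs split 50/50, so the
    smaller count follows a Binomial(b + c, 0.5) distribution.
    """
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs: no evidence of a difference
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)
```

With 1 and 9 discordant patches the function returns about 0.0215. With 5 and 5 it returns 1.0.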
Circularity Check
Performance metrics obtained via independent held-out test set; no reduction of claims to fitted inputs by construction
The paper reports accuracy, precision, specificity and ROC-AUC values by direct evaluation on a held-out portion of the ~12k-patch dataset extracted from four laboratory specimens, using a standard ResNet-50 as external baseline. The six prototypes are tethered to expert-selected patches via the two novel anchor- and medoid-based regularizers; this is an explicit design mechanism rather than a self-definitional loop in which the reported detection performance is forced by the same quantities used to define the prototypes. No equation or derivation step equates the final accuracy/AUC figures to quantities defined solely by the fitted prototype parameters. The alignment with the five semantic categories is therefore a methodological choice whose success is measured externally, not presupposed by construction. This yields a minor (score-2) finding consistent with normal self-contained empirical validation.
Axiom & Free-Parameter Ledger
free parameters (2)
- number of prototypes
- regularization weights for anchor- and medoid-based terms
axioms (2)
- domain assumption: Patches extracted from four laboratory SiC/SiC specimens are representative of defects encountered in flight hardware.
- domain assumption: Expert semantic categories (healthy matrix, matrix-air interfaces, pores, line-like defects, mixed morphologies) form a complete and stable basis for all relevant defect morphologies.
invented entities (1)
- p-ResNet-50 prototype layer: no independent evidence
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean, washburn_uniqueness_aczel
  Relation between the paper passage and the cited Recognition theorem: unclear
  Paper passage: "six learned prototypes ... anchor-based and medoid-based regularisation terms that tether prototypes to expert-selected examples and mitigate prototype collapse"
- IndisputableMonolith/Foundation/AlexanderDuality.lean, alexander_duality_circle_linking
  Relation between the paper passage and the cited Recognition theorem: unclear
  Paper passage: "UMAP visualizations reveal ... compact, semantically coherent sub-domains and localizes the zones of uncertainty"
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.