
arxiv: 2605.20159 · v1 · pith:VIY2KA52new · submitted 2026-05-19 · 💻 cs.CV · cond-mat.mtrl-sci · cs.LG

Interpretable Computer Vision for Defect Detection in X-ray Tomography of Aerospace SiC/SiC Composites

Pith reviewed 2026-05-20 05:28 UTC · model grok-4.3

classification 💻 cs.CV · cond-mat.mtrl-sci · cs.LG
keywords defect detection · X-ray computed tomography · SiC/SiC composites · prototype networks · interpretable AI · non-destructive testing · aerospace inspection

The pith

Extending ResNet-50 with a prototype layer matches black-box accuracy on SiC/SiC X-ray defect detection while tracing each decision to expert semantic categories.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents p-ResNet-50, a convolutional network that adds a prototype layer to standard ResNet-50 for detecting defects in X-ray tomography scans of aerospace SiC/SiC composites. Six prototypes are aligned with expert categories including healthy matrix, pores, line-like defects, and mixed morphologies through two new regularization terms that tether them to selected patches and avoid collapse. On a dataset of roughly 12,000 patches from four specimens, the model reaches accuracy 0.957 and ROC-AUC 0.994, nearly identical to a plain ResNet-50, but supplies case-based explanations and explicit uncertainty maps for each classification. This addresses the need for traceable, auditable decisions in industrial non-destructive testing.

Core claim

p-ResNet-50 couples high-accuracy defect detection with case-based explanations by extending ResNet-50 with a prototype layer whose six learned prototypes align directly with five expert-defined semantic categories of material features in XCT images. Anchor-based and medoid-based regularizations keep the prototypes stable and representative, yielding accuracy 0.957 and ROC-AUC 0.994 on roughly 12,000 patches while producing traceable evidence patches and uncertainty flags for every decision.
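To make the term "medoid" concrete, here is a minimal sketch of how a medoid is picked from a set of patch embeddings. It is only an illustration: the function name `medoid_index` and the toy 2-D points are hypothetical, and this page does not state how the paper's medoid-based regularization actually uses the medoid inside the loss.

```python
import numpy as np

def medoid_index(embeddings: np.ndarray) -> int:
    """Index of the medoid: the row with the smallest summed Euclidean
    distance to every other row. Unlike a mean, it is always a real sample."""
    diffs = embeddings[:, None, :] - embeddings[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)      # (N, N) pairwise distances
    return int(np.argmin(dists.sum(axis=1)))

# Toy 2-D "embeddings" for one hypothetical category, with one outlier.
cluster = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.4, 0.4], [5.0, 5.0]])
idx = medoid_index(cluster)
print(idx, cluster[idx])
```

The point of a medoid over a centroid is that the selected vector corresponds to an actual patch, which is what allows a prototype to be shown to an inspector as a physical example.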

What carries the argument

The prototype layer that learns six prototypes aligned to expert semantic categories of SiC/SiC features and applies anchor-based and medoid-based regularizations to maintain their separation and relevance to physical patches.
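The idea of tethering prototypes to expert patches while keeping them apart can be sketched as two penalty terms. This is a sketch under stated assumptions, not the paper's loss: `anchor_loss` pulls each prototype toward the centroid of its expert-selected anchor embeddings, and `separation_penalty` is a generic hinge term standing in for whatever the paper uses to prevent collapse (it is not the medoid-based term). All names, the margin, and the toy data are hypothetical.

```python
import numpy as np

def anchor_loss(prototypes: np.ndarray, anchors: list) -> float:
    """Mean squared distance between each prototype and the centroid of
    its anchor embeddings. prototypes: (P, D); anchors: P arrays of (n_i, D)."""
    centroids = np.stack([a.mean(axis=0) for a in anchors])
    return float(np.mean(np.sum((prototypes - centroids) ** 2, axis=1)))

def separation_penalty(prototypes: np.ndarray, margin: float = 1.0) -> float:
    """Squared hinge on every prototype pair closer than `margin`
    (assumed stand-in for an anti-collapse term)."""
    diffs = prototypes[:, None, :] - prototypes[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)
    upper = np.triu_indices(len(prototypes), k=1)   # each pair once
    return float(np.mean(np.maximum(0.0, margin - dists[upper]) ** 2))

# Six hypothetical prototype types, four anchor embeddings each, 8-D latent space.
rng = np.random.default_rng(0)
anchors = [3.0 * i + rng.normal(scale=0.1, size=(4, 8)) for i in range(6)]

tethered = np.stack([a.mean(axis=0) for a in anchors])  # sits on the anchors
collapsed = np.zeros((6, 8))                             # all prototypes identical

print(anchor_loss(tethered, anchors), separation_penalty(tethered))
print(anchor_loss(collapsed, anchors), separation_penalty(collapsed))
```

In this toy setting the tethered prototypes incur no penalty from either term, while the collapsed set is penalized by both, which is the qualitative behavior the two regularizers are described as enforcing.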

Load-bearing premise

The six learned prototypes stay stably aligned with the five expert semantic categories across the full range of real aerospace components without the regularizations introducing bias into accept/reject decisions.

What would settle it

Running the trained model on a new collection of SiC/SiC XCT specimens and checking whether the prototypes remain matched to the expert categories or whether sensitivity and specificity diverge from the reported values.
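The check described above reduces to recomputing sensitivity and specificity on the new specimens and comparing them with the reported values. A minimal sketch, assuming binary labels with 1 for defect and 0 for healthy; the label vectors below are made up for illustration and are not data from the paper.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels, 1 = defect."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    sens = tp / (tp + fn) if (tp + fn) else float("nan")
    spec = tn / (tn + fp) if (tn + fp) else float("nan")
    return sens, spec

# Hypothetical patch labels from a new specimen (not from the paper).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
sens, spec = sensitivity_specificity(y_true, y_pred)
print(f"sensitivity={sens:.3f} specificity={spec:.3f}")
```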

Figures

Figures reproduced from arXiv: 2605.20159 by Antonio Peña Corredor, Julien Lesseur, Paul Rivalland (SES), Romain Nunez, Thomas Philippe.

Figure 1: Schematic overview of the defect-detection framework. The prototype-based p-ResNet ...
Figure 2: Labelled (expert) anchors for each prototype type. Grayscale intensities are windowed ...
Figure 3: Network architectures. (a) Baseline ResNet-50 (b) Prototype-based variant (p-ResNet ...
Figure 4: Nearest (learned) anchors for the six semantic prototype types. Each row corresponds ...
Figure 5: Patch-wise comparison of the prototype-based network and the black-box ResNet-50 on ...
Figure 6: UMAP projection of patch-level embeddings for the ResNet-50 (a) and p-ResNet-50 (b) ...
Original abstract

Non-destructive testing of aerospace SiC/SiC composites via X-ray computed tomography (XCT) relies on expert visual assessment, with current workflows offering limited traceability for accept/reject decisions. Deep convolutional networks can automate defect detection, yet their black-box nature conflicts with the transparency that industrial inspection practice demands. To close this gap, we introduce p-ResNet-50, a convolutional framework extended with a prototype layer that couples high detection accuracy with case-based explanations. Six learned prototypes are explicitly aligned with expert-defined semantic categories (healthy matrix, matrix-air interfaces, pores, line-like defects, and mixed morphologies) so that every classification is traceable to a physically meaningful reference. Two novel regularisation terms, anchor-based and medoid-based, tether prototypes to expert-selected patches and prevent prototype collapse, addressing a known limitation of prototype networks. Latent-space analysis via UMAP delineates semantically coherent sub-domains and maps zones of uncertainty where misclassifications concentrate, giving inspectors an explicit picture of where the model is, and is not, reliable. The framework is validated on an XCT patch dataset of approximately 12,000 patches extracted from four defect-rich SiC/SiC laboratory specimens. Taking a black-box ResNet-50 as a baseline (ROC-AUC = 0.991), the prototype extension achieves comparable performance (accuracy 0.957 vs. 0.959; ROC-AUC 0.994 vs. 0.993) while trading a slight reduction in sensitivity for higher precision and specificity. Each decision is backed by representative evidence patches, and the model explicitly flags its uncertainty regions. Beyond defect mapping, the framework establishes a reusable methodology for embedding domain-expert knowledge into prototype networks, applicable to other XCT inspection scenarios requiring traceable, auditable decisions.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper introduces p-ResNet-50, a ResNet-50 extension incorporating a prototype layer for interpretable defect detection in XCT images of aerospace SiC/SiC composites. Six learned prototypes are aligned with expert semantic categories (healthy matrix, matrix-air interfaces, pores, line-like defects, mixed morphologies) via two novel anchor- and medoid-based regularization terms that tether prototypes to expert patches and prevent collapse. Latent-space UMAP analysis maps uncertainty regions. The framework is validated on ~12k patches from four laboratory specimens, achieving accuracy 0.957 and ROC-AUC 0.994 versus a black-box ResNet-50 baseline (0.959 and 0.993), with each decision traceable to representative evidence patches.

Significance. If the reported prototype alignment and regularization effects prove stable, the work provides a concrete methodology for embedding domain-expert knowledge into prototype networks, yielding traceable, auditable decisions in safety-critical NDT. The direct baseline comparison on a concrete 12k-patch dataset and the explicit uncertainty mapping via UMAP are strengths that support the interpretability claim.

major comments (2)
  1. [Validation section] Validation section (and abstract): The central claim that the six prototypes remain stably aligned with the five expert semantic categories on real aerospace components, with anchor- and medoid-based regularizations preventing collapse or decision bias, is load-bearing for industrial applicability. However, all results are obtained exclusively from ~12k patches extracted from four defect-rich laboratory specimens; no experiments, discussion, or evidence address generalization under manufacturing variability, differing defect distributions, or production imaging conditions.
  2. [Results and Experimental Setup] Results and Experimental Setup: Performance metrics (accuracy 0.957 vs. 0.959; ROC-AUC 0.994 vs. 0.993) are reported via direct held-out test comparison, but the manuscript provides no information on statistical significance testing, cross-validation strategy, or quantitative validation of prototype-to-category alignment against multiple experts, weakening confidence that the slight sensitivity-precision trade-off is robust.
minor comments (2)
  1. [Abstract] Abstract: The statement that prototypes are 'explicitly aligned with expert-defined semantic categories' would benefit from a brief parenthetical note on the exact mapping (six prototypes to five categories) and how mixed morphologies are handled.
  2. [Figures] Figure captions and UMAP description: Ensure all panels clearly label the uncertainty zones and prototype assignments so readers can directly trace the claimed interpretability benefits without cross-referencing the main text.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed review. The comments highlight important aspects of validation scope and experimental rigor that we will address in the revised manuscript. Below we respond point by point to the major comments.

Point-by-point responses
  1. Referee: [Validation section] Validation section (and abstract): The central claim that the six prototypes remain stably aligned with the five expert semantic categories on real aerospace components, with anchor- and medoid-based regularizations preventing collapse or decision bias, is load-bearing for industrial applicability. However, all results are obtained exclusively from ~12k patches extracted from four defect-rich laboratory specimens; no experiments, discussion, or evidence address generalization under manufacturing variability, differing defect distributions, or production imaging conditions.

    Authors: We agree that the current validation is confined to patches from four laboratory specimens and that this limits direct claims about generalization to production-scale manufacturing variability or differing imaging conditions. These specimens were deliberately chosen because they contain a representative distribution of the defect morphologies (pores, line-like defects, matrix-air interfaces) that occur in aerospace SiC/SiC components. In the revised manuscript we will expand the Validation section with an explicit discussion of dataset limitations, including the absence of production-line samples, and we will add a forward-looking statement on the need for future multi-site validation. Because acquiring and labeling new production datasets lies outside the scope of the present study, we cannot add new experimental results on those conditions; the revision will therefore focus on transparent acknowledgment rather than new empirical claims. revision: partial

  2. Referee: [Results and Experimental Setup] Results and Experimental Setup: Performance metrics (accuracy 0.957 vs. 0.959; ROC-AUC 0.994 vs. 0.993) are reported via direct held-out test comparison, but the manuscript provides no information on statistical significance testing, cross-validation strategy, or quantitative validation of prototype-to-category alignment against multiple experts, weakening confidence that the slight sensitivity-precision trade-off is robust.

    Authors: We acknowledge that the original manuscript reports only a single held-out test split and does not include statistical significance tests or quantitative alignment metrics. In the revision we will (i) describe the exact train/validation/test partitioning procedure, (ii) add a statistical comparison (McNemar’s test) between p-ResNet-50 and the baseline ResNet-50 to evaluate whether the observed differences in accuracy and ROC-AUC are significant, and (iii) introduce quantitative prototype-alignment measures such as mean latent-space distance between each learned prototype and its corresponding expert-selected patches. The semantic categories were defined by a single domain expert; we will therefore note the lack of multi-expert quantitative validation as a limitation and indicate that inter-rater agreement studies are planned for future work. These additions will be incorporated into the Results and Experimental Setup sections. revision: yes
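For orientation, the McNemar comparison mentioned in this response can be sketched as an exact two-sided test on the discordant pairs, that is, the patches on which exactly one of the two models is correct. The helper name and the toy predictions are hypothetical. Note that McNemar's test addresses paired correct/incorrect outcomes (and hence accuracy); comparing two ROC-AUC values would need a different procedure, such as DeLong's test.

```python
from math import comb

def mcnemar_exact(y_true, pred_a, pred_b):
    """Exact two-sided McNemar test on paired predictions.
    Returns (b, c, p_value), where b counts samples only model A gets
    right and c counts samples only model B gets right."""
    b = sum(1 for t, a, o in zip(y_true, pred_a, pred_b) if a == t and o != t)
    c = sum(1 for t, a, o in zip(y_true, pred_a, pred_b) if a != t and o == t)
    n = b + c
    if n == 0:                      # the models never disagree in correctness
        return b, c, 1.0
    k = min(b, c)
    # Two-sided binomial tail under H0: discordant pairs split 50/50.
    p = 2 * sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return b, c, min(1.0, p)

# Hypothetical predictions on ten defect patches (not data from the paper).
y_true = [1] * 10
pred_a = [1, 0, 0, 0, 0, 0, 1, 1, 1, 1]
pred_b = [0, 1, 1, 1, 1, 1, 1, 1, 1, 1]
b, c, p_value = mcnemar_exact(y_true, pred_a, pred_b)
print(b, c, p_value)
```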

Circularity Check

0 steps flagged

Performance metrics obtained via independent held-out test set; no reduction of claims to fitted inputs by construction

full rationale

The paper reports accuracy, precision, specificity and ROC-AUC values by direct evaluation on a held-out portion of the ~12k-patch dataset extracted from four laboratory specimens, using a standard ResNet-50 as external baseline. The six prototypes are tethered to expert-selected patches via the two novel anchor- and medoid-based regularizers; this is an explicit design mechanism rather than a self-definitional loop in which the reported detection performance is forced by the same quantities used to define the prototypes. No equation or derivation step equates the final accuracy/AUC figures to quantities defined solely by the fitted prototype parameters. The alignment with the five semantic categories is therefore a methodological choice whose success is measured externally, not presupposed by construction. This yields a minor (score-2) finding consistent with normal self-contained empirical validation.
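One way to measure alignment externally, in the spirit of the mean latent-space distance proposed in the rebuttal, is to average the Euclidean distance from each learned prototype to the embeddings of its expert-selected patches. The sketch below assumes that reading; the function name and the 2-D toy vectors are hypothetical and no values here come from the paper.

```python
import numpy as np

def mean_alignment_distance(prototypes, anchors):
    """For each prototype, the mean Euclidean distance to the embeddings
    of its own expert-selected anchor patches. Lower means tighter alignment."""
    return np.array([
        np.linalg.norm(a - p, axis=1).mean()
        for p, a in zip(prototypes, anchors)
    ])

# Two hypothetical prototypes with two anchor embeddings each.
prototypes = np.array([[0.0, 0.0], [10.0, 0.0]])
anchors = [
    np.array([[3.0, 4.0], [0.0, 0.0]]),
    np.array([[10.0, 0.0], [10.0, 2.0]]),
]
distances = mean_alignment_distance(prototypes, anchors)
print(distances)
```

Reporting such a score per prototype, alongside the distance to anchors of the other categories, would show whether each prototype is closer to its own category than to any other.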

Axiom & Free-Parameter Ledger

2 free parameters · 2 axioms · 1 invented entity

The central claim rests on the assumption that a small number of expert-selected patches can anchor prototypes without loss of coverage for real-world defect variability, and that the 12k patches from four lab specimens are statistically representative of production aerospace parts.

free parameters (2)
  • number of prototypes
    Fixed at six. The abstract lists five expert-defined semantic categories, so the prototype-to-category mapping is not one-to-one.
  • regularization weights for anchor- and medoid-based terms
    Hyperparameters introduced to tether prototypes and prevent collapse; values not stated in abstract.
axioms (2)
  • domain assumption: Patches extracted from four laboratory SiC/SiC specimens are representative of defects encountered in flight hardware.
    Dataset construction described in abstract as coming from four defect-rich laboratory specimens.
  • domain assumption: Expert semantic categories (healthy matrix, matrix-air interfaces, pores, line-like defects, mixed morphologies) form a complete and stable basis for all relevant defect morphologies.
    Prototypes are explicitly aligned with these five categories plus one additional mixed class.
invented entities (1)
  • p-ResNet-50 prototype layer (no independent evidence)
    purpose: To supply case-based visual explanations traceable to expert categories.
    New layer added to ResNet-50 whose outputs are constrained by the novel regularization terms.



