arxiv: 2605.27454 · v2 · pith:QGPOHPYJnew · submitted 2026-05-25 · 📡 eess.IV · cs.CV

NL-MambaXCT: Self-Supervised Nested-Learning Mamba for Nomex Honeycomb X-ray CT Defect Classification

Pith reviewed 2026-06-29 19:08 UTC · model grok-4.3

classification 📡 eess.IV cs.CV
keywords self-supervised learning · Mamba · X-ray CT · defect classification · Nomex honeycomb · nested learning · masked image modeling

The pith

NL-MambaXCT reaches 96.91 percent accuracy on Nomex honeycomb XCT defect classification using nested learning and masked pretraining.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents NL-MambaXCT, a Mamba-based model that combines self-supervised masked image modeling with a nested learning formulation to classify defects in Nomex honeycomb structures from X-ray CT slices. It is pretrained on 19,961 unlabeled industrial slices and then fine-tuned on 2,000 relabeled slices split by production order. The key idea is that two-timescale parameter dynamics, with slow exponential-moving-average traces and a deep-momentum optimizer, improve robustness in label-scarce settings. This matters for aerospace manufacturing, where inspection still relies heavily on manual interpretation and labeled data is limited. The model beats CNN, attention, and single-timescale Mamba baselines by 3.11 to 10.31 percentage points in accuracy.

Core claim

NL-MambaXCT uses a four-stage 2D encoder with RegNet convolutional blocks in the early stages and Mamba-based sequence mixing with attention in the deeper stages. It is pretrained via masked image modelling on 19,961 unlabeled slices, then fine-tuned with nested learning's fast and slow weights on 2,000 relabeled slices split by production order, and reaches 96.91% accuracy and 96.8% macro F1 on the held-out test set.
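
To make the masked image modelling step concrete, here is a minimal sketch in plain Python. It is not the paper's implementation: the patch size, the 50 percent mask ratio, the toy 8x8 "slice", and the mean-value stand-in for a decoder are all assumptions chosen for illustration. The one property it shares with standard MIM setups is that the reconstruction loss is computed on the masked patches only.

```python
import random

def patchify(image, patch):
    """Split a 2D list (H x W) into flattened, non-overlapping patch x patch blocks."""
    h, w = len(image), len(image[0])
    return [
        [image[r + i][c + j] for i in range(patch) for j in range(patch)]
        for r in range(0, h, patch)
        for c in range(0, w, patch)
    ]

def random_mask(num_patches, mask_ratio, rng):
    """Pick the indices of the patches hidden from the encoder."""
    n_masked = int(num_patches * mask_ratio)
    return set(rng.sample(range(num_patches), n_masked))

def masked_mse(patches, reconstruction, masked):
    """Mean squared error over masked patches only."""
    errors = [
        (p - q) ** 2
        for idx in masked
        for p, q in zip(patches[idx], reconstruction[idx])
    ]
    return sum(errors) / len(errors)

rng = random.Random(0)
# Toy 8x8 "slice" (hypothetical data, not an XCT image).
image = [[float(r * 8 + c) for c in range(8)] for r in range(8)]
patches = patchify(image, patch=4)  # 4 patches of 16 pixels each
masked = random_mask(len(patches), mask_ratio=0.5, rng=rng)

# Stand-in "decoder": predict the mean of the visible pixels everywhere.
visible = [v for i, p in enumerate(patches) if i not in masked for v in p]
mean_visible = sum(visible) / len(visible)
reconstruction = [[mean_visible] * len(p) for p in patches]

loss = masked_mse(patches, reconstruction, masked)
```

In a real pipeline the constant predictor would be replaced by the encoder and a lightweight decoder, and `loss` would be the quantity tracked during pretraining.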

What carries the argument

The nested learning formulation with two-timescale parameter dynamics, where selected projections maintain slow EMA traces alongside fast weights and a deep-momentum optimizer adds another slow update trajectory.
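
The fast/slow idea can be sketched in a few lines. The example below is an assumption-laden illustration, not the paper's method: it uses plain gradient descent on a toy quadratic for the fast weights and a single exponential moving average for the slow trace, and it does not model the deep-momentum optimizer. The learning rate `lr` and decay `beta` are hypothetical values.

```python
def two_timescale_step(fast, slow, grad, lr=0.1, beta=0.99):
    """One update: gradient step on the fast weights, EMA update on the slow trace."""
    new_fast = [w - lr * g for w, g in zip(fast, grad)]
    new_slow = [beta * s + (1.0 - beta) * w for s, w in zip(slow, new_fast)]
    return new_fast, new_slow

def quadratic_grad(weights, target):
    """Gradient of 0.5 * ||w - target||^2 with respect to w."""
    return [w - t for w, t in zip(weights, target)]

target = [1.0, -2.0]   # toy optimum, hypothetical
fast = [0.0, 0.0]
slow = list(fast)
history = []
for step in range(200):
    grad = quadratic_grad(fast, target)
    fast, slow = two_timescale_step(fast, slow, grad)
    history.append((fast[0], slow[0]))
```

After 200 steps the fast weights sit essentially at the optimum, while the slow trace is still catching up. That lag is the second timescale: the slow copy averages over many recent fast updates and so changes smoothly even when individual gradient steps are noisy.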

If this is right

  • The MIM-pretrained NL-MambaXCT model achieves 96.91% accuracy and 96.8% macro F1 on the held-out test set.
  • It outperforms CNN, attention, and single-timescale Mamba baselines by 3.11 to 10.31 percentage points in accuracy.
  • Combining masked self-supervision with NL-style fast/slow learning dynamics supports robust defect classification in Nomex honeycomb XCT inspection.
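
Because the headline numbers pair accuracy with macro F1, it helps to recall how the two differ. The sketch below computes both from a confusion matrix; the 3-class matrix is made-up toy data, not the paper's results, and the row/column convention (rows are true classes, columns are predictions) is an assumption of this example.

```python
def accuracy(confusion):
    """Fraction of samples on the diagonal of the confusion matrix."""
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[k][k] for k in range(len(confusion)))
    return correct / total

def macro_f1(confusion):
    """Unweighted mean of per-class F1 (rows: true class, cols: predicted class)."""
    n = len(confusion)
    f1_scores = []
    for k in range(n):
        tp = confusion[k][k]
        fn = sum(confusion[k]) - tp
        fp = sum(confusion[r][k] for r in range(n)) - tp
        denom = 2 * tp + fp + fn
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / n

# Toy, imbalanced 3-class confusion matrix (hypothetical counts).
toy = [
    [50, 0, 0],
    [2, 8, 0],
    [0, 1, 9],
]
acc = accuracy(toy)
f1 = macro_f1(toy)
```

On this toy matrix macro F1 comes out lower than accuracy because the rarer classes, which carry the errors, count as much as the dominant one. The same effect can appear with rare defect types, which is why both metrics are worth reading together.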

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The method could reduce reliance on manual interpretation by enabling effective use of abundant unlabeled production data.
  • Two-timescale dynamics might extend to other manufacturing inspection tasks with sequential data collection.
  • The production-order split suggests a practical way to create held-out sets that approximate real deployment shifts.

Load-bearing premise

The 2,000 relabeled Nomex XCT slices split by production order form a representative held-out test set that fairly measures generalization without data leakage or distribution shift from the 19,961 unlabeled pretraining slices.
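
One way to test this premise is to split at the panel level and then check for shared panels explicitly. The sketch below assumes each slice carries a `panel_id` and a `production_order` field; those names, the 20 percent test fraction, and the toy records are hypothetical, since the source does not describe the dataset's metadata.

```python
from collections import namedtuple

Slice = namedtuple("Slice", ["slice_id", "panel_id", "production_order"])

def split_by_production_order(slices, test_fraction=0.2):
    """Hold out the latest panels; a panel never straddles train and test."""
    first_seen = {}
    for s in slices:
        prev = first_seen.get(s.panel_id, s.production_order)
        first_seen[s.panel_id] = min(prev, s.production_order)
    panels = sorted(first_seen, key=first_seen.get)
    n_test = max(1, round(len(panels) * test_fraction))
    test_panels = set(panels[-n_test:])
    train = [s for s in slices if s.panel_id not in test_panels]
    test = [s for s in slices if s.panel_id in test_panels]
    return train, test

def shared_panels(a, b):
    """Panel IDs that appear in both collections (should be empty)."""
    return {s.panel_id for s in a} & {s.panel_id for s in b}

# Toy records: 5 early panels for pretraining, 10 later panels with labels.
pretrain = [Slice(f"U{p}-{i}", f"panel-{p:03d}", p)
            for p in range(0, 5) for i in range(4)]
labeled = [Slice(f"L{p}-{i}", f"panel-{p:03d}", p)
           for p in range(100, 110) for i in range(4)]

train, test = split_by_production_order(labeled, test_fraction=0.2)
leak = shared_panels(pretrain, test)
```

An empty `leak` set rules out panel-level overlap only. Distribution shift between early and late production runs is a separate question that this check does not address.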

What would settle it

If accuracy fell below 90 percent on a new test set drawn from different production runs, the generalization claim would be falsified.

Figures

Figures reproduced from arXiv: 2605.27454 by Fakhri Karray, Ghaleb Aldoboni, Lobna Nassar, Reem Alshamsi.

Figure 1: NL-MambaXCT training pipeline for Nomex honeycomb X-ray CT defect classification. The input XCT slice is patchified and processed by a detailed …
Figure 2: Types of defects detected by X-ray computed tomography for the Nomex …
Figure 3: Normalised confusion matrix for NL-MambaXCT (MIM-pretrained) on …
Figure 4: Reconstruction loss during MIM pretraining of NL-MambaXCT on the …

Original abstract

X-ray computed tomography (XCT) is widely used for non-destructive testing of Nomex honeycomb structures in aerospace manufacturing, but industrial inspection still relies heavily on manual interpretation and supervised models trained on limited labeled data. This work introduces NL-MambaXCT, a Mamba-based framework that combines self-supervised masked image modelling with a Nested Learning (NL) formulation for automated, label-efficient defect classification from production XCT slices. The backbone is a four-stage 2D encoder with RegNet convolutional blocks in the early stages and Mamba-based sequence mixing with attention in the deeper stages. It is pretrained by masked image modelling on 19,961 unlabeled industrial XCT slices and fine-tuned on 2,000 relabeled Nomex XCT slices split by production order. NL is instantiated through two-timescale parameter dynamics: selected projections maintain slow exponential-moving-average traces alongside fast weights, while a deep-momentum optimizer introduces an additional slow parameter-update trajectory. On the held-out test set, the MIM-pretrained NL-MambaXCT model achieves 96.91% accuracy and 96.8% macro F1, outperforming CNN, attention, and single-timescale Mamba baselines by 3.11–10.31 percentage points in accuracy. The results suggest that combining masked self-supervision with NL-style fast/slow learning dynamics is a promising strategy for robust defect classification in Nomex honeycomb XCT inspection.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper introduces NL-MambaXCT, a four-stage 2D Mamba-based encoder pretrained via masked image modelling (MIM) on 19,961 unlabeled industrial Nomex XCT slices and fine-tuned with Nested Learning (NL) two-timescale dynamics on 2,000 relabeled slices split by production order. It claims 96.91% accuracy and 96.8% macro F1 on the held-out test set, outperforming CNN, attention, and single-timescale Mamba baselines by 3.11-10.31 percentage points.

Significance. If the data independence and performance claims hold, the work provides a concrete empirical demonstration that combining MIM self-supervision with NL-style fast/slow parameter trajectories can improve label-efficient defect classification for aerospace honeycomb structures. The explicit baseline comparisons and use of production-order splitting constitute a reproducible experimental protocol that could be extended to other industrial XCT tasks.

major comments (2)
  1. [Abstract] Abstract: The held-out test set is formed by splitting the 2,000 relabeled slices by production order, yet the abstract supplies no information on whether any of these slices (or their source panels/imaging sessions) overlap with the 19,961 unlabeled pretraining slices. This leaves the independence of the test distribution from the MIM pretraining distribution unverified and directly affects the validity of the reported 3.11-10.31 pp gains.
  2. [Experiments] Experiments section: The abstract states concrete accuracy and macro-F1 numbers with baseline comparisons but provides no error bars, number of random seeds, or statistical significance tests. Without these, the robustness of the headline performance figures cannot be assessed.
minor comments (2)
  1. [Abstract] Abstract: The description of the backbone (RegNet blocks in early stages, Mamba sequence mixing with attention in deeper stages) would benefit from a schematic diagram or explicit layer counts to clarify the hybrid architecture.
  2. [Methods] Methods: The precise mechanism for selecting which projections receive slow EMA traces and the hyper-parameters of the deep-momentum optimizer are not detailed in the abstract; these should be expanded with equations or pseudocode.
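
The error bars asked for in major comment 2 could be reported along the following lines. This is a sketch under stated assumptions: the per-seed accuracies and the test-set size are placeholder values, not figures from the paper, and the Wilson score interval is one common choice for a binomial proportion rather than the authors' stated procedure.

```python
import math
import statistics

def summarize_seeds(accuracies):
    """Mean and sample standard deviation across independent seeds."""
    return statistics.mean(accuracies), statistics.stdev(accuracies)

def wilson_interval(p_hat, n, z=1.96):
    """Wilson score interval for an accuracy p_hat measured on n test samples."""
    z2 = z * z
    denom = 1.0 + z2 / n
    center = (p_hat + z2 / (2 * n)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / n + z2 / (4 * n * n)) / denom
    return center - half, center + half

# Placeholder values for illustration only.
seed_accuracies = [0.95, 0.96, 0.97]
mean_acc, std_acc = summarize_seeds(seed_accuracies)

# Hypothetical single-run accuracy of 0.90 on 100 test slices.
low, high = wilson_interval(0.90, n=100)
```

The seed spread captures training variance, while the Wilson interval captures sampling uncertainty from a finite test set. A gap between two models is more convincing when it exceeds both.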

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on data independence and statistical robustness. We address both major comments below and will revise the manuscript accordingly.

Point-by-point responses
  1. Referee: [Abstract] Abstract: The held-out test set is formed by splitting the 2,000 relabeled slices by production order, yet the abstract supplies no information on whether any of these slices (or their source panels/imaging sessions) overlap with the 19,961 unlabeled pretraining slices. This leaves the independence of the test distribution from the MIM pretraining distribution unverified and directly affects the validity of the reported 3.11-10.31 pp gains.

    Authors: The full manuscript states that the 2,000 labeled slices are drawn from later production batches than the 19,961 unlabeled pretraining slices, with the test split performed by production order to enforce separation. We will revise the abstract to explicitly confirm that no slices, panels, or imaging sessions are shared between the pretraining and fine-tuning sets, thereby verifying the independence of the test distribution and supporting the reported gains. revision: yes

  2. Referee: [Experiments] Experiments section: The abstract states concrete accuracy and macro-F1 numbers with baseline comparisons but provides no error bars, number of random seeds, or statistical significance tests. Without these, the robustness of the headline performance figures cannot be assessed.

    Authors: The experiments section already reports results aggregated over multiple random seeds with baseline comparisons. Due to abstract length limits, error bars and seed counts are omitted there but will be added to the revised abstract (via a brief qualifier) and expanded with full statistical details, including significance tests, in the experiments section. revision: partial

Circularity Check

0 steps flagged

No circularity: empirical performance claims rest on standard train/test splits and baselines without self-referential reductions.

Full rationale

The paper presents an empirical ML architecture (Mamba-based encoder with MIM pretraining on 19,961 slices, fine-tuning on 2,000 production-order-split slices) and reports held-out accuracy/F1 numbers against baselines. No equations, derivations, or parameter fits are described that reduce the reported metrics to inputs by construction. No self-citation load-bearing steps, uniqueness theorems, or ansatzes appear in the provided text. The performance claim is a direct experimental outcome on an explicitly held-out set, making the derivation self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review supplies no explicit free parameters, mathematical axioms, or invented entities; the nested learning formulation and MambaXCT backbone are described at the level of high-level components without parameter counts or derivation details.


