NL-MambaXCT: Self-Supervised Nested-Learning Mamba for Nomex Honeycomb X-ray CT Defect Classification
Pith reviewed 2026-06-29 19:08 UTC · model grok-4.3
The pith
NL-MambaXCT reaches 96.91 percent accuracy on Nomex honeycomb XCT defect classification using nested learning and masked pretraining.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
NL-MambaXCT uses a four-stage 2D encoder with RegNet blocks early and Mamba mixing later, pretrained via masked image modelling on 19,961 unlabeled slices, then fine-tuned with nested learning's fast and slow weights on 2,000 labeled slices to achieve 96.91% accuracy and 96.8% macro F1 on the held-out test set.
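To make the masked-image-modelling step concrete, here is a minimal sketch of random patch masking with a reconstruction loss restricted to the masked region. The patch size, mask ratio, and function names are assumptions for illustration; the summary does not state the values or the exact masking scheme the paper uses.

```python
import random

def random_patch_mask(height, width, patch=16, mask_ratio=0.6, seed=0):
    """Return a grid of booleans (True = masked) over non-overlapping patches.

    Assumes height and width are multiples of patch. The defaults for patch
    and mask_ratio are illustrative, not values taken from the paper.
    """
    rows, cols = height // patch, width // patch
    n_patches = rows * cols
    n_masked = int(round(mask_ratio * n_patches))
    rng = random.Random(seed)
    masked_ids = set(rng.sample(range(n_patches), n_masked))
    return [[(r * cols + c) in masked_ids for c in range(cols)] for r in range(rows)]

def masked_mse(target, prediction, mask, patch=16):
    """Mean squared error computed only over pixels inside masked patches."""
    total, count = 0.0, 0
    for y, (t_row, p_row) in enumerate(zip(target, prediction)):
        for x, (t, p) in enumerate(zip(t_row, p_row)):
            if mask[y // patch][x // patch]:
                total += (t - p) ** 2
                count += 1
    return total / count if count else 0.0
```

In a real pipeline the encoder would see only the unmasked patches and a decoder would produce `prediction`; only the masking and the loss are sketched here.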
What carries the argument
The nested learning formulation with two-timescale parameter dynamics, where selected projections maintain slow EMA traces alongside fast weights and a deep-momentum optimizer adds another slow update trajectory.
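The two-timescale idea can be sketched as a fast weight vector updated by plain gradient steps and a slow vector that tracks it through an exponential moving average. This is a simplified illustration under assumed hyper-parameters (`lr`, `decay`); it is not the paper's exact formulation, and it does not model the deep-momentum optimizer or the rule for choosing which projections carry slow traces.

```python
def sgd_step(fast, grad, lr=0.1):
    """Fast weights: an ordinary gradient step on every batch."""
    return [w - lr * g for w, g in zip(fast, grad)]

def ema_update(slow, fast, decay=0.99):
    """Slow weights: exponential moving average trace of the fast weights."""
    return [decay * s + (1.0 - decay) * f for s, f in zip(slow, fast)]

def train(fast, grads, lr=0.1, decay=0.99):
    """Run one fast step and one slow EMA update per gradient in grads."""
    slow = list(fast)  # the slow trace starts at the initial fast weights
    for grad in grads:
        fast = sgd_step(fast, grad, lr)
        slow = ema_update(slow, fast, decay)
    return fast, slow
```

With a decay close to 1 the slow trace lags well behind the fast weights, which is the sense in which the two parameter sets evolve on different timescales.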
If this is right
- The MIM-pretrained NL-MambaXCT model achieves 96.91% accuracy and 96.8% macro F1 on the held-out test set.
- It outperforms CNN, attention, and single-timescale Mamba baselines by 3.11 to 10.31 percentage points in accuracy.
- Combining masked self-supervision with NL-style fast/slow learning dynamics supports robust defect classification in Nomex honeycomb XCT inspection.
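As a quick arithmetic check, the reported gaps imply a range of baseline accuracies. The sketch below assumes the 3.11 and 10.31 point margins are both measured against the 96.91% headline figure, which the abstract suggests but does not spell out.

```python
HEADLINE_ACC = 96.91            # percent, from the abstract
GAP_MIN, GAP_MAX = 3.11, 10.31  # percentage points, from the abstract

def implied_baseline_range(headline, gap_min, gap_max):
    """Accuracy of the weakest and strongest baseline implied by the gaps."""
    strongest = round(headline - gap_min, 2)
    weakest = round(headline - gap_max, 2)
    return weakest, strongest

weakest, strongest = implied_baseline_range(HEADLINE_ACC, GAP_MIN, GAP_MAX)
print(f"implied baseline accuracy range: {weakest:.2f}% to {strongest:.2f}%")
```

Under that assumption the baselines span roughly 86.60% to 93.80% accuracy.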
Where Pith is reading between the lines
- The method could reduce reliance on manual interpretation by enabling effective use of abundant unlabeled production data.
- Two-timescale dynamics might extend to other manufacturing inspection tasks with sequential data collection.
- The production-order split suggests a practical way to create held-out sets that approximate real deployment shifts.
Load-bearing premise
The 2,000 relabeled Nomex XCT slices split by production order form a representative held-out test set that fairly measures generalization without data leakage or distribution shift from the 19,961 unlabeled pretraining slices.
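A production-order split and a simple leakage check might look like the following sketch. The record fields `production_index` and `panel_id` and the 20% test fraction are hypothetical; the summary does not describe the dataset schema or the split proportions.

```python
def split_by_production_order(records, test_fraction=0.2):
    """Sort by production index and hold out the latest records as the test set."""
    ordered = sorted(records, key=lambda r: r["production_index"])
    n_test = int(round(len(ordered) * test_fraction))
    cut = len(ordered) - n_test
    return ordered[:cut], ordered[cut:]

def shared_groups(set_a, set_b, key="panel_id"):
    """Return group identifiers (for example panels) that appear in both sets."""
    return {r[key] for r in set_a} & {r[key] for r in set_b}
```

An empty result from `shared_groups` between the pretraining pool and the test split, and between the training and test splits, is the kind of evidence the premise above depends on. A slice-level cut can still divide one panel across the boundary, which is why the group check is kept separate from the split.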
What would settle it
A new test set from different production runs that drops accuracy below 90 percent would falsify the generalization claim.
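Because any new test set is finite, a drop below 90 percent is more convincing when the uncertainty of the measured accuracy is taken into account. One possible check, sketched here with a Wilson score interval, is to ask whether even the upper end of the interval falls below the threshold. The choice of interval and the 95% level are assumptions, not part of the paper or the review.

```python
import math

def wilson_interval(correct, total, z=1.96):
    """Wilson score interval for a binomial proportion (about 95% for z=1.96)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = correct / total
    denom = 1.0 + z * z / total
    centre = (p + z * z / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return centre - half, centre + half

def clearly_below(correct, total, threshold=0.90):
    """True when even the upper end of the interval sits below the threshold."""
    _, upper = wilson_interval(correct, total)
    return upper < threshold
```

The function takes the number of correctly classified slices and the test-set size, so it can be applied to whatever new production runs become available.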
Original abstract
X-ray computed tomography (XCT) is widely used for non-destructive testing of Nomex honeycomb structures in aerospace manufacturing, but industrial inspection still relies heavily on manual interpretation and supervised models trained on limited labeled data. This work introduces NL-MambaXCT, a Mamba-based framework that combines self-supervised masked image modelling with a Nested Learning (NL) formulation for automated, label-efficient defect classification from production XCT slices. The backbone is a four-stage 2D encoder with RegNet convolutional blocks in the early stages and Mamba-based sequence mixing with attention in the deeper stages. It is pretrained by masked image modelling on 19,961 unlabeled industrial XCT slices and fine-tuned on 2,000 relabeled Nomex XCT slices split by production order. NL is instantiated through two-timescale parameter dynamics: selected projections maintain slow exponential-moving-average traces alongside fast weights, while a deep-momentum optimizer introduces an additional slow parameter-update trajectory. On the held-out test set, the MIM-pretrained NL-MambaXCT model achieves 96.91% accuracy and 96.8% macro F1, outperforming CNN, attention, and single-timescale Mamba baselines by 3.11 to 10.31 percentage points in accuracy. The results suggest that combining masked self-supervision with NL-style fast/slow learning dynamics is a promising strategy for robust defect classification in Nomex honeycomb XCT inspection.
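For readers unfamiliar with the second headline metric, macro F1 is the unweighted mean of per-class F1 scores, so rare defect classes count as much as common ones. A minimal sketch of the standard definition follows; the function name and the handling of classes with no predictions or labels are choices made for this illustration.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over classes seen in y_true or y_pred."""
    classes = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never appears.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)
```

A gap between accuracy and macro F1 usually points to uneven per-class performance; here the two reported figures (96.91% and 96.8%) are close.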
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces NL-MambaXCT, a four-stage 2D Mamba-based encoder pretrained via masked image modelling (MIM) on 19,961 unlabeled industrial Nomex XCT slices and fine-tuned with Nested Learning (NL) two-timescale dynamics on 2,000 relabeled slices split by production order. It claims 96.91% accuracy and 96.8% macro F1 on the held-out test set, outperforming CNN, attention, and single-timescale Mamba baselines by 3.11-10.31 percentage points.
Significance. If the data independence and performance claims hold, the work provides a concrete empirical demonstration that combining MIM self-supervision with NL-style fast/slow parameter trajectories can improve label-efficient defect classification for aerospace honeycomb structures. The explicit baseline comparisons and use of production-order splitting constitute a reproducible experimental protocol that could be extended to other industrial XCT tasks.
major comments (2)
- [Abstract] Abstract: The held-out test set is formed by splitting the 2,000 relabeled slices by production order, yet the abstract supplies no information on whether any of these slices (or their source panels/imaging sessions) overlap with the 19,961 unlabeled pretraining slices. This leaves the independence of the test distribution from the MIM pretraining distribution unverified and directly affects the validity of the reported 3.11-10.31 pp gains.
- [Experiments] Experiments section: The abstract states concrete accuracy and macro-F1 numbers with baseline comparisons but provides no error bars, number of random seeds, or statistical significance tests. Without these, the robustness of the headline performance figures cannot be assessed.
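The kind of reporting the second comment asks for can be sketched as a mean and sample standard deviation over per-seed accuracies. The function names are hypothetical and no per-seed values are given in the abstract, so the inputs would have to come from the authors' runs.

```python
import statistics

def summarize_runs(accuracies):
    """Mean and sample standard deviation of per-seed accuracies (percent)."""
    if len(accuracies) < 2:
        raise ValueError("need at least two seeds for a standard deviation")
    return statistics.mean(accuracies), statistics.stdev(accuracies)

def format_result(accuracies):
    """Render a result as 'mean +/- std (n=K seeds)' for a results table."""
    mean, std = summarize_runs(accuracies)
    return f"{mean:.2f} +/- {std:.2f} (n={len(accuracies)} seeds)"
```

Reporting the seed count alongside the spread lets a reader judge whether a 3.11 point gap over the strongest baseline is larger than run-to-run variation.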
minor comments (2)
- [Abstract] Abstract: The description of the backbone (RegNet blocks in early stages, Mamba sequence mixing with attention in deeper stages) would benefit from a schematic diagram or explicit layer counts to clarify the hybrid architecture.
- [Methods] Methods: The precise mechanism for selecting which projections receive slow EMA traces and the hyper-parameters of the deep-momentum optimizer are not detailed in the abstract; these should be expanded with equations or pseudocode.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on data independence and statistical robustness. We address both major comments below and will revise the manuscript accordingly.
point-by-point responses
- Referee: [Abstract] Abstract: The held-out test set is formed by splitting the 2,000 relabeled slices by production order, yet the abstract supplies no information on whether any of these slices (or their source panels/imaging sessions) overlap with the 19,961 unlabeled pretraining slices. This leaves the independence of the test distribution from the MIM pretraining distribution unverified and directly affects the validity of the reported 3.11-10.31 pp gains.
  Authors: The full manuscript states that the 2,000 labeled slices are drawn from later production batches than the 19,961 unlabeled pretraining slices, with the test split performed by production order to enforce separation. We will revise the abstract to explicitly confirm that no slices, panels, or imaging sessions are shared between the pretraining and fine-tuning sets, thereby verifying the independence of the test distribution and supporting the reported gains.
  Revision: yes
- Referee: [Experiments] Experiments section: The abstract states concrete accuracy and macro-F1 numbers with baseline comparisons but provides no error bars, number of random seeds, or statistical significance tests. Without these, the robustness of the headline performance figures cannot be assessed.
  Authors: The experiments section already reports results aggregated over multiple random seeds with baseline comparisons. Due to abstract length limits, error bars and seed counts are omitted there but will be added to the revised abstract (via a brief qualifier) and expanded with full statistical details, including significance tests, in the experiments section.
  Revision: partial
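The rebuttal promises significance tests without naming one. As an illustration only, a paired comparison of two classifiers on the same test slices could use an exact McNemar test on the discordant predictions; this choice of test is an assumption and not something the authors have committed to.

```python
import math

def mcnemar_exact(b, c):
    """Two-sided exact McNemar p-value from discordant counts.

    b: test items model A classifies correctly and model B does not.
    c: test items model B classifies correctly and model A does not.
    """
    n = b + c
    if n == 0:
        return 1.0  # the models never disagree, so there is no evidence either way
    k = min(b, c)
    # Binomial(n, 0.5) probability of a split at least as lopsided as observed.
    tail = sum(math.comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)
```

The test uses only the slices on which the two models disagree, so it complements the per-seed spread rather than replacing it.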
Circularity Check
No circularity: empirical performance claims rest on standard train/test splits and baselines without self-referential reductions.
The paper presents an empirical ML architecture (Mamba-based encoder with MIM pretraining on 19,961 slices, fine-tuning on 2,000 production-order-split slices) and reports held-out accuracy/F1 numbers against baselines. No equations, derivations, or parameter fits are described that reduce the reported metrics to inputs by construction. No self-citation load-bearing steps, uniqueness theorems, or ansatzes appear in the provided text. The performance claim is a direct experimental outcome on an explicitly held-out set, making the derivation self-contained against external benchmarks.