A multi-task spatiotemporal deep neural network for predicting penetration depth and morphology in laser welding
Pith reviewed 2026-06-26 01:35 UTC · model grok-4.3
The pith
A multi-task spatiotemporal neural network predicts laser weld penetration state, depth, and cross-section morphology from top-view pool images plus process parameters.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The authors present a multi-task model that fuses spatiotemporal features extracted from top-view weld pool images with welding parameters, built on a convolutional neural network and a state space model, together with a dataset-construction method. On the test set they report 99.35 percent accuracy for penetration state, a 1.79 mm error for penetration depth, and 95.65 percent accuracy for weld cross-section reconstruction.
What carries the argument
The multi-task spatiotemporal deep neural network that fuses convolutional and state-space processing of weld-pool image sequences with welding parameters to produce simultaneous predictions of penetration state, depth, and morphology.
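The data flow described above can be illustrated with a toy sketch. This is not the authors' network: the "encoder" is a hand-written stand-in for a CNN, the temporal block is a plain diagonal linear state-space recurrence, and the cross-section is represented as a short vector of widths. All names, weights, and the width-profile representation are assumptions made for illustration only.

```python
import math

def frame_features(frame):
    """Stand-in for the CNN encoder (assumption): mean and max intensity of a 2-D frame."""
    pixels = [p for row in frame for p in row]
    return [sum(pixels) / len(pixels), max(pixels)]

def ssm_scan(features, a, b):
    """Diagonal linear state-space recurrence h_t = a * h_{t-1} + b * x_t over the sequence."""
    h = [0.0] * len(a)
    for x in features:
        h = [a[i] * h[i] + b[i] * x[i] for i in range(len(a))]
    return h

def linear(weights, bias, x):
    """Dot product plus bias."""
    return sum(w * v for w, v in zip(weights, x)) + bias

def predict(frames, params, model):
    """Return (penetration-state probability, depth, width profile) from one image sequence."""
    feats = [frame_features(f) for f in frames]
    h = ssm_scan(feats, model["a"], model["b"])
    fused = h + list(params)  # concatenate the temporal state with the process parameters
    state_logit = linear(model["w_state"], model["b_state"], fused)
    state_prob = 1.0 / (1.0 + math.exp(-state_logit))           # classification head
    depth = linear(model["w_depth"], model["b_depth"], fused)    # regression head
    profile = [linear(w, 0.0, fused) for w in model["w_profile"]]  # morphology head
    return state_prob, depth, profile

# Hypothetical fixed weights; a real model would learn these from data.
TOY_MODEL = {
    "a": [0.9, 0.5],
    "b": [0.1, 0.5],
    "w_state": [1.0, -0.5, 0.2, 0.1], "b_state": 0.0,
    "w_depth": [0.8, 0.3, 0.5, -0.2], "b_depth": 0.1,
    "w_profile": [[0.5, 0.1, 0.2, 0.0], [0.4, 0.1, 0.1, 0.0], [0.2, 0.1, 0.1, 0.0]],
}

# Two tiny 2x2 "frames" and two made-up process parameters (e.g. power, speed).
frames = [[[0.1, 0.2], [0.3, 0.4]], [[0.2, 0.3], [0.4, 0.5]]]
state_prob, depth, profile = predict(frames, (2.0, 1.5), TOY_MODEL)
```

The point of the sketch is the shape of the computation: one shared temporal representation, fused with the parameters, feeds three separate output heads.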
If this is right
- The model supplies simultaneous, real-time estimates of three weld-quality metrics from a single camera view.
- The dataset-construction procedure is presented as a way to improve robustness and generalization of similar image-based welding monitors.
- The approach is positioned as a component of in-situ quality control strategies for laser penetration welding systems.
- High test-set numbers on state, depth, and morphology are offered as evidence that image-plus-parameter inputs suffice for these three tasks.
Where Pith is reading between the lines
- If the model runs fast enough on embedded hardware, it could close the loop for automatic adjustment of laser power or speed during welding.
- The same image-sequence plus parameter format might be reused for related processes such as laser cladding if comparable labeled data can be collected.
- Adding a second synchronized camera angle or acoustic emission signals could be tested as a way to reduce the remaining depth error without changing the network architecture.
Load-bearing premise
The dataset-construction method produces training examples whose distribution matches the distribution of future production welds closely enough for the reported test-set numbers to generalize.
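One way to probe this premise, sketched here under assumptions, is to compare a process parameter's distribution in the training data against the same parameter logged on production welds. The two-sample Kolmogorov-Smirnov statistic below is a generic choice, not something the paper reports, and the sample values are invented.

```python
def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between the empirical CDFs."""
    a = sorted(sample_a)
    b = sorted(sample_b)
    i = j = 0
    gap = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        # Advance both pointers past every value <= x, so ties are handled together.
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        gap = max(gap, abs(i / len(a) - j / len(b)))
    return gap

# Hypothetical laser-power settings (kW) in training data and on a production line.
train_power = [2.0, 2.2, 2.4, 2.6, 2.8, 3.0]
production_power = [2.9, 3.0, 3.1, 3.2, 3.3, 3.4]
shift = ks_statistic(train_power, production_power)
```

A statistic near 0 means the two samples cover similar ranges; a value near 1 means they barely overlap, in which case test-set numbers say little about production behaviour.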
What would settle it
Running the trained model on a new collection of welds made under production conditions with different material batches or parameter ranges and observing penetration-state accuracy below 90 percent or depth error above 3 mm would falsify the generalization claim.
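The test above reduces to two threshold checks. The following is a minimal sketch, assuming depth error is measured as mean absolute error and that labels and predictions arrive as plain lists; the function name and argument names are hypothetical.

```python
def generalization_claim_falsified(state_true, state_pred, depth_true_mm, depth_pred_mm,
                                   min_accuracy=0.90, max_depth_error_mm=3.0):
    """Apply the two thresholds named in the text to results from new production welds."""
    correct = sum(t == p for t, p in zip(state_true, state_pred))
    accuracy = correct / len(state_true)
    # Depth error taken as mean absolute error (an assumption about the metric).
    mae = sum(abs(t - p) for t, p in zip(depth_true_mm, depth_pred_mm)) / len(depth_true_mm)
    falsified = accuracy < min_accuracy or mae > max_depth_error_mm
    return falsified, accuracy, mae

# Made-up example: perfect state predictions and a 0.5 mm depth error.
falsified, accuracy, mae = generalization_claim_falsified(
    [1, 1, 0, 0], [1, 1, 0, 0], [2.0, 4.0], [2.5, 3.5]
)
```

Either condition alone is enough to falsify the claim, which is why the check uses `or` rather than `and`.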
Original abstract
In laser penetration welding, the assessment of penetration state and weld seam morphology plays a crucial role in determining the weld quality. This paper presents a comprehensive introduction of the innovative muti-task deep learning model that has the capability to predict penetration state, depth, and weld seam morphology with high accuracy. The monitoring platform relies on weld pool images captured during the laser welding process using a complementary metal-oxide-semiconductor camera. The proposed model integrates spatiotemporal features extracted from top weld pool images along with welding parameters, establishing a deep learning framework based on convolutional neural networks and state space models for more efficient extraction and processing of spatial-temporal information. Furthermore, a reliable method for constructing the dataset is proposed to enhance both robustness and generalization capability of the developed model. Validation results on the test set demonstrate that prediction accuracy for penetration state can reach 99.35%, while prediction error for penetration depth is 1.79 millimeter, and accuracy of reconstructing the weld cross-section is 95.65%. This study provides new insights and methodologies for in-situ quality control strategies in laser penetration welding systems.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces a multi-task deep neural network for in-situ monitoring in laser penetration welding. It processes top-view weld pool images captured by a CMOS camera together with process parameters using a combination of convolutional networks and state-space models to extract spatiotemporal features. The model simultaneously predicts penetration state (binary classification), penetration depth (regression), and weld cross-section morphology (reconstruction). A custom dataset-construction procedure is proposed to improve robustness and generalization. On a held-out test set the authors report 99.35 % accuracy for penetration-state prediction, 1.79 mm mean error for depth, and 95.65 % accuracy for cross-section reconstruction.
Significance. If the reported test-set metrics prove reliable under independent validation, the work would offer a practical multi-task framework for real-time weld-quality assessment that integrates visual and parametric inputs. Such a system could support closed-loop control in laser welding, reducing defects in high-value manufacturing. The emphasis on spatiotemporal modeling and an explicit dataset-construction method addresses domain-specific challenges, though the absence of ablations and distribution-shift checks limits immediate deployability claims.
major comments (4)
- [Abstract] Abstract: the headline metrics (99.35 % state accuracy, 1.79 mm depth error, 95.65 % morphology accuracy) are presented as single aggregate values with no error bars, confidence intervals, or description of the loss function and aggregation method used to obtain the depth error. Without these details it is impossible to assess whether the numbers reflect stable performance or are sensitive to a few outliers.
- [Abstract] Abstract (final paragraph) and dataset-construction section: the claim that the proposed dataset-construction method improves generalization rests on the unverified assumption that the constructed training and test distributions match future production welds. No statistical distance metrics, covariate-shift tests, or external validation welds collected on different equipment or parameter regimes are reported to support this assumption.
- [Abstract] Abstract: no ablation studies, baseline comparisons, or component-wise analysis are described to justify the multi-task formulation, the choice of state-space model, or the spatiotemporal fusion strategy. Consequently the contribution of each architectural element to the reported numbers cannot be isolated.
- [Abstract] Abstract: the test-set construction is described only at high level; it is unclear whether the held-out examples are temporally or spatially independent of the training data or whether they were collected under identical process conditions, which directly affects the validity of the generalization claim.
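The first major comment asks for uncertainty around the depth error. One generic way to get it, sketched here under the assumption that per-sample true and predicted depths are available, is a percentile bootstrap over test samples. This is an illustration of the kind of analysis requested, not a procedure taken from the paper.

```python
import random

def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between paired values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def bootstrap_mae_interval(y_true, y_pred, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the MAE, resampling test samples with replacement."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(mean_absolute_error([y_true[i] for i in idx],
                                         [y_pred[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_resamples)]
    upper = stats[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper

# Made-up depths in millimetres.
depth_true = [3.0, 4.5, 5.0, 6.2, 7.1]
depth_pred = [3.4, 4.0, 5.5, 6.0, 8.0]
lower, upper = bootstrap_mae_interval(depth_true, depth_pred)
```

A wide interval, or one that moves sharply when a few samples are dropped, would indicate the sensitivity to outliers that the comment is concerned about.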
minor comments (1)
- [Abstract] Abstract contains the typo "muti-task" (should be "multi-task").
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed comments on our manuscript. We address each major comment point by point below, indicating revisions made to the manuscript where appropriate.
- Referee: [Abstract] Abstract: the headline metrics (99.35 % state accuracy, 1.79 mm depth error, 95.65 % morphology accuracy) are presented as single aggregate values with no error bars, confidence intervals, or description of the loss function and aggregation method used to obtain the depth error. Without these details it is impossible to assess whether the numbers reflect stable performance or are sensitive to a few outliers.
  Authors: We agree that reporting variability and methodological details strengthens the presentation of results. In the revised manuscript we now include the mean and standard deviation of each metric computed across five independent training runs with different random seeds. The depth error is explicitly defined as mean absolute error (MAE) aggregated over the test samples, and the methods section now details the loss functions (binary cross-entropy for state classification, mean-squared error for depth regression, and a weighted combination of MSE and perceptual loss for morphology reconstruction).
  revision: yes
- Referee: [Abstract] Abstract (final paragraph) and dataset-construction section: the claim that the proposed dataset-construction method improves generalization rests on the unverified assumption that the constructed training and test distributions match future production welds. No statistical distance metrics, covariate-shift tests, or external validation welds collected on different equipment or parameter regimes are reported to support this assumption.
  Authors: The referee correctly notes the lack of quantitative support for the generalization claim. We have revised the abstract and dataset-construction section to present the method more cautiously as a procedure intended to increase robustness within the collected data regime, and we added qualitative comparisons of image and parameter distributions between training and test splits. Formal statistical distance metrics and external validation on different equipment were not performed; we now explicitly list this as a limitation and direction for future work.
  revision: partial
- Referee: [Abstract] Abstract: no ablation studies, baseline comparisons, or component-wise analysis are described to justify the multi-task formulation, the choice of state-space model, or the spatiotemporal fusion strategy. Consequently the contribution of each architectural element to the reported numbers cannot be isolated.
  Authors: We acknowledge that the original submission did not isolate component contributions. In the revised manuscript we have added an ablation study subsection that compares the full multi-task model against single-task variants, a version without the state-space model, and alternative spatiotemporal fusion strategies. The new results are summarized in an additional table and support the design decisions.
  revision: yes
- Referee: [Abstract] Abstract: the test-set construction is described only at high level; it is unclear whether the held-out examples are temporally or spatially independent of the training data or whether they were collected under identical process conditions, which directly affects the validity of the generalization claim.
  Authors: We have expanded the dataset section to clarify that the test set comprises complete, temporally disjoint welding trials collected on separate days using the same equipment and overlapping but not identical parameter settings. This ensures temporal independence while maintaining comparable process conditions; the revised text now states this explicitly.
  revision: yes
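The last response describes a test set made of complete welding trials that do not overlap with training trials. A minimal sketch of such a trial-level split follows, assuming each sample carries a trial identifier; the `(trial_id, payload)` layout and the identifiers are hypothetical, not the authors' data format.

```python
def split_by_trial(samples, test_trial_ids):
    """Assign whole trials to train or test so that no trial contributes to both."""
    test_ids = set(test_trial_ids)
    train = [s for s in samples if s[0] not in test_ids]
    test = [s for s in samples if s[0] in test_ids]
    return train, test

def trials_are_disjoint(train, test):
    """True when the two splits share no trial identifier."""
    return {s[0] for s in train}.isdisjoint({s[0] for s in test})

# Each sample is (trial_id, payload); payloads here are placeholder frame names.
samples = [
    ("day1_trial1", "frame_000"), ("day1_trial1", "frame_001"),
    ("day1_trial2", "frame_000"), ("day2_trial1", "frame_000"),
    ("day2_trial1", "frame_001"),
]
train, test = split_by_trial(samples, ["day2_trial1"])
```

Splitting at the trial level rather than the frame level matters because neighbouring frames from one weld are highly correlated, so a frame-level random split would leak near-duplicates into the test set.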
Circularity Check
No circularity; empirical ML performance on held-out test data with no derivations or self-referential reductions
The paper introduces a multi-task CNN+state-space model for predicting weld penetration state, depth, and morphology from images and parameters, then reports standard empirical metrics on a held-out test set (99.35% state accuracy, 1.79 mm depth error, 95.65% morphology accuracy). No equations, first-principles derivations, fitted parameters renamed as predictions, or self-citation chains exist. The dataset-construction procedure is described at a high level to support robustness, but the reported numbers are ordinary train/test evaluation and do not reduce to inputs by construction under any of the enumerated circularity patterns. The work is self-contained as empirical validation.