Reinforcement Learning for Laser Additive Manufacturing Scan-Order Optimisation: A Bilevel Proxy--FEA Diagnostic Framework for Reward and World-Model Diagnosis
Pith reviewed 2026-06-30 12:15 UTC · model grok-4.3
The pith
A bilevel proxy-FEA framework shows that cheap path-based metrics for RL scan-order optimisation mainly capture distortion with weak correlation to full finite-element stress and plasticity labels.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Within the ten evaluated scan strategies the center_out pattern offers a workable balance between final residual Mises stress and U3 vertical distortion, raster_left_to_right and edge_in sit at opposite ends of that trade-off, and the lightweight path-based proxies predominantly track U3 behaviour while correlating only weakly with the sparse FEA reference labels for stress and PEEQ plasticity.
What carries the argument
The bilevel Proxy--FEA diagnostic framework, in which lower-level lightweight scan-path and thermo-inspired proxies generate and screen candidates while the upper level supplies sparse Abaqus FEA reference labels for reward and world-model diagnosis.
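The data flow between the two levels can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `jump_distance_proxy`, `placeholder_fea`, `bilevel_screen`, the 8-stripe toy orders, and the rule of spreading the FEA budget evenly across the proxy ranking are all hypothetical. The FEA stand-in returns NaN values and runs no simulation.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Sequence

Order = Sequence[int]  # stripe indices in the order they are scanned


@dataclass
class Candidate:
    name: str
    order: Order
    proxy_score: float                             # lower-level score (lower = better here)
    fea_labels: Optional[Dict[str, float]] = None  # set only for sparse upper-level runs


def jump_distance_proxy(order: Order) -> float:
    """Hypothetical path proxy: summed index jump between consecutive stripes."""
    return float(sum(abs(b - a) for a, b in zip(order, order[1:])))


def placeholder_fea(order: Order) -> Dict[str, float]:
    """Stand-in for an Abaqus job. Returns NaNs: no real simulation is run."""
    nan = float("nan")
    return {"mises": nan, "u3": nan, "peeq": nan}


def bilevel_screen(
    strategies: Dict[str, Order],
    proxy: Callable[[Order], float],
    fea_oracle: Callable[[Order], Dict[str, float]],
    fea_budget: int,
) -> List[Candidate]:
    # Lower level: score every candidate with the cheap proxy and rank them.
    cands = [Candidate(name, order, proxy(order)) for name, order in strategies.items()]
    cands.sort(key=lambda c: c.proxy_score)
    # Upper level (assumed selection rule): spend the FEA budget on evenly
    # spaced ranks so the reference labels cover the whole proxy range.
    n, k = len(cands), min(fea_budget, len(cands))
    picks = {round(i * (n - 1) / max(k - 1, 1)) for i in range(k)}
    for idx in picks:
        cands[idx].fea_labels = fea_oracle(cands[idx].order)
    return cands


# Toy 8-stripe orders; the strategy names come from the paper, the orders are guesses.
toy_strategies = {
    "raster_left_to_right": [0, 1, 2, 3, 4, 5, 6, 7],
    "center_out": [3, 4, 2, 5, 1, 6, 0, 7],
    "edge_in": [0, 7, 1, 6, 2, 5, 3, 4],
}
screened = bilevel_screen(toy_strategies, jump_distance_proxy, placeholder_fea, fea_budget=2)
```

Candidates whose `fea_labels` stay `None` are the ones judged by the proxy alone, which is where the misalignment risk discussed below would enter.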
If this is right
- Proxy-only reward designs carry a risk of misalignment when used for large-scale RL policy optimisation in scan-order tasks.
- Sparse FEA reference signals provide diagnostic value for refining rewards and world models before full training runs.
- The observed stress-distortion trade-off implies that a single monotonic quality objective may not exist for these problems.
- Center_out emerges as a practical compromise candidate among the tested strategies.
- The framework supports preliminary policy screening at low cost before committing to dense simulation.
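One way to make the "no single monotonic objective" point concrete is a non-dominated (Pareto) filter over the two FEA labels. The sketch below assumes both residual Mises stress and U3 distortion are to be minimised. The strategy names `s1` to `s4` and their values are placeholders invented for illustration, not results from the paper.

```python
from typing import Dict, List, Tuple


def pareto_front(points: Dict[str, Tuple[float, float]]) -> List[str]:
    """Return names of non-dominated points when both objectives are minimised."""
    front = []
    for name, p in points.items():
        dominated = any(
            # q dominates p if it is no worse in every objective and better in one
            all(qi <= pi for qi, pi in zip(q, p)) and any(qi < pi for qi, pi in zip(q, p))
            for other, q in points.items()
            if other != name
        )
        if not dominated:
            front.append(name)
    return front


# Placeholder (stress, distortion) pairs in arbitrary units; NOT values from the paper.
demo_points = {
    "s1": (100.0, 0.90),
    "s2": (140.0, 0.40),
    "s3": (115.0, 0.60),
    "s4": (150.0, 0.95),
}
demo_front = pareto_front(demo_points)
```

When more than one strategy survives the filter, as here, any scalar ranking depends on how stress and distortion are weighted, which is the sense in which a compromise candidate sits between two endpoints.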
Where Pith is reading between the lines
- A hybrid reward that occasionally inserts full FEA checks could reduce the misalignment risk identified in the proxy-only case.
- Extending the diagnostic to actual RL training loops on varied geometries would test whether the weak correlations persist under policy-driven exploration.
- The trade-off finding suggests that multi-objective RL formulations may be more suitable than single scalar rewards for this domain.
- The same bilevel structure could be applied to diagnose other manufacturing simulation proxies beyond scan-order problems.
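The hybrid-reward idea in the first bullet could look roughly like the sketch below. It is speculative, like the bullet itself: the class `HybridReward`, the fixed `check_every` schedule, and the choice to return the FEA-based reward on audited calls are all assumptions, and `proxy_fn` and `fea_fn` are user-supplied callables that would need to be on a comparable scale.

```python
from typing import Callable, List, Sequence, Tuple

Order = Sequence[int]


class HybridReward:
    """Proxy reward with a periodic full-fidelity audit (hypothetical design)."""

    def __init__(
        self,
        proxy_fn: Callable[[Order], float],
        fea_fn: Callable[[Order], float],
        check_every: int,
    ) -> None:
        if check_every < 1:
            raise ValueError("check_every must be >= 1")
        self.proxy_fn = proxy_fn
        self.fea_fn = fea_fn
        self.check_every = check_every
        self.calls = 0
        self.audit: List[Tuple[float, float]] = []  # (proxy reward, FEA reward)

    def __call__(self, order: Order) -> float:
        self.calls += 1
        r_proxy = self.proxy_fn(order)
        if self.calls % self.check_every == 0:
            # Audited call: pay for the expensive label and log the pair.
            r_fea = self.fea_fn(order)
            self.audit.append((r_proxy, r_fea))
            return r_fea
        return r_proxy

    def mean_gap(self) -> float:
        """Mean absolute proxy-vs-FEA gap over audited calls (NaN if none yet)."""
        if not self.audit:
            return float("nan")
        return sum(abs(p - f) for p, f in self.audit) / len(self.audit)


# Deterministic stubs standing in for the real proxy and FEA rewards.
demo_reward = HybridReward(lambda o: -float(len(o)), lambda o: -2.0 * len(o), check_every=3)
demo_values = [demo_reward([0, 1]) for _ in range(6)]
```

Tracking `mean_gap()` during training gives a running estimate of how far the proxy has drifted from the reference signal on the states the policy actually visits.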
Load-bearing premise
The ten representative scan strategies tested on the simplified LDED32 stripe benchmark are enough to reveal misalignment patterns that would appear in more complex geometries and processes.
What would settle it
Running the same proxy-FEA correlation analysis on a different part geometry or process and obtaining strong alignment between the cheap metrics and the FEA labels for stress and plasticity.
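For a policy that only needs to order candidates, one simple alignment check is the fraction of strategy pairs that a proxy and an FEA label rank the same way. The sketch below is one possible form of such a check, not the paper's analysis. The function names and the dictionary layout are hypothetical, and it assumes lower values are better for both proxy and label.

```python
from itertools import combinations
from typing import Dict, Sequence


def pairwise_agreement(proxy: Sequence[float], label: Sequence[float]) -> float:
    """Fraction of pairs ordered identically by proxy and label (ties skipped)."""
    if len(proxy) != len(label):
        raise ValueError("proxy and label must have the same length")
    agree = total = 0
    for i, j in combinations(range(len(proxy)), 2):
        dp, dl = proxy[i] - proxy[j], label[i] - label[j]
        if dp == 0 or dl == 0:
            continue  # a tie carries no ordering information
        total += 1
        agree += (dp > 0) == (dl > 0)
    return agree / total if total else float("nan")


def alignment_report(
    proxy: Sequence[float], labels: Dict[str, Sequence[float]]
) -> Dict[str, float]:
    """Pairwise agreement of one proxy against each FEA label (e.g. mises, u3, peeq)."""
    return {name: pairwise_agreement(proxy, values) for name, values in labels.items()}
```

A value near 1.0 means the proxy reproduces the FEA ordering, near 0.0 means it inverts it, and near 0.5 means it carries little ranking information. Strong alignment on a new geometry would show up as values close to 1.0 for the stress and PEEQ labels as well as for U3.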
Original abstract
Reinforcement learning offers a promising approach for scan-order optimisation in laser additive manufacturing, where sequential scan decisions critically influence thermal accumulation, residual stress, distortion, and final part quality. A central challenge in applying RL to this domain lies in reward and world-model fidelity: full finite-element analysis is computationally prohibitive for dense in-the-loop evaluation, while cheap thermo-inspired proxy metrics, though efficient, may capture only partial aspects of the true thermo-mechanical objectives. This paper investigates a bilevel Proxy--FEA diagnostic framework for reward and world-model diagnosis in reinforcement-learning-guided scan-order optimisation. The lower level employs lightweight scan-path and thermo-inspired proxies for rapid candidate generation and preliminary policy-side screening, while the upper level utilises sparse Abaqus FEA simulations to provide simulation-based reference labels. The framework is examined on a simplified whole-track heating LDED32 stripe benchmark comprising ten representative scan strategies. Final-cooling residual Mises stress, U3 vertical distortion, and PEEQ plasticity metrics reveal an observed stress--distortion trade-off rather than a single monotonic quality objective. Within the evaluated set, the center_out strategy emerges as a robust compromise candidate, while raster_left_to_right and edge_in form opposing endpoints of the trade-off. Proxy--FEA alignment analysis shows that current cheap path-based metrics predominantly capture distortion-related (U3) behaviour and exhibit only weak correlation with the sparse FEA reference labels. These findings highlight that proxy-only reward designs risk misalignment in future RL training and underscore the value of sparse FEA reference signals for diagnostic-guided reward and world-model refinement prior to large-scale policy optimisation.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes a bilevel Proxy--FEA diagnostic framework for reward and world-model diagnosis in RL-guided scan-order optimization for laser additive manufacturing. The lower level uses lightweight scan-path and thermo-inspired proxies for rapid screening, while the upper level applies sparse Abaqus FEA simulations as reference labels. Evaluated on a simplified whole-track heating LDED32 stripe benchmark with ten representative scan strategies, the work identifies a stress--distortion trade-off (rather than monotonic quality), positions center_out as a robust compromise, and reports that path-based proxies predominantly capture U3 distortion with only weak correlation to FEA labels, concluding that proxy-only rewards risk misalignment.
Significance. If the misalignment findings hold, the bilevel diagnostic approach offers a practical method for validating RL reward designs in manufacturing domains where full FEA is prohibitive, explicitly highlighting trade-offs and the limitations of cheap proxies. The framework's separation of proxy screening from sparse FEA reference labels is a constructive contribution, though its diagnostic value is currently constrained by the narrow benchmark.
major comments (2)
- [Framework examination section] The central claim that 'current cheap path-based metrics predominantly capture distortion-related (U3) behaviour and exhibit only weak correlation with the sparse FEA reference labels' (and that proxy-only rewards therefore risk misalignment) rests on an evaluation of only ten scan strategies. No evidence is provided that this set is representative of the space of path-based proxies or that the U3 dominance and weak correlation pattern would persist for more complex geometries or processes.
- [Abstract and framework examination] The stress--distortion trade-off is presented as a key finding ('rather than a single monotonic quality objective'), yet the manuscript provides no quantitative characterization (e.g., correlation coefficients between Mises stress and U3 across the ten strategies, or statistical tests) to establish the trade-off's robustness or generality beyond the LDED32 stripe simplification.
minor comments (2)
- [Framework examination section] The ten strategies are described as 'representative' without explicit selection criteria or coverage analysis of the scan-order space.
- Notation for the bilevel framework (lower/upper levels) could be formalized with a diagram or pseudocode to clarify data flow between proxy screening and FEA labeling.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback. We address each major comment below with honest responses, indicating revisions where appropriate to improve clarity and rigor while respecting the scope of the simplified benchmark.
Point-by-point responses
- Referee: [Framework examination section] The central claim that 'current cheap path-based metrics predominantly capture distortion-related (U3) behaviour and exhibit only weak correlation with the sparse FEA reference labels' (and that proxy-only rewards therefore risk misalignment) rests on an evaluation of only ten scan strategies. No evidence is provided that this set is representative of the space of path-based proxies or that the U3 dominance and weak correlation pattern would persist for more complex geometries or processes.
  Authors: The ten strategies were deliberately chosen to span a diverse set of common scan patterns (raster variants, inward/outward spirals, and center_out) on the LDED32 stripe to illustrate the diagnostic framework's utility. We agree that this does not prove representativeness across all possible proxies, nor that the observed U3 dominance and weak correlations would hold for arbitrary geometries. We will revise the framework examination section to explicitly limit the claim to the evaluated benchmark and to position the bilevel approach as a diagnostic method rather than asserting broad generality.
  Revision: partial
- Referee: [Abstract and framework examination] The stress--distortion trade-off is presented as a key finding ('rather than a single monotonic quality objective'), yet the manuscript provides no quantitative characterization (e.g., correlation coefficients between Mises stress and U3 across the ten strategies, or statistical tests) to establish the trade-off's robustness or generality beyond the LDED32 stripe simplification.
  Authors: We accept this point. The revised manuscript will include quantitative measures such as Pearson and Spearman correlation coefficients between Mises stress and U3 (and PEEQ) across the ten strategies, plus any relevant statistical tests, to substantiate the trade-off observation within the benchmark.
  Revision: yes
- Whether the U3 dominance and weak proxy-FEA correlation pattern would persist for more complex geometries or processes (this would require new experiments outside the current simplified LDED32 benchmark).
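The quantitative characterization promised in the authors' second response could be computed along these lines. This is a stdlib-only sketch under stated assumptions: the permutation test is one reasonable choice of "relevant statistical test" for a sample of ten strategies, not one named in the source, and the demo arrays are synthetic sanity-check inputs, not FEA outputs.

```python
import math
import random
from typing import Callable, List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; raises ZeroDivisionError if either input is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(v: Sequence[float]) -> List[float]:
    """1-based ranks, with tied values sharing their average rank."""
    return [sum(u < w for u in v) + (sum(u == w for u in v) + 1) / 2 for w in v]


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation = Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


def permutation_pvalue(
    x: Sequence[float],
    y: Sequence[float],
    stat: Callable[[Sequence[float], Sequence[float]], float] = pearson,
    n_perm: int = 2000,
    seed: int = 0,
) -> float:
    """Two-sided Monte Carlo permutation p-value for the chosen statistic."""
    rng = random.Random(seed)
    observed = abs(stat(x, y))
    y_perm = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)  # break any pairing between x and y
        if abs(stat(x, y_perm)) >= observed - 1e-12:
            hits += 1
    return (hits + 1) / (n_perm + 1)  # +1 keeps the estimate away from exactly 0


# Synthetic, perfectly anti-monotone inputs for a sanity check; NOT paper data.
mises_demo = [float(i) for i in range(10)]
u3_demo = [9.0 - m for m in mises_demo]
r_demo = pearson(mises_demo, u3_demo)
rho_demo = spearman(mises_demo, u3_demo)
p_demo = permutation_pvalue(mises_demo, u3_demo)
```

With only ten strategies, a strongly negative coefficient together with a small permutation p-value would support the reported trade-off within the benchmark, while saying nothing about other geometries.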
Circularity Check
No circularity detected in proxy-FEA diagnostic evaluation
The paper performs an empirical comparison of lightweight proxy metrics against independent sparse Abaqus FEA reference labels on ten fixed scan strategies within a simplified LDED32 stripe benchmark. No equations, fitted parameters, or self-citations are presented as load-bearing derivations; the observed U3 correlation patterns and stress-distortion trade-off emerge directly from the computed outputs rather than by construction from the inputs. The framework treats FEA as an external diagnostic signal, satisfying the self-contained benchmark criterion.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: The simplified LDED32 stripe benchmark with ten scan strategies adequately represents thermo-mechanical behaviors for diagnosing proxy-FEA misalignment in scan-order optimization.
invented entities (1)
- Bilevel Proxy--FEA diagnostic framework: no independent evidence