
arxiv: 2605.25063 · v1 · pith:K72TWWORnew · submitted 2026-05-24 · 💻 cs.LG · cond-mat.mtrl-sci

Reinforcement Learning for Laser Additive Manufacturing Scan-Order Optimisation: A Bilevel Proxy--FEA Diagnostic Framework for Reward and World-Model Diagnosis

Pith reviewed 2026-06-30 12:15 UTC · model grok-4.3

classification 💻 cs.LG cond-mat.mtrl-sci
keywords reinforcement learning · laser additive manufacturing · scan-order optimisation · proxy-FEA framework · reward diagnosis · finite element analysis · residual stress · distortion

The pith

A bilevel proxy-FEA framework shows that cheap path-based metrics for RL scan-order optimisation mainly capture distortion with weak correlation to full finite-element stress and plasticity labels.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper sets out a diagnostic approach for using reinforcement learning to choose scan orders in laser additive manufacturing, where the order of laser passes affects heat buildup, residual stresses, distortion and part quality. Full finite-element analysis gives accurate labels but is too slow for repeated use inside RL loops, so the work tests whether inexpensive thermo-inspired proxy metrics can stand in for those labels. A two-level setup runs fast proxies at the lower level to screen candidate scan paths and calls sparse Abaqus simulations at the upper level to supply reference values. On a simplified whole-track heating benchmark with ten scan strategies the analysis finds a stress-distortion trade-off rather than one clear quality direction, and it finds that the proxies align mainly with vertical distortion while showing only weak links to the FEA stress and plasticity numbers. The authors therefore conclude that reward functions built from proxies alone are likely to steer RL policies away from the true thermo-mechanical targets.

Core claim

Within the ten evaluated scan strategies the center_out pattern offers a workable balance between final residual Mises stress and U3 vertical distortion, raster_left_to_right and edge_in sit at opposite ends of that trade-off, and the lightweight path-based proxies predominantly track U3 behaviour while correlating only weakly with the sparse FEA reference labels for stress and PEEQ plasticity.

What carries the argument

The bilevel Proxy--FEA diagnostic framework, in which lower-level lightweight scan-path and thermo-inspired proxies generate and screen candidates while the upper level supplies sparse Abaqus FEA reference labels for reward and world-model diagnosis.
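
The data flow between the two levels can be sketched as follows. This is a minimal illustration, not the authors' implementation: `jump_distance_proxy`, `bilevel_diagnose`, the `fea_budget` parameter, the evenly spaced selection rule and the four-track toy orders are all assumptions made here, and `stand_in_fea` is a placeholder for what would in practice be a full Abaqus run.

```python
from typing import Callable, Dict, List, Sequence

ScanOrder = Sequence[int]  # order in which track indices are scanned


def jump_distance_proxy(order: ScanOrder) -> float:
    """Hypothetical cheap proxy: summed index jump between consecutive tracks."""
    return float(sum(abs(b - a) for a, b in zip(order, order[1:])))


def bilevel_diagnose(
    candidates: Dict[str, ScanOrder],
    proxy: Callable[[ScanOrder], float],
    fea_label: Callable[[ScanOrder], float],
    fea_budget: int,
) -> List[dict]:
    """Score all candidates with the proxy, then label only a few with FEA."""
    if fea_budget < 1:
        raise ValueError("fea_budget must be at least 1")
    # Lower level: rank every candidate by the cheap proxy (stable sort).
    ranked = sorted(candidates, key=lambda name: proxy(candidates[name]))
    n = len(ranked)
    # Upper level: spend the sparse FEA budget on evenly spaced ranks so the
    # labelled subset spans the proxy ranking instead of only its top.
    if fea_budget >= n:
        picks = list(range(n))
    elif fea_budget == 1:
        picks = [0]
    else:
        picks = sorted(
            {round(i * (n - 1) / (fea_budget - 1)) for i in range(fea_budget)}
        )
    return [
        {
            "name": ranked[i],
            "proxy": proxy(candidates[ranked[i]]),
            "fea": fea_label(candidates[ranked[i]]),  # expensive call
        }
        for i in picks
    ]


if __name__ == "__main__":
    # Toy four-track orders, invented for illustration only.
    toy = {
        "toy_sequential": [0, 1, 2, 3],
        "toy_alternating": [0, 2, 1, 3],
        "toy_ends_first": [0, 3, 1, 2],
        "toy_middle_first": [1, 2, 0, 3],
    }

    def stand_in_fea(order: ScanOrder) -> float:
        # Placeholder for an FEA label; NOT a physical model.
        return float(order[0])

    for row in bilevel_diagnose(toy, jump_distance_proxy, stand_in_fea, 2):
        print(row)
```

Sampling across the whole proxy ranking, rather than only the best-scored candidates, is one way to obtain paired (proxy, FEA) values that can later be checked for agreement; other selection rules are equally plausible.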

If this is right

  • Proxy-only reward designs carry a risk of misalignment when used for large-scale RL policy optimisation in scan-order tasks.
  • Sparse FEA reference signals provide diagnostic value for refining rewards and world models before full training runs.
  • The observed stress-distortion trade-off implies that a single monotonic quality objective may not exist for these problems.
  • Center_out emerges as a practical compromise candidate among the tested strategies.
  • The framework supports preliminary policy screening at low cost before committing to dense simulation.
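
One way to make the trade-off claim concrete is to compute the set of non-dominated strategies when both residual stress and distortion are to be minimised. The sketch below assumes each strategy is summarised by a pair (stress, distortion magnitude); the function name `pareto_front`, the strategy labels and the numbers are placeholders invented for illustration, not values from the paper.

```python
from typing import Dict, List, Tuple


def pareto_front(metrics: Dict[str, Tuple[float, float]]) -> List[str]:
    """Return strategies not dominated by any other (both metrics minimised)."""
    front = []
    for name, (stress, distortion) in metrics.items():
        dominated = any(
            # Another strategy is no worse on both metrics and better on one.
            (s2 <= stress and d2 <= distortion)
            and (s2 < stress or d2 < distortion)
            for other, (s2, d2) in metrics.items()
            if other != name
        )
        if not dominated:
            front.append(name)
    return front


if __name__ == "__main__":
    # Placeholder normalised values, NOT results from the paper.
    example = {
        "strategy_a": (0.2, 0.9),
        "strategy_b": (0.5, 0.5),
        "strategy_c": (0.9, 0.2),
        "strategy_d": (0.95, 0.95),
    }
    print(pareto_front(example))
```

If one strategy were best on both metrics, the front would contain a single entry. A front with several members, as in the placeholder data, is what a trade-off looks like in this representation.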

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • A hybrid reward that occasionally inserts full FEA checks could reduce the misalignment risk identified in the proxy-only case.
  • Extending the diagnostic to actual RL training loops on varied geometries would test whether the weak correlations persist under policy-driven exploration.
  • The trade-off finding suggests that multi-objective RL formulations may be more suitable than single scalar rewards for this domain.
  • The same bilevel structure could be applied to diagnose other manufacturing simulation proxies beyond scan-order problems.

Load-bearing premise

The ten representative scan strategies tested on the simplified LDED32 stripe benchmark are enough to reveal misalignment patterns that would appear in more complex geometries and processes.

What would settle it

Running the same proxy-FEA correlation analysis on a different part geometry or process and obtaining strong alignment between the cheap metrics and the FEA labels for stress and plasticity.
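
A simple form such a check could take is a pairwise agreement score: the fraction of strategy pairs that a proxy and an FEA label order the same way. The paper's exact agreement measure is not given in this summary, so the definition below, including the choice to skip tied pairs, is an assumption.

```python
from itertools import combinations
from typing import Sequence


def pairwise_agreement(proxy: Sequence[float], fea: Sequence[float]) -> float:
    """Fraction of untied strategy pairs ordered identically by proxy and FEA."""
    if len(proxy) != len(fea):
        raise ValueError("proxy and fea must have the same length")
    agree = 0
    total = 0
    for i, j in combinations(range(len(proxy)), 2):
        dp = proxy[i] - proxy[j]
        df = fea[i] - fea[j]
        if dp == 0 or df == 0:
            continue  # assumption: tied pairs carry no ordering information
        total += 1
        if dp * df > 0:
            agree += 1
    if total == 0:
        raise ValueError("no untied pairs to compare")
    return agree / total
```

A value near 0.5 means the proxy orders pairs no better than chance, and a value near 1.0 means the proxy reproduces the FEA ordering. Strong alignment on a new geometry would show up as scores near 1.0 for the stress and plasticity labels as well as for U3.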

Figures

Figures reproduced from arXiv: 2605.25063 by Bin Wang, Dongbin Zhao, Haoran Li, Ruiyao Zhang, Xian Wu, Yuanqi Chu.

Figure 1: Bilevel Proxy–FEA diagnostic framework for RL-guided scan-order optimisation. The lower …
Figure 2: Simplified LDED32 scan-order benchmark. The build region is divided into 32 discrete scan …
Figure 3: FEA reference ranking and normalised metric decomposition. The figure shows the sparse …
Figure 4: Stress–distortion trade-off in the LDED32 benchmark. Residual Mises stress is plotted against …
Figure 5: Representative final-cooling residual field maps. Mises and U3 maps are shown …
Figure 6: Ranking robustness across metric-weighting schemes. The heatmap summarises how scan …
Figure 7: Proxy–FEA pairwise agreement and limitation summary. The figure compares v1 and v2 …
Original abstract

Reinforcement learning offers a promising approach for scan-order optimisation in laser additive manufacturing, where sequential scan decisions critically influence thermal accumulation, residual stress, distortion, and final part quality. A central challenge in applying RL to this domain lies in reward and world-model fidelity: full finite-element analysis is computationally prohibitive for dense in-the-loop evaluation, while cheap thermo-inspired proxy metrics, though efficient, may capture only partial aspects of the true thermo-mechanical objectives. This paper investigates a bilevel Proxy--FEA diagnostic framework for reward and world-model diagnosis in reinforcement-learning-guided scan-order optimisation. The lower level employs lightweight scan-path and thermo-inspired proxies for rapid candidate generation and preliminary policy-side screening, while the upper level utilises sparse Abaqus FEA simulations to provide simulation-based reference labels. The framework is examined on a simplified whole-track heating LDED32 stripe benchmark comprising ten representative scan strategies. Final-cooling residual Mises stress, U3 vertical distortion, and PEEQ plasticity metrics reveal an observed stress--distortion trade-off rather than a single monotonic quality objective. Within the evaluated set, the center_out strategy emerges as a robust compromise candidate, while raster_left_to_right and edge_in form opposing endpoints of the trade-off. Proxy--FEA alignment analysis shows that current cheap path-based metrics predominantly capture distortion-related (U3) behaviour and exhibit only weak correlation with the sparse FEA reference labels. These findings highlight that proxy-only reward designs risk misalignment in future RL training and underscore the value of sparse FEA reference signals for diagnostic-guided reward and world-model refinement prior to large-scale policy optimisation.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper proposes a bilevel Proxy--FEA diagnostic framework for reward and world-model diagnosis in RL-guided scan-order optimization for laser additive manufacturing. The lower level uses lightweight scan-path and thermo-inspired proxies for rapid screening, while the upper level applies sparse Abaqus FEA simulations as reference labels. Evaluated on a simplified whole-track heating LDED32 stripe benchmark with ten representative scan strategies, the work identifies a stress--distortion trade-off (rather than monotonic quality), positions center_out as a robust compromise, and reports that path-based proxies predominantly capture U3 distortion with only weak correlation to FEA labels, concluding that proxy-only rewards risk misalignment.

Significance. If the misalignment findings hold, the bilevel diagnostic approach offers a practical method for validating RL reward designs in manufacturing domains where full FEA is prohibitive, explicitly highlighting trade-offs and the limitations of cheap proxies. The framework's separation of proxy screening from sparse FEA reference labels is a constructive contribution, though its diagnostic value is currently constrained by the narrow benchmark.

major comments (2)
  1. [Framework examination section] The central claim that 'current cheap path-based metrics predominantly capture distortion-related (U3) behaviour and exhibit only weak correlation with the sparse FEA reference labels' (and that proxy-only rewards therefore risk misalignment) rests on an evaluation of only ten scan strategies. No evidence is provided that this set is representative of the space of path-based proxies, or that the U3 dominance and weak-correlation pattern would persist for more complex geometries or processes.
  2. [Abstract and framework examination] The stress--distortion trade-off is presented as a key finding, in place of 'a single monotonic quality objective', yet the manuscript provides no quantitative characterization (e.g., correlation coefficients between Mises stress and U3 across the ten strategies, or statistical tests) to establish the trade-off's robustness or its generality beyond the LDED32 stripe simplification.
minor comments (2)
  1. [Framework examination section] The ten strategies are described as 'representative' without explicit selection criteria or coverage analysis of the scan-order space.
  2. Notation for the bilevel framework (lower/upper levels) could be formalized with a diagram or pseudocode to clarify data flow between proxy screening and FEA labeling.

Simulated Author's Rebuttal

2 responses · 1 unresolved

We thank the referee for the constructive feedback. We address each major comment below with honest responses, indicating revisions where appropriate to improve clarity and rigor while respecting the scope of the simplified benchmark.

point-by-point responses
  1. Referee: [Framework examination section] The central claim that 'current cheap path-based metrics predominantly capture distortion-related (U3) behaviour and exhibit only weak correlation with the sparse FEA reference labels' (and that proxy-only rewards therefore risk misalignment) rests on an evaluation of only ten scan strategies. No evidence is provided that this set is representative of the space of path-based proxies, or that the U3 dominance and weak-correlation pattern would persist for more complex geometries or processes.

    Authors: The ten strategies were deliberately chosen to span a diverse set of common scan patterns (raster variants, inward/outward spirals, and center_out) on the LDED32 stripe to illustrate the diagnostic framework's utility. We agree that this does not prove representativeness across all possible proxies or that the observed U3 dominance and weak correlations would hold for arbitrary geometries. We will revise the framework examination section to explicitly limit the claim to the evaluated benchmark and to position the bilevel approach as a diagnostic method rather than asserting broad generality. revision: partial

  2. Referee: [Abstract and framework examination] The stress--distortion trade-off is presented as a key finding, in place of 'a single monotonic quality objective', yet the manuscript provides no quantitative characterization (e.g., correlation coefficients between Mises stress and U3 across the ten strategies, or statistical tests) to establish the trade-off's robustness or its generality beyond the LDED32 stripe simplification.

    Authors: We accept this point. The revised manuscript will include quantitative measures such as Pearson and Spearman correlation coefficients between Mises stress and U3 (and PEEQ) across the ten strategies, plus any relevant statistical tests, to substantiate the trade-off observation within the benchmark. revision: yes
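
The two coefficients named in this response can be computed with the standard library alone. The sketch below is a generic implementation of Pearson correlation and of Spearman correlation as Pearson on average ranks; the function names are chosen here for illustration and nothing in it comes from the manuscript.

```python
import math
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product-moment correlation coefficient."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of positions i..j, shifted to 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))
```

Applied to per-strategy Mises stress and U3 magnitude, a clearly negative coefficient would be consistent with a trade-off. With only ten strategies the estimate is noisy, so a coefficient on its own would not establish robustness.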

standing simulated objections not resolved
  • Whether the U3 dominance and weak proxy-FEA correlation pattern would persist for more complex geometries or processes (this would require new experiments outside the current simplified LDED32 benchmark).

Circularity Check

0 steps flagged

No circularity detected in proxy-FEA diagnostic evaluation

full rationale

The paper performs an empirical comparison of lightweight proxy metrics against independent sparse Abaqus FEA reference labels on ten fixed scan strategies within a simplified LDED32 stripe benchmark. No equations, fitted parameters, or self-citations are presented as load-bearing derivations; the observed U3 correlation patterns and stress-distortion trade-off emerge directly from the computed outputs rather than by construction from the inputs. The framework treats FEA as an external diagnostic signal, satisfying the self-contained benchmark criterion.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The analysis rests on a domain assumption about benchmark representativeness and introduces a new diagnostic framework without external validation.

axioms (1)
  • [domain assumption] The simplified LDED32 stripe benchmark with ten scan strategies adequately represents thermo-mechanical behaviors for diagnosing proxy-FEA misalignment in scan-order optimization.
    All reported findings and conclusions derive from evaluation on this single simplified case.
invented entities (1)
  • [no independent evidence] Bilevel Proxy--FEA diagnostic framework
    purpose: To diagnose reward and world-model fidelity for RL-guided scan-order optimization
    Newly proposed structure combining lower-level proxies and upper-level sparse FEA; no independent evidence provided outside the paper.

pith-pipeline@v0.9.1-grok · 5850 in / 1412 out tokens · 42911 ms · 2026-06-30T12:15:24.777907+00:00 · methodology

