Application of Reinforcement Learning for Multigroup Energy Grid Optimization for Neutron Transport Criticality Problems
Pith reviewed 2026-06-29 09:44 UTC · model grok-4.3
The pith
Reinforcement learning with surrogate models identifies energy group structures that outperform standard choices in neutron transport criticality calculations.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
A modified PPO reinforcement learning agent can select which energy bounds to retain when coarsening a high-fidelity grid for multigroup neutron transport, guided by surrogate-model evaluations that embed energy, material, and spatial features, thereby producing group structures that deliver higher accuracy than standard libraries on the Godiva and BeRP spherical criticality problems.
What carries the argument
A modified proximal policy optimization (PPO) algorithm that treats energy grid coarsening as an action sequence, rewarding k-effective accuracy and group economy, with neural-network surrogates supplying fast fitness estimates that incorporate energy, material, and spatial data.
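The bound-removal loop described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation. The reward form, the weight `w_groups`, the function names, and the toy surrogate are all assumptions made for the example; a trained PPO policy would take the place of `choose_bound`.

```python
from typing import Callable, List, Sequence, Tuple


def reward(k_pred: float, k_ref: float, n_groups: int, n_fine: int,
           w_groups: float = 0.1) -> float:
    """Hypothetical reward: penalize k-eff error and the fraction of groups kept."""
    accuracy_term = -abs(k_pred - k_ref)
    economy_term = -w_groups * n_groups / n_fine
    return accuracy_term + economy_term


def run_episode(fine_bounds: Sequence[float], target_groups: int,
                choose_bound: Callable[[List[float]], float],
                surrogate: Callable[[List[float]], float],
                k_ref: float) -> Tuple[List[float], float]:
    """Remove interior energy bounds one at a time until target_groups remain."""
    if target_groups < 1:
        raise ValueError("target_groups must be at least 1")
    kept = list(fine_bounds)
    n_fine = len(kept) - 1          # number of groups in the fine grid
    total = 0.0
    while len(kept) - 1 > target_groups:
        removable = kept[1:-1]      # the outermost bounds are never removed
        bound = choose_bound(removable)
        kept.remove(bound)
        # The surrogate stands in for a full transport solve.
        total += reward(surrogate(kept), k_ref, len(kept) - 1, n_fine)
    return kept, total


# Toy stand-ins (assumptions): a surrogate whose k-eff drifts as groups are
# removed, and a "policy" that always removes the lowest interior bound.
def toy_surrogate(bounds: List[float]) -> float:
    return 1.0 - 0.001 * (5 - (len(bounds) - 1))


def lowest_interior(removable: List[float]) -> float:
    return removable[0]


fine = [1e-5, 1e-3, 1e-1, 1.0, 10.0, 20.0]   # placeholder bounds, 5 groups
kept, total = run_episode(fine, 2, lowest_interior, toy_surrogate, k_ref=1.0)
```

Because every action removes one bound from the current grid, the episode length is fixed by the gap between the fine-grid and target group counts, which keeps the action sequence finite without extra termination logic.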
If this is right
- Multigroup neutron transport calculations for spherical criticality problems can reach target accuracy with fewer energy groups than fixed library structures.
- Optimization of group structures can proceed without repeated full transport runs once the surrogate models are trained.
- The RL procedure supplies an alternative route to group-structure selection that avoids the locality constraints of hierarchical agglomeration.
- The same reward formulation can be reused across different one-dimensional spherical problems by retraining only the surrogate and policy on new material data.
Where Pith is reading between the lines
- If the surrogate models generalize across geometries, the method could be applied to two- or three-dimensional transport problems without redesigning the RL loop.
- Incorporating explicit computational-cost terms into the reward could produce group structures that trade accuracy against runtime in a single optimization pass.
- The bound-removal formulation might transfer to other particle-transport contexts such as photon or charged-particle multigroup problems.
Load-bearing premise
Neural network surrogate models that incorporate energy, material, and spatial information accurately evaluate candidate energy grid structures without requiring full transport simulations.
What would settle it
Running a full transport simulation on the Godiva problem with the RL-optimized group structure and obtaining a larger k-effective error than the error from a standard 16- or 27-group structure would falsify the outperformance claim.
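The test above amounts to a comparison of absolute k-effective errors. A minimal sketch follows, assuming the claim counts as falsified when the RL structure's error exceeds that of any listed standard structure. The function names are hypothetical and the k-effective values are placeholders for illustration, not results from the paper.

```python
from typing import Dict


def pcm_error(k: float, k_ref: float) -> float:
    """Absolute k-eff error in pcm (1 pcm = 1e-5)."""
    return abs(k - k_ref) * 1e5


def outperformance_falsified(k_ref: float, k_rl: float,
                             k_standard: Dict[str, float]) -> bool:
    """True if the RL grid is less accurate than any standard structure."""
    rl_err = pcm_error(k_rl, k_ref)
    return any(rl_err > pcm_error(k, k_ref) for k in k_standard.values())


# Placeholder full-transport results (assumed values, not from the paper).
standards = {"16-group": 1.0012, "27-group": 0.9992}
claim_survives = not outperformance_falsified(1.0, 1.0005, standards)
claim_fails = outperformance_falsified(1.0, 1.0010, standards)
```

The inputs should come from full transport runs rather than the surrogate, since the point of the check is to rule out surrogate bias.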
Original abstract
The optimization of energy group structures is integral to ensuring the accuracy of multigroup neutron transport calculations. This work introduces the use of reinforcement learning (RL) with surrogate modeling to optimize the group structure for one-dimensional spherical k-criticality problems. The proximal policy optimization (PPO) RL algorithm is modified for use with energy grid structures, rewarding accurate group structures while favoring fewer energy groups. The method starts from a high-fidelity energy grid and removes energy bounds until it reaches a target energy structure. The RL agent identifies which bounds are important for the final group structure, which prevents it from becoming stuck in local minima without limiting the initial group structure. Neural network surrogate models that incorporate energy, material, and spatial information are used to evaluate energy grid structures without requiring full transport simulations. This alleviates the computational constraint commonly encountered in other group structure optimization problems and accelerates the RL training process. Applied to the Godiva and BeRP ball problems, the RL-constructed group structures outperform commonly used group structures. The RL group structure optimization method is also shown to perform similarly to the hierarchical agglomeration approach while offering more flexibility.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces a reinforcement learning method based on proximal policy optimization (PPO) to optimize multigroup energy structures for 1D spherical k-criticality neutron transport problems. Starting from a fine grid, the agent removes energy bounds while a neural-network surrogate (incorporating energy, material, and spatial features) supplies the reward signal to avoid repeated full transport solves. The reward balances k-eff accuracy against the number of groups. On the Godiva and BeRP ball problems the resulting RL grids are stated to outperform standard group structures and to perform comparably to hierarchical agglomeration while offering greater flexibility.
Significance. If the surrogate predictions are shown to be sufficiently accurate relative to the reported performance deltas and if the final RL grids are independently verified with full transport calculations, the work would supply a flexible, surrogate-accelerated alternative to existing group-structure optimization techniques. The absence of quantitative metrics, error bars, or surrogate-validation tables in the abstract, however, leaves the magnitude and robustness of the claimed improvements unassessable from the provided material.
major comments (2)
- [Results] Results section (and any associated tables/figures): the headline claim that RL grids outperform common structures on Godiva and BeRP requires that surrogate k-eff errors be demonstrably smaller than the performance gains versus baselines and versus hierarchical agglomeration. No such surrogate-validation data (e.g., mean absolute error on held-out grids, comparison of surrogate vs. full-transport k-eff for the final RL structures) is referenced in the abstract; without it the optimization could be guided by surrogate bias rather than true physics.
- [Methods] Methods (surrogate model description): the neural-network surrogate is asserted to incorporate energy, material, and spatial information and to replace full transport solves during training. The manuscript must quantify the surrogate's predictive accuracy on the specific criticality metric used in the reward (k-eff or equivalent) and show that this accuracy margin exceeds the optimization deltas; otherwise the flexibility advantage over agglomeration cannot be substantiated.
minor comments (2)
- [Abstract] Abstract: quantitative metrics, error bars, and explicit comparison tables are absent; adding at least the final k-eff values and group counts for RL, standard, and agglomeration structures would allow immediate assessment of the claimed outperformance.
- [Methods] The reward weighting between accuracy and group count is listed as a free parameter; its specific values and sensitivity should be reported so that the balance achieved by the reported grids can be reproduced.
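The sensitivity question in the last comment can be illustrated with a small sweep over the reward weight. This is a sketch under stated assumptions: the linear score, the weight values, and the candidate structures (group count, k-eff error in pcm) are invented for illustration and do not come from the manuscript.

```python
from typing import Dict, Tuple

# Placeholder candidates: name -> (number of groups, |k-eff error| in pcm).
candidates: Dict[str, Tuple[int, float]] = {
    "coarse": (8, 300.0),
    "medium": (16, 120.0),
    "fine": (32, 40.0),
}


def score(n_groups: int, err_pcm: float, w: float) -> float:
    """Hypothetical reward: accuracy penalty plus w times the group count."""
    return -err_pcm - w * n_groups


def best_structure(w: float) -> str:
    """Return the candidate with the highest score for weight w."""
    return max(candidates, key=lambda name: score(*candidates[name], w))


# Sweep the weight to see where the preferred structure changes.
sweep = {w: best_structure(w) for w in (0.0, 2.0, 10.0, 50.0)}
# sweep == {0.0: "fine", 2.0: "fine", 10.0: "medium", 50.0: "coarse"}
```

In this toy setting the preferred structure shifts from fine to coarse as the weight grows, which is why reporting the weight and its sensitivity matters for reproducing a given grid.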
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on our manuscript. The comments highlight the need for explicit surrogate validation to support the claimed performance gains. We address each major comment below and have revised the manuscript accordingly by adding the requested quantitative metrics and comparisons.
Point-by-point responses
- Referee: [Results] Results section (and any associated tables/figures): the headline claim that RL grids outperform common structures on Godiva and BeRP requires that surrogate k-eff errors be demonstrably smaller than the performance gains versus baselines and versus hierarchical agglomeration. No such surrogate-validation data (e.g., mean absolute error on held-out grids, comparison of surrogate vs. full-transport k-eff for the final RL structures) is referenced in the abstract; without it the optimization could be guided by surrogate bias rather than true physics.
  Authors: We agree that explicit demonstration of surrogate accuracy relative to the reported deltas is essential to substantiate the results. In the revised manuscript we have added a dedicated surrogate-validation subsection (with new Table X and Figure Y) reporting mean absolute error on held-out grids, direct surrogate-versus-full-transport k-eff comparisons for the final RL structures on both Godiva and BeRP, and error bars across multiple random seeds. These data confirm that surrogate errors remain smaller than the observed performance gains versus both standard group structures and hierarchical agglomeration.
  revision: yes
- Referee: [Methods] Methods (surrogate model description): the neural-network surrogate is asserted to incorporate energy, material, and spatial information and to replace full transport solves during training. The manuscript must quantify the surrogate's predictive accuracy on the specific criticality metric used in the reward (k-eff or equivalent) and show that this accuracy margin exceeds the optimization deltas; otherwise the flexibility advantage over agglomeration cannot be substantiated.
  Authors: We have expanded the surrogate-model description in the Methods section to include quantitative accuracy metrics (MAE, max error, and R²) specifically for the k-eff reward signal, evaluated on an independent test set of energy grids. The revised text also directly compares these accuracy margins to the optimization deltas versus baselines and agglomeration, demonstrating that the surrogate error is sufficiently small to support the flexibility claims. These additions are cross-referenced to the new validation results in the Results section.
  revision: yes
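The three accuracy metrics named in the response (MAE, max error, and R²) can be computed from paired surrogate and full-transport k-eff values as sketched below. The function names and sample values are assumptions for illustration, and the margin check is one simple reading of "surrogate error smaller than the observed gain", not the authors' stated criterion.

```python
from typing import Dict, Sequence


def validation_metrics(k_true: Sequence[float],
                       k_pred: Sequence[float]) -> Dict[str, float]:
    """MAE, max absolute error, and R^2 of surrogate k-eff predictions."""
    n = len(k_true)
    residuals = [p - t for t, p in zip(k_true, k_pred)]
    mean_true = sum(k_true) / n
    ss_res = sum(r * r for r in residuals)                 # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in k_true)     # total sum of squares
    return {
        "mae": sum(abs(r) for r in residuals) / n,
        "max_error": max(abs(r) for r in residuals),
        "r2": 1.0 - ss_res / ss_tot,
    }


def margin_supports_claim(surrogate_error: float, gain: float) -> bool:
    """True if the surrogate error is smaller than the claimed k-eff gain."""
    return surrogate_error < gain


# Placeholder held-out data (assumed values, not from the paper).
k_full_transport = [0.99, 1.00, 1.01]
k_surrogate = [0.991, 1.000, 1.008]
metrics = validation_metrics(k_full_transport, k_surrogate)
supported = margin_supports_claim(metrics["max_error"], gain=0.005)
```

Using the max error rather than the MAE in the margin check is the more conservative choice, since a single badly predicted grid can mislead the optimizer.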
Circularity Check
No significant circularity; method is standard RL + surrogate optimization
Full rationale
The paper applies proximal policy optimization (PPO) to remove energy bounds from an initial fine grid, using a neural surrogate (incorporating energy/material/spatial features) to supply the reward signal for k-criticality accuracy while penalizing group count. No equations, definitions, or performance claims reduce by construction to fitted parameters or self-referential inputs; the surrogate is trained separately and the final grids are evaluated on Godiva/BeRP problems against external baselines (common structures and hierarchical agglomeration). No self-citations appear as load-bearing premises, no uniqueness theorems are imported, and no ansatz or renaming is smuggled in. The derivation chain is self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
free parameters (1)
- reward weighting between accuracy and group count
axioms (1)
- domain assumption: surrogate neural network predictions are sufficiently accurate to guide RL optimization toward globally better structures