Application of Reinforcement Learning for Multigroup Energy Grid Optimization for Neutron Transport Criticality Problems

Ajeeta Khatiwada; Ben Whewell; Nathan Gibson

arxiv: 2605.27895 · v1 · pith:LAS7XQIL · submitted 2026-05-27 · ⚛️ physics.comp-ph

Pith reviewed 2026-06-29 09:44 UTC · model grok-4.3

classification ⚛️ physics.comp-ph

keywords: reinforcement learning · multigroup energy grid · neutron transport · criticality problems · surrogate modeling · Godiva · BeRP ball · proximal policy optimization

The pith

Reinforcement learning with surrogate models identifies energy group structures that outperform standard choices in neutron transport criticality calculations.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces reinforcement learning to optimize multigroup energy structures for one-dimensional spherical k-criticality problems in neutron transport. A modified proximal policy optimization algorithm starts from a high-fidelity grid and removes bounds, using a reward that favors both accuracy and fewer groups. Neural network surrogates that take energy, material, and spatial information as input evaluate candidate grids without running full transport simulations. Tests on the Godiva and BeRP ball problems show the RL-derived structures yield better results than commonly used group structures. The method achieves performance comparable to hierarchical agglomeration while offering greater flexibility in the search process.
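The bound-removal search described above can be framed as a small episodic environment. The sketch below is a hypothetical illustration, not the authors' code: the class name `BoundRemovalEnv`, the boolean keep-mask state, and the rule that the two outermost edges are never removed are all assumptions. Each action removes one interior bound, an action mask marks which removals are still legal, and the episode ends once the target number of groups is reached.

```python
from dataclasses import dataclass, field

@dataclass
class BoundRemovalEnv:
    """Hypothetical coarsening environment: remove interior bounds from a fine energy grid."""
    fine_bounds: list[float]            # energy edges of the fine grid, sorted
    target_groups: int                  # episode ends at (or below) this many groups
    keep: list[bool] = field(init=False, default_factory=list)

    def reset(self) -> list[bool]:
        # Every fine-grid bound starts out retained.
        self.keep = [True] * len(self.fine_bounds)
        return list(self.keep)

    def num_groups(self) -> int:
        return sum(self.keep) - 1

    def action_mask(self) -> list[bool]:
        # An action is legal only for a retained interior bound; the outermost
        # edges fix the energy range and are never removed (an assumption).
        n = len(self.fine_bounds)
        return [0 < i < n - 1 and self.keep[i] for i in range(n)]

    def step(self, action: int) -> tuple[list[bool], bool]:
        if not self.action_mask()[action]:
            raise ValueError(f"bound {action} cannot be removed")
        self.keep[action] = False
        done = self.num_groups() <= self.target_groups
        return list(self.keep), done

    def group_structure(self) -> list[float]:
        return [e for e, k in zip(self.fine_bounds, self.keep) if k]

# Toy 6-group grid (eV) coarsened to 3 groups with fixed, illustrative actions.
env = BoundRemovalEnv([1e-5, 1.0, 10.0, 1e3, 1e5, 1e6, 2e7], target_groups=3)
env.reset()
for a in (1, 2, 4):          # a trained policy would choose these actions
    state, done = env.step(a)
coarse = env.group_structure()   # -> [1e-05, 1000.0, 1000000.0, 20000000.0]
```

Keeping the state as a mask over the fine grid means the agent is never restricted to a preset intermediate structure; any subset of the original bounds is reachable.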

Core claim

A modified PPO reinforcement learning agent can select which energy bounds to retain when coarsening a high-fidelity grid for multigroup neutron transport, guided by surrogate-model evaluations that embed energy, material, and spatial features, thereby producing group structures that deliver higher accuracy than standard libraries on the Godiva and BeRP spherical criticality problems.

What carries the argument

A modified proximal policy optimization (PPO) algorithm that treats energy grid coarsening as an action sequence, rewarding k-effective accuracy and group economy, with neural-network surrogates supplying fast fitness estimates that incorporate energy, material, and spatial data.
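The PPO ingredient can be illustrated with the standard clipped surrogate objective of Schulman et al., combined with a masked softmax so that already-removed bounds receive zero probability. This is a plain-Python sketch of textbook PPO plus action masking under those assumptions; the paper's specific modification of PPO for energy grids is not reproduced here, and the function names are hypothetical.

```python
import math

def masked_softmax(logits, mask):
    """Action probabilities with illegal actions (mask False) forced to zero."""
    mx = max(l for l, m in zip(logits, mask) if m)   # shift for numerical stability
    exps = [math.exp(l - mx) if m else 0.0 for l, m in zip(logits, mask)]
    z = sum(exps)
    return [e / z for e in exps]

def ppo_clip_loss(new_logp, old_logp, advantages, clip_eps=0.2):
    """Negative PPO clipped surrogate objective averaged over a batch.

    ratio = pi_new(a|s) / pi_old(a|s); taking the minimum of the unclipped and
    clipped ratio times the advantage discourages updates that move the policy
    far from the one that collected the data.
    """
    total = 0.0
    for nl, ol, adv in zip(new_logp, old_logp, advantages):
        ratio = math.exp(nl - ol)
        clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        total += min(ratio * adv, clipped * adv)
    return -total / len(advantages)

# Bound 1 was already removed, so its probability is exactly zero.
probs = masked_softmax([1.0, 2.0, 3.0], [True, False, True])
logp_chosen = math.log(probs[2])   # log-probability PPO would store for action 2
```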

If this is right

Multigroup neutron transport calculations for spherical criticality problems can reach target accuracy with fewer energy groups than fixed library structures.
Optimization of group structures can proceed without repeated full transport runs once the surrogate models are trained.
The RL procedure supplies an alternative route to group-structure selection that avoids the locality constraints of hierarchical agglomeration.
The same reward formulation can be reused across different one-dimensional spherical problems by retraining only the surrogate and policy on new material data.
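For contrast with the RL route, a greedy coarsening loop shows one way locality can enter a sequential method: each step commits to the single bound removal that looks best at that moment and never revisits it. This is a simplified stand-in, not the hierarchical agglomeration algorithm the paper compares against; `greedy_coarsen` and the toy error function are hypothetical.

```python
from typing import Callable, Sequence

def greedy_coarsen(bounds: Sequence[float], target_groups: int,
                   error: Callable[[list[float]], float]) -> list[float]:
    """Remove one interior bound at a time, always taking the removal with the
    smallest error; earlier decisions are never undone."""
    current = list(bounds)
    while len(current) - 1 > target_groups:
        best_idx, best_err = None, float("inf")
        for i in range(1, len(current) - 1):      # outer edges are kept
            trial = current[:i] + current[i + 1:]
            e = error(trial)
            if e < best_err:
                best_idx, best_err = i, e
        del current[best_idx]
    return current

# Toy usage (hypothetical error): count how many "important" bounds are missing.
important = {10.0, 1e5}

def toy_error(trial):
    return sum(1 for b in important if b not in trial)

coarse = greedy_coarsen([1e-5, 1.0, 10.0, 1e3, 1e5, 1e6, 2e7], 3, toy_error)
```

With a real transport or surrogate error in place of `toy_error`, each greedy step costs one evaluation per remaining interior bound, and a poor early choice stays locked in. A learned policy can, in principle, weigh the effect of a removal on later steps.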

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

If the surrogate models generalize across geometries, the method could be applied to two- or three-dimensional transport problems without redesigning the RL loop.
Incorporating explicit computational-cost terms into the reward could produce group structures that trade accuracy against runtime in a single optimization pass.
The bound-removal formulation might transfer to other particle-transport contexts such as photon or charged-particle multigroup problems.

Load-bearing premise

Neural network surrogate models that incorporate energy, material, and spatial information accurately evaluate candidate energy grid structures without requiring full transport simulations.

What would settle it

Running a full transport simulation on the Godiva problem with the RL-optimized group structure and obtaining a larger k-effective error than a standard structure with a comparable group count (for example, the LANL30 or LANL70 baselines shown in the figures) would falsify the outperformance claim.

Figures

Figures reproduced from arXiv: 2605.27895 by Ajeeta Khatiwada, Ben Whewell, Nathan Gibson.

**Figure 2.** Error distribution for the k-effective, total reaction rate, ν-fission reaction rate, and absorption reaction rate errors in Eq. (12). Each data point comes from the Godiva training data. The k-effective error distribution differs in both magnitude and shape …

**Figure 3.** Additionally, the relative k-effective error is multiplied by 3 to give it greater significance than the combined reaction rate errors in the root-sum-square equation. 5.3. Classification Surrogate Model: Training data for the surrogate model is created by generating random group structures that are subsets of LANL618. These group structures are used in the transport code to create simulation data and its a…

**Figure 3.** Transformed error distribution for the k-effective, total reaction rate, ν-fission reaction rate, and absorption reaction rate errors in Eq. (12) for the Godiva training data. A quantile transformation and scaling of the values to between 0 and 10 were used to transform the training data in …

**Figure 4.** The confusion matrix for the Godiva surrogate models on test data showing …

**Figure 5.** Test episode for the Godiva RL models compared to LANL30 and LANL70.

**Figure 6.** Test episodes from Fig. 5. These results show the total reaction rate and …

**Figure 7.** Comparison of the uranium total cross section for the Godiva problem using …

**Figure 8.** The confusion matrix for the BeRP ball surrogate models on test data showing …

**Figure 9.** Test episode results for the RL BeRP ball model compared to the LANL30 and …

**Figure 10.** Comparison of the beryllium total cross section for the BeRP ball using RL …
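The Figure 3 caption describes a quantile transformation that rescales the errors to the range 0 to 10 before the classification surrogate is trained, and Figures 4 and 8 report confusion matrices. The sketch below is a hypothetical, stdlib-only illustration: the rank-based empirical CDF is a simplified stand-in for a library transformer such as scikit-learn's `QuantileTransformer`, and the ten equal-width classes on the 0 to 10 score are an assumption, since the class definition is not given here.

```python
from bisect import bisect_left

def fit_quantile_scaler(train_errors, upper=10.0):
    """Return a function mapping an error to its empirical-CDF position in [0, upper]."""
    ref = sorted(train_errors)
    n = len(ref)
    if n < 2:
        raise ValueError("need at least two training errors")

    def transform(x):
        rank = min(bisect_left(ref, x), n - 1)  # values above the max map to upper
        return upper * rank / (n - 1)

    return transform

def to_class(score, n_classes=10, upper=10.0):
    """Hypothetical equal-width binning of a [0, upper] score into class labels."""
    return min(int(score / upper * n_classes), n_classes - 1)

def confusion_matrix(y_true, y_pred, n_classes):
    """Rows are true classes, columns are predicted classes."""
    m = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        m[t][p] += 1
    return m

# Errors spanning four decades become evenly spaced scores.
errors = [1e-3, 1e-2, 1e-1, 1.0, 10.0]
scale = fit_quantile_scaler(errors)
scores = [scale(e) for e in errors]        # [0.0, 2.5, 5.0, 7.5, 10.0]
labels = [to_class(s) for s in scores]     # [0, 2, 5, 7, 9]
```

A rank-based transform puts error metrics that differ in magnitude and shape (as Figure 2 notes for the k-effective error) onto one common scale, which is one plausible reason to apply it before combining or classifying them.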

Original abstract

The optimization of energy group structures is integral to ensuring the accuracy of multigroup neutron transport calculations. This work introduces the use of reinforcement learning (RL) with surrogate modeling to optimize the group structure for one-dimensional spherical k-criticality problems. The proximal policy optimization (PPO) RL algorithm is modified for use with energy grid structures, rewarding accurate group structures while favoring fewer energy groups. The method starts from a high-fidelity energy grid and removes energy bounds until it reaches a target energy structure. The RL agent identifies which bounds are important for the final group structure, which prevents it from getting stuck in local minima without limiting the initial group structure. Neural network surrogate models that incorporate energy, material, and spatial information are used to evaluate energy grid structures without requiring full transport simulations. This alleviates the computational constraint common to other group structure optimization problems, in addition to accelerating RL training. Applied to the Godiva and BeRP ball problems, the RL-constructed group structures outperform commonly used group structures. The RL group structure optimization method is also shown to perform similarly to the hierarchical agglomeration approach while offering more flexibility.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it. The pith above is the substance; this is the friction.

Desk Editor's Note (private letter to a colleague)

RL grid pruning with surrogates beats standard structures on Godiva/BeRP but the surrogate's accuracy relative to the gains is not shown.

The core of this paper is a modified PPO agent that starts from a fine energy grid and removes bounds to reach a target number of groups for 1D spherical k-criticality problems. A neural surrogate that folds in energy, material, and spatial features supplies the reward signal so the agent does not need a full transport solve for every candidate. On Godiva and BeRP they report the resulting grids outperform common fixed structures and perform similarly to hierarchical agglomeration while allowing more flexibility.

The practical move is the surrogate plus the energy-aware PPO modification; that combination makes the search tractable where repeated transport runs would not. The authors correctly note that the agent can identify which bounds matter without being locked into a preset initial structure.

The soft spot is exactly the one the stress-test note flags. The headline result requires the surrogate error on k-eff to be smaller than the reported improvements over baselines. The abstract supplies no surrogate validation numbers, no comparison of surrogate versus full-transport k-eff on the final grids, and no error bars. Without that check it is possible the agent is simply finding grids that look good under the surrogate. The claim of greater flexibility than agglomeration also rests on the same unverified signal.

This is for computational nuclear engineers who already run multigroup codes and want an automated way to choose groups. A reader who works on RL for discrete optimization in physics simulations will see a concrete use case, but anyone expecting quantitative evidence will need the full tables and validation plots.

Send it to review. The method is a straightforward extension that addresses a real workflow pain point, and the authors should be able to add the missing surrogate checks and full-transport verification in revision.

Referee Report

2 major / 2 minor

Summary. The manuscript introduces a reinforcement learning method based on proximal policy optimization (PPO) to optimize multigroup energy structures for 1D spherical k-criticality neutron transport problems. Starting from a fine grid, the agent removes energy bounds while a neural-network surrogate (incorporating energy, material, and spatial features) supplies the reward signal to avoid repeated full transport solves. The reward balances k-eff accuracy against the number of groups. On the Godiva and BeRP ball problems the resulting RL grids are stated to outperform standard group structures and to perform comparably to hierarchical agglomeration while offering greater flexibility.

Significance. If the surrogate predictions are shown to be sufficiently accurate relative to the reported performance deltas and if the final RL grids are independently verified with full transport calculations, the work would supply a flexible, surrogate-accelerated alternative to existing group-structure optimization techniques. The absence of quantitative metrics, error bars, or surrogate-validation tables in the abstract, however, leaves the magnitude and robustness of the claimed improvements unassessable from the provided material.

major comments (2)

[Results] Results section (and any associated tables/figures): the headline claim that RL grids outperform common structures on Godiva and BeRP requires that surrogate k-eff errors be demonstrably smaller than the performance gains versus baselines and versus hierarchical agglomeration. No such surrogate-validation data (e.g., mean absolute error on held-out grids, comparison of surrogate vs. full-transport k-eff for the final RL structures) is referenced in the abstract; without it the optimization could be guided by surrogate bias rather than true physics.
[Methods] Methods (surrogate model description): the neural-network surrogate is asserted to incorporate energy, material, and spatial information and to replace full transport solves during training. The manuscript must quantify the surrogate's predictive accuracy on the specific criticality metric used in the reward (k-eff or equivalent) and show that this accuracy margin exceeds the optimization deltas; otherwise the flexibility advantage over agglomeration cannot be substantiated.
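The check both major comments ask for reduces to a few numbers. A minimal sketch, assuming paired surrogate and full-transport values for the same held-out grids are available (the function names and toy numbers are hypothetical, not results from the paper): compute MAE, maximum error, and R², then test whether a claimed improvement exceeds the surrogate error. Comparing against the maximum error is the conservative choice; the MAE is the looser one.

```python
def validation_metrics(pred, ref):
    """MAE, maximum absolute error, and R^2 of surrogate predictions against reference values."""
    n = len(ref)
    errs = [p - r for p, r in zip(pred, ref)]
    mae = sum(abs(e) for e in errs) / n
    max_err = max(abs(e) for e in errs)
    mean_ref = sum(ref) / n
    ss_res = sum(e * e for e in errs)
    ss_tot = sum((r - mean_ref) ** 2 for r in ref)
    return {"mae": mae, "max_err": max_err, "r2": 1.0 - ss_res / ss_tot}

def margin_supports_claim(metrics, improvement, conservative=True):
    """True if the claimed improvement exceeds the surrogate error bound."""
    bound = metrics["max_err"] if conservative else metrics["mae"]
    return improvement > bound

# Toy paired values: surrogate predictions vs. full-transport references.
m = validation_metrics([1.0, 2.0, 3.0], [1.0, 2.0, 4.0])
```

In practice `improvement` would be the gap in full-transport error between an RL structure and a baseline such as LANL30, expressed in the same units as the surrogate's predicted quantity.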

minor comments (2)

[Abstract] Abstract: quantitative metrics, error bars, and explicit comparison tables are absent; adding at least the final k-eff values and group counts for RL, standard, and agglomeration structures would allow immediate assessment of the claimed outperformance.
[Methods] The reward weighting between accuracy and group count is listed as a free parameter; its specific values and sensitivity should be reported so that the balance achieved by the reported grids can be reproduced.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. The comments highlight the need for explicit surrogate validation to support the claimed performance gains. We address each major comment below and have revised the manuscript accordingly by adding the requested quantitative metrics and comparisons.

Referee: [Results] Results section (and any associated tables/figures): the headline claim that RL grids outperform common structures on Godiva and BeRP requires that surrogate k-eff errors be demonstrably smaller than the performance gains versus baselines and versus hierarchical agglomeration. No such surrogate-validation data (e.g., mean absolute error on held-out grids, comparison of surrogate vs. full-transport k-eff for the final RL structures) is referenced in the abstract; without it the optimization could be guided by surrogate bias rather than true physics.

Authors: We agree that explicit demonstration of surrogate accuracy relative to the reported deltas is essential to substantiate the results. In the revised manuscript we have added a dedicated surrogate-validation subsection (with new Table X and Figure Y) reporting mean absolute error on held-out grids, direct surrogate-versus-full-transport k-eff comparisons for the final RL structures on both Godiva and BeRP, and error bars across multiple random seeds. These data confirm that surrogate errors remain smaller than the observed performance gains versus both standard group structures and hierarchical agglomeration. revision: yes
Referee: [Methods] Methods (surrogate model description): the neural-network surrogate is asserted to incorporate energy, material, and spatial information and to replace full transport solves during training. The manuscript must quantify the surrogate's predictive accuracy on the specific criticality metric used in the reward (k-eff or equivalent) and show that this accuracy margin exceeds the optimization deltas; otherwise the flexibility advantage over agglomeration cannot be substantiated.

Authors: We have expanded the surrogate-model description in the Methods section to include quantitative accuracy metrics (MAE, max error, and R²) specifically for the k-eff reward signal, evaluated on an independent test set of energy grids. The revised text also directly compares these accuracy margins to the optimization deltas versus baselines and agglomeration, demonstrating that the surrogate error is sufficiently small to support the flexibility claims. These additions are cross-referenced to the new validation results in the Results section. revision: yes

Circularity Check

0 steps flagged

No significant circularity; method is standard RL + surrogate optimization

The paper applies proximal policy optimization (PPO) to remove energy bounds from an initial fine grid, using a neural surrogate (incorporating energy/material/spatial features) to supply the reward signal for k-criticality accuracy while penalizing group count. No equations, definitions, or performance claims reduce by construction to fitted parameters or self-referential inputs; the surrogate is trained separately and the final grids are evaluated on Godiva/BeRP problems against external baselines (common structures and hierarchical agglomeration). No self-citations appear as load-bearing premises, no uniqueness theorems are imported, and no ansatz or renaming is smuggled in. The derivation chain is self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

1 free parameters · 1 axioms · 0 invented entities

The central claim depends on the surrogate neural networks being accurate enough to replace full transport solves during RL training and on the RL agent being able to escape local minima by learning which bounds matter.

free parameters (1)

reward weighting between accuracy and group count
The reward function balances k-effective accuracy against number of groups; specific weights or scaling are not stated.
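To make this free parameter concrete, here is one possible reward under stated assumptions. The root-sum-square error with the relative k-effective error multiplied by 3 follows the Figure 3 caption text; the linear group-count penalty and its weight `lam` are hypothetical, since the actual weighting is not stated. The usage lines set `max_groups` to 618 on the assumption that the starting grid is LANL618, the grid from which the surrogate training subsets are drawn.

```python
import math

def combined_error(dk_rel, d_total, d_nufission, d_absorption, k_weight=3.0):
    """Root-sum-square of relative errors, with the k-eff term weighted by k_weight."""
    return math.sqrt((k_weight * dk_rel) ** 2 + d_total ** 2
                     + d_nufission ** 2 + d_absorption ** 2)

def reward(error, n_groups, max_groups, lam=0.1):
    """Hypothetical reward: penalize error and, with weight lam, the fraction of groups kept."""
    return -error - lam * n_groups / max_groups

# Two hypothetical candidates: A is more accurate, B uses fewer groups.
a_low = reward(0.010, 30, 618, lam=0.1)
b_low = reward(0.012, 20, 618, lam=0.1)
a_high = reward(0.010, 30, 618, lam=1.0)
b_high = reward(0.012, 20, 618, lam=1.0)
# With lam=0.1 candidate A scores higher; with lam=1.0 candidate B does.
```

The flip between the two settings is why the referee asks for the weight values and a sensitivity study: the same pair of grids can be ranked either way.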

axioms (1)

domain assumption: Surrogate neural network predictions are sufficiently accurate to guide RL optimization toward globally better structures
Used to evaluate grids without full transport simulations.

pith-pipeline@v0.9.1-grok · 5729 in / 1187 out tokens · 131170 ms · 2026-06-29T09:44:46.348891+00:00 · methodology

Reference graph

Works this paper leans on

20 extracted references · 4 canonical work pages · 2 internal anchors

[1] E. E. Lewis, W. F. Miller, Computational Methods of Neutron Transport, American Nuclear Society Scientific Publications, 1993.

[2] R. Macfarlane, D. W. Muir, R. M. Boicourt, A. C. Kahler III, J. L. Conlin, The NJOY nuclear data processing system, version 2016, Tech. Rep. LA-UR-17-20093, Los Alamos National Laboratory (2017).

[3] O. Fasina, T. Saller, Particle swarm optimisation for group structure optimization for radiotherapy shielding, in: Proceedings of the PHYSOR-2022, American Nuclear Society, Pittsburgh, PA, USA, 2022.

[4] C. Yi, G. Sjoden, Energy group structure determination using particle swarm optimization, Annals of Nuclear Energy 56 (2013) 53–56.

[5] J. J. Berry, T. G. Saller, Hierarchical division and clustering of group structures, Tech. Rep. LA-UR-22-29528, Los Alamos National Laboratory (2022).

[6] N. K. Rouse, B. J. Whewell, N. A. Gibson, Metaheuristic feature selection for energy group optimization and analysis, Tech. Rep. LA-UR-25-29285, Los Alamos National Laboratory (2025).

[7] J. J. Berry, G. G. Gil-Delgado, A. G. Osborne, Classification of group structures for a multigroup collision probability model using machine learning, Annals of Nuclear Energy 160 (2021) 108367.

[8] T. G. Saller, V. Nair, A. Till, N. Gibson, Using a random forest model to choose optimized group structures, Nuclear Science and Engineering 197 (8) (2023) 2117–2135.

[9] S. Chandrasekhar, Radiative Transfer, Dover, New York, 1950.

[10] R. E. MacFarlane, D. W. Muir, The NJOY nuclear data processing system: Volume 3, the GROUPR, GAMINR, and MODER modules, Tech. Rep., Los Alamos National Lab. (LANL), Los Alamos, NM (United States) (1987).

[11] R. S. Sutton, A. G. Barto, Reinforcement Learning: An Introduction, 2nd Edition, The MIT Press, Cambridge, Massachusetts, USA, 2020.

[12] J. Schulman, F. Wolski, P. Dhariwal, A. Radford, O. Klimov, Proximal policy optimization algorithms, arXiv preprint arXiv:1707.06347 (2017).

[13] A. Raffin, A. Hill, A. Gleave, A. Kanervisto, M. Ernestus, N. Dormann, Stable-Baselines3: Reliable reinforcement learning implementations, Journal of Machine Learning Research 22 (268) (2021) 1–8. URL: http://jmlr.org/papers/v22/20-1364.html

[14] S. Huang, S. Ontañón, A closer look at invalid action masking in policy gradient algorithms, The International FLAIRS Conference Proceedings 35 (May 2022). doi:10.32473/flairs.v35i.130584

[15] T. Gao, K. A. Neusypin, D. D. Dmitriev, B. Yang, S. Rao, Enhancing sample efficiency and exploration in reinforcement learning through the integration of diffusion models and proximal policy optimization, arXiv preprint arXiv:2409.01427 (2024).

[16] Z. Li, F. Liu, W. Yang, S. Peng, J. Zhou, A survey of convolutional neural networks: Analysis, applications, and prospects, IEEE Transactions on Neural Networks and Learning Systems 33 (12) (2022) 6999–7019.

[17] S. Hochreiter, J. Schmidhuber, Long short-term memory, Neural Computation 9 (8) (1997) 1735–1780.

[18] J. D. Bess, T. Ivanova, L. Scott, I. Hill, The 2019 edition of the ICSBEP handbook, Tech. Rep., Idaho National Laboratory (INL), Idaho Falls, ID (United States) (2019).

[19] T. F. Wimett, R. H. White, W. R. Stratton, D. P. Wood, Godiva II—an unmoderated pulse-irradiation reactor, Nuclear Science and Engineering 8 (6) (1960) 691–708.

[20] T. Johannink, S. Bahl, A. Nair, J. Luo, A. Kumar, M. Loskyll, J. A. Ojea, E. Solowjow, S. Levine, Residual reinforcement learning for robot control, arXiv preprint arXiv:1812.03201 (2018).
