Inverse Critical Experiment Design via Gradient Optimization and a Multigroup Attention-Based Neural Network Architecture
Pith reviewed 2026-06-28 15:04 UTC · model grok-4.3
The pith
A neural network surrogate and gradient optimization can design critical experiment geometries achieving c_k scores up to 0.97757 for HALEU fuel validation.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
A differentiable neural network surrogate for multigroup sensitivity vectors, built with U-Net convolutional layers and a novel multigroup attention pooling mechanism, allows nonparametric gradient optimization to maximize c_k by altering material assignments in a grid geometry, yielding optimized designs with c_k scores of 0.97757, 0.81324, and 0.93276 for TN-LC HALEU validation cases.
What carries the argument
The multigroup attention pooling layer inside the U-Net encoder-decoder, which learns to weight spatial sensitivity dependencies differently per energy group and enables end-to-end differentiability for gradient optimization of material placements.
If this is right
- Critical experiment design for technologies with limited prior coverage can reach c_k above the 0.9 threshold needed for high neutronic similarity.
- Gradient optimization over material assignments replaces manual or discrete search of the geometry space.
- The attention pooling layer yields both improved surrogate accuracy and interpretable per-group spatial attention maps.
- The overall workflow reduces reliance on repeated high-fidelity transport calculations during the design loop.
Where Pith is reading between the lines
- The same surrogate-plus-gradient pattern could be tested on inverse problems that optimize detector placement or fuel loading patterns rather than experiment grids.
- If the attention mechanism proves robust, similar multigroup attention blocks might improve other physics surrogates that must respect energy-dependent spatial correlations.
- Success on the TN-LC case suggests the method could be applied to other transportation or storage cask designs where existing benchmarks are sparse.
Load-bearing premise
The trained neural network surrogate accurately predicts c_k for optimized geometries that lie outside the original training distribution of grid-based configurations.
What would settle it
Run a full OpenMC Monte Carlo calculation of the sensitivity vectors and resulting c_k on one of the reported optimized geometries and compare the value to the surrogate prediction; a discrepancy larger than typical Monte Carlo uncertainty would falsify reliable extrapolation.
Figures
read the original abstract
The validation of advanced nuclear reactor designs and fuel concepts requires critical experiments with high neutronic similarity to the target technology. Neutronic similarity is quantified by the correlation coefficient $c_k$, which captures the shared bias in $k_\text{eff}$ induced by uncertainties in nuclear data. Generally, a $c_k\geq0.9$ is needed for an experiment to be sufficiently similar to a target technology. This work presents a methodology for the inverse design of critical experiments. Deep neural network surrogate modeling and nonparametric gradient optimization are used to generate experiment geometries that maximize $c_k$. A deep neural network is trained on OpenMC-calculated sensitivity vectors for grid-based critical experiment geometries. The model architecture combines a U-Net convolutional encoder-decoder with a novel multigroup attention pooling layer, introduced to capture the differing spatial dependencies of sensitivities. Multigroup attention pooling is shown to achieve better performance than traditional pooling, as well as interpretable internal behavior. The differentiability of the surrogate enables gradient-based optimization of the full combinatorial design space, allowing $c_k$ to be maximized by directly changing the material assignment of each position in the geometry grid. The method is applied to the validation of the TN-Americas TN-LC transportation cask with HALEU fuel, for which existing critical experiment coverage is limited. The optimization procedure is shown to produce experiment geometries achieving $c_k$ scores of 0.97757, 0.81324, and 0.93276 for three configurations of interest. This approach demonstrates the potential of deep learning and gradient optimization to accelerate the development of advanced nuclear technology.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes an inverse critical experiment design method that trains a U-Net convolutional encoder-decoder augmented with a novel multigroup attention pooling layer on OpenMC-computed sensitivity vectors from grid-based critical experiment geometries; the resulting differentiable surrogate is then used with gradient-based optimization to assign materials in the geometry grid so as to maximize the neutronic similarity metric c_k to three configurations of the TN-Americas TN-LC transportation cask, reporting optimized c_k values of 0.97757, 0.81324, and 0.93276.
Significance. If the surrogate predictions on the optimized (combinatorially generated) geometries prove accurate when recomputed with OpenMC, the method would offer a scalable route to exploring the full discrete design space of critical experiments, which is otherwise intractable by direct Monte Carlo search; the multigroup attention mechanism is also presented as both performance-improving and interpretable.
major comments (3)
- [Abstract / Results] Abstract and results (presumed §4–5): the headline c_k scores (0.97757, 0.81324, 0.93276) are stated as achieved by the optimization procedure, yet the manuscript supplies no OpenMC re-evaluations of the final material-assignment grids; all reported performance therefore rests on untested extrapolation of the surrogate to points generated by exhaustive combinatorial search.
- [Methods] Methods (surrogate training description): while the network is trained on independent OpenMC sensitivity vectors, no quantitative held-out test metrics (e.g., mean absolute error on c_k or sensitivity-vector cosine similarity) or cross-validation against full OpenMC runs on the three optimized geometries are provided, leaving the central performance claim without direct empirical support.
- [Optimization] Optimization section: the gradient optimization operates entirely on the surrogate; no analysis is given of how surrogate error on out-of-distribution geometries (those lying outside the original grid-based training distribution) propagates into the reported c_k values, nor is a threshold for acceptable surrogate error stated.
minor comments (2)
- [Architecture] Notation: the definition and normalization of the multigroup attention pooling layer should be given explicitly (e.g., as an equation) rather than only described qualitatively.
- [Results] Figure clarity: the attention-weight visualizations would benefit from quantitative comparison (e.g., correlation with physical sensitivity maps) to substantiate the claim of interpretability.
Simulated Author's Rebuttal
We thank the referee for the detailed and constructive report. The comments correctly identify gaps in empirical validation of the surrogate predictions. We address each point below and will revise the manuscript to incorporate additional verification and analysis.
read point-by-point responses
-
Referee: [Abstract / Results] Abstract and results (presumed §4–5): the headline c_k scores (0.97757, 0.81324, 0.93276) are stated as achieved by the optimization procedure, yet the manuscript supplies no OpenMC re-evaluations of the final material-assignment grids; all reported performance therefore rests on untested extrapolation of the surrogate to points generated by exhaustive combinatorial search.
Authors: We agree that the reported c_k values are surrogate predictions and that direct OpenMC re-evaluation of the optimized grids would strengthen the results. The surrogate was trained exclusively on OpenMC sensitivity data from grid-based geometries, but the final optimized assignments were not re-computed with OpenMC. In the revised manuscript we will add OpenMC re-simulations for the three highest-scoring optimized configurations and report the resulting c_k values alongside the surrogate predictions. We will also revise the abstract and results to explicitly state that the headline scores are surrogate outputs. revision: yes
-
Referee: [Methods] Methods (surrogate training description): while the network is trained on independent OpenMC sensitivity vectors, no quantitative held-out test metrics (e.g., mean absolute error on c_k or sensitivity-vector cosine similarity) or cross-validation against full OpenMC runs on the three optimized geometries are provided, leaving the central performance claim without direct empirical support.
Authors: The manuscript does not currently report quantitative held-out test metrics or OpenMC cross-validation on the optimized geometries. In the revised version we will add a dedicated validation subsection that includes mean absolute error on predicted c_k, cosine similarity on sensitivity vectors for a held-out test set, and direct OpenMC comparisons for the three optimized geometries. revision: yes
-
Referee: [Optimization] Optimization section: the gradient optimization operates entirely on the surrogate; no analysis is given of how surrogate error on out-of-distribution geometries (those lying outside the original grid-based training distribution) propagates into the reported c_k values, nor is a threshold for acceptable surrogate error stated.
Authors: We acknowledge the absence of error-propagation analysis for out-of-distribution geometries. The revised manuscript will include a new paragraph in the optimization section that quantifies surrogate error on additional out-of-distribution test grids, examines its effect on the optimized c_k values, and states an explicit acceptance threshold derived from the training and validation performance. revision: yes
Circularity Check
No significant circularity in the derivation chain
full rationale
The paper trains a neural network surrogate on independent OpenMC-calculated sensitivity vectors for grid-based geometries, then uses gradient optimization over the differentiable surrogate to maximize c_k against a fixed target application. The reported c_k values (0.97757, 0.81324, 0.93276) are direct outputs of this optimization process on the surrogate rather than quantities fitted to the target cask data or defined in terms of themselves. No load-bearing step reduces the claimed results to the inputs by construction, and the provided text contains no self-citations, uniqueness theorems, or ansatzes that would trigger the enumerated circularity patterns. The derivation remains self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
free parameters (2)
- neural network weights
- optimizer hyperparameters
axioms (1)
- domain assumption The surrogate model remains accurate for material configurations produced by the optimizer
invented entities (1)
-
multigroup attention pooling layer
no independent evidence
Reference graph
Works this paper leans on
-
[1]
rep., IEA, Paris (2025)
International Energy Agency, Energy and AI, Tech. rep., IEA, Paris (2025)
2025
-
[2]
Department of Energy, Pathways to commer- cial liftoff: Advanced nuclear, Tech
U.S. Department of Energy, Pathways to commer- cial liftoff: Advanced nuclear, Tech. rep., U.S. De- partment of Energy (sep 2024)
2024
-
[3]
rep., IEA, Paris (2025)
International Energy Agency, The path to a new era for nuclear energy, Tech. rep., IEA, Paris (2025)
2025
-
[4]
N. W. Thompson, A. Maldonado, T. E. Cutler, H. R. Trellue, K. M. Amundson, V . Rao Dasari, J. M. Goda, T. J. Grove, D. K. Hayes, J. D. Hutchinson, H. M. Kistle, C. M. Kostelac, J. R. Lamproe, C. Matthews, G. McKenzie, G. E. Mc- Math, A. T. McSpaden, R. G. Sanchez, K. N. 14 Stolte, J. L. Walker, R. Weldon, N. H. Whitman, The National Criticality Experiment...
-
[5]
N. Thompson, R. Sanchez, J. Goda, K. Amund- son, T. Cutler, T. Grove, D. Hayes, J. Hutchin- son, C. Kostelac, G. McKenzie, A. McSpaden, W. Myers, J. Walker, A new era of nuclear criticality experiments: The first 10 years of COMET operations at NCERC, Nuclear Science and Engineering 195 (sup. 1) (2021) S17–S36. doi:10.1080/00295639.2021.1947105
-
[6]
R. Sanchez, T. Cutler, J. Goda, T. Grove, D. Hayes, J. Hutchinson, G. McKenzie, A. McSpaden, W. Myers, R. Rico, J. Walker, R. Weldon, A new era of nuclear criticality experiments: The first 10 years of PLANET operations at NCERC, Nuclear Science and Engineering 195 (sup. 1) (2021) S1–S16. doi:10.1080/00295639.2021.1951077
-
[7]
R. M. Lell, ZPR/ZPPR critical experiment facil- ities and as-built ZPR/ZPPR models, Tech. rep., Argonne National Laboratory, Argonne, IL, USA (apr 2024). doi:10.2172/2506757
-
[8]
Woolstenhulme, M
N. Woolstenhulme, M. Cole, N. Martin, T. Reiss, J. Johnson, M. Lund, R. Claghorn, K. Gagnon, M. DeHart, W. Wieselquist, T. Cutler, C. Percher, A. Barto, D. Algama, SPARC — plans for a new critical experiment facility with a horizontal split table, Tech. Rep. INL/RPT-25-84855, Idaho Na- tional Laboratory (jul 2025)
2025
-
[9]
A. Maldonado, C. Perfetti, Utilizing sen- sitivity and correlation coefficients from MCNP and WHISPER to guide microre- actor experiment design, Nuclear Science and Engineering 197 (8) (2023) 2086–2098. doi:10.1080/00295639.2022.2162782
-
[10]
Walters, N
W. Walters, N. Roskoff, Designing experi- ments for microreactor neutronic validation, in: PHYSOR 2026 – The International Conference on Physics of Reactors, Torino, Italy, 2026
2026
-
[11]
Eidelpes, J
E. Eidelpes, J. J. Jarrell, H. E. Adkins, B. M. Hom, J. M. Scaglione, R. A. Hall, B. D. Brick- ner, UO2 HALEU transportation package evalua- tion and recommendations, Tech. Rep. INL/EXT- 19-56333, Revision 0, Idaho National Laboratory, Idaho Falls, ID, USA (nov 2019)
2019
-
[12]
R. Hall, W. J. Marshall, W. A. Wieselquist, Assessment of existing transportation pack- ages for use with HALEU, Tech. Rep. ORNL/TM-2020/1725, Oak Ridge National Laboratory, Oak Ridge, TN, USA (oct 2020). doi:10.2172/1731046
-
[13]
M. Branco-Katcher, D. Siefman, R. Araj, T. Cis- ernos, C. Percher, T. S. Palmer, Design optimiza- tion of a criticality experiment for the Molten Chloride Reactor Experiment facility, Nuclear Sci- ence and Engineering 199 (11) (2025) 1794–1815. doi:10.1080/00295639.2025.2464459
-
[14]
N. Kleedtke, T. Cutler, D. Neudecker, J. Hutchin- son, P. Brain, N. Thompson, R. C. Little, Ge- netic algorithm optimization of nuclear critical- ity experiment for reduction of intermediate- energy 239Pu nuclear data uncertainties, An- nals of Nuclear Energy 228 (2026) 112037. doi:10.1016/j.anucene.2025.112037
-
[15]
A. Whyte, G. Parks, Surrogate model opti- mization of a ‘Micro Core’ PWR fuel assem- bly arrangement using deep learning models, in: Proceedings of PHYSOR 2020, V ol. 247 of EPJ Web of Conferences, 2021, p. 12003. doi:10.1051/epjconf/202124712003
-
[16]
K. Deb, A. Pratap, S. Agarwal, T. Meyarivan, A fast and elitist multi-objective genetic algo- rithm: NSGA-II, IEEE Transactions on Evo- lutionary Computation 6 (2) (2002) 182–197. doi:10.1109/4235.996017
-
[17]
P. Seurin, D. Price, L. Nunez, Techno- economic optimization of a heat-pipe mi- croreactor, part I: Theory and cost optimization, arXiv preprint (2025). arXiv:2512.16032, doi:10.48550/arXiv.2512.16032
-
[18]
P. Seurin, D. Price, Techno-economic op- timization of a heat-pipe microreactor, part II: Multi-objective optimization analysis, arXiv preprint (2026). arXiv:2601.20079, doi:10.48550/arXiv.2601.20079
-
[19]
D. Price, M. I. Radaideh, B. Kochunas, Mul- tiobjective optimization of nuclear microreac- tor reactivity control system operation with swarm and evolutionary algorithms, Nuclear Engineering and Design 393 (2022) 111776. doi:10.1016/j.nucengdes.2022.111776. 15
-
[20]
C. Sabater, P. Bekemeyer, S. Görtz, Effi- cient bilevel surrogate approach for optimiza- tion under uncertainty of shock control bumps, AIAA Journal 58 (12) (2020) 5228–5242. doi:10.2514/1.J059480
-
[21]
E. Nikolaou, S. Kilimtzidis, V . Kostopoulos, Multi-fidelity surrogate-assisted aerodynamic op- timization of aircraft wings, Aerospace 12 (4) (2025) 359. doi:10.3390/aerospace12040359
-
[22]
Nature 521(7553), 436–444 (2015) https://doi.org/10.1038/nature14539
Y . LeCun, Y . Bengio, G. Hinton, Deep learning, Nature 521 (2015) 436–444. doi:10.1038/nature14539
-
[23]
J. Xu, Distance-based protein folding powered by deep learning, Proceedings of the National Academy of Sciences 116 (34) (2019) 16856– 16865. doi:10.1073/pnas.1821309116
-
[24]
J. N. Kutz, Deep learning in fluid dynamics, Journal of Fluid Mechanics 814 (2017) 1–4. doi:10.1017/jfm.2016.803
-
[25]
Proceedings of the IEEE , author =
Y . LeCun, L. Bottou, Y . Bengio, P. Haffner, Gradient-based learning applied to document recognition, Proceedings of the IEEE 86 (11) (1998) 2278–2324. doi:10.1109/5.726791
-
[26]
Vaswani, N
A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, I. Polosukhin, Attention is all you need, in: Advances in Neural Information Processing Systems, V ol. 30, 2017
2017
-
[27]
J. Nocedal, Updating quasi-Newton matrices with limited storage, Mathematics of Computation 35 (151) (1980) 773–782. doi:10.1090/S0025- 5718-1980-0572855-7
-
[28]
A. Williams, N. Walton, A. Maryanski, S. Bo- getic, W. Hines, V . Sobes, Stochastic gradient de- scent for optimization for nuclear systems, Scien- tific Reports 13 (2023) 8474. doi:10.1038/s41598- 023-32112-7
-
[29]
D. P. Kingma, J. Ba, Adam: A method for stochastic optimization, arXiv preprint (2014). arXiv:1412.6980, doi:10.48550/arXiv.1412.6980
work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.1412.6980 2014
-
[30]
J. Pevey, V . Sobes, W. J. Hines, Neural network ac- celeration of genetic algorithms for the optimiza- tion of a coupled fast/thermal nuclear experiment, Frontiers in Energy Research 10 (2022) 874194. doi:10.3389/fenrg.2022.874194
-
[31]
J. Leppänen, V . Valtavirta, A. Rintala, R. Tuomi- nen, Status of the Serpent Monte Carlo code in 2024, EPJ Nuclear Sciences & Technologies 11 (2025) 3. doi:10.1051/epjn/2024031
-
[32]
W. A. Wieselquist, R. A. Lefebvre, SCALE 6.3.2 user manual, Tech. Rep. ORNL/TM-2024/3386, Oak Ridge National Laboratory, Oak Ridge, TN, USA (feb 2024). doi:10.2172/2361197
-
[33]
X. Peng, J. Liang, A. Alhajri, B. Forget, K. Smith, Development of continuous-energy sensitivity analysis capability in OpenMC, An- nals of Nuclear Energy 110 (2017) 362–383. doi:10.1016/j.anucene.2017.06.061
-
[34]
U-Net: Convolutional Networks for Biomedical Image Segmentation
O. Ronneberger, P. Fischer, T. Brox, U-Net: Convolutional networks for biomedical image segmentation, arXiv preprint (2015). arXiv:1505.04597, doi:10.48550/arXiv.1505.04597
work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.1505.04597 2015
-
[35]
Loshchilov, F
I. Loshchilov, F. Hutter, Decoupled weight decay regularization, in: Proceedings of the 7th Inter- national Conference on Learning Representations (ICLR), New Orleans, LA, USA, 2019
2019
-
[36]
Estimating or Propagating Gradients Through Stochastic Neurons for Conditional Computation
Y . Bengio, N. Léonard, A. Courville, Es- timating or propagating gradients through stochastic neurons for conditional computa- tion, arXiv preprint (2013). arXiv:1308.3432, doi:10.48550/arXiv.1308.3432. Appendix A. Surrogate Model Architecture and Training Details Appendix B. TN-LC Sensitivity Profiles Appendix C. Per-Reaction Surrogate Error 16 Figure B...
work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.1308.3432 2013
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.