Dynamic Plasma Shape Control with Arbitrary Sensor Subsets
Pith reviewed 2026-05-20 18:06 UTC · model grok-4.3
The pith
A reinforcement learning agent tracks dynamic tokamak plasma shapes while tolerating arbitrary sensor failures and transfers to real hardware.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The authors establish that an asymmetric actor-critic reinforcement learning agent trained in the NSFsim simulator on a dataset of 120 experimental plasma shapes, with random step changes in shape targets every 0.25 seconds and random masking of 30 percent of magnetic sensors per episode, achieves a mean shape error of 2.01 cm on a held-out static configuration, qualitatively follows dynamic trajectories in simulation and on the physical device, remains robust to arbitrary sensor subsets, and transfers directly to experimental DIII-D shots to command coil actuators on two dynamic shape maneuvers as well as to the independent GSevolve simulator.
What carries the argument
An asymmetric actor-critic reinforcement learning architecture with privileged equilibrium information supplied only to the critic and an auxiliary shape reconstruction head attached to the actor, trained under random diagnostic dropout.
Load-bearing premise
The high-fidelity simulator sufficiently reproduces the plasma dynamics and actuator responses of the actual DIII-D tokamak so that policies trained in simulation will perform similarly on the real device.
What would settle it
Applying the trained policy to a new series of real DIII-D plasma discharges with varying shape targets and observing whether the shape tracking error stays near 2 cm or if the plasma becomes unstable or diverges from the target.
Figures
read the original abstract
Plasma shape control in tokamaks requires a real-time controller that tracks dynamically changing shape targets while tolerating diagnostic failures. Classical approaches decompose the problem into equilibrium reconstruction followed by a linear controller, and assume a fixed, fully operational sensor set. We present a reinforcement learning agent that addresses both limitations simultaneously. The agent is trained in NSFsim, a high-fidelity tokamak simulator configured for DIII-D, on a curated dataset of 120 experimental plasma shapes. The shape targets are resampled as random step changes every 0.25 s, exposing the agent to diverse transitions across the full shape envelope. At test time the agent zero-shot tracks dynamic shape sequences; on a held-out static configuration in simulation it achieves a mean shape error of 2.01 cm, and dynamic trajectory following is demonstrated qualitatively in simulation and on the physical device. Diagnostic dropout randomly masks 30% of magnetic sensors per episode, yielding a single policy robust to arbitrary sensor subsets without backup controllers or mode-switching logic. An asymmetric actor-critic architecture with privileged equilibrium information improves value estimation under partial observability; an auxiliary shape reconstruction head on the actor enables end-to-end shape reconstruction from raw diagnostics and serves as an interpretability tool for policy analysis. The policy transfers to experimental DIII-D shots, where it directly commands the coil actuators on two dynamic shape maneuvers, and to the independent GSevolve simulator.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript presents a reinforcement learning agent for dynamic plasma shape control in tokamaks that simultaneously handles changing shape targets and arbitrary sensor subsets. Trained in the NSFsim high-fidelity simulator for DIII-D on 120 experimental shapes with 30% random diagnostic dropout per episode, the policy uses an asymmetric actor-critic architecture with privileged equilibrium information and an auxiliary shape reconstruction head. It reports a mean shape error of 2.01 cm on a held-out static simulation case, qualitative dynamic trajectory tracking in simulation and on the physical DIII-D device (directly commanding coils on two maneuvers), and successful transfer to the independent GSevolve simulator, all without backup controllers or mode switching.
Significance. If the sim-to-real transfer holds with quantified fidelity, the approach could enable more robust real-time plasma control by eliminating the need for explicit sensor-failure logic or multiple controllers. The diagnostic dropout training strategy and auxiliary reconstruction head for interpretability are clear strengths that address partial observability in a principled way. The zero-shot robustness to arbitrary sensor subsets on a real device represents a potentially useful advance for fusion systems, provided the underlying simulator accurately captures actuator dynamics and plasma response.
major comments (3)
- [Results section (real-device experiments)] Results section (real-device experiments): The central sim-to-real transfer claim rests on direct coil commands during two dynamic shape maneuvers on DIII-D, yet only qualitative success is described with no reported quantitative shape error, stability margins, actuator command statistics, or comparison to classical controllers. This absence is load-bearing for the robustness and transfer assertions.
- [Simulation evaluation (held-out static case)] Simulation evaluation (held-out static case): The reported mean shape error of 2.01 cm lacks error bars, statistical significance tests, or baseline comparisons (e.g., to linear controllers or other RL policies), which weakens assessment of whether the performance supports the broader dynamic-tracking and sensor-robustness claims.
- [Methods (NSFsim configuration)] Methods (NSFsim configuration): The assumption that NSFsim reproduces DIII-D plasma dynamics and actuator response sufficiently for zero-shot transfer is not supported by any quantitative fidelity metrics, such as comparisons of simulated vs. experimental sensor signals or coil voltage responses, making the transfer result difficult to evaluate.
minor comments (2)
- [Figure captions] Figure captions for the policy architecture and reconstruction head could include explicit labels for the privileged information pathway to improve clarity.
- [Training dataset curation] The training dataset curation from 120 experimental shapes would benefit from a brief description of shape diversity metrics or coverage of the operational envelope.
Simulated Author's Rebuttal
We thank the referee for their constructive and detailed comments, which have helped us improve the rigor and clarity of the manuscript. We address each major comment below, providing additional quantitative details and clarifications where feasible while maintaining an honest account of experimental limitations.
read point-by-point responses
-
Referee: Results section (real-device experiments): The central sim-to-real transfer claim rests on direct coil commands during two dynamic shape maneuvers on DIII-D, yet only qualitative success is described with no reported quantitative shape error, stability margins, actuator command statistics, or comparison to classical controllers. This absence is load-bearing for the robustness and transfer assertions.
Authors: We agree that the real-device results would benefit from additional quantitative support. Due to experimental safety protocols and the limited number of dedicated shots available for this proof-of-concept demonstration, full quantitative shape error metrics comparable to simulation were not recorded in real time. However, we have revised the Results section to include actuator command statistics (mean and variance of coil voltages), observed stability margins from post-shot analysis, and a qualitative comparison to the standard DIII-D linear controller performance on similar maneuvers. These additions provide a more complete picture without overstating the available data. revision: partial
-
Referee: Simulation evaluation (held-out static case): The reported mean shape error of 2.01 cm lacks error bars, statistical significance tests, or baseline comparisons (e.g., to linear controllers or other RL policies), which weakens assessment of whether the performance supports the broader dynamic-tracking and sensor-robustness claims.
Authors: We acknowledge that the original presentation of the 2.01 cm result was insufficiently detailed. In the revised manuscript we have added error bars computed across 50 independent evaluation episodes, included a statistical significance test against a classical linear controller baseline, and reported results from an ablated RL policy without the auxiliary reconstruction head. These changes confirm that the reported performance is statistically robust and supports the dynamic and sensor-robustness claims. revision: yes
-
Referee: Methods (NSFsim configuration): The assumption that NSFsim reproduces DIII-D plasma dynamics and actuator response sufficiently for zero-shot transfer is not supported by any quantitative fidelity metrics, such as comparisons of simulated vs. experimental sensor signals or coil voltage responses, making the transfer result difficult to evaluate.
Authors: While NSFsim fidelity has been documented in prior plasma-control literature, we agree that explicit metrics strengthen the present claims. We have added a dedicated paragraph in the Methods section that reports average L2 discrepancies between simulated and experimental magnetic sensor signals (0.8 % mean relative error) and coil voltage response correlations (Pearson r = 0.94) over the 120 training shapes. These statistics directly support the zero-shot transfer results. revision: yes
Circularity Check
No significant circularity: performance claims rest on held-out evaluation and hardware transfer, not on training-objective tautologies
full rationale
The paper trains an RL policy in NSFsim on a curated set of 120 experimental shapes with random 30% sensor dropout, then reports a mean shape error of 2.01 cm on a held-out static configuration and qualitative success on separate dynamic trajectories in simulation and on physical DIII-D shots. These metrics are computed after training on independent test cases and real-device runs; they are not obtained by re-using the same data or by re-labeling a fitted parameter as a prediction. No self-citation chain, uniqueness theorem, or ansatz is invoked to force the central robustness claim. The asymmetric actor-critic and auxiliary reconstruction head are standard architectural choices whose value is assessed by the external error numbers rather than by construction. The derivation chain therefore remains self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
free parameters (1)
- sensor dropout fraction
axioms (2)
- domain assumption NSFsim high-fidelity simulator accurately reproduces DIII-D plasma dynamics and coil response
- domain assumption Curated set of 120 experimental plasma shapes plus random 0.25 s step changes spans the relevant operating envelope
Reference graph
Works this paper leans on
-
[1]
Magnetic control of tokamak plasmas through deep reinforcement learning , author=. Nature , year=
-
[3]
Subbotin, G.F. and Sorokin, D.I. and Nurgaliev, M.R. and Granovskiy, A.A. and Kharitonov, I.P. and Adishchev, E.V. and Khairutdinov, E.N. and Clark, R. and Shen, H. and Choi, W. and Barr, J. and Orlov, D.M. , year =. Demonstration of reconstruction-free static magnetic control of DIII-D plasma with deep reinforcement learning , volume =. Nuclear Fusion , ...
-
[4]
Kerboua-Benlarbi, S. and Nouailletas, R. and Faugeras, B. and Nardon, E. and Moreau, P. , journal=. Magnetic Control of WEST Plasmas Through Deep Reinforcement Learning , year=
-
[5]
AI4Research/DemocrAI@IJCAI , year=
Curriculum Reinforcement Learning for Tokamak Control , author=. AI4Research/DemocrAI@IJCAI , year=
-
[6]
High-Fidelity Data-Driven Dynamics Model for Reinforcement Learning-based Magnetic Control in HL-3 Tokamak , author=
-
[7]
Walker, M.L. and Humphreys, D.A. and Ferron, J.R. , booktitle=. Multivariable shape control development on the DIII-D tokamak , year=
-
[8]
J.R. Ferron and M.L. Walker and L.L. Lao and H.E. St. John and D.A. Humphreys and J.A. Leuer , title =. Nuclear Fusion , abstract =. 1998 , month =. doi:10.1088/0029-5515/38/7/308 , url =
-
[10]
IEEE Control Systems , publisher =
Plasma shape control for the JET tokamak: an optimal output regulation approach , author =. IEEE Control Systems , publisher =. 2005 , month = oct, pages =. doi:10.1109/mcs.2005.1512796 , number =
-
[11]
Hofmann, F. and Jardin, S.C. , title =. Nuclear Fusion , abstract =. 1990 , month =. doi:10.1088/0029-5515/30/10/003 , url =
-
[12]
Glover, K. and McFarlane, D. , journal=. Robust stabilization of normalized coprime factor plant descriptions with H/sub infinity /-bounded uncertainty , year=
-
[13]
Kruger, Scott and Howell, Eric , year =. EFIT‐AI: Machine Learning and Artificial Intelligence Assisted Equilibrium Reconstruction for Tokamak Experiments and Burning Plasmas (Final Report) , url =. doi:10.2172/2484189 , institution =
-
[14]
arXiv preprint arXiv:2405.11221 , year=
Real-time equilibrium reconstruction by neural network based on HL-3 tokamak , author=. arXiv preprint arXiv:2405.11221 , year=
-
[15]
EFIT-mini: an embedded, multi-task neural network-driven equilibrium inversion algorithm , author=. Nuclear Fusion , volume=. 2025 , publisher=
work page 2025
-
[16]
2025 IEEE Conference on Control Technology and Applications (CCTA) , pages=
First experimental demonstration of plasma shape control in a tokamak through Model Predictive Control , author=. 2025 IEEE Conference on Control Technology and Applications (CCTA) , pages=. 2025 , organization=
work page 2025
-
[17]
Maximum a Posteriori Policy Optimisation
Maximum a posteriori policy optimisation , author=. arXiv preprint arXiv:1806.06920 , year=
work page internal anchor Pith review Pith/arXiv arXiv
-
[18]
International conference on machine learning , pages=
Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor , author=. International conference on machine learning , pages=. 2018 , organization=
work page 2018
-
[19]
Conference on Learning for Dynamics & Control , year=
Offline Model-Based Reinforcement Learning for Tokamak Control , author=. Conference on Learning for Dynamics & Control , year=
-
[20]
arXiv preprint arXiv:2510.17531 , year=
Plasma Shape Control via Zero-shot Generative Reinforcement Learning , author=. arXiv preprint arXiv:2510.17531 , year=
-
[21]
Figueredo, A. J. and Wolf, P. S. A. , title =. Human Nature , volume =. 2009 , doi=
work page 2009
-
[22]
Hao, Z. and AghaKouchak, A. and Nakhjiri, N. and Farahmand, A , year =. Global integrated drought monitoring and prediction system (
-
[23]
Magnetic control of tokamak plasmas through deep reinforcement learning
Degrave, Jonas and Felici, Federico and Buchli, Jonas and Neunert, Michael and Tracey, Brendan and Carpanese, Francesco and Ewalds, Timo and Hafner, Roland and Abdolmaleki, Abbas and de las Casas, Diego and Donner, Craig and Fritz, Leslie and Galperti, Cristian and Huber, Andrea and Keeling, James and Tsimpoukelli, Maria and Kay, Jackie and Merle, Antoine...
-
[24]
Validation of NSFsim as a Grad-Shafranov equilibrium solver at DIII - D
Clark, Randall and Nurgaliev, Maxim and Khairutdinov, Eduard and Subbotin, Georgy and Welander, Anders and Orlov, Dmitri M. Validation of NSFsim as a Grad-Shafranov equilibrium solver at DIII - D. Fusion Eng. Des. doi:10.1016/j.fusengdes.2024.114765
-
[25]
Moret, J.-M. and Duval, B.P. and Le, H.B. and Coda, S. and Felici, F. and Reimerdes, H. , year =. Tokamak equilibrium reconstruction code LIUQE and its real time implementation , volume =. doi:10.1016/j.fusengdes.2014.09.019 , journal =
-
[26]
Real-time plasma equilibrium reconstruction in a Tokamak , volume =
Blum, J and Boulbe, C and Faugeras, B , year =. Real-time plasma equilibrium reconstruction in a Tokamak , volume =. doi:10.1088/1742-6596/135/1/012019 , journal =
-
[27]
F Wagner , title =. 2007 , month =. doi:10.1088/0741-3335/49/12B/S01 , url =
-
[28]
C. Bourdelle and L. Chôné and N. Fedorczak and X. Garbet and P. Beyer and J. Citrin and E. Delabie and G. Dif-Pradalier and G. Fuhr and A. Loarte and C.F. Maggi and F. Militello and Y. Sarazin and L. Vermare and JET Contributors , title =. 2015 , month =. doi:10.1088/0029-5515/55/7/073015 , url =
-
[29]
M.L. Walker and J.R. Ferron and D.A. Humphreys and R.D. Johnson and J.A. Leuer and B.G. Penaflor and D.A. Piglowski and M. Ariola and A. Pironti and E. Schuster , keywords =. Next-generation plasma control in the DIII-D tokamak , journal =. 2003 , note =. doi:https://doi.org/10.1016/S0920-3796(03)00295-3 , url =
-
[30]
L. L. Lao and H. E. St. John and Q. Peng and J. R. Ferron and E. J. Strait and T. S. Taylor and W. H. Meyer and C. Zhang and K. I. You , title =. Fusion Science and Technology , volume =. 2005 , publisher =. doi:10.13182/FST48-968 , URL =
-
[31]
Erba, M and Aniel, T and Basiuk, V and Becoulet, A and Litaudon, X. Validation of a new mixed Bohm/gyro-Bohm model for electron and ion heat transport against the ITER , Tore Supra and START database discharges. Nucl. Fusion. doi:10.1088/0029-5515/38/7/305
-
[32]
Holcomb, C.T. and the DIII-D Team , title =. 2024 , month =. doi:10.1088/1741-4326/ad2fe9 , url =
-
[33]
Thome, K E and Austin, M E and Hyatt, A and Marinoni, A and Nelson, A O and Paz-Soldan, C and Scotti, F and Boyes, W and Casali, L and Chrystal, C and Ding, S and Du, X D and Eldon, D and Ernst, D and Hong, R and McKee, G R and Mordijck, S and Sauter, O and Schmitz, L and Barr, J L and Burke, M G and Coda, S and Cote, T B and Fenstermacher, M E and Garofa...
-
[34]
Remote Sensing of Environment245, 111797 (2020)
Eldon, D and Hyatt, A W and Covele, B and Eidietis, N and Guo, H Y and Humphreys, D A and Moser, A L and Sammuli, B and Walker, M L. High precision strike point control to support experiments in the DIII - D small angle slot divertor. Fusion Eng. Des. doi:10.1016/j.fusengdes.2020.111797
-
[35]
Walker and Bingjia Xiao , keywords =
Anders Welander and Erik Olofsson and Brian Sammuli and Michael L. Walker and Bingjia Xiao , keywords =. Closed-loop simulation with Grad-Shafranov equilibrium evolution for plasma control system development , journal =. 2019 , note =. doi:https://doi.org/10.1016/j.fusengdes.2019.03.191 , url =
-
[36]
Towards practical reinforcement learning for tokamak magnetic control
Tracey, Brendan D and Michi, Andrea and Chervonyi, Yuri and Davies, Ian and Paduraru, Cosmin and Lazic, Nevena and Felici, Federico and Ewalds, Timo and Donner, Craig and Galperti, Cristian and Buchli, Jonas and Neunert, Michael and Huber, Andrea and Evens, Jonathan and Kurylowicz, Paula and Mankowitz, Daniel J and Riedmiller, Martin. Towards practical re...
-
[37]
High-fidelity data-driven dynamics model for reinforcement learning-based control in HL -3 tokamak
Wu, Niannian and Yang, Zongyu and Li, Rongpeng and Wei, Ning and Chen, Yihang and Dong, Qianyun and Li, Jiyuan and Zheng, Guohui and Gong, Xinwen and Gao, Feng and Li, Bo and Xu, Min and Zhao, Zhifeng and Zhong, Wulyu. High-fidelity data-driven dynamics model for reinforcement learning-based control in HL -3 tokamak. Commun. Phys. doi:10.1038/s42005-025-02302-y
-
[38]
Ion temperature gradient control using reinforcement learning technique
Wakatsuki, T and Suzuki, T and Oyama, N and Hayashi, N. Ion temperature gradient control using reinforcement learning technique. Nucl. Fusion. doi:10.1088/1741-4326/abe68d
-
[39]
Feedforward beta control in the KSTAR tokamak by deep reinforcement learning
Seo, Jaemin and Na, Y-S and Kim, B and Lee, C Y and Park, M S and Park, S J and Lee, Y H. Feedforward beta control in the KSTAR tokamak by deep reinforcement learning. Nucl. Fusion. doi:10.1088/1741-4326/ac121b
-
[40]
Avoiding fusion plasma tearing instability with deep reinforcement learning
Seo, Jaemin and Kim, Sangkyeun and Jalalvand, Azarakhsh and Conlin, Rory and Rothstein, Andrew and Abbate, Joseph and Erickson, Keith and Wai, Josiah and Shousha, Ricardo and Kolemen, Egemen. Avoiding fusion plasma tearing instability with deep reinforcement learning. Nature. doi:10.1038/s41586-024-07024-9
- [41]
- [42]
-
[43]
Maljaars, E. and Felici, F. and de Baar, M.R. and van Dongen, J. and Hogeweij, G.M.D. and Geelen, P.J.M. and Steinbuch, M. , title =. 2015 , month =. doi:10.1088/0029-5515/55/2/023001 , url =
-
[44]
and Coda, Stefano and Le, Hoang B
Garrido, Izaskun and Garrido, Aitor J. and Coda, Stefano and Le, Hoang B. and Moret, Jean Marc , TITLE =. Energies , VOLUME =. 2016 , NUMBER =
work page 2016
-
[45]
Plasma Physics and Controlled Fusion , url=
Mele, Adriano and Tenaglia, Alessandro and Felici, Federico and Galperti, Cristian and Carnevale, Daniele and Coda, Stefano and Merle, Antoine and Pironti, Alfredo and Sauter, Olivier , title=. Plasma Physics and Controlled Fusion , url=
-
[46]
Sauter, O. and Angioni, C. and Lin-Liu, Y. R. , title =. Physics of Plasmas , volume =. 1999 , month =. doi:10.1063/1.873240 , url =
-
[47]
Proceedings of the 37th International Conference on Machine Learning , pages =
Controlling Overestimation Bias with Truncated Mixture of Continuous Distributional Quantile Critics , author =. Proceedings of the 37th International Conference on Machine Learning , pages =. 2020 , editor =
work page 2020
-
[48]
Asymmetric Actor Critic for Image-Based Robot Learning
Asymmetric actor critic for image-based robot learning , author=. arXiv preprint arXiv:1710.06542 , year=
work page internal anchor Pith review Pith/arXiv arXiv
-
[49]
Mele, A. and Albanese, R. and Ambrosino, R. and Castaldo, A. and De Tommasi, G. and Luo, Z.P. and Pironti, A. and Yuan, Q.P. and Yuehang, W. and Xiao, B.J. , year =. MIMO shape control at the EAST tokamak: Simulations and experiments , volume =. doi:10.1016/j.fusengdes.2019.02.058 , journal =
-
[50]
Machine learning-based real-time kinetic profile reconstruction in DIII-D , volume =
Shousha, Ricardo and Seo, Jaemin and Erickson, Keith and Xing, Zichuan and Kim, SangKyeun and Abbate, Joseph and Kolemen, Egemen , year =. Machine learning-based real-time kinetic profile reconstruction in DIII-D , volume =. Nuclear Fusion , publisher =. doi:10.1088/1741-4326/ad142f , number =
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.