Nested Reinforcement Learning Based Control for Protective Relays in Power Distribution Systems
Pith reviewed 2026-05-25 16:05 UTC · model grok-4.3
The pith
Nested reinforcement learning tunes protective relays to distinguish faults from heavy loads in distribution systems with distributed energy resources.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The paper claims that a new nested reinforcement learning architecture can tune the discrete ON/OFF control devices (protective relays) at the branches and nodes of a power network so that they successfully distinguish heavy-load conditions from fault conditions in the presence of distributed energy resources.
What carries the argument
A nested reinforcement learning approach that exploits the hierarchical structure of distribution networks to train relay control policies.
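The abstract does not specify the nesting mechanism. One plausible reading, stated here as an assumption and not as the paper's procedure, is that relays are trained one at a time from the feeder ends toward the substation, with already-trained downstream relays held fixed. The sketch below only computes such a training order for a hypothetical radial feeder; the relay names and the `children` map are illustrative.

```python
from typing import Dict, List


def nested_training_order(children: Dict[str, List[str]], root: str) -> List[str]:
    """Return relays in post-order, so each downstream relay precedes its upstream relay."""
    order: List[str] = []
    stack = [(root, False)]
    while stack:
        node, expanded = stack.pop()
        if expanded:
            # All relays downstream of this one are already in the order.
            order.append(node)
            continue
        stack.append((node, True))
        # Push children in reverse so they are visited in their listed order.
        for child in reversed(children.get(node, [])):
            stack.append((child, False))
    return order


# Hypothetical radial feeder: substation relay feeding two laterals.
feeder = {"R_sub": ["R_a", "R_b"], "R_a": ["R_a1"], "R_b": []}
order = nested_training_order(feeder, "R_sub")
```

Under this assumption, a training loop would iterate over `order` and fit one relay policy per step, which is only well defined because a radial network forms a tree.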
Load-bearing premise
A simulation environment can be built that accurately captures the distinction between heavy load and fault conditions across varying DER outputs so that policies learned in simulation transfer to real hardware without unsafe behavior.
What would settle it
Running the trained relay policies on a physical distribution system testbed with varying DER outputs and observing whether relays trip incorrectly on heavy loads or fail to trip on actual faults.
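The two failure modes named above can be scored separately. The following is a minimal sketch, assuming each testbed trial is logged as a hypothetical `(was_fault, relay_tripped)` pair; the log format and function name are illustrative and do not come from the paper.

```python
from typing import Iterable, Tuple


def trip_error_rates(trials: Iterable[Tuple[bool, bool]]) -> Tuple[float, float]:
    """Return (false_trip_rate, missed_trip_rate) from (was_fault, relay_tripped) pairs."""
    false_trips = missed = n_load = n_fault = 0
    for was_fault, tripped in trials:
        if was_fault:
            n_fault += 1
            missed += not tripped      # fault present, relay did not trip
        else:
            n_load += 1
            false_trips += tripped     # heavy load only, relay tripped anyway
    false_rate = false_trips / n_load if n_load else 0.0
    missed_rate = missed / n_fault if n_fault else 0.0
    return false_rate, missed_rate


# Made-up example log: three heavy-load trials and three fault trials.
log = [(False, False), (False, True), (True, True),
       (True, True), (True, False), (False, False)]
false_rate, missed_rate = trip_error_rates(log)
```

Reporting the two rates separately matters because a policy can look accurate overall while still missing faults, which is the unsafe case.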
Original abstract
This paper envisions a new control architecture for the protective relay setting in future power distribution systems. With deepening penetration of distributed energy resources at the end users level, it has been recognized as a key engineering challenge to redesign the protective relays in the future distribution system. Conceptually, these protective relays are the discrete ON/OFF control devices at the end of each branch and node in a power network. The key technical difficulty lies in how to set up the relay control logic so that the protection could successfully differentiate heavy load and faulty operating conditions. This paper proposes a new nested reinforcement learning approach to take advantage of the structural properties of distribution networks and develop a new set of training methods for tuning the protective relays.
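To make the ON/OFF decision problem concrete, here is a toy single-relay environment. It is a sketch under stated assumptions: the abstract gives no state, action, or reward definitions, so the per-unit current ranges, the ±1 reward, and all names (`ToyRelayEnv`, `threshold_policy`) are hypothetical. The ranges are chosen to overlap, mimicking how heavy load and a DER-influenced fault can produce similar current magnitudes.

```python
import random
from dataclasses import dataclass

HOLD, TRIP = 0, 1


@dataclass
class ToyRelayEnv:
    """One-step episodes: observe a per-unit current, then choose HOLD or TRIP."""
    seed: int = 0

    def __post_init__(self) -> None:
        self.rng = random.Random(self.seed)
        self.is_fault = False

    def reset(self) -> float:
        """Draw a new operating condition and return the observed current."""
        self.is_fault = self.rng.random() < 0.5
        # Hypothetical, deliberately overlapping ranges (fault vs. heavy load).
        low, high = (1.2, 3.0) if self.is_fault else (0.5, 1.5)
        return self.rng.uniform(low, high)

    def step(self, action: int) -> float:
        """Reward +1 for tripping on a fault or holding on load, otherwise -1."""
        correct = (action == TRIP) == self.is_fault
        return 1.0 if correct else -1.0


def threshold_policy(current_pu: float, pickup: float = 1.35) -> int:
    """Fixed-pickup overcurrent rule used as a baseline."""
    return TRIP if current_pu > pickup else HOLD


env = ToyRelayEnv(seed=0)
rewards = [env.step(threshold_policy(env.reset())) for _ in range(1000)]
accuracy = sum(r > 0 for r in rewards) / len(rewards)
```

Because the two ranges overlap, no fixed pickup value classifies every sample in this toy correctly, which is the gap a learned policy with richer observations would have to close.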
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes a nested reinforcement learning architecture for setting protective relays in future power distribution systems with high DER penetration. It claims that exploiting network structural properties will enable new training methods allowing relays to reliably distinguish heavy-load from fault conditions.
Significance. The underlying engineering problem is timely. A working nested-RL solution could supply an adaptive, data-driven alternative to conventional relay coordination. At present the manuscript offers only a high-level concept; therefore any significance remains prospective rather than demonstrated.
major comments (3)
- [Abstract/§1] Abstract and §1: no equations, state/action definitions, reward structure, or nesting mechanism are supplied for the proposed RL controller, rendering the central claim of a “new set of training methods” impossible to evaluate.
- [§2] §2 (problem statement): the manuscript asserts that simulation can separate heavy-load from fault signatures across DER variability, yet provides neither a model description nor sensitivity analysis to parameter error or measurement noise; this assumption is load-bearing for any claim of safe hardware transfer.
- [Results/Validation] No results section or validation subsection exists; consequently there are no training curves, misclassification rates, or hardware-in-the-loop tests against which the transfer assumption can be checked.
minor comments (1)
- Notation for relay logic and network topology is introduced only informally; a compact diagram or table would improve clarity.
Simulated Author's Rebuttal
We thank the referee for the thoughtful review. The manuscript is positioned as a conceptual proposal for a nested RL architecture exploiting distribution network structure. We address each major comment below and will revise the manuscript accordingly to include formal definitions and preliminary validation.
- Referee: [Abstract/§1] Abstract and §1: no equations, state/action definitions, reward structure, or nesting mechanism are supplied for the proposed RL controller, rendering the central claim of a “new set of training methods” impossible to evaluate.
  Authors: We agree that the current text presents the architecture at a conceptual level without the requested formal elements. In revision we will expand §1 (or add a new subsection) with explicit state/action definitions, reward structure, nesting mechanism, and supporting equations so that the proposed training methods can be evaluated. Revision: yes.
- Referee: [§2] §2 (problem statement): the manuscript asserts that simulation can separate heavy-load from fault signatures across DER variability, yet provides neither a model description nor sensitivity analysis to parameter error or measurement noise; this assumption is load-bearing for any claim of safe hardware transfer.
  Authors: The observation is correct; §2 currently lacks a concrete model description and sensitivity analysis. We will revise §2 to include the simulation model details together with sensitivity results for key parameters and measurement noise, thereby supporting the separability claim. Revision: yes.
- Referee: [Results/Validation] No results section or validation subsection exists; consequently there are no training curves, misclassification rates, or hardware-in-the-loop tests against which the transfer assumption can be checked.
  Authors: We acknowledge the absence of any results section. The revised manuscript will add a dedicated results section presenting preliminary simulation outcomes, training curves, and misclassification rates. Hardware-in-the-loop experiments lie outside the scope of the present conceptual work and will be identified as future research. Revision: yes.
Circularity Check
No circularity; nested RL proposal is independent of its own outputs
The provided abstract and description present a proposal for a new nested reinforcement learning architecture that exploits distribution network structure to tune protective relays. No equations, parameter fits, self-citations, or derivations are shown that would reduce any claimed prediction or result to the inputs by construction. The central claim is the introduction of this training method itself, which does not rely on self-referential definitions or fitted quantities renamed as predictions. This matches the default expectation of no significant circularity.