pith. sign in

arxiv: 1906.10815 · v1 · pith:GWBQW363new · submitted 2019-06-26 · 📡 eess.SY · cs.SY

Nested Reinforcement Learning Based Control for Protective Relays in Power Distribution Systems

Pith reviewed 2026-05-25 16:05 UTC · model grok-4.3

classification 📡 eess.SY cs.SY
keywords protective relaysreinforcement learningpower distribution systemsdistributed energy resourcesfault detectionrelay settingnested control
0
0 comments X

The pith

Nested reinforcement learning tunes protective relays to distinguish faults from heavy loads in distribution systems with distributed energy resources.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper proposes a nested reinforcement learning method for setting the control logic of protective relays in future power distribution systems. The increasing presence of distributed energy resources creates difficulty in distinguishing between heavy load conditions and actual faults using traditional fixed settings. By exploiting the structural properties of the networks, the nested approach develops specialized training methods for the relays. A sympathetic reader would care because successful implementation could allow reliable protection without frequent manual adjustments or unnecessary outages in modern grids.

Core claim

The paper claims that a new nested reinforcement learning architecture can be used to tune the discrete ON/OFF control devices at branches and nodes in a power network, enabling them to successfully differentiate heavy load and faulty operating conditions in the presence of distributed energy resources.

What carries the argument

The nested reinforcement learning approach that takes advantage of the hierarchical structure of distribution networks to train relay control policies.

Load-bearing premise

A simulation environment can be built that accurately captures the distinction between heavy load and fault conditions across varying DER outputs so that policies learned in simulation transfer to real hardware without unsafe behavior.

What would settle it

Running the trained relay policies on a physical distribution system testbed with varying DER outputs and observing whether relays trip incorrectly on heavy loads or fail to trip on actual faults.

Figures

Figures reproduced from arXiv: 1906.10815 by Dileep Kalathil, Dongqi Wu, Le Xie, Xiangtian Zheng.

Figure 1
Figure 1. Figure 1: IEEE34 node test feeder [PITH_FULL_IMAGE:figures/full_fig_p002_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: Protective relays in a radial network This paper is organized as follows. Section II formulates the relay operation problem. Section III gives a brief review on RL. Section IV provides our new algorithm. Section V presents simulation studies that show the efficiency of the proposed method. Concluding remarks are presented in Section VI. II. PROBLEM FORMULATION In order to precisely characterize the operati… view at source ↗
Figure 3
Figure 3. Figure 3: Convergence Plots of Agents and Comparison of Robustness [PITH_FULL_IMAGE:figures/full_fig_p007_3.png] view at source ↗
read the original abstract

This paper envisions a new control architecture for the protective relay setting in future power distribution systems. With deepening penetration of distributed energy resources at the end users level, it has been recognized as a key engineering challenge to redesign the protective relays in the future distribution system. Conceptually, these protective relays are the discrete ON/OFF control devices at the end of each branch and node in a power network. The key technical difficulty lies in how to set up the relay control logic so that the protection could successfully differentiate heavy load and faulty operating conditions. This paper proposes a new nested reinforcement learning approach to take advantage of the structural properties of distribution networks and develop a new set of training methods for tuning the protective relays.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 1 minor

Summary. The manuscript proposes a nested reinforcement learning architecture for setting protective relays in future power distribution systems with high DER penetration. It claims that exploiting network structural properties will enable new training methods allowing relays to reliably distinguish heavy-load from fault conditions.

Significance. The underlying engineering problem is timely. A working nested-RL solution could supply an adaptive, data-driven alternative to conventional relay coordination. At present the manuscript offers only a high-level concept; therefore any significance remains prospective rather than demonstrated.

major comments (3)
  1. [Abstract/§1] Abstract and §1: no equations, state/action definitions, reward structure, or nesting mechanism are supplied for the proposed RL controller, rendering the central claim of a “new set of training methods” impossible to evaluate.
  2. [§2] §2 (problem statement): the manuscript asserts that simulation can separate heavy-load from fault signatures across DER variability, yet provides neither a model description nor sensitivity analysis to parameter error or measurement noise; this assumption is load-bearing for any claim of safe hardware transfer.
  3. [Results/Validation] No results section or validation subsection exists; consequently there are no training curves, misclassification rates, or hardware-in-the-loop tests against which the transfer assumption can be checked.
minor comments (1)
  1. Notation for relay logic and network topology is introduced only informally; a compact diagram or table would improve clarity.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the thoughtful review. The manuscript is positioned as a conceptual proposal for a nested RL architecture exploiting distribution network structure. We address each major comment below and will revise the manuscript accordingly to include formal definitions and preliminary validation.

read point-by-point responses
  1. Referee: [Abstract/§1] Abstract and §1: no equations, state/action definitions, reward structure, or nesting mechanism are supplied for the proposed RL controller, rendering the central claim of a “new set of training methods” impossible to evaluate.

    Authors: We agree that the current text presents the architecture at a conceptual level without the requested formal elements. In revision we will expand §1 (or add a new subsection) with explicit state/action definitions, reward structure, nesting mechanism, and supporting equations so that the proposed training methods can be evaluated. revision: yes

  2. Referee: [§2] §2 (problem statement): the manuscript asserts that simulation can separate heavy-load from fault signatures across DER variability, yet provides neither a model description nor sensitivity analysis to parameter error or measurement noise; this assumption is load-bearing for any claim of safe hardware transfer.

    Authors: The observation is correct; §2 currently lacks a concrete model description and sensitivity analysis. We will revise §2 to include the simulation model details together with sensitivity results for key parameters and measurement noise, thereby supporting the separability claim. revision: yes

  3. Referee: [Results/Validation] No results section or validation subsection exists; consequently there are no training curves, misclassification rates, or hardware-in-the-loop tests against which the transfer assumption can be checked.

    Authors: We acknowledge the absence of any results section. The revised manuscript will add a dedicated results section presenting preliminary simulation outcomes, training curves, and misclassification rates. Hardware-in-the-loop experiments lie outside the scope of the present conceptual work and will be identified as future research. revision: yes

Circularity Check

0 steps flagged

No circularity; nested RL proposal is independent of its own outputs

full rationale

The provided abstract and description present a proposal for a new nested reinforcement learning architecture that exploits distribution network structure to tune protective relays. No equations, parameter fits, self-citations, or derivations are shown that would reduce any claimed prediction or result to the inputs by construction. The central claim is the introduction of this training method itself, which does not rely on self-referential definitions or fitted quantities renamed as predictions. This matches the default expectation of no significant circularity.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

No mathematical derivations, fitted parameters, or new entities are described in the abstract; the proposal rests on the unstated assumption that RL training in simulation will generalize.

pith-pipeline@v0.9.0 · 5652 in / 1005 out tokens · 29489 ms · 2026-05-25T16:05:21.447189+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

34 extracted references · 34 canonical work pages · 3 internal anchors

  1. [1]

    J. M. Gers and E. J. Holmes, Protection of Electricity Distribution Networks, The Institution of Engineering and Technology, 3rd edition, 2011

  2. [2]

    El-Khattam and T

    W. El-Khattam and T. S. Sidhu, Restoration of directional overcurrent relay coordination in distributed generation systems utilizing fault current limiter, IEEE Transactions on Power Delivery , 2008, vol. 23, no. 2, pp. 576585

  3. [3]

    H. Zhan, C. Wang, Y . Wang, X. Yang, X. Zhang, C. Wu, and Y . Chen, Relay protection coordination integrated optimal placement and sizing of distributed generation sources in distribution networks, IEEE Transactions on Smart Grid , 2016, vol. 7, no. 1, pp. 5565

  4. [4]

    Fault classification and section identification of an advanced series-compensated transmission line using support vector machine

    P. Dash, S. Samantaray, and G. Panda, “Fault classification and section identification of an advanced series-compensated transmission line using support vector machine”, IEEE transactions on power delivery , 2007, vol. 22, no. 1, pp. 6773

  5. [5]

    Yang, W.-Y

    H.-T. Yang, W.-Y . Chang, and C.-L. Huang, ”A new neural networks approach to on-line fault section estimation using information of protective relays and circuit breakers”, IEEE Transactions on Power delivery, 1994, vol. 9, no. 1, pp. 220230

  6. [6]

    Mahat, Z

    P. Mahat, Z. Chen, B. Bak-Jensen and C.L. Bak, ”A Simple Adap- tive Overcurrent Protection of Distribution Systems With Distributed Generation”, IEEE Transactions on Smart Grid , 2011, vol.2, no.3, pp 428-437

  7. [7]

    H. A. Abyane, K. Faez and H. K. Karegar, ”A new method for over- current relay (O/C) using neural network and fuzzy logic”, TENCON ’97 Brisbane - Australia. Proceedings of IEEE TENCON ’97. IEEE Region 10 Annual Conference: Speech and Image Technologies for Computing and Telecommunications (Cat. No.97CH36162) , Brisbane, Queensland, Australia, 1997, pp. 40...

  8. [8]

    D. N. Vishwakarma and Z. Moravej, ”ANN based directional overcur- rent relay”, 2001 IEEE/PES Transmission and Distribution Conference and Exposition. Developing New Perspectives (Cat. No.01CH37294) , 2001, Atlanta, GA, USA, pp. 59-64 vol.1

  9. [9]

    Zhang, M

    Y . Zhang, M. D. Ilic and O. Tonguz, ”Application of Support Vector Machine Classification to Enhanced Protection Relay Logic in Electric Power Grids”, 2007 Large Engineering Systems Conference on Power Engineering, 2007, Montreal, Que., pp. 31-38

  10. [10]

    Zheng, X

    X. Zheng, X. Geng, L. Xie, D. Duan, L. Yang and S. Cui, ”A SVM- based setting of protection relays in distribution systems”, 2018 IEEE Texas Power and Energy Conference (TPEC) , 2018, College Station, TX, pp. 1-6

  11. [11]

    Silver et al., ”Mastering the game of Go with deep neural networks and tree search”, Nature, 2016, vol

    D. Silver et al., ”Mastering the game of Go with deep neural networks and tree search”, Nature, 2016, vol. 529, no. 7587, pp. 484

  12. [12]

    T. P. Lillicrap et al., ”Continuous control with deep reinforcement learning”, arXiv preprint, 2015, arXiv:1509.02971

  13. [13]

    Levine, C

    S. Levine, C. Finn, T. Darrell and P. Abbeel, ”End-to-end training of deep visuomotor policies”, The Journal of Machine Learning Research, 2016, vol. 17, no. 1, pp. 1334-1363

  14. [14]

    Glavic, R

    M. Glavic, R. Fonteneau and D. Ernst, ”Reinforcement Learning for Electric Power System Decision and Control: Past Considerations and Perspectives”, IFAC-PapersOnLine, 2017, vol. 50, no. 1, pp. 6918- 6927

  15. [15]

    B. Kim, Y . Zhang, M. van der Schaar and J. Lee, ”Dynamic Pricing and Energy Consumption Scheduling With Reinforcement Learning”, IEEE Transactions on Smart Grid , 2016, vol. 7, no. 5, pp. 2187-2198

  16. [16]

    Lincoln, S

    R. Lincoln, S. Galloway, B. Stephen and G. Burt, ”Comparing Policy Gradient and Value Function Based Reinforcement Learning Methods in Simulated Electrical Power Trade”, IEEE Transactions on Power Systems, 2012, vol. 27, no. 1, pp. 373-380

  17. [17]

    Y . Xu, W. Zhang, W. Liu and F. Ferrese, ”Multiagent-Based Re- inforcement Learning for Optimal Reactive Power Dispatch”, IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews), 2012, vol. 42, no. 6, pp. 1742-1751

  18. [18]

    T. Yu, B. Zhou, K. W. Chan, L. Chen and B. Yang, ”Stochastic Optimal Relaxed Automatic Generation Control in Non-Markov Environment Based on Multi-Step Q(λ)Learning”, IEEE Transactions on Power Systems, 2011, vol. 26, no. 3, pp. 1272-1282

  19. [19]

    Ruelens, B

    F. Ruelens, B. J. Claessens, S. Vandael, B. De Schutter, R. Babuka and R. Belmans, ”Residential Demand Response of Thermostatically Controlled Loads Using Batch Reinforcement Learning”, IEEE Trans- actions on Smart Grid , 2017, vol. 8, no. 5, pp. 2149-2159

  20. [20]

    Glavic, ”Design of a resistive brake controller for power system stability enhancement using reinforcement learning”, IEEE Transac- tions on Control Systems Technology, 2005, vol

    M. Glavic, ”Design of a resistive brake controller for power system stability enhancement using reinforcement learning”, IEEE Transac- tions on Control Systems Technology, 2005, vol. 13, no. 5, pp. 743-751

  21. [21]

    Ademoye and A

    T. Ademoye and A. Feliachi, ”Reinforcement learning tuned decen- tralized synergetic control of power systems”, Electric Power Systems Research, 2012, vol. 86, pp. 34-40

  22. [22]

    H. C. Kilikiran, B. Kekezoglu and G. N. Paterakis, ”Reinforcement Learning for Optimal Protection Coordination”, 2018 International Conference on Smart Energy Systems and Technologies (SEST) , Sevilla, 2018, pp. 1-6

  23. [23]

    IEEE Distribution System Analysis Subcommittee, ”Radial Test Feeders”, [Online], 2019, Available: http://sites.ieee.org/pes- testfeeders/resources/

  24. [24]

    R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduc- tion, MIT Press, 2nd Edition, 2018

  25. [25]

    Minh et al., ”Human-level control through deep reinforcement learning”, Nature, 2015, 518.7540:529

    V . Minh et al., ”Human-level control through deep reinforcement learning”, Nature, 2015, 518.7540:529

  26. [26]

    OpenAI Gym

    G. Brockman, V . Cheung, L. Pettersson, J. Schneider, J. Schulman, J. Tang and W. Zaremba, OpenAI Gym, 2016, arXiv:1606.01540

  27. [27]

    D. A. S. Jos and S. Elmer, ”Typical expected values of the fault resistance in power systems”, 2010 IEEE/PES Transmission and Distribution Conference and Exposition: Latin America , T and D-LA

  28. [28]

    10.1109/TDC-LA.2010.5762944

    602 - 609. 10.1109/TDC-LA.2010.5762944

  29. [29]

    Multi-Agent Reinforcement Learning: A Report on Challenges and Approaches

    S. Kapoor, ”Multi-Agent Reinforcement Learning: A Report on Challenges and Approaches”, Computing Research Repository, arXiv, 2018, arXiv:1807.09427

  30. [30]

    Kraemer and B

    L. Kraemer and B. Banerjee, ”Multi-Agent Reinforcement Learning as a Rehearsal for Decentralized Planning”, Neuralcomputing, 2016, 190:82-94

  31. [31]

    Plappert, keras-rl, GitHub Repository , [Online], 2016, Available: https://github.com/keras-rl/keras-rl

    M. Plappert, keras-rl, GitHub Repository , [Online], 2016, Available: https://github.com/keras-rl/keras-rl

  32. [32]

    Duda, P.E

    R.O. Duda, P.E. Hart and D.G. Stork, Pattern Classification , John Wiley & sons, 2nd edition, 2002

  33. [33]

    Woodward, ”Record U.S

    M. Woodward, ”Record U.S. electricity generation in 2018 driven by record residential, commercial sales”, Independent Statistics & Analysis, [Online], 2019, U.S. Energy Information Adminstration, Available: https://www.eia.gov/todayinenergy/detail.php?id=38572

  34. [34]

    IEEE Standards Association, ”C37.112-2018 - IEEE Standard for Inverse-Time Characteristics Equations for Overcurrent Relays”, IEEE Standard, 2019, 10.1109/IEEESTD.2019.8635630