Nested Reinforcement Learning Based Control for Protective Relays in Power Distribution Systems

Dileep Kalathil; Dongqi Wu; Le Xie; Xiangtian Zheng

arxiv: 1906.10815 · v1 · pith:GWBQW363new · submitted 2019-06-26 · 📡 eess.SY · cs.SY

Pith reviewed 2026-05-25 16:05 UTC · model grok-4.3

keywords: protective relays, reinforcement learning, power distribution systems, distributed energy resources, fault detection, relay setting, nested control

The pith

Nested reinforcement learning tunes protective relays to distinguish faults from heavy loads in distribution systems with distributed energy resources.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper proposes a nested reinforcement learning method for setting the control logic of protective relays in future power distribution systems. The increasing presence of distributed energy resources creates difficulty in distinguishing between heavy load conditions and actual faults using traditional fixed settings. By exploiting the structural properties of the networks, the nested approach develops specialized training methods for the relays. A sympathetic reader would care because successful implementation could allow reliable protection without frequent manual adjustments or unnecessary outages in modern grids.
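To make the difficulty concrete, here is a minimal sketch of a fixed-pickup overcurrent check. The per-unit currents, the scenario names, and the `Scenario` and `misoperations` helpers are all hypothetical and are not taken from the paper; they only illustrate how DER infeed can make a fault look smaller than a heavy load at the relay.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Scenario:
    name: str
    current_pu: float  # hypothetical current magnitude seen by the relay (per unit)
    is_fault: bool     # ground truth: should the relay trip?


def fixed_threshold_trips(current_pu: float, pickup_pu: float) -> bool:
    """Conventional fixed setting: trip whenever current reaches the pickup."""
    return current_pu >= pickup_pu


# Illustrative values only. The last scenario assumes a downstream DER supplies
# part of the fault current, so the upstream relay measures less than it would
# without DER.
SCENARIOS = [
    Scenario("light load", 0.6, False),
    Scenario("heavy load", 1.4, False),
    Scenario("fault, no DER infeed", 3.0, True),
    Scenario("fault, DER supplies part of the fault current", 1.3, True),
]


def misoperations(pickup_pu: float) -> List[str]:
    """Names of scenarios where the fixed setting makes the wrong decision."""
    return [
        s.name
        for s in SCENARIOS
        if fixed_threshold_trips(s.current_pu, pickup_pu) != s.is_fault
    ]


if __name__ == "__main__":
    for pickup in (1.0, 1.35, 1.5):
        print(pickup, misoperations(pickup))
```

With these illustrative numbers, no single pickup value classifies all four scenarios correctly, because the DER-fed fault draws less current at the relay than the heavy load does.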

Core claim

The paper claims that a new nested reinforcement learning architecture can be used to tune the discrete ON/OFF control devices at branches and nodes in a power network, enabling them to successfully differentiate heavy load and faulty operating conditions in the presence of distributed energy resources.

What carries the argument

The nested reinforcement learning approach that takes advantage of the hierarchical structure of distribution networks to train relay control policies.
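The text here does not say how the nesting is carried out. One plausible reading, assumed for this sketch and not confirmed by the source, is that relays furthest downstream are trained first and each upstream relay is trained afterwards with its downstream policies held fixed. Under that assumption the training order is a post-order traversal of the radial network. The relay names and the `nested_training_order` function are hypothetical.

```python
from typing import Dict, List


def nested_training_order(children: Dict[str, List[str]], root: str) -> List[str]:
    """Return relays in post-order: every relay appears after all relays
    downstream of it. `children` maps a relay to the relays directly below it."""
    order: List[str] = []
    stack = [(root, False)]  # (relay, whether its children were already pushed)
    while stack:
        node, expanded = stack.pop()
        if expanded:
            order.append(node)
            continue
        stack.append((node, True))
        # Reverse so children are visited in their listed order.
        for child in reversed(children.get(node, [])):
            stack.append((child, False))
    return order


if __name__ == "__main__":
    # Hypothetical radial feeder: R1 at the substation, R4 at the far end.
    feeder = {"R1": ["R2", "R3"], "R2": ["R4"]}
    print(nested_training_order(feeder, "R1"))
```

In this ordering each agent would only ever face downstream neighbours whose behaviour is already fixed, which is one way a tree topology could keep the multi-agent training problem tractable.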

Load-bearing premise

A simulation environment can be built that accurately captures the distinction between heavy load and fault conditions across varying DER outputs so that policies learned in simulation transfer to real hardware without unsafe behavior.

What would settle it

Running the trained relay policies on a physical distribution system testbed with varying DER outputs and observing whether relays trip incorrectly on heavy loads or fail to trip on actual faults.
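The outcome of such a test can be summarized with two rates. The sketch below is a hypothetical scoring helper, not something defined in the paper: it assumes each trial is recorded as a pair `(is_fault, tripped)` and reports how often the relay tripped without a fault and how often it failed to trip on a fault.

```python
from typing import Iterable, Tuple


def misoperation_rates(trials: Iterable[Tuple[bool, bool]]) -> Tuple[float, float]:
    """Return (nuisance_trip_rate, missed_trip_rate).

    Each trial is (is_fault, tripped). The nuisance rate is computed over
    non-fault trials, the missed rate over fault trials.
    """
    nuisance = missed = n_normal = n_fault = 0
    for is_fault, tripped in trials:
        if is_fault:
            n_fault += 1
            if not tripped:
                missed += 1
        else:
            n_normal += 1
            if tripped:
                nuisance += 1
    if n_normal == 0 or n_fault == 0:
        raise ValueError("need at least one fault trial and one non-fault trial")
    return nuisance / n_normal, missed / n_fault


if __name__ == "__main__":
    # Hypothetical log: four heavy-load trials, two fault trials.
    log = [(False, False), (False, True), (False, False), (False, False),
           (True, True), (True, False)]
    print(misoperation_rates(log))
```

Reporting the two rates separately matters because the costs differ: a nuisance trip causes an unnecessary outage, while a missed trip leaves a fault energized.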

Figures

Figures reproduced from arXiv: 1906.10815 by Dileep Kalathil, Dongqi Wu, Le Xie, Xiangtian Zheng.

**Figure 2.** Protective relays in a radial network.

**Figure 3.** Convergence plots of agents and comparison of robustness.

Original abstract

This paper envisions a new control architecture for the protective relay setting in future power distribution systems. With deepening penetration of distributed energy resources at the end users level, it has been recognized as a key engineering challenge to redesign the protective relays in the future distribution system. Conceptually, these protective relays are the discrete ON/OFF control devices at the end of each branch and node in a power network. The key technical difficulty lies in how to set up the relay control logic so that the protection could successfully differentiate heavy load and faulty operating conditions. This paper proposes a new nested reinforcement learning approach to take advantage of the structural properties of distribution networks and develop a new set of training methods for tuning the protective relays.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

This is a conceptual proposal for nested RL on relay tuning with no implementation details or results, so the central claim stays untested.

The paper's main point is a high-level idea for using nested reinforcement learning to tune protective relays in distribution systems with rising DER penetration. It frames the relays as discrete controls at branches and nodes, and argues that nesting the RL problems along the network structure could make training tractable for distinguishing heavy load from faults. That structural nesting is the one element that feels new relative to standard RL applications in power systems protection.

The authors correctly flag the engineering issue: fixed relay settings break down when DER output varies, so adaptive policies learned in simulation could help. They also note that the tree topology of distribution networks offers a natural hierarchy for nesting the agents. Those observations are fair and point to a real problem.

Beyond that, the manuscript supplies no equations for the state or reward, no description of how the inner and outer loops are trained, no simulation setup, and no numerical results. The soundness therefore rests entirely on the unverified assumption that a simulator can be built that separates fault signatures from load changes across DER variability without unsafe transfer to hardware. If that assumption fails, the learned policies will either nuisance-trip or miss real faults. The stress-test concern lands directly on the paper's central requirement and is not addressed in the text.

This work is aimed at the small group of researchers already combining RL with power-system protection who might want to develop the idea further. It does not yet contain enough technical content or evidence to justify sending it to referees; a serious review would need at least a reproducible simulation study with sensitivity checks on model error and DER scenarios. I would not cite it or bring it to a reading group until those pieces appear.

Referee Report

3 major / 1 minor

Summary. The manuscript proposes a nested reinforcement learning architecture for setting protective relays in future power distribution systems with high DER penetration. It claims that exploiting network structural properties will enable new training methods allowing relays to reliably distinguish heavy-load from fault conditions.

Significance. The underlying engineering problem is timely. A working nested-RL solution could supply an adaptive, data-driven alternative to conventional relay coordination. At present the manuscript offers only a high-level concept; therefore any significance remains prospective rather than demonstrated.

major comments (3)

[Abstract/§1] Abstract and §1: no equations, state/action definitions, reward structure, or nesting mechanism are supplied for the proposed RL controller, rendering the central claim of a “new set of training methods” impossible to evaluate.
[§2] §2 (problem statement): the manuscript asserts that simulation can separate heavy-load from fault signatures across DER variability, yet provides neither a model description nor sensitivity analysis to parameter error or measurement noise; this assumption is load-bearing for any claim of safe hardware transfer.
[Results/Validation] No results section or validation subsection exists; consequently there are no training curves, misclassification rates, or hardware-in-the-loop tests against which the transfer assumption can be checked.

minor comments (1)

Notation for relay logic and network topology is introduced only informally; a compact diagram or table would improve clarity.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the thoughtful review. The manuscript is positioned as a conceptual proposal for a nested RL architecture exploiting distribution network structure. We address each major comment below and will revise the manuscript accordingly to include formal definitions and preliminary validation.

Referee: [Abstract/§1] Abstract and §1: no equations, state/action definitions, reward structure, or nesting mechanism are supplied for the proposed RL controller, rendering the central claim of a “new set of training methods” impossible to evaluate.

Authors: We agree that the current text presents the architecture at a conceptual level without the requested formal elements. In revision we will expand §1 (or add a new subsection) with explicit state/action definitions, reward structure, nesting mechanism, and supporting equations so that the proposed training methods can be evaluated.

revision: yes

Referee: [§2] §2 (problem statement): the manuscript asserts that simulation can separate heavy-load from fault signatures across DER variability, yet provides neither a model description nor sensitivity analysis to parameter error or measurement noise; this assumption is load-bearing for any claim of safe hardware transfer.

Authors: The observation is correct; §2 currently lacks a concrete model description and sensitivity analysis. We will revise §2 to include the simulation model details together with sensitivity results for key parameters and measurement noise, thereby supporting the separability claim.

revision: yes

Referee: [Results/Validation] No results section or validation subsection exists; consequently there are no training curves, misclassification rates, or hardware-in-the-loop tests against which the transfer assumption can be checked.

Authors: We acknowledge the absence of any results section. The revised manuscript will add a dedicated results section presenting preliminary simulation outcomes, training curves, and misclassification rates. Hardware-in-the-loop experiments lie outside the scope of the present conceptual work and will be identified as future research.

revision: yes

Circularity Check

0 steps flagged

No circularity; nested RL proposal is independent of its own outputs

The provided abstract and description present a proposal for a new nested reinforcement learning architecture that exploits distribution network structure to tune protective relays. No equations, parameter fits, self-citations, or derivations are shown that would reduce any claimed prediction or result to the inputs by construction. The central claim is the introduction of this training method itself, which does not rely on self-referential definitions or fitted quantities renamed as predictions. This matches the default expectation of no significant circularity.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

No mathematical derivations, fitted parameters, or new entities are described in the abstract; the proposal rests on the unstated assumption that RL training in simulation will generalize.
