Robust On-Line ADP-based Solution of a Class of Hierarchical Nonlinear Differential Game

Abolhassan Razminia; Hamed Kebriaei; Mohammad Javad Yazdanpanah; Mohammad reza Satouri

arxiv: 1907.11414 · v1 · pith:3RN5TYJMnew · submitted 2019-07-26 · 📡 eess.SY · cs.SY

Mohammad reza Satouri, Hamed Kebriaei, Abolhassan Razminia, Mohammad Javad Yazdanpanah

Pith reviewed 2026-05-24 15:35 UTC · model grok-4.3

classification 📡 eess.SY cs.SY

keywords adaptive dynamic programming · hierarchical differential game · reinforcement learning · nonlinear systems · disturbance · policy iteration · zero-sum game · nonzero-sum game

The pith

An ADP algorithm solves hierarchical nonlinear differential games while cutting neural network usage by thirty percent.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops an adaptive dynamic programming approach for a hierarchical game with one leader and multiple followers in continuous-time nonlinear systems that include disturbances. The setup mixes zero-sum elements to handle worst-case disturbances with nonzero-sum interactions among the players. A policy iteration reinforcement learning technique estimates the required value functions, control policies, and disturbances using about thirty percent fewer neural networks than conventional methods. Convergence of these estimates is established through Lyapunov theory together with properties of the Nemytskii operator. The result is an online procedure that produces optimal strategies for the combined game model.

Core claim

The proposed ADP method achieves optimal control strategies under the worst-case disturbance for the hierarchical one-leader-multi-followers game by integrating zero-sum and nonzero-sum game models, while reducing the number of neural networks used for estimation by about thirty percent, with convergence guaranteed via Lyapunov analysis and Nemytskii operator properties.

What carries the argument

Policy iteration reinforcement learning inside adaptive dynamic programming that jointly estimates value functions, control policies, and disturbances with reduced neural networks.
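
The policy-iteration idea can be illustrated on a much simpler problem than the paper's game. The sketch below is a single-player toy, not the authors' algorithm: it assumes scalar dynamics `xdot = -x + u`, running cost `x**2 + u**2`, and a one-weight critic `V(x) = w * x**2`. The names `evaluate_policy` and `policy_iteration`, the step size, and the sampling range are all hypothetical choices made for illustration. The critic weight is tuned by normalized gradient descent on the squared Bellman residual, then the policy is improved from the critic.

```python
import numpy as np

def evaluate_policy(k, alpha=0.5, steps=5000, seed=0):
    """Policy evaluation: fit w in V(x) = w * x**2 for the policy u = -k * x."""
    rng = np.random.default_rng(seed)
    w = 0.0
    for _ in range(steps):
        x = rng.uniform(-1.0, 1.0)          # sampled state
        u = -k * x                          # current policy
        xdot = -x + u                       # toy dynamics (assumed, not from the paper)
        sigma = 2.0 * x * xdot              # derivative of the residual w.r.t. w
        residual = x**2 + u**2 + w * sigma  # Bellman residual for the fixed policy
        # normalized gradient-descent step on 0.5 * residual**2
        w -= alpha * sigma * residual / (1.0 + sigma**2) ** 2
    return w

def policy_iteration(k0=1.0, iterations=6):
    """Alternate policy evaluation and policy improvement."""
    k = k0
    w = 0.0
    for _ in range(iterations):
        w = evaluate_policy(k)
        k = w  # improvement step: u = -0.5 * dV/dx = -w * x
    return w

w_star = policy_iteration()
print(w_star, np.sqrt(2.0) - 1.0)
```

For this toy problem the optimal weight solves `w**2 + 2*w - 1 = 0`, so the learned value can be compared against `sqrt(2) - 1`. The paper's setting replaces the single critic with several networks for value functions, policies, and disturbances.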

If this is right

The method yields robust optimal control for continuous-time nonlinear systems under disturbances in a hierarchical setting.
Convergence of the neural-network estimates is assured by Lyapunov theory and Nemytskii operator properties.
Both zero-sum and nonzero-sum aspects are handled inside a single algorithm.
The procedure runs online and requires no prior offline solution of the game.
Simulation examples confirm that the reduced network count still produces effective control policies.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

If the thirty-percent network reduction scales with system size, the method could lower real-time computational load in embedded controllers.
Applying the same structure to discrete-time or partially observed systems would test whether the mixed-game formulation remains tractable.
The coexistence of competitive and cooperative elements suggests the algorithm could address other multi-agent problems that contain both adversarial and shared objectives.
Hardware experiments on physical plants would reveal whether the theoretical guarantees survive sensor noise and actuator limits.

Load-bearing premise

The hierarchical game can be modeled as a simultaneous combination of zero-sum and nonzero-sum games for continuous-time nonlinear systems, with convergence following from Lyapunov theory and Nemytskii operator properties.

What would settle it

A concrete simulation of a nonlinear system in which the algorithm either fails to reach the claimed optimal strategies or fails to achieve the stated thirty percent reduction in neural networks.

Figures

Figures reproduced from arXiv: 1907.11414 by Abolhassan Razminia, Hamed Kebriaei, Mohammad Javad Yazdanpanah, Mohammad reza Satouri.

**Figure 1.** Game model. A dynamical system with one leader and $N$ followers playing over a state space whose evolution is dictated by the following differential equation: $$\dot{x}(t) = f(x) + \sum_{j=1}^{N} g_j(x)\,u_j + p(x)\,\nu + h(x)\,\omega \tag{1}$$ where $x(t) \in X \subset \mathbb{R}^{n}$, $u_j(t) \in U_j \subset \mathbb{R}^{m_j}$, $\nu(t) \in \mathcal{N} \subset \mathbb{R}^{\alpha}$, and $\omega(t) \in W \subset \mathbb{R}^{w}$ are the state vector, the controls or actions of the followers, the control or action of the leader, and the disturbance, respectively. Moreover, f …
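
The structure of Eq. (1) can be made concrete with a small numerical sketch. Everything specific here is an assumption for illustration: the two-state drift `f`, the input maps `g_list`, `p`, and `h`, the scalar inputs, and the forward-Euler integrator are hypothetical and are not taken from the paper. Only the additive form of the right-hand side follows the caption.

```python
import numpy as np

def xdot(x, u_followers, nu, omega, f, g_list, p, h):
    """Right-hand side of Eq. (1): f(x) + sum_j g_j(x) u_j + p(x) nu + h(x) omega."""
    dx = f(x)
    for g_j, u_j in zip(g_list, u_followers):
        dx = dx + g_j(x) @ u_j
    return dx + p(x) @ nu + h(x) @ omega

# Hypothetical example: 2 states, N = 2 followers, scalar inputs throughout.
def f(x):
    return np.array([-x[0] + x[1], -0.5 * (x[0] + x[1])])

g_list = [
    lambda x: np.array([[0.0], [1.0]]),   # follower 1 input map
    lambda x: np.array([[0.0], [x[0]]]),  # follower 2 input map (state dependent)
]

def p(x):
    return np.array([[0.0], [1.0]])       # leader input map

def h(x):
    return np.array([[0.0], [0.5]])       # disturbance input map

def euler_unforced(x0, steps=2000, dt=0.01):
    """Forward-Euler rollout with all inputs set to zero."""
    x = np.array(x0, dtype=float)
    zero = np.zeros(1)
    for _ in range(steps):
        x = x + dt * xdot(x, [zero, zero], zero, zero, f, g_list, p, h)
    return x

rhs = xdot(np.array([1.0, 2.0]),
           [np.array([1.0]), np.array([2.0])],
           np.array([3.0]), np.array([4.0]),
           f, g_list, p, h)
print(rhs)
print(euler_unforced([1.0, -1.0]))
```

Passing the followers' inputs as a list keeps the sum over `j` explicit, so changing the number of followers only changes the length of `g_list` and `u_followers`.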

**Figure 2.** Convergence of the critic NNs. The parameters of the performance index functions are $Q_1(x) = 2x_1^2 + x_2^2$, $Q_2(x) = x_1^2 + 4x_2^2$, $Q_3(x) = x_1^4 + 2x_2^2$, $S_1 = 4$, $S_2 = 2$, $S_3 = 20$ (since in real situations the effect of the leader is greater than the effect of the followers, $S_3$ is larger than $S_2$ and $S_1$), $R_{11} = 4$, $R_{12} = 1$, $R_{21} = 1$, $R_{22} = 2$, $R_{31} = 1$, $R_{32} = 1$, and the disturbance attenuation $\gamma^2 = 0.6$. The initial state x1…
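
For readers who want to reuse the simulation settings, the parameters quoted in the Figure 2 caption can be collected as follows. The numerical values come from the caption; the container names (`Q`, `S`, `R`, `gamma_sq`) and the reading of index 3 as the leader are assumptions, and the caption excerpt does not show how these quantities enter the cost functionals.

```python
import numpy as np

# State penalties Q_i(x) from the Figure 2 caption, with x = (x1, x2).
Q = [
    lambda x: 2.0 * x[0] ** 2 + x[1] ** 2,   # Q_1
    lambda x: x[0] ** 2 + 4.0 * x[1] ** 2,   # Q_2
    lambda x: x[0] ** 4 + 2.0 * x[1] ** 2,   # Q_3
]

# S_1, S_2, S_3 from the caption; S_3 is the largest, which the caption
# attributes to the leader's stronger effect.
S = [4.0, 2.0, 20.0]

# R_ij from the caption, stored row-wise: R[i - 1, j - 1] = R_ij.
R = np.array([[4.0, 1.0],
              [1.0, 2.0],
              [1.0, 1.0]])

gamma_sq = 0.6  # disturbance attenuation, gamma squared

x = np.array([1.0, 1.0])
print([q(x) for q in Q])
```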

**Figure 3.** Convergence of the leader NN.

**Figure 4.** Evolution of the system states. VI. CONCLUSION: In this paper, an on-line ADP-based method is developed for solving a class of hierarchical one-leader-multi-followers nonlinear differential games. The game discussed here is made up of both zero-sum and nonzero-sum games. In the proposed algorithm, the value functions are approximated with NNs and the approximations are improved by gradient descent. Also the actor NN…

Original abstract

In this paper, a hierarchical one-leader-multi-followers game for a class of continuous-time nonlinear systems with disturbance is investigated by a novel policy iteration reinforcement learning technique, in which the game model consists of zero-sum and nonzero-sum games simultaneously. An adaptive dynamic programming (ADP) method is developed to achieve the optimal control strategy under the worst case of disturbance. This algorithm reduces the number of neural networks used for estimation by about thirty percent. The proposed algorithm uses neural networks to estimate value functions, control policies, and disturbances. Convergence of the estimates is analyzed using Lyapunov theory and properties of the Nemytskii operator. Finally, the simulation results will show effectiveness of the developed ADP method.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

This paper gives an incremental ADP extension for mixed zero/nonzero-sum hierarchical games that trims neural-network count by about 30 percent, but the convergence step via the Nemytskii operator looks under-specified.

The core contribution here is a policy-iteration ADP scheme that handles a one-leader multi-follower setup on continuous-time nonlinear systems with disturbance, treating the game as both zero-sum and nonzero-sum at once. It claims to cut the usual neural-network load for value functions, policies, and disturbances by roughly thirty percent while still guaranteeing an optimal worst-case strategy. Simulations are said to confirm the approach works in practice. That reduction in network count is the clearest practical step forward for this narrow slice of ADP work on differential games; most prior methods keep separate approximators for each player and each cost type, so any verified saving is worth noting for implementers who care about online computation.

Referee Report

2 major / 2 minor

Summary. The paper proposes an ADP-based policy iteration algorithm for a hierarchical one-leader-multi-followers differential game on continuous-time nonlinear systems subject to disturbances. The model simultaneously incorporates zero-sum and nonzero-sum game elements; neural networks approximate value functions, policies, and disturbances. The method claims an approximately 30% reduction in the number of networks while achieving optimal strategies under worst-case disturbance. Convergence of the estimates is asserted via Lyapunov analysis combined with properties of the Nemytskii operator, and effectiveness is illustrated by simulation.

Significance. If the convergence argument can be completed with the required operator conditions, the work would provide a concrete reduction in approximator count for mixed game problems, which is a practical contribution to ADP methods for hierarchical control. The explicit handling of both game types within a single ADP framework is a distinguishing feature that could influence subsequent research on multi-agent differential games.

major comments (2)

[Convergence analysis] Convergence analysis section: the proof invokes Nemytskii operator properties on the estimation errors/value-function mappings but records no explicit verification that the neural-network approximators satisfy the Carathéodory conditions (measurability in t, continuity in the state variable) or the requisite growth bounds under the combined zero-sum/nonzero-sum costs and worst-case disturbance. These conditions are load-bearing for the operator to be well-defined and for the Lyapunov argument to close.
[§3 and abstract] §3 (algorithm description) and abstract: the claim that the algorithm 'reduces the number of neural networks … by about thirty percent' is stated without a tabulated baseline count of networks required by a standard ADP treatment of the same hierarchical game or an explicit accounting of which estimators are eliminated while still covering value functions, policies, and disturbances.
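
For reference, the conditions named in the first major comment are standard in nonlinear analysis and can be stated compactly. The symbols $\phi$, $\Omega$, $a$, $b$, $p$, and $q$ below are generic notation introduced here for illustration and are not the paper's. A function $\phi : \Omega \times \mathbb{R}^n \to \mathbb{R}$ is a Carathéodory function when

```latex
\begin{align*}
&\text{(i)}\quad t \mapsto \phi(t, x) \text{ is measurable for every } x \in \mathbb{R}^n, \\
&\text{(ii)}\quad x \mapsto \phi(t, x) \text{ is continuous for almost every } t \in \Omega .
\end{align*}
```

The associated Nemytskii (superposition) operator and a typical growth bound are

```latex
\begin{align*}
N_\phi(u)(t) &= \phi\bigl(t, u(t)\bigr), \\
|\phi(t, x)| &\le a(t) + b\,|x|^{p/q}, \qquad a \in L^q(\Omega),\ b \ge 0,\ 1 \le p, q < \infty .
\end{align*}
```

Under (i), (ii), and this growth bound, $N_\phi$ maps $L^p(\Omega; \mathbb{R}^n)$ into $L^q(\Omega)$ and is continuous and bounded. This is the kind of verification the referee asks the authors to record for the neural-network approximators.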

minor comments (2)

[Abstract] Abstract: the phrase 'the simulation results will show effectiveness' should be changed to present tense.
Notation: the hierarchical structure (leader vs. followers, zero-sum vs. nonzero-sum subgames) would benefit from an explicit diagram or a compact equation block that distinguishes the cost functionals.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive report. The two major comments identify areas where the manuscript can be strengthened with additional rigor and clarity. We address each point below and will incorporate the suggested revisions in the next version.

Referee: Convergence analysis section: the proof invokes Nemytskii operator properties on the estimation errors/value-function mappings but records no explicit verification that the neural-network approximators satisfy the Carathéodory conditions (measurability in t, continuity in the state variable) or the requisite growth bounds under the combined zero-sum/nonzero-sum costs and worst-case disturbance. These conditions are load-bearing for the operator to be well-defined and for the Lyapunov argument to close.

Authors: We agree that the convergence section would benefit from an explicit verification step. The neural-network approximators are constructed as continuous functions of the state (standard radial-basis or polynomial forms) and the time dependence enters only through the measurable disturbance and control signals, satisfying Carathéodory conditions by construction. Growth bounds follow from the quadratic cost structure and the boundedness assumptions already stated on the disturbance set. In the revised manuscript we will insert a short lemma (or appendix paragraph) that records these verifications before invoking the Nemytskii operator, thereby closing the Lyapunov argument rigorously. revision: yes
Referee: §3 (algorithm description) and abstract: the claim that the algorithm 'reduces the number of neural networks … by about thirty percent' is stated without a tabulated baseline count of networks required by a standard ADP treatment of the same hierarchical game or an explicit accounting of which estimators are eliminated while still covering value functions, policies, and disturbances.

Authors: The 30 % figure arises from replacing separate disturbance estimators for each follower with a single shared worst-case disturbance approximator that is reused across both the zero-sum leader-follower subgame and the nonzero-sum follower subgames. We acknowledge that the current text lacks an explicit side-by-side count. In the revision we will add a table in §3 that lists (i) the network count for a conventional ADP formulation of the identical hierarchical game and (ii) the reduced count achieved by our shared approximators, together with a brief accounting of which estimators are eliminated while still covering all value functions, policies, and disturbances. revision: yes
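
The side-by-side count requested by the referee can be sketched as simple arithmetic. The accounting below is a hypothetical reading of the simulated rebuttal, not the paper's table: it assumes a conventional scheme with one critic, one actor, and one disturbance network per player, and a reduced scheme with one critic and one actor per player plus a single shared disturbance network. The function names are illustrative.

```python
from typing import Tuple

def network_counts(n_players: int) -> Tuple[int, int]:
    """Return (baseline, reduced) network counts under the assumed accounting."""
    baseline = 3 * n_players     # critic + actor + disturbance per player
    reduced = 2 * n_players + 1  # critic + actor per player, one shared disturbance
    return baseline, reduced

def reduction_pct(baseline: int, reduced: int) -> float:
    """Percentage of networks saved relative to the baseline."""
    return 100.0 * (baseline - reduced) / baseline

for n in (3, 10):
    base, red = network_counts(n)
    print(n, base, red, round(reduction_pct(base, red), 1))
```

Under this particular assumption the saving depends on the number of players, about 22 percent for three players and 30 percent for ten, which shows why an explicit baseline table is needed before the thirty percent figure can be checked.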

Circularity Check

0 steps flagged

No significant circularity; derivation uses external Lyapunov and operator theory without self-referential reduction

The paper's central claims rest on an ADP policy-iteration scheme for a mixed zero-sum/nonzero-sum hierarchical game, with convergence asserted via Lyapunov stability plus Nemytskii-operator properties on the estimation errors. No equations or steps in the provided text reduce a claimed prediction or uniqueness result to a fitted parameter or to a self-citation whose content is itself the target result. The 30 % NN reduction is presented as an algorithmic outcome rather than a definitional identity, and the convergence argument invokes standard external theorems rather than an ansatz or renaming that collapses to the paper's own inputs. The derivation chain therefore remains self-contained against the cited mathematical machinery.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review supplies no explicit free parameters, axioms, or invented entities; the central claim rests on unstated modeling assumptions about the game structure and system class.

pith-pipeline@v0.9.0 · 5664 in / 1168 out tokens · 35237 ms · 2026-05-24T15:35:22.871948+00:00 · methodology


[1] [1]

Optimal control,

R. W. H. Sargent, “Optimal control,” J. Computational Appl. Math. , vol. 124, no. 1-2, pp. 361–371, 2000

work page 2000

[2] [2]

Cooperative optimal control of battery energy storage system under wind uncertainties in a microgrid,

T. Zhao and Z. Ding, “Cooperative optimal control of battery energy storage system under wind uncertainties in a microgrid,” IEEE Trans. Power Syst., vol. 33, no. 2, pp. 2292–2300, 2018

work page 2018

[3] [3]

Hex2oqtal: Translational optimal control exploiting quaternion error dynamics,

P. Ghiglino and J. L. Forshaw, “Hex2oqtal: Translational optimal control exploiting quaternion error dynamics,” IEEE Trans. Aerosp. Electron. Syst., vol. 53, no. 3, pp. 1181–1195, 2017

work page 2017

[4] [4]

Neural network-based solutions for stochastic optimal control using path in- tegrals,

K. Rajagopal, S. N. Balakrishnan, and J. R. Busemeyer, “Neural network-based solutions for stochastic optimal control using path in- tegrals,” IEEE Trans. Neural Netw. Learn. Syst., vol. 28, no. 3, pp. 534– 545, 2017

work page 2017

[5] [5]

Dynamic optimization and learning for renewal systems,

M. J. Neely, “Dynamic optimization and learning for renewal systems,” IEEE Trans. Autom. Control , vol. 58, no. 1, pp. 32–46, 2013

work page 2013

[6] [6]

Reinforcement learning and adaptive dynamic programming for feedback control,

F. L. Lewis and D. Vrabie, “Reinforcement learning and adaptive dynamic programming for feedback control,” IEEE Circuits Syst. Mag. , vol. 9, no. 3, pp. 32–50, 2009

work page 2009

[7] [7]

Value and policy iterations in optimal control and adaptive dynamic programming,

D. P. Bertsekas, “Value and policy iterations in optimal control and adaptive dynamic programming,”IEEE Trans. Neural Netw. Learn. Syst., vol. 28, no. 3, pp. 500–509, 2017

work page 2017

[8] [8]

Global adaptive dynamic programming for continuous-time nonlinear systems,

Y . Jiang and Z. Jiang, “Global adaptive dynamic programming for continuous-time nonlinear systems,” IEEE Trans. Autom. Control , vol. 60, no. 11, pp. 2917–2929, 2015

work page 2015

[9] [9]

Data-driven tracking control with adaptive dynamic programming for a class of continuous-time nonlinear systems,

C. Mu, Z. Ni, C. Sun, and H. He, “Data-driven tracking control with adaptive dynamic programming for a class of continuous-time nonlinear systems,” IEEE Trans. Cybern. , vol. 47, no. 6, pp. 1460–1470, 2017

work page 2017

[10] [10]

Value iteration adaptive dynamic pro- gramming for optimal control of discrete-time nonlinear systems,

Q. Wei, D. Liu, and H. Lin, “Value iteration adaptive dynamic pro- gramming for optimal control of discrete-time nonlinear systems,” IEEE Trans. Cybern., vol. 46, no. 3, pp. 840–853, 2016. xii

work page 2016

[11] W. Lu, P. Zhu, and S. Ferrari, “A hybrid-adaptive dynamic programming approach for the model-free control of nonlinear switched systems,” IEEE Trans. Autom. Control, vol. 61, no. 10, pp. 3203–3208, 2016.

[12] M. Palanisamy, H. Modares, F. L. Lewis, and M. Aurangzeb, “Continuous-time Q-learning for infinite-horizon discounted cost linear quadratic regulator problems,” IEEE Trans. Cybern., vol. 45, no. 2, pp. 165–176, 2015.

[13] H. Zhang, C. Qin, B. Jiang, and Y. Luo, “Online adaptive policy learning algorithm for H∞ state feedback control of unknown affine nonlinear discrete-time systems,” IEEE Trans. Cybern., vol. 44, no. 12, pp. 2706–2718, 2014.

[14] F. L. Lewis and K. G. Vamvoudakis, “Reinforcement learning for partially observable dynamic processes: Adaptive dynamic programming using measured output data,” IEEE Trans. Syst., Man, Cybern., B, vol. 41, no. 1, pp. 14–25, 2011.

[15] H. Xu, Q. Zhao, and S. Jagannathan, “Finite-horizon near-optimal output feedback neural network control of quantized nonlinear discrete-time systems with input constraint,” IEEE Trans. Neural Netw. Learn. Syst., vol. 26, no. 8, pp. 1776–1788, 2015.

[16] W. B. Powell, Approximate Dynamic Programming: Solving the Curses of Dimensionality. NJ, USA: Wiley, 2011.

[17] Y. Tang, H. He, J. Wen, and J. Liu, “Power system stability control for a wind farm based on adaptive dynamic programming,” IEEE Trans. Smart Grid, vol. 6, no. 1, pp. 166–177, 2015.

[18] Y. Jiang and Z. Jiang, “Robust adaptive dynamic programming with an application to power systems,” IEEE Trans. Neural Netw. Learn. Syst., vol. 24, no. 7, pp. 1150–1156, 2013.

[19] Q. Wei, D. Liu, F. L. Lewis, Y. Liu, and J. Zhang, “Mixed iterative adaptive dynamic programming for optimal battery energy control in smart residential microgrids,” IEEE Trans. Ind. Electron., vol. 64, no. 5, pp. 4110–4120, 2017.

[20] Q. Wei, D. Liu, G. Shi, and Y. Liu, “Multibattery optimal coordination control for home energy management systems via distributed iterative adaptive dynamic programming,” IEEE Trans. Ind. Electron., vol. 62, no. 7, pp. 4203–4214, 2015.

[21] S. Chen, Y. Yang, S. N. Balakrishnan, N. T. Nguyen, and K. Krishnakumar, “SNAC convergence and use in adaptive autopilot design,” in 2009 Int. Joint Conf. Neural Netw., pp. 530–537, 2009.

[22] “Adaptive critic autopilot design of bank-to-turn missiles using fuzzy basis function networks,” IEEE Trans. Syst., Man, Cybern. B, vol. 35, no. 2, pp. 197–207, 2005.

[23] K. G. Vamvoudakis and F. L. Lewis, “Multi-player non-zero-sum games: Online adaptive learning solution of coupled Hamilton-Jacobi equations,” Automatica, vol. 47, no. 8, pp. 1556–1569, 2011.

[24] H. Zhang, Q. Wei, and D. Liu, “An iterative adaptive dynamic programming method for solving a class of nonlinear zero-sum differential games,” Automatica, vol. 47, no. 1, pp. 207–214, 2011.

[25] Q. Wei, R. Song, and P. Yan, “Data-driven zero-sum neuro-optimal control for a class of continuous-time unknown nonlinear systems with disturbance using ADP,” IEEE Trans. Neural Netw. Learn. Syst., vol. 27, no. 2, pp. 444–458, 2016.

[26] A. Al-Tamimi, F. L. Lewis, and M. Abu-Khalaf, “Model-free Q-learning designs for linear discrete-time zero-sum games with application to H-infinity control,” Automatica, vol. 43, no. 3, pp. 473–481, 2007.

[27] D. Vrabie and F. L. Lewis, “Adaptive dynamic programming for online solution of a zero-sum differential game,” J. Control Theory Appl., vol. 9, no. 3, pp. 353–360, 2011.

[28] Y. Zhu, D. Zhao, and X. Li, “Iterative adaptive dynamic programming for solving unknown nonlinear zero-sum game based on online data,” IEEE Trans. Neural Netw. Learn. Syst., vol. 28, no. 3, pp. 714–725, 2017.

[29] M. Abu-Khalaf, F. L. Lewis, and J. Huang, “Neurodynamic programming and zero-sum games for constrained control systems,” IEEE Trans. Neural Netw., vol. 19, no. 7, pp. 1243–1252, 2008.

[30] Y. Fu, J. Fu, and T. Chai, “Robust adaptive dynamic programming of two-player zero-sum games for continuous-time linear systems,” IEEE Trans. Neural Netw. Learn. Syst., vol. 26, no. 12, pp. 3314–3319, 2015.

[31] “Online partially model-free solution of two-player zero sum differential games,” IFAC Proceedings Volumes, vol. 46, no. 32, pp. 696–701, 2013.

[32] H. Zhang, H. Jiang, C. Luo, and G. Xiao, “Discrete-time nonzero-sum games for multiplayer using policy-iteration-based adaptive dynamic programming algorithms,” IEEE Trans. Cybern., vol. 47, no. 10, pp. 3331–3340, 2017.

[33] R. Song, F. L. Lewis, and Q. Wei, “Off-policy integral reinforcement learning method to solve nonlinear continuous-time multiplayer nonzero-sum games,” IEEE Trans. Neural Netw. Learn. Syst., vol. 28, no. 3, pp. 704–713, 2017.

[34] M. Johnson, R. Kamalapurkar, S. Bhasin, and W. E. Dixon, “Approximate N-player nonzero-sum game solution for an uncertain continuous nonlinear system,” IEEE Trans. Neural Netw. Learn. Syst., vol. 26, no. 8, pp. 1645–1658, 2015.

[35] H. von Stackelberg, Marktform und Gleichgewicht. Berlin: Springer-Verlag, 1934.

[36] C. Chen and J. Cruz, “Stackelburg solution for two-person games with biased information patterns,” IEEE Trans. Autom. Control, vol. 17, no. 6, pp. 791–798, 1972.

[37] B. Gardner and J. Cruz, “Feedback Stackelberg strategy for M-level hierarchical games,” IEEE Trans. Autom. Control, vol. 23, no. 3, pp. 489–491, 1978.

[38] G. Freiling, G. Jank, and H. Abou-Kandil, “Discrete-time Riccati equations in open-loop Nash and Stackelberg games,” Eur. J. Control, vol. 5, no. 1, pp. 56–66, 1999.

[39] M. Simaan and J. B. Cruz, “On the Stackelberg strategy in nonzero-sum games,” J. Optim. Theory Appl., vol. 11, no. 5, pp. 533–555, 1973.

[40] Y. Xiao, G. Bi, D. Niyato, and L. A. DaSilva, “A hierarchical game theoretic framework for cognitive radio networks,” IEEE J. Sel. Areas Commun., vol. 30, no. 10, pp. 2053–2069, 2012.

[41] A. Mohsenian-Rad, V. W. S. Wong, J. Jatskevich, R. Schober, and A. Leon-Garcia, “Autonomous demand-side management based on game-theoretic energy consumption scheduling for the future smart grid,” IEEE Trans. Smart Grid, vol. 1, no. 3, pp. 320–331, 2010.

[42] H. Mukaidani and H. Xu, “Infinite horizon linear-quadratic Stackelberg games for discrete-time stochastic systems,” Automatica, vol. 76, pp. 301–308, 2017.

[43] G. Freiling, G. Jank, and S. R. Lee, “Existence and uniqueness of open-loop Stackelberg equilibria in linear-quadratic differential games,” J. Optim. Theory Appl., vol. 110, no. 3, pp. 515–544, 2001.

[44] A. Bagchi and T. Başar, “Stackelberg strategies in linear-quadratic stochastic differential games,” J. Optim. Theory Appl., vol. 35, no. 3, pp. 443–464, 1981.

[45] H. Kebriaei and L. Iannelli, “Discrete-time robust hierarchical linear-quadratic dynamic games,” IEEE Trans. Autom. Control, vol. 63, no. 3, pp. 902–909, 2018.

[46] B. W. Wie, “Dynamic Stackelberg equilibrium congestion pricing,” Transportation Research Part C: Emerging Technologies, vol. 15, no. 3, pp. 154–174, 2007.

[47] P. Y. Nie, M. Y. Lai, and S. J. Zhu, “Dynamic feedback Stackelberg games with nonunique solutions,” J. Optim. Theory Appl., vol. 69, no. 7, pp. 1904–1913, 2008.

[48] K. G. Vamvoudakis and F. L. Lewis, “Online actor-critic algorithm to solve the continuous-time infinite horizon optimal control problem,” Automatica, vol. 46, no. 5, pp. 878–888, 2010.

[49] T. Basar and G. J. Olsder, Dynamic Noncooperative Game Theory, vol. 23 of Classics in Applied Mathematics. Philadelphia, PA, USA: SIAM, 1999.

[50] L. Gasinski and N. S. Papageorgiou, Nonlinear Analysis. Mathematical Analysis and Applications. New York, USA: Chapman and Hall/CRC, 2005.

[51] H. K. Khalil, Nonlinear Control. London, UK: Pearson, 2015.