Stability-Certified On-Policy Data-Driven LQR via Recursive Learning and Policy Gradient

Giuseppe Notarstefano; Guido Carnevale; Ivano Notarnicola; Lorenzo Sforni

arxiv: 2403.05367 · v3 · submitted 2024-03-08 · 📡 eess.SY · cs.SY

Pith reviewed 2026-05-24 02:41 UTC · model grok-4.3

keywords: data-driven LQR · on-policy learning · recursive least squares · policy gradient · Lyapunov stability · timescale separation · adaptive control

The pith

Relearn LQR combines recursive estimation and policy gradients to solve data-driven LQR while proving stability of the full closed-loop scheme.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops an on-policy method for learning the optimal LQR controller when the linear dynamics are unknown. It interleaves recursive least squares updates for the system parameters with gradient steps on the feedback gain, all while the control is applied to the real plant. The central step is to represent the combined estimation, optimization, and plant evolution as one feedback interconnection of nonlinear systems. Lyapunov analysis then exploits averaging and timescale separation to establish that the trajectories remain bounded and converge to the optimal gain. This supplies formal certificates that prior data-driven LQR schemes lacked.

Core claim

The Relearn LQR procedure integrates a recursive least squares estimator with a direct policy-gradient search. By casting the overall learning-control loop as a feedback-interconnected nonlinear dynamical system and invoking averaging together with timescale separation, a Lyapunov function is constructed that certifies asymptotic stability of the equilibrium consisting of the true parameters and the optimal LQR gain.
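For orientation, the objects behind this claim can be written out in a standard form. This is a sketch under stated assumptions rather than the paper's own notation: the sign convention $u_t = K x_t$, the initial-state covariance $\Sigma_0$, and the step size $\gamma$ are assumed here, $\theta$ collects the plant matrices $(A, B)$, and $G(K, \theta)$ denotes the gradient of the cost with respect to the gain. The gradient expression is the well-known one from the policy-gradient LQR literature (see [30], [31]).

```latex
\begin{aligned}
& x_{t+1} = A x_t + B u_t, \qquad u_t = K x_t, \qquad
  J(K,\theta) = \operatorname{tr}\!\left(P_K \Sigma_0\right), \\
& P_K = Q + K^\top R K + (A + BK)^\top P_K (A + BK), \\
& \Sigma_K = \Sigma_0 + (A + BK)\, \Sigma_K\, (A + BK)^\top, \\
& G(K,\theta) = \nabla_K J(K,\theta)
  = 2\left[(R + B^\top P_K B) K + B^\top P_K A\right] \Sigma_K, \\
& K_{t+1} = K_t - \gamma\, G(K_t, \theta_t), \qquad
  G(K^\star, \theta^\star) = 0 .
\end{aligned}
```

In the last line, $\theta_t$ is the current recursive least squares estimate, so the gain is updated with a gradient evaluated on the estimated model while the true parameters $\theta^\star$ are still being learned. The equilibrium certified by the Lyapunov argument is the pair $(\theta^\star, K^\star)$.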

What carries the argument

The feedback-interconnected nonlinear dynamical system formed by the plant, recursive least-squares estimator, and policy-gradient update, analyzed via averaging and timescale separation.
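The interconnection described above can be sketched in a few lines of NumPy. This is a minimal illustration and not the authors' implementation: the plant matrices, the Gaussian dither, the warm-up period, and the stability guard on the estimated closed loop are all assumptions added here so that the example runs safely, and the gradient is simply evaluated on the current model estimate. The function names (`relearn_sketch`, `rls_step`, `lqr_gradient`) are hypothetical.

```python
import numpy as np

def dlyap(M, W):
    """Solve X = M X M^T + W by vectorization (adequate for small n)."""
    n = M.shape[0]
    vec = np.linalg.solve(np.eye(n * n) - np.kron(M, M), W.reshape(-1))
    return vec.reshape(n, n)

def lqr_gradient(K, A, B, Q, R):
    """Gradient of J(K) = trace(P_K) for u = K x and identity initial covariance."""
    Acl = A + B @ K
    P = dlyap(Acl.T, Q + K.T @ R @ K)
    Sigma = dlyap(Acl, np.eye(A.shape[0]))
    return 2.0 * ((R + B.T @ P @ B) @ K + B.T @ P @ A) @ Sigma

def riccati_gain(A, B, Q, R, iters=2000):
    """Reference optimal gain from a fixed-point iteration on the Riccati equation."""
    P = Q.copy()
    for _ in range(iters):
        G = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P = Q + A.T @ P @ A - A.T @ P @ B @ G
    return -np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)

def rls_step(Theta, Pcov, phi, y, lam):
    """One recursive least squares update with forgetting factor lam for y = Theta^T phi."""
    Pphi = Pcov @ phi
    gain = Pphi / (lam + phi @ Pphi)
    Theta = Theta + np.outer(gain, y - Theta.T @ phi)
    Pcov = (Pcov - np.outer(gain, Pphi)) / lam
    return Theta, 0.5 * (Pcov + Pcov.T)  # symmetrize to limit round-off drift

def relearn_sketch(A, B, Q, R, steps=10000, gamma=5e-3, lam=0.99, warmup=50, seed=0):
    rng = np.random.default_rng(seed)
    n, m = B.shape
    x = np.zeros(n)
    K = np.zeros((m, n))              # assumes A is Schur stable, so K = 0 stabilizes
    Theta = np.zeros((n + m, n))      # Theta^T = [A_hat, B_hat]
    Pcov = 1e6 * np.eye(n + m)
    for t in range(steps):
        u = K @ x + 0.1 * rng.standard_normal(m)   # on-policy input plus dither (assumed Gaussian)
        x_next = A @ x + B @ u                     # the true plant, unknown to the learner
        Theta, Pcov = rls_step(Theta, Pcov, np.concatenate([x, u]), x_next, lam)
        A_hat, B_hat = Theta.T[:, :n], Theta.T[:, n:]
        # Safeguards added for this sketch only: wait for a warm-up period and
        # skip the gain update if the estimated closed loop is not Schur stable.
        stable = np.max(np.abs(np.linalg.eigvals(A_hat + B_hat @ K))) < 1.0
        if t >= warmup and stable:
            K = K - gamma * lqr_gradient(K, A_hat, B_hat, Q, R)
        x = x_next
    return K, A_hat, B_hat, x

A = np.array([[0.8, 0.2], [0.0, 0.7]])   # hypothetical stable plant
B = np.array([[0.0], [1.0]])
Q, R = np.eye(2), np.eye(1)
K, A_hat, B_hat, x_final = relearn_sketch(A, B, Q, R)
K_star = riccati_gain(A, B, Q, R)
print("gain error:", np.linalg.norm(K - K_star))
print("model error:", np.linalg.norm(A_hat - A) + np.linalg.norm(B_hat - B))
```

The small step size `gamma` relative to the estimator's speed is what plays the role of rate separation in this toy setting: the model estimate settles quickly, and the gain then drifts slowly along the gradient computed from it.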

If this is right

The scheme converges to the optimal LQR gain while the plant state remains bounded throughout adaptation.
The numerical study covers both constant and slowly drifting plant parameters.
The same Lyapunov-plus-averaging argument applies to any on-policy combination of recursive estimation and gradient-based policy search that meets the rate-separation condition.
Because the scheme is on-policy, it is meant to run on the actual plant while learning; the paper illustrates this in numerical simulations of an aircraft control problem.
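The role a forgetting factor plays when parameters drift can be illustrated with a scalar toy problem. The sketch below is an assumption-laden example, not taken from the paper: the drift rate, the constant regressor, and the function name `track` are hypothetical, and the only point is to show why discounting old data helps an estimator follow a slowly moving target.

```python
import numpy as np

def track(lam, steps=2000, drift=1e-3):
    """Scalar RLS on y = theta * phi with a slowly drifting theta; returns the final absolute error."""
    theta_hat, p = 0.0, 1e3
    for t in range(steps):
        theta_true = 1.0 + drift * t      # hypothetical slow linear drift
        phi = 1.0                         # constant regressor, trivially exciting
        y = theta_true * phi
        gain = p * phi / (lam + phi * p * phi)
        theta_hat += gain * (y - theta_hat * phi)
        p = (p - gain * phi * p) / lam
    return abs(theta_true - theta_hat)

err_no_forgetting = track(lam=1.0)    # averages over the whole history
err_forgetting = track(lam=0.95)      # effective memory of roughly 1 / (1 - lam) samples
print(err_no_forgetting, err_forgetting)
```

With `lam=1.0` the estimate is essentially the mean of all past values and lags far behind the drift, whereas with `lam=0.95` it trails the true parameter by a small, roughly constant amount. The trade-off is that a shorter memory also makes the estimator more sensitive to noise and to any loss of excitation.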

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same modeling trick could be tried on other adaptive controllers whose updates are slower than the plant dynamics.
Relaxing the linear-plant assumption while keeping the timescale separation might yield stability results for data-driven nonlinear control.
The persistence-of-excitation requirement points to a practical test: inject sufficiently rich probing signals only until the estimator converges, then switch to pure regulation.

Load-bearing premise

The combined learning and control process can be represented as a nonlinear feedback interconnection to which averaging and timescale separation apply, which in turn requires sufficient separation of the adaptation rates and persistence of excitation.
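The two conditions named in this premise can be stated in a generic textbook form. The block below is only a template under assumed notation: $f_{\mathrm{fast}}$, $g_{\mathrm{slow}}$, $\phi_t$, $\alpha$, and $T$ are placeholder symbols introduced here, and the paper's actual interconnection may be arranged differently.

```latex
\begin{aligned}
& x_{t+1} = f_{\mathrm{fast}}(x_t, z_t, t)
  && \text{(fast: plant state)} \\
& z_{t+1} = z_t + \gamma\, g_{\mathrm{slow}}(x_t, z_t, t), \qquad
  z_t = \operatorname{col}(\theta_t, K_t)
  && \text{(slow: estimate and gain)} \\
& \sum_{\tau = t}^{t+T-1} \phi_\tau \phi_\tau^\top \succeq \alpha I
  \quad \text{for all } t, \qquad
  \phi_\tau = \operatorname{col}(x_\tau, u_\tau)
  && \text{(persistence of excitation)}
\end{aligned}
```

Rate separation corresponds to taking $\gamma$ small enough that $z_t$ is nearly constant over the time the plant needs to settle, and the explicit time dependence stands for the dither that keeps the regressor $\phi_t$ exciting. Averaging then replaces the time-varying slow dynamics by a time-invariant approximation whose stability is easier to certify.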

What would settle it

A concrete linear system with known persistence of excitation where the adaptation rates are separated yet the closed-loop trajectories diverge or the gain fails to converge to the optimal LQR solution.
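A first step toward such a test is to measure excitation numerically on closed-loop data. The following sketch assumes the usual windowed definition (the smallest eigenvalue of the sum of regressor outer products over a sliding window stays bounded away from zero) and uses a hypothetical plant, gain, and Gaussian dither; the function names are illustrative and not from the paper.

```python
import numpy as np

def min_excitation(Phi, window):
    """Smallest eigenvalue of sum(phi phi^T) over a window, minimized over all sliding windows."""
    worst = np.inf
    for start in range(Phi.shape[0] - window + 1):
        block = Phi[start:start + window]
        worst = min(worst, np.linalg.eigvalsh(block.T @ block)[0])
    return worst

def closed_loop_regressors(A, B, K, dither_std, steps=200, seed=0):
    """Collect regressors phi_t = [x_t; u_t] with u_t = K x_t + dither."""
    rng = np.random.default_rng(seed)
    x = np.ones(A.shape[0])
    rows = []
    for _ in range(steps):
        u = K @ x + dither_std * rng.standard_normal(B.shape[1])
        rows.append(np.concatenate([x, u]))
        x = A @ x + B @ u
    return np.array(rows)

A = np.array([[0.8, 0.2], [0.0, 0.7]])   # hypothetical plant
B = np.array([[0.0], [1.0]])
K = np.array([[-0.1, -0.2]])             # hypothetical stabilizing gain

pe_with_dither = min_excitation(closed_loop_regressors(A, B, K, 1.0), window=50)
pe_without_dither = min_excitation(closed_loop_regressors(A, B, K, 0.0), window=50)
print(pe_with_dither, pe_without_dither)
```

Without dither the input is an exact linear function of the state, so the regressors lie in a lower-dimensional subspace and the excitation level collapses to zero. A counterexample of the kind described above would need this quantity to stay positive along the whole run while the gain still fails to converge.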

Figures

Figures reproduced from arXiv: 2403.05367 by Giuseppe Notarstefano, Guido Carnevale, Ivano Notarnicola, Lorenzo Sforni.

**Figure 2.** Representation of the concurrent learning and optimization scheme.

**Figure 3.** Block diagram describing system (30).

**Figure 4.** Block diagram of (36) with $\tilde{z}^{\mathrm{av}}_t = \operatorname{col}(\tilde{\theta}^{\mathrm{av}}_t, \tilde{K}^{\mathrm{av}}_t)$.

**Figure 5.** (left) Evolution of the normalized cost error $|J(K_t, \theta^\star_t) - J^\star| / J^\star$. (right) Evolution of the normalized estimation error $\|\theta_t - \theta^\star\| / \|\theta^\star\|$.

**Figure 6.** State trajectory of the closed-loop system. The states $x_1$, $x_2$, $x_3$, $x_4$ correspond, respectively, to the forward velocity, the attack angle, the pitch rate, and the pitch angle.

**Figure 7.** Comparison between $J(K_t, \theta^\star_t)$ and $J^\star_t$.

**Figure 8.** (left) Evolution of the normalized cost error $|J(K_t, \theta^\star_t) - J^\star_t| / J^\star_t$. (right) Evolution of the normalized estimation error $\|\theta_t - \theta^\star_t\| / \|\theta^\star_t\|$.

Original abstract

In this paper, we investigate a data-driven framework to solve Linear Quadratic Regulator (LQR) problems when the dynamics is unknown, with the additional challenge of providing stability certificates for the overall learning and control scheme. Specifically, in the proposed on-policy learning framework, the control input is applied to the actual (unknown) linear system while iteratively optimized. We propose a learning and control procedure, termed Relearn LQR, that combines a recursive least squares method with a direct policy search based on the gradient method. The resulting scheme is analyzed by modeling it as a feedback-interconnected nonlinear dynamical system. A Lyapunov-based approach, exploiting averaging and timescale separation theories for nonlinear systems, allows us to provide formal stability guarantees for the whole interconnected scheme. The effectiveness of the proposed strategy is corroborated by numerical simulations, where Relearn LQR is deployed on an aircraft control problem, with both static and drifting parameters.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

Relearn LQR interleaves RLS identification with on-policy policy-gradient updates and claims Lyapunov stability via averaging and two-time-scale arguments, but the persistence-of-excitation (PE) and rate-separation conditions remain stated assumptions rather than explicitly bounded quantities.

The paper introduces Relearn LQR, which runs recursive least squares on the data generated by the current policy while simultaneously taking gradient steps on the LQR gain, all applied directly to the unknown plant. The authors model the identifier and the optimizer as a feedback interconnection of nonlinear dynamics and invoke averaging plus timescale separation to construct a composite Lyapunov function whose increment along trajectories is negative definite under suitable conditions. This joint stability claim for the learning loop is the central new element beyond standard data-driven LQR work.

The aircraft example with both static and drifting parameters is a sensible practical check and shows the scheme can handle slow parameter variation. The analysis follows the usual Lyapunov-plus-averaging route once the interconnection is written down, which is clean enough.

The soft spot is exactly the one flagged in the stress test. The proof needs the regressor to stay persistently exciting while the policy is being updated on the fly, plus a clear separation between the RLS rate and the gradient step size. The paper states these requirements but does not derive explicit bounds on the gains that would guarantee the separation holds for its parameterization, nor does it show that the time-varying policy input preserves uniform PE. Without those bounds the formal guarantee is narrower than the abstract suggests and will need case-by-case verification in applications.

This is worth a reading group for the modeling and analysis sections. I would not cite it yet. It deserves peer review because the setup is concrete and the attempt at a joint certificate is substantive enough that referees can usefully check the conditions.

Referee Report

2 major / 0 minor

Summary. The paper proposes Relearn LQR, an on-policy data-driven method for unknown LQR problems that combines recursive least squares (RLS) identification with direct policy gradient search. The control input is applied to the real system while the policy is iteratively optimized. The scheme is modeled as a feedback interconnection of nonlinear dynamical systems, and a composite Lyapunov function is constructed using averaging and two-time-scale separation arguments to certify stability of the overall closed-loop learning process. Effectiveness is illustrated on an aircraft control example with both constant and drifting parameters.

Significance. If the stability certificates hold under explicitly verifiable conditions, the result would be a useful contribution to safe on-policy learning for linear control. The modeling choice and invocation of averaging/two-time-scale theorems are standard tools that, when applicable, can deliver rigorous guarantees without requiring offline data collection. The on-policy setting and handling of drifting parameters are practically relevant.

major comments (2)

[analysis paragraph] Abstract/analysis paragraph: the formal stability claim rests on the regressor remaining uniformly persistently exciting while the policy is updated on-policy, yet no explicit bounds on the RLS forgetting factor or policy-gradient step size are derived to guarantee that the separation of timescales remains valid for the chosen parameterization.
[analysis paragraph] Abstract/analysis paragraph: the application of averaging and two-time-scale theorems requires uniform PE of the closed-loop regressor under the time-varying policy; the manuscript states the condition but does not verify or bound the minimum eigenvalue of the regressor covariance when the input is generated by the evolving policy estimate.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments on the stability analysis. We address each major comment below and will revise the manuscript to clarify the assumptions.

Referee: Abstract/analysis paragraph: the formal stability claim rests on the regressor remaining uniformly persistently exciting while the policy is updated on-policy, yet no explicit bounds on the RLS forgetting factor or policy-gradient step size are derived to guarantee that the separation of timescales remains valid for the chosen parameterization.

Authors: We acknowledge the observation. The Lyapunov analysis with averaging and two-time-scale separation is performed under the standing assumption of uniform persistent excitation (PE) of the regressor and sufficient separation between the RLS and policy-gradient timescales. Explicit, parameterization-independent bounds on the forgetting factor and step size are not derived, as they would require a detailed, system-specific characterization of the closed-loop regressor that lies outside the paper's scope. In the revised version we will add an explicit remark in the analysis section stating these assumptions and referencing standard adaptive-control practice for parameter tuning to maintain them.

revision: yes

Referee: Abstract/analysis paragraph: the application of averaging and two-time-scale theorems requires uniform PE of the closed-loop regressor under the time-varying policy; the manuscript states the condition but does not verify or bound the minimum eigenvalue of the regressor covariance when the input is generated by the evolving policy estimate.

Authors: We agree that the manuscript states the uniform-PE requirement without providing an explicit lower bound on the minimum eigenvalue of the covariance matrix under the time-varying policy. Such a bound depends on the unknown plant, the initial policy, and the update rates; deriving a general expression is therefore not feasible without additional assumptions. The contribution centers on stability of the interconnected system once the PE condition holds. In revision we will insert a clarifying paragraph that reiterates the assumption, notes its practical enforcement via persistent excitation in the on-policy data, and points to related literature where analogous conditions are left as standing assumptions.

revision: yes

Circularity Check

0 steps flagged

No circularity: stability via external Lyapunov/averaging/timescale-separation theorems on modeled interconnection

The derivation models the Relearn LQR scheme (RLS + policy gradient) as a feedback-interconnected nonlinear system and invokes standard external results (Lyapunov functions, averaging theory, two-time-scale separation) to certify stability under stated assumptions of uniform PE and rate separation. These theorems are independent mathematical tools whose hypotheses are not constructed from the paper's fitted quantities or outputs; the paper states the conditions but does not reduce the stability claim to a self-definition or self-citation chain. No self-definitional, fitted-input-as-prediction, or ansatz-smuggled steps appear in the abstract or reader's summary of the chain. The result is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review supplies no explicit free parameters, axioms, or invented entities. The approach implicitly relies on standard LQR quadratic-cost assumptions and the applicability of averaging/timescale-separation theorems, but none are enumerated or justified in the provided text.

pith-pipeline@v0.9.0 · 5699 in / 1047 out tokens · 30140 ms · 2026-05-24T02:41:06.663228+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear

Paper passage: A Lyapunov-based approach, exploiting averaging and timescale separation theories for nonlinear systems, allows us to provide formal stability guarantees for the whole interconnected scheme.

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · unclear

Paper passage: the closed-loop system consisting of the gradient update on the gain K, the RLS scheme, and the system dynamics

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

58 extracted references · 58 canonical work pages

[1] S. J. Bradtke, B. E. Ydstie, and A. G. Barto, “Adaptive linear quadratic control using policy iteration,” in IEEE American Control Conference, vol. 3, pp. 3475–3479, 1994.

[2] B. Recht, “A tour of reinforcement learning: The view from continuous control,” Annual Review of Control, Robotics, and Autonomous Systems, vol. 2, pp. 253–279, 2019.

[3] D. Kleinman, “On an iterative technique for Riccati equation computations,” IEEE Transactions on Automatic Control, vol. 13, no. 1, pp. 114–115, 1968.

[4] B. Pang, T. Bian, and Z.-P. Jiang, “Robust policy iteration for continuous-time linear quadratic regulation,” IEEE Transactions on Automatic Control, vol. 67, no. 1, pp. 504–511, 2021.

[5] V. G. Lopez, M. Alsalti, and M. A. Müller, “Efficient off-policy Q-learning for data-based discrete-time LQR problems,” IEEE Transactions on Automatic Control, 2023.

[6] C. Qin, H. Zhang, and Y. Luo, “Online optimal tracking control of continuous-time linear systems with unknown dynamics by using adaptive dynamic programming,” International Journal of Control, vol. 87, no. 5, pp. 1000–1009, 2014.

[7] K. Krauth, S. Tu, and B. Recht, “Finite-time analysis of approximate policy iteration for the linear quadratic regulator,” Advances in Neural Information Processing Systems, vol. 32, 2019.

[8] H. Modares, F. L. Lewis, and Z.-P. Jiang, “Optimal output-feedback control of unknown continuous-time linear systems using off-policy reinforcement learning,” IEEE Transactions on Cybernetics, vol. 46, no. 11, pp. 2401–2410, 2016.

[9] B. Pang, T. Bian, and Z.-P. Jiang, “Data-driven finite-horizon optimal control for linear time-varying discrete-time systems,” in 2018 IEEE Conference on Decision and Control (CDC), pp. 861–866, IEEE, 2018.

[10] C. Possieri and M. Sassano, “Q-learning for continuous-time linear systems: A data-driven implementation of the Kleinman algorithm,” IEEE Transactions on Systems, Man, and Cybernetics: Systems, vol. 52, no. 10, pp. 6487–6497, 2022.

[11] T. Bian and Z.-P. Jiang, “Value iteration and adaptive dynamic programming for data-driven adaptive optimal control design,” Automatica, vol. 71, pp. 348–360, 2016.

[12] I. Ziemann, A. Tsiamis, H. Sandberg, and N. Matni, “How are policy gradient methods affected by the limits of control?,” in IEEE 61st Conference on Decision and Control (CDC), pp. 5992–5999, 2022.

[13] B. Kiumarsi, F. L. Lewis, and Z.-P. Jiang, “H∞ control of linear discrete-time systems: Off-policy reinforcement learning,” Automatica, vol. 78, pp. 144–152, 2017.

[14] C. De Persis and P. Tesi, “Formulas for data-driven control: Stabilization, optimality, and robustness,” IEEE Transactions on Automatic Control, vol. 65, no. 3, pp. 909–924, 2019.

[15] H. J. Van Waarde, J. Eising, H. L. Trentelman, and M. K. Camlibel, “Data informativity: a new perspective on data-driven analysis and control,” IEEE Transactions on Automatic Control, vol. 65, no. 11, pp. 4753–4768, 2020.

[16] M. Rotulo, C. De Persis, and P. Tesi, “Data-driven linear quadratic regulation via semidefinite programming,” IFAC-PapersOnLine, vol. 53, no. 2, pp. 3995–4000, 2020.

[17] M. Rotulo, C. De Persis, and P. Tesi, “Online learning of data-driven controllers for unknown switched linear systems,” Automatica, vol. 145, p. 110519, 2022.

[18] C. De Persis and P. Tesi, “Low-complexity learning of linear quadratic regulators from noisy data,” Automatica, vol. 128, p. 109548, 2021.

[19] F. Dörfler, P. Tesi, and C. De Persis, “On the certainty-equivalence approach to direct data-driven LQR design,” IEEE Transactions on Automatic Control, 2023.

[20] J. Berberich, A. Koch, C. W. Scherer, and F. Allgöwer, “Robust data-driven state-feedback design,” in IEEE American Control Conference (ACC), pp. 1532–1538, 2020.

[21] H. J. van Waarde, M. K. Camlibel, and M. Mesbahi, “From noisy data to feedback controllers: Nonconservative design via a matrix s-lemma,” IEEE Transactions on Automatic Control, vol. 67, no. 1, pp. 162–175, 2020.

[22] C. De Persis and P. Tesi, “Learning controllers for nonlinear systems from data,” Annual Reviews in Control, p. 100915, 2023.

[23] S. Dean, S. Tu, N. Matni, and B. Recht, “Safely learning to control the constrained linear quadratic regulator,” in IEEE American Control Conference (ACC), pp. 5582–5588, 2019.

[24] H. Mania, S. Tu, and B. Recht, “Certainty equivalence is efficient for linear quadratic control,” Advances in Neural Information Processing Systems, vol. 32, 2019.

[25] M. Ferizbegovic, J. Umenberger, H. Hjalmarsson, and T. B. Schön, “Learning robust lq-controllers using application oriented exploration,” IEEE Control Systems Letters, vol. 4, no. 1, pp. 19–24, 2019.

[26] A. Iannelli, M. Khosravi, and R. S. Smith, “Structured exploration in the finite horizon linear quadratic dual control problem,” IFAC-PapersOnLine, vol. 53, no. 2, pp. 959–964, 2020.

[27] S. Formentin and A. Chiuso, “Core: Control-oriented regularization for system identification,” in IEEE Conference on Decision and Control (CDC), pp. 2253–2258, 2018.

[28] F. Dörfler, J. Coulson, and I. Markovsky, “Bridging direct and indirect data-driven control formulations via regularizations and relaxations,” IEEE Transactions on Automatic Control, vol. 68, no. 2, pp. 883–897, 2022.

[29] B. Hu, K. Zhang, N. Li, M. Mesbahi, M. Fazel, and T. Başar, “Toward a theoretical foundation of policy optimization for learning control policies,” Annual Review of Control, Robotics, and Autonomous Systems, vol. 6, pp. 123–158, 2023.

[30] J. Bu, A. Mesbahi, M. Fazel, and M. Mesbahi, “LQR through the lens of first order methods: Discrete-time case,” arXiv preprint arXiv:1907.08921, 2019.

[31] M. Fazel, R. Ge, S. Kakade, and M. Mesbahi, “Global convergence of policy gradient methods for the linear quadratic regulator,” in International Conference on Machine Learning, pp. 1467–1476, PMLR, 2018.

[32] K. Zhang, B. Hu, and T. Basar, “Policy optimization for H2 linear control with H∞ robustness guarantee: Implicit regularization and global convergence,” in Learning for Dynamics and Control, pp. 179–190, PMLR, 2020.

[33] H. Mohammadi, A. Zare, M. Soltanolkotabi, and M. R. Jovanović, “Convergence and sample complexity of gradient methods for the model-free linear–quadratic regulator problem,” IEEE Transactions on Automatic Control, vol. 67, no. 5, pp. 2435–2450, 2021.

[34] H. Mohammadi, M. Soltanolkotabi, and M. R. Jovanović, “On the linear convergence of random search for discrete-time LQR,” IEEE Control Systems Letters, vol. 5, no. 3, pp. 989–994, 2020.

[35] Y. Abbasi-Yadkori and C. Szepesvári, “Regret bounds for the adaptive control of linear quadratic systems,” in Proceedings of the 24th Annual Conference on Learning Theory, pp. 1–26, JMLR Workshop and Conference Proceedings, 2011.

[36] A. Cohen, T. Koren, and Y. Mansour, “Learning linear-quadratic regulators efficiently with only √T regret,” in International Conference on Machine Learning, pp. 1300–1309, PMLR, 2019.

[37] A. Cassel, A. Cohen, and T. Koren, “Logarithmic regret for learning linear quadratic regulators efficiently,” in International Conference on Machine Learning, pp. 1328–1337, PMLR, 2020.

[38] M. Akbari, B. Gharesifard, and T. Linder, “Achieving logarithmic regret via hints in online learning of noisy LQR systems,” in IEEE 61st Conference on Decision and Control (CDC), pp. 4700–4705, 2022.

[39] S. Dean, H. Mania, N. Matni, B. Recht, and S. Tu, “On the sample complexity of the linear quadratic regulator,” Foundations of Computational Mathematics, vol. 20, no. 4, pp. 633–679, 2020.

[40] D. Vrabie, O. Pastravanu, M. Abu-Khalaf, and F. L. Lewis, “Adaptive optimal control for continuous-time linear systems based on policy iteration,” Automatica, vol. 45, no. 2, pp. 477–484, 2009.

[41] Y. Jiang and Z.-P. Jiang, “Computational adaptive optimal control for continuous-time linear systems with completely unknown dynamics,” Automatica, vol. 48, no. 10, pp. 2699–2704, 2012.

[42] C. Possieri and M. Sassano, “Value iteration for continuous-time linear time-invariant systems,” IEEE Transactions on Automatic Control, vol. 68, no. 5, pp. 3070–3077, 2022.

[43] B. Kiumarsi, F. L. Lewis, M.-B. Naghibi-Sistani, and A. Karimpour, “Optimal tracking control of unknown discrete-time linear systems using input-output measured data,” IEEE Transactions on Cybernetics, vol. 45, no. 12, pp. 2770–2779, 2015.

[44] M. Simchowitz and D. Foster, “Naive exploration is optimal for online LQR,” in Proceedings of the 37th International Conference on Machine Learning (H. D. III and A. Singh, eds.), vol. 119 of Proceedings of Machine Learning Research, pp. 8937–8948, PMLR, 13–18 Jul 2020.

[45] E.-W. Bai, L.-C. Fu, and S. S. Sastry, “Averaging analysis for discrete time and sampled data adaptive systems,” IEEE Transactions on Circuits and Systems, vol. 35, no. 2, pp. 137–148, 1988.

[46] B. D. Anderson and J. B. Moore, Optimal control: linear quadratic methods. Courier Corporation, 2007.

[47] J. Bu, A. Mesbahi, and M. Mesbahi, “On topological properties of the set of stabilizing feedback gains,” IEEE Transactions on Automatic Control, vol. 66, no. 2, pp. 730–744, 2020.

[48] R. M. Johnstone, C. R. Johnson Jr, R. R. Bitmead, and B. D. Anderson, “Exponential convergence of recursive least squares with exponential forgetting factor,” Systems & Control Letters, vol. 2, no. 2, pp. 77–82, 1982.

[49] C. S. Turner, “Recursive discrete-time sinusoidal oscillators,” IEEE Signal Processing Magazine, vol. 20, no. 3, pp. 103–111, 2003.

[50] J. C. Willems, P. Rapisarda, I. Markovsky, and B. L. De Moor, “A note on persistency of excitation,” Systems & Control Letters, vol. 54, no. 4, pp. 325–329, 2005.

[51] E.-W. Bai and S. S. Sastry, “Persistency of excitation, sufficient richness and parameter convergence in discrete time adaptive control,” Systems & Control Letters, vol. 6, no. 3, pp. 153–163, 1985.

[52] A. Padoan, G. Scarciotti, and A. Astolfi, “A geometric characterization of the persistence of excitation condition for the solutions of autonomous systems,” IEEE Transactions on Automatic Control, vol. 62, no. 11, pp. 5666–5677, 2017.

[53] A. Isidori, Lectures in feedback design for multivariable systems. Springer, 2017.

[54] L. Grüne, E. D. Sontag, and F. R. Wirth, “Asymptotic stability equals exponential stability, and iss equals finite energy gain—if you twist your eyes,” Systems & Control Letters, vol. 38, no. 2, pp. 127–134, 1999.

[55] P. Kapasouris, M. Athans, and G. Stein, “Design of feedback control systems for unstable plants with saturating actuators,” in Proc. IFAC Symp. on Nonlinear Control System Design, pp. 302–307, Pergamon Press, 1990.

[56] R. Bhatia and P. Rosenthal, “How and why to solve the operator equation AX − XB = Y,” Bulletin of the London Mathematical Society, vol. 29, no. 1, pp. 1–21, 1997.

[57] W. M. Haddad and V. Chellaboina, “Nonlinear dynamical systems and control,” in Nonlinear Dynamical Systems and Control, Princeton University Press, 2011.

[58]

Let us arbitrarily choose ν1, ν2 ∈ (0, 1). Then, for all γ ∈ (0, ¯γa v) with ¯γa v := min n 1, ¯γ0, 2ν1 3β3 , 2ν2 1+β3β2 4 o , we further bound (C.8) as ∆V ( ˜Ka v t , ˜θa v t ) ≤ −γκν1 G( ˜Ka v t + K⋆, θ⋆) 2 + γκβ4 G( ˜Ka v t + K⋆, θ⋆) ˜θa v t − γν2 ˜θa v t 2 (a) = −γ   G( ˜Ka v t +K⋆, θ⋆) ˜θa v t   ⊤ U(κ)   G( ˜Ka v t +K⋆, θ⋆) ˜θa v t  ,(C.9) 7G...

work page

[1] S. J. Bradtke, B. E. Ydstie, and A. G. Barto, “Adaptive linear quadratic control using policy iteration,” in IEEE American Control Conference, vol. 3, pp. 3475–3479, 1994.

[2] B. Recht, “A tour of reinforcement learning: The view from continuous control,” Annual Review of Control, Robotics, and Autonomous Systems, vol. 2, pp. 253–279, 2019.

[3] D. Kleinman, “On an iterative technique for Riccati equation computations,” IEEE Transactions on Automatic Control, vol. 13, no. 1, pp. 114–115, 1968.

[4] B. Pang, T. Bian, and Z.-P. Jiang, “Robust policy iteration for continuous-time linear quadratic regulation,” IEEE Transactions on Automatic Control, vol. 67, no. 1, pp. 504–511, 2021.

[5] V. G. Lopez, M. Alsalti, and M. A. Müller, “Efficient off-policy Q-learning for data-based discrete-time LQR problems,” IEEE Transactions on Automatic Control, 2023.

[6] C. Qin, H. Zhang, and Y. Luo, “Online optimal tracking control of continuous-time linear systems with unknown dynamics by using adaptive dynamic programming,” International Journal of Control, vol. 87, no. 5, pp. 1000–1009, 2014.

[7] K. Krauth, S. Tu, and B. Recht, “Finite-time analysis of approximate policy iteration for the linear quadratic regulator,” Advances in Neural Information Processing Systems, vol. 32, 2019.

[8] H. Modares, F. L. Lewis, and Z.-P. Jiang, “Optimal output-feedback control of unknown continuous-time linear systems using off-policy reinforcement learning,” IEEE Transactions on Cybernetics, vol. 46, no. 11, pp. 2401–2410, 2016.

[9] B. Pang, T. Bian, and Z.-P. Jiang, “Data-driven finite-horizon optimal control for linear time-varying discrete-time systems,” in 2018 IEEE Conference on Decision and Control (CDC), pp. 861–866, IEEE, 2018.

[10] C. Possieri and M. Sassano, “Q-learning for continuous-time linear systems: A data-driven implementation of the Kleinman algorithm,” IEEE Transactions on Systems, Man, and Cybernetics: Systems, vol. 52, no. 10, pp. 6487–6497, 2022.

[11] T. Bian and Z.-P. Jiang, “Value iteration and adaptive dynamic programming for data-driven adaptive optimal control design,” Automatica, vol. 71, pp. 348–360, 2016.

[12] I. Ziemann, A. Tsiamis, H. Sandberg, and N. Matni, “How are policy gradient methods affected by the limits of control?,” in IEEE 61st Conference on Decision and Control (CDC), pp. 5992–5999, 2022.

[13] B. Kiumarsi, F. L. Lewis, and Z.-P. Jiang, “H∞ control of linear discrete-time systems: Off-policy reinforcement learning,” Automatica, vol. 78, pp. 144–152, 2017.

[14] C. De Persis and P. Tesi, “Formulas for data-driven control: Stabilization, optimality, and robustness,” IEEE Transactions on Automatic Control, vol. 65, no. 3, pp. 909–924, 2019.

[15] H. J. Van Waarde, J. Eising, H. L. Trentelman, and M. K. Camlibel, “Data informativity: a new perspective on data-driven analysis and control,” IEEE Transactions on Automatic Control, vol. 65, no. 11, pp. 4753–4768, 2020.

[16] M. Rotulo, C. De Persis, and P. Tesi, “Data-driven linear quadratic regulation via semidefinite programming,” IFAC-PapersOnLine, vol. 53, no. 2, pp. 3995–4000, 2020.

[17] M. Rotulo, C. De Persis, and P. Tesi, “Online learning of data-driven controllers for unknown switched linear systems,” Automatica, vol. 145, p. 110519, 2022.

[18] C. De Persis and P. Tesi, “Low-complexity learning of linear quadratic regulators from noisy data,” Automatica, vol. 128, p. 109548, 2021.

[19] F. Dörfler, P. Tesi, and C. De Persis, “On the certainty-equivalence approach to direct data-driven LQR design,” IEEE Transactions on Automatic Control, 2023.

[20] J. Berberich, A. Koch, C. W. Scherer, and F. Allgöwer, “Robust data-driven state-feedback design,” in IEEE American Control Conference (ACC), pp. 1532–1538, 2020.

[21] H. J. van Waarde, M. K. Camlibel, and M. Mesbahi, “From noisy data to feedback controllers: Nonconservative design via a matrix S-lemma,” IEEE Transactions on Automatic Control, vol. 67, no. 1, pp. 162–175, 2020.

[22] C. De Persis and P. Tesi, “Learning controllers for nonlinear systems from data,” Annual Reviews in Control, p. 100915, 2023.

[23] S. Dean, S. Tu, N. Matni, and B. Recht, “Safely learning to control the constrained linear quadratic regulator,” in IEEE American Control Conference (ACC), pp. 5582–5588, 2019.

[24] H. Mania, S. Tu, and B. Recht, “Certainty equivalence is efficient for linear quadratic control,” Advances in Neural Information Processing Systems, vol. 32, 2019.

[25] M. Ferizbegovic, J. Umenberger, H. Hjalmarsson, and T. B. Schön, “Learning robust LQ-controllers using application oriented exploration,” IEEE Control Systems Letters, vol. 4, no. 1, pp. 19–24, 2019.

[26] A. Iannelli, M. Khosravi, and R. S. Smith, “Structured exploration in the finite horizon linear quadratic dual control problem,” IFAC-PapersOnLine, vol. 53, no. 2, pp. 959–964, 2020.

[27] S. Formentin and A. Chiuso, “Core: Control-oriented regularization for system identification,” in IEEE Conference on Decision and Control (CDC), pp. 2253–2258, 2018.

[28] F. Dörfler, J. Coulson, and I. Markovsky, “Bridging direct and indirect data-driven control formulations via regularizations and relaxations,” IEEE Transactions on Automatic Control, vol. 68, no. 2, pp. 883–897, 2022.

[29] B. Hu, K. Zhang, N. Li, M. Mesbahi, M. Fazel, and T. Başar, “Toward a theoretical foundation of policy optimization for learning control policies,” Annual Review of Control, Robotics, and Autonomous Systems, vol. 6, pp. 123–158, 2023.

[30] J. Bu, A. Mesbahi, M. Fazel, and M. Mesbahi, “LQR through the lens of first order methods: Discrete-time case,” arXiv preprint arXiv:1907.08921, 2019.

[31] M. Fazel, R. Ge, S. Kakade, and M. Mesbahi, “Global convergence of policy gradient methods for the linear quadratic regulator,” in International Conference on Machine Learning, pp. 1467–1476, PMLR, 2018.

[32] K. Zhang, B. Hu, and T. Başar, “Policy optimization for H2 linear control with H∞ robustness guarantee: Implicit regularization and global convergence,” in Learning for Dynamics and Control, pp. 179–190, PMLR, 2020.

[33] H. Mohammadi, A. Zare, M. Soltanolkotabi, and M. R. Jovanović, “Convergence and sample complexity of gradient methods for the model-free linear–quadratic regulator problem,” IEEE Transactions on Automatic Control, vol. 67, no. 5, pp. 2435–2450, 2021.

[34] H. Mohammadi, M. Soltanolkotabi, and M. R. Jovanović, “On the linear convergence of random search for discrete-time LQR,” IEEE Control Systems Letters, vol. 5, no. 3, pp. 989–994, 2020.

[35] Y. Abbasi-Yadkori and C. Szepesvári, “Regret bounds for the adaptive control of linear quadratic systems,” in Proceedings of the 24th Annual Conference on Learning Theory, pp. 1–26, JMLR Workshop and Conference Proceedings, 2011.

[36] A. Cohen, T. Koren, and Y. Mansour, “Learning linear-quadratic regulators efficiently with only √T regret,” in International Conference on Machine Learning, pp. 1300–1309, PMLR, 2019.

[37] A. Cassel, A. Cohen, and T. Koren, “Logarithmic regret for learning linear quadratic regulators efficiently,” in International Conference on Machine Learning, pp. 1328–1337, PMLR, 2020.

[38] M. Akbari, B. Gharesifard, and T. Linder, “Achieving logarithmic regret via hints in online learning of noisy LQR systems,” in IEEE 61st Conference on Decision and Control (CDC), pp. 4700–4705, 2022.

[39] S. Dean, H. Mania, N. Matni, B. Recht, and S. Tu, “On the sample complexity of the linear quadratic regulator,” Foundations of Computational Mathematics, vol. 20, no. 4, pp. 633–679, 2020.

[40] D. Vrabie, O. Pastravanu, M. Abu-Khalaf, and F. L. Lewis, “Adaptive optimal control for continuous-time linear systems based on policy iteration,” Automatica, vol. 45, no. 2, pp. 477–484, 2009.

[41] Y. Jiang and Z.-P. Jiang, “Computational adaptive optimal control for continuous-time linear systems with completely unknown dynamics,” Automatica, vol. 48, no. 10, pp. 2699–2704, 2012.

[42] C. Possieri and M. Sassano, “Value iteration for continuous-time linear time-invariant systems,” IEEE Transactions on Automatic Control, vol. 68, no. 5, pp. 3070–3077, 2022.

[43] B. Kiumarsi, F. L. Lewis, M.-B. Naghibi-Sistani, and A. Karimpour, “Optimal tracking control of unknown discrete-time linear systems using input-output measured data,” IEEE Transactions on Cybernetics, vol. 45, no. 12, pp. 2770–2779, 2015.

[44] M. Simchowitz and D. Foster, “Naive exploration is optimal for online LQR,” in Proceedings of the 37th International Conference on Machine Learning (H. D. III and A. Singh, eds.), vol. 119 of Proceedings of Machine Learning Research, pp. 8937–8948, PMLR, 13–18 Jul 2020.

[45] E.-W. Bai, L.-C. Fu, and S. S. Sastry, “Averaging analysis for discrete time and sampled data adaptive systems,” IEEE Transactions on Circuits and Systems, vol. 35, no. 2, pp. 137–148, 1988.

[46] B. D. Anderson and J. B. Moore, Optimal control: linear quadratic methods. Courier Corporation, 2007.

[47] J. Bu, A. Mesbahi, and M. Mesbahi, “On topological properties of the set of stabilizing feedback gains,” IEEE Transactions on Automatic Control, vol. 66, no. 2, pp. 730–744, 2020.

[48] R. M. Johnstone, C. R. Johnson Jr, R. R. Bitmead, and B. D. Anderson, “Exponential convergence of recursive least squares with exponential forgetting factor,” Systems & Control Letters, vol. 2, no. 2, pp. 77–82, 1982.

[49] C. S. Turner, “Recursive discrete-time sinusoidal oscillators,” IEEE Signal Processing Magazine, vol. 20, no. 3, pp. 103–111, 2003.

[50] J. C. Willems, P. Rapisarda, I. Markovsky, and B. L. De Moor, “A note on persistency of excitation,” Systems & Control Letters, vol. 54, no. 4, pp. 325–329, 2005.

[51] E.-W. Bai and S. S. Sastry, “Persistency of excitation, sufficient richness and parameter convergence in discrete time adaptive control,” Systems & Control Letters, vol. 6, no. 3, pp. 153–163, 1985.

[52] A. Padoan, G. Scarciotti, and A. Astolfi, “A geometric characterization of the persistence of excitation condition for the solutions of autonomous systems,” IEEE Transactions on Automatic Control, vol. 62, no. 11, pp. 5666–5677, 2017.

[53] A. Isidori, Lectures in feedback design for multivariable systems. Springer, 2017.

[54] L. Grüne, E. D. Sontag, and F. R. Wirth, “Asymptotic stability equals exponential stability, and ISS equals finite energy gain—if you twist your eyes,” Systems & Control Letters, vol. 38, no. 2, pp. 127–134, 1999.

[55] P. Kapasouris, M. Athans, and G. Stein, “Design of feedback control systems for unstable plants with saturating actuators,” in Proc. IFAC Symp. on Nonlinear Control System Design, pp. 302–307, Pergamon Press, 1990.

[56] R. Bhatia and P. Rosenthal, “How and why to solve the operator equation AX − XB = Y,” Bulletin of the London Mathematical Society, vol. 29, no. 1, pp. 1–21, 1997.

[57] W. M. Haddad and V. Chellaboina, “Nonlinear dynamical systems and control,” in Nonlinear Dynamical Systems and Control, Princeton University Press, 2011.

A Proof of Lemma 4.1

We note that (27) is obtained by setting K_t = K⋆ in (24) (which compactly collects the updates (19a), (19b), and (23)). Hence, we start by inspecting (19a) and (19b) restricted to ...

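[58] Let us arbitrarily choose $\nu_1, \nu_2 \in (0, 1)$. Then, for all $\gamma \in (0, \bar{\gamma}_{av})$ with $\bar{\gamma}_{av} := \min\left\{1, \bar{\gamma}_0, \frac{2\nu_1}{3\beta_3}, \frac{2\nu_2}{1+\beta_3\beta_4^2}\right\}$, we further bound (C.8) as

$$
\begin{aligned}
\Delta V(\tilde{K}^{av}_t, \tilde{\theta}^{av}_t)
&\le -\gamma\kappa\nu_1 \left\| G(\tilde{K}^{av}_t + K^\star, \theta^\star) \right\|^2
 + \gamma\kappa\beta_4 \left\| G(\tilde{K}^{av}_t + K^\star, \theta^\star) \right\| \left\| \tilde{\theta}^{av}_t \right\|
 - \gamma\nu_2 \left\| \tilde{\theta}^{av}_t \right\|^2 \\
&\overset{(a)}{=} -\gamma
\begin{bmatrix} \left\| G(\tilde{K}^{av}_t + K^\star, \theta^\star) \right\| \\ \left\| \tilde{\theta}^{av}_t \right\| \end{bmatrix}^\top
U(\kappa)
\begin{bmatrix} \left\| G(\tilde{K}^{av}_t + K^\star, \theta^\star) \right\| \\ \left\| \tilde{\theta}^{av}_t \right\| \end{bmatrix},
\end{aligned}
\tag{C.9}
$$

...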
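As a reading aid for the equality marked (a) in (C.9), the quadratic form can be matched coefficient by coefficient. The sketch below uses the shorthand a and b for the two norms (introduced here, not in the source) and assumes that the norm symbols lost in extraction sit around G and around the averaged parameter error. The paper's own definition of U(κ) is not visible in this excerpt, so the matrix and the resulting range for κ are an assumption obtained by matching terms, with β_4 > 0.

```latex
% Shorthand (assumed, not from the source):
%   a = \| G(\tilde{K}^{av}_t + K^\star, \theta^\star) \|,  b = \| \tilde{\theta}^{av}_t \|
\begin{align*}
-\gamma\kappa\nu_1 a^2 + \gamma\kappa\beta_4 ab - \gamma\nu_2 b^2
  &= -\gamma
     \begin{bmatrix} a \\ b \end{bmatrix}^\top
     \begin{bmatrix}
       \kappa\nu_1 & -\tfrac{\kappa\beta_4}{2} \\
       -\tfrac{\kappa\beta_4}{2} & \nu_2
     \end{bmatrix}
     \begin{bmatrix} a \\ b \end{bmatrix}, \\
% Sylvester's criterion on the symmetric 2x2 matrix above
U(\kappa) \succ 0
  &\iff \kappa\nu_1 > 0 \ \text{and}\ \kappa\nu_1\nu_2 - \tfrac{\kappa^2\beta_4^2}{4} > 0
  \iff 0 < \kappa < \frac{4\nu_1\nu_2}{\beta_4^2}.
\end{align*}
```

Under this reading, any sufficiently small κ > 0 makes the right-hand side of (C.9) negative definite in (a, b).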
