Stability-Certified On-Policy Data-Driven LQR via Recursive Learning and Policy Gradient
Pith reviewed 2026-05-24 02:41 UTC · model grok-4.3
The pith
Relearn LQR combines recursive estimation and policy gradients to solve data-driven LQR, and the paper proves stability of the full closed-loop scheme.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The Relearn LQR procedure integrates a recursive least squares estimator with a direct policy-gradient search. By casting the overall learning-control loop as a feedback-interconnected nonlinear dynamical system and invoking averaging together with timescale separation, a Lyapunov function is constructed that certifies asymptotic stability of the equilibrium consisting of the true parameters and the optimal LQR gain.
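A minimal sketch of such a loop, under loudly stated assumptions: discrete time, a toy two-state plant rather than the paper's aircraft model, a model-based gradient of the LQR cost evaluated on the current RLS estimate, and a warm-up period plus spectral-radius guard that are illustrative safeguards rather than anything taken from the paper. All names (`relearn_sketch`, `dlyap`, `lqr_cost_and_grad`, `spectral_radius`) and all numerical values are hypothetical.

```python
import numpy as np

def spectral_radius(M):
    return float(np.max(np.abs(np.linalg.eigvals(M))))

def dlyap(M, W):
    """Solve X = M X M^T + W by vectorization (valid when rho(M) < 1)."""
    n = M.shape[0]
    x = np.linalg.solve(np.eye(n * n) - np.kron(M, M), W.reshape(-1))
    return x.reshape(n, n)

def lqr_cost_and_grad(K, A, B, Q, R, Sigma0):
    """Cost tr(P_K Sigma0) and its gradient in K for u = -K x on the model (A, B)."""
    Ac = A - B @ K
    P = dlyap(Ac.T, Q + K.T @ R @ K)   # P = Ac^T P Ac + Q + K^T R K
    Sigma = dlyap(Ac, Sigma0)          # Sigma = Ac Sigma Ac^T + Sigma0
    grad = 2.0 * ((R + B.T @ P @ B) @ K - B.T @ P @ A) @ Sigma
    return float(np.trace(P @ Sigma0)), grad

def relearn_sketch(A, B, Q, R, steps=3000, warmup=20, gamma=0.01,
                   lam=0.98, dither=0.5, seed=0):
    rng = np.random.default_rng(seed)
    n, m = B.shape
    theta = np.zeros((n, n + m))   # RLS estimate of [A B]
    P = 1e3 * np.eye(n + m)        # RLS covariance
    K = np.zeros((m, n))           # stabilizing here only because the toy A is stable
    x = np.ones(n)
    for t in range(steps):
        # On-policy input: current gain plus a probing (dither) term.
        u = -K @ x + dither * rng.standard_normal(m)
        x_next = A @ x + B @ u     # the plant; (A, B) is hidden from the learner
        # Recursive least squares with forgetting factor lam.
        phi = np.concatenate([x, u])
        g = P @ phi / (lam + phi @ P @ phi)
        theta = theta + np.outer(x_next - theta @ phi, g)
        P = (P - np.outer(g, phi @ P)) / lam
        P = 0.5 * (P + P.T)        # keep P symmetric against round-off
        # Slow policy-gradient step on the estimated model (assumed safeguards:
        # wait for `warmup` samples, skip if the estimated loop looks unstable).
        A_hat, B_hat = theta[:, :n], theta[:, n:]
        if t >= warmup and spectral_radius(A_hat - B_hat @ K) < 1.0:
            _, grad = lqr_cost_and_grad(K, A_hat, B_hat, Q, R, np.eye(n))
            K = K - gamma * grad
        x = x_next
    return theta, K

# Hypothetical toy plant, chosen only so that K = 0 is stabilizing.
A = np.array([[0.5, 0.2], [0.0, 0.4]])
B = np.array([[0.0], [1.0]])
Q, R = np.eye(2), np.array([[1.0]])
theta_hat, K_final = relearn_sketch(A, B, Q, R)
print("max estimation error:", np.max(np.abs(theta_hat - np.hstack([A, B]))))
print("learned gain:", K_final)
```

The small step size `gamma` relative to the estimator's convergence speed is a crude stand-in for the timescale separation the analysis relies on; the sketch makes no claim about how the paper chooses these rates.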
What carries the argument
The feedback-interconnected nonlinear dynamical system formed by the plant, recursive least-squares estimator, and policy-gradient update, analyzed via averaging and timescale separation.
If this is right
- The scheme converges to the optimal LQR gain while the plant state remains bounded throughout adaptation.
- In the reported simulations, the scheme copes with both constant and drifting plant parameters.
- The same Lyapunov-plus-averaging argument applies to any on-policy combination of recursive estimation and gradient-based policy search that meets the rate-separation condition.
- The method runs on-policy, applying its inputs to the actual plant while it learns; the paper illustrates this in numerical simulations of an aircraft control problem.
Where Pith is reading between the lines
- The same modeling trick could be tried on other adaptive controllers whose updates are slower than the plant dynamics.
- Relaxing the linear-plant assumption while keeping the timescale separation might yield stability results for data-driven nonlinear control.
- The persistence-of-excitation requirement points to a practical test: inject sufficiently rich probing signals only until the estimator converges, then switch to pure regulation.
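The probing idea in the last bullet can be made concrete with a small numerical check. The sketch below is an assumption-laden illustration, not the paper's procedure: it uses a hypothetical toy plant and a fixed gain, and the helper names `min_windowed_excitation` and `closed_loop_regressors` are invented here. It computes the smallest eigenvalue of the windowed Gram matrix of the regressor [x; u]. With a fixed gain and no probing, u = -K x makes the regressor live in a subspace of dimension equal to the state dimension, so the excitation level is zero up to round-off; adding a dither signal restores it.

```python
import numpy as np

def min_windowed_excitation(Phi, window):
    """Smallest eigenvalue of sum_k phi_k phi_k^T over each length-`window` block,
    minimized over all block start times."""
    levels = []
    for t in range(Phi.shape[0] - window + 1):
        W = Phi[t:t + window]                           # rows are phi_k^T
        levels.append(np.linalg.eigvalsh(W.T @ W)[0])   # eigvalsh sorts ascending
    return min(levels)

def closed_loop_regressors(A, B, K, steps, dither_std, seed=0):
    """Collect phi_k = [x_k; u_k] with u_k = -K x_k + dither."""
    rng = np.random.default_rng(seed)
    n, m = B.shape
    x = np.ones(n)
    rows = []
    for _ in range(steps):
        u = -K @ x + dither_std * rng.standard_normal(m)
        rows.append(np.concatenate([x, u]))
        x = A @ x + B @ u
    return np.array(rows)

# Hypothetical toy plant and gain, for illustration only.
A = np.array([[0.5, 0.2], [0.0, 0.4]])
B = np.array([[0.0], [1.0]])
K = np.array([[0.05, 0.2]])

level_plain = min_windowed_excitation(closed_loop_regressors(A, B, K, 400, 0.0), 20)
level_probed = min_windowed_excitation(closed_loop_regressors(A, B, K, 400, 0.5), 20)
print("no probing:  ", level_plain)
print("with probing:", level_probed)
```

A monitor of this kind could, in principle, decide when probing may be reduced, though whether stability is preserved after switching probing off is not something this sketch or the summary above establishes.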
Load-bearing premise
The combined learning and control process can be represented as a nonlinear feedback interconnection to which averaging and timescale separation apply, which in turn requires sufficient separation of the adaptation rates and persistence of excitation.
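For reference, one common discrete-time statement of persistence of excitation for a regressor sequence is given below. The symbols (regressor \phi_t, window length T, level \alpha) are notation assumed here; the paper's exact definition and constants may differ.

```latex
\exists\, \alpha > 0,\ T \in \mathbb{N} \ \text{ such that } \quad
\sum_{\tau = t}^{t+T-1} \phi_\tau \phi_\tau^{\top} \succeq \alpha I
\qquad \text{for all } t \ge 0 .
```

"Uniform" persistence of excitation then means that the same \alpha and T work along every trajectory of interest, including those generated while the policy is still changing.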
What would settle it
A concrete linear system with known persistence of excitation where the adaptation rates are separated yet the closed-loop trajectories diverge or the gain fails to converge to the optimal LQR solution.
Original abstract
In this paper, we investigate a data-driven framework to solve Linear Quadratic Regulator (LQR) problems when the dynamics is unknown, with the additional challenge of providing stability certificates for the overall learning and control scheme. Specifically, in the proposed on-policy learning framework, the control input is applied to the actual (unknown) linear system while iteratively optimized. We propose a learning and control procedure, termed Relearn LQR, that combines a recursive least squares method with a direct policy search based on the gradient method. The resulting scheme is analyzed by modeling it as a feedback-interconnected nonlinear dynamical system. A Lyapunov-based approach, exploiting averaging and timescale separation theories for nonlinear systems, allows us to provide formal stability guarantees for the whole interconnected scheme. The effectiveness of the proposed strategy is corroborated by numerical simulations, where Relearn LQR is deployed on an aircraft control problem, with both static and drifting parameters.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes Relearn LQR, an on-policy data-driven method for unknown LQR problems that combines recursive least squares (RLS) identification with direct policy gradient search. The control input is applied to the real system while the policy is iteratively optimized. The scheme is modeled as a feedback interconnection of nonlinear dynamical systems, and a composite Lyapunov function is constructed using averaging and two-time-scale separation arguments to certify stability of the overall closed-loop learning process. Effectiveness is illustrated on an aircraft control example with both constant and drifting parameters.
Significance. If the stability certificates hold under explicitly verifiable conditions, the result would be a useful contribution to safe on-policy learning for linear control. The modeling choice and invocation of averaging/two-time-scale theorems are standard tools that, when applicable, can deliver rigorous guarantees without requiring offline data collection. The on-policy setting and handling of drifting parameters are practically relevant.
Major comments (2)
- Abstract/analysis paragraph: the formal stability claim rests on the regressor remaining uniformly persistently exciting while the policy is updated on-policy, yet no explicit bounds on the RLS forgetting factor or policy-gradient step size are derived to guarantee that the separation of timescales remains valid for the chosen parameterization.
- Abstract/analysis paragraph: the application of averaging and two-time-scale theorems requires uniform PE of the closed-loop regressor under the time-varying policy; the manuscript states the condition but does not verify or bound the minimum eigenvalue of the regressor covariance when the input is generated by the evolving policy estimate.
Simulated Author's Rebuttal
We thank the referee for the constructive comments on the stability analysis. We address each major comment below and will revise the manuscript to clarify the assumptions.
- Referee: Abstract/analysis paragraph: the formal stability claim rests on the regressor remaining uniformly persistently exciting while the policy is updated on-policy, yet no explicit bounds on the RLS forgetting factor or policy-gradient step size are derived to guarantee that the separation of timescales remains valid for the chosen parameterization.
  Authors: We acknowledge the observation. The Lyapunov analysis with averaging and two-time-scale separation is performed under the standing assumption of uniform persistent excitation (PE) of the regressor and sufficient separation between the RLS and policy-gradient timescales. Explicit, parameterization-independent bounds on the forgetting factor and step size are not derived, as they would require a detailed, system-specific characterization of the closed-loop regressor that lies outside the paper's scope. In the revised version we will add an explicit remark in the analysis section stating these assumptions and referencing standard adaptive-control practice for parameter tuning to maintain them.
  Revision: yes
- Referee: Abstract/analysis paragraph: the application of averaging and two-time-scale theorems requires uniform PE of the closed-loop regressor under the time-varying policy; the manuscript states the condition but does not verify or bound the minimum eigenvalue of the regressor covariance when the input is generated by the evolving policy estimate.
  Authors: We agree that the manuscript states the uniform-PE requirement without providing an explicit lower bound on the minimum eigenvalue of the covariance matrix under the time-varying policy. Such a bound depends on the unknown plant, the initial policy, and the update rates; deriving a general expression is therefore not feasible without additional assumptions. The contribution centers on stability of the interconnected system once the PE condition holds. In revision we will insert a clarifying paragraph that reiterates the assumption, notes its practical enforcement via persistent excitation in the on-policy data, and points to related literature where analogous conditions are left as standing assumptions.
  Revision: yes
Circularity Check
No circularity: stability via external Lyapunov/averaging/timescale-separation theorems on modeled interconnection
The derivation models the Relearn LQR scheme (RLS + policy gradient) as a feedback-interconnected nonlinear system and invokes standard external results (Lyapunov functions, averaging theory, two-time-scale separation) to certify stability under stated assumptions of uniform PE and rate separation. These theorems are independent mathematical tools whose hypotheses are not constructed from the paper's fitted quantities or outputs; the paper states the conditions but does not reduce the stability claim to a self-definition or self-citation chain. No self-definitional, fitted-input-as-prediction, or ansatz-smuggled steps appear in the abstract or reader's summary of the chain. The result is therefore self-contained against external benchmarks.
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean, washburn_uniqueness_aczel
  Relation between the paper passage and the cited Recognition theorem: unclear
  Paper passage: A Lyapunov-based approach, exploiting averaging and timescale separation theories for nonlinear systems, allows us to provide formal stability guarantees for the whole interconnected scheme.
- IndisputableMonolith/Foundation/RealityFromDistinction.lean, reality_from_one_distinction
  Relation between the paper passage and the cited Recognition theorem: unclear
  Paper passage: the closed-loop system consisting of the gradient update on the gain K, the RLS scheme, and the system dynamics
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.