Data-Driven Continuous-Time Linear Quadratic Regulator via Closed-Loop and Reinforcement Learning Parameterizations

Armin Gießler; Felix Thömmes; Sören Hohmann

arXiv: 2604.27922 · v1 · submitted 2026-04-30 · math.OC · cs.SY · eess.SY

Pith reviewed 2026-05-07 07:54 UTC · model grok-4.3

keywords: data-driven control, continuous-time LQR, policy iteration, reinforcement learning, behavioral theory, algebraic Riccati equation, model-free control, optimal control

The pith

Two data-driven parameterizations solve the continuous-time LQR problem without needing the system model.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper adapts the closed-loop parameterization from behavioral system theory to continuous time and pairs it with an integral reinforcement learning parameterization to solve the linear quadratic regulator using only collected data. It develops a policy iteration scheme, derives a data-driven continuous-time algebraic Riccati equation, and introduces convex problem formulations for both approaches. A unified treatment shows how the two parameterizations relate to each other and to prior methods, enabling model-free optimal control design when the system dynamics are unknown.
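For orientation, the standard model-based problem that both parameterizations target can be written as follows. This is the textbook continuous-time LQR formulation, not a formula quoted from the paper; the symbols $A$, $B$, $Q$, $R$, $P$, $K^*$ and the sign convention $u = K^* x$ are assumptions made here for illustration.

```latex
\begin{aligned}
&\min_{u}\ \int_0^\infty \big( x(t)^\top Q\, x(t) + u(t)^\top R\, u(t) \big)\,\mathrm{d}t
\quad \text{s.t.}\quad \dot{x}(t) = A x(t) + B u(t),\\
&A^\top P + P A - P B R^{-1} B^\top P + Q = 0,
\qquad u = K^* x,\quad K^* = -R^{-1} B^\top P .
\end{aligned}
```

The second line is the continuous-time algebraic Riccati equation (CARE). A data-driven method has to reach the same $P$ and $K^*$ without access to $A$ and $B$.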

Core claim

By adapting the closed-loop parameterization to continuous time to characterize the closed-loop system via a matrix satisfying equality constraints on data, and by using off-policy data in the integral reinforcement learning parameterization for policy evaluation, the paper derives a policy iteration scheme, a data-driven continuous-time algebraic Riccati equation, policy gradient flows, and convex reformulations that solve the LQR problem without explicit model knowledge while clarifying structural relationships between the approaches.
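To make the off-policy policy-evaluation step more concrete, the following is a standard identity from the off-policy IRL literature, written under assumed notation; it is a sketch and not necessarily the exact form used in the paper. Assume $\dot{x} = Ax + Bu$ with an arbitrary exploring input $u$, let $K$ be the policy being evaluated (convention $u = Kx$), and let $P$ solve $(A+BK)^\top P + P(A+BK) + Q + K^\top R K = 0$.

```latex
\begin{aligned}
&x(t+\delta)^\top P\, x(t+\delta) - x(t)^\top P\, x(t) \\
&\quad = -\int_t^{t+\delta} x^\top \big(Q + K^\top R K\big)\, x \,\mathrm{d}\tau
 \;-\; 2\int_t^{t+\delta} (u - Kx)^\top R\, K^{+} x \,\mathrm{d}\tau ,
\qquad K^{+} := -R^{-1} B^\top P .
\end{aligned}
```

The identity follows from writing $\dot{x} = (A+BK)x + B(u - Kx)$ and differentiating $x^\top P x$ along the trajectory. The unknowns $P$ and $K^{+}$ enter linearly and neither $A$ nor $B$ appears, so both can be estimated by least squares from trajectories generated by an input that differs from $Kx$.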

What carries the argument

The closed-loop parameterization, a matrix satisfying data-derived equality constraints that characterizes the closed-loop system, and the integral reinforcement learning parameterization that uses off-policy data for policy evaluation and iteration.
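A minimal numerical sketch of the closed-loop parameterization idea, assuming noise-free samples of the state, input, and state derivative. The names `closed_loop_from_data`, `X`, `U`, `Xdot`, and `G`, the test system, and the convention u = K x are hypothetical choices made for illustration and are not taken from the paper.

```python
import numpy as np

def closed_loop_from_data(X, U, Xdot, K):
    """Return (G, A_K) such that X G = I, U G = K, and A_K = Xdot G."""
    n = X.shape[0]
    W = np.vstack([X, U])                      # stacked data, shape (n + m, T)
    if np.linalg.matrix_rank(W) < W.shape[0]:
        raise ValueError("data matrix [X; U] does not have full row rank")
    # pinv(W) is a right inverse of W, so W @ G = [I; K] holds exactly.
    G = np.linalg.pinv(W) @ np.vstack([np.eye(n), K])
    return G, Xdot @ G

# Hypothetical system, used only to generate data and to check the result.
rng = np.random.default_rng(0)
A = np.array([[0.0, 1.0], [-2.0, -3.0]])
B = np.array([[0.0], [1.0]])
X = rng.standard_normal((2, 8))                # sampled states
U = rng.standard_normal((1, 8))                # sampled inputs
Xdot = A @ X + B @ U                           # state derivatives (assumed measurable)

K = np.array([[-1.0, -0.5]])                   # arbitrary feedback gain
G, A_K = closed_loop_from_data(X, U, Xdot, K)
print(np.allclose(A_K, A + B @ K))
```

Because `Xdot = A X + B U`, multiplying by `G` gives `A + B K` without ever forming `A` or `B`. The sketch relies on derivative measurements, which is the point the referee report questions for the continuous-time setting.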

If this is right

A policy iteration scheme for continuous-time LQR can be implemented using only measured trajectory data.
A data-driven continuous-time algebraic Riccati equation can be solved to recover the optimal controller.
Convex problem formulations of the LQR become available through both the closed-loop and IRL parameterizations.
Policy gradient flows can be derived within the IRL framework for continuous-time systems.
The relationships between closed-loop and IRL approaches are clarified to enable systematic selection of methods.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The data-driven CARE offers a direct route to optimal feedback when system identification is costly or inaccurate.
The unified treatment of parameterizations may support hybrid algorithms that switch between closed-loop and IRL based on data characteristics.
Similar derivations could be tested on benchmark systems to compare numerical properties of the convex reformulations against standard Riccati solvers.

Load-bearing premise

The closed-loop parameterization from behavioral theory can be adapted directly to continuous time while preserving the required equality constraints on the data matrix, and the collected off-policy data is sufficiently rich to allow policy evaluation and iteration without model knowledge.

What would settle it

On a known linear system with sufficiently rich data, solve the derived data-driven continuous-time algebraic Riccati equation and check whether the resulting feedback gain matches the model-based optimal solution; a mismatch would falsify the adaptation.
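The check described above can be sketched numerically. The code below is an illustrative assumption, not the paper's algorithm: it runs a Kleinman-style policy iteration in which the closed-loop matrix and the input matrix are both read off from noise-free derivative data, and then tests the result against the model-based CARE. The helper names `lyap` and `data_driven_pi` and the benchmark system are hypothetical.

```python
import numpy as np

def lyap(A_K, Q_K):
    """Solve A_K^T P + P A_K + Q_K = 0 by vectorization (fine for small n)."""
    n = A_K.shape[0]
    I = np.eye(n)
    M = np.kron(I, A_K.T) + np.kron(A_K.T, I)
    P = np.linalg.solve(M, -Q_K.reshape(-1)).reshape(n, n)
    return 0.5 * (P + P.T)                     # symmetrize against round-off

def data_driven_pi(X, U, Xdot, Q, R, K0, iters=15):
    """Policy iteration using only the data matrices X, U, Xdot (u = K x)."""
    n, m = X.shape[0], U.shape[0]
    W_pinv = np.linalg.pinv(np.vstack([X, U]))
    # G_B satisfies X G_B = 0 and U G_B = I, so Xdot G_B equals B for exact data.
    B_data = Xdot @ (W_pinv @ np.vstack([np.zeros((n, m)), np.eye(m)]))
    K = K0
    for _ in range(iters):
        G = W_pinv @ np.vstack([np.eye(n), K]) # X G = I, U G = K
        A_K = Xdot @ G                         # closed-loop matrix from data
        P = lyap(A_K, Q + K.T @ R @ K)         # policy evaluation
        K = -np.linalg.solve(R, B_data.T @ P)  # policy improvement
    return K, P

# Hypothetical benchmark system (used to generate data and to check the result).
rng = np.random.default_rng(1)
A = np.array([[0.0, 1.0], [-2.0, -3.0]])       # Hurwitz, so K0 = 0 is stabilizing
B = np.array([[0.0], [1.0]])
Q, R = np.eye(2), np.eye(1)
X = rng.standard_normal((2, 20))
U = rng.standard_normal((1, 20))
Xdot = A @ X + B @ U

K, P = data_driven_pi(X, U, Xdot, Q, R, K0=np.zeros((1, 2)))

# Model-based check: P should satisfy the CARE and K should equal -R^{-1} B^T P.
care_residual = A.T @ P + P @ A - P @ B @ np.linalg.solve(R, B.T @ P) + Q
print(np.linalg.norm(care_residual))
```

Recovering `B_data` from the data amounts to implicit identification, so this sketch only illustrates the comparison itself. A mismatch between the data-driven gain and the model-based one on exact, sufficiently rich data would indicate an error in the data relations.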

Figures

Figures reproduced from arXiv: 2604.27922 by Armin Gießler, Felix Thömmes, Sören Hohmann.

**Figure 1.** Normalized residuals $\|K(k)-K^*\|_F / \|K(0)-K^*\|_F$ and $\|K(t)-K^*\|_F / \|K(0)-K^*\|_F$ for the PI and the policy gradient flow, respectively, under the CL and IRL parameterizations. Light lines correspond to the 100 individual runs; dark lines show the average performance.

**Figure 2.** Residuals $\|P_k-P^*\|_F / \|P^*\|_F$ and $\|P(t)-P^*\|_F / \|P^*\|_F$ for the VI and the Riccati flow, respectively, under the CL and IRL parameterizations.

Original abstract

This paper studies data-driven approaches to the continuous-time linear quadratic regulator (LQR) problem based on two existing parameterizations, namely a closed-loop (CL) parameterization from behavioral system theory and an integral reinforcement learning (IRL) parameterization. The CL parameterization characterizes the closed-loop system via a matrix that satisfies equality constraints. While this parameterization has been extensively studied for discrete-time systems, we adapt key results to the continuous-time setting and develop a policy iteration (PI) scheme, derive a data-driven continuous-time algebraic Riccati equation (CARE), and introduce an alternative convex problem formulation. The IRL parameterization utilizes off-policy data to perform policy evaluation, which is then used for PI or value iteration. Within the IRL framework, we derive a policy gradient flow and propose convex reformulations of the LQR problem. Finally, we provide a unified treatment of these parameterizations that enables a systematic understanding of existing approaches and clarifies their structural relationships.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

The paper adapts discrete-time closed-loop behavioral parameterization and IRL ideas to continuous-time LQR, producing a data-driven CARE and convex reformulations, but the continuous-time data constraints are the part that needs close checking.

The main thing is that the authors take the closed-loop parameterization from behavioral theory, which gives exact linear constraints on data matrices in discrete time, and extend it to continuous-time LQR. They combine this with integral reinforcement learning to get a policy iteration scheme, a data-driven version of the continuous algebraic Riccati equation, convex problem formulations, and a policy gradient flow. They also lay out how these different parameterizations relate to each other in one framework.

Referee Report

2 major / 2 minor

Summary. The manuscript claims to adapt the closed-loop parameterization from behavioral systems theory to the continuous-time LQR setting, yielding a policy iteration scheme, a data-driven continuous-time algebraic Riccati equation (CARE), and a convex reformulation; it further develops an integral reinforcement learning parameterization for off-policy policy evaluation, a policy gradient flow, and additional convex reformulations, while providing a unified treatment that relates the two parameterizations and existing approaches.

Significance. If the central derivations are rigorously supported, the work supplies a systematic, model-free framework for continuous-time LQR that bridges behavioral theory and reinforcement learning, clarifying structural relationships among parameterizations and enabling policy iteration without explicit system identification. The explicit construction of a data-driven CARE and the convex alternatives constitute concrete contributions that could facilitate practical data-driven control design.

major comments (2)

[§3] Closed-loop parameterization in continuous time: The adaptation of the discrete-time closed-loop behavioral parameterization is asserted to preserve the linear equality constraints on the data matrix for off-policy trajectories. However, the underlying dynamics are differential (Eq. (1)), so the data relations necessarily involve state and input derivatives. The manuscript does not appear to supply an exact integral reformulation or filtered-data construction that eliminates explicit differentiation while retaining the exact equality constraints for arbitrary persistently exciting inputs; without this, the subsequent policy iteration scheme and data-driven CARE lose their purely data-driven character.
[§4.2] Derivation of the data-driven CARE: The claim that the CARE can be expressed solely in terms of measured trajectory data rests on the validity of the adapted equality constraints from §3. If those constraints do not hold exactly (which the concern raised for §3 suggests is possible without additional filtering or integration steps), the data-driven CARE reduces to a model-based equation once derivatives are approximated or filtered, undermining the headline contribution.

minor comments (2)

[§2] Notation for the data matrices (e.g., the continuous-time analogs of the Hankel matrices) should be introduced with an explicit comparison table to the discrete-time case to aid readability.
[§6] The unified treatment section would benefit from a diagram or table summarizing the structural relationships among the CL, IRL, and existing parameterizations.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful reading and constructive comments, which help clarify important technical aspects of adapting the closed-loop parameterization to continuous time. We address the two major comments point by point below. Where the manuscript requires additional detail to make the data-driven character fully rigorous, we have incorporated revisions.

Referee: [§3] Closed-loop parameterization in continuous time: The adaptation of the discrete-time closed-loop behavioral parameterization is asserted to preserve the linear equality constraints on the data matrix for off-policy trajectories. However, the underlying dynamics are differential (Eq. (1)), so the data relations necessarily involve state and input derivatives. The manuscript does not appear to supply an exact integral reformulation or filtered-data construction that eliminates explicit differentiation while retaining the exact equality constraints for arbitrary persistently exciting inputs; without this, the subsequent policy iteration scheme and data-driven CARE lose their purely data-driven character.

Authors: We agree that an explicit integral reformulation is necessary to eliminate differentiation while preserving exact linear equality constraints. The original derivation in §3 starts from the differential closed-loop dynamics and obtains the data-matrix equality directly; this form is formally correct but requires state derivatives. In the revised manuscript we have added a new subsection (3.3) that supplies the missing integral reformulation. By integrating the closed-loop equation over finite intervals and invoking the persistent-excitation condition on the input, we obtain an equivalent set of linear constraints expressed solely in terms of integrated state and input trajectories. Proposition 3.2 proves the equivalence for any persistently exciting input, and a practical filtered-data implementation is also provided to handle measurement noise. These additions ensure the policy-iteration scheme and all subsequent results remain exactly data-driven.
Revision: yes

Referee: [§4.2] Derivation of the data-driven CARE: The claim that the CARE can be expressed solely in terms of measured trajectory data rests on the validity of the adapted equality constraints from §3. If those constraints do not hold exactly (which the concern raised for §3 suggests is possible without additional filtering or integration steps), the data-driven CARE reduces to a model-based equation once derivatives are approximated or filtered, undermining the headline contribution.

Authors: The data-driven CARE in §4.2 is obtained by substituting the closed-loop equality constraints into the standard CARE. With the integral reformulation now stated explicitly in the revised §3, the substitution yields a CARE expressed entirely in terms of the integrated data matrices; no system matrices or derivative approximations appear. We have updated the proof of Theorem 4.1 to reference the integral constraints directly and have added a remark clarifying that the equation remains exact for any persistently exciting trajectory. Consequently the headline claim is preserved.
Revision: yes
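One standard way an integral relation of the kind discussed in this exchange can arise is sketched below. The notation ($\delta$, $t_i$, $\Delta X$, $X_{\mathrm{int}}$, $U_{\mathrm{int}}$, $G$) is assumed here for illustration; it is not the statement of the Proposition 3.2 mentioned in the simulated rebuttal.

```latex
\begin{aligned}
x(t_i+\delta) - x(t_i) &= A \int_{t_i}^{t_i+\delta} x(\tau)\,\mathrm{d}\tau
                        + B \int_{t_i}^{t_i+\delta} u(\tau)\,\mathrm{d}\tau
\quad\Longrightarrow\quad \Delta X = A\, X_{\mathrm{int}} + B\, U_{\mathrm{int}},\\[4pt]
\begin{bmatrix} I \\ K \end{bmatrix} &= \begin{bmatrix} X_{\mathrm{int}} \\ U_{\mathrm{int}} \end{bmatrix} G
\quad\Longrightarrow\quad A + BK = \Delta X\, G .
\end{aligned}
```

Here the columns of $\Delta X$, $X_{\mathrm{int}}$, and $U_{\mathrm{int}}$ collect the state increments and the integrated state and input over the intervals $[t_i, t_i+\delta]$. If the stacked matrix of $X_{\mathrm{int}}$ and $U_{\mathrm{int}}$ has full row rank, a matrix $G$ exists for every gain $K$, and the closed-loop matrix is obtained from increments and integrals alone, with no derivative measurements.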

Circularity Check

0 steps flagged

No circularity: derivations adapt external behavioral and IRL results to continuous time without reduction to fitted inputs or self-citation chains

The paper adapts closed-loop parameterization from behavioral theory and IRL parameterization to continuous-time LQR, deriving a data-driven CARE via policy iteration and convex reformulations from off-policy data. These steps are presented as direct applications of standard results to measured trajectories, with no evidence that the CARE or gradient flow equations reduce by construction to quantities already fitted from the same data matrix. The unified treatment clarifies structural relationships but does not rely on self-definitional loops, fitted inputs renamed as predictions, or load-bearing self-citations for uniqueness theorems. The central claims remain self-contained against external benchmarks from discrete-time literature and standard RL theory.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The work rests on standard LQR assumptions (linear time-invariant dynamics, quadratic cost, stabilizability) and on prior results from behavioral systems theory and IRL that are adapted rather than re-derived. No new physical entities or ad-hoc constants are introduced in the abstract.

axioms (2)

Domain assumption: the underlying system is linear time-invariant and the cost is quadratic in state and input. This is the definition of the continuous-time LQR problem stated in the abstract.
Domain assumption: sufficiently rich off-policy data exists to identify the closed-loop behavior or to perform policy evaluation. This is required for both the CL parameterization and the IRL policy-evaluation step described in the abstract.
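The page does not define "sufficiently rich". A condition commonly used in this literature, assumed here for illustration, is that the stacked state and input data matrix has full row rank. The function name `is_sufficiently_rich` and the example data below are hypothetical.

```python
import numpy as np

def is_sufficiently_rich(X, U, tol=None):
    """Check whether the stacked data matrix [X; U] has full row rank n + m."""
    W = np.vstack([X, U])
    return bool(np.linalg.matrix_rank(W, tol=tol) == W.shape[0])

rng = np.random.default_rng(2)
X = rng.standard_normal((2, 10))               # sampled states, n = 2

# Exploring input: independent of the state samples.
U_explore = rng.standard_normal((1, 10))

# Pure feedback without exploration: the input row is a combination of the state rows.
K = np.array([[-1.0, -0.5]])
U_feedback = K @ X

rich = is_sufficiently_rich(X, U_explore)
poor = is_sufficiently_rich(X, U_feedback)
print(rich, poor)
```

Data collected under pure state feedback fails the check because the input rows are linearly dependent on the state rows, which is one reason off-policy data with an exploration signal is needed.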



[1] [1]

R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction, 2nd ed. The MIT Press, 2018

2018

[2] [2]

A Tour of Reinforcement Learning: The View from Con- tinuous Control,

B. Recht, “A Tour of Reinforcement Learning: The View from Con- tinuous Control,” Annual Review of Control, Robotics, and Autonomous Systems, vol. 2, no. 1, pp. 253–279, 2019

2019

[3] [3]

Vrabie, K

D. Vrabie, K. Vamvoudakis, and F. L. Lewis, Optimal Adaptive Control and Differential Games by Reinforcement Learning Principles. London, UK: The Institution of Engineering and Technology, 2013

2013

[4] [4]

Reinforce- ment learning for control: Performance, stability, and deep approxima- tors,

L. Bus ¸oniu, T. De Bruin, D. Toli´c, J. Kober, and I. Palunko, “Reinforce- ment learning for control: Performance, stability, and deep approxima- tors,” Annual Reviews in Control , vol. 46, pp. 8–28, 2018

2018

[5] [5]

Data-driven control based on the behavioral approach: From theory to applications in power systems,

I. Markovsky, L. Huang, and F. D ¨orfler, “Data-driven control based on the behavioral approach: From theory to applications in power systems,” IEEE Control Systems Magazine , vol. 43, no. 5, pp. 28–68, 2023

2023

[6] [6]

Behavioral systems theory in data-driven analysis, signal processing, and control,

I. Markovsky and F. D ¨orfler, “Behavioral systems theory in data-driven analysis, signal processing, and control,” Annual Reviews in Control , vol. 52, pp. 42–64, 2021

2021

[7] [7]

H. J. van Waarde, M. K. Camlibel, and H. L. Trentelman, Data-Based Linear Systems and Control Theory . Kindle Direct Publishing, 2025

2025

[8] [8]

Formulas for data-driven control: Stabilization, optimality, and robustness,

C. De Persis and P. Tesi, “Formulas for data-driven control: Stabilization, optimality, and robustness,” IEEE Transactions on Automatic Control , vol. 65, no. 3, pp. 909–924, 2020

2020

[9] [9]

Data-Enabled Policy Optimization for the Linear Quadratic Regulator,

F. Zhao, F. D ¨orfler, and K. You, “Data-Enabled Policy Optimization for the Linear Quadratic Regulator,” in 2023 62nd IEEE Conference on Decision and Control (CDC) , 2023, pp. 6160–6165

2023

[10] [10]

Data-based control of continuous- time linear systems with performance specifications,

V . G. Lopez and M. A. M ¨uller, “Data-based control of continuous- time linear systems with performance specifications,” 2025. [Online]. Available: https://arxiv.org/abs/2403.00424

work page internal anchor Pith review arXiv 2025

[11] [11]

An Efficient Off-Policy Reinforcement Learning Algorithm for the Continuous-Time LQR Problem,

——, “An Efficient Off-Policy Reinforcement Learning Algorithm for the Continuous-Time LQR Problem,” in 2023 62nd IEEE Conference on Decision and Control (CDC) , 2023, pp. 13–19

2023

[12] [12]

On the Certainty-Equivalence Approach to Direct Data-Driven LQR Design,

F. D ¨orfler, P. Tesi, and C. De Persis, “On the Certainty-Equivalence Approach to Direct Data-Driven LQR Design,” IEEE Transactions on Automatic Control, vol. 68, no. 12, pp. 7989–7996, 2023

2023

[13] [13]

On the Role of Regularization in Direct Data-Driven LQR Control,

F. Dorfler, P. Tesi, and C. De Persis, “On the Role of Regularization in Direct Data-Driven LQR Control,” in 2022 IEEE 61st Conference on Decision and Control (CDC) , ser. Proceedings of the IEEE Conference on Decision and Control. Institute of Electrical and Electronics Engineers Inc., 2022, pp. 1091–1098

2022

[14] [14]

Low-complexity learning of Linear Quadratic Regulators from noisy data,

C. De Persis and P. Tesi, “Low-complexity learning of Linear Quadratic Regulators from noisy data,” Automatica, vol. 128, p. 109548, 2021

2021

[15] [15]

Regularization for Covariance Parameterization of Direct Data-Driven LQR Control,

F. Zhao, A. Chiuso, and F. D ¨orfler, “Regularization for Covariance Parameterization of Direct Data-Driven LQR Control,” IEEE Control Systems Letters, vol. 9, pp. 961–966, 2025

2025

[16] [16]

Data-Enabled Policy Opti- mization for Direct Adaptive Learning of the LQR,

F. Zhao, F. D ¨orfler, A. Chiuso, and K. You, “Data-Enabled Policy Opti- mization for Direct Adaptive Learning of the LQR,” IEEE Transactions on Automatic Control , pp. 1–16, 2025

2025

[17] [17]

Adaptive optimal control for continuous-time linear systems based on policy iteration,

D. Vrabie, O. Pastravanu, M. Abu-Khalaf, and F. Lewis, “Adaptive optimal control for continuous-time linear systems based on policy iteration,” Automatica, vol. 45, no. 2, pp. 477–484, 2009

2009

[18] A. Al-Tamimi, D. Vrabie, M. Abu-Khalaf, and F. L. Lewis, “Model-free approximate dynamic programming schemes for linear systems,” in 2007 International Joint Conference on Neural Networks, 2007, pp. 371–378

[19] Y. Jiang and Z.-P. Jiang, “Computational adaptive optimal control for continuous-time linear systems with completely unknown dynamics,” Automatica, vol. 48, no. 10, pp. 2699–2704, 2012

[20] ——, Robust Adaptive Dynamic Programming. Hoboken, New Jersey: John Wiley & Sons, Inc., 2017

[21] J. Y. Lee, J. B. Park, and Y. H. Choi, “Integral Q-learning and explorized policy iteration for adaptive optimal control of continuous-time linear systems,” Automatica, vol. 48, no. 11, pp. 2850–2859, 2012

[22] K. G. Vamvoudakis, “Q-learning for continuous-time linear systems: A model-free infinite horizon optimal control approach,” Systems & Control Letters, vol. 100, pp. 14–20, 2017

[23] C. Possieri and M. Sassano, “Q-Learning for Continuous-Time Linear Systems: A Data-Driven Implementation of the Kleinman Algorithm,” IEEE Transactions on Systems, Man, and Cybernetics: Systems, vol. 52, no. 10, pp. 6487–6497, 2022

[24] ——, “Value Iteration for Continuous-Time Linear Time-Invariant Systems,” IEEE Transactions on Automatic Control, vol. 68, no. 5, pp. 3070–3077, 2023

[25] T. Bian and Z.-P. Jiang, “Value iteration and adaptive dynamic programming for data-driven adaptive optimal control design,” Automatica, vol. 71, pp. 348–360, 2016

[26] C. Song and J. Liu, “All Data-Driven LQR Algorithms Require at Least as Much Interval Data as System Identification,” IEEE Control Systems Letters, vol. 9, pp. 1778–1783, 2025

[27] H. J. van Waarde, J. Eising, H. L. Trentelman, and M. K. Camlibel, “Data Informativity: A New Perspective on Data-Driven Analysis and Control,” IEEE Transactions on Automatic Control, vol. 65, no. 11, pp. 4753–4768, 2020

[28] H. J. van Waarde, M. K. Camlibel, and M. Mesbahi, “From noisy data to feedback controllers: Nonconservative design via a matrix S-lemma,” IEEE Trans. Autom. Control, vol. 67, no. 1, pp. 162–175, 2022

[29] J. Eising and J. Cortés, “When sampling works in data-driven control: Informativity for stabilization in continuous time,” IEEE Transactions on Automatic Control, vol. 70, no. 1, pp. 565–572, 2025

[30] E. Feron, V. Balakrishnan, S. Boyd, and L. El Ghaoui, “Numerical Methods for H2 Related Problems,” in 1992 American Control Conference, 1992, pp. 2921–2922

[31] H. J. van Waarde and M. Kanat Camlibel, “A matrix Finsler’s lemma with applications to data-driven control,” in 2021 60th IEEE Conference on Decision and Control (CDC), 2021, pp. 5777–5782

[32] J. Berberich, A. Koch, C. W. Scherer, and F. Allgöwer, “Robust data-driven state-feedback design,” in 2020 American Control Conference (ACC), 2020, pp. 1532–1538

[33] T. Dai and M. Sznaier, “Data-driven quadratic stabilization and LQR control of LTI systems,” Automatica, vol. 153, p. 111041, 2023

[34] P. Rapisarda, H. J. van Waarde, and M. Çamlibel, “Orthogonal polynomial bases for data-driven analysis and control of continuous-time systems,” IEEE Transactions on Automatic Control, vol. 69, no. 7, pp. 4307–4319, 2024

[35] G. R. Gonçalves da Silva, A. S. Bazanella, C. Lorenzini, and L. Campestrini, “Data-Driven LQR Control Design,” IEEE Control Systems Letters, vol. 3, no. 1, pp. 180–185, 2019

[36] B. Anderson and J. Moore, Optimal Control: Linear Quadratic Methods, ser. Dover Books on Engineering. Dover Publications, 2007

[37] V. Kučera, “A Review of the Matrix Riccati Equation,” Kybernetika, vol. 9, no. 1, pp. 42–61, 1973

[38] V. Balakrishnan and L. Vandenberghe, “Connections Between Duality in Control Theory and Convex Optimization,” in Proceedings of 1995 American Control Conference - ACC’95, vol. 6, 1995, pp. 4030–4034

[39] S. Boyd, L. El Ghaoui, E. Feron, and V. Balakrishnan, Linear Matrix Inequalities in System and Control Theory. Society for Industrial and Applied Mathematics, 1994

[40] E. Feron, P. Apkarian, and P. Gahinet, “Analysis and synthesis of robust control systems via parameter-dependent Lyapunov functions,” IEEE Trans. Autom. Control, vol. 41, no. 7, pp. 1041–1046, 1996

[41] J. Bu, A. Mesbahi, and M. Mesbahi, “Policy Gradient-based Algorithms for Continuous-time Linear Quadratic Control,” 2020. [Online]. Available: https://arxiv.org/abs/2006.09178

[42] ——, “On Topological and Metrical Properties of Stabilizing Feedback Gains: the MIMO Case,” 2019. [Online]. Available: https://arxiv.org/abs/1904.02737

[43] D. Kleinman, “On an iterative technique for Riccati equation computations,” IEEE Trans. Autom. Control, vol. 13, no. 1, pp. 114–115, 1968

[44] R. H. Bartels and G. W. Stewart, “Solution of the Matrix Equation AX + XB = C,” Communications of the ACM, vol. 15, no. 9, pp. 820–826, 1972

[45] G. Golub, S. Nash, and C. Van Loan, “A Hessenberg-Schur method for the problem AX + XB = C,” IEEE Transactions on Automatic Control, vol. 24, no. 6, pp. 909–913, 1979

[46] Y. Sun and M. Fazel, “Learning Optimal Controllers by Policy Gradient: Global Optimality via Convex Parameterization,” in 2021 60th IEEE Conference on Decision and Control (CDC), 2021, pp. 4576–4581

[47] R. A. Horn and C. R. Johnson, Matrix Analysis, 2nd ed. Cambridge University Press, 2012